
How do the big boys host billions of uploaded files?

Posted by ServerDiscovery, 01-30-2017, 01:26 AM
Hi, I'm a web developer and I've been using Linux dedicated servers for 10-11 years. All the websites and web apps I developed in the past were for small businesses, and I never needed to host more than 5-10 TB of data. Recently I started a mobile startup, a social media platform, and it has suddenly become popular (at least compared with my previous work). In November 2016, users uploaded around 10TB of images and videos. One day I noticed MySQL had stopped working, and after investigating I found that the server's storage was full. So I rented two more Linux dedicated servers, each with 20TB of HDD, and mounted them on the main server over NFS. Two months later, the storage on all 3 servers has filled up, because users are uploading so many images and video clips. Now I would like to know how big players like "Instagram", "WhatsApp" or "Telegram" host their users' files. Do I have to rent a new server each month just to add another 20TB to my storage through yet another NFS mount? If so, after a few months I will be paying for a large number of dedicated servers, which seems impossible compared with the app's income. Is there a cheaper way that the big boys use? How can a free app like "WhatsApp" or "Telegram" host billions of "voice", "image" or "video" files?

Posted by gagah, 01-30-2017, 01:33 AM
Instagram and WhatsApp are owned by Facebook, which runs its own data centers and designs its own servers. Before that, they were backed by Silicon Valley VCs, with millions of dollars of investment and a very high burn rate. Most of the services you mentioned operated at a heavy loss for years (Twitter is still losing money) or sold data to third parties to fund the free service. In short, if your userbase is growing very fast, get VC investment, or you'll be spending tens of thousands on infrastructure out of pocket.

Posted by LeapWH, 01-30-2017, 08:00 AM
That's the way it is: you add servers to suit your needs. You could check OVH; they offer servers with 12 x 6TB SAS drives for a reasonable price.

Posted by gnusys, 01-30-2017, 08:03 AM
You should change your code to use object storage like S3. Every big cloud provider has an S3-like file storage service, which matches your requirement exactly.
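One way to read "change your code" is to stop writing uploads to a mount path directly and go through a small storage interface instead. The sketch below is only an illustration under that assumption: `BlobStore`, `LocalBlobStore`, and `save_upload` are hypothetical names, not part of any provider's SDK, and the disk-backed class is a stand-in that an S3-style backend could replace by offering the same two methods.

```python
from pathlib import Path
from typing import Protocol


class BlobStore(Protocol):
    """Hypothetical interface the app codes against instead of a mount path."""

    def put(self, key: str, data: bytes) -> None: ...

    def get(self, key: str) -> bytes: ...


class LocalBlobStore:
    """Disk-backed stand-in; an object-storage backend would expose the same methods."""

    def __init__(self, root: Path) -> None:
        self.root = Path(root)

    def _path(self, key: str) -> Path:
        # Keys look like "users/42/avatar.jpg"; reject keys that escape the root.
        path = (self.root / key).resolve()
        if self.root.resolve() not in path.parents:
            raise ValueError(f"invalid key: {key!r}")
        return path

    def put(self, key: str, data: bytes) -> None:
        path = self._path(key)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)

    def get(self, key: str) -> bytes:
        return self._path(key).read_bytes()


def save_upload(store: BlobStore, user_id: int, filename: str, data: bytes) -> str:
    """Store an upload under a per-user key and return that key."""
    key = f"users/{user_id}/{filename}"
    store.put(key, data)
    return key
```

With this shape, the database only stores the key, and moving from NFS mounts to an object store becomes a matter of swapping the backend rather than touching every upload handler.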

Posted by user54321, 01-30-2017, 10:16 AM
The big players have lots of servers, like this: https://www.youtube.com/watch?v=MyK7ZF-svMk

Posted by steven99, 01-30-2017, 12:23 PM
ZFS and a huge SAN array for the win.

Posted by ibee, 01-30-2017, 01:03 PM
They use load balancing and a cloud architecture to store data separately from web files and the DB.

Posted by UNIXy, 01-30-2017, 01:19 PM
You've described yourself as a technical guy/gal and you seem to have that part of the project well covered so far. But you need to work on your business acumen, which is what you need to keep your app afloat. The good news is that you seem to have good traction (which solves almost all problems), just not enough revenue to cover the costs. A growing social media platform could get funding from investors. Go to websites like AngelList and Crunchbase to find investors who can help you grow your startup. That's how the big boys did it.

Posted by SneakySysadmin, 01-30-2017, 03:00 PM
It's not like they're hiding this information. Start Here: http://highscalability.com/youtube-architecture ... and of course, follow some of those links as well.

Posted by OPNodes, 01-31-2017, 04:50 PM
If storage is the only bottleneck, perhaps you could build your own storage server? For example, the Backblaze Storage Pod can hold 480TB. https://www.backblaze.com/blog/open-...torage-server/

Posted by NortheBridge, 01-31-2017, 08:05 PM
I'm not sure if Instagram is still using AWS S3, but up to the point Facebook bought Instagram, and for some time afterwards, Instagram used S3 for storage. S3 is actually really good for static storage and really inexpensive if you know how to manage your storage buckets correctly. When some of these apps were created it was an entirely different internet.

Someone linked to something very interesting about YouTube: "stall for time." If your app is growing at an exponential rate and your revenue can't keep up with the infrastructure, first I want to congratulate you on being a great startup, and second I want to advise you that this can end in one of two ways: it all goes down, or you buy yourself enough time for revenue to catch up. Your stalling for time is essentially buying more Linux boxes and mounting their arrays with NFS. Eventually, you must work out a real storage strategy, and there are many options. Continually renting new servers and mounting their drives isn't sustainable, and it is probably just throwing money out the door.

Look at your storage needs, anticipate what you expect growth to be (and add 20% - the 20% rule), and formulate a real storage plan. S3 is a prebuilt option that has been proven to work. ZFS on massive arrays is another option. And, as you stall for time, find some investors to cover the gap between revenue and the cost of storage. For the first couple of years a business tends to run in the red. Most social media platforms fail because they can never run in the black. It's a problem inherent to social media platforms.
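The "anticipate growth and add 20%" advice can be turned into a rough calculation. The sketch below is just one way to do that arithmetic: the function name, the compounding monthly growth model, and the sample figures are assumptions for illustration, not numbers from the thread beyond the 20% headroom.

```python
def projected_storage_tb(current_tb, monthly_upload_tb, monthly_growth, months, headroom=0.20):
    """Estimate the storage (in TB) to plan for after `months`.

    current_tb:        data already stored
    monthly_upload_tb: uploads expected in the first month
    monthly_growth:    assumed month-over-month growth of uploads (0.10 = 10%)
    headroom:          safety margin added on top (the 20% rule by default)
    """
    total = current_tb
    upload = monthly_upload_tb
    for _ in range(months):
        total += upload               # this month's uploads land on disk
        upload *= 1 + monthly_growth  # next month's uploads are larger
    return total * (1 + headroom)


if __name__ == "__main__":
    # Illustrative inputs only: 60 TB stored today, 10 TB/month of uploads
    # growing 10% per month, planned one year ahead.
    print(f"{projected_storage_tb(60, 10, 0.10, 12):.0f} TB")
```

Plugging in your own measured upload rate gives a target to price out against S3, a ZFS array, or more rented servers before the disks fill up again.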

Posted by lTyl, 02-01-2017, 03:40 PM
Are your users uploading a lot of duplicate files? If they are, you can cut down on the number of files you need to store so that you keep only a single copy of each file (not counting backups, of course). Maintain a server (or collection of servers) that holds an index of hash keys. Every time a user uploads a file, hash the file and send the resulting string to your hash index for comparison. If it is a new file, store it as normal. If you already have a file in storage that matches the hash, return the URL of the file you already have and drop the new upload. This has a high set-up overhead, but it could cut your storage costs if users keep uploading the same files repeatedly. Of course, this is a very simple description of how it would work. Implementation is the hard part.
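The hash-index idea described above can be sketched in a few lines. This is a minimal in-memory illustration, assuming SHA-256 as the hash; `DedupStore`, its dictionaries, and the `files.example.com` URL are hypothetical stand-ins for a real index service and real storage.

```python
import hashlib
from typing import Dict, Tuple


class DedupStore:
    """In-memory sketch of a hash index mapping content digest to stored URL."""

    def __init__(self, base_url: str = "https://files.example.com/") -> None:
        self.base_url = base_url            # placeholder URL, not a real service
        self.index: Dict[str, str] = {}     # digest -> URL of the stored copy
        self.blobs: Dict[str, bytes] = {}   # digest -> bytes (stand-in for real storage)

    def upload(self, data: bytes) -> Tuple[str, bool]:
        """Return (url, stored_new); stored_new is False for a duplicate."""
        digest = hashlib.sha256(data).hexdigest()
        existing = self.index.get(digest)
        if existing is not None:
            # Same bytes already stored: hand back the old URL, drop the upload.
            return existing, False
        self.blobs[digest] = data
        url = self.base_url + digest
        self.index[digest] = url
        return url, True


if __name__ == "__main__":
    store = DedupStore()
    first_url, first_new = store.upload(b"same picture bytes")
    second_url, second_new = store.upload(b"same picture bytes")
    print(first_url == second_url, first_new, second_new)
```

Note that this only catches byte-identical files; an image that has been resized or re-encoded hashes differently and would be stored again. For large videos, the file would normally be hashed in chunks rather than read into memory in one piece.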


