How to design YouTube
Explorer edited this page Jul 27, 2018
Gainlo link: http://blog.gainlo.co/index.php/2016/10/22/design-youtube-part/
Features:
- Upload video - Y
- Download video - N
- Stream video - Y
- Like/Dislike video - Y
- Comment on video - Y
- Subscribe to a user - N
- Recommendation of videos - Y (Basic recommendation system)
- Security - N
- View Count of video - Y
Estimation:
Disk Storage Needed:
- Video Size
- Average video size - 100MB
- Number of users - 2 Billion
- Average uploads per day (assuming 10% of the users upload every day) - 200 Million
- Let's say the max upload limit for a video is 1 GB. (The estimate below uses the average size, not this limit.)
- With these assumptions, the content size per day = (200 * 10^6) * (100 * 10^6) = 20 * 10^15 bytes = 20 Petabytes
- Content size for 10 years = 20 PB * (365 * 10) = 73,000 PB = 73 Exabytes
- Image Size:
- User Profile Size: 100 KB per user * 2 Billion users = (100 * 10^3) * (2 * 10^9) = 200 * 10^12 bytes = 200 TB.
- Number of videos streaming at the same time: Let's say that on average 50% of the users are streaming at the same time. That is ~1 Billion users streaming at any given time.
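The arithmetic in this estimation can be checked with a short C++ sketch. The inputs are the assumptions stated above (2 Billion users, 10% uploading daily, 100 MB average video, 100 KB per profile, 50% streaming concurrently). The names `CapacityEstimate`, `estimate_capacity` and `youtube_estimate` are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <cmath>

// Totals derived from the assumptions in the estimation section.
struct CapacityEstimate {
    double uploads_per_day;       // videos uploaded per day
    double daily_video_bytes;     // new video bytes per day
    double ten_year_video_bytes;  // video bytes accumulated over 10 years
    double profile_bytes;         // total user-profile bytes
    double concurrent_streams;    // users streaming at the same time
};

// Hypothetical helper: turns the raw assumptions into totals.
CapacityEstimate estimate_capacity(double users,
                                   double daily_upload_fraction,
                                   double avg_video_bytes,
                                   double profile_bytes_per_user,
                                   double concurrent_fraction) {
    assert(users >= 0.0);
    assert(daily_upload_fraction >= 0.0 && daily_upload_fraction <= 1.0);
    assert(concurrent_fraction >= 0.0 && concurrent_fraction <= 1.0);

    CapacityEstimate e{};
    e.uploads_per_day = users * daily_upload_fraction;
    e.daily_video_bytes = e.uploads_per_day * avg_video_bytes;
    e.ten_year_video_bytes = e.daily_video_bytes * 365.0 * 10.0;
    e.profile_bytes = users * profile_bytes_per_user;
    e.concurrent_streams = users * concurrent_fraction;
    return e;
}

// The page's numbers: 2 Billion users, 10% upload daily, 100 MB per video,
// 100 KB per profile, 50% of users streaming concurrently.
CapacityEstimate youtube_estimate() {
    return estimate_capacity(2e9, 0.10, 100e6, 100e3, 0.50);
}
```

Dividing `daily_video_bytes` by 1e15 gives the daily figure in PB, and dividing `ten_year_video_bytes` by 1e18 gives the 10-year figure in EB. The sketch uses `double` because the 10-year total in bytes is larger than a 64-bit unsigned integer can hold.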
Design Goals:
- Latency - This is critical.
- Consistency - If a few bytes are lost while streaming a video, that is acceptable, as the user can always go back and rewatch that part of the video.
- Availability - This is critical; being able to stream a video at any time is essential for a service like YouTube.
High Level design:
- APIs:
- UploadResult Upload(user_id,Content)
- FetchResult Fetch(user_id,video_id)
- LikeResult Like(user_id,video_id)
- SubscribeResult Subscribe(user_id,to_user_id)
struct UploadResult{ bool success; int failureReason; int video_id; };
struct FetchResult{ bool success; int failureReason; Content data; RecommendedContent recContent; };
struct RecommendedContent{ vector<Content> data; int video_id; };
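As a rough illustration of how these APIs fit together, here is a minimal in-memory sketch. Several things are assumptions, not part of the page: `Content` is a `std::string` standing in for video bytes, the element type of `RecommendedContent::data`, the shape of `LikeResult`, the `FailureReason` codes, the `VideoService` class with its `ViewCount`/`LikeCount` helpers, and the one-like-per-user rule. Recommendations are left empty and `Subscribe` is omitted.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>

using Content = std::string;  // assumption: stands in for raw video bytes

struct UploadResult { bool success; int failureReason; int video_id; };
struct RecommendedContent { std::vector<Content> data; int video_id; };
struct FetchResult { bool success; int failureReason; Content data; RecommendedContent recContent; };
struct LikeResult { bool success; int failureReason; };  // assumed shape, not defined on the page

// Hypothetical failure codes; the page leaves failureReason unspecified.
enum FailureReason { kNone = 0, kEmptyContent = 1, kVideoNotFound = 2 };

class VideoService {
public:
    UploadResult Upload(int user_id, const Content& content) {
        (void)user_id;  // ownership is not tracked in this sketch
        if (content.empty()) return UploadResult{false, kEmptyContent, -1};
        const int id = next_id_++;
        assert(videos_.count(id) == 0);  // ids are never reused
        videos_[id].data = content;
        return UploadResult{true, kNone, id};
    }

    FetchResult Fetch(int user_id, int video_id) {
        (void)user_id;
        FetchResult result{};                   // success = false by default
        result.recContent.video_id = video_id;  // recommendations left empty here
        auto it = videos_.find(video_id);
        if (it == videos_.end()) {
            result.failureReason = kVideoNotFound;
            return result;
        }
        ++it->second.views;  // each successful fetch counts as one view
        result.success = true;
        result.data = it->second.data;
        return result;
    }

    LikeResult Like(int user_id, int video_id) {
        auto it = videos_.find(video_id);
        if (it == videos_.end()) return LikeResult{false, kVideoNotFound};
        it->second.liked_by.insert(user_id);  // assumption: one like per user
        return LikeResult{true, kNone};
    }

    long long ViewCount(int video_id) const {
        auto it = videos_.find(video_id);
        return it == videos_.end() ? 0 : it->second.views;
    }

    long long LikeCount(int video_id) const {
        auto it = videos_.find(video_id);
        return it == videos_.end() ? 0 : static_cast<long long>(it->second.liked_by.size());
    }

private:
    struct Video {
        Content data;
        long long views = 0;
        std::unordered_set<int> liked_by;
    };
    std::unordered_map<int, Video> videos_;
    int next_id_ = 1;
};
```

In a real deployment the map would be replaced by the sharded store and CDN discussed in the deep dive; the sketch only shows the request/response shapes and how the view count and likes hang off a video id.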
- Message Flow for these APIs.
- From User to Application Server (Upload/Like/Subscribe): Client <-> LoadBalancer <-> ApplicationServer <-> Cache <-> DB
- From Application Server to User (Fetch): Client <-> LoadBalancer <-> ApplicationServer <-> Cache <-> DB
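One common way to wire the ApplicationServer <-> Cache <-> DB part of this chain is the cache-aside pattern: read the cache first, fall back to the DB on a miss, and invalidate the cached entry on writes. The page does not say which caching policy is intended, so treat this as one possible sketch. The `Database` and `ApplicationServer` classes and their methods are hypothetical, and the load balancer is left out.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for the DB tier; counts reads so cache hits are visible.
class Database {
public:
    void Put(int video_id, std::string metadata) {
        rows_[video_id] = std::move(metadata);
    }

    // Copies the row into *out and returns true if the video exists.
    bool Get(int video_id, std::string* out) {
        assert(out != nullptr);
        ++reads_;
        auto it = rows_.find(video_id);
        if (it == rows_.end()) return false;
        *out = it->second;
        return true;
    }

    int reads() const { return reads_; }

private:
    std::unordered_map<int, std::string> rows_;
    int reads_ = 0;
};

class ApplicationServer {
public:
    explicit ApplicationServer(Database& db) : db_(db) {}

    // Read path (Fetch): cache first, DB on a miss, then populate the cache.
    bool Fetch(int video_id, std::string* out) {
        assert(out != nullptr);
        auto hit = cache_.find(video_id);
        if (hit != cache_.end()) {
            *out = hit->second;
            return true;
        }
        if (!db_.Get(video_id, out)) return false;
        cache_[video_id] = *out;
        return true;
    }

    // Write path (Upload/Like/Subscribe): write the DB, then drop the stale cache entry.
    void Update(int video_id, std::string metadata) {
        db_.Put(video_id, std::move(metadata));
        cache_.erase(video_id);
    }

private:
    Database& db_;
    std::unordered_map<int, std::string> cache_;
};
```

With this policy, a repeated `Fetch` for the same video is served from the cache without touching the DB, and the first `Fetch` after an `Update` goes back to the DB for the fresh value.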
Deep Dive: Fault tolerance and Load distribution:
Database:
- Is Sharding Needed? Yes.
- NoSQL vs SQL? Pick NoSQL, as the data is distributed across several clusters.
- Highly available database design: consistent hashing, etc.
- Data storage for images/videos? Use CDNs to reduce latency and increase availability. Videos are replicated by the CDN, which reduces the hops needed to fetch a video for the end user. Popular vs long-tail videos (videos that get fewer than 20 views a day): CDNs can be used to replicate and store the popular videos. Using a CDN for all the long-tail videos can get expensive, so long-tail videos can be hosted directly by YouTube.
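The consistent hashing mentioned above can be sketched as a hash ring with virtual nodes: each shard is placed at many points on the ring, and a key belongs to the first point at or after its own hash, wrapping around at the end. The class name `ConsistentHashRing`, the shard names, and the default of 100 virtual nodes per shard are all assumptions for illustration. `std::hash` is used only to keep the sketch short; its values are not guaranteed to be the same across program runs or platforms, so a real system would pick a fixed, well-distributed hash function.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>

class ConsistentHashRing {
public:
    explicit ConsistentHashRing(int virtual_nodes = 100)
        : virtual_nodes_(virtual_nodes) {}

    // Places `virtual_nodes_` points for this shard on the ring.
    void AddShard(const std::string& shard) {
        for (int i = 0; i < virtual_nodes_; ++i) {
            ring_[Hash(shard + "#" + std::to_string(i))] = shard;
        }
    }

    // Removes this shard's points; keys on other shards are unaffected.
    void RemoveShard(const std::string& shard) {
        for (int i = 0; i < virtual_nodes_; ++i) {
            ring_.erase(Hash(shard + "#" + std::to_string(i)));
        }
    }

    // Returns the shard owning `key`: the first ring point at or after the
    // key's hash, wrapping to the start of the ring if none is found.
    const std::string& ShardFor(const std::string& key) const {
        assert(!ring_.empty());
        auto it = ring_.lower_bound(Hash(key));
        if (it == ring_.end()) it = ring_.begin();
        return it->second;
    }

private:
    static std::size_t Hash(const std::string& s) {
        return std::hash<std::string>{}(s);
    }

    int virtual_nodes_;
    std::map<std::size_t, std::string> ring_;  // ring position -> shard name
};
```

The property that matters for sharding is that removing (or adding) one shard only remaps the keys that lived on that shard's ring points; every other key keeps its shard, so a node failure does not reshuffle the whole data set.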
Recommendation system: