- System design for Netflix with about more than 130 million subscribers and its presence in more than 250 plus countries is a company that handles a large category of movies and television content we users paid monthly rent to access this content and what that means to Netflix is that the user experience should be very smooth and enjoyable.
How Netflix Operates behind the scene?
Netflix operates in two clouds:
Both clouds must work together seamlessly to deliver endless hours of video for users. Netflix has three main components i.e.:
Now Java experts are going to mention some high-level working of Netflix and then jump right into all these three components in depth. Without the knowledge of high-level working if they go and write the components of system design for Netflix it will be pretty hard for you to understand. So, Java development team is going to mention a high-level overview of Netflix.
High Level Overview of Netflix System Design:
- First, let's understand what is a client? A client is any device from which you play the video on Netflix it could be your desktop it could be Android, or it could be iPhone it could be your Xbox, or anything like that.
- The Second thing is anything that doesn't involve video streaming is all handled in the AWS cloud that dedicated java developers already explained. Anything which involves streaming the video is completely handled by openConnect.
What is open openConnect?
- openConnect is Netflix owned CDN. In simple words, CDN is a contingent delivery network which is a network of distributive servers that are placed in different locations or different countries or a different place to serve the content much faster say. Say for example you are in India and you have a website that hosts the video in it, what if the user is requesting a video that is hosted on your website from the United States. That means the packets or the video should be traveling from the server in India to the US through C cables which are used for serving the internet.
- Now how do we solve this problem? To solve this problem what we need to do is let's have more servers placed in different countries. Say for example the main server which is an India is called the origin or original server and will have different cache servers that hold all the video copies in different countries. There may be multiple different edge servers that are placed in North America, South America and one in Russia and one in Europe and one somewhere in Indonesia. If a user is requesting the video which is on your website from the United States, the video will be served from the nearest server that's the edge server which is in the United States.
- This way the content will be delivered much faster and less bandwidth consumed between India and the United States. Now that you understand what CDN is? OpenConnect is Netflix owned CDN. What Netflix has done are they have placed a lot of servers in every country like thousands of servers in every country, so that if the user is requesting a video, then the video will be played from the very nearest server which is placed to that particular user.
What is the System Designing flow of Netflix:
It's time to understand the system design for Netflix. As I've told you earlier expect the openConnect, the rest of all of the components are situated in the AWS cloud. Just openConnect is the network of distributed servers that are maintained by Netflix.
Diagram of System Designing:
Now Java professional is going to mention individual components in the system design. The first one is the client.
Netflix supports a lot of different devices including smart TV, Android, iOS platforms big gaming consoles, etc. All these apps are written using platform-specific code. Netflix web app is written using react JS and react.js was influenced by several factors like the first one is startup speed, the second one is a runtime performance of react.js and the third one is modularity.
Elastic Load balancer:
- Let me mention elastic load balancers. Netflix uses Amazon's elastic load balancing service to route the traffic to different front-end services and these are instances that is an actual response. These are the instances or servers.
- Elastic load balance is set up in such a way that the Lord is balanced across the zones first and then the load is balanced across the instances and this scheme is called a Two-tire balancing scheme.
- So, the first tier that is this part consists of basic DNS-based round-robin load balancing. So, the request when it lands on this load balancing will first balance across these zones using round-robin.
- What are zones? Zones are a kind of logical grouping of servers there could be three different zones in the United States itself and one Zone in India. It’s a very logical way of grouping servers together and the second tier of elastic load-balanced service is an array of load balancers instances that does round-robin load balancing on the instances to distribute the requests across these instances.
- Say for example then the request comes in the first tier and distributes the load on different zones and the next tier will distribute to between these two instances.
How Netflix Onboard a Video?
- Now let me write about how Netflix on boards of video. Before a video, web series, or movie made available to the user, what Netflix does is it does do a lot of pre-processing and this pre-processing involves finding errors, converting the video into different resolutions and a different format. This process is called transcoding.
- Transcoding is a process that converts a video into a different format, and this will be optimized for a particular kind of device as you already know that Netflix supports many different kinds of devices or platforms. So, we have to convert the video into a different resolution to make the viewing experience much better.
- So, you might ask the question why we don’t just play the video as it is how we get from the production house. The problem is the original movie which we get say for example it will be of about many terabytes sometimes and sometimes it will be of about 50 GB and say a video of about one and a half hour movie will be like 50 GB and this will be pretty hard for Netflix to stream such a big file to every customer. It could be a space constraint; it could be bandwidth.
Adaptive bitrate streaming
- Netflix also creates files optimized for different network speeds. Say for example you are watching a movie on you know slow Network then you might see a movie that is played in very little resolution. If you're watching the same movie on a high-speed network, the movie might be in 4k resolution or 1080p resolution, or sometimes when the bandwidth is less, the movie suddenly changes the resolution. You might have observed like grainy kind of resolution to sometimes suddenly a high-definition resolution and this kind of switching is called adaptive bitrate streaming.
- To do that what Netflix has to do is it has to create multiple copies of the same movie in different resolutions approximately, Netflix creates about thousand two hundred different copies for a single movie just to do that and that's a lot of files to be processed then how Netflix does that? What Netflix does is uses a lot of different parallel workers to do that.
- So when they want to onboard a particular movie, they get the movie as a single file which about says 50 GB and then what they do is they break that moving to a lot of different pieces or chunks you can say and put it all into the queue and then these tasks or these individual tasks which are to be processed for each clip will be placed in the queue and then these tasks will be picked up by the different workers and they all process different chunks together and then they merge all this video or they place the different clips and upload the clips into Amazon s3.
- Now that our Amazon ec2 workers have converted the source movie to different copies of the movie of different resolutions and different formats. We have about 1200 copies of different files for the same movie. Now it's time to push all of these movies into openConnect distributed servers which are placed in different locations across the world.
- That means all these different copies will be pushed to every server in the openConnect Network. So, here what happens next? When user loads the next Netflix app on his mobile phone or smart TV or web app, what happens is all the requests like login, recommendation, homepage, search, billing or customer support etc.
- All these kind of different requests are handled by the instances which are in AWS cloud and the moment you found to find the video which you want to watch and hit the play button on the player, what happens is the application will figure out the best open connect server and the open connect server will start streaming the video to the client/user.
- The clients are so intelligent that even though when the openConnect server is streaming the video, these applications will be constantly checking for the best openConnect server which is available near to that particular application and switches dynamically based on the quality of the bandwidth to that server and load to the openConnect server.
- This is how the Netflix application gives the best viewing experience to the user without any obstruction or interruption while you are watching the video. With the information like whatever you searched, whatever you typed and your video viewing pattern, all this information will be saved in data centers in AWS and Netflix does create machine learning models using that data to understand the user choices better and to build the recommendation engine.
- Now let's learn about the next component called Zuul.
- Zuul is a gateway service that provides dynamic routing, monitoring resiliency and security. This of the whole service can also be used to do the connection management and proxying the requests. So, the main component over here is the Netty server-based proxy and this is where the requests will hit first and then this will be proxying the request to the inbound filters over here.
- The inbound filter run before proxying the requests and can be used for authentication, routing, or decorating the requests and the request goes next to the endpoint filter. The endpoint filters can be used to return the static response or to forward the request to back-end services. Once the backend service sends the response back, the inbound filter will transfer that responsibility to the outbound filters and the outbound filters run after seeing the response can be used for gzipping the content or to calculate the metrics, or to even add or remove the headers from the response.
- Once the response is written back to the Netty server and it will send back the response to the client. Now, what are the advantages of having a gateway service like this in the above diagram? See the advantages are many more. The first thing is you can share the traffic? Say for example you can have a web-tier rule somewhere that sends some traffic towards these servers and send some traffics to these kinds of servers of saying different versions. For example, you can share the traffic from having the rules set in the endpoint filter and also you can do some kind of load testing.
- Let’s say you have a new kind of server that is deployed in certain setups of the machine and you want to do a load testing on it, in that case also you can redirect a part of traffic to that particular set of services and then see that what is the load that particular service our server can take. The third one is you can test new services.
- As when you upgrade the services maybe you want to test how it behaves with the real-time API requests, instead of replacing it deploying it in deploying the new service on all of the servers what you can do is, you can deploy that particular new service or upgraded service on to one server and then you can redirect some parts of traffic some percentage of traffic to that new service and then test that service in real-time.
- Also, you can filter the bad requests. You can have custom rules set in endpoint to filter the responses based on certain conditions say a user agent of a specific kind then filter all the requests you can either have it in the endpoint filter or you can have it in the firewall.
- Let's see what Hystrix is. It is a latency and fault-tolerant library design to isolate the points of access to the remote system, services and third-party libraries. What that means is to say for example you have an endpoint “A” from which the request and response are delivered. So in micro-service architecture, this endpoint might be requesting a lot of different other microservices that are in different systems altogether.
- Say for example there can be multiple servers, and this could be a different server or this could be another third-party service called and there could be another machine this too could mean a different machine just. Because one particular call is slow, your whole endpoint might suffer a lot of latency i.e. the time to deliver the response might take more time. Or maybe one service is down and that's the reason why the errors will cascade. If one microservice call is causing some error, then this error will cascade back to the endpoint and this response could respond response with an error.
- So, these kinds of problems we can control using Hystrix which helps in stop cascading failures and also do real-time monitoring. What are the advantages of Hystrix? So here Java developers have listed out few advantages of using Hystrix. When you're configuring your Hystrix, Hystrix will be taking care of every microservices.
- What that means is you are decorating every micro service and how it helps is say for example you want to keep your quality of service to a particular endpoint to 1 second or two seconds at max which means that you should also set timeouts for every microservices. That means the first advantage is you can either gracefully kill the call to that particular microservice if the time taken to respond is greater than the present time or maybe if you are configured to give the default response then it will be given back.
- The second advantage is the thread pool for a particular microservices is full, it won't even try to accept the next request and keep waiting for that it'll just straight up and reject the call so that the other microservice handles there and continue forward or do whatever the mitigation it wants to do or it could fall back to the default response. In a similar case, the third point is it can do the same thing like not providing the default response or taking down the microservice if the error rate is greater than some percentage of errors and the fourth point is fall back to the default response as I've already mentioned.
- The fifth one is it will be much useful to collect the metrics to understand how these microservices are performing so Hystrix gathers the data about the latency, about the performance and everything and puts it in a dashboard so that you can understand what's the businesses is performing how better.
- Netflix uses microservice architecture to power all of its API needs to applications and perhaps the request calls to any endpoint go to another service and it keeps on happening. In the next advanced blog on Netflix system designing, Java programmers have mentioned advanced features like caching, Logging, Database design, etc. Please refer to all of our system designing blogs to understand more about a specific product.
Subject: Netflix System Designing
- How Zuul and Hystrix play a role in Netflix backend system
- What is the high-level flow of Netflix system designing?
- How Netflix onboard a new video into its server?
- What do you mean by ABS(Adaptive Bitrate Streaming)
- Explain about the Netflix backend system