YouTube upload processing has many intriguing queueing properties. This article looks primarily at the MPEG-to-SWF conversion step and studies the queueing properties of that process. You can also view all 40+ articles on Queueing Theory.
I’ll show the basic process of uploading a video to YouTube, explain the queueing mechanics that go on behind the scenes, suggest a few books in case you’re interested in applications of queueing theory, and then finally show a video tutorial that introduces queueing theory and models.
I make many assumptions in this article, so feel free to poke holes in what I’m arguing.
Assumptions and Data
Below is a very simple, high-level process map of the steps to upload a video on YouTube:
- User uploads video
- User waits while video is uploaded
- Upload is completed
- User waits for video to be converted from MPEG to SWF
- Conversion from MPEG to SWF completes
Based on a small sample of 20 uploaded videos with an average file size of 3.5 MB, I calculate a mean of ~180 seconds per MB for the upload and ~240 seconds per MB for the conversion.
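To turn the per-MB figures into per-video times, multiply by the average file size. A minimal Python sketch, assuming the 3.5 MB sample average applies to every upload (the constant and function names are illustrative):

```python
# Per-MB timings from the 20-video sample described above.
AVG_FILE_SIZE_MB = 3.5
UPLOAD_SECONDS_PER_MB = 180
CONVERT_SECONDS_PER_MB = 240


def seconds_per_video(seconds_per_mb: float, size_mb: float = AVG_FILE_SIZE_MB) -> float:
    """Scale a per-MB time up to a whole video of size_mb megabytes."""
    return seconds_per_mb * size_mb


upload_s = seconds_per_video(UPLOAD_SECONDS_PER_MB)
convert_s = seconds_per_video(CONVERT_SECONDS_PER_MB)
print(f"upload:     {upload_s:.0f} s ({upload_s / 60:.1f} min)")
print(f"conversion: {convert_s:.0f} s ({convert_s / 60:.1f} min)")
```

With the sample figures this gives about 630 seconds (10.5 minutes) to upload and 840 seconds (14 minutes) to convert an average video.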
Based on YouTube’s own disclosure, we also know that there are on average 65,000 video uploads per day. We do not know the average file size.
Queueing Properties
Important items to note when studying the queueing properties of a system are the following:
- λ = Arrival Rate, the average number of arrivals per unit of time (so the mean time between arrivals is 1/λ). For most queues, we can assume that the number of arrivals in a given interval follows a Poisson distribution, which means that the times between arrivals are not deterministic but random (exponentially distributed).
- μ = Service Rate, the average number of arrivals served per unit of time (so the mean time to service one arrival is 1/μ).
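The two descriptions of arrivals are linked: if arrivals per minute are Poisson with mean λ, the gaps between arrivals are exponential with mean 1/λ. A small sketch using the standard library’s `random.expovariate`, with an assumed λ of 90 arrivals per minute (the estimate this article works out further on), draws random gaps and checks that their average lands near 1/λ:

```python
import random

ARRIVAL_RATE = 90.0  # λ in arrivals per minute (assumed; matches the estimate used later)


def interarrival_gaps(rate: float, n: int, seed: int = 1) -> list[float]:
    """Draw n exponential inter-arrival gaps (in minutes) of a Poisson process with the given rate."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    return [rng.expovariate(rate) for _ in range(n)]


gaps = interarrival_gaps(ARRIVAL_RATE, 100_000)
mean_gap = sum(gaps) / len(gaps)
print(f"mean gap: {mean_gap:.5f} min (theory 1/λ = {1 / ARRIVAL_RATE:.5f} min)")
```

Individual gaps vary widely even though their average is stable, which is what "random, not deterministic" means in practice.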
A Poisson distribution is skewed to the right: its mean and variance are equal, and its skewness is 1/√λ, so the skew is strong when the mean is small and mild when it is large. Here’s a standard picture of a Poisson distribution for server utilization:
What we see above is that as there are more simultaneous connections, arrivals begin to batch together, as represented by the Poisson curve above.
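The shape claims above can be checked numerically. This Python sketch computes the Poisson probabilities P(K = k) = e^(−λ) λ^k / k! in log space (via `math.lgamma`) to avoid overflow, then sums them to get the mean, variance, and skewness; the mean of 4 arrivals per interval is an arbitrary illustrative value:

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """P(K = k) = e^(-lam) * lam^k / k!, computed in log space (lam must be > 0)."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))


def poisson_moments(lam: float, k_max: int = 100) -> tuple[float, float, float]:
    """Mean, variance and skewness obtained by summing the pmf over k = 0..k_max."""
    probs = [poisson_pmf(k, lam) for k in range(k_max + 1)]
    mean = sum(k * p for k, p in enumerate(probs))
    var = sum((k - mean) ** 2 * p for k, p in enumerate(probs))
    skew = sum((k - mean) ** 3 * p for k, p in enumerate(probs)) / var ** 1.5
    return mean, var, skew


lam = 4.0  # illustrative mean number of arrivals per interval (assumed)
mean, var, skew = poisson_moments(lam)
print(f"mean = {mean:.4f}, variance = {var:.4f}, skewness = {skew:.4f}")
print(f"1/sqrt(lam) = {1 / math.sqrt(lam):.4f}")
```

This prints a mean and variance of 4.0000 and a positive skewness of 0.5000. For a large mean such as 90, raise `k_max` (to a few hundred) so the truncated tail stays negligible.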
Given the data and notation, we can now attempt to better understand the queueing properties of the MPEG-to-SWF conversion. Remember: the data I have rests on several assumptions and is most likely completely off the mark. But it’s an attempt and, if nothing else, it’s fun to try.
Presuppositions Redux
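So, assuming the 65,000 daily uploads are spread over a 12-hour day, we get the following:
λ = 65,000 / 720 minutes ≈ 90 uploads / minute
Conversion time per video = 3.5 MB × 240 seconds/MB = 840 seconds = 14 minutes, so a single converter has μ = 1/14 ≈ 0.07 conversions / minute
Arguably, one conversion every 14 minutes is far too slow: keeping up with 90 arrivals per minute would take roughly 90 × 14 ≈ 1,260 converters working in parallel. So let’s just assume that YouTube’s average aggregate service rate is 200 conversions / minute and treat the whole conversion step as a single server with that rate. Given that, we can now learn about the queueing properties, which I describe below.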
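Mixing seconds, minutes, per-MB and per-video figures makes this arithmetic easy to get wrong. A Python sketch that keeps every rate in minutes, assuming the 12-hour day and the hypothetical aggregate rate of 200 conversions per minute from this article, and not rounding λ to 90:

```python
import math

UPLOADS_PER_DAY = 65_000
ACTIVE_MINUTES = 12 * 60                # assumed 12-hour upload day
CONVERT_SECONDS_PER_VIDEO = 3.5 * 240   # 840 s for an average 3.5 MB video
AGGREGATE_SERVICE_RATE = 200.0          # μ in conversions per minute (assumed)

lam = UPLOADS_PER_DAY / ACTIVE_MINUTES               # λ, arrivals per minute
per_converter_rate = 60 / CONVERT_SECONDS_PER_VIDEO  # conversions per minute for one converter
# Bare minimum to keep pace; at exactly this count every converter would be busy ~100% of the time.
converters_to_keep_up = math.ceil(lam / per_converter_rate)
utilization = lam / AGGREGATE_SERVICE_RATE           # ρ = λ / μ

print(f"λ ≈ {lam:.2f} uploads per minute")
print(f"one converter: {per_converter_rate:.4f} conversions per minute")
print(f"parallel converters needed just to keep up: {converters_to_keep_up}")
print(f"utilization ρ = {utilization:.3f} (the queue is stable only if ρ < 1)")
```

The utilization of roughly 0.45 is what drives every result below: the closer ρ gets to 1, the faster the waiting figures grow.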
Average Number of Videos Waiting to be Converted
The equation for the average number of videos waiting in the queue (not counting the one being converted) is the following:
Cw = λ² / (μ(μ – λ))
So,
Cw = 90² / (200(200 – 90)) = 8,100 / 22,000 ≈ 0.37
So, given the assumptions above, fewer than one video (about 0.37 on average) is waiting to be converted from MPEG to SWF.
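The same calculation as a small Python function, a sketch using the M/M/1 formula above with the assumed rates λ = 90 and μ = 200 per minute; the guard is there because the formula turns negative, and meaningless, once λ ≥ μ:

```python
def mean_number_waiting(lam: float, mu: float) -> float:
    """M/M/1 mean number waiting in the queue, Cw = λ² / (μ(μ − λ))."""
    if lam >= mu:
        raise ValueError("the queue is unstable unless λ < μ")
    return lam ** 2 / (mu * (mu - lam))


cw = mean_number_waiting(90, 200)  # λ = 90/min, μ = 200/min (assumed above)
print(f"Cw ≈ {cw:.3f} videos waiting")
```

This prints Cw ≈ 0.368, the unrounded version of the 0.37 above.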
Average Number of Videos in the Conversion Process
Cs = λ / (μ – λ)
So,
Cs = 90 / (200 – 90) ≈ 0.82
This means, given the assumptions above, that at any point in time there is on average just under one video (about 0.82) in the conversion system, counting both the video being converted and any that are waiting.
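A useful sanity check is that the number in the system equals the number waiting plus the expected number in service, which for a single server is the utilization ρ = λ/μ. A Python sketch with the same assumed rates:

```python
def mean_number_waiting(lam: float, mu: float) -> float:
    """M/M/1 mean number waiting, Cw = λ² / (μ(μ − λ))."""
    return lam ** 2 / (mu * (mu - lam))


def mean_number_in_system(lam: float, mu: float) -> float:
    """M/M/1 mean number in the system (waiting plus in service), Cs = λ / (μ − λ)."""
    if lam >= mu:
        raise ValueError("the queue is unstable unless λ < μ")
    return lam / (mu - lam)


lam, mu = 90.0, 200.0  # assumed rates, per minute
cs = mean_number_in_system(lam, mu)
rho = lam / mu  # utilization = expected number of videos in service
print(f"Cs ≈ {cs:.3f}; Cw + ρ ≈ {mean_number_waiting(lam, mu) + rho:.3f}")
```

Both figures come out at about 0.818, so the two formulas agree.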
Average Time Spent Waiting
Tw = λ / (μ(μ – λ))
So, we get:
Tw = 90 / (200(200 – 90)) ≈ 0.0041 minutes (about 0.25 seconds)
This means that as videos enter the conversion process, there is hardly any waiting; they are served almost immediately.
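The formula can be cross-checked by simulation. This Python sketch uses Lindley’s recursion, a standard way to simulate a single-server queue, with exponential gaps and exponential service times (the M/M/1 assumptions behind these formulas); the run length and seed are arbitrary choices:

```python
import random


def mean_wait(lam: float, mu: float) -> float:
    """M/M/1 mean time spent waiting, Tw = λ / (μ(μ − λ)); minutes if rates are per minute."""
    if lam >= mu:
        raise ValueError("the queue is unstable unless λ < μ")
    return lam / (mu * (mu - lam))


def simulated_mean_wait(lam: float, mu: float, n: int = 200_000, seed: int = 7) -> float:
    """Estimate Tw by simulating n arrivals with Lindley's recursion:
    next_wait = max(0, wait + service_time - next_gap)."""
    rng = random.Random(seed)
    wait = 0.0  # the first video finds an empty queue
    total = 0.0
    for _ in range(n):
        total += wait
        service_time = rng.expovariate(mu)  # exponential service, mean 1/μ
        next_gap = rng.expovariate(lam)     # exponential inter-arrival gap, mean 1/λ
        wait = max(0.0, wait + service_time - next_gap)
    return total / n


tw = mean_wait(90, 200)
sim = simulated_mean_wait(90, 200)
print(f"formula:    {tw:.5f} min ({tw * 60:.2f} s)")
print(f"simulation: {sim:.5f} min")
```

The simulated average should land within a few percent of the formula; changing the seed shows how much it wobbles.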
Average Time Spent in the System
Ts = (1 / (μ – λ))
So, we get,
Ts = 1 / (200 – 90) ≈ 0.0091 minutes (about 0.55 seconds)
This means, then, that under these assumptions an uploaded video spends only about half a second in the conversion system, waiting included.
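The four results should hang together: the time in the system is the wait plus one mean service time (1/μ), and Little’s law says the number in the system equals the arrival rate times the time in the system. A Python sketch checking both with the assumed rates:

```python
def mean_time_in_system(lam: float, mu: float) -> float:
    """M/M/1 mean time in the system (waiting plus conversion), Ts = 1 / (μ − λ)."""
    if lam >= mu:
        raise ValueError("the queue is unstable unless λ < μ")
    return 1 / (mu - lam)


lam, mu = 90.0, 200.0         # assumed rates, per minute
ts = mean_time_in_system(lam, mu)
tw = lam / (mu * (mu - lam))  # mean wait, from the previous section
cs = lam / (mu - lam)         # mean number in the system

# Ts should equal Tw + 1/μ, and Little's law says Cs = λ·Ts.
print(f"Ts ≈ {ts:.4f} min ({ts * 60:.2f} s)")
print(f"Tw + 1/μ ≈ {tw + 1 / mu:.4f} min")
print(f"λ·Ts ≈ {lam * ts:.3f} vs Cs ≈ {cs:.3f}")
```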
Weaknesses in Analysis
Okay, the numbers above are pretty much pulled out of the clouds. But if we had real data, we could just plug it into the equations above. My guess, though, is that even with real numbers the results would be very close to what I show above. Why? Well, one could argue that each individual upload has very little impact on the YouTube system’s resources; this is a common property in telephony and in server modeling, and I see the same thing going on with YouTube. The biggest challenge for YouTube is not computing resources but storage capacity.
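To make plugging in real data painless, the four equations can live in one helper. A Python sketch; `QueueMetrics`, `conversion_queue_metrics`, and its parameter names are hypothetical, and it keeps this article’s single-server (M/M/1) assumptions:

```python
from dataclasses import dataclass


@dataclass
class QueueMetrics:
    cw: float  # mean number of videos waiting
    cs: float  # mean number of videos in the system
    tw: float  # mean wait before conversion starts, minutes
    ts: float  # mean total time in the system, minutes


def conversion_queue_metrics(uploads_per_day: float, active_hours: float,
                             service_rate_per_min: float) -> QueueMetrics:
    """Hypothetical helper: M/M/1 metrics from daily volume and an aggregate service rate."""
    lam = uploads_per_day / (active_hours * 60)  # arrivals per minute
    mu = service_rate_per_min
    if lam >= mu:
        raise ValueError(f"unstable: λ = {lam:.1f}/min is not below μ = {mu:.1f}/min")
    return QueueMetrics(
        cw=lam ** 2 / (mu * (mu - lam)),
        cs=lam / (mu - lam),
        tw=lam / (mu * (mu - lam)),
        ts=1 / (mu - lam),
    )


print(conversion_queue_metrics(65_000, 12, 200))  # this article's assumptions
print(conversion_queue_metrics(65_000, 24, 200))  # same volume spread over 24 hours
```

Swapping in measured values recomputes all four quantities at once, and the `ValueError` flags inputs where arrivals outpace conversions and these formulas no longer apply.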
Don says
I am studying this area as part of my MBA program. The potentially novel use is to apply it to information security, specifically to modeling server availability and data integrity.
Do you have suggestions on how successive queue modeling might be done between computer networking inputs and Operating System resource management functions? If so, defenses against distributed denial of service attacks against servers could be devised.
Sorry for geeking out; I hope this still made sense.
Best Wishes,
Don Turnblade
CISSP
MBA (Expected 2009)