Image Credit: Rachit Tank

There is an unthinkable amount of video on YouTube. How do they manage the billions of hours of video uploaded and streamed through their site? They’ve given some answers.

Last year YouTube confirmed that 500 hours of video are uploaded to the site every single minute. That’s 30,000 hours of video uploaded every hour and 720,000 hours every single day. Put another way, a lifetime’s worth of viewing is added to YouTube every single day, which is a mind-boggling amount.
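For anyone who wants to check the sums, here is a quick sketch in Python. The only input taken from YouTube is the 500 hours per minute figure; the variable names are just for illustration, and the "years" conversion assumes someone watching around the clock for 365 days a year.

```python
# Upload rate quoted by YouTube: 500 hours of video per minute.
HOURS_UPLOADED_PER_MINUTE = 500

# Scale the per-minute figure up to an hour and then to a day.
hours_per_hour = HOURS_UPLOADED_PER_MINUTE * 60
hours_per_day = hours_per_hour * 24

# Assumption: non-stop viewing, 24 hours a day, 365 days a year.
years_of_viewing_per_day = hours_per_day / (24 * 365)

print(hours_per_hour)                      # 30000
print(hours_per_day)                       # 720000
print(round(years_of_viewing_per_day, 1))  # 82.2
```

So a single day of uploads would take roughly 82 years to watch back to back, which is where the "lifetime's worth" comparison comes from.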

Think about the storage required to hold all of that (safe to say my computer would be more than full up) and the server power needed to make it readily available to stream every day, and you have to wonder: what magic are Google harnessing to power this platform?

In a new article titled ‘Reimagining video infrastructure to empower YouTube’, their lead software engineer Jeff Callow walks through the inception of the platform and the technology that has allowed them to do what can easily seem to be The Impossible. We’ll look at the key points that help to answer the huge question in our title, but for much more information and a detailed explanation be sure to check out YouTube’s full article.

The Magic of Compression

Whilst they don’t delve into the no-doubt vast tracts of servers working to store and stream the incredible swathes of content delivered to YouTube, their secret to getting it out easily is in transcoding. Their efficient and refined transcoding methods allow them to compress video files as small as possible, making them easy to send around the world to many different devices with little cost to quality.

Callow says: “An important thing to understand is that video is created and uploaded in a single format, but will ultimately be consumed on different devices – from your phone to your TV – at different resolutions.” With their transcoding processes they can ensure that the highest possible quality is available to the user’s preference – even 4K – with the smallest cost to transfer speed and data needed to download.
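To make the idea of "one upload, many versions" a little more concrete, here is a minimal sketch in Python. It is not YouTube's pipeline: the list of output heights, the `Rendition` class and both function names are hypothetical, chosen only to show how a single source could be planned out into smaller versions and how a device might then pick one.

```python
from dataclasses import dataclass

# Hypothetical list of output heights, largest first.
# The article does not say which resolutions YouTube actually produces.
LADDER = [2160, 1440, 1080, 720, 480, 360, 240, 144]


@dataclass(frozen=True)
class Rendition:
    width: int
    height: int


def plan_renditions(src_width, src_height):
    """List the versions to transcode: every ladder height that fits the source."""
    renditions = []
    for height in LADDER:
        if height <= src_height:
            # Scale the width to keep the source aspect ratio.
            width = round(src_width * height / src_height)
            width += width % 2  # keep the width even for 4:2:0 video
            renditions.append(Rendition(width, height))
    return renditions


def pick_for_device(renditions, max_height):
    """Pick the largest version the device can show, else the smallest one."""
    fitting = [r for r in renditions if r.height <= max_height]
    if fitting:
        return max(fitting, key=lambda r: r.height)
    return min(renditions, key=lambda r: r.height)


versions = plan_renditions(3840, 2160)  # a single 4K upload
print(len(versions))                    # 8
print(pick_for_device(versions, 1080))  # Rendition(width=1920, height=1080)
```

In this sketch the 4K upload is never upscaled, only shrunk, and each device simply asks for the biggest version it can display.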

The article looks at how they’re making this process even better, encouraged by the huge rise in video views over the last year. They saw a 25% uptick in video views in the first quarter of last year, powered by lockdowns and social distancing as well as a continuous move towards digital video that has been taking place for over a decade now.

YouTube’s New Generation of Coding

At the recent ASPLOS conference (Architectural Support for Programming Languages and Operating Systems, what a mouthful) YouTube unveiled their fresh new system for transcoding video. You thought they were good at it beforehand? Now they reckon their new computer chips can improve computing efficiency by 20-33x compared to their old system.

YouTube have shared an illustration to explain the new system (one which simplifies the real technological process). Instead of separate systems encoding the videos for 4K on TVs, HD streams on laptops, smaller resolutions on other devices, and so on, their new VCU (Video trans-Coding Unit) will be able to handle them all.

Image Credit: YouTube

New Technology for New Content

Whilst the old system has worked pretty damn well, the amount of content they’re handling is certainly not decreasing. Not only is video uploading and consumption constantly on the rise, the type of content on YouTube is also transforming. Callow revealed that in the first half of 2020, daily livestreams grew by a whopping 45%.

Livestreaming was already on the rise in a big way, then the pandemic hit and we all found ourselves spending far more time online. In many ways, I believe, livestreaming became a way of connecting. It was a way to be a part of something that was happening in that moment and to join a community. Whilst we were physically separated, livestreaming spiritually connected us.

That huge volume of new content requires new technology and infrastructure to power real-time video streaming and processing around the world to hundreds, thousands, or potentially millions of people at a time. With their new system they can scale up to meet that demand and use a one-size-fits-all distribution method through their VCU.

The new system of scaling various resolutions of content from the same source started in 2015, as the demand for 1080p became widespread and the push for 4K and even 8K began. Callow says: “We saw that the broader internet wouldn’t be able to accommodate this growth unless we shifted to more data-efficient video codecs (codecs are basically different ways to compress video data).

“However, data-efficient video codecs like VP9 use more computer resources to encode than H.264. The combination of these dynamics led us to pursue a dramatically more efficient and scalable infrastructure. Here’s a comparison of the image quality in a Janelle Monáe video. The VP9 version clearly looks better than the legacy H.264, but it uses 5x more computer resources to encode.”

Left: H.264 + Right: VP9
Image Credit: YouTube

The Future of YouTube’s Video Infrastructure

Looking ahead with the amazing progress they’ve made in recent years in mind, Callow said: “One of the things about this is that it wasn’t a one-off program. It was always intended to have multiple generations of the chip with tuning of the systems in between. And one of the key things we’re doing in the next-generation chip is adding in AV1, a new advanced coding standard that compresses more efficiently than VP9, and has an even higher computation load to encode.

“As for me, I’ll be continuing my work on the project, developing future generations, which will keep me busy for a while.”