The Ten Deadly Sins of Mobile Video Calling – Sins 1-5: Technology
Since the dawn of humanity, mankind has always sought to communicate. Back in the beginning, things were simple. Grunts, pointing of fingers, clubbing on the head. Primitive, but it got the message across.
Mankind quickly discovered that it needed a way to communicate when the other person wasn’t close by. And so began a long series of inventions over the centuries: cave drawings, the written word, paper, the postal system and finally the arrival of electronic communications – the telegraph, the telephone. The telephone network was virtually transformed from the inside out when it went digital in the middle of the 20th century.
The next big revolution, of course, was mobile calling and the arrival of cellular networks. Ultimately it still provided the same old service as the wired telephone network did, but you could take it with you. With the Internet, a whole host of new communications services emerged – most notably email and IM. Ultimately though, these were just different manifestations of an old idea – sending text from one person to another. The telegraph had that too, decades prior. It just wasn’t quite as easy to use. The Internet also brought the arrival of Voice over IP. In many ways, Voice over IP has been hugely successful. But when we look at it closely, it is also – to a large degree – another repackaging of what we have already been doing – voice, communicated over a distance. It got cheaper, and it got easier to use. But, was it really more – was it really better?
Not really. And that’s what’s startling here. When we look at communications, especially in modern times, what we find is that there have been huge advances in the things that surround real-time communications, but not the communications itself. Look at the mobile phone. This is a technology whose change over the last 20 years – even the last decade – is nothing short of phenomenal. Compare the original Motorola brick phone to the iPhone 4 – astounding. But as a phone – as a service for communications – you get almost the same experience. 20 years ago, we dialed numbers and we got tinny voice conversations. Today I get the same experience.
One of the things Skype is doing is trying to make this calling experience better.
With the SILK codec, we’ve introduced super-wideband voice calling to mobile devices around the world, enabling crisper conversations, easier interpretation of accents and an overall high quality voice experience. But voice is just the first step. To more fundamentally transform the communications experience, we needed to add video. And so we did.
The idea of video calling is certainly not new. The first videophone was shown at the World’s Fair in 1964 – ages ago. The technology wasn’t there yet, and it is only in recent years that video communications has gone mainstream. How mainstream? Well – let me share some of our statistics.
On average, 42% of Skype-to-Skype calls include video* and the number is probably even higher at peak times – around New Year celebrations, for example. And, Skype-to-Skype calling minutes are equivalent to approximately 20% of all global international PSTN and Skype-to-Skype calling minutes.**
Where video is going next is mobile. 2010 was undoubtedly the year that video calling arrived on mobile. Mobile video calling is also not new – we’ve seen a long line of failed mobile video products over many years. But we’re still at the beginning of mobile video. Getting mobile video right is actually really hard. Indeed, there are – in essence – ten deadly sins of mobile video, each of which, if not adequately addressed, can stop the technology dead in its tracks.
They fall into three categories, which I’ll explore in this blog post and two more later this week.
The first bunch of them are related to technology.
Sin 1: No cameras
Simple, but a big deal. Without a camera on the front of the phone, you are simply not going to have a video conference call. You might have a see-what-I-see experience using the rear camera – and Qik is great for this – but you really want both. Though phones with front-facing cameras have been available outside of the US, they were never mainstream and never made their way stateside. That changed (finally) last year with the iPhone 4 and iPod touch, which brought front-facing cameras mainstream. Android phones have caught on too, and we’re seeing a bunch of them roll out with front-facing cameras. Great – and fortunately for us, advancements in technology are squashing this sin.
Sin 2: Lousy screens
Video needs screen real estate. Until recently, we just didn’t have it. Prior to the arrival of devices like the iPhone and the Motorola DROID, screens were generally small and had meager resolutions too. Now, we finally have what we need – screens which are the size of the phone with resolutions that can show video crisp enough to see the smile on someone’s face.
And so, once again, general advancements in technology have addressed this problem too.
Sin 3: Slow networks
Video needs a lot more bandwidth than voice. Our iPhone app needs about 600Kb/s to make a decent video call. Until a few years ago, you just couldn’t get that kind of speed on a mobile phone. Two things have addressed this:
- The arrival of 3G cellular networks, which often (but not always) have enough bandwidth to carry a mobile video call.
- The widespread availability of WiFi on smartphones. WiFi is not without problems, but at least it tends to provide the bandwidth needed for a video call. Fortunately, many calls – video or otherwise – happen in either the home or the workplace. Those are the two places many users have WiFi enabled on their phones.
Put together, WiFi and 3G cellular networks mean that bandwidth is available in many more locations, making video calls possible.
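To make the bandwidth point concrete, here is a minimal sketch of the kind of check an app could run before offering video. It is an illustration, not how Skype actually decides: the function names and the 20% headroom factor are assumptions, and the only number taken from this post is the roughly 600 Kb/s figure (read here as kilobits per second).

```python
# Hypothetical sketch: decide whether a link can carry a video call.
# VIDEO_KBPS_NEEDED comes from the post; everything else is an assumption.
VIDEO_KBPS_NEEDED = 600


def can_carry_video(measured_kbps: float, headroom: float = 1.2) -> bool:
    """Return True if measured throughput covers the video bitrate plus headroom."""
    return measured_kbps >= VIDEO_KBPS_NEEDED * headroom


def choose_call_mode(measured_kbps: float) -> str:
    """Fall back to voice when the link is too slow for video."""
    return "video" if can_carry_video(measured_kbps) else "voice-only"


if __name__ == "__main__":
    # Illustrative throughput values in kb/s, not measurements.
    for label, kbps in [("slow cellular", 200), ("good 3G", 1500), ("home WiFi", 8000)]:
        print(f"{label}: {choose_call_mode(kbps)}")
```

The headroom is there because measured throughput on a mobile link tends to fluctuate, so a link that only just reaches the target bitrate is a poor bet for a whole call.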
Sin 4: Slow processors
Video not only requires more bandwidth than voice; it requires more CPU resources too. Encoding a QVGA video stream on a typical smartphone consumes a sizeable percentage of the CPU resources when performed in the main processor. Higher resolutions are out of reach of the CPU, and require hardware assistance from dedicated encoding chips.
To be fair, this isn’t just a problem for mobile phones – it’s still a problem for PCs. The typical modern PC is still not powerful enough to encode an HD video stream in realtime. Even VGA doesn’t work on many PCs yet. No surprise that it is barely possible on the majority of mobile devices.
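A rough way to see why higher resolutions hurt so much is to count the pixels the encoder has to chew through every second. The sketch below does that arithmetic. The 15 frames per second rate is an assumption for illustration, and treating encoding cost as proportional to pixel count is a simplification, since real encoders vary with content and settings.

```python
# Back-of-the-envelope pixel throughput for common video resolutions.
# The frame rate and the "cost scales with pixels" model are assumptions.
RESOLUTIONS = {
    "QVGA": (320, 240),
    "VGA": (640, 480),
    "720p HD": (1280, 720),
}


def pixels_per_second(width: int, height: int, fps: int = 15) -> int:
    """Number of pixels the encoder must process each second."""
    return width * height * fps


def relative_load(name: str, baseline: str = "QVGA", fps: int = 15) -> float:
    """Pixel throughput of `name` as a multiple of the baseline resolution."""
    w, h = RESOLUTIONS[name]
    bw, bh = RESOLUTIONS[baseline]
    return pixels_per_second(w, h, fps) / pixels_per_second(bw, bh, fps)


if __name__ == "__main__":
    for name, (w, h) in RESOLUTIONS.items():
        print(f"{name}: {pixels_per_second(w, h):,} px/s, "
              f"{relative_load(name):.0f}x QVGA")
```

Under these assumptions VGA is four times the work of QVGA and 720p is twelve times, which is why a CPU that is already busy with QVGA has little hope at higher resolutions without hardware help.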
The situation around hardware acceleration of video encoding and decoding is also a big problem right now. On some platforms, hardware-accelerated functionality exists but is not available to third-party applications like Skype. iOS is an example of this: FaceTime uses hardware acceleration to improve quality, but those improvements are not available through the iOS API.
The problem isn’t just about raw CPU horsepower. It’s also about latency. Realtime communications – both voice and video – are really sensitive to delays. For an ideal experience, you want the time between when one person speaks and when the other person hears it to be under 150 milliseconds.
Think about it like this: In order to have a mobile video call, video frames must be captured from the camera, sent through the phone hardware, and processed by the software on the phone – on both sides, all in a timely fashion.
Unfortunately, the video camera systems on many phones were designed for streaming video and recording, which has much more relaxed delay requirements. As a result, many phones on the market today have hundreds of milliseconds of delay just for capturing a video frame and making it available to the software on the phone. The problem is even more complex on Android, where the variety of phones, each with differing hardware and designs, makes life even harder for developers like Skype.
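One way to think about this is as a delay budget: every stage between one person's camera and the other person's screen spends part of the 150 milliseconds. The sketch below adds up a set of per-stage delays and reports how far over budget the total is. All of the stage values are made-up illustrative numbers, not measurements from any phone; the 200 ms capture figure simply stands in for the "hundreds of milliseconds" mentioned above.

```python
from typing import Mapping

# Target one-way delay from the post, in milliseconds.
BUDGET_MS = 150


def one_way_delay_ms(stages: Mapping[str, float]) -> float:
    """Total one-way delay as the sum of the per-stage delays."""
    return sum(stages.values())


def over_budget_ms(stages: Mapping[str, float], budget_ms: float = BUDGET_MS) -> float:
    """Positive when the pipeline exceeds the budget, negative when it has slack."""
    return one_way_delay_ms(stages) - budget_ms


# Hypothetical stage delays in milliseconds (assumptions for illustration only).
EXAMPLE_STAGES = {
    "capture": 200,
    "encode": 30,
    "network": 60,
    "decode": 15,
    "render": 15,
}

if __name__ == "__main__":
    total = one_way_delay_ms(EXAMPLE_STAGES)
    print(f"total: {total} ms, over budget by {over_budget_ms(EXAMPLE_STAGES)} ms")
```

With numbers like these, the capture stage alone blows the whole budget before a single byte reaches the network, which is the heart of the problem.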
Sin 5: Poor UI
It’s amazing how easy it is to design a bad UI. The UI for mobile video has to make it dead simple to use. It’s easy to focus on the obvious stuff – selecting contacts, making the call, hanging up the call. But the harder stuff has to be handled too.
The biggest hurdle is figuring out whether the person you want to call has the right equipment in the first place. This isn’t specific to video – it has been a major complaint of users in adoption of online communication products in general. For video, we now have to factor in the question of whether the person you want to call has a camera or not. Does the device they are on even support video? How do you let the person you want to call know that everything is ‘ready’ in intuitive ways? How do you identify and find people who you can call?
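As a small illustration of the "right equipment" question, here is a sketch of how an app might work out which call options to show for a given contact. The `Endpoint` type, its fields and the option names are all hypothetical; a real product would need to learn these capabilities from the other side somehow, and that part is not shown.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Endpoint:
    """Hypothetical description of what one side's device can do."""
    has_camera: bool
    supports_video: bool  # the app and device can encode and decode video at all


def call_options(caller: Endpoint, callee: Endpoint) -> List[str]:
    """Return the call options the caller's UI could offer for this contact."""
    options = ["voice"]  # assume voice is always possible
    both_support_video = caller.supports_video and callee.supports_video
    if both_support_video and caller.has_camera:
        options.append("send_video")
    if both_support_video and callee.has_camera:
        options.append("receive_video")
    return options


if __name__ == "__main__":
    me = Endpoint(has_camera=True, supports_video=True)
    friend = Endpoint(has_camera=False, supports_video=True)
    print(call_options(me, friend))
```

The point of computing this up front is that the UI can grey out or hide the video button before the call is placed, rather than letting the user discover the limitation mid-call.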
Then there are other complexities – do you allow people to make voice-only calls? What about shutting off video? How does the other side know that a video shut-off is not a consequence of a problem? Should you let them know that the sender cancelled their video or does that complicate the UI? How do you let the other side know that your video is being received?
These problems are surmountable, but will require time and investment in UI.
These items – the cameras, the screens, the processors, the networks, and the UI – all of them are likely to improve over time with the never-ending improvements in technology. However, even if we eliminate all of these problems, there are others which technology itself is unlikely to solve. Those are the problems I’ll be covering in my next posts.
* For the fourth quarter 2010
** TeleGeography, January 2011