The power of the SILK codec
At Skype, we are passionate about quality. Our engineers do everything they can to make Skype work well across a wide range of computers, service providers, connection speeds, and hardware.
One of the key ingredients in quality is the fidelity of the speech that you hear when the other person is talking. Our aim is to make the experience as close to “the person sounds like they are in the room with me” as possible.
The High-Def Factor
To make it sound like the other person is in the room with you, it is necessary to capture the full frequency range of their speech, transmit it over the network, and reconstruct it at the other end. This is something that the traditional telephone networks – including mobile and landline – are not very good at. Those networks were designed to convey “just enough” of the frequency range of human speech to make the call intelligible, but not nearly enough to make the other person sound like they are in the same room as you.
For Skype, it has always been important to go beyond the boundaries of the telephone network, and do more.
That’s why Skype invested in creating its own speech codec. A speech codec is a piece of technology that takes a person’s voice, converts it into a format suitable for sending over the Internet, and then, on the other side, converts it back into speech.
Our codec, called SILK, was designed to capture the true richness of human speech and to work really well over the Internet, where connection speeds vary. Think of the telephone network as being like a standard-definition television, and SILK as being like a high-definition television. The difference is striking. Once you’ve tried it, it’s hard to go back – just like HDTV. In fact, we thought SILK was so awesome that we contributed it to the community. We’ve made our code open source, and have brought SILK forward for standardization by international standards bodies.
Hear the Difference
The two links below lead to two speech samples: one coded with a “standard-def” codec called G.729, and the other coded with our SILK codec at its highest quality setting. Listen for yourself. What do you think?
Proving the Difference
But maybe it was just us. As experts, perhaps we could hear the difference while our users could not. To convince ourselves, we ran a test. We selected a random subset of our users. These users were not aware of the test, so it was blind. For those users, we modified the behavior of the client so that it used a specific codec. For some users, it was one of the codecs used in the telephone network, not a high-definition one. For other users, it was SILK. In fact, we tested SILK in four different modes: SILK NarrowBand (SILK-NB), which uses the least bandwidth and captures the least fidelity; SILK MediumBand (SILK-MB); SILK WideBand (SILK-WB); and its highest quality mode, SILK Super WideBand (SILK-SWB), which uses the most bandwidth and provides the best fidelity. To rule out problems with network connections, we ran the tests only on connections with good network speeds. At the end of each call, we asked users to rate it from 1 (very bad) to 5 (excellent). We also measured how long the calls were.
Differences in user rating and average call duration with different codecs.
The results were surprising. Not only was there a difference – there was a big difference. For the highest quality SILK calls (SILK-SWB), the Mean Opinion Score (the average user rating) was 3.8 on a scale from 1 to 5. The lower-quality G.729 codec scored 0.4 points lower, at 3.4.
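The Mean Opinion Score is simply the arithmetic mean of the 1-to-5 ratings users gave. A minimal Python sketch of that calculation follows; the function name `mean_opinion_score` and the sample ratings are hypothetical illustrations, not Skype’s code or data from the test.

```python
def mean_opinion_score(ratings):
    """Return the MOS: the arithmetic mean of call ratings on a 1-5 scale."""
    if not ratings:
        raise ValueError("need at least one rating")
    for r in ratings:
        # Reject anything outside the 1 (very bad) to 5 (excellent) scale.
        if not 1 <= r <= 5:
            raise ValueError(f"rating {r} is outside the 1-5 scale")
    return sum(ratings) / len(ratings)

# Hypothetical ratings for illustration only; not data from the test.
sample = [5, 4, 4, 3, 4]
print(mean_opinion_score(sample))  # 4.0

# Gap between the reported SILK-SWB (3.8) and G.729 (3.4) scores.
print(round(3.8 - 3.4, 1))  # 0.4
```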
Even more startling were the durations of the calls. Calls using the highest quality SILK mode (SILK-SWB) lasted around 31 minutes on average. With the low-quality G.729 codec, they lasted 21 minutes, roughly 30% shorter! The higher quality of the SILK-SWB experience made the calls easier – more natural – and that meant users felt more comfortable talking longer.
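The duration comparison is a relative difference measured against the SILK-SWB average. The sketch below uses a hypothetical helper, `percent_shorter`, and plugs in the rounded averages quoted above (about 31 and 21 minutes), so it only approximates the figure from the underlying data.

```python
def percent_shorter(baseline_minutes, other_minutes):
    """How much shorter `other` is than `baseline`, as a percentage of baseline."""
    if baseline_minutes <= 0:
        raise ValueError("baseline duration must be positive")
    return 100 * (baseline_minutes - other_minutes) / baseline_minutes

silk_swb = 31  # rounded average call length with SILK-SWB, in minutes
g729 = 21      # rounded average call length with G.729, in minutes
print(f"{percent_shorter(silk_swb, g729):.1f}% shorter")  # 32.3% shorter
```

With these rounded inputs the result is about 32%, in line with the roughly 30% figure. The choice of baseline matters: measured against the G.729 average instead, the same 10-minute gap makes SILK-SWB calls about 48% longer.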
For us, this test solidified what our engineers already knew – that a high fidelity experience really does make a difference.