When you make a call using VoIP (Voice over Internet Protocol), your voice has to be turned into data so it can be transmitted to the other person. Then, so the receiver can understand what you said, that data has to be turned back into your voice.
There are several ways of turning sound into data, but some of them produce so much data that transmitting it would take far too much bandwidth. That's where VoIP codecs come in. The name combines "coder" and "decoder" (often explained as "compressor" and "decompressor"): a codec uses algorithms to compress digitized speech for transmission and decompress it at the other end.
Overview: What are VoIP codecs?
Think of a VoIP codec as an interpreter. Instead of translating between two people, it translates between two entities: 1) the person talking and 2) the internet. The internet's "language" consists of zeros and ones, known as bits (eight bits make a byte). So to speak, that's the only thing the internet can understand.
When you talk into a VoIP phone, the codec acts as a translator, turning your voice (or your customer's) into something the internet can understand, which is data. Then, for the person on the other end, the codec does the opposite: it turns the data back into a voice the receiver can understand.
Codecs are used for several other kinds of “translation.” Videos are put through codecs so they can be sent from YouTube’s server to your computer, for example. Music often begins as a .WAV or .AIFF file. Although these formats sound great, they’re huge, so a codec turns them into .MP3, which results in a far smaller file that still sounds pretty good.
How do VoIP codecs work?
A VoIP codec can transform voices into packets of data, compressing them so they can be transferred using less bandwidth (more on this later). When the signal arrives at its destination, the codec then decompresses the audio, making it clear for the listening party.
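To make the compress/decompress idea concrete, here is a minimal Python sketch of μ-law companding, the technique behind the North American and Japanese variant of G.711 (Europe uses the similar A-law). It is a simplification under stated assumptions: samples are floats in the range -1 to 1, and it uses the continuous μ-law curve, whereas real G.711 uses a segmented, piecewise-linear approximation with its own bit layout, so this is not bit-compatible with G.711. Each sample is mapped onto a logarithmic scale and stored in a single byte, so quiet sounds keep finer detail than loud ones.

```python
import math

MU = 255  # mu-law constant used by the North American/Japanese G.711 variant

def mulaw_compress(x: float) -> float:
    """Map a sample in [-1, 1] onto the logarithmic mu-law curve (also [-1, 1])."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

def mulaw_expand(y: float) -> float:
    """Inverse of mulaw_compress."""
    return math.copysign(((1 + MU) ** abs(y) - 1) / MU, y)

def encode(samples: list[float]) -> bytes:
    """Simplified sketch: compand each sample, then quantize it to one byte (256 levels)."""
    out = bytearray()
    for x in samples:
        x = max(-1.0, min(1.0, x))                 # clip to the valid range
        y = mulaw_compress(x)
        out.append(round((y + 1) / 2 * 255))       # map [-1, 1] onto 0..255
    return bytes(out)

def decode(data: bytes) -> list[float]:
    """Expand each byte back to an approximate sample value."""
    return [mulaw_expand(b / 255 * 2 - 1) for b in data]

samples = [0.0, 0.01, -0.5, 1.0]
packed = encode(samples)                           # one byte per sample
print(len(packed), [round(v, 3) for v in decode(packed)])
```

The decoded values are close to, but not exactly, the originals: the small error is the price of squeezing each sample into eight bits.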
The codec is one of the biggest factors in the quality of a call, and it also contributes to any latency (delay) you experience during the conversation, alongside network conditions.
There are a few factors that determine the quality of the voice.
- Sample rate: Also called "sample frequency," the sample rate is the number of audio samples the codec takes every second. It tells you how many snapshots of the original audio are turned into digital information each second. A higher sample rate captures more of the original sound, including higher frequencies, so it generally produces higher audio quality.
- Bit rate: Bit rate is the amount of data, measured in bits per second, used to represent the audio. A high bit rate means more information about the sound is preserved each second, which typically results in better overall sound quality. As a rule of thumb, the lower your bit rate, the worse the audio quality on the other end, regardless of the quality of the original sound. In other words, a low bit rate will produce poor sound even if the original was captured at a high sample rate.
- Bandwidth: Bandwidth is the rate at which your network connection can send or receive data. The codec's bit rate, plus the overhead of the packets that carry the audio, determines how much of that bandwidth each call consumes. Bandwidth is the bottleneck of the system: if a call needs more data than the connection can carry, quality suffers.
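For uncompressed audio, sample rate and bit rate are tied together by simple arithmetic: bit rate = sample rate × bits per sample × channels. The short sketch below (the function name `pcm_bit_rate` is just an illustrative choice) compares telephone-quality speech as carried by G.711 with CD-quality stereo of the kind stored in a .WAV file. A compressing codec such as G.729 then cuts the speech figure further, to 8 kbps.

```python
def pcm_bit_rate(sample_rate_hz: int, bits_per_sample: int, channels: int = 1) -> int:
    """Bit rate of uncompressed PCM audio, in bits per second."""
    return sample_rate_hz * bits_per_sample * channels

# Telephone speech as carried by G.711: 8,000 samples/s x 8 bits = 64 kbps
g711 = pcm_bit_rate(8_000, 8)
# CD-quality stereo, as in a typical uncompressed .WAV file: 44,100 x 16 bits x 2 channels
cd = pcm_bit_rate(44_100, 16, channels=2)
print(g711, cd)
```

The CD-quality stream needs more than 20 times the data of the telephone stream, which is why voice calls don't use it.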
3 key elements to look for in audio quality
To get the best possible audio quality from your VoIP communication, consider these key aspects.
1. Clarity of high frequencies
When people speak, they tend to produce sounds that sit somewhere between the frequencies of 125 Hz and 8000 Hz. When someone with a deep voice thoughtfully utters a “hmmm…” the sound produced is close to 125 Hz. On the other hand, when people say their t’s, f’s, and s’s, they produce higher-frequency sounds.
Listeners need these higher frequencies to make out the details of consonants. Most codecs handle them without trouble. However, in some situations the higher frequencies may be accentuated, producing a harsh, grating sound. You want a codec that produces a crisp yet natural vocal quality.
Exceptionally high frequencies, like those of a violin, may not be rendered clearly with a codec because the compression algorithms remove too much of the sound.
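The sample rate sets a hard ceiling on which of these frequencies survive. By the Nyquist theorem, sampled audio can only represent frequencies up to half its sample rate. Narrowband codecs such as G.711 and G.729 sample at 8,000 Hz, so they top out at 4,000 Hz and lose much of the energy in "s" and "f" sounds; wideband G.722 samples at 16,000 Hz. This sketch (hypothetical helper names) also shows why audio is low-pass filtered before sampling: an unfiltered tone above the limit "folds back" (aliases) to a false, lower frequency.

```python
def nyquist_limit(sample_rate_hz: float) -> float:
    """Highest frequency that can be represented at this sample rate."""
    return sample_rate_hz / 2

def alias_frequency(tone_hz: float, sample_rate_hz: float) -> float:
    """Frequency at which a pure tone appears after sampling, with no anti-aliasing filter."""
    return abs(tone_hz - round(tone_hz / sample_rate_hz) * sample_rate_hz)

for rate in (8_000, 16_000):
    # A 6 kHz component, typical of a sibilant "s", sampled narrowband vs. wideband
    print(rate, nyquist_limit(rate), alias_frequency(6_000, rate))
```

At 8,000 Hz the 6 kHz component would masquerade as a 2 kHz tone if it weren't filtered out first; at 16,000 Hz it is kept as is.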
2. Clarity of lower frequencies
Codecs typically do a good job of rendering lower frequencies, even though many phone and computer speakers don't produce enough bass for users to hear the lowest tones. Lower frequencies are easier to capture because their sound waves have longer periods between peaks, so there is less sonic detail to record each second.
As a result, the codec will most likely capture enough of the original signal to represent these less detailed sounds accurately. This benefits users who want a more "human" vocal quality. But if the codec accentuates high frequencies, the lower tones may sound muddy or garbled. That's why it's worth testing how lower frequencies (under 500 Hz) sound alongside higher ones (above 2,500 Hz).
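One way to run that test, sketched here under stated assumptions, is to generate a signal containing one low tone (200 Hz) and one high tone (3,000 Hz), send it through the codec under test, and compare the level of each tone before and after. The sketch measures a single frequency with the Goertzel algorithm, a cheap way to compute one bin of a Fourier transform. The helper names, tone choices, and 8,000 Hz sample rate are assumptions for illustration, and the codec step itself is left to you.

```python
import math

def tone_amplitude(samples: list[float], sample_rate_hz: float, freq_hz: float) -> float:
    """Estimate the amplitude of a single frequency using the Goertzel algorithm."""
    n = len(samples)
    k = round(n * freq_hz / sample_rate_hz)           # index of the nearest DFT bin
    coeff = 2 * math.cos(2 * math.pi * k / n)
    s_prev = s_prev2 = 0.0
    for x in samples:                                 # second-order resonator recursion
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    power = s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2  # |X[k]|^2
    return 2 * math.sqrt(max(power, 0.0)) / n         # rescale to the tone's peak amplitude

def make_test_signal(sample_rate_hz: int, n_samples: int, tones: dict[float, float]) -> list[float]:
    """Sum of sine waves, given as {frequency_hz: amplitude}."""
    return [
        sum(a * math.sin(2 * math.pi * f * i / sample_rate_hz) for f, a in tones.items())
        for i in range(n_samples)
    ]

FS = 8_000
signal = make_test_signal(FS, 800, {200: 0.5, 3_000: 0.25})  # 100 ms: one low, one high tone
# In a real test, play `signal` through the codec and measure the decoded output too;
# a drop in either level shows whether the lows or the highs were lost.
for f in (200, 3_000):
    print(f, "Hz:", round(tone_amplitude(signal, FS, f), 3))
```

Both tones complete a whole number of cycles in the 800-sample window, so each lands exactly on a DFT bin and the measured amplitudes match the ones used to build the signal.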
3. Smooth, uninterrupted sound
In VoIP calling, latency is the delay between when someone says something and when the listener on the other end hears it. Besides causing people to talk over each other, latency can result in two issues:
- Sound can arrive at its destination so late that the natural flow of the conversation is interrupted.
- The beginning of a word or phrase reaches the destination, but part of it is delayed so much (or lost) that it gets cut out, leaving the message unintelligible.
The best way to gauge how much latency a system has is to test it in a variety of situations, with more and less bandwidth available, and on multiple devices.
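When you test, it helps to know where the delay comes from. A common approach is a one-way ("mouth-to-ear") delay budget: add up each source of delay and compare the total with the ITU-T G.114 guideline of about 150 ms for interactive calls. The component values below are hypothetical examples, not measurements; substitute what you observe on your own network.

```python
# Hypothetical one-way delay budget in milliseconds (illustrative values only).
budget_ms = {
    "codec look-ahead": 5,    # e.g. G.729 looks 5 ms ahead of the current frame
    "packetization": 20,      # the sender waits for 20 ms of audio per packet
    "network transit": 60,    # varies with distance, routing, and congestion
    "jitter buffer": 40,      # receiver holds packets to smooth out uneven arrival
}
G114_TARGET_MS = 150  # ITU-T G.114 guideline for one-way delay in interactive calls

def one_way_delay_ms(components: dict[str, int]) -> int:
    """Total mouth-to-ear delay from its individual components."""
    return sum(components.values())

total = one_way_delay_ms(budget_ms)
verdict = "within" if total <= G114_TARGET_MS else "over"
print(f"{total} ms one-way: {verdict} the {G114_TARGET_MS} ms target")
```

Breaking the total down this way shows which knob to turn: a larger jitter buffer smooths out choppy audio but adds delay, while shorter packets cut delay at the cost of more header overhead.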
3 common VoIP codecs
When transmitting speech using VoIP, you are taking a rich, nuanced, and detailed thing — the human voice — and turning it into data. While it would be nice to be able to do that in a way that perfectly presents the voice exactly as it was uttered, you typically can’t have your cake and eat it, too. The data packets would be too big.
To strike a balance between VoIP call quality and bandwidth requirements, people have developed several different types of VoIP codecs.
The G.711 codec produces highly detailed audio, but that detail results in data-heavy transmissions: it sends 64 kbps (kilobits per second) in each direction, or 128 kbps for a two-way conversation, before packet overhead. On the plus side, G.711 is a simple codec that requires very little processing power to run.
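The 128 kbps figure counts only the audio itself. On the network, every packet also carries headers, so the real bandwidth per call is higher. A minimal sketch, assuming 20 ms of audio per packet and RTP over UDP over IPv4 (12 + 8 + 20 bytes of headers), and ignoring Ethernet or Wi-Fi framing; the function name is an illustrative choice.

```python
# Header sizes in bytes: RTP, UDP, IPv4 without options (link-layer framing ignored).
RTP_HEADER, UDP_HEADER, IPV4_HEADER = 12, 8, 20

def ip_bandwidth_bps(codec_bps: int, packet_ms: int) -> float:
    """One-direction IP-layer bandwidth for a codec at a given packetization interval."""
    payload_bytes = codec_bps * packet_ms / 1000 / 8        # audio bytes per packet
    packet_bytes = payload_bytes + RTP_HEADER + UDP_HEADER + IPV4_HEADER
    packets_per_second = 1000 / packet_ms
    return packet_bytes * 8 * packets_per_second

print("G.711:", ip_bandwidth_bps(64_000, 20))  # 160-byte payload + 40 bytes of headers
print("G.729:", ip_bandwidth_bps(8_000, 20))   # 20-byte payload + 40 bytes of headers
```

Under these assumptions G.711 needs about 80 kbps per direction (160 kbps for both directions), and the headers make up two-thirds of every G.729 packet.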
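The G.722 codec provides high-definition ("HD voice") sound. It is a wideband codec: it samples audio 16,000 times per second, twice as often as G.711, so it captures speech frequencies up to about 7,000 Hz. In its most common mode it uses the same 64 kbps as G.711, and it also defines lower-rate 56 kbps and 48 kbps modes.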
G.729 produces acceptable-quality audio while using very little bandwidth: 8 kbps in each direction. It works by splitting the audio into 10-millisecond frames and encoding each frame with a model of how the human voice produces sound, which lets it reconstruct intelligible speech from a fraction of the data.
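G.729 was long a patent-licensed codec, meaning you had to pay royalties to use it. That cost was usually absorbed into the price of the gateways and phones that supported it. Its key patents have since expired, so it is now generally available royalty-free.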
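G.729's frame-based approach can be sketched in Python. The actual per-frame encoding (a CS-ACELP linear-prediction model) is far too involved for a short example, so this sketch, assuming integer samples at 8,000 Hz, shows only the framing step and the arithmetic behind the 8 kbps bit rate: each 10 ms frame holds 80 samples and is coded into 80 bits. `split_into_frames` is a hypothetical helper name.

```python
SAMPLE_RATE_HZ = 8_000  # G.729 works on narrowband audio sampled 8,000 times per second
FRAME_MS = 10           # each frame covers 10 ms of audio
BITS_PER_FRAME = 80     # and is coded into 80 bits (10 bytes)

def split_into_frames(samples: list[int], frame_len: int) -> list[list[int]]:
    """Cut a stream of samples into fixed-size frames, zero-padding the last one."""
    frames = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        frame += [0] * (frame_len - len(frame))      # pad a short final frame with silence
        frames.append(frame)
    return frames

frame_len = SAMPLE_RATE_HZ * FRAME_MS // 1000        # 80 samples per frame
frames = split_into_frames(list(range(200)), frame_len)  # 25 ms of dummy samples
bit_rate = BITS_PER_FRAME * (1000 // FRAME_MS)       # 80 bits x 100 frames per second
print(len(frames), bit_rate)
```

Every frame, including the padded last one, costs the same 80 bits, which is why the bit rate stays constant at 8 kbps regardless of what is being said.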
How to organize your codecs
The key when organizing your VoIP codecs is to choose the best one for your specific phone system. If you're after higher quality, lean toward G.722, followed by G.711. When weighing G.729 against G.711, G.729 may be preferable if you have strict bandwidth restrictions.
Assess your requirements and constraints up front so you can pick a codec that works on your existing network, rather than having to upgrade the network to accommodate your preferred codec.
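One way to make that assessment concrete is to check which codec your link can support for the number of simultaneous calls you expect. A minimal sketch under stated assumptions: per-call figures are one-direction, IP-layer bandwidth with 20 ms packets and RTP/UDP/IPv4 headers; G.722 is assumed to run in its common 64 kbps mode; and no headroom is reserved for other traffic or link-layer overhead. `pick_codec` and `max_calls` are hypothetical helpers.

```python
from typing import Optional

# Assumed per-call bandwidth in one direction (bits per second), 20 ms packets,
# RTP/UDP/IPv4 headers included. G.722 at 64 kbps costs the same as G.711.
CODEC_BPS = {"G.722": 80_000, "G.711": 80_000, "G.729": 24_000}
PREFERENCE = ["G.722", "G.711", "G.729"]  # quality first, most frugal last

def pick_codec(available_bps: int, concurrent_calls: int) -> Optional[str]:
    """Return the most preferred codec whose combined traffic fits the link."""
    for name in PREFERENCE:
        if CODEC_BPS[name] * concurrent_calls <= available_bps:
            return name
    return None  # even the leanest codec doesn't fit on this link

def max_calls(codec: str, available_bps: int) -> int:
    """How many simultaneous calls a link can carry with one codec."""
    return available_bps // CODEC_BPS[codec]

print(pick_codec(1_000_000, 10))   # 10 calls on a 1 Mbps link
print(pick_codec(1_000_000, 20))   # 20 calls on the same link
print(max_calls("G.729", 1_000_000))
```

In practice you would also leave room for the rest of your team's traffic, which is why real capacity plans usually budget only a fraction of the link for voice.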
VoIP codecs and productivity
The average day-to-day user doesn’t typically have to worry about codecs. However, if you’re a service provider or part of a team selecting or implementing a VoIP application within an organization, it pays to know what they are and how they work.
The right codec can enhance communication with customers and within your teams, making sure everyone hears the vocal detail they need. At the same time, choosing a codec that fits your bandwidth constraints avoids disrupting other team members' online activity, keeping communication smooth and unobtrusive.