GSM Audio Codec / Vocoder
- Audio / voice codecs and vocoders convert the voice signals to be transmitted over a GSM link into a compact digital format. Voice codecs used with GSM include RPE-LPC (Full Rate), EFR, Half Rate, AMR and AMR-WB, which draw on speech coding technologies such as CELP, ACELP and VSELP.
Audio codecs or vocoders are universally used within the GSM system. They reduce the bit rate of speech that has been converted from its analogue form into a digital format so that it can be carried within the bandwidth available for the channel. Without a speech codec, the digitised speech would occupy a much wider bandwidth than would be available. Accordingly GSM codecs are a particularly important element in the overall system.
A variety of different forms of audio codec or vocoder are available for general use, and the GSM system supports a number of specific audio codecs. These include the RPE-LPC, half rate, and AMR codecs. The performance of each voice codec is different and they may be used under different conditions, although the AMR codec is now the most widely used. The newer AMR wideband (AMR-WB) codec is also being introduced into many areas, including GSM.
Voice codec technology has advanced considerably in recent years as a result of the increasing processing power available. As a result, the voice codecs used in the GSM system have improved greatly since the first GSM phones were introduced.
Vocoder / codec basics
Vocoders or speech codecs are used within many areas of voice communications. Obviously the focus here is on GSM audio codecs or vocoders, but the same principles apply to any form of codec.
If speech were digitised in a linear fashion it would require a high data rate that would occupy a very wide bandwidth. As bandwidth is normally limited in any communications system, it is necessary to compress the data to send it through the available channel. Once through the channel it can then be expanded to regenerate the audio in a fashion that is as close to the original as possible.
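To give a feel for the scale of the problem, the arithmetic can be sketched as below. The 8 kHz sampling rate and 13-bit linear samples are assumptions: they are the figures usually quoted for the input to the GSM full rate coder, but they are not stated in this article. The 5.6 kbps figure is the half rate codec rate quoted later on.

```python
def pcm_bit_rate(sample_rate_hz: int, bits_per_sample: int) -> int:
    """Bit rate in bit/s of uncompressed, linearly quantised speech."""
    return sample_rate_hz * bits_per_sample

# Assumed input format (not given in the article): 8 kHz, 13-bit linear samples.
linear_rate = pcm_bit_rate(8000, 13)           # 104000 bit/s
half_rate_codec = 5600                         # GSM half rate codec, bit/s
compression_ratio = linear_rate / half_rate_codec

print(linear_rate, round(compression_ratio, 1))  # 104000 18.6
```

Under these assumptions the codec has to remove well over 90% of the bits while keeping the speech intelligible.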
To meet the requirements of the codec system, the speech must be captured at a high enough sample rate and resolution to allow clear reproduction of the original sound. It must then be compressed in such a way as to maintain the fidelity of the audio over a limited bit rate, error-prone wireless transmission channel.
Audio codecs or vocoders can use a variety of techniques, but many modern audio codecs use a technique known as linear prediction. In many ways this can be likened to a mathematical modelling of the human vocal tract. To achieve this the spectral envelope of the signal is estimated using a filter technique. Even where the input contains many non-harmonically related components, voice codecs can achieve very high levels of compression.
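The idea of linear prediction can be illustrated with a minimal sketch: each sample is predicted as a weighted sum of the previous few samples, and the weights are found from the autocorrelation of the signal. The Levinson-Durbin recursion used here is one common way of solving for the weights; it is shown as an illustration and is not claimed to be the exact procedure of any particular GSM codec. The test signal (a single cosine standing in for a vocal tract resonance) and all function names are assumptions made for the example.

```python
import math

def autocorrelation(x, max_lag):
    """Autocorrelation r[0..max_lag] of a finite block of samples."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Find predictor weights a[1..order] so that x[n] is approximated by
    sum(a[k] * x[n-k]); also return the remaining prediction error energy."""
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err                      # reflection coefficient
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= (1.0 - k * k)
    return a[1:], err

# Toy "voiced" signal: one resonance (an assumption for illustration only).
signal = [math.cos(1.0 * n) for n in range(1000)]
r = autocorrelation(signal, 2)
coeffs, residual_energy = levinson_durbin(r, 2)
prediction_gain_db = 10 * math.log10(r[0] / residual_energy)
print(coeffs, prediction_gain_db)
```

Because the residual carries far less energy than the signal itself, it can be described with far fewer bits, which is where the compression comes from.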
A variety of different codec methodologies are used for GSM codecs:
- CELP: The CELP or Code Excited Linear Prediction codec is a vocoder algorithm that was originally proposed in 1985 and gave a significant improvement over other voice codecs of the day. The basic principle of the CELP codec has been developed and used as the basis of other voice codecs including ACELP, RCELP, VSELP, etc. As such the CELP codec methodology is now the most widely used speech coding algorithm. Accordingly CELP is now used as a generic term for a particular class of vocoders or speech codecs and not a particular codec.
The main principle behind the CELP codec is that it uses a technique known as "Analysis by Synthesis". In this process, the encoding is performed by perceptually optimising the decoded signal in a closed loop system. One way in which this could be achieved is to compare a variety of generated bit streams and choose the one that produces the best sounding signal.
- ACELP codec: The ACELP or Algebraic Code Excited Linear Prediction codec or vocoder algorithm is a development of the CELP model. However, as the name indicates, the ACELP codec codebooks have a specific algebraic structure.
- VSELP codec: The VSELP or Vector Sum Excited Linear Prediction codec is another development of the CELP approach. One of its major drawbacks is its limited ability to code non-speech sounds, which means that it performs poorly in the presence of noise. As a result this voice codec is no longer as widely used, with newer speech codecs being preferred and offering far superior performance.
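The "Analysis by Synthesis" loop described for CELP above can be sketched as follows: every candidate excitation in a codebook is passed through the synthesis filter, and the encoder keeps the one whose decoded output is closest to the target. This is a simplified illustration under stated assumptions. The tiny codebook, the filter coefficients and the function names are all hypothetical, and a plain squared error stands in for the perceptual weighting a real codec would apply.

```python
def synthesize(excitation, lpc):
    """All-pole synthesis filter: out[n] = exc[n] + sum(lpc[k] * out[n-1-k])."""
    out = []
    for n, e in enumerate(excitation):
        acc = e
        for k, a in enumerate(lpc):
            if n - 1 - k >= 0:
                acc += a * out[n - 1 - k]
        out.append(acc)
    return out

def search_codebook(target, codebook, lpc):
    """Return (index, gain, error) of the entry whose synthesised output,
    after optimal scaling, is closest to the target (plain squared error)."""
    best = None
    for index, entry in enumerate(codebook):
        synth = synthesize(entry, lpc)
        energy = sum(s * s for s in synth)
        if energy == 0.0:
            continue
        gain = sum(t * s for t, s in zip(target, synth)) / energy
        error = sum((t - gain * s) ** 2 for t, s in zip(target, synth))
        if best is None or error < best[2]:
            best = (index, gain, error)
    return best

# Hypothetical sparse-pulse codebook and a stable two-tap filter.
codebook = [
    [1, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0, 0, 0],
    [1, 0, 0, 0, -1, 0, 0, 0],
    [0, 1, 0, 0, 0, 1, 0, 0],
]
lpc = [0.9, -0.2]
target = [0.5 * s for s in synthesize(codebook[2], lpc)]

index, gain, error = search_codebook(target, codebook, lpc)
print(index, gain)
```

Only the chosen index and gain need to be transmitted, since the decoder holds the same codebook and can regenerate the excitation itself.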
GSM audio codecs / vocoders
A variety of GSM audio codecs / vocoders are supported. These have been introduced at different times, and have different levels of performance. Although some of the early audio codecs are not as widely used these days, they are still described here as they form part of the GSM system.
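|Codec name||Bit rate (kbps)||Compression technology|
|Full rate||13||RPE-LPC|
|EFR||12.2||ACELP|
|Half rate||5.6||VSELP|
|AMR||12.2 - 4.75||ACELP|
|AMR-WB||23.85 - 6.60||ACELP|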
GSM Full Rate / RPE-LPC codec
The RPE-LPC or Regular Pulse Excited - Linear Predictive Coder was the first speech codec used with GSM. It was chosen after tests were undertaken to compare it with other codec schemes of the day. The speech codec is based upon regular pulse excitation LPC with long term prediction. The basic scheme is related to two previous speech codecs, namely RELP, Residual Excited Linear Prediction, and MPE-LPC, Multi Pulse Excited LPC. The advantage of RELP is the relatively low complexity resulting from the use of baseband coding, but its performance is limited by the tonal noise produced by the system. The MPE-LPC is more complex but provides a better level of performance. The RPE-LPC codec provided a compromise between the two, balancing performance and complexity for the technology of the time.
Despite the work that was undertaken to provide the optimum performance, as technology developed further, the RPE-LPC codec was viewed as offering a poor level of voice quality. As other full rate audio codecs became available, these were incorporated into the system.
GSM EFR - Enhanced Full Rate codec
Later another vocoder called the Enhanced Full Rate (EFR) vocoder was added in response to the poor quality perceived by the users of the original RPE-LPC codec. This new codec gave much better sound quality and was adopted by GSM. Using the ACELP compression technology it gave a significant improvement in quality over the original RPE-LPC encoder. It became possible as the processing power available in mobile phones increased and the current consumption of the processors fell.
GSM Half Rate codec
The GSM standard allows the splitting of a single full rate voice channel into two sub-channels that can maintain separate calls. By doing this, network operators can double the number of voice calls that can be handled by the network with very little additional investment.
To use this facility, a half rate codec is required. The half rate codec was introduced in the early years of GSM but gave a much inferior voice quality when compared to other speech codecs. However it gave advantages when demand was high and network capacity was at a premium.
The GSM Half Rate codec uses a VSELP codec algorithm. It codes the speech in 20 ms frames, each carrying 112 bits, to give a data rate of 5.6 kbps. This includes a 100 bps data rate for a mode indicator which details whether the system believes the frames contain voice data or not. This allows the speech codec to operate in a manner that provides the optimum quality.
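The half rate figures can be checked with a few lines of arithmetic. This is only a worked example using the numbers quoted above; the helper name is made up for the illustration.

```python
FRAME_MS = 20          # frame length quoted above
BITS_PER_FRAME = 112   # bits carried by each frame

def bit_rate_bps(bits_per_frame: int, frame_ms: int) -> int:
    """Bit rate produced by sending bits_per_frame every frame_ms."""
    return bits_per_frame * 1000 // frame_ms

total_bps = bit_rate_bps(BITS_PER_FRAME, FRAME_MS)   # 5600 bit/s = 5.6 kbps
mode_bits_per_frame = 100 * FRAME_MS // 1000         # 100 bps -> 2 bits per frame
speech_bits_per_frame = BITS_PER_FRAME - mode_bits_per_frame

print(total_bps, mode_bits_per_frame, speech_bits_per_frame)  # 5600 2 110
```

In other words, the 100 bps mode indicator corresponds to 2 of the 112 bits in every frame.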
The Half Rate codec system was introduced in the 1990s, but in view of the perceived poor quality, it was not widely used.
GSM AMR Codec
The AMR, Adaptive Multi-Rate codec is now the most widely used GSM codec. The AMR codec was adopted by 3GPP in October 1999 and it is used for both GSM and circuit switched UMTS / WCDMA voice calls.
The AMR codec provides a choice of eight different bit rates as described in the table below. The bit rates are based on frames that are 20 milliseconds long and contain 160 samples. The AMR codec uses a variety of different techniques to provide the data compression. The ACELP codec is used as the basis of the overall speech codec, but other techniques are used in addition to this. Discontinuous transmission is employed so that when there is no speech activity the transmission is cut. Voice Activity Detection (VAD) is used to indicate when there is only background noise and no speech. To reassure the user that the connection is still present, a Comfort Noise Generator (CNG) provides some background noise, even when no speech data is being transmitted. This is added locally at the receiver.
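The interplay of VAD, discontinuous transmission and comfort noise can be sketched as below. This is a toy model under stated assumptions and not the AMR algorithm: the energy threshold VAD, the noise level and the function names are all hypothetical, and the sketch ignores the noise description that a real system passes to the receiver.

```python
import random

FRAME_SAMPLES = 160  # 20 ms frames of 160 samples, as quoted above

def frame_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(s * s for s in frame) / len(frame)

def is_speech(frame, threshold=1000.0):
    """Toy VAD: a plain energy threshold (hypothetical; real VADs are far more elaborate)."""
    return frame_energy(frame) > threshold

def transmit(frames):
    """Sender side of DTX: pass speech frames on, send nothing (None) otherwise."""
    return [frame if is_speech(frame) else None for frame in frames]

def receive(received, noise_level=10, seed=0):
    """Receiver side: play received frames, fill the gaps with locally generated noise."""
    rng = random.Random(seed)
    played = []
    for frame in received:
        if frame is None:
            frame = [rng.randint(-noise_level, noise_level)
                     for _ in range(FRAME_SAMPLES)]
        played.append(frame)
    return played

loud = [500, -500] * (FRAME_SAMPLES // 2)   # stands in for speech
quiet = [3, -3] * (FRAME_SAMPLES // 2)      # stands in for background noise

sent = transmit([loud, quiet, loud])
played = receive(sent)
print([frame is not None for frame in sent])  # [True, False, True]
```

The middle frame is never transmitted, yet the listener still hears a full 160 sample frame of low level noise rather than dead silence.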
The use of the AMR codec also requires that optimized link adaptation is used so that the optimum data rate is selected to meet the requirements of the current radio channel conditions including its signal to noise ratio and capacity. This is achieved by reducing the source coding and increasing the channel coding. Although there is a reduction in voice clarity, the network connection is more robust and the link is maintained without dropout. Improvement levels of between 4 and 6 dB may be experienced. However network operators are able to prioritise each station for either quality or capacity.
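The trade between source coding and channel coding can be made concrete with a small sketch. It assumes a gross rate of 22.8 kbps for a full rate traffic channel, a figure that is not given in this article, and the switching thresholds are purely hypothetical values chosen for the example; real networks set their own.

```python
GROSS_FR_BPS = 22800  # assumed gross bit rate of a full rate traffic channel

# Hypothetical switching thresholds: (minimum signal to noise ratio in dB, AMR rate in bit/s).
THRESHOLDS = [(13.0, 12200), (10.0, 10200), (7.0, 7400), (4.0, 5900)]
FALLBACK_BPS = 4750   # most robust mode, used when no threshold is met

def select_mode(snr_db: float) -> int:
    """Pick the highest source rate whose threshold the channel meets."""
    for min_snr, rate in THRESHOLDS:
        if snr_db >= min_snr:
            return rate
    return FALLBACK_BPS

def protection_bps(source_bps: int, gross_bps: int = GROSS_FR_BPS) -> int:
    """Bits per second left over for channel coding (error protection)."""
    return gross_bps - source_bps

for snr in (15.0, 8.0, 2.0):
    rate = select_mode(snr)
    print(snr, rate, protection_bps(rate))
```

As the channel degrades the sketch drops to a lower source rate, which leaves more of the fixed gross rate for error protection: 10600 bit/s at 12.2 kbps against 18050 bit/s at 4.75 kbps under the assumed figures.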
The AMR codec has a total of eight rates: eight are available at full rate (FR), while six are available at half rate (HR). This gives a total of fourteen different modes.
|Mode||Bit rate (kbps)||Full Rate (FR) / Half rate (HR)|
|AMR 12.2||12.2||FR|
|AMR 10.2||10.2||FR|
|AMR 7.95||7.95||FR / HR|
|AMR 7.40||7.40||FR / HR|
|AMR 6.70||6.70||FR / HR|
|AMR 5.90||5.90||FR / HR|
|AMR 5.15||5.15||FR / HR|
|AMR 4.75||4.75||FR / HR|
GSM AMR-WB codec
The Adaptive Multi-Rate Wideband, AMR-WB codec, also known under its ITU designation of G.722.2, is based on the earlier and popular Adaptive Multi-Rate, AMR codec. AMR-WB also uses an ACELP basis for its operation, but it has been further developed and provides improved speech quality as a result of the wider speech bandwidth that it encodes. AMR-WB has a bandwidth extending from 50 - 7000 Hz, which is significantly wider than the 300 - 3400 Hz bandwidth used by standard telephones. This comes at the cost of additional processing, but with advances in IC technology in recent years this is perfectly acceptable.
The AMR-WB codec contains a number of functional areas: it primarily includes a set of fixed rate speech and channel codec modes. It also includes other codec functions including: a Voice Activity Detector (VAD); Discontinuous Transmission (DTX) functionality for GSM; and Source Controlled Rate (SCR) functionality for UMTS applications. Further functionality includes in-band signaling for codec mode transmission, and link adaptation for control of the mode selection.
The AMR-WB codec has a 16 kHz sampling rate and the coding is performed in blocks of 20 ms. There are two frequency bands that are used: 50-6400 Hz and 6400-7000 Hz. These are coded separately to reduce the codec complexity. This split also serves to focus the bit allocation into the subjectively most important frequency range.
The lower frequency band uses an ACELP codec algorithm, although a number of additional features have been included to improve the subjective quality of the audio. Linear prediction analysis is performed once per 20 ms frame. Also, fixed and adaptive excitation codebooks are searched every 5 ms for optimal codec parameter values.
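The framing figures quoted above can be laid out as a short worked example. It uses only the sampling rate, frame length, search interval and bit rates given in this article; the helper names are invented for the illustration.

```python
SAMPLE_RATE_HZ = 16000   # AMR-WB input sampling rate
FRAME_MS = 20            # coding block length
SUBFRAME_MS = 5          # codebook search interval

samples_per_frame = SAMPLE_RATE_HZ * FRAME_MS // 1000   # 320 input samples
subframes_per_frame = FRAME_MS // SUBFRAME_MS           # 4 codebook searches

def bits_per_frame(rate_bps: int) -> int:
    """Number of coded bits in one 20 ms frame at the given bit rate."""
    return rate_bps * FRAME_MS // 1000

def frame_schedule():
    """Order of the main steps for one frame, as described in the text."""
    steps = ["linear prediction analysis"]
    steps += [f"codebook search, subframe {i + 1}"
              for i in range(subframes_per_frame)]
    return steps

print(samples_per_frame, subframes_per_frame)          # 320 4
print(bits_per_frame(6600), bits_per_frame(12650))     # 132 253
print(frame_schedule())
```

So at the 12.65 kbps rate each 320 sample frame is described by 253 bits, with one linear prediction analysis and four codebook searches per frame.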
The higher frequency band adds some of the naturalness and personality features to the voice. The audio is reconstructed using the parameters from the lower band as well as using random excitation. As the level of power in this band is less than that of the lower band, the gain is adjusted relative to the lower band, but based on voicing information. The signal content of the higher band is reconstructed by using a linear predictive filter which generates information from the lower band filter.
|Bit rate (kbps)||Notes|
|6.60||This is the lowest rate for AMR-WB. It is used for circuit switched connections for GSM and UMTS and is intended to be used only temporarily during severe radio channel conditions or during network congestion.|
|8.85||This gives improved quality over the 6.6 kbps rate, but again, its use is only recommended during periods of congestion or severe radio channel conditions.|
|12.65||This is the main bit rate used for circuit switched GSM and UMTS, offering superior performance to the original AMR codec.|
|14.25||Higher bit rate used to give cleaner speech and is particularly useful when ambient audio noise levels are high.|
|15.85||Higher bit rate used to give cleaner speech and is particularly useful when ambient audio noise levels are high.|
|18.25||Higher bit rate used to give cleaner speech and is particularly useful when ambient audio noise levels are high.|
|23.05||Not suggested for full rate GSM channels.|
|23.85||Not suggested for full rate GSM channels, and provides speech quality similar to that of G.722 at 64 kbps.|
Not all phones equipped with AMR-WB will be able to access all the data rates. For example, the different functions on the phone may not require all of them to be active. As a result, it is necessary to inform the network about which rates are available, which simplifies the negotiation between the handset and the network. To achieve this there are three different AMR-WB configurations available:
- Configuration A: 6.6, 8.85, and 12.65 kbit/s
- Configuration B: 6.6, 8.85, 12.65, and 15.85 kbit/s
- Configuration C: 6.6, 8.85, 12.65, and 23.85 kbit/s
It can be seen that only the 23.85, 15.85, 12.65, 8.85 and 6.60 kbit/s modes are used. Based on listening tests, it was considered that these five modes were sufficient for a high quality speech telephony service. The other data rates were retained and can be used for other purposes including multimedia messaging, streaming audio, etc.
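The three configurations can be written down as simple sets, which makes it easy to confirm which modes are actually used. This is an illustrative sketch only: the rates are those listed in this article, expressed in bit/s, and the names and the `common_modes` helper are hypothetical rather than part of any signalling procedure.

```python
# AMR-WB configurations as listed above, rates in bit/s.
CONFIGURATIONS = {
    "A": {6600, 8850, 12650},
    "B": {6600, 8850, 12650, 15850},
    "C": {6600, 8850, 12650, 23850},
}

# Rates that appear in the bit rate table of this article.
TABLE_MODES = {6600, 8850, 12650, 14250, 15850, 18250, 23050, 23850}

def common_modes(config_x: str, config_y: str) -> list:
    """Rates that two parties using the named configurations both support."""
    return sorted(CONFIGURATIONS[config_x] & CONFIGURATIONS[config_y])

used = set().union(*CONFIGURATIONS.values())
unused = TABLE_MODES - used

print(sorted(used))              # [6600, 8850, 12650, 15850, 23850]
print(sorted(unused))            # [14250, 18250, 23050]
print(common_modes("B", "C"))    # [6600, 8850, 12650]
```

Every configuration shares the three lowest rates, so any two handsets always have at least those modes in common.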
By Ian Poole