The music on hold sounds tinny through the smartphone’s built-in speakers. And even when you finally get through to a real human being, the voice at the other end of the line can hardly be described as clear. For although immense progress has been made in the development of all kinds of smartphone apps, the quality of voice transmission has not improved for years.
Clear and natural as opposed to muffled and distorted
The new Enhanced Voice Services (EVS) standard promises a step change comparable with the transition from analog CRT to digital flat-screen TVs. Instead of sounding muffled and distorted, the caller’s voice is as clear and natural as in a face-to-face conversation. The impetus for developing the new codec came from the 3rd Generation Partnership Project (3GPP), the international body that develops standards for mobile communication. A large team of researchers at the Fraunhofer Institute for Integrated Circuits IIS in Erlangen took part in this project.
The specifications for standards of this type are extremely demanding. “First of all, the codec must be capable of transmitting high-quality speech signals at relatively low data rates, so as not to compromise cost-efficiency,” says Dipl.-Ing. Markus Multrus, who coordinated the software development part of the project at Fraunhofer IIS. Another requirement is that the codec should be sufficiently robust to recover from transmission errors, thereby ensuring that calls are not dropped due to poor reception. Moreover, the codec must also be able to deliver similarly high quality when processing other types of signals, such as music on hold. This challenge is anything but simple, given that speech coding and audio coding are two separate worlds. The new codec therefore analyses the incoming signal every 20 milliseconds to determine whether it is carrying voice or music, so that the appropriate algorithms can be applied.
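The frame-by-frame decision described above can be pictured with a small sketch. It cuts a signal into 20-millisecond frames and labels each one so that a matching coding path could be chosen. Only the 20-millisecond interval comes from the text; the function names, the 16 kHz sample rate in the usage lines, and the zero-crossing heuristic are assumptions made purely for illustration and are not the decision logic of the EVS codec.

```python
import math

FRAME_MS = 20  # analysis interval named in the article


def split_into_frames(samples, sample_rate_hz, frame_ms=FRAME_MS):
    """Cut a sample list into consecutive, non-overlapping frames."""
    frame_len = sample_rate_hz * frame_ms // 1000
    n_frames = len(samples) // frame_len  # a trailing partial frame is dropped
    return [samples[i * frame_len:(i + 1) * frame_len] for i in range(n_frames)]


def zero_crossing_rate(frame):
    """Fraction of neighbouring sample pairs whose sign differs."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
    return crossings / (len(frame) - 1)


def classify_frame(frame, zcr_threshold=0.1):
    """Toy stand-in classifier (hypothetical, NOT the EVS algorithm)."""
    return "speech" if zero_crossing_rate(frame) > zcr_threshold else "music"


def label_stream(samples, sample_rate_hz):
    """Return one label per 20 ms frame; a real codec would switch coders here."""
    return [classify_frame(f) for f in split_into_frames(samples, sample_rate_hz)]


if __name__ == "__main__":
    rate = 16_000  # assumed sample rate for the demo
    # One second of a steady 440 Hz tone as a stand-in for music on hold.
    tone = [math.sin(2 * math.pi * 440 * n / rate) for n in range(rate)]
    labels = label_stream(tone, rate)
    print(len(labels), "frames,", labels.count("music"), "labelled music")
```

At the assumed 16 kHz rate, each frame holds 320 samples and one second of audio yields 50 decisions. The point of the sketch is the structure, a short fixed-length frame followed by a per-frame choice of coding path, rather than the particular feature used to make that choice.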
Transmission of the entire audible frequency spectrum
From a technical point of view, what is the difference between conventional and EVS codecs? “The human ear can hear frequencies of up to 20 kilohertz,” explains Dr. Guillaume Fuchs, the research scientist who led the development of EVS at Fraunhofer. “But the frequency range of the audio signals transmitted by currently available codecs only extends to 3.4 kilohertz. Any frequencies above that limit are simply cut off, which is why phone calls sound so muffled. The new codec allows frequencies of up to 16 or even 20 kilohertz to be transmitted, depending on the bit rate of the connection.” In short, it is capable of transmitting the entire audible frequency spectrum, at bit rates similar to those of today’s wireless codecs.
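To put those figures in proportion, the short calculation below works out what share of the audible range each bandwidth covers and the lowest sampling rate that could represent it. The frequencies are the ones quoted above; the factor of two is the standard Nyquist criterion, which the article does not mention, and the function names are illustrative assumptions.

```python
AUDIBLE_LIMIT_HZ = 20_000  # upper limit of human hearing quoted in the text


def min_sample_rate_hz(max_freq_hz):
    """Nyquist criterion: sample at least twice the highest frequency."""
    return 2 * max_freq_hz


def share_of_audible_band(max_freq_hz, audible_limit_hz=AUDIBLE_LIMIT_HZ):
    """Fraction of the audible range covered by a given upper frequency."""
    return min(max_freq_hz, audible_limit_hz) / audible_limit_hz


if __name__ == "__main__":
    cases = [
        ("conventional codec", 3_400),
        ("EVS, lower figure", 16_000),
        ("EVS, upper figure", 20_000),
    ]
    for label, f_max in cases:
        print(
            f"{label}: up to {f_max} Hz, "
            f"{share_of_audible_band(f_max):.0%} of the audible range, "
            f"needs at least {min_sample_rate_hz(f_max)} samples per second"
        )
```

A cut-off at 3.4 kilohertz covers 17 percent of the range up to 20 kilohertz, which gives a sense of how much of the signal conventional telephony discards.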
Voice quality indistinguishable from normal speech
Before a new coding standard can be accepted, proof has to be provided that the codec meets the defined specifications. In numerous listening tests, the EVS codec was evaluated by several thousand test subjects throughout the world. They rated the new standard as significantly better than existing solutions. The new codec has since been approved as a 3GPP standard. “Enhanced Voice Services are already commercially available in Japan, Korea, the United States, and Germany,” reports Dipl.-Ing. Stefan Döhla, who represents Fraunhofer IIS at 3GPP meetings. “It is estimated that between 50 and 100 million devices have been equipped so far with the EVS codec.”
One of this year’s Joseph von Fraunhofer Prizes went to Dipl.-Ing. Markus Multrus, Dr. Guillaume Fuchs and Dipl.-Ing. Stefan Döhla for the development of the EVS codec. They accepted the prize on behalf of the 50-strong team of researchers and engineers who worked on this project. The jury’s decision was based among other things on “the codec’s worldwide user base and its potential to generate substantial license-fee revenues.”