If a freight train rattles past, passengers usually understand only about half of an announcement such as "The train to Frankfurt am Main will be departing today from platform...". Researchers from the Oldenburg-based Project Group Hearing, Speech and Audio Technology of the Fraunhofer Institute for Digital Media Technology IDMT have developed software that significantly improves the intelligibility of speech. It also works for the voices of speakers at conferences and for conversations on mobile phones.
Microphone analyzes noise levels
The trick of the ADAPT DRC software is that the ambient noise is continually analyzed via a microphone, and the speech is adjusted to it in real time. "It is not enough to simply make the voice louder over the loudspeaker or mobile phone to drown out the noise," says project manager Dr. Jan Rennies-Hochmuth. Such technologies are already used today in car radios. They make the voice louder but not necessarily easier to understand, because at high volumes the loudspeakers reach their limits and start to rattle. "Speech is much more complex," says Rennies-Hochmuth.
Firstly, certain pitches, that is, frequencies, have to be reinforced in a targeted fashion. Vowels are relatively low-pitched, drawn-out components of words that are easy to understand. Consonants like "p", "t" and "k", however, are very short and have higher frequencies. Although they are very important for understanding what is said, they are generally harder to make out in noisy environments. For example, the consonants determine whether a listener hearing an announcement in German thinks the word was "Kasse" or "Tasse" (in English, "checkout" or "cup"). "Our algorithms are able to prioritize certain frequencies and to reinforce, at the right time, precisely those which are particularly disturbed by the ambient noise," adds Rennies-Hochmuth.
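The idea of reinforcing only the disturbed frequencies can be illustrated with a minimal Python sketch. It is not the ADAPT DRC algorithm, whose details are not given here. The function name `band_gains_db`, the target signal-to-noise ratio of 6 dB and the boost limit of 12 dB are assumptions chosen purely for illustration. For each frequency band, the sketch compares the speech level with the noise level picked up by the microphone and boosts only those bands in which the speech falls short of the target.

```python
def band_gains_db(speech_db, noise_db, target_snr_db=6.0, max_boost_db=12.0):
    """Return one boost value (in dB) per frequency band.

    speech_db, noise_db: per-band levels in dB, in the same band order.
    target_snr_db and max_boost_db are illustrative assumptions.
    """
    gains = []
    for speech, noise in zip(speech_db, noise_db):
        # How far the band falls short of the desired speech-to-noise ratio.
        shortfall = target_snr_db - (speech - noise)
        # Never attenuate here, and cap the boost so the loudspeaker
        # is not driven to its limits.
        gains.append(min(max(shortfall, 0.0), max_boost_db))
    return gains


# Three hypothetical bands: low (vowels), middle, high (consonants).
speech_levels = [60.0, 50.0, 40.0]
noise_levels = [40.0, 50.0, 50.0]
print(band_gains_db(speech_levels, noise_levels))  # [0.0, 6.0, 12.0]
```

In this example the low band, where speech is already well above the noise, is left alone, while the high band, where consonants would be masked, receives the largest boost allowed.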
Amplifying quiet speech components
Secondly, the software takes into account that different parts of the speech signal differ in volume. Because spoken language is composed of loud and quiet parts, experts use the term "voice dynamics". Speech intelligibility increases particularly when loud parts are systematically subdued and quiet parts are specifically amplified. This technique is called Dynamic Range Compression (DRC). It is also useful, for example, when you make a call on a mobile phone on a noisy street.
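A basic form of dynamic range compression can be sketched in a few lines of Python. This is a generic textbook-style compressor with fixed settings, not the Fraunhofer implementation. The function names and the threshold, ratio and make-up gain values are assumptions, and the sketch does not adapt to ambient noise as the software described here does. Levels above the threshold are reduced, and a make-up gain then raises the whole signal, so that quiet frames end up louder and loud frames end up quieter.

```python
import math


def frame_level_db(frame):
    """RMS level of one frame in dB relative to full scale (1.0)."""
    rms = math.sqrt(sum(x * x for x in frame) / len(frame))
    return 20.0 * math.log10(max(rms, 1e-10))  # guard against log(0)


def drc_gain_db(level_db, threshold_db=-20.0, ratio=4.0, makeup_db=6.0):
    """Gain in dB for a frame at level_db (all defaults are assumptions)."""
    over = level_db - threshold_db
    if over > 0:
        # Above the threshold, the level grows only 1/ratio as fast.
        compressed_db = threshold_db + over / ratio
    else:
        compressed_db = level_db
    return compressed_db - level_db + makeup_db


def compress_frame(frame, **params):
    """Scale all samples of a frame by the gain computed for its level."""
    gain_db = drc_gain_db(frame_level_db(frame), **params)
    scale = 10.0 ** (gain_db / 20.0)
    return [x * scale for x in frame]


print(drc_gain_db(-40.0))  # quiet frame: 6.0 dB boost
print(drc_gain_db(-10.0))  # loud frame: -1.5 dB, i.e. attenuated
```

A real-time system would additionally smooth the gain over time to avoid audible jumps between frames. That step is omitted here for brevity.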
The ADAPT DRC software is ready for practical application and is available to industrial partners. Since modern conference equipment and mobile phones have built-in microphones, these devices already have the hardware needed to record the ambient noise. For speaker systems at railway stations or airports, additional microphones would first have to be installed.