Beyond mp3 - Karlheinz Brandenburg on his new project

The renowned inventor is now working on the perfect auditory illusion.

Finding an inductee into the “Internet Hall of Fame” among the ranks of Fraunhofer alumni is no surprise: Prof. Dr.-Ing. Dr. rer. nat. h.c. mult. Karlheinz Brandenburg is one of the inventors of the MP3 file format, probably the most important development for the music industry since the invention of vinyl records. Over a year ago, he resigned as director of Fraunhofer IDMT to enjoy a “not-so-retired” retirement, as he puts it himself. Not content with just being a senior professor at Ilmenau University of Technology, the founder and CEO of Brandenburg Labs has also been enthusiastically working on “the next big thing”: PARty, in effect the acoustic equivalent of virtual reality. In addition to creating a perfect auditory illusion for the listener, the device serves as an acoustic magnifying glass or filter: it will allow the user to block out specific sounds or listen closely to them. The possible applications are plentiful, but the project is still a long way from being market-ready. The researchers face many challenges, but Brandenburg, as both investor and researcher, has not been scared off. As with the development of the MP3, this new project is the work of a relatively small, tightly knit team.

The mp3 team at Fraunhofer IIS in 1987
© Fraunhofer IIS
A "real team": The comparatively small but well-coordinated team around Heinz Gerhäuser (seated) successfully challenged the industry giants with mp3. From left: Harald Popp, Stefan Krägeloh, Hartmut Schott, Bernhard Grill, Heinz Gerhäuser, Ernst Eberlein, Karlheinz Brandenburg and Thomas Sporer. This shot was taken in 1987 at the Fraunhofer Institute for Integrated Circuits IIS in Erlangen.
Karlheinz Brandenburg is now involved as a founder and investor
© Brandenburg Labs
With the founding of his own company, Brandenburg Labs, the mp3 developer from back then is now pursuing his big dream of a technology that provides a perfect "auditory illusion".
Chips made in Germany by ITT Intermetall
© Fraunhofer IIS
First prototype of an mp3 player without moving parts. The Breisgau-based company ITT Intermetall presented this at the "Tonmeistertagung" Karlsruhe in 1994. About four years later, the first portable mp3 devices hit the market. Around 10 billion devices have been licensed to date.

You have made a major contribution to the digitization of music. However, in recent years, despite being pronounced dead multiple times, vinyl is thriving once again, contrary to all predictions. Mr. Brandenburg, when was the last time you listened to a record?

A vinyl record? It was definitely more than 25 years ago. We did too much research into this area at Fraunhofer IIS for me to give any credit to the supposed advantages. These are all psychological factors that have nothing to do with the sound quality. People can recognize records by their background noise, and combined with the look, the feel, and other factors, this makes people feel good. People who like that prefer vinyl records, but not me.

But is it not true that psychological factors are very important when it comes to listening to music?

One thing we can say with certainty on this point is that what you hear is very much based on what you expect. We can’t get away from that. Our brain has a superb ability to match patterns. The ear is constantly making comparisons with things we have heard or experienced. However, other factors apart from memory also affect our perception, such as our current surroundings or our feelings. So, we evaluate variants based on personal taste.

Let’s stay on the topic of perception. You have probably listened to “Tom’s Diner” more than anyone else. Why was this song by Suzanne Vega such a major challenge in particular?

The MP3 format does not rely much on psychological processes. Instead, it uses our knowledge of how the ear works, in particular the effect we call masking. Masking is more intense when music uses a wide range of frequencies or is very complex, and weaker with individual sounds. On the format side, we draw on signal statistics for efficient coding. These effects cancel each other out to some extent. The brain analyzes what it hears particularly critically when it comes to speech. Speech consists of both the voice’s fundamental frequency and the harmonics, and the algorithm encodes these elements separately. Speech signals also have a very wide range of frequencies, or bandwidth. In the case of “Tom’s Diner,” the algorithm reported that it would need four times the available bit rate.

However, you and your team managed to solve even this problem in the end. But let’s take a look at the similarities between the business models for vinyl records and for MP3s. In both cases, the majority of sales was determined by the content and not by the technology. Is that not the classic marketing problem?

If you look at the role of the Fraunhofer-Gesellschaft, then you have to explicitly refute that. Perhaps we didn't outstrip Dolby’s sales, but we did achieve significant revenue from patents. On that point, I would also like to quote an American CTO of a large technology group, and a friend of mine, who declared to me in early 2000: “You know Karlheinz, you Fraunhofer guys have been the only ones in this area who understood the business models of the Internet.” At first, I took it as flattery. But with hindsight, I believe that he was not entirely wrong. According to the rules of the market at that time, companies that invested 100 times as much in their marketing budget for comparable technologies should have been the ultimate winners.

Millions of people were using this format. Did providers have to follow suit?

This was a course that was set even before the MP3 was standardized. In the late 90s, some organizations were competing for this market. As I said, they had massive marketing budgets, but most importantly, their technologies were only a little behind the MP3. Believe me, we celebrated when the last of the major players, namely Microsoft and Sony, finally adopted the MP3. What did Fraunhofer do differently? We looked at areas of the market where precedents had already been set. When it came to transferring music files, RealNetworks, a Microsoft spin-off, had created a decoder and encoder that could be used to prepare content and make it freely available on the Internet. It sounded awful to begin with. However, it did allow people to listen to music on a computer, which was not very common at the time. We followed this example. We made the decoder, that is, the media player, available for commercial use on PCs at a very low price (not on devices such as MP3 players or smartphones). Our plan was to launch the encoder on the market at a very high price. However, an Australian student stole the encoder library and put it online for free, “on the dark web” as it's called today. Eventually, we dropped our prices massively and entered into contracts with shareware providers. With Thomson, we had the right industry partner on our side. At that time, there was a manufacturer in Germany called ITT Intermetall that at one point had a global market share of over 95 percent with its MP3 decoder chips. As part of an industry project, we assisted the company and profited from the success of the MP3 player through licensing revenues.

The mp3 team in 2007
© Fraunhofer IIS
The mp3 team in 2007: According to Prof. Brandenburg, well-coordinated teams are extremely important. Brandenburg sees another recipe for success in the fact that teams at Fraunhofer remain relatively stable over long periods. The Fraunhofer Institute for Integrated Circuits IIS, for example, is in a leading position when it comes to newer audio coding standards, with over 200 developers in this field. Another example is the Fraunhofer Heinrich Hertz Institute HHI in Berlin.

The MP3 is a success story, of course, but if you talk to people involved in start-ups and new technology, you'll often hear that, in Germany in particular, the overall conditions are really not ideal for founders.

When I returned from the U.S., I started thinking about setting up my own business for the first time. Back then, starting a business meant borrowing money from your relatives and mortgaging your home. Neither was an option for me. I think that the conditions today are a massive improvement. As a founder, you can now get support from a number of different organizations, such as Fraunhofer Venture, or through partnerships or angel investors. However, if you look at the amount of venture capital available per company, you can see that this figure is still ten times higher in the U.S. That has not improved even in recent years.

We also have a cultural problem: in Germany, you have to be successful right from the outset. In the U.S., people say that it's not until you've run three companies into the ground that you really know your stuff. Of course, entrepreneurs in this country are also just as risk-averse as the investors.

You are involved in start-ups yourself through Brandenburg Ventures, and you invest in the development of new technology. Is there anything you'd like to share with us about your latest project, “PARty”?

It’s an old dream of mine: the perfect audio illusion. With “SpatialSound Wave,” Fraunhofer IDMT has developed a system that provides realistic, three-dimensional sound effects through speakers. Now we have the even harder task of scientifically describing this natural audio illusion. Until now, evidence for the phenomenon existed only in the form of “anecdotal evidence,” i.e., people describing a subjective sensation. Our research findings didn't always correspond with established schools of thought, but the majority of the results have since been reproduced in scientific studies.

Incidentally, PARty stands for “Personalized Auditory Reality.” One of our employees came up with this name, and I liked it immediately. With PARty, our vision is to develop headphones that intelligently capture noise from our surroundings. The system knows where the user is and what is happening around them. Your ear not only hears a sound, but that sound really appears to come from the correct point in the space. The system matches the acoustic signals to the surroundings so that they sound as natural as possible. It's a bit like wearing a pair of glasses: you forget you have them on. And this experience can also be modified.

Can you give us an example?

Imagine you are inside a church, listening to a choir: the system would make it sound as though you were right there, in that church. Or suppose you were at a conference and wanted to talk to someone some distance away. Ambient noise would make that difficult. The system would be able to selectively turn up the volume for the people that you want to talk to and turn down the background noise.

Or, if there was music at a party, you could use the system to turn up the music and turn down the other background noises. If noise from a construction site was disrupting a presentation, the system could isolate that noise and turn it down. In traffic, these headphones would fade out the ambient noise but pass on warning signals. These are just some of a number of possible applications.

Do you already have initial prototypes? Do the examples you just mentioned already work in practice?

We are a long way away from that. However, we are already able to demonstrate these auditory illusions under lab conditions, which is what we did recently at a presentation at a conference in Ilmenau, where we won many people over. That is a big step, because previous systems were limited in that they only worked for particular people or audio signals. And, returning to the issue of hearing and expectations, we soon run the risk here of programming ourselves to hear certain audio effects. Where one person might hear an effect, another hears no difference between that effect and a “normal” signal. That could be the result of a programmed sense of hearing. However, our system is obviously intended to enable the desired effects without prior training.

What are the greatest challenges?

There are a number of them. The system must be able to acoustically analyze the relevant surroundings in their entirety as well as the sound source. Artificial intelligence matches patterns to known or unknown sources. In addition, the sound sources have to be separable. This has been the subject of research for many decades, but the research is not yet advanced enough for our purposes. Another requirement is an internal file format that can capture the acoustics of the space, and also render and play back these acoustics. At the moment, we are recording impulse responses with a model head that rotates in the space. From this data, we interpolate values for various points within the space. Of course, this laboratory setup does not reflect actual practice, and we have been looking for alternatives for around three years now. At Brandenburg Labs, we have applied results from the group at the university, and we are also working with Fraunhofer IDMT. In a few weeks, we hope to reach a point where we can take a prototype to an exhibition for the first time. That is also a prerequisite if we are to scale things up.
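
The interpolation step mentioned above can be illustrated with a minimal sketch. It is not the team's actual method: it assumes impulse responses measured at a few head rotation angles and simply blends the two nearest measurements linearly, sample by sample. The function name `interpolate_impulse_response` and the toy data are hypothetical, and a real system would likely interpolate over positions in the room as well and use a more refined scheme.

```python
from bisect import bisect_right


def interpolate_impulse_response(measured, angle_deg):
    """Linearly blend the two measured impulse responses nearest to angle_deg.

    `measured` maps a head rotation angle in degrees (0 <= angle < 360) to an
    impulse response given as a list of samples; all lists share one length.
    This is an illustrative simplification, not a validated acoustic model.
    """
    angles = sorted(measured)
    angle = angle_deg % 360.0
    # Index of the first measured angle strictly greater than the query angle.
    hi_idx = bisect_right(angles, angle)
    lo = angles[hi_idx - 1]            # wraps to the largest angle if hi_idx == 0
    hi = angles[hi_idx % len(angles)]  # wraps to the smallest angle past the end
    span = (hi - lo) % 360.0
    if span == 0.0:                    # only one measurement is available
        return list(measured[lo])
    weight = ((angle - lo) % 360.0) / span
    return [(1.0 - weight) * a + weight * b
            for a, b in zip(measured[lo], measured[hi])]


# Toy data: two-sample "impulse responses" at four head angles (made up).
toy_measurements = {
    0: [1.0, 0.0],
    90: [0.0, 1.0],
    180: [0.0, 0.0],
    270: [2.0, 0.0],
}
print(interpolate_impulse_response(toy_measurements, 45))  # [0.5, 0.5]
```

Blending in the time domain keeps the sketch short, but it ignores arrival-time differences between neighboring measurements, which is one reason practical systems tend to need something more elaborate.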

What do you mean by scaling up?

The process of getting research funding from the Ilmenau University of Technology took far too long for my liking. That’s when I said to myself: “I’m not done yet!” and I founded Brandenburg Labs. I invested some of my personal share of the revenue from MP3 licensing into the company. Since then, the idea that a technology like PARty is feasible has already occurred to other people. That’s why we need a team of at least 20 people if we are to have a realistic chance against larger organizations.

How do you intend to bring this technology to the market?

One of the lessons we can learn from the MP3 story is to concentrate on low-hanging fruit, but without taking our eyes off the greater goal. Back then, after all the consumer electronics providers had turned down our technology, we focused on professional systems. We could use a similar model to launch PARty on the market.

Now we've come back to the MP3 again. At the beginning, you mentioned that your team was actually very small. For your current project, you're also planning to compete with much bigger, sometimes international, organizations. What is different at Fraunhofer?

When developing the MP3, right from the outset, it was important that we were a team. I started out as a lone wolf. Then students came along very quickly. Thanks to public funding, Professor Gerhäuser was able to build up a team of additional research fellows, and we all collaborated well together. That’s extremely important. From my own experience, I know that it is not that easy to find teams like that. During my time as a postdoc at AT&T Bell Labs in the U.S., it was explained to me that as a postdoctoral researcher, I was outside the pecking order. By contrast, everyone else was trying to grab the spotlight as much as possible, so they would be in a better position come their next salary review. Obviously, this behavior contradicts the notion that “we’re building something together.”

Professor Gerhäuser’s greatest achievement was building a real team with the likes of Ernst Eberlein, Bernhard Grill, Jürgen Herre, and Harald Popp, a team in which he himself also actively worked with great enthusiasm. Because of this, we were able to sail through the critical stages really well, but we also did everything that was needed to bring the MP3 to the market. At Fraunhofer, teams also remain in place for relatively long periods of time. For example, the Fraunhofer Institute for Integrated Circuits IIS is still in pole position when it comes to all the new audio coding standards, starting with MPEG Advanced Audio Coding (AAC). The Fraunhofer Heinrich Hertz Institute HHI in Berlin is another example of this, notably through its great success with the H.264 codec. Fraunhofer HHI continues to be one of the leading organizations in this area even today.

We would never have found such an ideal combination of factors so readily anywhere else. Of course, there were commercial laboratories tackling similar topics back then as well, but unlike at other institutions, successful departments at Fraunhofer reap the benefits, through the creation of new positions, for example. I'm not saying that this doesn’t exist at all anymore, but it is really very rare.

Are you still in regular contact with your companions from those days?

In-person social gatherings have become very rare in these coronavirus times, of course. But under normal circumstances, there are always plenty of opportunities. Recently, I've become involved in standardization again, and I meet old friends and coworkers through that as well. I have often experienced the strength of alumni networks and the kind of trust that's established when two people have both been to the same university, for example. Fraunhofer was a little late in recognizing this advantage, but there have been some developments in this area in the very recent past. Naturally, at Fraunhofer IDMT, we want to stay in touch with our former coworkers and, among other things, we regularly invite them to events.

Thank you for speaking with us, Mr. Brandenburg.