Raman Lazar

MP3

History, Implementation & Application

Music. It is an art that is in itself beautiful and full of ideas, expression, and emotion. It has become something that almost every human being in this world cannot live without, and it has established itself as one of the necessities of life.

Music can take many forms, be applied in many settings, and be stored on many types of media. In computer science, computers can be used to record, play, manipulate, and even create it. What’s also true in computer science is that programmers and users are always looking for the most efficient way to complete a task, using as little time and effort as possible and letting the computer do the grunt work very quickly. One such task is storing digital music efficiently, maintaining quality while minimizing storage space. Combine that goal with the idea of music, and you have what some might say is the most popular music innovation to come out of computer science: MP3.

MP3 is one of the most efficient ways to store digital music on computer storage devices while preserving its quality. It started out as a fantasy, but over the past few decades it has become a reality that involves nearly everybody who listens to music or uses a computer. We will take a closer look at what MP3 is exactly. Starting with the history of its development, we will explore how a raw, high quality sound file is compressed down to around one tenth of its original size without any significant loss in quality. After going through the algorithm that makes this magic happen, we will look at the current state of MP3’s and some of their many applications and uses. By the end of this report, we will understand much more about this extraordinary compression format and might find new ways to make MP3’s a part of our lives.

MP3 is the shortened name for MPEG-1 Audio Layer III, the audio portion of the MPEG-1 industry standard published by ISO (the International Organization for Standardization). Layer III is an audio-only compression component. The wider MPEG family also covers video: MPEG-1 is a low-bandwidth video compression of the type used over the internet, and MPEG-2 is a high-bandwidth audio and video compression that is the standard for DVD technology.

So when did it start? MP3 had been in development long before it gained its well-known popularity in the late 1990’s. Back in the mid-1980’s, in Erlangen, Germany, a research organization called the Fraunhofer Institute started a project to develop a new type of audio compression that would revolutionize the way we listen to music today. A team of scientists was put together and, with the help of Dieter Seitzer, a professor at the University of Erlangen, set out to create a new type of high quality, low bit-rate audio codec. This new codec was to be part of the MPEG-1 standard for digital video and audio, but it needed to be sophisticated enough to surpass all existing audio compression formats in size versus quality.

In April of 1989, the long hours of work paid off for the team of scientists at the Fraunhofer Institute. An algorithm had finally been developed and refined enough to receive a patent in Germany for the new audio compression. It had not yet been integrated into the MPEG-1 standard, though. It would still be a couple of years before it officially became part of the MPEG-1 specification for audio compression.

Since MPEG itself wasn’t really a separate committee until it was established in 1988 under the title “Moving Picture Experts Group,” it took a couple of years for the new audio compression to become standardized. MPEG became a subcommittee under the bigger organization ISO/IEC (International Organization for Standardization/International Electrotechnical Commission), which slowed the process down a bit. Finally, in 1992, the new audio compression was submitted to ISO and integrated into the MPEG-1 specification.

Remember that all of this was happening in Germany, and America still hadn’t really been exposed to the new audio codec. It would not be until 1996 that Fraunhofer would apply for and receive a patent for the MP3 compression here in the US, which completed the development and standardization of the MP3 format. Now that the codec was established, applications were needed to play and manipulate files encoded in the new MP3 format.

Not surprisingly, Fraunhofer would also be the first to create an application that would play MP3 files, but it would not be the best. Its application, created in the early 1990’s, turned out to be very inefficient and was not widely used. In 1997, a developer at Advanced Multimedia Products named Tomislav Uzelac created the AMP MP3 Playback Engine. This is where the first modern and robust MP3 player came from. It wasn’t long after this engine started circulating on the internet that two college students, Justin Frankel and Dmitry Boldyrev, got hold of it and thought about adding a nice GUI along with some bells and whistles to make it more practical. They called it "Winamp" and hosted it as a free download on the internet in 1998.

Which brings us to today. Since 1998, people have been downloading this and many other MP3 players from the internet and sharing music through file sharing programs. This has created one of the biggest controversies in computer science, over copyright infringement, what’s legal and what’s not, and a plethora of other issues and legalities. Although this revolutionary audio compression has raised legal issues in the US, there are still many applications that make convenient and practical use of MP3 compressed files, which I will get to later. Now, let’s take a closer look at the “how” that makes this compression format so efficient.

We are now familiar with exactly what an MP3 file is: a sound file that started out as RAW digital data representing sound signals and frequencies, and that has been put through an algorithm to shrink the amount of storage needed for the sound we perceive when listening to it. The key word here is “perceive,” and that is the same key the algorithm uses to complete its extraordinary task of compression.

MP3 is what’s called a “perceptual” codec. This means that it’s only concerned with the data that humans will actually perceive using their senses (in the computer science case, mostly sight and hearing). If there is data in the RAW, uncompressed source that will never be heard by humans, why bother wasting space to store it, right? Not surprisingly, this actually turns out to be the case, especially with music recorded on CD’s and at sound studios, concerts, etc. The recording equipment is of such high quality that it is made to record a much broader range of sound than humans can hear, as a quality measure. That data is then also pressed onto the CD as sound data, and because no compression is used, it accounts for the huge size of RAW sound data files on CD’s.

It turns out there is a whole area of research about human perception of sound, called psychoacoustics, and a great deal is known about it. For example, if two notes are played that are very close in frequency, a listener will hear only one of them, and if two notes are played at the same time and one is much louder in decibels than the other, a listener will pick up only the louder one. The wonderful thing about psychoacoustics is that most of it can be represented using mathematical formulas, tables, and charts, which in turn can be stored in a computer quite easily.
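The two masking examples above can be turned into a toy rule. This is only a sketch under stated assumptions: the function name `audible_tones` and both thresholds (100 Hz and 20 dB) are hypothetical values picked for illustration, not figures taken from any real psychoacoustic table.

```python
def audible_tones(tones, max_gap_hz=100.0, min_level_gap_db=20.0):
    """Return the tones a listener would plausibly still hear.

    tones is a list of (frequency_hz, level_db) pairs played together.
    A tone counts as masked when some other tone is both close in
    frequency and much louder. Both thresholds are hypothetical.
    """
    kept = []
    for freq, level in tones:
        masked = any(
            other_level - level >= min_level_gap_db
            and abs(other_freq - freq) <= max_gap_hz
            for other_freq, other_level in tones
        )
        if not masked:
            kept.append((freq, level))
    return kept


# A loud 440 Hz note, a quiet note right next to it, and a quiet note far away.
example = audible_tones([(440.0, 80.0), (450.0, 50.0), (2000.0, 55.0)])
```

In this example the quiet 450 Hz tone is dropped because it sits next to a much louder neighbor, while the equally quiet 2000 Hz tone survives because nothing loud is near it.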

This is the heart of the MP3 algorithm: the psychoacoustic tables and math formulas stored in the encoder. MP3 compression is done in a two-pass fashion. First the raw data is analyzed and compared to the tables and math models so as to drop all non-perceptual data, and then the remaining perceptual data is compressed by the traditional means of taking out redundancies. These two compression types are categorized as “lossy” and “Huffman” (or “lossless”) compression, respectively. Huffman compression is not something I would like to go into detail about, because it is used in common compression formats like zip, which most of us are familiar with, and it is not unique to MP3’s.

Lossy compression is the idea behind the MP3 algorithm. Lossless compression is where the file is identical to the original after being decompressed, as needed when zipping executables or archives of programs, text, etc. In these cases, even one byte being dropped or changed can keep a program from executing properly, which is unacceptable. In lossy compression formats, on the other hand, the decompressed file doesn’t have to match the original, because the data lost or changed might be meaningless to the user, which is the case in sound files that contain much unnecessary data.
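The difference between the two kinds of compression can be seen in a few lines. This is a minimal sketch, assuming made-up sample values; it uses Python's standard `zlib` module for the lossless half and a simple rounding step (hypothetical step size `STEP = 8`) as a stand-in for lossy compression. It is not how an MP3 encoder decides what to discard.

```python
import zlib

# Hypothetical sample data: a short, repetitive run of signed 16-bit values.
samples = [1000, 1003, 998, 1001, -500, -497, -503, 0, 2, -1] * 50
raw = b"".join(s.to_bytes(2, "little", signed=True) for s in samples)

# Lossless: after compressing and decompressing, every byte is back.
packed = zlib.compress(raw)
restored = zlib.decompress(packed)
lossless_ok = restored == raw

# Lossy (toy): round each sample to the nearest multiple of STEP.
# Fine detail is thrown away and cannot be recovered afterwards.
STEP = 8
quantized = [round(s / STEP) * STEP for s in samples]
max_error = max(abs(a - b) for a, b in zip(samples, quantized))
```

The lossless round trip restores the input exactly, while the quantized samples differ from the originals by at most half a step. Whether that difference matters depends on whether anyone can hear it.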

Before we go into more detail, we need a solid understanding of the makeup of an actual MP3 file. A regular size (around four megabytes) MP3 file consists of frames, like the ones on a movie strip. Each frame contains a 32-bit header at the start followed by the actual audio data, and a file may also carry an ID3 tag (at the beginning or end of the file). Each frame covers only a fraction of a second, about 26 milliseconds at a 44.1 kHz samplerate. This works out to roughly 38 frames per second, and this figure stays constant regardless of bitrate (higher bitrates mean larger frames). The table below shows the fields of the header and their significance.

Position	Purpose	Length (bits)
A	Frame Sync	11
B	MPEG Audio Version (MPEG-1, MPEG-2, etc.)	2
C	MPEG Layer (Layer I, II, III, etc.)	2
D	Protection (if on then checksum follows header)	1
E	Bitrate index (128 kbps, 64 kbps, etc., lookup table used to specify bitrate)	4
F	Sampling rate frequency ( 44.1 kHz, etc., determined in lookup table)	2
G	Padding bit (on or off, compensates for unfilled frames)	1
H	Private bit (on or off, allows for application-specific triggers)	1
I	Channel mode (stereo, joint stereo, single channel)	2
J	Mode extension (used only with joint stereo, to conjoin channel data)	2
K	Copyright (on or off)	1
L	Original (off if copy of original, on if original)	1
M	Emphasis (respects emphasis bit in the original recording)	2
Total		32

Frame layout (header bits, then audio data): A | B | C | D | E | F | G | H | I | J | K | L | M | Audio Data...
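The field widths in the table can be used directly to split a 32-bit header into its parts. The sketch below is an illustration, not a full decoder: the names `FIELDS` and `parse_header` are hypothetical, the lookup tables for bitrate and sampling rate are not reproduced, and the example bytes `FF FB 90 64` are simply a commonly seen header value chosen for the demonstration.

```python
# (field name, width in bits), in the order A..M from the table above.
FIELDS = [
    ("frame_sync", 11), ("version", 2), ("layer", 2), ("protection", 1),
    ("bitrate_index", 4), ("sampling_rate_index", 2), ("padding", 1),
    ("private", 1), ("channel_mode", 2), ("mode_extension", 2),
    ("copyright", 1), ("original", 1), ("emphasis", 2),
]


def parse_header(header_bytes):
    """Split a 4-byte frame header into raw integer field values."""
    if len(header_bytes) != 4:
        raise ValueError("an MP3 frame header is exactly 4 bytes")
    value = int.from_bytes(header_bytes, "big")
    remaining = 32
    fields = {}
    for name, width in FIELDS:
        remaining -= width  # position of this field's lowest bit
        fields[name] = (value >> remaining) & ((1 << width) - 1)
    return fields


example = parse_header(bytes.fromhex("FFFB9064"))
# For MPEG-1 Layer III, bitrate index 9 means 128 kbps and
# sampling rate index 0 means 44.1 kHz.
```

A frame sync field with all 11 bits set is what a player searches for to find the start of each frame; the remaining values are raw indexes that still need the lookup tables to become bitrates and sampling rates.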

Bitrate is measured in kbps (kilobits per second) and is the deciding factor in how much data will be stored for each second of sound from the original file. Samplerate is a measure of how often the signal is sampled, measured in kilohertz, or thousands of samples per second, and the default for CD’s is 44.1 kHz. In general, stored audio frequencies cannot be higher than half of the samplerate. For 44.1 kHz, the highest frequency that can be stored is 22.05 kHz, which is at the upper end of the human perception range. These two settings can be thought of as the “tolerance level” for the lossy compression the algorithm uses. With them in mind, the two compression passes are easier to follow.
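The relationships among samplerate, frame length, and bitrate can be worked out with a little arithmetic. The sketch below assumes 1152 samples per frame, which is the figure for MPEG-1 Layer III and is not stated in the text above; the function names are hypothetical.

```python
SAMPLES_PER_FRAME = 1152  # assumption: MPEG-1 Layer III frame length


def nyquist_hz(sample_rate_hz):
    """Highest frequency that can be stored at a given samplerate."""
    return sample_rate_hz / 2


def frame_duration_ms(sample_rate_hz):
    """How much playing time one frame covers, in milliseconds."""
    return 1000 * SAMPLES_PER_FRAME / sample_rate_hz


def frame_size_bytes(bitrate_bps, sample_rate_hz, padding=0):
    """Bytes per frame: the bits spent during one frame, divided by 8."""
    return (SAMPLES_PER_FRAME // 8) * bitrate_bps // sample_rate_hz + padding


top_frequency = nyquist_hz(44_100)             # 22050.0 Hz
duration = frame_duration_ms(44_100)           # a little over 26 ms
frames_per_second = 1000 / duration            # a little over 38
size_128 = frame_size_bytes(128_000, 44_100)   # 417 bytes
size_64 = frame_size_bytes(64_000, 44_100)     # 208 bytes
```

The frame duration depends only on the samplerate, which is why the number of frames per second does not change with bitrate. A higher bitrate just makes each frame bigger.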

As I explained earlier, the algorithm works in two passes. The first pass is the lossy compression and is more complicated and time consuming than the second. Initially, the encoder receives a raw audio signal and breaks it down into frames. Each frame is an audio signal with many different frequencies present in it. The encoder then looks at the bitrate and samplerate the user has chosen to encode with. We can refer to this setting as a tolerance level, because it sets the threshold that decides which data (or frequencies) the encoder throws out entirely and which it simply allocates less storage space to.

Each frame is analyzed using the psychoacoustic models and formulas stored in the encoder to rank its samples and frequencies from the ones most noticeable to listeners down to the ones that will be imperceptible. Keeping in mind the amount of data storage available per second of sound (the bitrate), the sound samples and frequencies are allocated storage space accordingly: more space for the most perceptible, less for the least, and no space for the imperceptible (they’re completely dropped). If a frame needs more room for samples than it has, the leftover data is put into a frame that has spare room. This works because there are sometimes cases where all the necessary sound data is encoded and the frame still has room to spare.
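The allocation idea described above can be sketched as a proportional split of a bit budget. This is a deliberately simplified illustration: the function `allocate_bits`, the band names, and the perceptibility scores are all hypothetical, and a real encoder derives its decisions from psychoacoustic models rather than from a single score per band.

```python
def allocate_bits(perceptibility, budget_bits, threshold=0.0):
    """Share a frame's bit budget among frequency bands.

    perceptibility maps a band name to a made-up score of how noticeable
    that band is. Bands at or below the threshold get no bits at all;
    the rest share the budget in proportion to their scores.
    """
    allocation = {band: 0 for band in perceptibility}
    audible = {b: s for b, s in perceptibility.items() if s > threshold}
    total = sum(audible.values())
    if total == 0:
        return allocation
    for band, score in audible.items():
        allocation[band] = int(budget_bits * score / total)
    return allocation


# Hypothetical scores for four bands in one frame, with 1000 bits to spend.
scores = {"low": 6.0, "mid": 3.0, "high": 1.0, "ultrasonic": 0.0}
plan = allocate_bits(scores, budget_bits=1000)
```

Raising the budget (a higher bitrate) gives every audible band more bits, while raising the threshold drops more bands entirely, which mirrors the tolerance level idea described earlier.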

Each frame is then given a header with all the necessary information, and the frames are linked together to form one long bitstream. At this point the lossy compression is complete and all the unnecessary data has been discarded. This marks the start of the second pass of compression, which is a type of lossless compression. The long bitstream is run through what’s called Huffman coding, which takes out only the redundancies in the bitstream, without changing anything that can’t be restored. This usually saves about 20% of the total file size.
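Huffman coding gives short bit patterns to frequent symbols and longer ones to rare symbols. The sketch below is a generic textbook version that builds its code table from the data it is given, which is an assumption made for illustration; it does not reproduce the tables or bitstream layout of the MP3 format, and the function names are hypothetical.

```python
import heapq
from collections import Counter


def huffman_codes(data):
    """Build a symbol -> bit-string table, shorter for frequent symbols."""
    counts = Counter(data)
    if not counts:
        return {}
    if len(counts) == 1:
        return {next(iter(counts)): "0"}
    # Heap entries: (total count, tie-breaker, partial code table).
    heap = [(count, i, {sym: ""}) for i, (sym, count) in enumerate(counts.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        count1, _, codes1 = heapq.heappop(heap)  # two least frequent groups
        count2, _, codes2 = heapq.heappop(heap)
        merged = {sym: "0" + code for sym, code in codes1.items()}
        merged.update({sym: "1" + code for sym, code in codes2.items()})
        heapq.heappush(heap, (count1 + count2, next_id, merged))
        next_id += 1
    return heap[0][2]


def encode(data, codes):
    return "".join(codes[sym] for sym in data)


def decode(bits, codes):
    reverse = {code: sym for sym, code in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in reverse:  # no code is a prefix of another
            out.append(reverse[current])
            current = ""
    return out


message = list("aaaaaaaabbbc")
table = huffman_codes(message)
bits = encode(message, table)
round_trip = decode(bits, table)
```

Here twelve symbols fit into 16 bits instead of the 24 that a fixed two-bit code would need, and decoding gives back exactly the original sequence, which is what makes this pass lossless.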

Of course it’s much more complicated than that, especially the psychoacoustic methods of finding out what samples are heard, when they’re heard at certain volumes, etc. That is the most complicated and time-intensive step. The Huffman compression takes almost no time compared to the first step, the lossy compression.

The user can’t directly choose the specific samples and frequencies to throw out of the final product; the encoder does that using the psychoacoustic tables and mathematical formulas. What the user does specify is the samplerate and bitrate, which do the same thing indirectly. Choosing a lower samplerate causes the encoder to throw out the frequencies that fall outside the samplerate’s range: the lower the samplerate, the narrower the range of frequencies the encoder will store as perceptual frequencies. The bitrate is a custom setting that lets the user specify more or less storage space for each second of sound. The higher the bitrate (more storage area, bigger file sizes), the more leniently the encoder applies the compression, and the fewer samples it drops. The lower the bitrate (less storage area, smaller file sizes), the more strictly the encoder applies the compression, dropping more samples and allotting less space to the ones it keeps.

One final note is on the selection of stereo or monophonic sound, which is also accounted for in the psychoacoustic tables. A stereo effect means that at any moment, the volume of the sound can be higher from one speaker than from the other, like sound fading from one side of the speakers to the other. The encoder can compress this efficiently too, rather than making a stereo file twice as large as a monophonic one. Because human ears can’t easily tell which direction very high and very low frequencies are coming from, the algorithm combines those frequencies into one track instead of two, while the frequencies in between keep two tracks. You would use the “joint stereo” setting to keep the file size to a minimum while keeping the frequencies that are distinguishable in stereo intact as two separate channels.

When all this work is done for each frame, the frames are linked together and run through Huffman compression. We come out with an MP3 file around one tenth the size of the original sound file, with minimal quality loss. The decoder works very efficiently and doesn’t require nearly as much processing power to decode (or play) the MP3 file. As with most computer programs, though, encoding takes practice. The user must strike a balance between sound quality and file size to produce a decent MP3. The bitrate, samplerate, and stereo setting must be adjusted to suit the user’s needs for quality versus file size.

Now we have encoders, MP3 files, and decoders/players; what next? What can such a small file that stores CD quality sound do for us? The answer is: many things. It has been established that we can store CD quality music using only around four megabytes per track, so the format is already efficient and fast in terms of saving space and decoding or searching through a track. We can store many tracks on our hard drives using a minimal amount of storage.

This opens a new door of opportunity. Instead of using up 600 megabytes to store a CD on a computer hard drive, it now takes only around 40 to 60 megabytes using MP3’s. Combined with today’s hard drives that store insane amounts of data (200+ gigabytes), that’s a lot of albums we can store. So there is one very nice application: archiving one’s CD collection as MP3’s on a hard drive, with instantaneous access to any song and no need to change CD’s, look for the right track, or deal with anything else that comes with having a big collection of CD’s. All one needs is a decent sound card and a nice set of speakers to have a full entertainment center on the desktop.
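The storage savings can be checked with simple arithmetic. This sketch assumes standard CD audio (44.1 kHz, 16 bits per sample, two channels), a 128 kbps MP3, and a four-minute track; the 16-bit stereo figures and the track length are assumptions not stated in the text, and the function name is hypothetical.

```python
def audio_size_bytes(bitrate_bps, seconds):
    """Storage needed for a stream of the given bitrate and length."""
    return bitrate_bps * seconds // 8


# Assumption: CD audio is 44,100 samples/s x 16 bits x 2 channels.
CD_BITRATE_BPS = 44_100 * 16 * 2
MP3_BITRATE_BPS = 128_000
TRACK_SECONDS = 4 * 60  # hypothetical four-minute track

raw_track = audio_size_bytes(CD_BITRATE_BPS, TRACK_SECONDS)    # about 42 MB
mp3_track = audio_size_bytes(MP3_BITRATE_BPS, TRACK_SECONDS)   # about 3.8 MB
ratio = raw_track / mp3_track                                  # about 11 to 1
```

Under these assumptions a track shrinks from roughly 42 megabytes to just under 4, which agrees with the "around four megabytes" and "one tenth" figures used throughout this report.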

Undoubtedly, the internet now comes into play. How long does it take to transfer a RAW CD track over the internet using a 56K modem? Too long (roughly 3 hours, I tested it). How long does it take to transfer an MP3 over the same line? Around 15 minutes, and that turns out to be short enough to start a whole revolution of new ideas like file sharing and MP3 broadcast stations.
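A best-case estimate of these transfer times follows from file size and line speed. The sketch below assumes the modem really delivers its nominal 56,000 bits per second with no overhead, and reuses hypothetical sizes for a four-minute track (about 42 MB raw, about 3.8 MB as a 128 kbps MP3). Real connections are slower than nominal, so the measured times quoted above are longer than these ideal figures.

```python
def transfer_seconds(size_bytes, line_bps):
    """Ideal transfer time: total bits divided by line speed."""
    return size_bytes * 8 / line_bps


MODEM_BPS = 56_000            # nominal speed, an idealization
RAW_TRACK_BYTES = 42_336_000  # hypothetical four-minute raw CD track
MP3_TRACK_BYTES = 3_840_000   # the same track as a 128 kbps MP3

raw_minutes = transfer_seconds(RAW_TRACK_BYTES, MODEM_BPS) / 60
mp3_minutes = transfer_seconds(MP3_TRACK_BYTES, MODEM_BPS) / 60
```

Even in the ideal case the raw track needs well over an hour and a half while the MP3 needs under ten minutes, so the gap between the two is about the same factor of eleven as the gap in file size.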

Although it is technically illegal to distribute MP3’s unless authorized by their creators (song writers, not encoders), that hasn’t stopped a plethora of new file sharing applications from appearing on the internet, starting with one of the first popular ones, Napster. Since transmitting a four megabyte file takes only several minutes, file sharing utilities exploded starting in the late 1990’s. This let anybody with access to the internet upload, download, and trade MP3’s without the authorization of the artists. All one would need to do is open one of these file sharing applications and type the name of a song, artist, album, or genre to download CD quality MP3’s, which could then be turned back into RAW CD tracks and burned onto CD’s. This is illegal, yet it remains a reality and unfortunately one of the biggest applications of MP3’s.

Another application is web radio stations, which are legal and let people tune into a live broadcast of music in MP3 format. One example is www.Shoutcast.com, where you can pick a web radio station hosted by internet users and listen to live MP3 feeds for free, just like a radio station. In addition, users can pick stations whose stream bitrate suits the speed of their internet connection. A modem user at 56 kbps can tune into a station that streams below that line speed (a 64 kbps stream, which offers better-than-radio quality though not as good as CD, needs a slightly faster connection) without much of a bottleneck. Before MP3 compression, since sound data was so hefty, transferring sound over the internet was quite difficult, especially on slow connections. Now, using MP3 compression, it can be done efficiently and effectively.

Many musical bands have found MP3’s to be of use to them. Starting from scratch, bands without much money can record some of their songs and create MP3 versions. The bands can upload those MP3’s to file sharing programs and have people sample their music, which helps get the name of the band out faster. They can also have their music played on the web radio stations mentioned earlier. The same approach could be helpful for new DJ’s or comedians, or even for talks and seminars, which could be encoded as MP3’s and distributed over the internet.

Not long after the file sharing utilities got popular, makers like Sony and Panasonic started integrating MP3 decoders into their portable Discmans and car stereos. These companies figured MP3’s were becoming so standard that it would be nice to have a portable CD player that could also play a CD full of MP3’s. They advertised that the new car stereos and Discmans would be virtually “skip-free” when used with MP3’s. Since an MP3 is so small compared to a CD track, the player could read the whole MP3 into memory and avoid reading from the disc during playback, so it would never skip. Most Discman models are now MP3 compatible, and car stereos that play MP3’s are also very nice, yet they cost a little more than the regular CD-only ones.

The last application to be mentioned is the quickly growing portable MP3 player industry. These are little hand-held devices (some about the size of a lighter) that connect to a computer so that MP3 files can be transferred directly onto the player’s storage medium. Starting in 1999, these little toys appeared out of nowhere, but with a very small amount of storage. Diamond Multimedia came out with the Rio, RCA came out with the Lyra, and Creative Labs came out with the Nomad, yet all they had was 32 to 64 megabytes of flash memory, either installed or on compact flash cards. These proved very convenient because transferring MP3’s didn’t require CD’s, and it took little time to move MP3’s back and forth between computer and player. The only drawback was that they could store only about 10-15 MP3’s on the flash memory.

Apple released the iPod a couple of years later, in 2001, with an actual hard drive in it; by 2002 it was available with 20 gigabytes of storage. This year Apple released a smaller version called the iPod mini, which holds only four gigabytes but has a longer-lasting battery and a smaller body than the original iPod. As for the players that still use flash memory, like the Nomad and Rio, their storage capacity has increased considerably, to 512 megabytes. That’s over 100 MP3’s in one device smaller than a lighter!

The little gadgets are becoming more and more popular. Nokia has even come up with a cellular phone (model 3300) that has the ability to play MP3’s built in. The ring tones can be MP3’s, or the user can just sit and listen to MP3’s downloaded onto the phone. Someone might be listening to a song, have the phone ring (which mutes the MP3), have a conversation, and go back to listening to the MP3. The creativity keeps getting more unbelievable.

The applications are never-ending. Everywhere digital music has an application, it seems there’s a way to integrate MP3’s, too. Sales are still strong for the music industry, but the medium has changed. Pretty soon CD’s will be used only to rip music tracks onto a hard drive and encode them as MP3’s. The reason is simple: MP3’s can go many places that CD’s cannot. MP3’s are more versatile now. They are found in phones, on the internet, on hard drives, in Palm Pilots, in MP3 players wrapped around a jogger’s wrist, and much more. Ten years ago, no one had heard of a DVD and video stores rented out only VHS tapes, but that has all changed, and most people have moved over to DVD’s because they are higher quality, smaller, and more versatile. The same could have been said about cassette tapes versus audio CD’s ten years ago.

That’s where MP3’s are going. No, they’re not going to the video store, but they are becoming the way most people listen to music on their computer, in their car, at the gym, etc. From the history of this revolutionary codec, to the complexity of its compression methods, to just a few of its applications in today’s world, it is safe to say we now have a much better understanding of the MP3. There may well be a lot of controversy around MP3’s causing problems for the music industry, recording labels, and music artists, but I doubt that will be the end of the format. No matter what, MP3 compression is one of the most efficient that computer science has ever seen, and there is no reason to gripe about its effects. MP3’s have been very beneficial to people, musicians, and companies like Sony, RCA, and Creative that are pushing them into what is becoming the standard way to listen to music today. They give us a better way to listen to music, something every one of us needs from time to time.

Bibliography

Digital Audio Systems, das.iocon.com, viewed on 3-3-04, 1998-2003, http://das.iocon.com/das.shtml

Fraunhofer Institute, www.iis.fraunhofer.de, viewed on 3-3-04, 1998-2004, http://www.iis.fraunhofer.de/amm/techinf/layer3/index.html

Fries, Bruce, “The MP3 and Internet,” TeamCom Books, first edition, 2000, ISBN-1-928791-10-7.

Hacker, Scott, “MP3: The Definitive Guide,” O'Reilly, first edition, 2000, ISBN-1-56592-661-7.

Jones, Christopher, “MP3 Overview,” Hotwired.lycos.com, viewed on 3-3-04, July 2000, http://hotwired.lycos.com/webmonkey/00/31/index3a.html

Rathbone, Andy, “MP3 for Dummies,” IDG Books Worldwide, first edition, 1999, ISBN-0-7645-0585-8.