How does the way digital technology compresses and stores sound affect our music-listening experience and our senses?


This article explores how digital technology compresses and stores sound, and explains how music file formats such as MP3 have changed the way we listen to music. Beyond the convenience they provide, it also looks at how the technology shapes our sensory perception.

 

The pace of technological development in the modern world is unimaginable. Our lives were very different just ten years ago, and it’s hard to even picture further back than that. Even now, new technologies and products appear every day. Our lives have evolved technologically in many ways over those decades, but a big part of the story has been the ability to shrink the large into the small and the heavy into the light. The calculating machine that once filled a room became a computer, then a PC in the home, then a laptop in a bag. What once filled tens of thousands of sheets of paper can now be carried on a USB stick the size of a finger. Among the many applications of compression, the one I’d like to introduce in this article is sound compression. Let’s take a look at how the sounds of instruments and human voices are transformed into files on a small device.
Advances in technology don’t just provide convenience; they also have a profound impact on the way we perceive things. In the past, for example, you needed a specific place or equipment to listen to music, whereas now you can access it anywhere, anytime. These changes are more than technological advancements: they are redefining our culture, our daily lives, and even the human sensory experience itself. Music in particular has taken on new value and meaning now that it can be stored, reproduced, and transmitted as a digital file rather than simply listened to.
Since Edison recorded sound on the phonograph in 1877, the technology to store, reproduce, and play sound has evolved as rapidly as any other field. The vinyl record came along, then tape, and in 1982 Philips and Sony introduced the CD, which allowed 74 minutes of music to be played from a single disc. In the 1990s, PCs and portable players became popular, allowing music to be manipulated directly in the form of files like MP3s, and today dozens of hours of music fit into a smartphone the size of your palm. This evolution hasn’t just been a technological advancement; it has also changed the way we consume and produce music. Whereas music could once be enjoyed only in certain places or on certain devices, it is now easily accessible anywhere, anytime. This has made music more intertwined with our lives, and its importance has grown significantly.
The ability to pack so much sound into so small a space can be explained in two steps. The first is the application of digital concepts to music, representing music as digital information; the second is the development of technology to compress the resulting digital files. Scientifically, sound is simply the vibration of air, and in the past, physically recording the shape of those air vibrations, the sound waves, as they were required large storage media. With the advent of digital, the size and weight of the storage media changed dramatically. Beyond reducing physical size and weight, advances in digital technology opened up the possibility of manipulating sound itself more efficiently and precisely. This, in turn, made it possible to pack more information into a smaller space, dramatically increasing the universality and accessibility of music.
It’s easy to think of digital as simply “everything in 0s and 1s,” but fundamentally the difference between analog and digital is whether every possible value can be represented. It may sound paradoxical, but when we represent things in nature with numbers, we inevitably lose some accuracy. For example, we usually state human height in 1 cm increments, which lumps the countless different heights between 161 cm and 162 cm together as “161 cm.” Even if we went down to 0.1 cm increments, we still couldn’t describe nature exactly, because we only ever have a finite set of numbers, and a finite amount of storage, whether we are recording heights or sound waves. To put it more precisely, continuous values are represented by discrete values, so no matter how finely a sound is written down as numbers, those numbers can never perfectly reconstruct the original sound. This is the difference between analog and digital.
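To make the idea concrete, here is a minimal Python sketch of quantization, reusing the height example above and a single sample of a sound wave. The step sizes and values are purely illustrative, not taken from any real audio format.

```python
import math

def quantize(value, step):
    """Snap a continuous value to the nearest multiple of `step`."""
    return round(value / step) * step

# Height: a 1 cm grid lumps countless real heights together.
height = 161.4
print(f"{quantize(height, 1.0):.1f} cm")  # 161.0 -- coarse grid
print(f"{quantize(height, 0.1):.1f} cm")  # 161.4 -- finer, but still a grid

# One sample of a sound wave: the same information loss in miniature.
sample = math.sin(1.0)              # a "true" amplitude, ~0.841471
print(quantize(sample, 2 / 2**16))  # on a 16-bit grid: close, never exact
```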
In music, as elsewhere, the thresholds for applying digital technology, i.e., where to cut off the numbers, are determined by human characteristics. When more than about 20 similar pictures pass by in a second, we no longer see them as independent pictures but as a moving picture, which is why video consists of 24 to 30 frames per second, the familiar 24 fps or 29.97 fps. If a screen viewed at close range has more than roughly 300 dots per inch, humans can’t distinguish the individual dots and see them as connected lines, which is why a 326 ppi Retina display is considered sharp. Sound is judged on two criteria in the same way. The first is how precisely the strength of the sound wave at each moment must be recorded; the second is how many times per second it must be measured. Just as we sometimes record height to the nearest centimeter and sometimes to one decimal place, we use different levels of precision depending on the context. In audio, this precision is called bit depth, and common values are 8-bit, 16-bit, and 24-bit: 16-bit means the state of the sound wave at each moment is described with a 16-digit binary number. The other criterion, how many times per second the wave must be sliced and stored so that playback sounds unbroken, is called the sampling rate. Since the human ear can hear frequencies up to 20,000 Hz, and a wave must be sampled at least twice per cycle to be reconstructed, we need at least 40,000 samples per second for the result to sound natural. This is the sample rate expressed as 44.1k or 96k in a music file. In other words, a recording that stores 44,100 values per second, each as a 16-digit binary number, is a 16-bit/44.1 kHz file.
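As a quick back-of-the-envelope check of those two numbers (stereo is assumed here, since CDs store the left and right channels separately):

```python
bit_depth = 16         # binary digits per sample
sample_rate = 44_100   # samples per second, the "44.1k"
channels = 2           # assuming stereo, as on a CD

bits_per_second = bit_depth * sample_rate * channels
print(f"{bits_per_second:,} bits/s")  # 1,411,200 bits/s, i.e. ~1,411 kbps
print(f"{bits_per_second / 8:,.0f} bytes of raw audio per second")  # 176,400
```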
If you look closely at the information of an MP3 file, you will see numbers such as 320 kbps or 192 kbps under sound quality. This is the bit rate, which expresses how much data is transmitted per second based on the two characteristics above. Naturally, the more data per second, the more natural, or analog, the sound appears to the human ear, and the larger the file. You may also see CBR or VBR in the quality information, where C stands for Constant and V for Variable: CBR uses a fixed bit rate from start to finish, while VBR varies the bit rate according to the nature of each passage of sound. The difference in sound quality is hard to hear, but files created with VBR are generally considered more efficient. This gives users the flexibility to choose the sound quality and file size that suit them, which further diversifies the way music is consumed to match individual tastes and uses, and enriches the listening experience.
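Since a constant bit rate is, by definition, a fixed number of bits per second, file size follows directly from it. Here is a rough estimate for a five-minute song at the bit rates mentioned above (CBR only; VBR sizes depend on the material):

```python
def cbr_size_mb(bitrate_kbps, seconds):
    """Approximate file size in MB for a constant-bit-rate stream."""
    return bitrate_kbps * 1000 * seconds / 8 / 1_000_000

song_seconds = 5 * 60
for kbps in (320, 192, 128):
    print(f"{kbps} kbps -> {cbr_size_mb(kbps, song_seconds):.1f} MB")
# 320 kbps -> 12.0 MB, 192 kbps -> 7.2 MB, 128 kbps -> 4.8 MB
```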
In this digital representation of nature’s sound waves, we sacrifice some accuracy to get a sound source that can be stored as a file. Even so, the converted file itself is large: a five-minute pop song can run to tens of megabytes, yet when compressed to the listener’s desired quality it shrinks to less than five megabytes. When we think of compression, we often picture packing the contents of a document into an archive, but MP3 compression is really about discarding unnecessary information. MP3 dramatically reduced the size of the music people around the world listen to every day by throwing away the parts of the sound outside the audible range, as well as sounds the human ear cannot pick out because louder sounds nearby drown them out. This is called perceptual coding: compression designed around how our perceptual organs actually respond. It’s an example of how digital technology can go beyond being a mere technical tool and instead draw on human senses and perception to create more efficient, user-centered results.
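Real MP3 encoders rely on detailed psychoacoustic models, but the core move of perceptual coding, analyzing the frequency content and discarding what a listener would not notice, can be caricatured in a few lines. This toy sketch (assuming NumPy is available, with a made-up loudness threshold) simply drops frequency components that are very quiet next to the loudest one:

```python
import numpy as np

rate = 44_100               # samples per second
t = np.arange(rate) / rate  # one second of time stamps

# A loud 440 Hz tone plus a whisper-quiet 9 kHz tone.
signal = np.sin(2 * np.pi * 440 * t) + 0.001 * np.sin(2 * np.pi * 9_000 * t)

spectrum = np.fft.rfft(signal)
threshold = 0.01 * np.abs(spectrum).max()   # "too quiet to notice" (made up)
spectrum[np.abs(spectrum) < threshold] = 0  # discard what falls below it

kept = np.count_nonzero(spectrum)
print(f"kept {kept} of {spectrum.size} frequency components")  # nearly all dropped
```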
MP3 initially took some time to become a de facto standard on PCs because the idea was patented and licensing fees had to be paid to use it. As the patents expired, however, these restrictions fell away, and MP3 became more than just a file format: it became an iconic symbol of the digitization of music. With the rise of free and open formats, a mix of codecs and file formats now coexists, with WAV, FLAC, AAC, and others gaining popularity alongside MP3. Still, thanks to its historical significance, MP3 remains the most familiar format for many users.

 

About the author

Blogger

Hello! Welcome to Polyglottist. This blog is for anyone who loves Korean culture, whether it's K-pop, Korean movies, dramas, travel, or anything else. Let's explore and enjoy Korean culture together!
