Digital images are composed of pixels, and higher resolution means more data to store. To reduce this data, lossless and lossy compression techniques are used. Lossy compression, widely used in the JPEG format, reduces data through preprocessing, DCT, quantization, and encoding. These techniques enable efficient image storage and transmission in a variety of fields, from medicine to the internet.
A digital image is a digital representation of a photograph or drawing. Digital images are composed of pixels, the smallest unit of dots, each of which is assigned a value representing brightness, color, and so on. In general, more pixels mean higher resolution, but at the cost of a larger amount of stored data. To store and transmit digital images efficiently, compression techniques that reduce the amount of data are needed.
There are two types of digital image compression: lossless compression and lossy compression. Lossless compression discards no data during the compression process, so its compression ratio is lower, but the image can be restored exactly to the original. Lossy compression, by contrast, removes redundant or imperceptible data; the original image cannot be restored exactly, but compression ratios many times to thousands of times higher than lossless compression are possible, which makes it the more widely used technique.
JPEG, which we commonly use, is a representative digital image file format that uses lossy compression. JPEG compression consists of four steps: preprocessing, DCT, quantization, and encoding. Each step is designed to reduce the size of the image while maintaining quality close to the original.
First, preprocessing involves changing the color model and sampling. The color model of the digital image is changed from RGB to YCbCr. The RGB model represents a pixel's color and brightness together by combining the three primary colors of light, while the YCbCr model separates a pixel's information into Y, which carries brightness, and Cb and Cr, which carry color. After the color model is changed from RGB to YCbCr, sampling is performed to extract only some of the values from the pixels.
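The color-model change can be sketched as follows. The coefficients below are the standard BT.601 full-range formulas used by JPEG/JFIF; the function name is my own.

```python
# Sketch of the RGB -> YCbCr color-model change. The coefficients are
# the standard BT.601 full-range values used by JPEG/JFIF.
def rgb_to_ycbcr(r, g, b):
    """Convert one 8-bit RGB pixel to YCbCr (all values in 0..255)."""
    y  =  0.299    * r + 0.587    * g + 0.114    * b        # brightness
    cb = -0.168736 * r - 0.331264 * g + 0.5      * b + 128  # blue-difference chroma
    cr =  0.5      * r - 0.418688 * g - 0.081312 * b + 128  # red-difference chroma
    return y, cb, cr

# A pure white pixel has full brightness and neutral chroma:
print(rgb_to_ycbcr(255, 255, 255))  # Y = 255.0, Cb = 128.0, Cr = 128.0
```

Separating brightness from color is what makes the next step, sampling, possible: the Y channel can be kept intact while Cb and Cr are thinned out.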
The human eye is sensitive to changes in brightness but relatively insensitive to changes in color, so sampling keeps all of the Y values, which carry brightness, and only some of the Cb and Cr values, which carry color, to the extent that the eye cannot perceive the change. Sampling extracts pixel information in the ratio J:a:b from a block of pixels of a certain size, where J is the number of horizontal pixels in the block, a is the number of color samples kept from the first row of the block, and b is the number kept from the second row. For example, when color information is sampled in the ratio 4:2:0, a block 4 pixels wide keeps 2 color samples from its first row and none from its second row. In the end, only two of the eight color values in the 4×2 block are kept, reducing the amount of data.
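A minimal sketch of 4:2:0 subsampling on one 4×2 chroma block, under the scheme described above (the function name and sample positions are illustrative):

```python
# Hypothetical sketch of 4:2:0 chroma subsampling on a 4x2 block:
# every Y value is kept, but only 2 of the 8 chroma values survive
# (two samples from the first row, none from the second).
def subsample_420(cb_block):
    """cb_block: 2 rows x 4 cols of chroma values -> the 2 kept samples."""
    first_row = cb_block[0]
    # J:a:b = 4:2:0 -> a = 2 samples from the first row, b = 0 from the second.
    return [first_row[0], first_row[2]]

cb = [[10, 11, 12, 13],
      [20, 21, 22, 23]]
print(subsample_420(cb))  # [10, 12] -- 2 of 8 chroma values remain
```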
After preprocessing, a transformation called DCT is performed. DCT converts the information in the sampled pixels into frequencies, representing it as regularly separated data in the frequency domain. For efficiency, DCT is performed on blocks of 8 pixels horizontally by 8 pixels vertically as the basic unit. When DCT is applied, low-frequency components, which represent small differences between neighboring pixels, gather at the top left of the matrix, and high-frequency components, which represent large differences, gather at the bottom right, so the result is a matrix of values separated along the frequency domain. The absolute values of the low-frequency coefficients are generally larger than those of the high-frequency coefficients.
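The transform can be sketched with a naive 2-D DCT-II, the variant JPEG uses (this is an unoptimized illustration; real encoders use fast algorithms):

```python
import math

N = 8  # JPEG operates on 8x8 blocks

def dct2(block):
    """Naive 2-D DCT-II of an 8x8 block (illustrative, not optimized)."""
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            cu = math.sqrt(1 / N) if u == 0 else math.sqrt(2 / N)
            cv = math.sqrt(1 / N) if v == 0 else math.sqrt(2 / N)
            s = 0.0
            for x in range(N):
                for y in range(N):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = cu * cv * s
    return out

# A flat (constant) block has all its energy in the top-left corner:
flat = [[100] * N for _ in range(N)]
coeffs = dct2(flat)
print(round(coeffs[0][0]))  # 800 -- large low-frequency coefficient
print(round(coeffs[7][7]))  # 0   -- no high-frequency content
```

This illustrates the claim in the text: a smooth block produces a large value at the top left and near-zero values toward the bottom right.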
The next step is quantization. In quantization, each matrix value obtained by DCT is divided by a preset constant and rounded to the nearest integer. The low-frequency coefficients are divided by small constants, so their values are preserved at a reduced scale, while the high-frequency coefficients are divided by large constants, so most of them become zero. This reduces the size of the data by shrinking the absolute values of the low-frequency components and removing the high-frequency components, exploiting the fact that the human eye is sensitive to low-frequency components but much less sensitive to high-frequency ones.
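A minimal sketch of this step, using a small hypothetical 2×2 corner of a coefficient block and a made-up quantization table (real JPEG tables are 8×8 but have the same shape: small constants at the top left, large ones at the bottom right):

```python
# Sketch of quantization: each DCT coefficient is divided by a constant
# from a quantization table and rounded to the nearest integer.
# The table below is hypothetical, chosen only to show the shape.
def quantize(coeffs, table):
    return [[round(c / q) for c, q in zip(crow, qrow)]
            for crow, qrow in zip(coeffs, table)]

coeffs = [[840.0, -30.0],   # low-frequency corner of a DCT block
          [ 12.0,   5.0]]   # higher-frequency values
table  = [[10, 16],         # small divisors: low frequencies preserved
          [16, 99]]         # large divisors: high frequencies wiped out
print(quantize(coeffs, table))  # [[84, -2], [1, 0]] -- high frequency became 0
```

The resulting run of zeros is what makes the final encoding step so effective.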
Finally, the data is encoded. Encoding represents the quantized matrix values in binary, and Huffman encoding is typically used for this. Huffman encoding assigns fewer bits to data that occurs frequently and more bits to data that occurs infrequently. As a result, the encoding step reduces the amount of data further without losing any of it.
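A compact Huffman-coding sketch (the function name and heap representation are my own; this builds the code table by repeatedly merging the two least frequent subtrees):

```python
import heapq
from collections import Counter

# Minimal Huffman-coding sketch: frequent symbols get shorter bit strings.
def huffman_codes(symbols):
    """Build a prefix code (symbol -> bit string) from a list of symbols."""
    freq = Counter(symbols)
    # Heap items: (frequency, tie-breaker, {symbol: code-so-far}).
    heap = [(f, i, {s: ""}) for i, (s, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)   # two least frequent subtrees
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in c1.items()}
        merged.update({s: "1" + code for s, code in c2.items()})
        heapq.heappush(heap, (f1 + f2, tie, merged))
        tie += 1
    return heap[0][2]

data = [0, 0, 0, 0, 0, 1, 1, 2]   # e.g. many zeros after quantization
codes = huffman_codes(data)
print(codes)  # the common symbol 0 gets the shortest code
```

This also shows why quantization and encoding work well together: the many zeros produced by quantization become a single short codeword.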
These processes allow digital image compression techniques to store and transmit data efficiently, and they are used in a variety of fields. In the medical field, for example, lossless compression is important because high-resolution images must be transmitted without any loss of information. Lossy compression, meanwhile, is widely used on the internet for fast image transfer and storage savings. These compression techniques help us use digital images more efficiently, and they will continue to evolve.