Big data analysis turns the vast amount of information collected from our daily lives into useful value, but its use raises concerns about privacy and state surveillance. Big data utilization in South Korea is still in its infancy, and technological advances are needed before it can be used effectively.
In the fairy tale ‘Hansel and Gretel’, the two lost siblings made their way through the forest, dropping bread crumbs to mark the path they had taken. They were intentionally leaving their mark, but what if, unbeknownst to us, something is being left behind that represents us, marking the path of our actions? It sounds far-fetched, but with smartphone penetration exceeding 60%, CCTV cameras on every corner of Seoul’s alleys, and computerized networks in every institution, we’re leaving footprints of information everywhere, all the time, without even realizing it. Moreover, these ‘footprints’ do not disappear once they are captured, but are accumulated and stored in places we do not know. The quest to harness this data and create beneficial value has been going on for a long time, and we can already see its use in our lives.
The term big data may sound unfamiliar, but it is closer to our lives than we might think. The moment we leave the house and swipe our transit card on the bus to work, information about our commute and how we travel is recorded. The moment we swipe our credit card for lunch, information about our menu preferences is accumulated, and at the grocery store, information about our food preferences and spending habits is entered somewhere. This accumulation of data is what we call big data. In a report titled “Big Data, Big Impact: New Possibilities for International Development,” the Davos Forum noted that ‘researchers and politicians are beginning to realize the possibilities that arise from setting the course of this flood of data.’ The phrase ‘setting the course’ implies that big data has no value if it is treated as just a collection of data. The term therefore refers not only to large amounts of data accumulated over a long period of time, but also to the techniques for extracting systematic rules and trends from that data.
There are many ways to find rules in large amounts of data. Natural language processing is a technique that mechanically analyzes human language and converts it into a form that machines can process, from which useful information can be extracted. The social networks we use are also a great source of data. Social network analysis examines the structure and strength of connections among members of a social network to track how information spreads and how much influence it has. Cluster analysis groups items with similar characteristics, so that sets of related information emerge from the data. There are many other techniques for extracting information from data. One of the most popular platforms for running such analyses on very large data sets is Hadoop, which is used by Yahoo, Facebook, and others.
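To make the idea of cluster analysis more concrete, here is a minimal sketch of one common clustering method, k-means, written in plain Python. It is only an illustration, not how any particular company does it: the shopper records, the two starting centroids, and the function name `k_means` are all assumptions invented for this example.

```python
from math import dist

def k_means(points, centroids, iterations=10):
    """Assign each point to its nearest centroid, then move each centroid
    to the average of its assigned points, and repeat."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Recompute each centroid; keep the old one if its cluster is empty.
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
            if cluster else old
            for cluster, old in zip(clusters, centroids)
        ]
    return clusters, centroids

# Hypothetical shoppers: (store visits per month, average spend per visit)
shoppers = [(2, 80), (3, 75), (2, 90), (12, 15), (14, 20), (13, 18)]

# Starting centroids are chosen by hand here (an assumption for the sketch).
clusters, centroids = k_means(shoppers, centroids=[shoppers[0], shoppers[3]])

for group in clusters:
    print(group)
```

With this toy data the shoppers fall into two groups: those who visit rarely but spend a lot each time, and those who visit often but spend little. Real data sets are far larger and messier, which is why the choice of method and starting points matters much more in practice.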
A great example of how big data is used in our daily lives is the voice recognition feature found on almost every cell phone. No matter how smart a machine is, it is limited in its ability to accurately infer the meaning, intent, and grammatical relationships of human speech on its own. Apple’s voice recognition technology, Siri, gets around this limit by analyzing the grammatical structure of a user’s commands against repetitive language patterns extracted from a database that organizes and categorizes the vast amount of text floating around the internet, and it uses the results as the grammatical foundation of the program. In other words, the web itself acts as the program’s brain.
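The idea of extracting repetitive language patterns from text can be illustrated with a very small sketch that counts how often pairs of adjacent words appear. This is not Apple's method, which is not described here; the three-sentence corpus and the function name `bigram_counts` are assumptions made up for the example.

```python
from collections import Counter

def bigram_counts(sentences):
    """Count how often each pair of adjacent words occurs in the sentences."""
    counts = Counter()
    for sentence in sentences:
        words = sentence.lower().split()
        # zip pairs each word with the word that follows it
        counts.update(zip(words, words[1:]))
    return counts

# Hypothetical corpus standing in for text gathered from the web
corpus = [
    "call my mother",
    "call my office",
    "text my mother",
]

counts = bigram_counts(corpus)
print(counts[("call", "my")])    # how often "call" is followed by "my"
print(counts[("my", "office")])  # how often "my" is followed by "office"
```

Pairs that recur across many sentences, such as "call my" in this toy corpus, hint at the regular structure of commands. With a far larger body of text, counts like these become one kind of evidence a program can use when guessing what a speaker most likely said.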
In the recent U.S. presidential election, Obama’s personalized, big-data-driven campaign drew wide attention. The Obama campaign went to the trouble of collecting and collating information that characterizes voters, such as credit card and loan information, the type of car they own, the kind of newspaper they subscribe to, and their religion. It analyzed and categorized voters’ tendencies from this information, then delivered messages via social media that catered to their primary interests. For example, a housewife who has children in public schools and recently sent a tweet about organic farming would receive an environmentally themed message from First Lady Michelle Obama via Twitter. This aggressive use of data helped Obama win key battleground states.
Google is also one of the most successful examples of commercializing big data. Google takes a user’s search history, the sites they visit, the content of their emails, and their use of Google Plus, the company’s networking service, to catalog their interests. It analyzes this catalog to deliver the ads most relevant to the user’s interests through Gmail, Google’s mail service. Google Translate is also one of the most accurate programs of its kind, thanks in part to the vast amount of data it has and how it uses it. By analyzing the text of millions of webpages in more than 20 European languages, finding grammatical rules, and using them as a basis for translation, Google Translate enjoys absolute dominance in Europe. The potential for big data is endless. From commercial applications such as restaurant recommendations and shopping mall product lists to criminal profiling for police, preliminary research for government policy decisions, and even team strength analysis in baseball, big data and the methodologies to harness it are driving huge cost savings and opening new possibilities.
As the potential of big data has been recognized, many organizations have been trying to apply it, but surprisingly few seem to have gotten it right. Harper Reed, the former chief technology officer of the Obama campaign, has expressed concern about the overuse of the term “big data,” going so far as to say that “big data is bullshit.” He noted that few of those claiming to be working with big data have enough data to qualify as “big,” and he also pointed to a lack of public understanding of big data, saying, “The word ‘big data’ has come to describe analytics tools, not the data itself.”
As the use of big data grows, so do concerns about personal information being used excessively for corporate gain. It has also been argued that the proliferation of big data could make it easier for the state to control individuals, leading to a “Big Brother” society. According to a study by the Korea Information Society Agency, the average American is captured on camera more than 200 times a day, and many institutions, including insurance companies and banks, already hold our information. Edward Snowden, a former employee of the U.S. National Security Agency, exposed the breadth of government surveillance, stating that “the scope of mass intelligence gathering extends to the general public.” The anxiety and security concerns that come with the advancement of the state’s intelligence gathering capabilities have been a major obstacle to the spread of big data techniques.
In South Korea, big data is still a long way from becoming fully established. In a survey of 240 companies and public institutions in Korea conducted by the Korea Telecommunications Promotion Association, 208 organizations, or 77.1 percent, said they utilize databases, but they scored only 57.1 out of 100 when it comes to the skills to utilize the data, and only 27.0 percent when it comes to using it to make decisions. This is because big data was introduced in Korea only recently, so the scale of data is small compared to developed countries, and the methodology for handling it is not yet technologically mature.
The robot in the movie “Ex Machina” boasts an almost human-like ability to think and speak for itself, and the software that forms the backbone of the robot is the world’s largest search engine. In other words, all the information that people leave on the Internet is itself the robot’s brain. It may seem unrealistic, but it reflects the reality that big data has become an important basis for how modern society thinks and makes decisions. It is up to future generations of data miners to decide what value to extract from this growing mountain of information and how to process it. If used effectively, big data will be able to contribute to humanity and society in almost every field: political, economic, social, and cultural.