Late at night, even when we are exhausted, many of us find ourselves glued to our smartphones. This article explores why this happens and how big data technology shapes our daily lives through personalized content recommendations.
I get home from a long day at work, and it’s already 11 o’clock. It’s been a busy, hectic day, and my mind and body are exhausted. Riding up in the elevator, I look back on the day: a quick conversation with my boss, a joke with a coworker, and a pile of tasks to finish. The routine is always the same, but the emotions are always different. I open the front door and am greeted by silence, a moment of freedom and loneliness that comes with living alone.
I take a quick shower, and the warm water washes away the tiredness of the day. Afterward, I sip a cup of hot tea and gaze out the window. It’s late, and the city lights are going out one by one, but my mind is still racing. I’m in bed and ready to sleep, yet I can’t fall asleep. I know that going to sleep now is the wise choice for tomorrow, but before I know it, my smartphone is in my hand. The glow of the screen lights up the dark room, and my little break at the end of the day begins.
I scroll through social media, and soon the clock strikes midnight. I’m tempted to jot down a few notes about what happened to me during the day, but I’m too exhausted, so I resist. Instead, telling myself “This is the last thing I’m going to watch!”, I check my favorite YouTube channels for new uploads, but alas, there are none today. Disappointed, I turn to the personalized video list on YouTube’s home page and find videos from other channels that are similar to the ones I usually watch. In fact, that is how I discovered the YouTube channel I’ve been watching lately.
But I haven’t watched every video on YouTube, so how does YouTube know which other videos I might like? Is it simply matching the titles or keywords of videos I’ve watched before, or is there a more complex algorithm behind it? Part of the answer lies in one of the major trends in technology: big data analytics. Big data refers to data sets so large that they exceed the limits of the methods traditionally used to collect, store, and manage data. Typically, it is information generated and exchanged through channels such as the Internet and social media, along with sources such as GPS and weather data.
Big data analytics is the process of analyzing these enormous volumes of data, ranging from tens of terabytes to petabytes. But why does all this data matter? The point of big data isn’t simply to have lots of data; it’s to discover meaningful patterns within it and make predictions based on them. For example, businesses can analyze customer behavior patterns to provide personalized services, and governments can identify broad social trends and use them to shape policy. In other words, big data is more than a collection of numbers; it is becoming a powerful tool for predicting the future and solving problems.
The YouTube personalized video list described above is one example of big data analytics. Facebook offers personalized ads, and Amazon.com analyzes customers’ purchase histories to understand their tastes and interests. Big data is used not only in business but also in politics. In the 2008 U.S. presidential election, Barack Obama’s campaign used big data to help him win. Rather than relying only on basic voter demographics, the campaign also researched information related to personal preferences, such as past voting behavior and magazine subscriptions. Other applications include DNA analysis in biotechnology, data-driven business management, and weather data analysis.
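To make the idea of “similar videos” more concrete, here is a minimal sketch of item-based collaborative filtering, one common recommendation technique. It is not YouTube’s actual algorithm, which this article does not describe; the watch histories, video names, and the `recommend` function are hypothetical toy examples. Two videos are treated as similar when largely the same people watched them (measured with cosine similarity), and videos the user hasn’t seen are ranked by how similar they are to the ones the user already watched.

```python
import math

# Hypothetical toy watch histories (user -> videos watched); not real YouTube data.
WATCH_HISTORY = {
    "alice": {"cooking_101", "knife_skills", "travel_japan"},
    "bob": {"cooking_101", "knife_skills", "baking_bread"},
    "carol": {"travel_japan", "travel_korea"},
    "dave": {"cooking_101", "baking_bread"},
}


def viewers_by_video(history):
    """Invert the history: map each video to the set of users who watched it."""
    viewers = {}
    for user, videos in history.items():
        for video in videos:
            viewers.setdefault(video, set()).add(user)
    return viewers


def cosine(a, b):
    """Cosine similarity of two binary 'who watched it' vectors stored as sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def recommend(user, history, top_n=3):
    """Rank videos the user hasn't seen by total similarity to videos they have seen."""
    viewers = viewers_by_video(history)
    seen = history[user]
    scores = {
        video: sum(cosine(watchers, viewers[s]) for s in seen)
        for video, watchers in viewers.items()
        if video not in seen
    }
    # Highest score first; ties broken alphabetically so the output is deterministic.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [video for video, score in ranked[:top_n] if score > 0]


print(recommend("alice", WATCH_HISTORY))  # → ['baking_bread', 'travel_korea']
```

With this toy data, alice is recommended baking_bread ahead of travel_korea because baking_bread shares viewers with two of the videos she watched. A service at YouTube’s scale has to do this kind of computation over vastly more users and interactions, which is where big data infrastructure comes in.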
So how do we analyze such large amounts of data? Most big data analysis techniques build on methods from statistics and computer science, but as the volume of unstructured data has grown with the spread of social networking services (SNS), techniques such as text mining, opinion mining, and social network analysis have come into wide use.
Text mining is a technique, based on natural language processing, that extracts useful information from unstructured written text. The process generally involves five stages: text preprocessing, semantic information conversion, semantic information extraction, pattern and trend analysis, and information representation and evaluation. In the text preprocessing stage, the text is split into words or sentences and cleaned up for the later stages. The semantic information conversion stage separates meaningful data from the preprocessed text. The semantic information extraction stage then simplifies this complex information. Next, the pattern and trend analysis stage looks for patterns in the simplified data, and in the information representation and evaluation stage, the results are presented and evaluated with visualization tools. The resulting information is used to classify or summarize documents.
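The five stages can be sketched as a small Python pipeline. How each stage is implemented here is an illustrative assumption, not a standard definition: preprocessing is regex tokenization, conversion is stopword removal, extraction keeps each document’s most frequent terms, pattern analysis counts how many documents mention each term, and representation prints a text bar chart in place of a visualization tool. The stopword list, function names, and sample posts are hypothetical.

```python
import re
from collections import Counter

# Hypothetical, tiny stopword list for illustration; real systems use much larger lists.
STOPWORDS = {"the", "a", "an", "and", "is", "of", "this", "to", "it", "i", "my"}


def preprocess(text):
    """Stage 1 (text preprocessing): lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def convert(tokens):
    """Stage 2 (semantic information conversion): drop stopwords, keep meaningful words."""
    return [token for token in tokens if token not in STOPWORDS]


def extract(terms, k=3):
    """Stage 3 (semantic information extraction): keep only the k most frequent terms."""
    return [term for term, _ in Counter(terms).most_common(k)]


def analyze(docs_terms):
    """Stage 4 (pattern and trend analysis): count how many documents mention each term."""
    counts = Counter()
    for terms in docs_terms:
        counts.update(list(dict.fromkeys(terms)))  # dedupe within a document, keep order
    return counts


def represent(trends):
    """Stage 5 (representation and evaluation): print a simple text bar chart."""
    for term, count in trends.most_common():
        print(f"{term:<10} {'#' * count}")


# Hypothetical social media posts.
posts = [
    "The battery life of this phone is great.",
    "Battery drains fast and the screen is dim.",
    "Great screen, great battery!",
]
trends = analyze(extract(convert(preprocess(post))) for post in posts)
represent(trends)
```

Here “battery” surfaces as the recurring topic because it survives extraction in all three posts. Keeping only the top k terms is a crude form of simplification; a real pipeline might use TF-IDF weighting, stemming, or language-specific tokenizers instead.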
Opinion mining is a type of text mining used to analyze social media posts and determine whether their authors express a positive or negative preference about a particular topic. Suppose someone posts “I really like the design of the MacBook” on social media. Here, the object being evaluated is the MacBook, the aspect being evaluated is its design, and the opinion expressed, “really like,” is favorable, so the post is classified as a positive evaluation. Typically, each word that expresses emotion is assigned a numerical score, and the post is then categorized according to its total score.
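A minimal lexicon-based scorer illustrates the word-scoring approach described above. The lexicon values, the rule that an intensifier such as “really” multiplies the next word’s score, and the zero threshold between positive and negative are hypothetical choices for illustration; practical systems rely on large curated lexicons or trained models.

```python
import re

# Hypothetical sentiment lexicon (word -> score); the values are illustrative only.
LEXICON = {"like": 1.0, "love": 2.0, "great": 1.5, "dislike": -1.0, "hate": -2.0, "bad": -1.5}
# Hypothetical intensifiers that scale the score of the word right after them.
INTENSIFIERS = {"really": 1.5, "very": 1.5}


def sentiment_score(post):
    """Sum the scores of sentiment words, scaling any word right after an intensifier."""
    total = 0.0
    multiplier = 1.0
    for token in re.findall(r"[a-z']+", post.lower()):
        if token in INTENSIFIERS:
            multiplier = INTENSIFIERS[token]
            continue
        total += multiplier * LEXICON.get(token, 0.0)
        multiplier = 1.0  # an intensifier only affects the word immediately after it
    return total


def classify(post):
    """Label a post by the sign of its total sentiment score."""
    score = sentiment_score(post)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(classify("I really like the design of the MacBook"))  # → positive
```

The example post scores 1.5 (“like” at 1.0, boosted by “really”). This simple approach misses negation (“not great” would still score as positive) and sarcasm, which is one reason practical systems add more rules or use machine learning.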
Social network analysis is a technique for analyzing the connections between individuals or groups on social networks and the strength of those connections. If individuals or groups are treated as nodes and the connections between them as links, the relationships on an SNS can be represented as a graph. In this graph, a simple way to gauge a node’s importance is to count how many other nodes it is directly connected to, a measure known as degree centrality.
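Counting direct connections is easy to sketch in code. The following minimal example, assuming an undirected graph given as a list of friendship links between hypothetical users, computes each node’s degree and picks the most connected one.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Count the distinct nodes each node is directly linked to (its degree)."""
    neighbors = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # a self-loop is not a link to another node
        neighbors[a].add(b)
        neighbors[b].add(a)  # links are undirected: friendship goes both ways
    return {node: len(linked) for node, linked in neighbors.items()}


def most_central(edges):
    """Return the node with the highest degree (alphabetically first on ties)."""
    degrees = degree_centrality(edges)
    return max(sorted(degrees), key=degrees.get)


# Hypothetical friendship links on an SNS.
friendships = [("Ann", "Ben"), ("Ann", "Cho"), ("Ann", "Dan"), ("Ben", "Cho"), ("Dan", "Eve")]
print(degree_centrality(friendships))
print(most_central(friendships))  # → Ann
```

Using sets means a duplicated link is counted only once, and only nodes that appear in at least one link show up in the result. Graph libraries such as NetworkX provide a `degree_centrality` function that also divides each degree by n - 1 (the largest possible degree in a graph of n nodes), so scores fall between 0 and 1.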
In 2020, Facebook revealed that it processes more than 4 petabytes of data per day, and 50 years’ worth of video is uploaded to YouTube in a single day. IBM has estimated that the world produces 2.3 trillion gigabytes of data every day. It’s like a big bang of data, with an overwhelming amount created every day. Big data analytics techniques are useful tools for finding the hidden gems buried within it.