Do statistical representatives in evolutionary studies accurately reflect the complexity and tendency of evolution?


This article explains why statistical representatives such as the mean, median, and mode may not adequately reflect the complexity and tendency of evolution, and concludes that it is inappropriate to set a single representative value when studying evolutionary processes.

 

When scholars conduct research, analyzing data is an essential step. The data may come from experiments or from statistical records, and the study of evolution is no exception. Because evolution takes far too long to observe experimentally, what we have instead are statistical representations of the evolutionary process to date, built from fossils and from the organisms descended from them. If evolutionary data were neatly shaped like a normal distribution, they would be easy to study. But the reality is not so simple: evolution has been, and continues to be, highly unpredictable. Stephen Jay Gould, author of Full House, argues that when studying evolution we should look at the expansion and contraction of variation in the system as a whole, not at the trend of a particular data point. In other words, we should look at the whole, not at changes in a single value. While I don’t readily accept Gould’s assertion that evolution is trendless and random, I strongly agree that no single sample is representative of evolution and that we should look at the system as a whole. On balance, therefore, I side with Gould.
When conducting research, we like to determine representative values for our data, and we tend to assume that values like the mean, median, and mode represent the data as a whole. In evolutionary studies, however, choosing a single representative value is a fatal mistake. Here, an evolutionary representative means an individual or species that most closely approximates the evolutionary trend. Below, I’ll explain why no such representative value should be chosen in evolutionary studies.
First, let’s look at the most commonly used representative value: the mean. In the broad sense, “average” also covers the median and the mode, but here we use the narrow sense: the mean as a single number that describes a statistical population. There are arithmetic, geometric, and harmonic means; we’ll use the arithmetic mean. Its advantage is that every data point contributes to it. The median depends only on the value (or values) in the middle when the data are sorted by size, and the mode depends only on the most frequent value, but the arithmetic mean takes all of the data into account, so we tend to think it is the best representation of the data. I partially agree. However, this very advantage, that “all data is involved,” can be a fatal problem when the mean is used as a representative value in evolutionary theory.
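To make the difference concrete, here is a small Python sketch using the standard-library `statistics` module on a made-up sample (the numbers are hypothetical, chosen only for illustration). Changing a single value moves the mean but leaves the median and mode untouched, which is what “all data is involved” means in practice.

```python
import statistics

# Hypothetical sample values, made up for illustration only.
data = [2, 3, 3, 4, 5]
print(statistics.mean(data), statistics.median(data), statistics.mode(data))
# 3.4 3 3

# Replace only the largest value with an extreme one.
changed = [2, 3, 3, 4, 50]
print(statistics.mean(changed), statistics.median(changed), statistics.mode(changed))
# 12.4 3 3  (only the mean moved)
```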
For example, suppose we want a representative value for the GPA of Harvard University students. Here it would be fine to take a random sample of 100 students and use their average GPA as the representative value. But suppose we want a representative value for the incomes of people around the world, using a sample of, say, 100,000 people. Can we still use the average? If that sample happened to include Bill Gates, his income alone could raise the average of the 100,000 by about $10,000: an income of around $1 billion, divided among 100,000 people, adds roughly $10,000 per person. In statistics, the arithmetic mean stops being representative when it is heavily influenced by extreme values. This is not a problem for GPA, because GPA is confined to a fixed range (0 to 4.3 in this example). Income, on the other hand, starts at 0 but has no upper limit. So while the average works well as a representative value when the data fall within a finite, predictable range, it can be misleading otherwise.
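The Bill Gates example is simple arithmetic, and a short Python sketch makes it concrete. The figures are hypothetical assumptions (99,999 people earning $20,000 each, plus one person earning $1 billion), not real income data.

```python
import statistics

# Hypothetical figures: 99,999 people with a typical income, plus one outlier.
typical_income = 20_000          # assumed typical annual income (USD)
outlier_income = 1_000_000_000   # assumed income of one extremely rich person

sample = [typical_income] * 99_999
mean_before = statistics.mean(sample)

sample.append(outlier_income)    # the sample now has 100,000 people
mean_after = statistics.mean(sample)

print(round(mean_after - mean_before, 1))  # 9999.8: one person shifts the mean by ~$10,000
print(statistics.median(sample))           # 20000.0: the median does not move
```

The median barely notices the outlier, which is why it comes up next as a possible alternative.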
Now consider evolutionary studies. The evolution of organisms is even less predictable than the incomes of people around the world, and it cannot be confined to a fixed range. Does it make sense, then, to use the average as a representative value in evolutionary studies? I don’t think so. In evolutionary data, the average is likely to be skewed by extreme cases, and it is inappropriate to base evolutionary studies on such a skewed value. An average is meaningful as a representative only when it is not dominated by extreme values. For example, dinosaurs are often cited as representative of the Mesozoic era, especially the Cretaceous period, yet at no point in Earth’s history have vertebrates outnumbered invertebrates, either in number of individuals or in number of species. What we picture as typical of evolution is actually the result of special cases.
Second, let’s look at the median. The median is the value at the center of the data when they are sorted by size: half of the data lie above it and half below. With an odd number of values, the median is simply the middle one; with an even number, it is usually taken as the average of the two middle values. When the data are grouped into classes in a frequency table, the individual values are no longer visible, so the median is estimated by finding the point where the cumulative frequency reaches 50% and interpolating within that class. Because the median is much less sensitive to extreme values than the mean, it may seem a better representative for evolutionary studies. However, the median has problems of its own.
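The cumulative-50% approach for grouped data is usually written as median ≈ L + ((N/2 − F) / f) × h, where L is the lower boundary of the class containing the median, N the total count, F the cumulative count before that class, f the count inside it, and h the class width. Here is a minimal Python sketch of that formula; the function name `grouped_median` and the frequency table are hypothetical, and linear interpolation assumes values are spread evenly within each class.

```python
def grouped_median(edges, counts):
    """Estimate the median of grouped data by linear interpolation.

    edges  -- class boundaries, e.g. [0, 10, 20, 30] for [0,10), [10,20), [20,30)
    counts -- frequency of each class (one fewer entry than edges)
    """
    if len(edges) != len(counts) + 1:
        raise ValueError("need exactly one more edge than counts")
    half = sum(counts) / 2          # N/2: the 50% point of the cumulative count
    cumulative = 0                  # F: observations in earlier classes
    for i, count in enumerate(counts):
        if count > 0 and cumulative + count >= half:
            lower = edges[i]                    # L
            width = edges[i + 1] - edges[i]     # h
            # Move into the class in proportion to how many of its
            # observations are needed to reach the 50% point.
            return lower + (half - cumulative) / count * width
        cumulative += count
    raise ValueError("counts must contain at least one positive value")

# Hypothetical table: 5 values in [0,10), 8 in [10,20), 7 in [20,30).
print(grouped_median([0, 10, 20, 30], [5, 8, 7]))  # 16.25
```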
To illustrate this, consider the concepts of “walls” and “tails.” A wall is a limit the data cannot cross: when we graph a set of data and no value can lie to the left of a certain point, that point is called the left wall. A tail is the opposite of a wall: on that side the data are unbounded and can, in principle, extend indefinitely to the left or right. Extreme values, however, are so rare that the curve keeps thinning out toward them, like a tail that gets smaller and smaller. In the study of worldwide incomes mentioned above, zero is the left wall, because no one earns less than zero. Income has no upper limit, so its graph has a right tail.
Using the median to represent the incomes of a small group of people is probably fine. In evolutionary theory, however, is it right to assume a trend in evolution and choose the median as its representative? No. Think of organisms evolving toward more complex or simpler structures: nothing can get simpler than the basic structure needed to sustain life, so there is a “wall” in the data. If the x-axis represents the structural complexity of organisms, there is a wall on the left and a tail on the right. Data with a wall on one side and a tail on the other tend to produce a curve skewed to one side rather than a symmetric, normal curve. On a curve that lacks this symmetry, it makes little sense to use the median as a representative.
On a skewed curve, the median lies neither at the peak of the curve (the most frequent value) nor at the midpoint of its range. It is unreasonable to treat a value that lacks these representative properties as representative of evolution.
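The wall-and-tail argument can be illustrated with a toy random walk. This is our own simplified model of the left-wall idea, not Gould’s exact formulation: the function `simulate_complexity`, its parameters, and the integer “complexity” scale are all hypothetical. Each lineage takes unbiased steps up or down in complexity, but can never drop below the wall.

```python
import random
import statistics

def simulate_complexity(n_lineages=20_000, n_steps=50, wall=1, seed=0):
    """Toy model (hypothetical): each lineage's 'complexity' takes unbiased
    +1/-1 steps but can never fall below `wall`, the assumed minimum
    structure needed to sustain life."""
    rng = random.Random(seed)
    final_levels = []
    for _ in range(n_lineages):
        level = wall  # every lineage starts at the wall
        for _ in range(n_steps):
            step = 1 if rng.random() < 0.5 else -1
            level = max(wall, level + step)  # the wall blocks downward moves
        final_levels.append(level)
    return final_levels

levels = simulate_complexity()
mean = statistics.mean(levels)
median = statistics.median(levels)
mode = statistics.mode(levels)
print(f"mean={mean:.2f}  median={median}  mode={mode}")
```

With these settings the three values should come out in the order mode < median < mean: the most common lineages stay close to the wall, the right tail pulls the mean upward, and the median falls in between. None of them marks both the center and the peak, even though no individual step is biased toward complexity.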
Finally, let’s look at the mode, the most frequent value. If a set of data clusters around a clear tendency, the mode might be considered a good representative of the data. In the study of evolution, however, it is not appropriate. Suppose there is a trend in evolution. Does the existence of a trend mean that most organisms share similar characteristics? I don’t think so, because evolution is a slow process that unfolds over a very long time, and it does not happen easily. So are most organisms close to the trend, or far from it? We cannot know without knowing how far life on Earth has progressed along it. The mode may represent not life that has successfully evolved and clustered around the trend, but life that has not yet evolved.
Some might argue that, given how long life has existed, most life has already succeeded in evolving. However, the organisms that have lived on Earth did not all exist at the same time. And even if they had, we could not say that they all evolved under the same conditions; there are many cases in which the same species evolved in different environments. Taking all this into account, can we treat the mode as representative of evolution? I don’t think so: we cannot assume that organisms close to the trend outnumber those that are not, so even if evolution has a trend, it is inappropriate to take the mode as its representative.
So far, we’ve discussed why the mean, median, and mode cannot serve as representative values in evolutionary studies. Evolutionary data are very different from the data we are used to. No single data point can be said to represent evolution, and it is pointless to try to pin down a single tendency in evolutionary processes. It is therefore impossible to set a representative value for evolution: evolution cannot be studied through a single representative.

 

About the author

Blogger

Hello! Welcome to Polyglottist. This blog is for anyone who loves Korean culture, whether it's K-pop, Korean movies, dramas, travel, or anything else. Let's explore and enjoy Korean culture together!
