In its latest showcase, technology giant Nvidia unveiled a new product called Omniverse, a “platform for creating and operating metaverse applications.” The demos shown were nothing short of impressive, with computers capturing data from cameras and creating 3D simulations of streets and environments. The promise is that this technology could revolutionize the self-driving industry.
There is a reason why we haven’t quite cracked self-driving cars yet. We are getting there, but there are still some kinks to iron out of the system. That reason is called reality.
I’m often reminded of the words of philosopher David Hume, who once posited that the universe has no particular order; rather, it is us humans who create an illusion of order to make sense of the world. Maybe he’s right, maybe not, but we can all agree that reality can be unpredictable, to say the least.
Math is probably the most powerful tool in our arsenal as human beings, engineers, and scientists. With math, we have been able to model everything from voyages to the Moon to human behavior. But there is a limit to what we can achieve, and it’s not because of a problem with math itself; it’s the nature of the world as a complex system.
The Multivariate Phenomenon
All the science points to a very clear conclusion: climate change is happening. We know that this change is related to pollution, but every single time someone has predicted the point of no return or the complete melting of the ice caps, they’ve been mostly wrong.
So, certainly, there must be a problem with science, right? Maybe there is no climate change? Well, not quite. As the saying goes, correlation does not equal causation, and what that means is that we may see a relationship between two phenomena (like pollution and global warming) but that doesn’t mean we can fully explain the interaction or predict the outcome.
Causality is rarely linear. In truth, most phenomena can be attributed to a series of causes. For example, not everyone who smokes develops lung cancer, but we know that people who smoke have a higher chance of developing the disease. So there is a causal relation there. It’s just that we don’t have all the pieces of the puzzle to say: “Well, if you smoke and X, Y, and Z happen, then you will get sick.”
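The smoking example can be sketched as a tiny simulation. This is a hypothetical illustration with made-up numbers (a 30% smoking rate, a 15x risk multiplier), not epidemiological data; the point is only that a real causal link can still be probabilistic.

```python
import random

random.seed(42)

def simulate(n=100_000):
    """Simulate a population where smoking multiplies disease risk
    but never guarantees the disease."""
    smokers = smoker_sick = nonsmoker_sick = 0
    for _ in range(n):
        smokes = random.random() < 0.3          # assume 30% of people smoke
        base_risk = 0.01                        # assumed baseline disease risk
        risk = base_risk * 15 if smokes else base_risk  # assumed 15x risk for smokers
        sick = random.random() < risk
        if smokes:
            smokers += 1
            smoker_sick += sick
        else:
            nonsmoker_sick += sick
    return smoker_sick / smokers, nonsmoker_sick / (n - smokers)

smoker_rate, nonsmoker_rate = simulate()
# Smokers get sick far more often than non-smokers, yet most smokers
# never get sick at all: the causal relation is real but probabilistic.
```

Running it, the smoker rate lands around 15% versus about 1% for non-smokers: a strong causal signal, even though the overwhelming majority of simulated smokers stay healthy.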
A simulation can only recreate the world as it’s filtered by our capabilities. If I feed data to a system, the computer isn’t going to stop mid-run and tell me, “Hey, you are missing some variables, so I can’t run the simulation.” It will just output whatever is the natural consequence of the underlying model.
So the first problem we have to tackle when simulating complex systems is accurately assessing which variables are the most significant. For example, acculturation, the process by which our thoughts are influenced by the media we consume, explains only about 6% of the variance in attitude change. Get rid of all the TVs in your house and a whopping 94% of the variance would still be unaccounted for.
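What “explains 6% of the variance” means can be made concrete with synthetic data. In this sketch (entirely made-up numbers: a weak 0.25 effect buried in unit-variance noise), the squared correlation between exposure and attitude — the R² of a simple linear fit — comes out around 6%.

```python
import random

random.seed(0)

# Synthetic data: attitudes are mostly noise, with a weak contribution
# from media exposure (assumed slope 0.25, noise std dev 1).
n = 10_000
exposure = [random.gauss(0, 1) for _ in range(n)]
attitude = [0.25 * x + random.gauss(0, 1) for x in exposure]

mean_e = sum(exposure) / n
mean_a = sum(attitude) / n
cov = sum((e - mean_e) * (a - mean_a) for e, a in zip(exposure, attitude)) / n
var_e = sum((e - mean_e) ** 2 for e in exposure) / n
var_a = sum((a - mean_a) ** 2 for a in attitude) / n

# R^2 of a one-variable linear model equals the squared Pearson correlation.
r_squared = cov ** 2 / (var_e * var_a)
print(f"Variance explained by exposure: {r_squared:.0%}")  # roughly 6%
```

The remaining ~94% of the variance comes from everything the model doesn’t see — which is exactly the situation a simulation of a complex system finds itself in.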
Is There Enough Data?
COVID was another great example of how simulations can fail dramatically. During the first few weeks of the pandemic, some simulations estimated a death rate of at least 10%. One out of every ten infected people was going to die. Fortunately, that wasn’t the case, but why did the scientists miss the mark by such a wide margin?
Because at the time we barely had any data about the disease, and whatever data we had was heavily biased. Sampling came from hospitals, which most people were avoiding in the first place because they were overrun. So we oversampled people who had lung complications and were at risk.
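The effect of that sampling bias is easy to reproduce. The numbers below are invented for illustration (a true fatality rate near 1%, with only severe cases reaching hospitals), not actual COVID statistics; the point is how far an estimate drifts when the sample isn’t representative.

```python
import random

random.seed(7)

# Hypothetical population: ~15% of infections are severe enough to reach
# a hospital, and severe cases are far more likely to be fatal.
n = 100_000
true_deaths = hospital_cases = hospital_deaths = 0
for _ in range(n):
    severe = random.random() < 0.15           # assumed share of severe cases
    fatality = 0.06 if severe else 0.0012     # assumed per-group fatality rates
    died = random.random() < fatality
    true_deaths += died
    if severe:                                # only severe cases get sampled
        hospital_cases += 1
        hospital_deaths += died

true_rate = true_deaths / n                   # overall rate, close to 1%
biased_rate = hospital_deaths / hospital_cases  # hospital-only estimate, far higher
```

Sampling only hospital patients yields a fatality estimate several times the true population rate — the same mechanism that inflated the early pandemic projections.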
I will never grow tired of saying it: a model is only as good as the data it’s based on. For complex systems, there are a myriad of issues we have to tackle: unstructured data, biased samples, missing data, and uncalibrated equipment are just some of the most common.
Grab three different sociological studies about happiness and you’ll find massive differences in the results. That’s to be expected considering that sometimes different sources provide different information. For example, NPOs often criticize governments for sharing incomplete or skewed data.
Other times the problem isn’t with the data gathering, but that the data isn’t there. No one could have simulated a global pandemic, plus an accident in the Suez Canal, plus a surge in cryptocurrency, plus one of the worst droughts in recent history in Taiwan — all factors that played into the massive chip shortages and surge in computer prices we experienced.
At least we have the data now, but what are the odds of these events happening again at the same time?
Are Simulations Useless?
I know I’m painting a very bleak picture, but the truth is that simulations are extremely important. For hundreds of years, we have simulated situations via experiments, equations, and computer models to try to understand how the world works. And for every fluke, we’ve also managed to learn a little bit more.
This is more of a cautionary tale, a reminder that a simulation is neither witchcraft nor séance but a simple artificial recreation that plays the script it has been given. We should never take the results of a simulation at face value. As any data scientist knows, we always have to look beyond the results and see how we reached them. The method is even more important than the outcome.
Having said that, the future is bright for simulations. With the Internet of Things and Big Data, we have grown exponentially in our ability to gather data, opening the door to all manner of simulations that we thought impossible a few years ago: warehouses, deliveries, market trends, political action.
Excluding the simplest of situations, simulations will never be devoid of error, but we can keep working on minimizing that margin of error. Will we ever be able to accurately simulate complex systems? Of course. It’s just a matter of time.