Fundamental Science: Artificial intelligence and the difference between making and using

by Edgard Pimentel

Whether it’s a shirt that monitors the risk of a heart attack, a potato masher, or even a screwdriver, every tool has two sides: one concerns the skills to use it, the other the skills to create it. The same applies to artificial intelligence: there are many ways to use it. But on the side of creating and improving it, the protagonist is clear: we are talking about mathematics.

Let’s start with a very simple task, to understand how mathematics makes artificial intelligence possible: assigning labels to objects. In healthcare, for example, it is essential to distinguish images that indicate a healthy patient from those that suggest some abnormality. That is, the task is to label each image as healthy or unhealthy.

Another example concerns web browsing, when we want to classify the pages a user visits. Personalized ads, for instance, require understanding which products interest a particular person (books, shoes, toys, airline tickets). To that end, it is useful to label the sites the user visits.

But here a fundamental question arises. The volume of data to be classified is gigantic, and human labor is tremendously expensive, when it is feasible at all. The solution? Just teach the task to a computer. Simple, no? But there is one small detail: how does a computer learn to label? This difficult question occupies an important place, both in the scientific landscape and in conversations with my friend and collaborator José Miguel Urbano as we walk across the campus of the University of Coimbra.

Semi-supervised learning is one of the strategies. Some images, with their respective labels, are shown to the computer. From this representative fraction, a rule is created that the machine then applies on its own to the images without a label. But again: how do we create this rule? One answer lies in partial differential equations (which we will call PDEs).
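Before we get to the equations, here is roughly what this idea looks like in practice. The sketch below is mine, not the author’s: it uses scikit-learn’s LabelSpreading, a graph-based method from the same family of ideas, on a toy dataset; the data, the kernel and the parameters are all illustrative choices.

```python
# Minimal sketch of semi-supervised learning: a few labeled points,
# many unlabeled ones, and a rule propagated over a similarity graph.
import numpy as np
from sklearn.datasets import make_moons
from sklearn.semi_supervised import LabelSpreading

# A toy dataset: 200 points in the plane, each truly belonging to one of two classes.
X, y_true = make_moons(n_samples=200, noise=0.1, random_state=0)

# Pretend we only know the labels of 10 points; -1 marks "unlabeled".
y = np.full_like(y_true, -1)
rng = np.random.default_rng(0)
labeled = rng.choice(len(X), size=10, replace=False)
y[labeled] = y_true[labeled]

# Build a nearest-neighbor graph over all points and spread the few known
# labels to the rest -- a discrete cousin of solving an equation with the
# labeled points acting as boundary data.
model = LabelSpreading(kernel="knn", n_neighbors=7)
model.fit(X, y)

predicted = model.transduction_           # labels inferred for every point
accuracy = (predicted == y_true).mean()   # how well the rule extends
print(f"labeled points: {len(labeled)}, accuracy on all 200 points: {accuracy:.2f}")
```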

Let’s think about the problem in other terms. An image, or a website, is a point; the rule we are looking for is a mathematical object that assigns a label to each point. And we know how this rule behaves only on a very small set of points. In mathematics, rules that relate two sets by assigning to each element of one set a single element of the other are called functions.

That is, we want to extend the rule (the function) from the known points to the unknown ones. There are many mechanisms for this. Let’s focus on one that is very popular in mathematics: the cheapest way. The idea is to consider a mathematical object that computes the price of each candidate rule, and then choose the rule that minimizes this cost.
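For readers who enjoy symbols, the setup can be sketched as follows (the notation here is mine, not the author’s): write $X$ for the set of all points, $X_0 \subset X$ for the small labeled portion, and $\ell$ for the known labels. We look for

\[
  f^{\ast} \;=\; \operatorname*{arg\,min}_{f} \Big\{\, J(f) \;:\; f(x) = \ell(x) \ \text{for every } x \in X_0 \,\Big\},
\]

where $J$ is the chosen “price” of a rule, and $f^{\ast}$ is then applied to every point of $X$.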

In the simplest case, the cost to be minimized is known as the Dirichlet energy, named after the German mathematician Peter Gustav Lejeune Dirichlet (1805-59). In this scenario, we obtain the Laplace equation, which celebrates the French scientist Pierre-Simon Laplace (1749-1827). After so much homage, it is artificial intelligence’s turn: the solution of the Laplace equation that takes the (few) known labels into account is precisely a learning rule of the kind we seek. In other words, teaching the computer involves solving a partial differential equation.
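In formulas, and in the continuum version found in PDE textbooks (the text itself does not spell these out), the Dirichlet energy of a candidate rule $u$ is

\[
  J(u) \;=\; \frac{1}{2} \int |\nabla u(x)|^{2}\, dx ,
\]

and its minimizers, subject to matching the known labels, solve the Laplace equation

\[
  \Delta u \;=\; \frac{\partial^{2} u}{\partial x_1^{2}} + \cdots + \frac{\partial^{2} u}{\partial x_n^{2}} \;=\; 0 ,
\]

with the labeled points playing the role of boundary data. In practice one works with a discrete, graph-based analogue of these formulas.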

But this is just one possible rule, and experience shows it may not be the best one. When the cost in question is the Dirichlet energy, the learning function we obtain can end up choosing a single label and assigning it to all points, making no distinction between them.

One way to refine the method is to change the cost slightly. In mathematical terms, an exponent that appears in the energy is changed: instead of raising something to the power two, we raise it to another power. The PDE resulting from the process is then no longer the Laplace equation but the p-Laplace equation. It looks like a minor change, but deep down it gives rise to another universe. And its solution offers another rule for teaching computers to think.
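Concretely, the exponent $2$ in the Dirichlet energy is replaced by some $p$ (the precise value is a modeling choice, not fixed in the text):

\[
  J_p(u) \;=\; \frac{1}{p} \int |\nabla u(x)|^{p}\, dx , \qquad 2 \le p < \infty ,
\]

whose minimizers solve the p-Laplace equation

\[
  \Delta_p u \;:=\; \operatorname{div}\!\big( |\nabla u|^{\,p-2}\, \nabla u \big) \;=\; 0 .
\]

For $p = 2$ we recover the Laplace equation.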

It is reasonable to expect the rule we obtain to improve as this exponent increases, almost like turning up the volume knob on a stereo. And since the largest possible exponent would simply be infinity, we want to know whether there is a PDE that handles this case. The answer is yes (mathematics really is fun!), and the equation in question is called the infinity Laplacian.
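In its standard form, that equation reads

\[
  \Delta_{\infty} u \;:=\; \sum_{i,j=1}^{n} \frac{\partial u}{\partial x_i}\, \frac{\partial u}{\partial x_j}\, \frac{\partial^{2} u}{\partial x_i \partial x_j} \;=\; 0 ,
\]

and it arises, loosely speaking, as the limit of the p-Laplace equations as $p \to \infty$.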

Many important questions about these equations remain unanswered, and the mathematical community is at work on each of these fronts. Because, after all, making artificial intelligence available on computers depends on knowing how to use natural intelligence, which is available everywhere.

*

Edgard Pimentel is a researcher at the Center for Mathematics at the University of Coimbra and a professor at PUC-Rio.

Subscribe to the Serrapilheira newsletter to keep up with more news from the institute and from the Ciência Fundamental blog.
