September 09, 2022

Meaning boxes. This is what programmers and engineers usually call the marked and tagged areas in the images used to train so-called computer vision. In the official Visual Genome project article, the authors call this process of selecting specific regions of images “canonicalization”. This interests me, as it creates relationships with the History of Art itself and also with the idea that we are talking about a group of believers dealing with materials that have been beatified.

In 2021, my friend, the programmer Bernardo Fontes, and I thought of a way to “decanonize” the same training images that made Visual Genome possible. With the fundamental help of the philosopher Caroline Carrion, we wrote an essay for Rosa magazine about what this experiment can tell us about AI and its way of “seeing”. Read the full text here (in Portuguese).

Some excerpts (free English translation):
“When programmer Bernardo Fontes proposed a programming experiment that would invert the game of computer vision, we immediately thought of reverse engineering processes. But we also thought of the image of an eye capable of turning 180 degrees, so that it no longer sees what is in front of it, but the interior of the structure that makes the subject see.

The Python code used for the images in this visual essay identifies the labeled areas of each image: small bounded regions that engineers and programmers call “meaning boxes”. Once this identification is made, the command created by Fontes does the opposite of what a commercial AI would do: instead of highlighting these nobler regions of the images, it erases them completely.

The results are the leftovers, the parts of the images that are not important to computer vision: the carcasses, which can also be understood as the regions where the images received no human work (that is, where the turkers did no tagging). These carcasses may not even be useful for those who just want to understand what goes on in the scenes. But, by excluding what is considered important and highlighting everything that was disregarded, this experiment helps us to understand something beyond the specific situations portrayed there.

Computer vision is much less the complex process of seeing and its multiple mediations, and much more the process of extracting, segmenting and decontextualizing. In the images of this essay, we see what professionals involved in the process of creating scientific knowledge — from programmers to turkers and computer engineers — consider spare, excessive, unnecessary.”
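For readers curious about the mechanics, the erasure described in the excerpt can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code Fontes wrote: the function name erase_regions is hypothetical, the image is a toy grid of numbers rather than a real photograph, and each meaning box is assumed to be a dictionary with x, y, width and height keys, in the style of Visual Genome region annotations.

```python
from typing import Dict, List

Image = List[List[int]]  # toy stand-in for a photograph: rows of pixel values


def erase_regions(image: Image, regions: List[Dict[str, int]], fill: int = 0) -> Image:
    """Return a copy of `image` with every annotated box painted over.

    Hypothetical helper: each region is assumed to carry "x", "y",
    "width" and "height" keys describing one "meaning box".
    """
    height = len(image)
    width = len(image[0]) if height else 0
    carcass = [row[:] for row in image]  # leave the original untouched

    for region in regions:
        # Clip the box to the image so partly out-of-frame boxes are safe.
        left = max(0, region["x"])
        top = max(0, region["y"])
        right = min(width, region["x"] + region["width"])
        bottom = min(height, region["y"] + region["height"])

        # Erase exactly what the annotators marked as meaningful.
        for row in range(top, bottom):
            for col in range(left, right):
                carcass[row][col] = fill

    return carcass


# A 6x4 "image" where every pixel is 1, with two illustrative boxes.
image = [[1] * 6 for _ in range(4)]
regions = [
    {"x": 1, "y": 1, "width": 2, "height": 2},
    {"x": 4, "y": 0, "width": 5, "height": 1},  # runs past the right edge
]
carcass = erase_regions(image, regions)
for row in carcass:
    print(row)
```

What survives in carcass is only the area that no box covered. On a real photograph the same loop would run over pixel coordinates, with an imaging library taking the place of the nested lists.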

False Mirror/René Magritte/meaning box.