I'm sitting in AI

Alvin Lucier’s 1969 work ‘I Am Sitting In A Room’ is a classic of media art. Lucier recorded himself reading a text, then played it back and recorded it again. Then he played back the recording of the recording and recorded that again. After a while his words were hardly decipherable, as the resonance of the room, multiplied again and again, took over.

AI-based super resolution algorithms try to upscale an image in a way that seems most natural. That includes generating details that were not present in the original image. There are quite a few super resolution algorithms available, from the open source ESRGAN, SwinIR and BSRGAN to those included in commercial products such as Topaz Labs’ Gigapixel AI, Photoshop and Pixelmator. They are trained on different data and work in slightly different ways. But they all need to ‘dream’, to infer some image details where there were none before. But what are those details? Are they different for each model?

In order to see that, I let each AI model reinforce its idiosyncrasies, again and again, until its resonance became visible, just as the resonance of the room was reinforced in Alvin Lucier’s work. I let each super resolution algorithm upscale an image, again and again, zooming in slightly (4-15%) between the steps. After a while, the AI begins to upscale its own artifacts, with some algorithms trapping themselves in a solid color, while others continue to generate imaginary details indefinitely.
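How long "a while" is can be estimated with a little arithmetic. The sketch below is my own rough model, not part of the original experiment: it assumes that after n steps the cumulative magnification is (1 + zoom)^n, and that once this exceeds the image width in pixels, the whole frame corresponds to less than one pixel of the starting photograph, so everything visible has been inferred by the model. The function name and the 1000-pixel width are hypothetical.

```python
import math


def frames_until_all_invented(width_px: int, zoom_per_frame: float) -> int:
    """Smallest n for which (1 + zoom_per_frame) ** n >= width_px.

    Rough estimate only: it ignores how far each model looks around a pixel,
    so artifacts usually dominate well before this point.
    """
    if width_px < 1 or zoom_per_frame <= 0:
        raise ValueError("width_px must be >= 1 and zoom_per_frame must be > 0")
    return math.ceil(math.log(width_px) / math.log(1 + zoom_per_frame))


if __name__ == "__main__":
    # Hypothetical 1000-pixel-wide starting image at the two ends of the 4-15% range.
    for zoom in (0.04, 0.15):
        frames = frames_until_all_invented(1000, zoom)
        print(f"{zoom:.0%} zoom per frame: about {frames} frames")
```

Under these assumptions, a 15% zoom leaves nothing of the original after about 50 frames, and a 4% zoom after about 177.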

I selected a photograph with a lot of details as a starting point.

This will be our starting image.

Zooming in by a typical non-AI algorithm would look something like this:
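In code, such a conventional zoom amounts to plain interpolation. The sketch below assumes bilinear interpolation on a grayscale image stored as a list of rows; the original does not say which non-AI algorithm was used, and `bilinear_zoom` is a hypothetical helper. The point it illustrates is that every output pixel is a weighted average of existing neighbors, so repeated zooming can only blur and never adds new detail.

```python
def bilinear_zoom(pixels, zoom=1.15):
    """Zoom into the center of a grayscale image, keeping its size.

    `pixels` is a list of rows of numbers, at least 2x2; `zoom` must be >= 1.
    """
    height, width = len(pixels), len(pixels[0])
    if height < 2 or width < 2 or zoom < 1:
        raise ValueError("need at least a 2x2 image and zoom >= 1")

    # The visible window shrinks by the zoom factor and stays centered.
    span_x = (width - 1) / zoom
    span_y = (height - 1) / zoom
    left = (width - 1 - span_x) / 2
    top = (height - 1 - span_y) / 2

    result = []
    for row in range(height):
        y = top + span_y * row / (height - 1)
        y0 = min(int(y), height - 2)  # upper neighbor row
        fy = y - y0                   # vertical blend weight
        out_row = []
        for col in range(width):
            x = left + span_x * col / (width - 1)
            x0 = min(int(x), width - 2)  # left neighbor column
            fx = x - x0                  # horizontal blend weight
            upper = pixels[y0][x0] * (1 - fx) + pixels[y0][x0 + 1] * fx
            lower = pixels[y0 + 1][x0] * (1 - fx) + pixels[y0 + 1][x0 + 1] * fx
            out_row.append(upper * (1 - fy) + lower * fy)
        result.append(out_row)
    return result


if __name__ == "__main__":
    # Tiny 4x4 test pattern; repeated zooming only ever blends these values.
    image = [[0, 50, 100, 150], [10, 60, 110, 160],
             [20, 70, 120, 170], [30, 80, 130, 180]]
    for _ in range(5):
        image = bilinear_zoom(image, zoom=1.15)
    print([round(v, 1) for v in image[0]])
```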

Now let’s see what kind of artifacts are generated by the AI-based upscaling algorithms.

Pixelmator, 15% zoom per frame.

Pixelmator upscales an image x3 instead of the more common x2 or x4. It is also special in that it keeps generating the dust-like detail seen in the video and could go on indefinitely.

Photoshop, 8% zoom per frame.
Photoshop generates these crystalline-looking high-contrast structures, until it gets stuck in a solid color.

Photoshop, 11.5% zoom per frame.

With some settings or starting images Photoshop gets stuck in a texture and continues to generate it until it converges to black or white.

Gigapixel AI, 13% zoom per frame, standard setting. Here we can observe how at some point it seems to generate the texture of water, as seen from above, out of a solid blue color. It doesn’t happen with other solid colors.

Gigapixel AI, 14% per frame. Gigapixel AI is special as it offers multiple models trained on different kinds of images. This one was trained specifically for upscaling low quality images.

Gigapixel AI, 5% per frame. This one was upscaled using a model trained on CG art images.

SwinIR increases the contrast of the image excessively, and ends up (in this case of 4% zoom per frame) with a sequence that repeats indefinitely.

ESRGAN is pretty violent in what it does to images, darkening the image in the first few steps, then removing the details while emphasizing the edges. 6% zoom per frame.

But it also contains an option to include GFPGAN to upscale faces better. Here I’ll try using it on Ilya Repin’s masterpiece.

Cossacks write a letter to the Turkish sultan, Ilya Repin, 1891.

ESRGAN with 4% zoom per frame. The option to include GFPGAN to upscale faces is enabled. In reality that turns out to be quite a destructive option, replacing the original faces with similar but different ones.

Here is the same image with the same 4% setting using ESRGAN, but without GFPGAN.

Similar to ESRGAN, BSRGAN overemphasizes the edges, but does not darken the image, filling it with colorful gradients instead.

Technical details and code

The image is upscaled by whatever factor the particular model is trained for (2x, 3x or 4x). Then the result is downscaled to around 104%-115% of the original image size (whichever produces the more visible effect) and cropped to 100% of the original size, thus generating the starting image for the next cycle.
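The sizes involved in one such cycle can be worked out as follows. This is a minimal sketch rather than the code used for the videos: the names `CycleGeometry` and `cycle_geometry` are hypothetical, and the crop is assumed to be centered, which the description above does not state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CycleGeometry:
    upscaled_size: tuple  # (width, height) after the model's 2x/3x/4x pass
    resized_size: tuple   # (width, height) after downscaling to (1 + zoom)
    crop_box: tuple       # (left, top, right, bottom) of the next frame


def cycle_geometry(width, height, model_factor, zoom):
    """Sizes for one upscale, downscale and crop cycle.

    `zoom` is the per-frame zoom as a fraction, e.g. 0.04 for 4%.
    Assumption: the crop is centered in the downscaled image.
    """
    if zoom <= 0 or model_factor < 1 + zoom:
        raise ValueError("need 0 < zoom and model_factor >= 1 + zoom")

    upscaled = (width * model_factor, height * model_factor)
    resized_w = round(width * (1 + zoom))
    resized_h = round(height * (1 + zoom))

    # Cut a window of the original size out of the middle of the resized image.
    left = (resized_w - width) // 2
    top = (resized_h - height) // 2
    crop_box = (left, top, left + width, top + height)
    return CycleGeometry(upscaled, (resized_w, resized_h), crop_box)


if __name__ == "__main__":
    # Hypothetical 1000x800 frame, a 4x model and 4% zoom per frame.
    print(cycle_geometry(1000, 800, model_factor=4, zoom=0.04))
```

With an imaging library such as Pillow, `resized_size` would be passed to `Image.resize` and `crop_box` to `Image.crop`; the upscaling step itself depends on the model or application being tested.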

code tbd.