Alvin Lucier’s 1969 work “I Am Sitting in a Room” is a classic of media art. Lucier recorded himself reading a text, then played the recording back and recorded it again. Then he played the recording of the recording and recorded that. After a while, his words became almost indecipherable as the resonance of the room, multiplied again and again, took over.
AI-based super-resolution algorithms try to upscale an image in a way that looks most natural. That includes generating details that were not present in the original image. A number of super-resolution algorithms are available, from the open-source ESRGAN, SwinIR and BSRGAN to the upscalers built into commercial products such as Topaz Labs’ Gigapixel AI, Photoshop and Pixelmator. They are trained on different data and operate in slightly different ways, but they all have to ‘dream’ in order to infer image details where there were none before. What are these details? Are they different for every model?
To find out, I let each AI model amplify its own idiosyncrasies, again and again, until its resonance became visible, just as the resonance of the room was amplified in Alvin Lucier’s work. I let each of the super-resolution algorithms upscale an image over and over, zooming in slightly (4-15%) between the steps. After a while, the AI begins to upscale its own artifacts: some algorithms become trapped in a single color, while others continue to generate imaginary details indefinitely.
I chose a photograph with a lot of detail as a starting point.
Zooming in with a typical non-AI algorithm would look something like this:
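Such a conventional zoom is nothing more than repeated cropping and resampling. Here is a minimal sketch in Python with Pillow; the 8% zoom per frame, frame count and file names are example values, not the exact settings used here:

```python
# Baseline zoom: crop the center and resize back up with bicubic resampling.
import os
from PIL import Image

def bicubic_zoom(img: Image.Image, zoom: float = 1.08) -> Image.Image:
    w, h = img.size
    # Crop the central (1 / zoom) portion of the frame...
    cw, ch = int(w / zoom), int(h / zoom)
    left, top = (w - cw) // 2, (h - ch) // 2
    crop = img.crop((left, top, left + cw, top + ch))
    # ...and resize it back to the original size.
    return crop.resize((w, h), Image.BICUBIC)

os.makedirs("frames", exist_ok=True)
frame = Image.open("start.jpg")  # the detailed starting photograph
for i in range(100):
    frame = bicubic_zoom(frame)
    frame.save(f"frames/bicubic_{i:04d}.png")
```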
Now let’s see what kind of artifacts are generated by the AI-based upscaling algorithms.
Pixelmator, 15% zoom per frame.
Pixelmator upscales an image x3 instead of the more common x2 or x4. It is also special in that it keeps generating this kind of dust-like detail throughout the video and could seemingly go on indefinitely.
Photoshop, 8% zoom per frame.
Photoshop generates these crystalline-looking, high-contrast structures until it gets stuck in a solid color.
Photoshop, 11.5% zoom per frame.
With some settings or starting images, Photoshop gets stuck in a texture and keeps generating it until it converges to black or white.
Gigapixel AI, 13% zoom per frame, standard setting. At some point it starts to generate the texture of water seen from above out of a solid blue color; this doesn’t happen with other solid colors.
Gigapixel AI, 14% zoom per frame. Gigapixel AI is special in that it offers multiple models trained on different kinds of images. This one was trained specifically for upscaling low-quality images.
Gigapixel AI, 5% zoom per frame. This one was upscaled using a model trained on CG art images.
SwinIR progressively increases the contrast of the image and, in this case of 4% zoom per step, ends up in a sequence that repeats indefinitely.
ESRGAN is pretty violent in what it does to images: it darkens the image in the first few steps, then removes detail while emphasizing edges. 6% zoom per frame.
It also offers an option to include GFPGAN for better upscaling of faces. Here I’ll try using it on Ilya Repin’s masterpiece.
ESRGAN with 4% zoom per frame, with the GFPGAN face-upscaling option enabled. In practice it turns out to be quite a destructive option, generating similar yet different faces in their place.
Here is the same image at the same 4% setting, upscaled with ESRGAN but without GFPGAN.
Similar to ESRGAN, BSRGAN overemphasizes the edges, but does not darken the image, filling it with colorful gradients instead.
Technical details and code
The image is upscaled by whatever factor the particular model is trained for (2x, 3x or 4x). The result is then downscaled to around 104%-115% of the original size (whichever produces the more visible effect) and cropped to 100% of the original size, generating the starting image for the next cycle.
code tbd.
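In the meantime, here is a minimal sketch of one such cycle in Python. The upscale() function is a placeholder standing in for whichever model is used (ESRGAN, SwinIR, BSRGAN, or a commercial tool’s upscaler); the downscale-and-crop part is plain Pillow, and the 8% zoom is just an example value:

```python
# One feedback cycle: AI upscale -> downscale to zoom * original -> center crop.
import os
from PIL import Image

def upscale(img: Image.Image, factor: int = 4) -> Image.Image:
    """Placeholder for the AI super-resolution model (2x, 3x or 4x).
    Here just a bicubic resize, so that the sketch runs end to end."""
    return img.resize((img.width * factor, img.height * factor), Image.BICUBIC)

def cycle(img: Image.Image, zoom: float = 1.08) -> Image.Image:
    w, h = img.size
    big = upscale(img)
    # Downscale the upscaled result to zoom (e.g. 104%-115%) of the original size...
    small = big.resize((int(w * zoom), int(h * zoom)), Image.LANCZOS)
    # ...then crop the center back to 100% of the original size,
    # which becomes the starting image for the next cycle.
    left, top = (small.width - w) // 2, (small.height - h) // 2
    return small.crop((left, top, left + w, top + h))

os.makedirs("frames", exist_ok=True)
frame = Image.open("start.jpg")  # hypothetical starting image
for i in range(300):
    frame = cycle(frame)
    frame.save(f"frames/{i:04d}.png")
```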