I used an A.I. art generator and it was...

The weird, the beautiful, and the educational

Feb 07, 2023

What’s this about?

AI art generators are extremely popular right now, easy to access, and free to test.
I decided to use the most popular one to see if the hype and fear were valid.
The results are so strange that they raise many suspicions about what’s really going on with the system behind the scenes.
I’ll give you some hints about my “crazy theory” and show you examples of what the AI created for me and others.

There’s always weird stuff

You start by watching other people spam insane or poorly-conceived prompts. Like this one, which the poor guy was trying over and over again:

“beautiful website for bike and car repair service on call where we are displaying the problem of local mechanic and how our service will help in fix those problems”

Was this a Baby Boomer trying to avoid hiring a website designer for his small business? The results were very mediocre, considering he specified things like “displaying the problem of local mechanic” and the requirement to be “on call”. The images generated by the AI bot (in less than a minute) did indeed look like website concepts, but the text was totally incoherent gibberish. Some had pictures of a guy repairing a bike, while others had a picture of a car. They always had some aspect of what he wanted, but nothing close to the full request.

And there are amazing possibilities

On the other extreme, we have this truly bizarre and specific prompt, with its jaw-dropping results:

“60mm kodakchrome color + synthology + 3000’s nicheartunion + face and torso + armenian woman with a enormous glowing ornamental headpiece + in a gaudi pink garden + 00’sfuturism + utopia + 8k realistic, high dynamic range + analog colors + retro sci fi + shot by larry sultan”

Yes, this was one of the results I saw. I decided to save it immediately, so that I could write about it. Here’s the bottom left image, which the user manually ordered the program to upscale, meaning it would have even more detail:

Yes, the quality blows our minds. Could this really be generated from scratch by a machine in less than a minute? Do robots understand beauty, symmetry, architecture, and even depth of field blur? This isn’t a photograph, after all. The lighting is pretty much consistent everywhere, from the shrubs in the background to the thin reflective strap across her collar bone. If you look very closely, you will notice that her eyes are not aligned; they’re off and looking in different directions. That alone would make it unusable as a finished product in a magazine, but beyond that it’s almost flawless, which raises the question of how the hell those keywords work.

Notice that her face is totally different from the one in the bottom left corner of the first sample result, too. When you first input a prompt, the program will spew out four quadrants of results, showing different varieties. You can then choose which one you’d like to enhance or make variations of.

So many questions remain. Who is Larry Sultan? What is 3000’s nicheartunion?

My tests

I decided to put the AI through my own little gauntlet, for science.

My first AI image test. I wanted to see if “AI” was another way of saying “Google Image Search with Photoshop Filters”

I asked the program to create an image of the videogame character “Solid Snake” in the location of “Shadow Moses Island” (the arctic military base which served as the setting of the 1998 PlayStation game, Metal Gear Solid), in the style of my childhood favorite comic book artist, Joe Madureira.

The test here was to see if my theory was correct: was the “AI” mostly just a Google image search with Photoshop filters applied? The results were enlightening, because I knew that there was already a picture of Solid Snake by Joe Mad online:

A real, old drawing by the artist Joe Madureira of Solid Snake.

By choosing a prompt with a similar counterpart on the Internet already, I wanted to see how much the “AI” borrowed from the source I had in mind. As you can see, there’s a clear influence. In each picture, Solid Snake is looking to the left side of the image, with an expression and posture similar to the real stuff from Joe Mad. But none of the images depict Shadow Moses Island at all. This is interesting because Joe Mad’s art did not include it either. Instead, the program guessed that I wanted to see a jungle island with shadowy figures, which is why we have dark blotches of vines, plants, and rubble instead of the cold blue blizzard of the game I remember. This is a genuine misunderstanding on its part, showing that it searches for associated words when it doesn’t know a specific term. There are many pictures of Shadow Moses Island online, and I would think they should be associated with Solid Snake, but the AI couldn’t make the connection and went with its own misunderstanding.

The upscaled version of the result. Very strange things start happening here.

Why does Solid Snake have weird sunglasses, I wonder? Perhaps because in Metal Gear Solid 4 he had a cybernetic eyepatch, which the AI mistook for sunglasses, although the color is wrong either way. Solid Snake has had a few outfits in his long-running series of games, and this is close to being a jumble of them. The AI does not understand specific designs. And what’s up with his hand? You can see that the AI is trying to replicate the Joe Mad drawing with the pose and hands, but it doesn’t know what’s going on with the fingers or gloves. It’s just a jumble of shapes that make less and less sense the closer you examine them.

Why are there random pebbles poking out of Solid Snake’s cheek, too? Why is there a big rock clamped to his shoulder — or is it floating in mid-air? What is that huge shadowy spiral popping out of the tower in the background? I didn’t ask for any of that. Solid Snake is a very recognizable character with thousands of pieces of official and unofficial artwork on the Internet, and tons of artwork from Joe Mad, but this doesn’t look like Joe Mad drew it. If you showed me this drawing randomly, I would say it looks like it was created by a weirdo amateur artist who had no real idea what he was doing, but a tremendous amount of dedication to his bad ideas. Somebody who, rather than fixing his errors, did his best to turn errors into something appealing.

To be, or not to be impressed?

Maybe you are blown away by the drawings and can’t help but think of how much better they are than what you could do. For me, not so much. I am an amateur artist myself, and while it would take me a few days to get a finished product with that level of coloring, shading, and little touches, I could do better. But more importantly, I know the AI isn’t creating anything from scratch. There are things it could never do, because it relies on a handful of advanced tricks, not to create better artwork, but to make your brain overlook the problems that exist.

Perhaps the fact that I’m even comparing some disposable blip made by an AI to what talented artists do means we’ve lost the war. Everyone says it’s a matter of time before these things surpass us. If my theories are correct, that’s not the case.

My theory

The program first looks at the prompts to decide on a “style”, which limits the elements it will pull from when composing the image. Cute, realistic, scary, surreal, etc. Think of these as gigantic folders with subfolders, tagged to hell and back. This prevents horrible accidental mismatches, because consistency is always a key to making artwork look convincing.
Anything requiring 3D objects is composed behind the scenes with various levels of detail. Objects taken from the 3D library are dynamically posed and combined by the program to match the verbs within a “scene”. For example, “jumping” has a bunch of 3D poses, and so does “sitting” or “laughing”. Models exist for almost anything, including cars, monsters, humans, flowers, ornamental headdresses, rocks, machines, and whatever else is typically used in art.
When the 3D elements are assembled, they are given a fixed perspective and lighting matrix setup, which causes the human eye to believe the scene is real no matter how bizarre the elements and the details. Our brains are designed to “believe” things if they align in 3D space and have coherent lighting.
At the same time, hundreds of millions of 2D image elements are stored in a different set of libraries. These can be used instead of 3D objects if appropriate, or applied on top of the models, like a texture, after they’ve been arranged. For example, the face of the woman could be swapped out and replaced with another, as we see in the first image and its upscaled counterpart. But as we also saw, even the eyes themselves were not consistent, because two different “eye” elements were being pulled from the library and placed on the 3D model.
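
To make those four steps easier to picture, here is a toy sketch of the pipeline as I imagine it. Everything in it is invented for illustration: the folder names, the pose IDs, the `Scene` fields, and the `compose` function are all hypothetical, and none of it comes from any real generator’s code. It only shows the order of operations in my theory: pick a style, pull matching models, pose them from the verbs, fix the lighting, then lay 2D elements on top.

```python
from dataclasses import dataclass, field

# Hypothetical tagged "folders" and pose library; contents are made up.
STYLE_FOLDERS = {
    "retro sci fi": {"woman", "garden", "headpiece"},
    "comic": {"soldier", "jungle", "tower"},
}
POSES = {
    "jumping": "pose_jump_01",
    "sitting": "pose_sit_01",
    "laughing": "pose_laugh_01",
}


@dataclass
class Scene:
    style: str
    models: list = field(default_factory=list)
    pose: str = "pose_default"
    lighting: str = "unset"
    textures: list = field(default_factory=list)


def pick_style(prompt: str) -> str:
    """Step 1: choose a style folder named in the prompt, if any."""
    for style in STYLE_FOLDERS:
        if style in prompt:
            return style
    return "default"


def compose(prompt: str) -> Scene:
    """Run the four hypothesized steps in order and return the scene."""
    text = prompt.lower()
    words = text.split()
    scene = Scene(style=pick_style(text))

    # Step 2: pull 3D models the style allows, and pose them from the verbs.
    allowed = STYLE_FOLDERS.get(scene.style, set())
    scene.models = [w for w in words if w in allowed]
    for w in words:
        if w in POSES:
            scene.pose = POSES[w]

    # Step 3: one fixed perspective and lighting setup for the whole scene.
    scene.lighting = "fixed_perspective_single_light"

    # Step 4: apply 2D elements on top of each model, like a texture.
    scene.textures = [f"{m}_texture_2d" for m in scene.models]
    return scene


scene = compose("laughing woman in a garden retro sci fi")
print(scene)
```

In this sketch, a word the style folder doesn’t know simply gets dropped, which is the kind of behavior I saw when the program ignored parts of a prompt.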

To make it simple for you, the AI art generator is actually a simulated videogame world that changes its scenes and renders snapshots depending on what you tell it to do. All of the assets are either stored or drawn from some massive library, and Internet searches are used to help it find the closest thing. The “intelligent” part is mixing and matching the elements in a way that respects your prompt.

If this sounds crazy, think about this: the average 3D video game, using modern graphics cards to process the information, renders 30-60 pieces of artwork every second while playing. The “movement” and “animation” you see is a trick, snapping a moment of time and rendering it so fast that you don’t think about it as a series of still images, which could just as easily be exported as image files to a web browser. It’s not hard for computers to make believable scenes with realistic lighting and arranged objects in a 3D space, and to do it incredibly fast.
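
For a sense of scale, here is the arithmetic behind those frame rates as a small sketch. The function names are mine, and the one-minute window is just the rough generation time I mentioned earlier, used here as an assumption for comparison.

```python
def frame_budget_ms(fps: float) -> float:
    """Milliseconds available to render a single frame at this frame rate."""
    return 1000.0 / fps


def frames_in(seconds: int, fps: int) -> int:
    """How many still images are rendered in the given number of seconds."""
    return seconds * fps


for fps in (30, 60):
    print(
        f"{fps} fps: {frame_budget_ms(fps):.1f} ms per frame, "
        f"{frames_in(60, fps)} frames in one minute"
    )
```

At 30 frames per second that works out to about 33.3 ms per image and 1,800 images in a minute; at 60 it is about 16.7 ms and 3,600 images.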

If I’m right…

If my theory is correct, AI image generators are the product of a very deceptive combination of technologies, and must have been engineered by DARPA or some other military-grade project group. It would require a massive amount of electrical energy, graphics processing, information processing, and libraries of visual assets beyond what any single corporation could be expected to possess.

What does this sound like to you?

To me, it sounds like Bitcoin. More on that next time.

Don’t forget to follow me on Telegram! https://t.me/wolfpoxnews