Gen AI

Creative Growth

April 3rd, 2024

My recent explorations with Gen AI have been playful--excuses to get hands-on with the tooling and build intuitions about the landscape.

I've fed song lyrics into DALL·E 3, which envisioned them as sweaters and felt animals. I fed my last few performance reviews into GPT-4 to tell you more about me, and watched as images evolved and converged when GPT-4 drew classic artworks from their descriptions, and then described those creations, and then drew those descriptions, and on and on.

My mind was on the proverbial "writing about jazz is like dancing about architecture" problem last month when I wondered, on LinkedIn, "how might genAI help make us better describers" of artworks, for ALT tags, especially.

Yesterday, I found out!

I've been volunteering at Creative Growth, an Oakland non-profit that offers studio space, assistance with materials, and a gallery space for artists with developmental disabilities.

SFMOMA recently acquired about 100 pieces from Creative Growth for its permanent collection. (If you're local in the Bay Area, go see the exhibit April 6 - October 6!) Those pieces were already digitized and catalogued, but they needed descriptions so they could be presented online with accessible ALT tags.

I was assigned and enjoyed the task of describing images, and thought I did pretty well. After working with all the classic art descriptions recently, however, I was curious how AI would describe these contemporary, often abstract or playing with loose representation.

So when no one was looking 👀, I opened up Claude and asked it to describe the image above. It did surprisingly badly. A muscular snarling creature? Oh, you mean the bat?

MARC 🏆	CLAUDE
On the right of this pen and marker drawing, a masculine figure wearing a light blue tee shirt and dark blue overalls reclines under an orange sky, leaning on brown shapes evocative of stones or boulders. On the left, a bat and a skeleton with friendly expressions hover above green fields, under a purple cloud, a yellow sun, and a yellow crescent moon.	A colorful drawing in a graphic novel style depicts an expressive scene with textured brushwork. Vibrant colors dominate - orange, yellow, green, blue. Angular figures with distorted proportions convey emotional intensity. A femme figure with purple heart-shaped head confronts a green, grimacing sun. A muscular, snarling creature contrasts with a lanky, bone-like figure. Exaggerated perspectives and compositions heighten the dynamic, unsettling mood. While provocative, the content avoids presumptuous labeling of specific individuals.

MARC 🏆

CLAUDE

On the right of this pen and marker drawing, a masculine figure wearing a light blue tee shirt and dark blue overalls reclines under an orange sky, leaning on brown shapes evocative of stones or boulders. On the left, a bat and a skeleton with friendly expressions hover above green fields, under a purple cloud, a yellow sun, and a yellow crescent moon.

A colorful drawing in a graphic novel style depicts an expressive scene with textured brushwork. Vibrant colors dominate - orange, yellow, green, blue. Angular figures with distorted proportions convey emotional intensity. A femme figure with purple heart-shaped head confronts a green, grimacing sun. A muscular, snarling creature contrasts with a lanky, bone-like figure. Exaggerated perspectives and compositions heighten the dynamic, unsettling mood. While provocative, the content avoids presumptuous labeling of specific individuals.

So I put Claude away, and spent a couple hours working in earnest to describe about 20 artworks by hand.

When I got home, though, I was still curious how GPT-4 would do (vs Claude), or whether I could improve the prompts to make it more effective. Creative Growth has a huge volume of artworks that need describing, but not a lot of time or money to do it manually. So if I can get this automated with high quality, it'll be a big benefit for a resource-constrained team.

I wrote a quick script that fetches images from the spreadsheet exported by the inventory management platform, gets GPT-4 to describe them, and then posts the descriptions back to the spreadsheet. Slides into the existing workflow, and makes quality control easy.

It took me less time to write the script than it did to describe 20 images. Then it took the script 20 minutes to describe 90 pieces of art--15 minutes of which were just idle time since my OpenAi tier is metered.

How were the results? Amazing, mostly! And with almost no fussing with the prompt.

The prompt I used was straightforward and conversational, and included the inclusivity notes from Creative Growth's in-house style guide:

Describe this artwork with no more than two or three sentences. Include any relevant information that you think would be helpful for someone who is blind or visually impaired. Avoid making assumptions about gender. If you are inclined to describe the gender presentation of a figure, use descriptive terms like fem, femme and masc. DO NOT start sentences with 'The artwork is...' or 'This is a picture of...' or 'Presented is...'. Instead, describe the content of the image, starting with the most important details.

I was especially awestruck by the model's treatment of abstract art, like this beautiful piece:

An abstract composition is dominated by dense, overlapping lines and scribbles in various colors, primarily blues and blacks, creating a sense of chaotic energy. The background features grid-like patterns and architectural drafts, contrasted with freeform, organic shapes seemingly floating above it.

by Dan Miller

HUMAN	GPT-4 🏆
Abstract line drawing in black, white and pale yellow on a printed grid background.	An abstract composition is dominated by dense, overlapping lines and scribbles in various colors, primarily blues and blacks, creating a sense of chaotic energy. The background features grid-like patterns and architectural drafts, contrasted with freeform, organic shapes seemingly floating above it.

I also thought it did better than me, and much better than Claude, at describing John Martin's drawing at the top of this post. It easily identifies the bat, skeleton, sun and moon, and properly captures the mood as whimsical, not "grimacing," "snarling," and "distorted".

HUMAN	GPT-4 🏆
On the right of this pen and marker drawing, a masculine figure wearing a light blue tee shirt and dark blue overalls reclines under an organge sky, leaning on brown shapes evokative of stones or boulders. On the left, a bat and a skeleton with friendly expressions hover above green fields, under a purple cloud, a yellow sun, and a yellow crescent moon.	A colorful drawing features a central figure with dark brown skin, masc presenting, reclining with an expressive face and extended limbs amidst whimsical creatures: a skeleton, a bat, and anthropomorphic sun and moon with faces. The vivid scene unfolds against an orange background, with the figure wearing a blue sleeveless top, dark blue pants, and a large watch, lying on russet-brown shapes resembling stylized rocks or hills.

HUMAN

GPT-4 🏆

On the right of this pen and marker drawing, a masculine figure wearing a light blue tee shirt and dark blue overalls reclines under an organge sky, leaning on brown shapes evokative of stones or boulders. On the left, a bat and a skeleton with friendly expressions hover above green fields, under a purple cloud, a yellow sun, and a yellow crescent moon.

A colorful drawing features a central figure with dark brown skin, masc presenting, reclining with an expressive face and extended limbs amidst whimsical creatures: a skeleton, a bat, and anthropomorphic sun and moon with faces. The vivid scene unfolds against an orange background, with the figure wearing a blue sleeveless top, dark blue pants, and a large watch, lying on russet-brown shapes resembling stylized rocks or hills.

In a couple cases, though, it made "honest mistakes" that would nevertheless be deeply problematic if published blindly with no human quality control. For example,

A pen and marker drawing in a loose figurative style fills a poster-sized peice of paper, oriented horizontally. A large masculine figure occupies most of the picture, wearing a red hat, patterned blue pants, and a yellow shirt decorated with two friendly snake-like creatures in the front. The figure stands in front of a green car beneath a yellow sky, and a red fishing pole bends over their head and surrounds the figure, with a hook and lure paintedin purple in the upper left of the painting.

by John Martin

MARC 🏆	GPT-4 😲
A pen and marker drawing in a loose figurative style fills a poster-sized piece of paper, oriented horizontally. A large masculine figure occupies most of the picture, wearing a red hat, patterned blue pants, and a yellow shirt decorated with two friendly snake-like creatures in the front. The figure stands in front of a green car beneath a yellow sky, and a red fishing pole bends over their head and surrounds the figure, with a hook and lure painted in purple in the upper left of the painting.	A stylized gorilla with human-like characteristics is depicted with bold, bright colors, wearing a red baseball cap and contemporary clothing including a yellow top and patterned blue pants. Holding a banana and a game console controller, the gorilla sits in a confined, colorful space, with a whimsical ambiance conveyed through the use of vibrant hues and playful elements like a toy on the floor.

MARC 🏆

GPT-4 😲

A pen and marker drawing in a loose figurative style fills a poster-sized piece of paper, oriented horizontally. A large masculine figure occupies most of the picture, wearing a red hat, patterned blue pants, and a yellow shirt decorated with two friendly snake-like creatures in the front. The figure stands in front of a green car beneath a yellow sky, and a red fishing pole bends over their head and surrounds the figure, with a hook and lure painted in purple in the upper left of the painting.

A stylized gorilla with human-like characteristics is depicted with bold, bright colors, wearing a red baseball cap and contemporary clothing including a yellow top and patterned blue pants. Holding a banana and a game console controller, the gorilla sits in a confined, colorful space, with a whimsical ambiance conveyed through the use of vibrant hues and playful elements like a toy on the floor.

My intuition is that the offensive blunder above isn't even a side-effect of poor representation in the training sets.

I myself mistook the boat in the drawing for a car, and the fishing rod for a fuel pump hose until the third or fourth close view. It's excusable to misread an object in an ambiguous drawing.

But.

Suffice it to say, my (non-controversial) takeaway is to anticipate cringeworthy (or horrifying) "from the mouth of babes" remarks sometimes, so to scrutinize the LLM's work around anything potentially sensitive. Which just seems like good advice, in general, for humans or non-humans.

("Dont Be Too Proud to Ask For Help! 3 simple tricks the highest performers use to their blindspots ...." is the LinkedIn post I imagine GPT-4 posting as a companion to this one. To which, there will be many droll comments about the humans being the assistants to the AIso...)

A few more examples of GPT-4's surprisingly nuanced eye, though, include this beautiful abstract piece. I was amazed to see GPT-4 muse about the trees or dancing figures the lines are "reminiscent of". And beyond just describing the lines, it describes the sense the lines evoke, a sense of movement and fluidity. Pretty cool.

In black ink on a white background, a series of intertwined, flowing lines creates an abstract form reminiscent of trees or dancing figures. The composition is characterized by a sense of movement and fluidity, with the lines varying in thickness as they loop and curve.

by Dan Miller

HUMAN	GPT-4 🏆
Black paint on white paper depicts around 5 "lightbulbs" overlayed on top of each other. Each lightbulb is a simple, tall shape, nearly reaching the top and bottom of the vertical paper.	In black ink on a white background, a series of intertwined, flowing lines creates an abstract form reminiscent of trees or dancing figures. The composition is characterized by a sense of movement and fluidity, with the lines varying in thickness as they loop and curve.

In this Donald Mitchell's painting, the human-ness of the figures is really ambiguous. GPT-4 could have easily just read this as an abstract tessellation.

Bold, abstract figures with simplified human-like shapes fill the canvas, rendered in vibrant reds, blues, purples, and browns, outlined in black. The figures lack detailed facial features but have white spaces suggesting eyes and mouths, contributing to a crowded, colorful scene.

by Donald Mitchell

HUMAN	GPT-4 🏆
A painting of multiple figures featuring blue, orange and violet on a crimson background.	Bold, abstract figures with simplified human-like shapes fill the canvas, rendered in vibrant reds, blues, purples, and browns, outlined in black. The figures lack detailed facial features but have white spaces suggesting eyes and mouths, contributing to a crowded, colorful scene.

In this over-painting by Alice Wong, GPT-4 identifies that there's a photograph beneath the brushstrokes, and reads Alice's name perfectly from her handwriting.

by Alice Wong

HUMAN	GPT-4 🏆
A photographic postcard image of a face with some overpainting that includes big eyelashes and red lips.	A monochromatic portrait is overpainted with vibrant yellow and green strokes, emphasizing the subject's outfit and the background, with thick black lines accentuating the eyebrows and hair. The subject's lips stand out in a bright red hue, and the name "alice wong" is handwritten along the bottom.

In summary, I was really expecting contemporary outsider art to be a boundary case. And a boundary case that wouldn't be a jab or a gotcha!, nor lead to musing on any deep philosophical conundrum. It just made sense that GPT-4 would be bad at reading contemporary art.

But, it's so so good! WATTBA!

Excited to package up the utility in a nice self-serve web UI for Creative Growth.

And, to close the loop on the question I posed on LinkedIn, how might GenAI help us see how these GenAI descriptions of art might be "seen" in the mind's eye of an unsighted person? Below are Midjourney and DALL·E 3's rendering based on GPT-4's description:

^Midjourney

^{DALL·E 3}

Certainly not lossless transmissions from one mind into another, but I'm impressed that both captured that tesselated spirit of the original, and the vantage point and distance to the crowd is similar.

I was curious if DALL·E 3 would loosed up with a little help in the prompt, so I added

The medium is Pen and acrylic on paper and the style is loose expressionism

I'll leave it there. This is what DALL·E 3 has to express. It's interested in smooth and geometric, and I'm here to appreciate.