After creating two short story comics with Midjourney, I wanted to write a little review about the current limitations of this specific AI program. Overall, I would estimate that Midjourney is capable of doing around 25% of what I want to do with it. The remaining 75% can't be reached by simply "getting better" at using Midjourney; it would take improvements to the AI program itself to produce better results.
By the way, you can read my two AI short comics here:
The Main Issues of Making AI Comics with Midjourney
Character consistency – Using character references is a step in the right direction. This feature was introduced not long ago. But the reference will always be reproduced from the specific angle of your reference image.
Future Midjourney versions should be able to understand a character reference as a simple character design. Currently, it is interpreted as a complete design reference, which means Midjourney will always give you the same angle, facial expression, features, and details as the character reference.
Trying to use different poses, actions, angles, details, or even just different clothing for the same character is almost impossible with the current feature, as it will always carry over the pose, action, angle, and clothing of the reference.
Environment consistency – An even bigger issue is the lack of options for locking in a consistent environment. An office, a car, a bar, a shop – the details will all look different with every new image generation, no matter how specific your prompt is.
I tried to get around this by having many scenes play out in an open environment while the main character is on the move, which explains away the changing buildings, streets, and other details. But whenever I wanted a dozen images set in the same location (e.g., an office), the details were so far off that it is tough to make the reader believe the scene is taking place in the same setting.
Having more than one character in the same image – Another big issue is that Midjourney has massive problems with characters interacting in the same image.
A prompt like “Man running away from woman” is almost impossible to control, as Midjourney still has issues using character references for more than one character at a time. Try to have two characters fighting or hugging each other, and it completely falls apart.
Community Guidelines – I understand that you don’t want your AI model to be trained on certain imagery (e.g., adult content). But many words and actions that are important for storytelling are blacklisted in Midjourney.
Most stories simply need bad guys doing bad things. How can you visualize these characters if the “bad” words are banned from prompts?
Action scenes – I had problems with even the most basic action scenes, due to a combination of the necessary words being banned from prompts and the need to have two characters interact.
The simple prompt “Young thief shooting old man with a gun” was already too much for Midjourney to turn into useful image generations. A complex Kung Fu-style fight between two superheroes seems likely to remain impossible for years to come.
To Conclude
As I said, around 25% of what I want is currently doable.
But without heavy editing in Photoshop, the results will not be on the level of professional graphic novels. I’ll come back to Midjourney next year and report on its improvements.