Image Generation in ChatGPT Just Got Way Better

Summary

  • 4o Image Generation in ChatGPT produces photorealistic images, keeps them consistent across edits, and follows instructions accurately.
  • Users can convert images into different styles and refine them through prompts.
  • You can upload images to use as references, or ChatGPT can draw on its own knowledge.

When OpenAI drops a new feature, there’s often a small amount of buzz among people who are interested, but it rarely breaks the internet. However, with the release of an updated image generation model, ChatGPT did exactly that.

4o Image Generation has replaced DALL-E as the default image generation tool in ChatGPT, and the results are seriously impressive. It has led to people flooding the internet with images that they’ve generated using the tool, and its popularity seems to have even taken OpenAI by surprise.

4o Image Generation Is Built Into GPT-4o

As the name suggests, 4o Image Generation is built into the GPT-4o model. As long as you’re using that model, you don’t need to do anything other than ask ChatGPT to create an image, and 4o Image Generation will get to work. Some models, such as o1, don’t allow you to create images at all, but it seems 4o Image Generation isn’t limited to GPT-4o. I tried creating an image in GPT-4, and it still used 4o Image Generation rather than the DALL-E model that was used previously.

The DALL-E GPT in ChatGPT.

If you prefer to use DALL-E for any reason, there is still a dedicated DALL-E GPT available in the public GPT store. You can use this to generate images using the older, less capable model. There’s little use for it now other than for seeing just how much better image generation has become.

Create Excellent Photorealistic Images

One of the most obvious improvements over DALL-E is that 4o Image Generation can produce some excellent photorealistic images, without you having to worry too much about prompt crafting. The images take a little while to generate and are revealed gradually from the top down, in a way that’s reminiscent of how images used to load over dial-up, but the results are far superior to what DALL-E could produce.

I asked DALL-E for a photorealistic image of a monkey wearing a top hat, and this is what it gave me:

An image of a monkey wearing a top hat generated by DALL-E
Adam Davidson / How-To Geek / DALL-E

This is an image generated by 4o Image Generation using the same prompt:

An image of a monkey wearing a top hat
Adam Davidson / How-To Geek / ChatGPT

The difference is staggering and, frankly, a little bit frightening. Until now, it’s usually been possible to tell if an image was AI-generated if you looked hard enough for extra fingers or mangled text. The images that ChatGPT generates, however, are very hard to distinguish from the real thing, and as is commonly said about new AI developments, this is the worst they will ever be.

You Can Convert Images Into Different Styles

One of the things that has set the internet alight since the launch of 4o Image Generation is the ability to ask ChatGPT to convert your images into different styles. For example, you can upload a photo of yourself, and ask ChatGPT to change it to the style of Van Gogh. This isn’t something new, but the quality of the results is a huge step up from DALL-E.

An image of a monkey converted into the style of Van Gogh
Adam Davidson / How-To Geek / ChatGPT

This led loads of people to share images of themselves, or of scenes from popular culture, transformed into the style of Studio Ghibli, the popular animation studio behind classic movies such as Spirited Away and My Neighbor Totoro. The results are usually awesome, but the trend sparked a debate online about how ethical it is to use AI to essentially steal the style of an artist without their permission. At the time of writing, however, I was still able to make images in the style of Studio Ghibli without a problem.

It’s Easy to Refine Images Through Prompts

Another major improvement is that 4o Image Generation has excellent consistency. This means that if there’s one small thing wrong with your image, you can ask ChatGPT to fix it, and it will leave the rest of the image alone. DALL-E would often make major changes to the rest of the image when you tried to fix one part of it.

This makes it much easier to get the exact image that you want, which was often a huge source of frustration with DALL-E. You would have to try multiple times to even get close to the image that you wanted, and sometimes you would fail completely. Now, for example, you can ask to have the monkey’s top hat at a different angle, and the hat will change, but the rest of the image will stay the same.

An image of a monkey in a top hat with the hat moved to a thirty degree angle
Adam Davidson / How-To Geek / ChatGPT

This consistency also makes it great for producing multiple images of the same person or character. You can ask for the same character to appear in a different setting, and ChatGPT will preserve the character’s appearance in their new image.

ChatGPT Can Finally Handle Text

This is one of the biggest changes in 4o Image Generation. DALL-E could add text to images, but it really, really struggled to do so. You’d usually get text that mostly resembled the words that you wanted but was just ever so slightly off. Enough to ruin your images, at least. Using 4o Image Generation, you can specify the exact text that you want, and it almost always renders flawlessly.

A four panel cartoon created in ChatGPT.
Adam Davidson / How-To Geek / ChatGPT

This, combined with the improved consistency, means you can create things using 4o Image Generation that just weren’t possible before. I sketched a terrible drawing of a cartoon alien and was able to create a four-panel cartoon that used that character, complete with speech bubbles with perfect text. It took longer to type the prompt than it did to generate my completed cartoon.

4o Image Generation Will Actually Follow Instructions

This is huge. One of the biggest issues I had with DALL-E is that it would often just refuse to follow an instruction, especially if that instruction involved a negative. I spent hours trying to get it to generate an image of Santa with a mustache but no beard (just to see how he’d look, obviously), and no matter what I tried, I’d get a full beard every time.

The only way I managed to get close to success was by asking it to generate an image of Hercule Poirot disguised as Santa, and even then, it took multiple attempts before I got an image with a white mustache and no beard. Now, however, I can get an image of Santa without a beard on the first try.

An image of Santa with a mustache but no beard.
Adam Davidson / How-To Geek / ChatGPT

The instruction following goes even further than that. You can define up to 20 different objects, describing each one, and 4o Image Generation will follow the instructions for every single object. The example OpenAI gives is a 4×4 grid of emoji with specific shapes and colors, and ChatGPT can create an image with all 16 emoji exactly as described.
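If you want to try a grid like this yourself, it helps to spell out every cell rather than describe the grid loosely. Here is a minimal Python sketch of one way to do that. It only assembles the text of a prompt for you to paste into ChatGPT and doesn’t call any OpenAI API. The `GridItem` class, the `build_grid_prompt` function, the wording of the prompt, and the example shapes and colors are all made up for illustration; they aren’t taken from OpenAI’s example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GridItem:
    """One cell of the grid: what to draw and in which color (hypothetical)."""
    shape: str
    color: str


def build_grid_prompt(items: list[GridItem], columns: int = 4) -> str:
    """Return a text prompt that describes every grid cell explicitly."""
    if columns < 1:
        raise ValueError("columns must be at least 1")
    if not items or len(items) % columns != 0:
        raise ValueError("items must fill every row of the grid")

    rows = len(items) // columns
    lines = [
        f"Create one image containing a {rows}x{columns} grid of emoji-style "
        "icons on a plain white background. Draw each cell exactly as listed:"
    ]
    for index, item in enumerate(items):
        # divmod turns a flat index into zero-based (row, column) coordinates.
        row, col = divmod(index, columns)
        lines.append(f"Row {row + 1}, column {col + 1}: a {item.color} {item.shape}")
    return "\n".join(lines)


# Hypothetical example data: 4 shapes in 4 colors gives 16 distinct cells.
SHAPES = ["star", "heart", "triangle", "circle"]
COLORS = ["red", "blue", "green", "yellow"]
ITEMS = [GridItem(shape, color) for shape in SHAPES for color in COLORS]

if __name__ == "__main__":
    print(build_grid_prompt(ITEMS))
```

Listing the cells by row and column also makes it easy to check the finished image against the prompt, one cell at a time.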

You Can Use Uploaded Images as References

One downside of generating images from prompts is that describing what you want in an image can be hard, but describing the style of the image can be even harder. Telling ChatGPT to produce the exact look you have in your head isn’t always that easy.

Thankfully, you aren’t limited to text. You can upload images to indicate the style that you want, and ChatGPT will then use them to inform the final image that it generates from your prompt.

A monkey in a top hat in the style of Studio Ghibli.
Adam Davidson / How-To Geek / ChatGPT

If you want a specific item in your image, for example, you can upload an image of it to ChatGPT. If you want people to stand in a specific pose, you can upload an image of people standing in that pose. If you find an illustration that you wish was a photorealistic image, you can upload it and ask ChatGPT to make it into a photograph.

You can even draw a rough sketch of what you want the image to look like, take a photo of it, and upload that to ChatGPT. It can then generate a photorealistic image based on your terrible sketch. It makes it so much easier to generate the exact image that you want.

Images Can Call on ChatGPT’s Own Knowledge

4o Image Generation isn’t limited to the information in your prompt or the files that you upload. GPT-4o can also draw on its own knowledge to help it create the images that you want. The Studio Ghibli images are a prime example: you don’t need to explain what Studio Ghibli animation looks like, because ChatGPT already knows.

An 8-bit image explaining the water cycle.
Adam Davidson / How-To Geek / ChatGPT

This goes a lot further than just knowing different artistic styles, however. Any knowledge that ChatGPT has can be applied to your images. For example, you can ask for a diagram explaining the water cycle, and you don’t need to explain what the water cycle is; ChatGPT will pull the key information from its own knowledge.

4o Image Generation Isn’t Perfect (Yet)

4o Image Generation is incredibly good. In fact, it proved so popular that Sam Altman, the CEO of OpenAI, announced temporary rate limits, saying that the company’s GPUs were “melting.”

Initially, you could create as many images as you wanted, but now you’ll often see a message telling you that you need to wait for a few minutes before creating another image. Rate limits aren’t the only problem that you may run into with 4o Image Generation.

A family of chipmunks in the style of the Simpsons.
Adam Davidson / How-To Geek / ChatGPT

There are also limitations on creating certain types of content. In theory, at least, you shouldn’t be able to generate anything offensive or inappropriate. If you try to create images featuring copyrighted characters, ChatGPT may also refuse. The lines are a bit blurry here. You can usually create characters in a similar style, if not the characters themselves, or get around the restrictions using slightly vague prompts.

The instruction following doesn’t always work perfectly, and I still occasionally have issues with text, too. It’s very rare now, but occasionally, it will throw in an extra letter, especially if adding that letter still makes the text a valid word. You can usually fix these errors easily by asking ChatGPT to generate the image again.


4o Image Generation is a considerable leap forward in AI image generation, with improved photorealism, better consistency, and significantly better instruction following. It’s now incredibly easy to create photorealistic images that look exactly like you want them to.

This raises a lot of ethical questions, however. If you’re a graphic designer or a photographer, this update may well send shivers down your spine. What can’t be denied is that it has made it much easier for ChatGPT users to create seriously impressive images, whatever the ethical dilemmas.
