Midjourney is a commercial image generation and transformation AI service. Similar services include:
- DALL-E 2 (OpenAI)
- Stable Diffusion (Open Source)
Midjourney is a somewhat opinionated AI and tends to try to create "art". It has a very high level of coherence, which means elements in the resulting images tend to have natural relationships with a low level of glitches and random weirdness. In my opinion, Midjourney has much higher quality "out of the box", trouncing the competition unless the user is quite advanced at effectively utilizing both positive and negative prompt particles.
It is important to note that the only interface for Midjourney is a Discord bot; there is no dedicated mobile or web app. Midjourney provides a helpful startup guide:
You will need to subscribe to a service plan to use Midjourney. In Midjourney, you are basically paying for GPU time, so your account balance is measured in time remaining. Generally speaking, a single prompt will consume about 30 seconds of time. The basic ($10/mo) plan provides 3 hours of time, which is enough for 200-ish images. Higher tier plans provide more time (15/30 hours) and the ability to have more concurrent jobs running.
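To get a feel for the time budget, here is a back-of-the-envelope sketch in Python. The 30-second figure is only the rough average mentioned above and real jobs vary, so it is a parameter rather than a constant. The function name is made up for illustration and is not part of any Midjourney tooling.

```python
def prompts_per_month(plan_hours: float, seconds_per_prompt: float = 30.0) -> int:
    """Estimate how many prompts fit in a monthly GPU-time allowance."""
    if seconds_per_prompt <= 0:
        raise ValueError("seconds_per_prompt must be positive")
    # Convert the allowance to seconds, then count whole prompts.
    return int(plan_hours * 3600 // seconds_per_prompt)


if __name__ == "__main__":
    # Plan sizes (in hours) taken from the paragraph above.
    for hours in (3, 15, 30):
        print(f"{hours} h plan: about {prompts_per_month(hours)} prompts")
```

At a flat 30 seconds per prompt, the 3-hour plan works out to 360 prompts. Working backward from 200 prompts gives 54 seconds each, so treat both numbers as rough estimates.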
Once you follow the instructions and join the Midjourney Discord Server, you can join a beginner channel and start crafting. Everything is done "out in the open" on the Midjourney server. You can see what other people are making and they can see what you are doing. It's a little overwhelming, but it can serve as a great source of inspiration for what's possible as you're starting out. Note: It is possible to invite the Midjourney Bot to join any Discord server, so you may eventually want to start your own private (free) server to work in peace.
There are a few different main actions you can take with Midjourney:
/imagine is the text-to-image generation command, and the most commonly used feature of Midjourney. It takes the prompt you provide and generates a 2x2 grid of images that attempt to fulfil it. Note that it can also take an image URL as part of the prompt and will attempt to use the photo as the basis for content.
/blend takes 2 images (uploaded through Discord) and merges them. I haven't used it a lot, but when I tried merging myself with my wife, the results were disturbing. 😅
/describe takes an image URL as an argument and returns several unique descriptions of what Midjourney thinks is going on in the image. This can be very useful for then crafting /imagine prompts that will create something similar to the image you provided.
/info shows you how much remaining time you have for the current month as well as any in-progress jobs.
The text prompts can be as simple or as elaborate as you want. They can be more narrative or comma-separated lists of properties. You can even use ChatGPT to generate prompts for Midjourney!
My preferred technique is to describe the subject I want, then follow it up with a comma-separated list of additional requirements:
an athletic middle-aged woman dressed in victorian clothing sitting in the captain's chair of her airship, steampunk theme, high detail, in the style of impressionist painters
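If you build prompts programmatically (for example, before pasting them into Discord), the "subject, then modifiers" pattern is easy to sketch in Python. This is only an illustration of the string format; `build_prompt` is a hypothetical helper of my own, not something Midjourney provides.

```python
from typing import Iterable


def build_prompt(subject: str, modifiers: Iterable[str]) -> str:
    """Join a subject and its extra requirements into one comma-separated prompt."""
    # Trim stray whitespace and skip empty modifiers so no ", ," gaps appear.
    parts = [subject.strip()]
    parts.extend(m.strip() for m in modifiers if m.strip())
    return ", ".join(parts)


if __name__ == "__main__":
    print(build_prompt(
        "an athletic middle-aged woman dressed in victorian clothing "
        "sitting in the captain's chair of her airship",
        ["steampunk theme", "high detail", "in the style of impressionist painters"],
    ))
```

Keeping the modifiers in a list makes it easy to swap a theme or style in and out while leaving the subject untouched.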
You can force Midjourney to use a specific model version when rendering your image. Different models have different inherent "preferences" about output style, and sometimes an older model is better for a specific task. Generally, though, you'll be using the main model or the Niji model.
The main model is currently at v5.1. It will be used by default if no other specifier is used. To use an earlier version of the model, you simply use the --v X option, where X is the model version to use. It's fascinating to see how far Midjourney has come in the past year by running prompts through older versions.
The main alternative model is called Niji and specializes in generating Japanese-style artwork (anime, manga, etc.). It can produce some stunning painterly scenes reminiscent of Studio Ghibli. To use the Niji model, simply add the --niji X option, where X is the version to use (currently at 5).
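The model selector is just text tacked onto the end of the prompt, which a small Python sketch can make concrete. `with_model` is a hypothetical helper for illustration. It only formats the --v X and --niji X options described above and does not check whether a given version actually exists.

```python
from typing import Optional


def with_model(prompt: str, version: Optional[str] = None, niji: bool = False) -> str:
    """Append a model selector (--v X or --niji X) to a prompt string."""
    if version is None:
        if niji:
            # This sketch follows the "--niji X" form, so a version is required.
            raise ValueError("a Niji version is required, e.g. '5'")
        # No selector: the default main model is used.
        return prompt
    flag = "--niji" if niji else "--v"
    return f"{prompt} {flag} {version}"


if __name__ == "__main__":
    base = "a lighthouse at dusk, high detail"
    print(with_model(base))                  # default model
    print(with_model(base, "4"))             # earlier main model
    print(with_model(base, "5", niji=True))  # Niji model
```

Running the same base prompt through each variant is a quick way to compare how the models interpret it.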
There are a number of options you can append to your prompt to modify the output.
- Style (--s 0-1000) - Basically a slider that affects the amount of flair or stylization in the image. It's hard to directly quantify the effect, but high-style images will generally be a lot busier.
- Chaos (--c 0-100) - Determines how divergent the 4 grid images are. With lower chaos, there will usually be similar styles in play. With high chaos, the 4 images can be wildly different.
- Aspect Ratio (--ar X:Y) - By default, images are 1:1 (square). This lets you change the aspect ratio of the output, making it more portrait or landscape. This is preferable to cropping since Midjourney will attempt to keep the main image subject within the bounds of the aspect ratio.
- Quality (--q <0.25, 0.5, 0.75, 1.0, 2.0>) - Affects how many passes the GPU makes refining the image to match the prompt. Lower values will complete faster and cost you less time. Quality 2 is good for when you want the best possible output once you feel confident in your prompt.
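Since each option has a fixed range, it can help to check values before sending a prompt. The following Python sketch formats the four options listed above and rejects out-of-range values. The ranges are the ones from this list, and `build_options` and `ALLOWED_QUALITY` are names I made up for illustration, not part of any Midjourney API.

```python
from typing import Optional, Tuple

# Quality values as listed above (assumed valid for the model in use).
ALLOWED_QUALITY = (0.25, 0.5, 0.75, 1.0, 2.0)


def build_options(
    style: Optional[int] = None,
    chaos: Optional[int] = None,
    aspect: Optional[Tuple[int, int]] = None,
    quality: Optional[float] = None,
) -> str:
    """Return a string such as '--s 250 --c 20 --ar 16:9 --q 2'."""
    opts = []
    if style is not None:
        if not 0 <= style <= 1000:
            raise ValueError("style must be between 0 and 1000")
        opts.append(f"--s {style}")
    if chaos is not None:
        if not 0 <= chaos <= 100:
            raise ValueError("chaos must be between 0 and 100")
        opts.append(f"--c {chaos}")
    if aspect is not None:
        width, height = aspect
        if width <= 0 or height <= 0:
            raise ValueError("aspect ratio parts must be positive")
        opts.append(f"--ar {width}:{height}")
    if quality is not None:
        if quality not in ALLOWED_QUALITY:
            raise ValueError(f"quality must be one of {ALLOWED_QUALITY}")
        # The :g format drops a trailing ".0", so 2.0 is written as "2".
        opts.append(f"--q {quality:g}")
    return " ".join(opts)


if __name__ == "__main__":
    prompt = "a steampunk airship over a foggy harbor"
    print(f"{prompt} {build_options(style=250, chaos=20, aspect=(16, 9), quality=2.0)}")
```

Options left as None are simply omitted, so Midjourney falls back to its own defaults for them.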