Please note - this guide will no longer be updated here, and is now maintained at https://www.notion.so/theally/The-Ally-s-SD-WD-NAI-Model-Guide-de10c88e81e7456c82245663e2b06f10
The .ckpt model is available for download here;
Why haven't I remixed this model with SD 1.5? Two reasons. One, I never knew the original merge ratio of SD 1.4 and Waifu Diffusion, and two, Voldy/Auto1111's UI no longer supports Sigmoid interpolation merges. It could all be worked out, but... no time.
Note that Danbooru tags DO NOT need underscores for spaces, as I have them written below. We found that it makes zero difference; skip em.
This model is a checkpoint merge of Stable Diffusion 1.4, Waifu Diffusion 1.2 (ratio unknown), and a Sigmoid interpolation (0.5 strength) of NovelAI's model.
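For anyone curious what a Sigmoid interpolation merge amounts to, here is a minimal sketch. It assumes the old merge mode was a smoothstep easing of the strength value applied before a linear blend; I have not confirmed that against the UI's source, so treat the formula as an assumption. The function names are made up for illustration, and plain floats stand in for the checkpoint tensors.

```python
def weighted_sum(a: float, b: float, alpha: float) -> float:
    """Plain linear interpolation between two weights."""
    return (1.0 - alpha) * a + alpha * b


def sigmoid_merge(a: float, b: float, alpha: float) -> float:
    """Smoothstep-eased interpolation (assumed form of the 'Sigmoid' mode)."""
    eased = alpha * alpha * (3.0 - 2.0 * alpha)
    return a + (b - a) * eased


def merge_state_dicts(theta_a: dict, theta_b: dict, alpha: float) -> dict:
    """Merge weights key by key; keys missing from theta_b are copied from theta_a."""
    return {
        key: sigmoid_merge(value, theta_b[key], alpha) if key in theta_b else value
        for key, value in theta_a.items()
    }


if __name__ == "__main__":
    # Toy "checkpoints" with one shared key and one key unique to the first model.
    model_a = {"layer.weight": 0.0, "layer.bias": 2.0}
    model_b = {"layer.weight": 1.0}
    print(merge_state_dicts(model_a, model_b, alpha=0.5))
```

Under that assumption, a strength of 0.5 lands on the same midpoint as a plain 50/50 weighted sum; the two modes only diverge at other strengths.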
NovelAI's model is trained on a dataset of images from Danbooru, a 2D Hentai art site (NSFW!). This is significant because Danbooru images are categorized with Danbooru Tags, keywords describing every aspect of an image, including clothing, style, and pose. These Danbooru tags can be referenced in your prompt to great effect. The tags are very specific, and can be woven into the natural flow of a standard Stable Diffusion prompt to fine-tune your image.
Some things this merge does better than the standard SD model are;
- Natural bare feet
- Shoes, including high heels and boots
- Complex poses, including (squatting), (looking_at_viewer), etc
- Realism (while trained on anime/hentai images, it is perfectly capable of outputting realistic human faces and bodies)
There are over 20,000 Danbooru tags, and ~80% of those tested so far have a marked effect when added to a prompt.
To use a tag in your prompt, reference the tag as it is spelled in the tag search. The search shows multi-word tags with underscores, which is how they are written in this guide, though as noted at the top, plain spaces work just as well.
Additionally, Danbooru tags benefit from emphasis ( ) in the prompt; they are keywords to enhance specific elements of your image, and should stand out. () increases emphasis on a term by a factor of 1.1, and [] decreases it by the same factor. You can stack ()/[] to push the emphasis on a particular keyword further up/down, so ((waterfall)) is weighted 1.1 x 1.1 = 1.21.
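To make the stacking arithmetic concrete, here is a small sketch that works out the multiplier implied by nested brackets around a single keyword. `emphasis_weight` is a hypothetical helper written for this guide, not part of any UI; it only counts matching outer brackets and assumes the 1.1 factor described above.

```python
def emphasis_weight(token: str, factor: float = 1.1) -> float:
    """Return the multiplier implied by stacked () or [] around one keyword."""
    closers = {"(": ")", "[": "]"}
    depth = 0
    text = token.strip()
    # Peel off one matching pair of outer brackets at a time.
    while len(text) >= 2 and text[0] in closers and text[-1] == closers[text[0]]:
        depth += 1 if text[0] == "(" else -1
        text = text[1:-1]
    return factor ** depth


if __name__ == "__main__":
    for token in ["waterfall", "(waterfall)", "((waterfall))", "[waterfall]"]:
        print(token, round(emphasis_weight(token), 3))
```

Each extra pair of () multiplies by 1.1 again, and each pair of [] divides by 1.1, so the effect compounds quickly after two or three layers.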
First, an image created purely using Danbooru tags;
(1girl), (hair_ribbon), (side_ponytail), (floral_print) (crop_top), (simple_background)
Now, we take a more standard SD prompt, and incorporate the same Danbooru tags to add those specific elements;
hyperrealistic (1girl) portrait of Shakira with a (hair_ribbon) and (side_ponytail) wearing a (floral_print) (crop_top), on a (simple_background), photo realistic, artstation, 4k, award winning, art by greg rutkowski
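Both prompts above wrap each Danbooru tag in parentheses. If you keep your tags in a list, a tiny helper can do the wrapping for you. This is just a sketch; `format_tag` and `tag_prompt` are names invented for illustration, and the `keep_underscores` switch reflects the note at the top that spaces work as well as underscores.

```python
def format_tag(tag: str, emphasis: int = 1, keep_underscores: bool = True) -> str:
    """Wrap one tag in `emphasis` layers of parentheses."""
    name = tag.strip()
    if not keep_underscores:
        name = name.replace("_", " ")
    return "(" * emphasis + name + ")" * emphasis


def tag_prompt(tags, emphasis: int = 1, keep_underscores: bool = True) -> str:
    """Join formatted tags into a comma-separated prompt fragment."""
    return ", ".join(format_tag(t, emphasis, keep_underscores) for t in tags)


if __name__ == "__main__":
    tags = ["1girl", "hair_ribbon", "side_ponytail", "simple_background"]
    print(tag_prompt(tags))
    print(tag_prompt(tags, keep_underscores=False))
```

The output can be pasted in as a tag-only prompt, or individual `format_tag` results can be dropped into a longer natural-language prompt.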
The tags also work for landscape images to good effect;
beautiful landscape with a (mountainous_horizon), (light_rays), ((waterfall)), magnificent, luxury, detailed, sharp focus, low angle, high detail, volumetric, illustration, cold lighting, by jordan grimmer and greg rutkowski, trending on artstation, pixiv, Canon EOS 5D1
The NovelAI model that powers their site uses a CFG scale of 10, the Euler a sampler, and 20-30 steps, and these settings carry over to my model to good effect. Lower CFG and step values can also produce impressive results.
NovelAI's default negative prompt is as follows, and you can add to it as required:
lowres, bad anatomy, bad hands, text, error, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality, normal quality, jpeg artifacts, signature, watermark, username, blurry
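If you script your generations, the settings and negative prompt above can be bundled into one request body. The sketch below only builds a dictionary and sends nothing. The field names (`negative_prompt`, `cfg_scale`, `sampler_name`, and so on) are my assumption of what a web UI txt2img API expects, so check them against your own install. The default of 25 steps is an arbitrary pick from the 20-30 range.

```python
import json

# NovelAI's default negative prompt, copied from the guide above.
NEGATIVE_PROMPT = (
    "lowres, bad anatomy, bad hands, text, error, missing fingers, "
    "extra digit, fewer digits, cropped, worst quality, low quality, "
    "normal quality, jpeg artifacts, signature, watermark, username, blurry"
)


def build_txt2img_payload(prompt, steps=25, cfg_scale=10.0,
                          sampler_name="Euler a", extra_negative=""):
    """Assemble generation settings; key names are assumed, not verified."""
    negative = NEGATIVE_PROMPT
    if extra_negative:
        # Append user additions to the default negative prompt.
        negative = f"{negative}, {extra_negative}"
    return {
        "prompt": prompt,
        "negative_prompt": negative,
        "steps": steps,
        "cfg_scale": cfg_scale,
        "sampler_name": sampler_name,
    }


if __name__ == "__main__":
    payload = build_txt2img_payload("(1girl), (hair_ribbon), (side_ponytail)")
    print(json.dumps(payload, indent=2))
```

Pass `extra_negative` to add your own terms to the default negative prompt instead of replacing it.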
Danbooru has an AI Tag Search feature, which shows example images for some of the top tags, generated by AI models (not this particular model). Beware, super NSFW.