LoRA Training Toolkit

Introduction

This toolkit is designed to provide an accessible introduction to local AI image generation and training using open-source tools. It is aimed at cultural practitioners, researchers, artists, and others new to machine learning who wish to explore how generative AI models, specifically Stable Diffusion and Low-Rank Adaptation (LoRA) models, can be used in creative and academic contexts.

This toolkit is part of the ‘Democratising Artificial Intelligence through Culture: Making Generative AI Participatory and Intersectional Through an AI of the Commons’ publication by Seoyoung Choi and Miro Leon Bucher, developed through a research project supported by ifa – Institut für Auslandsbeziehungen. The study is available at: https://culturalrelations.ifa.de/en/research/results/democratising-ai-through-culture/ and https://doi.org/10.17901/1571.

The aim of this toolkit is to provide practical knowledge backed by a basic understanding of the underlying workings of AI.

This toolkit aims to equip you with both the practical tools and conceptual understanding necessary to engage meaningfully with local, open-source AI image generation and LoRA training. From installing Stability Matrix and ComfyUI, to building your own dataset, training a LoRA model using OneTrainer, and finally applying it through hands-on workflows, each step has been designed to demystify the process and empower you as a critical, creative user of AI technologies.

The toolkit places particular emphasis on transparency, local control, and ethical reflection. All software used is free, open-source, and designed to run locally, ensuring users maintain control over their data and workflows at all times. While many AI tools today rely on commercial application programming interfaces (APIs) or cloud services, this guide intentionally avoids those, favouring self-contained tools that respect user privacy and provide insight into how generative AI models actually function.

By following the basic technical explanations and step-by-step instructions to gain practical experience with local AI models, we want to help readers develop a well-grounded critical view of AI. Through this process, we hope to show that users from diverse backgrounds can become engaged actors in shaping AI, questioning the top-down approach of many commercial AI tools.

LoRA Training Toolkit: Understanding and Utilising AI Image Generation and Model Training

Hardware Requirements

While this toolkit is designed to be as accessible as possible (relying exclusively on free and open-source software), running AI models locally still requires at least mid- to high-range consumer hardware. Unlike web-based AI tools that offload processing to cloud servers, local tools like Stability Matrix perform all computations on your own device. This ensures privacy and transparency but also means your computer needs to meet specific minimum requirements.

Minimum System Requirements (as defined by Stability Matrix)

Operating System: Windows 10/11, macOS (Apple Silicon), or Linux

Solid State Drive (SSD): minimum 20 GB of free space

Graphics Card (GPU): NVIDIA (4 GB+ VRAM recommended); AMD (4 GB+ VRAM, experimental support only); Apple Silicon (M1/M2/M3 chips and newer)

Memory (RAM): minimum 12 GB

Note: Apple devices must have an Apple Silicon (M-series) chip. Older Intel-based Macs are not supported.

Performance Considerations

While the minimum specifications listed above are sufficient to run the tools introduced in this toolkit, your experience will vary significantly depending on your hardware, especially regarding speed.

For example, a mid-range laptop with an NVIDIA GeForce RTX 4060 Laptop GPU might take several minutes to generate a single high-resolution image. On the other hand, a high-end desktop with an NVIDIA GeForce RTX 4090 can produce the same image in a few seconds. Further, training a Low-Rank Adaptation (LoRA) model might take several hours on a mid-range device, potentially requiring overnight runs. The same training process could be completed on a high-end desktop in 30 to 60 minutes.

Since the described tools are all free to use, you can follow the ensuing descriptions and see how your hardware performs.

Safety Notice

This toolkit relies exclusively on free and open-source software, which is designed to run on local computers and does not transmit data to external servers by default. This local-first approach supports transparency and allows anyone to inspect the source code to identify potential security concerns.

While we have carefully selected tools that are widely used and developed by trusted contributors in the AI community, we must clarify that the authors of this toolkit do not possess formal IT security certifications. Therefore, we cannot guarantee the absolute safety or integrity of the software included, especially considering that these tools receive frequent updates. As with any software, there is always a risk that future versions could include harmful or malicious code.

To reduce potential risks, we strongly recommend the following:

Please note that by using this toolkit, you accept full responsibility for any consequences that may arise. The authors and the publisher of this toolkit (Institut für Auslandsbeziehungen - ifa) cannot be held liable for any issues related to the use or misuse of the tools described herein. Use this toolkit at your own risk.

How Does Stable Diffusion Work?

Before diving into the practical steps of image generation or model training, it is important to understand what Stable Diffusion is and how it works. While this overview is intentionally simplified for the purposes of this toolkit, it will help you develop a basic understanding of image generation and LoRA training. Please refer to the foundational research papers listed below for a more technical and in-depth explanation.

What Is Stable Diffusion?

Stable Diffusion is a type of text-to-image generative AI model. It belongs to a broader class of machine learning systems known as diffusion models. These models can produce new, high-quality images based solely on a written description, known as a text prompt.

Two common AI models that have proven to be popular over the last few years are Stable Diffusion 1.5 (SD 1.5) and Stable Diffusion XL (SDXL).

Both models follow the same general diffusion process, with SDXL incorporating improvements in fidelity, resolution, and prompt interpretation (see Rombach et al., 2021; Podell et al., 2023).

AI Model Training and Image Generation

Even if your goal is to generate images, understanding the training process provides essential insight into how and why the outputs appear the way they do.

At a high level, the training of a diffusion model like Stable Diffusion proceeds as follows:

  1. The model is trained on a large dataset of image-text pairs (e.g., "photo of a cat" paired with an actual photo of a cat).
  2. During training, noise is progressively added to each image over a fixed number of steps (let us say 30 for simplicity).
  3. The process starts with the original image at step 0 and ends with a completely noised-out image at step 30.
  4. At each step, the model learns to predict the image's previous (less noisy) version, conditioned on the text description.
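The noising procedure in the steps above can be sketched in a few lines of Python. This is a deliberately simplified linear blend between image and noise (real diffusion models use carefully designed noise schedules), intended only to illustrate the idea:

```python
import numpy as np

def add_noise(image, step, total_steps=30, rng=None):
    """Blend an image with random noise.

    step=0 returns the original image; step=total_steps returns pure
    noise. A linear blend is a simplification of the schedules real
    diffusion models use, but it shows the idea of progressive noising.
    """
    rng = rng or np.random.default_rng(0)
    noise = rng.standard_normal(image.shape)
    t = step / total_steps  # fraction of noise, from 0.0 to 1.0
    return (1.0 - t) * image + t * noise

# A tiny stand-in "image": a 4 by 4 grid of pixel values.
image = np.ones((4, 4))
step_0 = add_noise(image, 0)    # identical to the input image
step_30 = add_noise(image, 30)  # pure noise, no trace of the input
```

At step 0 the function returns the image unchanged, and at step 30 only noise remains, mirroring points 2 and 3 above.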

Let us consider an example: at step 15, the photo of the cat is roughly half obscured by noise. Given this noisy version and the caption "photo of a cat," the model learns to predict the slightly less noisy version from step 14.

Once trained, the model can reverse this process. Instead of starting with an actual image, it starts with pure random noise. Based on a text prompt, e.g., "photo of a cat," it predicts what a less noisy image might look like at each step, gradually building a coherent image from pure noise.

This step-by-step denoising, from pure noise back to an image guided by the text prompt, is the core of how Stable Diffusion generates images. Understanding this process is especially helpful when working with tools like ComfyUI or training your own LoRA models. Many of the settings you will encounter, such as steps, sampler, denoise, and classifier-free guidance (CFG) scale, relate directly to the concepts described above.
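The reverse process can be sketched the same way. Here the trained network is replaced by a toy function that simply nudges the canvas toward a known target; a real model predicts this direction from learned weights and the text prompt, not from a known answer:

```python
import numpy as np

def toy_denoiser(noisy, target):
    # Stand-in for the trained network: move the image a little toward
    # what the model "believes" the prompt describes.
    return noisy + 0.2 * (target - noisy)

rng = np.random.default_rng(42)
target = np.full((4, 4), 0.8)        # pretend this is "photo of a cat"
image = rng.standard_normal((4, 4))  # start from pure random noise

for step in range(30):  # 30 denoising steps, as in the example above
    image = toy_denoiser(image, target)
# After 30 small steps, the canvas has converged close to the target.
```

Each pass removes a little of the remaining randomness, which is exactly the step-by-step denoising described above.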

In the next section, we will begin exploring the software environment used to work with Stable Diffusion, starting with Stability Matrix and its integration of ComfyUI, the visual interface where you will generate and experiment with images, workflows, and LoRA models. We then move on to the OneTrainer software and learn how to train our own AI model. Finally, we compare the images generated with and without our locally trained LoRA.

Stability Matrix

Before diving into training your own models, it is helpful to explore how image generation works in practice. This section guides you through generating your first images using Stable Diffusion XL via the Stability Matrix platform. The aim is to help you develop an initial understanding of how the model behaves, especially identifying any biases it may reveal in the specific context or domain you are working with.

Getting Started

Stability Matrix is a free and open-source platform that serves as a central hub for various AI tools, including tools for image generation and model training. It was developed by LykosAI, a group that has become a generally trusted source within the AI community.

There are two ways to run Stability Matrix:

Option 1: Building from Source (Advanced Option)

If you or your institution's IT administrator have the necessary technical expertise, you can build Stability Matrix directly from its source code. This approach is considered more secure because it allows you to review every line of the code. However, it requires advanced technical knowledge and is not necessary for most users.

Option 2: Using a Prebuilt Version (Recommended for Most Users)

For easier access, you can download a prebuilt version of Stability Matrix, available for Windows, macOS, and Linux. These are distributed as ZIP or DMG files through the official channels:

Note on Storage: While Stability Matrix itself is not very large, additional AI models and extensions, as well as the generated images, can quickly add up to tens or even hundreds of gigabytes. Thus, make sure that you have enough free disk space. Alternatively, you can run Stability Matrix from an external Solid State Drive (SSD) with a sufficiently high-speed connection to your computer. A common commercial option is the "SanDisk Extreme Pro Portable SSD" series, with which this toolkit has been tested.

Installation Instructions

  1. Download the ZIP file for your operating system.
  2. Unpack (extract) the ZIP file into a regular folder using your system's standard procedure.
  3. After unpacking, you may delete the original ZIP file.
  4. Important: Do not try to run the application directly from the ZIP file; always run it from the extracted folder.

Running the Application

Figure A1: Microsoft Defender warning.

Figure A2: Decline Microsoft Defender warning.

Initial Setup

  1. When Stability Matrix launches, you will be prompted to select a setup mode. We strongly recommend selecting "Portable Mode." This mode keeps all files self-contained in one folder, making it easier to uninstall later. You can simply delete the folder to remove the program entirely. If you do not select this option, Stability Matrix will be installed directly on your system.
  2. Click "Continue".
  3. The next screen will ask for permission to collect analytics. This is not required, and we recommend choosing "Don't Share Analytics".
  4. You will then be asked which Stable Diffusion WebUI (user interface) you want to install. While multiple options are available, this toolkit will focus on ComfyUI, so you may choose ComfyUI directly here.
  5. You will then be asked to select models for download. You can skip this step for now. Proceed without selecting any models and click "Download." This will begin installing only the selected user interface (ComfyUI in our case), which may take some time, depending on your internet connection and hardware.
  6. While ComfyUI is downloading and installing, you can click the "Hide" button to minimise the download screen, which keeps the installation running in the download tab at the bottom left of the UI.

Figure A3: Stability Matrix Portable Mode setup.

Figure A4: Available interfaces in Stability Matrix.

Navigating Stability Matrix

While ComfyUI is downloading and installing in the background, you can begin familiarising yourself with the Stability Matrix interface. Stability Matrix is a user-friendly environment that organises various tools and resources for working with AI models, including Stable Diffusion. You can also use this time to explore and download other models you may wish to use later in your image generation.

The following section provides a brief overview of each key menu within the Stability Matrix interface. They are located on the left side of the window. By clicking the three lines (hamburger menu icon), you can expand the menu and read its sections' titles.

Packages

The Packages section lists all of the tools and applications you have installed within Stability Matrix. This includes essential components such as ComfyUI (a flexible image generation interface) and OneTrainer (used for training and fine-tuning models like LoRA). The list will be empty until your ComfyUI installation is completed. Here, you can also install additional tools by clicking the "Add Packages" button. Clicking it opens a list of available tools subdivided into an "Inference" and a "Training" list: inference tools are primarily for image generation, while training tools are for AI model training.

Inference

The Inference tab allows you to generate images inside Stability Matrix itself. While creating images within this menu is possible, we recommend using ComfyUI instead, as it offers a more flexible and powerful interface.

Checkpoint Manager

This menu contains all the AI models you have downloaded so far. These include:

The Checkpoint Manager helps you keep track of the different models available to your tools.

Model Browser

In the Model Browser, you can download additional AI models from two sources: CivitAI, a large community platform for sharing models, and Hugging Face, a widely used repository for open-source AI models.

To download a model:

  1. Select it by clicking the checkbox.
  2. Click the "Import" button that appears in the bottom right corner.

This will install the selected model directly into your Stability Matrix environment.

Output Browser

This section displays the images you generate in your inference tool of choice, such as ComfyUI. The organisation of output files depends on the settings of the package you are using. If your images do not appear here, you can also locate them manually in the Stability Matrix folder on your computer, which we will cover in the ensuing section.

Workflows

This menu contains community-made ComfyUI workflows, which are pre-configured setups for specific image generation tasks. We will not use this feature for the purposes of this toolkit. However, it can be helpful for more advanced users who want to automate or share their processes.

Navigating the Stability Matrix Folder Structure

Understanding the folder structure used by Stability Matrix is key when working with AI models, generating images, or making custom adjustments to your setup. This section provides a top-level overview of the main directories and their functions.

When referring to the "Data" folder below, we mean the one located inside the directory where Stability Matrix is installed or launched from.

The following is a simplified representation of the folder structure as seen on a Windows installation. The macOS structure should be identical.

StabilityMatrix-win-x64 (Main Folder)

Folder Explanations

Getting Started with ComfyUI and Generating the First Images

Launching ComfyUI for the First Time

Once ComfyUI has been successfully installed through Stability Matrix and you have downloaded a base model (such as Stable Diffusion XL, available via the Hugging Face tab described earlier), you are ready to start working with ComfyUI.

Step-by-Step Launch Instructions

  1. Open the "Packages" tab: From the left-hand menu in Stability Matrix, navigate to the Packages section. Here, you should see ComfyUI listed among the installed tools.
  2. Launch ComfyUI: Click the "Launch" button on the ComfyUI card. This will open a terminal window inside of Stability Matrix.
  3. Wait for the server to start: After a short series of startup messages, the terminal should display: "To see the GUI go to: http://127.0.0.1:8188". This message indicates that ComfyUI is now running on a local server on your computer.
  4. Open the ComfyUI Interface: You can now click the "Open Web UI" button on top of the terminal in Stability Matrix. This opens the ComfyUI interface in your default web browser. Alternatively, you can manually open your web browser of choice and open the link displayed in the terminal (e.g. http://127.0.0.1:8188).

Important: Although ComfyUI opens in your browser, it does not require an internet connection and is not online by default, as of this writing. It is hosted locally on your machine via the IP address 127.0.0.1 (also called localhost), meaning that all processing stays on your device.

Understanding the ComfyUI Interface

When ComfyUI opens in your browser for the first time, it will automatically load a default workflow. ComfyUI operates on the concept of workflows, which are composed of nodes connected with each other. Each node performs a specific function in the image generation process. You can think of this system as a flowchart that visually maps out how your image is created.

Key Concepts

Generating the First Image

Once ComfyUI is running, you are ready to generate your first image using the default tools provided. This section will walk you step-by-step through the process of loading a basic workflow, understanding its individual components (nodes), and producing an image using a Stable Diffusion model.

Loading a Default Workflow

If no workflow is visible when ComfyUI starts, or if you want to return to the original setup after making changes, you can reload a template by following these steps:

  1. Click on "Workflow" in the top menu of the ComfyUI interface.
  2. Select "Browse Templates" from the dropdown menu.
  3. Choose the first example called "Image Generation".

This template loads a basic image generation workflow. We will now explore the different nodes included in this setup and explain how they contribute to generating an image.

Figure A5: Workflow menu inside ComfyUI.

Figure A6: Workflow Templates inside ComfyUI.

Understanding the Nodes in the Default Workflow

1. Load Checkpoint

This node is responsible for loading the checkpoint model used to generate your image. You can select the model using the dropdown menu labelled "ckpt_name". In our case, choose Stable Diffusion XL or Stable Diffusion 1.5, depending on what you downloaded earlier. The models listed here are pulled from the directory "Data\Models\StableDiffusion". This node connects to four others. We continue with the CLIP outputs leading to the next node.

2. CLIP Text Encode (Prompt) Nodes

These nodes transform your text prompts into a format the AI model can understand. The positive prompt describes what you want to see in the image, while the negative prompt lists elements the model should avoid.

Important Tip: Do not use negated phrases like "no clouds." Instead, list the undesired elements, like "clouds", as a positive expression in the negative prompt. Stable Diffusion works by generating two images: one based on the positive prompt, and one based on the negative. It then subtracts the features of the negative image from the positive one. This prompt-negation technique applies specifically to models like Stable Diffusion 1.5 and Stable Diffusion XL, though other models may use different mechanisms.

Both CLIP nodes produce orange "CONDITIONING" outputs connecting to the KSampler node.

3. Empty Latent Image

This node provides the starting noise for the generation process, which is referred to as a latent image.

In ComfyUI, you can technically enter any image resolution you like when configuring the Empty Latent Image node. However, in practice, the resolution you choose has a significant impact on the quality and coherence of the generated image and is directly influenced by how the underlying AI model was trained.

Each Stable Diffusion model is trained on a large dataset of images, but crucially, this dataset is often standardised to a fixed resolution. The model learns to recognise and reconstruct image content within that specific resolution space. Because of this, the model becomes particularly good at producing outputs that match the resolution it was trained on, and it can struggle or produce unexpected artifacts when asked to generate images at different resolutions. For example, Stable Diffusion 1.5 was trained primarily on 512 by 512 pixel images, while Stable Diffusion XL was trained on 1024 by 1024 pixel images.

Matching your image resolution to the resolution the model was trained on is highly recommended when using these models, especially when working on projects that require consistency, high quality, or realism.

What happens when you do not match the resolution? Let us assume you are working with Stable Diffusion 1.5, which expects square images at 512 by 512 pixels. The model may start producing unexpected outputs if you enter a non-square resolution, such as 512 by 1024 pixels. A common artifact in this situation is the generation of duplicated elements, such as two heads stacked on top of each other in a portrait. This happens because the model, simply put, tries to tile or mirror its learned 512 by 512 pixels structure to fill the larger canvas, effectively treating your new resolution like multiple smaller, square images placed together.

It is worth noting that not all non-square resolutions cause apparent "errors". The models tend to be more forgiving when the chosen resolution approximates a 1:1 aspect ratio, even if it is not exact. For instance, a 4:5 ratio often still produces acceptable results, especially useful for applications like social media, where such dimensions are common.

However, the most reliable results will mostly come from generating images at the native resolution the model was trained on.

Summary of Recommended Resolutions

Stable Diffusion 1.5: 512 by 512 pixels

Stable Diffusion XL: 1024 by 1024 pixels

When working with either of these models in ComfyUI, you should ideally set your Empty Latent Image resolution to 512 by 512 pixels when using SD 1.5 and 1024 by 1024 pixels when using SDXL.
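Under the hood, the Empty Latent Image node does not allocate pixels at all: both SD 1.5 and SDXL work in a latent space compressed by a factor of 8 per side with 4 channels. A small helper makes the relationship between pixel resolution and latent size concrete (the helper function itself is ours, not part of ComfyUI):

```python
def latent_shape(width, height, channels=4, factor=8):
    """Map a pixel resolution to the latent tensor shape (c, h, w).

    SD 1.5 and SDXL compress images by a factor of 8 per side into a
    4-channel latent space, which is what the Empty Latent Image node
    actually allocates.
    """
    if width % factor or height % factor:
        raise ValueError("width and height should be multiples of 8")
    return (channels, height // factor, width // factor)

print(latent_shape(512, 512))    # SD 1.5 native: (4, 64, 64)
print(latent_shape(1024, 1024))  # SDXL native: (4, 128, 128)
```

This is also why resolutions are usually entered in multiples of 8 pixels.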

The situation is gradually changing with the development of more advanced models like Stable Diffusion 3 or FLUX. These models are trained on more diverse datasets that include images of various sizes and aspect ratios. As a result, they are more flexible and can better handle custom resolutions. That said, even these newer models typically have a preferred resolution range where their performance is most consistent. When in doubt, it is a good idea to consult the model documentation or community recommendations to determine the ideal dimensions.

4. KSampler

The KSampler is where the actual image generation occurs. It takes its inputs from the nodes described above: the MODEL output of the Load Checkpoint node, the positive and negative CONDITIONING outputs of the two CLIP Text Encode nodes, and the latent from the Empty Latent Image node.

Moving on to the parameters inside of the KSampler node, we go through all settings from top to bottom:

Seed

The seed defines the starting noise pattern: If you generate an image twice using the same model, same resolution, same prompt, and same seed, the resulting image will be identical. If you change the seed-even by a single digit-you will get a different image, though it may be stylistically or compositionally similar, depending on how much the rest of the configuration stays the same.

Because there is an enormous number of possible seed values, a single AI model trained on a finite dataset can generate a practically unlimited variety of images, although the variety is ultimately bounded by real-world constraints, such as the finite number of possible colour values per pixel.
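The determinism described above is easy to demonstrate. ComfyUI seeds PyTorch's random number generator, but the principle is the same with any seeded generator, so NumPy stands in here:

```python
import numpy as np

def starting_noise(seed, shape=(4, 64, 64)):
    # The initial latent noise is fully determined by the seed.
    return np.random.default_rng(seed).standard_normal(shape)

a = starting_noise(1234)
b = starting_noise(1234)  # same seed: identical noise, identical image
c = starting_noise(1235)  # new seed: different noise, different image

assert np.array_equal(a, b)
assert not np.array_equal(a, c)
```

With every other setting held constant, identical starting noise leads to an identical image.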

Control After Generation

The second parameter in the KSampler node controls what happens to the seed value after an image is generated. By default, this parameter is set to "randomise". This means that ComfyUI will automatically insert a new, randomly generated seed value each time you run the workflow. As a result, every image you generate will be unique, even if all other settings remain unchanged. This behaviour is helpful for experimentation or when you want variety without manually adjusting any settings.

However, you can also choose different behaviours from the dropdown menu, depending on how much control or consistency you want in your outputs:
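The dropdown offers four behaviours: "fixed", "increment", "decrement", and "randomize". A sketch of what each mode does to the seed between runs (the exact range used for random seeds is our assumption, not taken from the ComfyUI source):

```python
import random

def next_seed(seed, mode):
    # Sketch of ComfyUI's "control after generation" behaviours.
    if mode == "fixed":
        return seed                     # keep the seed: reproducible runs
    if mode == "increment":
        return seed + 1                 # step through neighbouring seeds
    if mode == "decrement":
        return seed - 1
    if mode == "randomize":
        return random.randrange(2**32)  # fresh random seed (range assumed)
    raise ValueError(f"unknown mode: {mode}")

print(next_seed(100, "fixed"))      # 100
print(next_seed(100, "increment"))  # 101
```

"fixed" is useful when you want to tweak one parameter at a time while keeping everything else reproducible.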

Steps

The third parameter in the KSampler node is called "steps", which controls the number of denoising steps the algorithm performs during image generation. This is one of the most important parameters for determining the clarity and quality of your output. To understand this better, imagine that Stable Diffusion begins with a canvas full of random noise. Over the course of many small, calculated steps, it transforms that noise into a coherent image guided by your prompt. The "steps" parameter tells the model how many of those refinement steps it should take.

General Rule: More Steps = More Detail

However, this improvement does not increase linearly. That means doubling the steps does not necessarily double the image quality. In fact, past a certain point, too many steps can begin to degrade the image. If there is no noise left to remove, the model may start breaking down the image instead of refining it.

Typical Step Ranges (for SD1.5 and SDXL)

10-20 steps: quick preview; low detail or blurry

30-80 steps: standard use; good balance of speed and quality

These ranges are specific to Stable Diffusion 1.5 and Stable Diffusion XL. Other models require more or fewer steps depending on how they were trained and how their internal denoising algorithms function.

It is also important to note that increasing the number of steps will increase generation time, sometimes significantly. This is especially relevant if you work with slower hardware or generate multiple images in a batch. For testing and experimentation, it can be preferable to use a lower step count to get a sense of how the prompt behaves. For final outputs or production-quality images, you will usually want a higher number of steps while keeping an eye on processing time and image clarity.

CFG (Classifier-Free Guidance)

The CFG value, short for Classifier-Free Guidance, is a key parameter in how your prompt influences the final image. In simple terms, it controls how literally the model follows your prompt versus how freely it interprets it.

CFG Value Ranges: What They Mean

CFG 0-4: loose adherence; the image may not relate much to the prompt

CFG 4-10: balanced interpretation, allowing for variance while staying close to the prompt's content

CFG 10-15: strong adherence to the prompt, which can start to create artifacts

CFG 16+: over-guided, usually producing outputs with extreme colours and other artifacts

It is important to know that different models respond to CFG values differently. While SD 1.5 and SDXL work well with values between 3 and 15, other base or fine-tuned models, such as lightning-type models, Stable Diffusion 3, or FLUX, may require different CFG values or may not use the classifier-free guidance scale in the same way at all.

There is no universally perfect CFG value. It often depends on the prompt, the desired style, and the model you use. Ideally, try a few different CFG values while keeping the seed fixed to see the impact of this parameter in your specific context.
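Behind the scenes, every denoising step makes two predictions, one conditioned on your prompt and one unconditioned, and combines them using the CFG scale. A minimal NumPy sketch of this standard combination (the example vectors are arbitrary stand-ins for the model's predictions):

```python
import numpy as np

def cfg_combine(uncond_pred, cond_pred, cfg_scale):
    # Classifier-free guidance: push the prediction away from the
    # unconditional (empty-prompt) result, toward the prompted one.
    return uncond_pred + cfg_scale * (cond_pred - uncond_pred)

uncond = np.array([0.0, 0.0])  # model's guess with an empty prompt
cond = np.array([1.0, 0.5])    # model's guess with your prompt

low = cfg_combine(uncond, cond, 1.0)    # equal to cond: follow the prompt
high = cfg_combine(uncond, cond, 15.0)  # overshoots far beyond cond
```

This makes the value ranges above intuitive: large CFG values exaggerate whatever difference the prompt makes, which eventually produces the over-saturated artifacts described.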

Sampler and Scheduler

The sampler and scheduler settings are among the more technically advanced parameters in the KSampler node. These two components work together to define how the model reduces noise during the image generation process. When you start generating images, you can keep them at their default values. The following gives you a basic understanding of what changing these values does.

As previously mentioned in sub-section 5.3.2 of this toolkit, Stable Diffusion uses a process called denoising. The model begins with a canvas full of random noise and, over many small steps, transforms that noise into a coherent image based on your prompt.

The sampler defines the mathematical method of removing noise during each generation step. You can think of it as the model's strategy for shaping the image from the starting randomness. Different samplers take different approaches: Some prioritise speed and efficiency (e.g., for quick previews). Others focus on accuracy and finer details but may take longer or require more system resources. Thus, when you start experimenting with these values, you may find that generation times vary significantly, even if the number of steps stays the same. Two common samplers are Euler (the default value) and DPM++ 2M Karras. Euler is commonly associated with being fast, stable, and good for general-purpose generations, whereas DPM++ 2M Karras is associated with higher detail but longer render times.

As the name suggests, the scheduler defines how the sampling method is applied over the course of the denoising steps. Some samplers work better with a specific scheduler, and some models work better with a specific sampler and scheduler combination. The default value is "normal", which is usually a good starting point. Other options, such as "exponential", can significantly impact the look and content of the output image and are interesting to experiment with.
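To visualise the difference between schedules, the following sketch compares a linear ramp with an exponential (log-space) ramp of noise levels. The numbers and names are simplified stand-ins for ComfyUI's actual schedules, chosen only to show the difference in shape:

```python
import numpy as np

def sigma_schedule(steps, sigma_max=14.6, sigma_min=0.03, kind="normal"):
    # Illustrative noise schedules: how much noise remains at each step.
    if kind == "normal":  # simple linear ramp (simplified stand-in)
        return np.linspace(sigma_max, sigma_min, steps)
    if kind == "exponential":  # linear in log-space
        return np.exp(np.linspace(np.log(sigma_max), np.log(sigma_min), steps))
    raise ValueError(f"unknown schedule: {kind}")

linear = sigma_schedule(10, kind="normal")
expo = sigma_schedule(10, kind="exponential")
# Both start noisy and end almost clean, but the exponential schedule
# drops to low noise levels much earlier, spending more steps on detail.
```

Since the scheduler decides where the model spends its denoising effort, swapping it can noticeably change fine detail and texture even with the same sampler and step count.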

Some fine-tuned models give recommendations for sampler and schedule. Thus, it can be helpful to consult your model's description or documentation, if available.

Denoise

The denoise value is the final parameter in the KSampler node. It controls the amount of random noise used during image generation. When generating an image from a text prompt, the denoise value should be set to 1.0. This tells the model to start from 100 per cent random noise, which is necessary to construct an image entirely from scratch; in a text-to-image setup, setting denoise to any value less than 1.0 typically prevents the model from generating a coherent image. The situation is different when generating an image from a reference image in an image-to-image workflow. There, the denoise value defines how much of the reference image and how much random noise the KSampler uses: a value of 0.7 means the generation starts from 30 per cent reference image and 70 per cent random noise. To repeat, when generating an image from a prompt alone, as in this case, the value should be set to 1.0.
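The blend described above can be written out directly. This linear mix mirrors the toolkit's simplified 30/70 description; real implementations noise the reference according to the sampler's schedule and skip the corresponding early steps, but the intuition is the same:

```python
import numpy as np

def generation_start(reference, denoise, rng=None):
    # denoise=1.0 -> pure noise (text-to-image);
    # denoise=0.7 -> 70% noise, 30% reference (image-to-image).
    rng = rng or np.random.default_rng(0)
    noise = rng.standard_normal(reference.shape)
    return (1.0 - denoise) * reference + denoise * noise

ref = np.full((4, 64, 64), 0.5)       # stand-in for a reference latent
txt2img = generation_start(ref, 1.0)  # no trace of the reference remains
img2img = generation_start(ref, 0.7)  # reference still shines through
```

At denoise 1.0 the reference contributes nothing, which is why a full 1.0 is required for pure text-to-image generation.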

VAE Decode Node

The KSampler connects to the VAE Decode node. The VAE Decode is responsible for converting the generated image from the model's internal latent space into a standard pixel image that can be viewed and saved. The latent space is a compressed mathematical representation of the image. While it carries all the necessary information, it is not a finished image file of the kind we are used to working with, such as JPEG or PNG. Thus, we need this decoding step to work with the output. You can often see this transition visually, as the preview inside the KSampler might look grainy or have unusual colour tones, showing a version before the final decoding. After passing through the VAE Decode, the image appears more detailed and with higher colour accuracy. For this toolkit, you do not need to understand how a VAE works internally; you just need to know that this node is essential for producing a usable image.

Save Image Node

The Save Image node is the final step in the basic image generation workflow. It displays the final decoded image and automatically saves it to your system in the default output folder.

Within this node, you can set a custom filename for the image and choose a different save location if needed (important: this must be done before you run the workflow). This is useful if you want to organise outputs by theme, date, or experimental settings, especially when generating larger batches of images.

Sometimes, you may not want to save every image automatically. ComfyUI allows you to replace the Save Image node with a Preview Image node, so you can manually choose which images to save. The following steps explain how to change the Save Image node to the Preview Image node but also apply to any other situation in which you may want to add or change nodes in the workflow.

  1. Remove the Save Image node
  2. Insert a Preview Image node
  3. Connect the VAE Decode output to the Preview Image input

Now, when you run the workflow, the image will appear in the preview window inside the node. You can manually save the image by right-clicking the Preview Image node and selecting "Save Image". This method gives you more control over what gets saved, which is especially helpful when you are generating drafts or experimenting with settings.

Running the Workflow

Once your workflow is set up and ready, you can generate your image by clicking the "Run" button at the top of the ComfyUI interface.

Next to the Run button is a small numeric field that lets you set how many times the workflow should run in a row.

This is especially useful when you want to compare variations or batch-generate content without manually rerunning the workflow each time.

Summary

You now have all the essential tools to generate, refine, and save AI-generated images using ComfyUI and Stable Diffusion. This workflow provides a flexible foundation you can build on as you move into more advanced uses, such as applying your self-trained LoRAs. Before continuing with this toolkit, we recommend you generate a range of images that fit the context in which you want to train your own AI model. Take some time to reflect on the outcomes and what they say about the checkpoint model. With these reflections, you can move to the next section about preparing your own dataset.

Training an AI Model

Now that you have experimented with Stable Diffusion XL and gained a sense of how it visually interprets prompts, along with any potential biases, it is time to prepare your own dataset. This dataset will be the foundation for training your LoRA model and should reflect the kinds of images you eventually want the AI to generate.

Low-Rank Adaptation Models

LoRA, short for Low-Rank Adaptation, is a technique originally introduced by Hu et al. (2021) as a way to fine-tune large language models efficiently without having to modify all of their internal weights. Rather than updating the entire model, which could involve billions of parameters, LoRA adds a pair of small trainable matrices to selected layers. These matrices are much smaller than the full model and capture only the changes necessary to adapt it to a new task.

This technique, originally proposed for language models, was later adapted for Stable Diffusion by the AI community. Here, LoRA models likewise allow fine-tuning image generation models with far less computational effort than training foundation models.
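The idea can be made concrete with a small numerical sketch (layer size and rank are arbitrary example values, and NumPy stands in for a real training framework): the frozen weight W stays untouched, and only the two low-rank factors are trained.

```python
import numpy as np

d, k, r = 1024, 1024, 8                  # layer size and LoRA rank (example values)
rng = np.random.default_rng(0)
W = rng.standard_normal((d, k))          # frozen base weight, never updated
A = rng.standard_normal((r, k)) * 0.01   # trainable low-rank factor
B = np.zeros((d, r))                     # trainable factor, initialised to zero

W_adapted = W + B @ A                    # effective weight used at inference

full_params = d * k
lora_params = d * r + r * k
print(lora_params / full_params)         # 0.015625 -> ~1.6% of the full layer
```

Because B starts at zero, the adapted weight initially equals the base weight; training then only has to move A and B, a small fraction of the parameters of the full layer.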

For the purpose of this toolkit, you can imagine the LoRA model adding an aspect to or changing an aspect of the foundation model while still utilising its much more extensive dataset. If you want to train a LoRA model of your cat, you may choose a dataset representing your cat without having to train the model on other things, such as landscapes. When later generating the image, you can use the LoRA model to reproduce the likeness of your cat but have your cat sit at the beach. Even if your cat never sat at a beach, the foundation model is trained on beach images, allowing the combination of the two subject matters, leveraging both datasets. It is important to mention that this has many ethical implications and can be abused. While the toolkit itself does not provide the framework to discuss these ethical implications in detail, the following may provide a better technical and practical insight into the technique and inform readers about its beneficial and harmful potential.

Collecting Image Data

When collecting image data for your dataset, you can follow these considerations:

Image Tagging

Once your dataset is complete, the next crucial step is tagging, which links each image to descriptive text. These tags help the model learn which visual patterns correspond to which concepts or prompts. This step guides how your LoRA will respond during image generation.

Tagging Best Practices

There are no rigid rules for image tagging, but these general principles can be helpful considerations:

  1. Unique Activation Tag: Every image in your dataset should share one unique tag that does not appear in the base model's training data. This tag serves to 'activate' your LoRA during image generation (e.g., "xyz123"). This helps the LoRA to see all of your images in a mutual context and allows you to trigger it by mentioning this keyword in the prompt when you later generate images with your LoRA model.
  2. Consistent Labels: Tag similar features or styles in the same way across all images. This consistency helps the model recognise visual patterns. For example, if you have multiple cloudy scenes and want to tag them, use the same tag ("cloudy") for all images instead of tagging one as "cloudy," another as "misty," and another with "overcast."
  3. Component Separation: Use commas to separate individual elements (e.g., "person, cat, lamp"). Theoretically, you can also use other separators, but we will use the comma as a separator in OneTrainer later.
  4. Contextual Phrasing: When two or more elements are closely related, describe them as a unit (e.g., "human holding a cat" instead of "human, cat").
  5. Sentence Mixing: With Stable Diffusion XL, combining tags and full descriptive sentences often produces more accurate results.

Tip: For beginners or testing purposes, it may be enough to use only a unique activation tag (strategy 1). This simplifies the process while still allowing useful experimentation.

Tagging can be done manually, but software tools can make the process significantly easier. If you decide to tag manually, you have to create a .txt file for each image in your dataset. The .txt file name must exactly match the name of the corresponding image file (e.g., "cat001.jpg" and "cat001.txt"). Type your tags into the .txt file and save it. Repeat this process for all images. This can be extremely time-consuming, so using a software tool is recommended.
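If you do tag manually, a short script can at least create the files for you. The sketch below (folder name and tags are placeholder examples) writes one .txt file per image, each containing the same comma-separated tag list, which you can then edit per image:

```python
from pathlib import Path

def write_tag_files(dataset_dir: str, tags: list[str]) -> int:
    """Create a .txt tag file next to every image in dataset_dir,
    named exactly like the image (cat001.jpg -> cat001.txt).
    Returns the number of tag files written."""
    image_exts = {".jpg", ".jpeg", ".png", ".webp"}
    written = 0
    for image in sorted(Path(dataset_dir).iterdir()):
        if image.suffix.lower() in image_exts:
            image.with_suffix(".txt").write_text(", ".join(tags))
            written += 1
    return written

# Example: start every image with the unique activation tag.
# write_tag_files("my_dataset", ["xyz123", "cat"])
```

Starting from a shared tag list also helps keep labels consistent across the dataset (best practice 2 above).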

Tagging Tools

  • TagGUI (Windows): An open-source program for simple tagging.
  • CrossTag: A cross-platform visual tagging tool developed by the authors of this toolkit. It is open-source and supports macOS, Windows, and Linux.

Using CrossTag for Image Tagging

CrossTag is a lightweight tagging app created with Node.js and Electron.js, available as open-source software.

Installation

Tagging Workflow in CrossTag

  1. Open CrossTag and click the folder icon to select your dataset folder.
  2. Once the folder is loaded, CrossTag will:
  3. In the tagging interface:
  4. Additional Features:

Figure A7: CrossTag interface on initial start.

Figure A8: CrossTag interface after loading a folder.

Following the descriptions above, you should be able to create your dataset of image-text pairs stored in a folder on your local device.

OneTrainer

This section guides you through the final phase of the LoRA training process using OneTrainer, a graphical interface designed to simplify model training. You will learn how to set up, configure, and monitor the training process, including the use of sample images to evaluate progress. By the end of this section, you will have produced your own trained LoRA model.

Unfortunately, OneTrainer is not supported on macOS devices as of this writing. However, there are other macOS applications, such as Draw Things, that support LoRA model training. While the toolkit cannot cover all related tools and applications, the concepts from this section can be applied across a range of different software.

Getting Started

To install OneTrainer:

  1. Open the "Packages" menu in Stability Matrix, as described in Section 5.4. of this toolkit.
  2. Click the "Add Package" button at the bottom of the window.
  3. To find OneTrainer, switch to the "Training" list at the top of the window. In this list, you should find "OneTrainer". Click on it and choose "Install". This should run all the installation steps automatically.
  4. Once it has finished, you should find a new card in the "Packages" menu in Stability Matrix. There, you can start OneTrainer by clicking the "Launch" button.

Unlike ComfyUI, OneTrainer does not run in your browser. After the start script is executed in the Stability Matrix terminal, a new window should appear showing the OneTrainer user interface.

Figure A9: Adding a Package in Stability Matrix.

Figure A10: Installing OneTrainer in Stability Matrix.

Figure A11: Installed Packages in Stability Matrix.

OneTrainer is a powerful training environment that supports a variety of AI models and training strategies. It comes with a range of presets for different training goals, including fine-tuning, full model training, and LoRA training across multiple base models (e.g., SDXL, SD1.5, Flux).

Your choice of base model affects both training speed and final output quality. Consider the following options:

Important: Keep in mind that an SDXL LoRA can only be used in conjunction with an SDXL checkpoint model and not with any other, such as SD1.5 or Flux. The same applies to SD1.5 LoRAs, Flux LoRAs, and so on.

For this toolkit, we will use the preset "#sdxl 1.0 LoRA", which automatically applies suitable parameters for training a LoRA on the SDXL 1.0 model.

Note: If you wish to explore different base models (such as SD1.5 or Flux), you can select a different preset and still follow this guide broadly. Not all presets are fully optimised, but most produce functional results with minimal effort.

Navigating the OneTrainer Interface and Training Settings

Start by clicking the dropdown menu in the top left corner, which lists all presets available in OneTrainer. Select "#sdxl 1.0 LoRA" as mentioned above, or choose another preset if you prefer. If you use OneTrainer through Stability Matrix, all presets are stored in "Data\Packages\OneTrainer\training_presets".

Figure A12: Choosing the SDXL LoRA training preset in OneTrainer.

The interface is divided into several key tabs and menus:

General

In most situations, you can keep all settings in the "General" menu at their defaults. The only setting to consider is enabling "Continue from last backup" if you wish to resume training from a previous session.

Figure A13: The general menu in OneTrainer.

Model

The preset usually handles the model settings, so you should not need to modify them. If an error occurs, you can manually download the base model and select it in the "Base Model" field. Otherwise, OneTrainer should download the correct model automatically. You may also use this section to choose a custom name and storage location under "Model Output Destination".

Figure A14: The model menu in OneTrainer.

Data

The preset configures these settings automatically; they should not require any changes.

Figure A15: The data menu in OneTrainer.

Concepts

This is where you load your training dataset.

  1. Click "Add Concept".
  2. Ensure the new concept is enabled.
  3. Click the concept box that appears to configure it.

Figure A16: The concepts menu in OneTrainer.

Figure A17: Adding a new concept in the concepts menu in OneTrainer.

Concepts\General

Figure A18: Editing the general settings of a concept in OneTrainer.

Concepts\Image Augmentation

Figure A19: Editing the image augmentation settings of a concept in OneTrainer.

Concepts\Text Augmentation

Figure A20: Editing the text augmentation settings of a concept in OneTrainer.

Training

You can always stop the training manually before running through all epochs and evaluate the results using the saved backups.

Figure A21: The training menu in OneTrainer.

Sampling

This menu allows you to generate sample images during training to evaluate progress.

Figure A22: The sampling menu in OneTrainer.

Backup

Matching your sampling and saving schedule makes it easier to compare sample images with their respective LoRA files. By saving these intermediate training states, you can later see how the LoRA model changes across epochs. If you get the impression that quality starts decreasing before the end of training, for example before epoch 100, you can use one of the backed-up models instead of the final one.

Figure A23: The backup menu in OneTrainer.

Tools / Additional Embeddings / Cloud

LoRA Settings

These are automatically handled by the preset and usually do not require any changes.

Figure A24: The LoRA menu in OneTrainer.

Starting, Pausing and Resuming Training

Once you have reviewed all settings and set up your dataset, click the "Start Training" button on the lower left side of the window. The first time you do this, OneTrainer may begin by downloading the base model, so it may take some time before the actual training process starts. If you have to interrupt training, click the "Stop Training" button. OneTrainer is designed to support interrupted training, meaning:

When saving the configuration mid-training, name your config file after the current epoch for clarity (e.g., epoch-50.json).

To resume:

  1. Launch OneTrainer and load the saved configuration from the dropdown menu.
  2. Enable "Continue from last backup".
  3. Click "Start Training".

Understanding the Folder Structure

We introduced the folder structure of Stability Matrix earlier in Section 5.4.3. If you use OneTrainer via Stability Matrix, your files are stored in "Data\Packages\OneTrainer". Within that directory, you will find the following folders:

workspace\run\samples

workspace\run\backup

workspace\run\save

models

Using the Trained LoRA for Image Generation

Once training is complete:

  1. Select the LoRA version you wish to use from the save or the model folder, as described above.
  2. Copy the .safetensors file into your WebUI installation directory:
  3. The LoRA will now be available for use within your image generation interface, in our case ComfyUI.
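This copy step can also be scripted. A small sketch follows; the paths in the example are placeholders, so adjust them to your own Stability Matrix installation:

```python
import shutil
from pathlib import Path

def install_lora(src: str, lora_dir: str) -> str:
    """Copy a trained .safetensors file into the LoRA folder so the
    WebUI can pick it up. Returns the destination path."""
    source = Path(src)
    if source.suffix != ".safetensors":
        raise ValueError(f"expected a .safetensors file, got {source.name}")
    destination = Path(lora_dir) / source.name
    shutil.copy2(source, destination)
    return str(destination)

# Example (placeholder paths):
# install_lora("workspace/run/save/my_lora.safetensors", "Data/Models/Lora")
```

Keeping the original filename makes it easy to match the installed LoRA with its training run and backups.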

Generating Images with LoRAs in ComfyUI

If you followed the instructions above for generating an image with ComfyUI and Stable Diffusion, you should be able to run a workflow with a LoRA model without any issues.

Start by opening ComfyUI in Stability Matrix as described before. Once you have ComfyUI open in your browser, you can continue by loading the workflow:

  1. In the ComfyUI interface, go to the top menu and click "Workflow".
  2. Select "Browse Templates" from the dropdown menu.
  3. Choose the template labelled "Lora".

This will load a workflow that looks almost identical to the basic text-to-image workflow introduced earlier. The only difference is the inclusion of one additional node: Load LoRA.

Figure A6: Workflow Templates inside ComfyUI.

Figure A25: The default LoRA workflow inside ComfyUI.

How the LoRA Workflow Works

The "Load LoRA" node is inserted into the workflow to apply the LoRA model in addition to the base model.

Effectively, the LoRA model acts as a modifier layer, altering the generation process according to your training data.

Choosing Your LoRA Model

The first parameter inside the "Load LoRA" node allows you to select the specific LoRA model you want to use. If you have correctly placed your LoRA file into the designated folder "Data\Models\Lora", it should appear in the dropdown list automatically.

If the model does not show up, double-check that it is in the correct location and uses a compatible format (usually .safetensors).

LoRA Parameters

The Load LoRA node offers two key parameters:

1. Model Strength

2. Clip Strength

Both model strength and clip strength values can range from -100 to +100. However, most practical use cases fall within the range of 0.3 to 1.0.

  • 0.3 to 0.7: Medium influence of the LoRA, making it helpful in blending image contents
  • 0.8 to 1.0: Strong influence of the LoRA, emphasising the LoRA's traits
  • 1.0 to 3.0: Very strong influence of the LoRA, in case the LoRA's training data does not show very well in the generated image
  • Negative values: Reverse effect that is mostly useful for creative experimentation

There are no fixed rules for setting these values. The ideal configuration depends on:

The best approach is often to experiment:

Once you click "Run", the first image using your trained LoRA model should be generated.

A/B Testing the LoRA in ComfyUI

After training or applying a LoRA model, it is helpful to evaluate how much of an impact it actually has on image generation. One effective way to do this is A/B testing: generating a pair of images, one with the LoRA applied and one without.

This allows you to compare the results directly and determine whether the LoRA effectively influences the output in the intended way.

How to Set Up A/B Testing in ComfyUI

To make this easier, we (the authors of this report) have prepared a prebuilt workflow template that automates this process.

Steps:

  1. Download the workflow in JSON format from the GitHub repository of this toolkit.
  2. Open ComfyUI in your browser
  3. Drag and drop the JSON file into the workflow area or click on "Workflow" and select "Open". Select the JSON file.

This template includes two nearly identical node configurations inside one workflow: one using only the base model and one adding the LoRA model on top of the same base model.

Figure A26: Example for an A/B testing LoRA workflow inside ComfyUI.

For this comparison to be meaningful, all relevant generation parameters, such as the seed, prompt, sampler settings, steps, and CFG value, must remain the same across both parts of the workflow.

If these values differ, any output variation may be due to differences in these parameters and not the influence of the LoRA.
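Since ComfyUI workflows can be exported as JSON, this check can be automated. The sketch below assumes the API-format export, in which each node entry carries a "class_type" and an "inputs" dictionary; adapt the key names if your export differs:

```python
def ksampler_params_match(workflow: dict,
                          keys=("seed", "steps", "cfg", "sampler_name")) -> bool:
    """Return True if every KSampler node in an API-format ComfyUI
    workflow uses identical values for the given inputs."""
    samplers = [node["inputs"] for node in workflow.values()
                if node.get("class_type") == "KSampler"]
    if len(samplers) < 2:
        return True  # fewer than two samplers: nothing to compare
    reference = {k: samplers[0].get(k) for k in keys}
    return all({k: s.get(k) for k in keys} == reference for s in samplers)
```

Running this on the exported A/B workflow before generating confirms that any visual difference comes from the LoRA, not from mismatched sampler settings.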

Performing A/B tests helps you answer important questions about how strongly, and in what way, your LoRA influences the generated images.

Conclusion

This toolkit aims to equip you with both the practical tools and conceptual understanding necessary to engage meaningfully with local, open-source AI image generation and LoRA training. From installing Stability Matrix and ComfyUI, to building your own dataset, training a LoRA model using OneTrainer, and finally applying it through hands-on workflows, each step has been designed to demystify the process and empower you as a critical, creative user of AI technologies.

By focusing on locally run, transparent, and open-source systems, this guide encourages a move away from opaque, centralised AI platforms and toward user autonomy and deeper engagement. Whether you are a cultural practitioner, researcher, or simply curious, the ability to train and test your own models not only expands creative possibilities but also allows for a more nuanced critique of the biases, potentials, and limits embedded in generative models.

As AI becomes increasingly embedded in visual culture, design, and research, we believe it is essential that these tools are not only used-but understood, challenged, and reshaped by a broad community of practitioners. This toolkit is one small contribution toward that goal.

Bibliography

Hu, E. J., Shen, Y., Wallis, P., Allen-Zhu, Z., Li, Y., Wang, S., Wang, L., & Chen, W. (2021). LoRA: Low-Rank Adaptation of Large Language Models (Version 2). arXiv. https://doi.org/10.48550/ARXIV.2106.09685

Podell, D., English, Z., Lacey, K., Blattmann, A., Dockhorn, T., Müller, J., Penna, J., & Rombach, R. (2023). SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis (Version 1). arXiv. https://doi.org/10.48550/ARXIV.2307.01952

Rombach, R., Blattmann, A., Lorenz, D., Esser, P., & Ommer, B. (2021). High-Resolution Image Synthesis with Latent Diffusion Models (Version 2). arXiv. https://doi.org/10.48550/ARXIV.2112.10752