Style and Character Adaptation for Stable Diffusion LoRA Model
Context
Technical Prototype
Role
AI Developer / Researcher
Year
2025
Industry
Generative AI, Digital Art
The Idea
This project focused on training a LoRA (Low-Rank Adaptation) model for Stable Diffusion 1.5 to capture and replicate the unique artistic style and character designs associated with the "Neon Drive" narrative concept. The objective was to create a lightweight, adaptable model that allows users to generate new images consistent with the specific Neon Drive aesthetic (e.g., character 'Kaela Eryndor') when used in conjunction with the base SD 1.5 model.
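Why a LoRA counts as "lightweight" can be shown with a little arithmetic. The sketch below is illustrative only: the 768x768 layer size, the rank of 8, and the alpha value are assumptions chosen for the example, since the project's actual rank and alpha are not stated here. It counts the parameters in a full weight matrix versus its low-rank factors, and shows how the factors are merged into a frozen weight as W' = W + (alpha / rank) * B A.

```python
def full_param_count(d_out: int, d_in: int) -> int:
    """Parameters in a full d_out x d_in weight matrix."""
    return d_out * d_in


def lora_param_count(d_out: int, d_in: int, rank: int) -> int:
    """Parameters in the two low-rank factors: B (d_out x rank) and A (rank x d_in)."""
    return rank * (d_out + d_in)


def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(W, A, B, alpha: float, rank: int):
    """Return W + (alpha / rank) * (B @ A), leaving W itself unmodified."""
    scale = alpha / rank
    delta = matmul(B, A)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(W, delta)
    ]


if __name__ == "__main__":
    # Hypothetical layer size and rank, for illustration only.
    full = full_param_count(768, 768)
    low_rank = lora_param_count(768, 768, rank=8)
    print(full, low_rank, full // low_rank)

    # Tiny rank-1 example on a 2x2 identity weight.
    W = [[1, 0], [0, 1]]
    B = [[1], [2]]   # 2 x 1
    A = [[3, 4]]     # 1 x 2
    print(merge_lora(W, A, B, alpha=1, rank=1))
```

Because only A and B are trained and stored, the file distributed to users holds a small fraction of the base model's weights, which is why it has to be loaded alongside the SD 1.5 checkpoint rather than on its own.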
Training Dataset - Generated using Midjourney v6.1
Development
The LoRA development involved several stages outlined in the "From Noise to Art" research:
Dataset Creation: Seven synthetic images (512x512) were generated on the "Neon Drive" theme, using Midjourney v6.1 for characters and locations and NijiJourney for stylized character illustrations.
Dataset Preparation: Each image was captioned automatically with the Florence 2 captioning model, so that every training image had paired text.
Training Environment: Training ran on both local (NVIDIA RTX 3080) and cloud (NVIDIA A10G) hardware, using Flux Gym and SD Scripts under Python 3.10.16. Invoke, ForgeWebUI, and ComfyUI were used for generation and testing.
LoRA Training: The LoRA was trained specifically for the Stable Diffusion 1.5 (pruned) base model. Key hyperparameters included a learning rate of 1e-4, a batch size of 1, and 10 training epochs, using the AdamW8bit optimizer. Training was performed at a 512x512 resolution.
Evaluation: Sample outputs were generated after training iterations to assess the model's ability to capture the desired style and character features.
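The stated hyperparameters can be collected into a small record to see how many optimizer steps they imply. This is a rough sketch, not the project's actual configuration file: the class and field names are made up for illustration, and the `repeats` value (how many times each image is shown per epoch) is an assumption, since it is not given above. The step count follows the usual rule of thumb of images x repeats / batch size per epoch.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingConfig:
    # Values taken from the description above.
    base_model: str = "Stable Diffusion 1.5 (pruned)"
    resolution: int = 512
    learning_rate: float = 1e-4
    batch_size: int = 1
    epochs: int = 10
    optimizer: str = "AdamW8bit"
    num_images: int = 7
    # ASSUMPTION: per-image repeats are not stated in the write-up.
    repeats: int = 1

    def steps_per_epoch(self) -> int:
        """Optimizer steps needed to see every (repeated) image once."""
        return math.ceil(self.num_images * self.repeats / self.batch_size)

    def total_steps(self) -> int:
        """Optimizer steps across all epochs."""
        return self.steps_per_epoch() * self.epochs


if __name__ == "__main__":
    cfg = TrainingConfig()
    print(cfg.total_steps())                       # 70 with repeats=1
    print(TrainingConfig(repeats=10).total_steps())  # 700 with a hypothetical repeats=10
```

With only seven images, the repeat count dominates the total: the same ten epochs give 70 steps at one repeat and 700 at ten, so any comparison of training runs should record it alongside the other settings.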
Outputs of the LoRA Model
Reflection
This project successfully demonstrated the process of creating a custom LoRA for SD 1.5 from a small, curated, synthetic dataset. It showcased a workflow integrating various AI tools for dataset generation (Midjourney, NijiJourney), captioning (Florence 2), and model training (SD Scripts). The resulting LoRA allows users to infuse the specific "Neon Drive" aesthetic into their SD 1.5 generations. The broader research compared SD 1.5 with other models like SD XL and Flux, noting trade-offs in training time, quality, and prompt consistency.
Evaluation of Different Model Performance
What Worked
Successful training of a functional SD 1.5 LoRA from a small (7-image) synthetic dataset.
Demonstrated a viable workflow combining multiple AI tools for dataset creation and preparation.
Generated sample images showing the LoRA's ability to influence style and character appearance.
What Did Not Work / Limitations
The LoRA's performance and fidelity are likely limited by the very small training dataset size.
The LoRA requires the matching base model: the pruned SD 1.5 checkpoint it was trained against.
Effective use likely requires specific trigger words and prompting techniques; these could not be verified from the model's Civitai page.
Based on the model comparisons in the broader research, SD 1.5 LoRAs may offer lower quality or prompt consistency than those trained for newer base models.
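For readers testing the LoRA in a Forge-style web UI, a prompt is typically assembled from an optional trigger word, the subject description, and a `<lora:filename:weight>` tag. The helper below is a minimal sketch of that assembly. The file name `neon_drive_sd15` and the trigger word `neondrive` are placeholders invented for the example, not the published values, and the 0.8 weight is an arbitrary starting point.

```python
def build_prompt(subject: str, lora_file: str, weight: float = 0.8, trigger: str = "") -> str:
    """Join an optional trigger word and a subject, then append a <lora:...> tag.

    lora_file is the LoRA's file name without its extension.
    """
    parts = [p for p in (trigger.strip(), subject.strip()) if p]
    return f"{', '.join(parts)} <lora:{lora_file}:{weight:g}>"


if __name__ == "__main__":
    # Placeholder file name and trigger word; substitute the real ones.
    print(build_prompt(
        "portrait of Kaela Eryndor, neon city at night",
        lora_file="neon_drive_sd15",
        weight=0.8,
        trigger="neondrive",
    ))
```

Generating the same seed with and without the trigger word, and at a few different weights, is a simple way to check whether a trigger word matters for this LoRA at all.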