Leo Zhang

ChatClothes Virtual Try-On

Auckland University of Technology · Nov 2024 – Apr 2025

Role: AI Engineer & Python Developer (Independent)

Master's thesis: multimodal AI virtual try-on combining OOTDiffusion+LoRA generation, YOLO12n-LC classification, and DeepSeek LLM conversational control. Completed 6 months early. Published at IVCNZ 2025.

FID 28.5 (19% improvement), 75% fewer hand artifacts, 94.2% accuracy, under 10 s latency on Raspberry Pi, 87% user success rate (50 users)

Problem

Fashion e-commerce lacks interactive, multimodal try-on experiences that work on edge devices.

Solution

A multimodal AI virtual try-on (VTON) system with three parts: OOTDiffusion with LoRA fine-tuning for pose-aligned generation; YOLO12n-LC, a lightweight classifier (5 MB, 8x smaller); and DeepSeek LLM + RAG to turn natural language into structured prompts.
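The "natural language to structured prompts" step can be pictured with a small sketch. It assumes the LLM is asked to reply in JSON; the schema (`garment_type`, `color`, `style`), the allowed garment types, and the function names are all hypothetical and chosen for illustration, not taken from the thesis.

```python
import json
from dataclasses import dataclass

# Hypothetical schema: field names are illustrative, not the project's own.
@dataclass
class TryOnPrompt:
    garment_type: str
    color: str
    style: str

# Assumed set of categories the generator accepts.
ALLOWED_GARMENT_TYPES = {"upper_body", "lower_body", "dress"}

def parse_llm_reply(reply_text: str) -> TryOnPrompt:
    """Validate a JSON reply from the LLM and convert it to a TryOnPrompt."""
    data = json.loads(reply_text)
    garment_type = str(data.get("garment_type", "")).strip().lower()
    if garment_type not in ALLOWED_GARMENT_TYPES:
        # Reject anything the downstream generator cannot handle.
        raise ValueError(f"unsupported garment type: {garment_type!r}")
    return TryOnPrompt(
        garment_type=garment_type,
        color=str(data.get("color", "")).strip().lower(),
        style=str(data.get("style", "")).strip().lower(),
    )

def to_diffusion_prompt(p: TryOnPrompt) -> str:
    """Flatten the structured fields into a short text prompt."""
    parts = [p.color, p.style, p.garment_type.replace("_", " "), "garment"]
    return " ".join(part for part in parts if part)
```

Validating the reply before it reaches the diffusion stage keeps a malformed or off-topic LLM answer from triggering an expensive generation job on the edge device.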

Architecture

Python AI pipeline (PyTorch/ComfyUI/Dify) → FastAPI backend → PWA Android frontend → Raspberry Pi 5 edge deployment

Key Highlights

  • Shipped a mobile PWA for controlling diffusion and LLM jobs, alongside the Raspberry Pi deployments
  • Applied LoRA fine-tuning to OOTDiffusion for enhanced pose alignment and texture reconstruction
  • Optimized YOLO12n to YOLO12n-LC for on-device and resource-constrained targets
  • Orchestrated DeepSeek LLM via Ollama for natural language control
  • Deployed full system on Raspberry Pi 5 for offline-capable inference
  • Thesis passed with First Class Honours, published at IVCNZ 2025
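As a rough illustration of the Ollama-based natural language control listed above, the sketch below builds a request for Ollama's local `/api/generate` endpoint without sending it. The model tag `deepseek-r1`, the instruction text, and the helper names are assumptions; the page does not say which DeepSeek variant or prompt was used.

```python
import json
import urllib.request

# Ollama's default local endpoint; adjust if the server runs elsewhere.
OLLAMA_URL = "http://localhost:11434/api/generate"

# Hypothetical instruction text, not the project's actual prompt.
INSTRUCTION = (
    "Extract the garment request as JSON with keys "
    "garment_type, color, style.\nUser: "
)

def build_request(user_text: str, model: str = "deepseek-r1") -> urllib.request.Request:
    """Build (but do not send) a non-streaming generate request."""
    payload = {
        "model": model,          # placeholder tag, an assumption
        "prompt": INSTRUCTION + user_text,
        "stream": False,         # ask for one JSON object instead of chunks
        "format": "json",        # ask Ollama to constrain output to JSON
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def extract_reply(raw_body: bytes) -> str:
    """Pull the generated text out of a non-streaming response body."""
    return json.loads(raw_body)["response"]
```

Sending the request is then a matter of `urllib.request.urlopen(build_request("a red casual shirt"))` and passing the response body to `extract_reply`, provided an Ollama server with the chosen model is running locally.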

Tech Stack

Python · PyTorch · OOTDiffusion · LoRA · YOLO12n-LC · ComfyUI · Dify · DeepSeek · FastAPI · Raspberry Pi 5

What I Learned

Model compression for edge deployment is critical; multimodal alignment needs iterative tuning; LoRA fine-tuning achieves significant quality gains without modifying the backbone.
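To make the "without modifying the backbone" point concrete, here is a minimal pure-Python sketch of the standard LoRA update, where the effective weight is W + (alpha / r) · B · A and only the small matrices A and B are trained. The matrix sizes and values are toy numbers for illustration, not the configuration used in the thesis.

```python
def matmul(a, b):
    """Multiply an m x k matrix by a k x n matrix (lists of rows)."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]

def lora_effective_weight(W, A, B, alpha):
    """Return W + (alpha / r) * B @ A without mutating the frozen W.

    W: d_out x d_in frozen backbone weight
    A: r x d_in trainable matrix
    B: d_out x r trainable matrix
    """
    r = len(A)
    scale = alpha / r
    delta = matmul(B, A)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(W, delta)
    ]

def lora_param_count(d_out, d_in, r):
    """Number of trainable parameters in the adapter (A plus B)."""
    return r * (d_in + d_out)

# Toy example with rank r = 1.
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 2.0]]
B = [[0.5], [0.0]]
W_eff = lora_effective_weight(W, A, B, alpha=2.0)
```

For a 1024 x 1024 layer at rank 8, `lora_param_count(1024, 1024, 8)` gives 16,384 trainable values against 1,048,576 in the frozen weight, which is why the adapter is cheap to train and store.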