By Gowthami Somepalli and Sravani Somepalli

A series exploring emergent capabilities hidden inside VL-conditioned single-stream diffusion transformers. No fine-tuning and no additional training, just scaffolding: architectural hacking and careful probing of what these models already know.

Posts

Part 3
Coming soon

SDEdit-style denoising for approximate object composition.


If you would like to cite this series in an academic context, you can use this BibTeX snippet:

@misc{somepalli2026latentscaffold,
  author = {Somepalli, Gowthami and Somepalli, Sravani},
  title = {Latent Scaffolding Image Generation Models},
  url = {https://somepago.github.io/posts/latent-scaffolding-series/},
  year = {2026}
}