Multimodal AI researcher obsessed with how machines perceive, remember, and generate the world. Based in Mountain View, CA. (Friends call me the "Evals Shill" for a reason.) Currently post-training Adobe's image gen models to push creative boundaries.
PhD from UMD focused on diffusion model memorization. Built evals that actually test video understanding – like CinePile (long-video QA benchmark, Best Paper at CVPR 2024 SynCV) and ARGUS (hallucination/omission eval for dense captions).
Before academia: IIT Madras alum, did SGD in industry for a while in India, and founded a Fashion AI startup that was way too early to the party.
Open to collabs on generative modeling (evals + post-training). Hit me up: gowthami [dot] somepalli [at] gmail.com
// featured writing
Emergent Behaviour Rabbit Hole: Z-Image Is Secretly an I2I Model
A simple architectural splice unlocks zero-shot image-to-image variations with no training.
BLOG POST (coming soon): Distillation landscape in image generation
Some thoughts on the distillation landscape in image generation.