Han (Paris) Zhang 章晗
I am currently an Applied Research Engineer at NVIDIA.
I earned my Master's degree in Computer Science from Stanford University, where I was honored as a Siebel Scholar, Class of 2025.
I received my B.A. in Computer Science and Economics from UC Berkeley, where I was awarded the prestigious EECS Department Citation. During my time there, I had the privilege of conducting research in vision and language under the guidance of Prof. Trevor Darrell, Prof. Joseph E. Gonzalez, and Ph.D. candidate Lisa Dunlap.
Email / Google Scholar / LinkedIn
Research
My research interests broadly span computer vision, with a particular focus on vision-and-language multimodal learning, generative models, and the robustness and interpretability of AI systems.
Diversify Your Vision Datasets with Automatic Diffusion-Based Augmentation
Lisa Dunlap, Alyssa Umino, Han Zhang, Jiezhi Yang, Joseph E. Gonzalez, Trevor Darrell
NeurIPS 2023
website / arXiv / code
We introduce ALIA, a method that uses large vision and language models to automatically generate natural-language descriptions of a dataset's domains and to augment the training data via language-guided image editing.
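The editing step can be sketched in a few lines. The snippet below is a conceptual sketch rather than the released ALIA code: the domain prompts are hypothetical stand-ins for the LLM-summarized domain descriptions, and the editor shown (Hugging Face diffusers' InstructPix2Pix pipeline) is just one possible language-guided editing model.

```python
# Conceptual sketch of ALIA-style augmentation (illustrative, not the released code).
# In the full method, domain prompts come from captioning the dataset and
# summarizing the captions with an LLM; here they are hard-coded for brevity.
import torch
from PIL import Image
from diffusers import StableDiffusionInstructPix2PixPipeline

# Hypothetical domain prompts standing in for LLM-generated domain descriptions.
DOMAIN_PROMPTS = [
    "put the animal in a snowy field",
    "put the animal in a dense forest at dusk",
]

# Assumes a CUDA-capable GPU is available.
pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(
    "timbrooks/instruct-pix2pix", torch_dtype=torch.float16
).to("cuda")

def augment(image_path: str) -> list[Image.Image]:
    """Return one language-guided edit of the image per domain prompt."""
    image = Image.open(image_path).convert("RGB").resize((512, 512))
    edits = []
    for prompt in DOMAIN_PROMPTS:
        out = pipe(
            prompt,
            image=image,
            num_inference_steps=20,
            image_guidance_scale=1.5,  # how closely the edit sticks to the original image
        ).images[0]
        edits.append(out)
    return edits
```

The full pipeline also filters the edited images to discard edits that corrupt the class label or fail to change the domain; the sketch covers only the editing step.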
Using Language to Extend to Unseen Domains
Lisa Dunlap, Clara Mohri, Han Zhang, Devin Guillory, Trevor Darrell, Joseph E. Gonzalez, Aditi Raghunathan, Anna Rohrbach
ICLR 2023 (Spotlight)
website / arXiv / code / blog
We propose LADS, a method that learns a language-guided transformation of image embeddings from the training domain to each unseen test domain while preserving task-relevant information.
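The sketch below illustrates the core idea under simple assumptions (names are hypothetical; the real training recipe is in the released code): a small MLP augments frozen CLIP image embeddings toward an unseen domain described only in text, with one loss aligning the embedding change to the text-embedding change between domain descriptions and another keeping a frozen classifier's prediction intact.

```python
# Conceptual sketch of a LADS-style embedding augmentation network (illustrative).
import torch
import torch.nn as nn
import torch.nn.functional as F

class Augmenter(nn.Module):
    """Maps frozen CLIP image embeddings toward a text-described unseen domain."""
    def __init__(self, dim: int = 512):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, z):
        return F.normalize(self.net(z), dim=-1)

def lads_losses(aug, z_img, labels, text_src, text_tgt, probe):
    """z_img: CLIP image embeddings from the training domain;
    text_src / text_tgt: CLIP text embeddings of the training and unseen domains
    (e.g., "a photo" vs. "a sketch");
    probe: a classifier trained on the original embeddings and kept frozen."""
    z_aug = aug(z_img)
    # Domain-alignment loss: the change in image embedding should point in the
    # same direction as the change in domain text embedding.
    delta_img = F.normalize(z_aug - z_img, dim=-1)
    delta_txt = F.normalize(text_tgt - text_src, dim=-1)
    domain_loss = (1 - (delta_img * delta_txt).sum(-1)).mean()
    # Class-consistency loss: the augmented embedding must keep its label.
    class_loss = F.cross_entropy(probe(z_aug), labels)
    return domain_loss + class_loss
```

At test time the augmented embeddings are simply added to the training set for the downstream classifier, so no images from the unseen domain are ever needed.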
Delayed and Indirect Impacts of Link Recommendations
Han Zhang, Shangen Lu, Yixin Wang, Mihaela Curmei
FAccT 2023
arXiv / code
Using a simulation-based approach together with an explicit dynamic network-formation model, we find that link recommendations have surprising delayed and indirect effects on the structural properties of networks.
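The toy simulation below conveys the flavor of the simulation-based approach; the recommendation rule, acceptance model, and parameters are illustrative rather than the ones studied in the paper. A recommender repeatedly proposes friend-of-friend links on a random graph, and we track how average clustering drifts as recommendations are accepted.

```python
# Toy link-recommendation simulation (illustrative, not the released code).
import random
import networkx as nx

def simulate(n_nodes=200, p_init=0.03, steps=50, p_accept=0.5, seed=0):
    rng = random.Random(seed)
    G = nx.erdos_renyi_graph(n_nodes, p_init, seed=seed)
    clustering = [nx.average_clustering(G)]
    for _ in range(steps):
        u = rng.randrange(n_nodes)
        # Recommend a non-neighbor two hops away (a "friend of a friend").
        candidates = {w for v in G.neighbors(u) for w in G.neighbors(v)}
        candidates -= set(G.neighbors(u)) | {u}
        if candidates and rng.random() < p_accept:
            G.add_edge(u, rng.choice(sorted(candidates)))
        clustering.append(nx.average_clustering(G))
    return clustering

print(simulate()[-1])  # average clustering after the recommendation process
```

Comparing such trajectories against a no-recommendation baseline is one way to surface the delayed and indirect structural effects the paper studies.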
Experience
Applied Research Engineer Intern, NVIDIA
January 2025 - March 2025, June 2024 - September 2024
Developed and released Commercial VILA, a Vision-Language Model (VLM) trained exclusively on commercial datasets.
Designed autolabeling workflows and trained a 3D-VLM for spatial reasoning in warehouse environments, with a demo featured at the GTC 2025 Spatial AI session and exhibition booth.
Research Intern, Tencent AI Lab
June 2023 - February 2024
Conducted research on 3D-aware human body generation using diffusion models, ControlNet, and autoencoders.
Engineering Summer Analyst, Goldman Sachs
June 2022 - August 2022
Developed and modularized a calculator for the loan product GS Select, enabling automated daily computation of key financial metrics, including risk-weighted assets.
Software Engineer Intern, Bytedance
March 2021 - August 2021
Optimized TikTok's Android package size and cold-start time by developing custom optimization passes on top of ReDex, an open-source Android bytecode optimizer.
Website template from Jon Barron. Thanks for stopping by :)
Last updated: May 27, 2025