Han (Paris) Zhang 章晗
I am currently an Applied Research Engineer at NVIDIA.
I earned my Master's degree in Computer Science from Stanford University, where I was honored as a Siebel Scholar, Class of 2025.
I received my B.A. in Computer Science and Economics from UC Berkeley, where I was awarded the prestigious EECS Department Citation. During my time there, I had the privilege of conducting research in vision and language under the guidance of Prof. Trevor Darrell, Prof. Joseph E. Gonzalez, and Ph.D. candidate Lisa Dunlap.
Email / Google Scholar / LinkedIn
Research
My research interests broadly span computer vision, with a particular focus on vision-and-language multimodal learning, generative models, and the robustness and interpretability of AI systems.
Diversify Your Vision Datasets with Automatic Diffusion-Based Augmentation
Lisa Dunlap, Alyssa Umino, Han Zhang, Jiezhi Yang, Joseph E. Gonzalez, Trevor Darrell
NeurIPS 2023
website / arXiv / code
We introduce ALIA, a method that uses large vision and language models to automatically generate natural-language descriptions of a dataset's domains and to augment the training data via language-guided image editing.
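The editing step can be sketched in a few lines. The snippet below is a conceptual sketch rather than the released ALIA code: the domain prompts are hypothetical stand-ins for the LLM-summarized domain descriptions, and the editor shown (Hugging Face diffusers' InstructPix2Pix pipeline) is just one possible language-guided editing model.

```python
# Conceptual sketch of ALIA-style augmentation (illustrative, not the released code).
# In the full method, domain prompts come from captioning the dataset and
# summarizing the captions with an LLM; here they are hard-coded for brevity.
import torch
from PIL import Image
from diffusers import StableDiffusionInstructPix2PixPipeline

# Hypothetical domain prompts standing in for LLM-generated domain descriptions.
DOMAIN_PROMPTS = [
    "put the animal in a snowy field",
    "put the animal in a dense forest at dusk",
]

# Assumes a CUDA-capable GPU is available.
pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(
    "timbrooks/instruct-pix2pix", torch_dtype=torch.float16
).to("cuda")

def augment(image_path: str) -> list[Image.Image]:
    """Return one language-guided edit of the image per domain prompt."""
    image = Image.open(image_path).convert("RGB").resize((512, 512))
    edits = []
    for prompt in DOMAIN_PROMPTS:
        out = pipe(
            prompt,
            image=image,
            num_inference_steps=20,
            image_guidance_scale=1.5,  # how closely the edit sticks to the original image
        ).images[0]
        edits.append(out)
    return edits
```

The full pipeline also filters the edited images to discard edits that corrupt the class label or fail to change the domain; the sketch covers only the editing step.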
Using Language to Extend to Unseen Domains
Lisa Dunlap, Clara Mohri, Han Zhang, Devin Guillory, Trevor Darrell, Joseph E. Gonzalez, Aditi Raghunathan, Anna Rohrbach
ICLR 2023 (Spotlight)
website / arXiv / code / blog
We propose LADS, a method that learns a language-guided transformation of image embeddings from the training domain to each unseen test domain while preserving task-relevant information.
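The sketch below illustrates the core idea under simple assumptions (names are hypothetical; the real training recipe is in the released code): a small MLP augments frozen CLIP image embeddings toward an unseen domain described only in text, with one loss aligning the embedding change to the text-embedding change between domain descriptions and another keeping a frozen classifier's prediction intact.

```python
# Conceptual sketch of a LADS-style embedding augmentation network (illustrative).
import torch
import torch.nn as nn
import torch.nn.functional as F

class Augmenter(nn.Module):
    """Maps frozen CLIP image embeddings toward a text-described unseen domain."""
    def __init__(self, dim: int = 512):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, z):
        return F.normalize(self.net(z), dim=-1)

def lads_losses(aug, z_img, labels, text_src, text_tgt, probe):
    """z_img: CLIP image embeddings from the training domain;
    text_src / text_tgt: CLIP text embeddings of the training and unseen domains
    (e.g., "a photo" vs. "a sketch");
    probe: a classifier trained on the original embeddings and kept frozen."""
    z_aug = aug(z_img)
    # Domain-alignment loss: the change in image embedding should point in the
    # same direction as the change in domain text embedding.
    delta_img = F.normalize(z_aug - z_img, dim=-1)
    delta_txt = F.normalize(text_tgt - text_src, dim=-1)
    domain_loss = (1 - (delta_img * delta_txt).sum(-1)).mean()
    # Class-consistency loss: the augmented embedding must keep its label.
    class_loss = F.cross_entropy(probe(z_aug), labels)
    return domain_loss + class_loss
```

At test time the augmented embeddings are simply added to the training set for the downstream classifier, so no images from the unseen domain are ever needed.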
Delayed and Indirect Impacts of Link Recommendations
Han Zhang, Shangen Lu, Yixin Wang, Mihaela Curmei
FAccT 2023
arXiv / code
Using a simulation-based approach together with an explicit dynamic network-formation model, we find that link recommendations have surprising delayed and indirect effects on the structural properties of networks.
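The toy simulation below conveys the flavor of the simulation-based approach; the recommendation rule, acceptance model, and parameters are illustrative rather than the ones studied in the paper. A recommender repeatedly proposes friend-of-friend links on a random graph, and we track how average clustering drifts as recommendations are accepted.

```python
# Toy link-recommendation simulation (illustrative, not the released code).
import random
import networkx as nx

def simulate(n_nodes=200, p_init=0.03, steps=50, p_accept=0.5, seed=0):
    rng = random.Random(seed)
    G = nx.erdos_renyi_graph(n_nodes, p_init, seed=seed)
    clustering = [nx.average_clustering(G)]
    for _ in range(steps):
        u = rng.randrange(n_nodes)
        # Recommend a non-neighbor two hops away (a "friend of a friend").
        candidates = {w for v in G.neighbors(u) for w in G.neighbors(v)}
        candidates -= set(G.neighbors(u)) | {u}
        if candidates and rng.random() < p_accept:
            G.add_edge(u, rng.choice(sorted(candidates)))
        clustering.append(nx.average_clustering(G))
    return clustering

print(simulate()[-1])  # average clustering after the recommendation process
```

Comparing such trajectories against a no-recommendation baseline is one way to surface the delayed and indirect structural effects the paper studies.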
Experience
Applied Research Engineer Intern, NVIDIA
January 2025 - March 2025, June 2024 - September 2024
Developed and released Commercial VILA, a Vision-Language Model (VLM) trained exclusively on commercial datasets.
Designed autolabeling workflows and trained a 3D-VLM for spatial reasoning in warehouse environments, with a demo featured at the GTC 2025 Spatial AI session and exhibition booth.
Research Intern, Tencent AI Lab
June 2023 - February 2024
Conducted research on 3D-aware human body generation using diffusion models, ControlNet, and autoencoders.
Engineering Summer Analyst, Goldman Sachs
June 2022 - August 2022
Developed and modularized a calculator for the loan product GS Select, enabling automated daily computation of key financial metrics, including risk-weighted assets.
Software Engineer Intern, Bytedance
March 2021 - August 2021
Optimized TikTok's Android package size and cold-start time by developing custom optimization passes on top of ReDex, an open-source Android bytecode optimizer.
Website template from Jon Barron. Thanks for stopping by :)
Last updated: May 27, 2025