Hi! I'm currently a third-year PhD student in Machine Learning at the University of Washington, co-advised by Professors Ludwig Schmidt and Sewoong Oh.
My research interests include studying neural network representation structures and improving the quality of web-crawled machine learning datasets.
I was an AI Resident at Google Brain from Oct 2019 to Sept 2021. Prior to that, I completed my undergrad at Stanford, majoring in Computer Science, and had the chance to spend a wonderful summer at Two Sigma.
From June to December 2022, I was a student researcher at Google Brain, working with Simon Kornblith.
Since September 2023, I have been a visiting researcher at Meta AI Research, working with Luke Zettlemoyer and Xian Li.
Selected Research Papers
* = equal contribution. ** = authors are listed in alphabetical order.
Improving Multimodal Datasets with Image Captioning
Thao Nguyen, Samir Yitzhak Gadre, Gabriel Ilharco, Sewoong Oh, Ludwig Schmidt
NeurIPS Datasets & Benchmarks 2023
ICML 2023 DataPerf - Data-centric Machine Learning Research Workshop
DataComp: In search of the next generation of multimodal datasets
Samir Yitzhak Gadre*, Gabriel Ilharco*, Alex Fang*, Jonathan Hayase, Georgios Smyrnis, Thao Nguyen, Ryan Marten, Mitchell Wortsman, Dhruba Ghosh, Jieyu Zhang, Eyal Orgad, Rahim Entezari, Giannis Daras, Sarah Pratt, Vivek Ramanujan, Yonatan Bitton, Kalyani Marathe, Stephen Mussmann, Richard Vencu, Mehdi Cherti, Ranjay Krishna, Pang Wei Koh, Olga Saukh, Alexander Ratner, Shuran Song, Hannaneh Hajishirzi, Ali Farhadi, Romain Beaumont, Sewoong Oh, Alex Dimakis, Jenia Jitsev, Yair Carmon, Vaishaal Shankar, and Ludwig Schmidt
Oral paper at NeurIPS Datasets & Benchmarks 2023
Probing Clustering in Neural Network Representations
Thao Nguyen, Simon Kornblith
Under Review
Guiding Image Captioning Models Toward More Specific Captions
Simon Kornblith, Lala Li, Zirui Wang, Thao Nguyen
ICCV 2023
ICLR 2023 Multimodal Representation Learning Workshop
Quality Not Quantity: On the Interaction between Dataset Design and Robustness of CLIP
Thao Nguyen, Gabriel Ilharco, Mitchell Wortsman, Sewoong Oh, Ludwig Schmidt
Oral paper at NeurIPS 2022
Contributed talk at ICML 2022 DataPerf - Benchmarking Data for Data-Centric AI Workshop
On the Connection between Pre-training Data Diversity and Fine-tuning Robustness
Vivek Ramanujan*, Thao Nguyen*, Sewoong Oh, Ludwig Schmidt, Ali Farhadi
Spotlight paper at NeurIPS 2023
ICML 2022 Workshop on Pre-training: Perspectives, Pitfalls, and Paths Forward
Avoiding Spurious Correlations: Bridging Theory and Practice
Thao Nguyen, Vaishnavh Nagarajan, Hanie Sedghi, Behnam Neyshabur
NeurIPS 2021 Workshop on Distribution Shifts: Connecting Methods and Applications
Dominant Datapoints in Neural Network Representations
Thao Nguyen, Maithra Raghu, Simon Kornblith
Transactions on Machine Learning Research
ICML 2021 Overparameterization Pitfalls & Opportunities Workshop
Do Wide and Deep Networks Learn the Same Things? Uncovering How Neural Network Representations Vary with Width and Depth
Thao Nguyen, Maithra Raghu, Simon Kornblith
ICLR 2021
Spotlight talk at NeurIPS 2020 Interpretable Inductive Biases and Physically Structured Learning Workshop
NeurIPS 2020 Women in Machine Learning Workshop
Robust and Private Learning of Halfspaces
Badih Ghazi, Ravi Kumar, Pasin Manurangsi, Thao Nguyen**
Oral presentation at AISTATS 2021
NeurIPS 2020 Privacy-Preserving ML Workshop
Concept Bottleneck Models
Pang Wei Koh*, Thao Nguyen*, Yew Siang Tang*, Steve Mussmann, Emma Pierson, Been Kim, and Percy Liang
ICML 2020
Predicting Inpatient Discharge Prioritization with Electronic Health Records
Anand Avati*, Stephen Pfohl*, Chris Lin, Thao Nguyen, Meng Zhang, Philip Hwang, Jessica Wetstone, Kenneth Jung, Andrew Ng, Nigam H. Shah
arXiv 2018