Thao Nguyen

thaottn@cs.washington.edu

PhD Candidate, University of Washington
Visiting Researcher, Meta AI Research

Hi! I'm currently a third-year PhD student in Machine Learning at the University of Washington, co-advised by Professors Ludwig Schmidt and Sewoong Oh. My research interests include studying neural network representation structures and improving the quality of web-crawled machine learning datasets.

I was an AI Resident at Google Brain from Oct 2019 to Sept 2021. Prior to that, I completed my undergrad at Stanford, majoring in Computer Science, and had the chance to spend a wonderful summer at Two Sigma.

From June to December 2022, I was a student researcher at Google Brain, working with Simon Kornblith.

Since September 2023, I have been a visiting researcher at Meta AI Research, working with Luke Zettlemoyer and Xian Li.

News

  • 3 papers accepted at NeurIPS 2023: Improving Multimodal Datasets with Image Captioning (Poster), On the Connection between Pre-training Data Diversity and Fine-tuning Robustness (Spotlight), DataComp: In search of the next generation of multimodal datasets (Oral).
    I will be attending the conference and am happy to chat about data-centric research.
  • We are organizing the "Towards the Next Generation of Computer Vision Datasets" workshop at ICCV 2023.
    See the call for papers.

Selected Research Papers

* = equal contribution. ** = authors are listed in alphabetical order.

Improving Multimodal Datasets with Image Captioning
Thao Nguyen, Samir Yitzhak Gadre, Gabriel Ilharco, Sewoong Oh, Ludwig Schmidt
NeurIPS Datasets & Benchmarks 2023
ICML 2023 DataPerf - Data-centric Machine Learning Research Workshop

DataComp: In search of the next generation of multimodal datasets
Samir Yitzhak Gadre*, Gabriel Ilharco*, Alex Fang*, Jonathan Hayase, Georgios Smyrnis, Thao Nguyen, Ryan Marten, Mitchell Wortsman, Dhruba Ghosh, Jieyu Zhang, Eyal Orgad, Rahim Entezari, Giannis Daras, Sarah Pratt, Vivek Ramanujan, Yonatan Bitton, Kalyani Marathe, Stephen Mussmann, Richard Vencu, Mehdi Cherti, Ranjay Krishna, Pang Wei Koh, Olga Saukh, Alexander Ratner, Shuran Song, Hannaneh Hajishirzi, Ali Farhadi, Romain Beaumont, Sewoong Oh, Alex Dimakis, Jenia Jitsev, Yair Carmon, Vaishaal Shankar, and Ludwig Schmidt
Oral paper at NeurIPS Datasets & Benchmarks 2023

Probing Clustering in Neural Network Representations
Thao Nguyen, Simon Kornblith
Under Review

Guiding Image Captioning Models Toward More Specific Captions
Simon Kornblith, Lala Li, Zirui Wang, Thao Nguyen
ICCV 2023
ICLR 2023 Multimodal Representation Learning Workshop

Quality Not Quantity: On the Interaction between Dataset Design and Robustness of CLIP
Thao Nguyen, Gabriel Ilharco, Mitchell Wortsman, Sewoong Oh, Ludwig Schmidt
Oral paper at NeurIPS 2022
Contributed talk at ICML 2022 DataPerf - Benchmarking Data for Data-Centric AI Workshop

On the Connection between Pre-training Data Diversity and Fine-tuning Robustness
Vivek Ramanujan*, Thao Nguyen*, Sewoong Oh, Ludwig Schmidt, Ali Farhadi
Spotlight paper at NeurIPS 2023
ICML 2022 Workshop on Pre-training: Perspectives, Pitfalls, and Paths Forward

Avoiding Spurious Correlations: Bridging Theory and Practice
Thao Nguyen, Vaishnavh Nagarajan, Hanie Sedghi, Behnam Neyshabur
NeurIPS 2021 Workshop on Distribution Shifts: Connecting Methods and Applications

Dominant Datapoints in Neural Network Representations
Thao Nguyen, Maithra Raghu, Simon Kornblith
Transactions on Machine Learning Research
ICML 2021 Overparameterization Pitfalls & Opportunities Workshop

Do Wide and Deep Networks Learn the Same Things? Uncovering How Neural Network Representations Vary with Width and Depth
Thao Nguyen, Maithra Raghu, Simon Kornblith
ICLR 2021
Spotlight talk at NeurIPS 2020 Interpretable Inductive Biases and Physically Structured Learning Workshop
NeurIPS 2020 Women in Machine Learning Workshop

Robust and Private Learning of Halfspaces
Badih Ghazi, Ravi Kumar, Pasin Manurangsi, Thao Nguyen**
Oral presentation at AISTATS 2021
NeurIPS 2020 Privacy-Preserving ML Workshop

Concept Bottleneck Models
Pang Wei Koh*, Thao Nguyen*, Yew Siang Tang*, Steve Mussmann, Emma Pierson, Been Kim, and Percy Liang
ICML 2020
Spotlight talk at the ICML 2020 Workshop on Human Interpretability in Machine Learning

Predicting Inpatient Discharge Prioritization with Electronic Health Records
Anand Avati*, Stephen Pfohl*, Chris Lin, Thao Nguyen, Meng Zhang, Philip Hwang, Jessica Wetstone, Kenneth Jung, Andrew Ng, Nigam H. Shah
arXiv 2018