I am on the job market, looking for faculty or research scientist positions. My research statement summarizes my accomplishments and vision. Drop me an email if you are interested in my profile!
I create artificial agents that have the communication skills and incentives to assist humans. Specifically, I explore the following questions:
How to enable AI agents to learn from natural human feedback (listening skill): My [EMNLP’17] paper demonstrated for the first time the feasibility of using only noisy, complete-output ratings to improve the performance of a neural text generator. This work was followed by studies that used real human ratings at eBay and OpenAI, ultimately leading to the development of InstructGPT, which popularized RLHF. More recently, I have been developing frameworks for learning from language feedback with theoretical guarantees [ICML’21, ACL’24WS].
How to identify and share with humans what AI agents know and do not know (speaking skill): I was an early explorer of calibration analysis for NLP models [EMNLP’15] and pioneered the development of robots that ask for help [CVPR’19, EMNLP’19, ICML’22]. Lately, I have been developing models that guide human navigation with language, improving their pragmatic reasoning capability [ACL’23] and making them useful even when they generate inaccurate instructions [EMNLP’24].
How to drive AI agents toward efficient and beneficial communicative behavior (incentive): I create agents that learn with progressive efficiency [NeurIPS’23WS], i.e., the more you talk to them, the less effort it takes to teach them. In ongoing work, I characterize the limitations of the popular RLHF approach and propose a new alignment framework that emphasizes alignment not only with the human principal but also with reality.
Some personal facts:
My real name is Nguyễn Xuân Khánh. My first name (Khánh) means “joy” or “happiness”. Please do not confuse it with Khan or Kahn :(
I was born in Việt Nam, a peaceful country (click here for inspiration to visit us).
I am also proud to be a PTNK (Phổ Thông Năng Khiếu) alumnus.
A system that can successfully guide humans in simulated residential environments despite generating potentially inaccurate instructions
@inproceedings{zhao2024successfully,title={Successfully Guiding Humans with Imperfect Instructions by Highlighting Potential Errors and Suggesting Corrections},author={Zhao, Lingjun and Nguyen, Khanh Xuan and Daum{\'e} III, Hal},booktitle={EMNLP},year={2024},}
Agents that can decide when and what question to ask, and incorporate answers to make progress
@inproceedings{nguyen2022hari,author={Nguyen, Khanh and Bisk, Yonatan and Daum{\'e} III, Hal},title={A framework for learning to request rich and contextually useful information from humans},booktitle={ICML},month=jul,year={2022},}
Framework for learning from language feedback with theoretical guarantees
@inproceedings{nguyen2021iliad,title={Interactive learning from activity description},author={Nguyen, Khanh and Misra, Dipendra and Schapire, Robert and Dud{\'\i}k, Miro and Shafto, Patrick},booktitle={ICML},year={2021},}
Agents that can ask for help and interpret language instructions
@inproceedings{nguyen2019hanna,author={Nguyen, Khanh and Daum{\'e} III, Hal},title={Help, Anna! Visual navigation with natural multimodal assistance via retrospective curiosity-encouraging imitation learning},booktitle={EMNLP},year={2019},}
Improving machine translation with reinforcement learning from noisy ratings
Machine translation is a natural candidate problem for reinforcement learning from human feedback: users provide quick, dirty ratings on candidate translations to guide a system to improve. Yet, current neural machine translation training focuses on expensive human-generated reference translations. We describe a reinforcement learning algorithm that improves neural machine translation systems from simulated human feedback. Our algorithm combines the advantage actor-critic algorithm (Mnih et al., 2016) with the attention-based neural encoder-decoder architecture (Luong et al., 2015). This algorithm (a) is well-designed for problems with a large action space and delayed rewards, (b) effectively optimizes traditional corpus-level machine translation metrics, and (c) is robust to skewed, high-variance, granular feedback modeled after actual human behaviors.
@inproceedings{nguyen2017banditnmt,title={Reinforcement learning for bandit neural machine translation with simulated human feedback},author={Nguyen, Khanh and Daum{\'e} III, Hal and Boyd-Graber, Jordan},booktitle={EMNLP},month=sep,year={2017},address={Copenhagen, Denmark},publisher={Association for Computational Linguistics},url={https://www.aclweb.org/anthology/D17-1153},doi={10.18653/v1/D17-1153},pages={1464--1474},}
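As a rough illustration of the bandit feedback loop described in the abstract above (not the paper's actual implementation), the sketch below samples a full translation, receives a single noisy sentence-level rating, and takes an advantage-weighted policy-gradient step with a learned baseline. The `policy.sample`, `critic`, and `noisy_rating` names are illustrative assumptions, and the unigram-overlap reward is a simplified stand-in for a real corpus metric and the paper's human-like noise model.

```python
# Minimal sketch of bandit policy-gradient training for a seq2seq translator.
# Assumptions (hypothetical, for illustration only):
#   policy.sample(src) -> (list of token ids, tensor of per-token log-probs)
#   critic(src)        -> scalar tensor predicting the expected rating
#   optimizer          -> covers both policy and critic parameters
import torch

def noisy_rating(hypothesis, reference, noise_std=0.2):
    """Simulate a quick, dirty human rating: naive unigram overlap (a stand-in
    for a corpus metric such as BLEU) perturbed with Gaussian noise, clipped to [0, 1]."""
    overlap = len(set(hypothesis) & set(reference)) / max(len(set(reference)), 1)
    return float(torch.clamp(torch.tensor(overlap) + noise_std * torch.randn(()), 0.0, 1.0))

def bandit_step(policy, critic, optimizer, src, reference):
    """One advantage-weighted policy-gradient update from a single scalar rating."""
    token_ids, log_probs = policy.sample(src)       # sample a complete translation
    reward = noisy_rating(token_ids, reference)     # delayed, sentence-level feedback
    baseline = critic(src)                          # critic predicts expected reward
    advantage = reward - baseline.detach()          # subtract baseline to reduce variance
    pg_loss = -(advantage * log_probs.sum())        # REINFORCE-style policy objective
    critic_loss = (baseline - reward) ** 2          # regress baseline toward observed rewards
    optimizer.zero_grad()
    (pg_loss + critic_loss).backward()
    optimizer.step()
    return reward
```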
Calibrated probabilities for detecting errors of NLP models
@inproceedings{nguyen15calibration,title={Posterior calibration and exploratory analysis for natural language processing models},author={Nguyen, Khanh and O{'}Connor, Brendan},booktitle={EMNLP},year={2015},address={Lisbon, Portugal},publisher={Association for Computational Linguistics},url={https://www.aclweb.org/anthology/D15-1182},doi={10.18653/v1/D15-1182},pages={1587--1598},}
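To illustrate what calibration analysis means here, a minimal sketch: bin a model's predicted probabilities, compare each bin's average confidence with its empirical accuracy, and summarize the gap in the style of an expected calibration error. The function names and toy data below are made up purely for illustration, not taken from the paper.

```python
# Sketch of a reliability-style calibration check for probabilistic predictions.
import numpy as np

def calibration_table(confidences, correct, n_bins=10):
    """Return (mean confidence, empirical accuracy, count) for each probability bin."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    rows = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            rows.append((confidences[mask].mean(), correct[mask].mean(), int(mask.sum())))
    return rows

def expected_calibration_error(confidences, correct, n_bins=10):
    """Count-weighted average gap between confidence and accuracy across bins."""
    rows = calibration_table(confidences, correct, n_bins)
    total = sum(count for _, _, count in rows)
    return sum(count * abs(conf - acc) for conf, acc, count in rows) / total

# Toy usage with simulated predictions: a well-calibrated model has ECE near 0.
conf = np.random.rand(1000)
hit = (np.random.rand(1000) < conf).astype(float)   # outcomes drawn to match confidences
print(expected_calibration_error(conf, hit))
```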