CausalDetox: Causal Head Selection and Intervention for Language Model Detoxification
Findings of ACL 2026
PhD candidate · UIUC Computer Science · advised by Hari Sundaram and Varun Chandrasekaran
I am a PhD candidate in Computer Science at the University of Illinois Urbana-Champaign, advised by Prof. Hari Sundaram and Prof. Varun Chandrasekaran. I started my PhD in Fall 2023.
My research lies in trustworthy machine learning and AI alignment. I study how learning systems encode, retain, and propagate concepts or undesirable behaviors, and how these mechanisms can be controlled through unlearning and causal analysis.
I am currently interested in machine unlearning, concept representations in generative models, and emergent behavior in multi-agent systems, especially settings where harmful behavior or hidden influence can persist, transfer, or emerge through interaction.
Before UIUC, I earned my B.S. in Physics from the University of Science and Technology of China (USTC).
Findings of ACL 2026
PoliSim @ CHI 2026 · Best Paper Nomination
CSCW 2025
EMNLP 2025
IEEE BigData 2022
Neurocomputing 374, 77–85 · 2020