About
I am a postdoctoral researcher at CNRS and the Grenoble Computer Science Laboratory (LIG), working with Maxime Peyrard on explaining AI systems through causal abstraction. Previously, I was a postdoctoral researcher at LIG (2024–2025), working with Philippe Mulhem and Didier Schwab on information retrieval, RAG, and explainability. I obtained my PhD in Computer Science from Université Grenoble Alpes in 2024, supervised by François Portet and Fabien Ringeval.
My research investigates the explainability, trustworthiness, and evaluability of large language models. It spans mechanistic interpretability, robustness and fairness, and task-specific evaluation for high-stakes domains such as clinical NLG and scientific fact verification.
I'm open to collaborations and discussions — feel free to reach out!
Selected Publications
TempPerturb-Eval: On the Joint Effects of Internal Temperature and External Perturbations in RAG Robustness
Yongxin Zhou, Philippe Mulhem, Didier Schwab
Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)
Systematic evaluation of how LLM temperature and external perturbations jointly affect RAG robustness.
What Matters to an LLM? Behavioral and Computational Evidences from Summarization
Yongxin Zhou, Changshun Wu, Philippe Mulhem, Didier Schwab, Maxime Peyrard
Findings of the Association for Computational Linguistics: EACL 2026
Behavioral and computational study of LLM informational preferences in summarization, revealing divergence from pre-LLM baselines.
Can GPT models Follow Human Summarization Guidelines? A Study for Targeted Communication Goals
Yongxin Zhou, Fabien Ringeval, François Portet
Proceedings of the 18th International Natural Language Generation Conference (INLG 2025)
Evaluation of GPT models' ability to follow expert-crafted summarization guidelines for targeted communication goals.
PSentScore: Evaluating Sentiment Polarity in Dialogue Summarization
Yongxin Zhou, Fabien Ringeval, François Portet
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
PSentScore: an automatic evaluation metric for sentiment polarity preservation in dialogue summarization.
A Survey of Evaluation Methods of Generated Medical Textual Reports
Yongxin Zhou, Fabien Ringeval, François Portet
Proceedings of the 5th Clinical Natural Language Processing Workshop (ClinicalNLP 2023)
A systematic survey of automatic and human evaluation methods for generated medical textual reports.
