Yongxin Zhou

Email: yongxin [dot] zhou [at] univ-grenoble-alpes [dot] fr

I am currently a research engineer/postdoc at the Grenoble Computer Science Laboratory. I obtained my PhD in Computer Science, in the field of Natural Language Processing, at Université Grenoble Alpes in 2024, supervised by François Portet and Fabien Ringeval. Before that, I studied Phonetics and Phonology at Université Sorbonne Nouvelle - Paris 3 and then received my second master's degree, in Language and Computer Science, from Sorbonne Université.
My current research focuses on explainability, RAG and NLG. I have also worked on evaluation, clinical NLP and affective computing, and I remain interested in these topics.
I'm open to discussions and collaborations; do not hesitate to contact me!

LinkedIn | Twitter | GitHub

News

22/08/2025 “Can GPT models Follow Human Summarization Guidelines? A Study for Targeted Communication Goals” accepted by INLG 2025.
July 27–August 1st, 2025 I will be attending ACL 2025 in person.
June 30–July 4th, 2025 I will be attending CORIA-TALN 2025 in person.
26/03/2025 I passed the French lecturer qualification, CNU, Section 27 (Computer Science)!
06/11/2024 Defended my thesis entitled “Affect-aware Natural Language Generation: application to dialogue and cognitive remediation session summarization in low-resource settings”. - link
23/05/2024 Oral presentation of our paper at LREC-COLING 2024.
20/02/2024 “PSentScore: Evaluating Sentiment Polarity in Dialogue Summarization” accepted by LREC-COLING 2024.
20/02/2024 “Jargon: A Suite of Language Models and Evaluation Tasks for French Specialized Domains” accepted by LREC-COLING 2024.
25/10/2023 Check out our new preprint "Can GPT models Follow Human Summarization Guidelines? Evaluating ChatGPT and GPT-4 for Dialogue Summarization" - arXiv link
23/07/2023 Check out our new preprint "Evaluating Emotional Nuances in Dialogue Summarization" - arXiv link
14/07/2023 Presented a poster on our paper at the Clinical NLP Workshop 2023, via GatherTown.
15/06/2023 Presented a poster on our abstract at JPC 2023, Toulouse.
01/06/2023 “A Survey of Evaluation Methods of Generated Medical Textual Reports” accepted by ACL 2023 Workshop Clinical NLP.
23/03/2023 Abstract “Exploration de caractéristiques linguistiques et acoustiques pour la génération automatique de rapports de séances de remédiation cognitive avec un assistant virtuel” accepted at JPC 2023 (9èmes Journées de Phonétique Clinique, the 9th Clinical Phonetics Days).
31/10/2022 Shared Task paper “MLLabs-LIG at TempoWiC 2022: A Generative Approach for Examining Temporal Meaning Shift” accepted by EMNLP 2022 Workshop EvoNLP.
22/06/2022 Presented a poster on our paper at LREC 2022.
29/04/2022 Presented a poster at the LIG PhD Day (day for second-year PhD students).
04/2022 “Effectiveness of French Language Models on Abstractive Dialogue Summarization Task” accepted by LREC 2022.
05/2021 Gave a talk on our paper at the AAMAS 2021 Workshop on Explainable and Transparent AI and Multi-Agent Systems.
03/2021 “Towards an XAI-Assisted Third-Party Evaluation of AI Systems: Illustration on Decision Trees” accepted by AAMAS 2021 Workshop on Explainable and Transparent AI and Multi-Agent Systems.

Talks

23/05/2024 PSentScore: Evaluating Sentiment Polarity in Dialogue Summarization
@LREC-COLING 2024, Torino, Italia

21/07/2020 XAI for AI Evaluation (internship seminar, Craft AI)
@Craft AI, Paris, France

Research Experience

05/2020 - 10/2020 Research Intern in the Department of Artificial Intelligence Evaluation
@Laboratoire national de métrologie et d’essais (LNE), Trappes, France

Teaching

10/2021 - 12/2021 Advanced models of machine learning (exercise classes, M2)
(and 09/2022 - 12/2022)
@ Master INDUSTRIES DE LA LANGUE - Université Grenoble Alpes | GitHub

10/2021 - 12/2021 Automatic text generation (exercise classes, M2)
(and 09/2022 - 12/2022)
@ Master INDUSTRIES DE LA LANGUE - Université Grenoble Alpes | GitHub

Publications

PSentScore: Evaluating Sentiment Polarity in Dialogue Summarization
Yongxin Zhou, Fabien Ringeval, François Portet
LREC-COLING, 2024

pdf | abstract | bibtex

Automatic dialogue summarization is a well-established task with the goal of distilling the most crucial information from human conversations into concise textual summaries. However, most existing research has predominantly focused on summarizing factual information, neglecting the affective content, which can hold valuable insights for analyzing, monitoring, or facilitating human interactions. In this paper, we introduce and assess a set of measures PSentScore, aimed at quantifying the preservation of affective content in dialogue summaries. Our findings indicate that state-of-the-art summarization models do not preserve well the affective content within their summaries. Moreover, we demonstrate that a careful selection of the training set for dialogue samples can lead to improved preservation of affective content in the generated summaries, albeit with a minor reduction in content-related metrics.

	@inproceedings{zhou-etal-2024-psentscore-evaluating,
	    title = "{PS}ent{S}core: Evaluating Sentiment Polarity in Dialogue Summarization",
	    author = "Zhou, Yongxin  and
	      Ringeval, Fabien  and
	      Portet, Fran{\c{c}}ois",
	    editor = "Calzolari, Nicoletta  and
	      Kan, Min-Yen  and
	      Hoste, Veronique  and
	      Lenci, Alessandro  and
	      Sakti, Sakriani  and
	      Xue, Nianwen",
	    booktitle = "Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)",
	    month = may,
	    year = "2024",
	    address = "Torino, Italia",
	    publisher = "ELRA and ICCL",
	    url = "https://aclanthology.org/2024.lrec-main.1163",
	    pages = "13290--13302",
	    abstract = "",
	}

Jargon: A Suite of Language Models and Evaluation Tasks for French Specialized Domains
Vincent Segonne, Aidan Mannion, Laura Cristina Alonzo Canul, Alexandre Daniel Audibert, Xingyu Liu, Cécile Macaire, Adrien Pupier, Yongxin Zhou, Mathilde Aguiar, Felix E. Herron, Magali Norré, Massih R Amini, Pierrette Bouillon, Iris Eshkol-Taravella, Emmanuelle Esperança-Rodier, Thomas François, Lorraine Goeuriot, Jérôme Goulian, Mathieu Lafourcade, Benjamin Lecouteux, François Portet, Fabien Ringeval, Vincent Vandeghinste, Maximin Coavoux, Marco Dinarelli, Didier Schwab
LREC-COLING, 2024

pdf | abstract | bibtex

Pretrained Language Models (PLMs) are the de facto backbone of most state-of-the-art NLP systems. In this paper, we introduce a family of domain-specific pretrained PLMs for French, focusing on three important domains: transcribed speech, medicine, and law. We use a transformer architecture based on efficient methods (LinFormer) to maximise their utility, since these domains often involve processing long documents. We evaluate and compare our models to state-of-the-art models on a diverse set of tasks and datasets, some of which are introduced in this paper. We gather the datasets into a new French-language evaluation benchmark for these three domains. We also compare various training configurations: continued pretraining, pretraining from scratch, as well as single- and multi-domain pretraining. Extensive domain-specific experiments show that it is possible to attain competitive downstream performance even when pre-training with the approximative LinFormer attention mechanism. For full reproducibility, we release the models and pretraining data, as well as contributed datasets.

	@inproceedings{segonne-etal-2024-jargon,
	    title = "Jargon: A Suite of Language Models and Evaluation Tasks for {F}rench Specialized Domains",
	    author = "Segonne, Vincent  and
	      Mannion, Aidan  and
	      Alonzo Canul, Laura Cristina  and
	      Audibert, Alexandre Daniel  and
	      Liu, Xingyu  and
	      Macaire, C{\'e}cile  and
	      Pupier, Adrien  and
	      Zhou, Yongxin  and
	      Aguiar, Mathilde  and
	      Herron, Felix E.  and
	      Norr{\'e}, Magali  and
	      Amini, Massih R  and
	      Bouillon, Pierrette  and
	      Eshkol-Taravella, Iris  and
	      Esperan{\c{c}}a-Rodier, Emmanuelle  and
	      Fran{\c{c}}ois, Thomas  and
	      Goeuriot, Lorraine  and
	      Goulian, J{\'e}r{\^o}me  and
	      Lafourcade, Mathieu  and
	      Lecouteux, Benjamin  and
	      Portet, Fran{\c{c}}ois  and
	      Ringeval, Fabien  and
	      Vandeghinste, Vincent  and
	      Coavoux, Maximin  and
	      Dinarelli, Marco  and
	      Schwab, Didier",
	    editor = "Calzolari, Nicoletta  and
	      Kan, Min-Yen  and
	      Hoste, Veronique  and
	      Lenci, Alessandro  and
	      Sakti, Sakriani  and
	      Xue, Nianwen",
	    booktitle = "Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)",
	    month = may,
	    year = "2024",
	    address = "Torino, Italia",
	    publisher = "ELRA and ICCL",
	    url = "https://aclanthology.org/2024.lrec-main.827/",
	    pages = "9463--9476",
	    abstract = ""
	}

A Survey of Evaluation Methods of Generated Medical Textual Reports
Yongxin Zhou, Fabien Ringeval, François Portet
ACL - ClinicalNLP, 2023

pdf | abstract | bibtex

Medical Report Generation (MRG) is a sub-task of Natural Language Generation (NLG) and aims to present information from various sources in textual form and synthesize salient information, with the goal of reducing the time spent by domain experts in writing medical reports and providing support information for decision-making. Given the specificity of the medical domain, the evaluation of automatically generated medical reports is of paramount importance to the validity of these systems. Therefore, in this paper, we focus on the evaluation of automatically generated medical reports from the perspective of automatic and human evaluation. We present evaluation methods for general NLG evaluation and how they have been applied to domain-specific medical tasks. The study shows that MRG evaluation methods are very diverse, and that further work is needed to build shared evaluation methods. The state of the art also emphasizes that such an evaluation must be task specific and include human assessments, requesting the participation of experts in the field.

	@inproceedings{zhou-etal-2023-survey,
	    title = "A Survey of Evaluation Methods of Generated Medical Textual Reports",
	    author = "Zhou, Yongxin  and
	      Ringeval, Fabien  and
	      Portet, Fran{\c{c}}ois",
	    booktitle = "Proceedings of the 5th Clinical Natural Language Processing Workshop",
	    month = jul,
	    year = "2023",
	    address = "Toronto, Canada",
	    publisher = "Association for Computational Linguistics",
	    url = "https://aclanthology.org/2023.clinicalnlp-1.48",
	    doi = "10.18653/v1/2023.clinicalnlp-1.48",
	    pages = "447--459",
	    abstract = "",
	}

Effectiveness of French Language Models on Abstractive Dialogue Summarization Task
Yongxin Zhou, François Portet, Fabien Ringeval
LREC, 2022

pdf | abstract | bibtex

Pre-trained language models have established the state-of-the-art on various natural language processing tasks, including dialogue summarization, which allows the reader to quickly access key information from long conversations in meetings, interviews or phone calls. However, such dialogues are still difficult to handle with current models because the spontaneity of the language involves expressions that are rarely present in the corpora used for pre-training the language models. Moreover, the vast majority of the work accomplished in this field has been focused on English. In this work, we present a study on the summarization of spontaneous oral dialogues in French using several language specific pre-trained models: BARThez, and BelGPT-2, as well as multilingual pre-trained models: mBART, mBARThez, and mT5. Experiments were performed on the DECODA (Call Center) dialogue corpus whose task is to generate abstractive synopses from call center conversations between a caller and one or several agents depending on the situation. Results show that the BARThez models offer the best performance far above the previous state-of-the-art on DECODA. We further discuss the limits of such pre-trained models and the challenges that must be addressed for summarizing spontaneous dialogues.

	@InProceedings{zhou-portet-ringeval:2022:LREC,
	  author    = {Zhou, Yongxin  and  Portet, François  and  Ringeval, Fabien},
	  title     = {Effectiveness of French Language Models on Abstractive Dialogue Summarization Task},
	  booktitle      = {Proceedings of the Language Resources and Evaluation Conference},
	  month          = {June},
	  year           = {2022},
	  address        = {Marseille, France},
	  publisher      = {European Language Resources Association},
	  pages     = {3571--3581},
	  abstract  = {},
	  url       = {https://aclanthology.org/2022.lrec-1.382}
	}

THERADIA: Digital Therapies Augmented by Artificial Intelligence
Franck Tarpin-Bernard, Joan Fruitet, Jean-Philippe Vigne, Patrick Constant, Hanna Chainay, Olivier Koenig, Fabien Ringeval, Béatrice Bouchot, Gérard Bailly, François Portet, Sina Alisamir, Yongxin Zhou, Jean Serre, Vincent Delerue, Hippolyte Fournier, Kévin Berenger, Isabella Zsoldos, Olivier Perrotin, Frédéric Elisei, Martin Lenglet, Charles Puaux, Léo Pacheco, Mélodie Fouillen, Didier Ghenassia
AHFE, 2021

pdf | abstract | bibtex

Digital plays a key role in the transformation of medicine. Beyond the simple computerisation of healthcare systems, many non-drug treatments are now possible thanks to digital technology. Thus, interactive stimulation exercises can be offered to people suffering from cognitive disorders, such as developmental disorders, neurodegenerative diseases, stroke or traumas. The efficiency of these new treatments, which are still primarily offered face-to-face by therapists, can be greatly improved if patients can pursue them at home. However, patients are left to their own devices which can be problematic. We introduce THERADIA, a 5-year project that aims to develop an empathic virtual agent that accompanies patients while receiving digital therapies at home, and that provides feedback to therapists and caregivers. We detail the architecture of our agent as well as the framework of our Wizard-of-Oz protocol, designed to collect a large corpus of interactions between people and our virtual assistant in order to train our models and improve our dialogues.

	@inproceedings{tarpin2021theradia,
	  title={THERADIA: Digital Therapies Augmented by Artificial Intelligence},
	  author={Tarpin-Bernard, Franck and Fruitet, Joan and Vigne, Jean-Philippe and Constant, Patrick and Chainay, Hanna and Koenig, Olivier and Ringeval, Fabien and Bouchot, B{\'e}atrice and Bailly, G{\'e}rard and Portet, Fran{\c{c}}ois and others},
	  booktitle={International Conference on Applied Human Factors and Ergonomics},
	  pages={478--485},
	  year={2021},
	  organization={Springer}
	}

Towards an XAI-Assisted Third-Party Evaluation of AI Systems: Illustration on Decision Trees
Yongxin Zhou, Matthieu Boussard, Agnes Delaborde
AAMAS - EXTRAAMAS, 2021

pdf | abstract | bibtex

We explored the potential contribution of eXplainable Artificial Intelligence (XAI) for the evaluation of Artificial Intelligence (AI), in a context where such an evaluation is performed by independent third-party evaluators, for example in the objective of certification. The experimental approach of this paper is based on “explainable by design” decision trees that produce predictions on health data and bank data. Results presented in this paper show that the explanations could be used by the evaluators to identify the parameters used in decision making and their levels of importance. The explanations would thus make it possible to orient the constitution of the evaluation corpus, to explore the rules followed for decision-making and to identify potentially critical relationships between different parameters. In addition, the explanations make it possible to inspect the presence of bias in the database and in the algorithm. These first results lay the groundwork for further additional research in order to generalize the conclusions of this paper to different XAI methods.

	@InProceedings{10.1007/978-3-030-82017-6_10,
	author="Zhou, Yongxin
	and Boussard, Matthieu
	and Delaborde, Agnes",
	editor="Calvaresi, Davide
	and Najjar, Amro
	and Winikoff, Michael
	and Fr{\"a}mling, Kary",
	title="Towards an XAI-Assisted Third-Party Evaluation of AI Systems: Illustration on Decision Trees",
	booktitle="Explainable and Transparent AI and Multi-Agent Systems",
	year="2021",
	publisher="Springer International Publishing",
	address="Cham",
	pages="158--172",
	abstract="",
	isbn="978-3-030-82017-6"
	}

Communication

Explicabilité par Perturbations pour les Systèmes RAG
Yongxin Zhou, Philippe Mulhem, Didier Schwab
DIAG-LLM@CORIA-TALN, 2025

pdf | bibtex


Exploration de caractéristiques linguistiques et acoustiques pour la génération automatique de rapports de séances de remédiation cognitive avec un assistant virtuel
Yongxin Zhou, Fabien Ringeval, François Portet
JPC, 2023 - 9èmes Journées de Phonétique Clinique

pdf | poster | bibtex

	@article{zhouexploration,
	  title={Exploration de caract{\'e}ristiques linguistiques et acoustiques pour la g{\'e}n{\'e}ration automatique de rapports de s{\'e}ances de rem{\'e}diation cognitive avec un assistant virtuel},
	  author={ZHOU, Yongxin and RINGEVAL, Fabien and PORTET, Fran{\c{c}}ois},
	  journal={9{\`e}me Journ{\'e}e de Phon{\'e}tique Clinique},
	  pages={117}
	}

Professional Activities

Conference Reviewer
- ACL Rolling Review, COLING (2025), LREC-COLING (2024), ACII (2023), GEM (2022, 2023, 2025), LREC (2022)
Volunteer
- PhD volunteer at the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, ECML PKDD 2022
- Virtual volunteer at the 2022 Conference on Empirical Methods in Natural Language Processing, EMNLP 2022
- Volunteer at the Conference and Labs of the Evaluation Forum, CLEF 2024
Mentorship
- 02/2021 - 07/2021, Supervision of a master's internship on Natural Language Grounding through Dense Video Captioning, Multi3Generation
- Teaching assistant at the second Advanced Language Processing School, ALPS 2022
Organizer
- Organisation of social activities at the first Advanced Language Processing School, ALPS 2021
- DIAG-LLM workshop at CORIA-TALN 2025