Ecole d'Eté en traitement du signal et des images

L'École comportera à la fois des cours tutoriaux, ainsi que des sessions ouvertes permettant aux participants de présenter leurs travaux et de confronter leurs idées.

Présentation des cours

1. Challenging deep learning models in real-world applications: learning with few or no data and looking for explainaibility (5 h)
Conférenciere : Céline Hudelot, Professeure, CentraleSupelec, Laboratoire MICS

Résumé : In this course, after a brief review of the principles at the core of deep models and their success, we will address two of the main limitations to their deployment in real applications: their greedy need in annotated data and their lack of interpretability and explicability. Then, in the ﬁrst part of the course, we will discuss and present eﬀective recent methods for learning with limited annotated data such as few-shot learning, self-supervised learning and data augmentation and generation. In the second part, we will address the diﬀerent approaches of the state of the art on explainable artiﬁcial intelligence that were proposed to improve the understanding of deep models and thus to favor their deployment in decision making scenarios.

Bio : Céline Hudelot is professor at the MICS ( Mathematics Interacting with Computer Science) Laboratory of CentraleSupélec (Université Paris Saclay) in which she was ﬁrst assistant professor. She obtained her HDR in september 2014. She was previously a postdoctoral fellow at TELECOMParisTech in Isabelle Bloch team (2005 and 2006). Before, she received a PhD in electrical and computer engineering from I.N.R.I.A and the University of Nice Sophia Antipolis in 2005. Her PhD work was done under the supervision of Monique Thonnat in the ORION (PULSAR) Team.
She is the head of the MICS Laboratory since October 2021 and in charge of the research axis on artiﬁcial intelligence. Her research is focused in learning semantic representations for unstructured data understanding tasks. It involves works on representation and machine learning, knowledge representation and reasoning and explainable artiﬁcial intelligence.

2. Domain adaptation in deep learning (5 h)
Conférencier : Rémi Flamary, Maitre de Conférences, Ecole Polytechnique, Laboratoire CMAP

Résumé : In this course we will introduce the problem of domain adaptation (DA) that occurs often in practical machine learning applications and that consists in learning a classiﬁer on a new unlabeled dataset using a related but diﬀerent labeled dataset. We will ﬁrst discuss the classical methods that have been proposed to solve DA. Next we will introduce modern DA methods relying on deep learning and representation learning with a focus on divergence and optimal transport based methods.Finally the natural extensions of domain adaptation, that are multi-source and multi target DA and heterogeneous DA will be introduced with a short discussion about the state of the art methods.

Bio : Remi Flamary is Monge Assistant Professor at École Polytechnique in the Centre de Mathématiques Appliquées (CMAP) and holder of a Chair in Artiﬁcial Intelligence from 3IA Côte d’Azur. He was previously Assistant Professor at Université Cote d’Azur (UCA) and a member of Lagrange Laboratory, Observatoire de la Cote d’Azur. He received the Dipl.-Ing. in electrical engineering and the M.S. degree in image processing from the Institut National de Sciences Appliquees de Lyon in 2008, and a Ph.D. degree from the University of Rouen in 2011. His current research interests include signal and image processing, and machine learning with a recent focus on application of Optimal Transport theory to machine learning problems.

3. Privacy-preserving machine learning (5 h)
Conférencier : Aurélien Bellet, Chargé de Recherche, INRIA, Laboratoire CRIStAL

Résumé : Personal data is being collected at an unprecedented scale by businesses and public organizations, driven by the progress of Machine Learning (ML) and AI. While training ML models on such personal or otherwise conﬁdential data can be beneﬁcial in many applications, this can also lead to undesirable (sometimes catastrophic) disclosure of sensitive information. In particular, ML models often contain precise information about individual data points that were used to train them. We must therefore deal with two conﬂicting objectives: maximizing the utility of the ML model while protecting the privacy of individuals whose data is used in the analysis. Unfortunately, recent years have shown that standard data anonymization techniques cannot reliably prevent leakage without largely destroying utility.
So how can we achieve both utility and privacy, or rather, obtain a good trade-oﬀ between the two? This course focuses on Diﬀerential Privacy (DP), a mathematical deﬁnition of privacy which comes with rigorous guarantees as well as an algorithmic framework that allows the design of practical privacy-preserving algorithms for data analytics and ML. In recent years, DP has become the gold standard in various ﬁelds and has recently seen several real-world deployments by companies and government agencies. We will introduce the formal deﬁnition of DP and analyze its key properties. We will then turn to the design of diﬀerentially private algorithms in the standard centralized model, where a trusted curator wants to release the result of an analysis in a privacy-preserving way. We will introduce key algorithmic tools for privately answering simple queries, which will serve as building blocks to construct private ML algorithms. We will also consider decentralized models of DP, where individuals or data owners do not trust a curator to handle their private data, and present applications to federated learning.

Bio : Aurélien Bellet is a researcher at Inria (France). He obtained his Ph.D. from the University of Saint-Etienne (France) in 2012 and was a postdoctoral researcher at the University of Southern California (USA) and at Télécom Paris (France). His current research focuses on the design of privacypreserving machine learning algorithms in federated and decentralized settings. Aurélien has served as area chair for ICML (since 2019) and NeurIPS (since 2020). He co-organized several international workshops on machine learning and privacy at NIPS/NeurIPS’16’18’20 and CCS’21. He also coorganizes FLOW, an online seminar on federated learning with 900+ registered attendees.

4. Enjeux éthiques du numérique et de l’intelligence artificielle – Différence entre éthique, régulation et déontologie (2 h)
Conférencier : Jean-Gabriel Ganascia, Professeur, Sorbonne Université, Laboratoire LIP6

Résumé : Après une introduction générale à l’éthique, ce cours s’organisera autour de quatre concepts majeurs qui prennent une dimension particulière dans le cas du numérique et des applications de l’intelligence articielle, l’autonomie, la justice (et l’équité, mais nous verrons que c’est presque la même chose, et que tout est dans le presque. . . ), la vie privée et enﬁn la transparence et l’explicabilité. Plus précisément voici la structure du cours :

Introduction à l’éthique en général, et à l’éthique du numérique, en particulier a. Déﬁnition de l’éthique et distinction avec le droit et la régulation b. Histoire de l’éthique et des différentes approches qu’elle recouvre c. Éthique appliquée, en particulier bioéthique et éthique du numérique d. Notion d’éthique « by design » dans les applications informatiques e. Grands concepts de l’éthique : autonomie, justice et équité, vie privée et enﬁn, transparence et explicabilité.
Autonomie humaine et autonomie technique, a. D’où provient le concept d’autonomie b. Lien entre l’autonomie humaine, liberté et consentement c. Distinctions et malentendus relatifs à l’autonomie technique des voitures, des armes et des autres systèmes d’intelligence artiﬁcielle
Justice, équité et « fairness » a. Diﬀérence entre justice et juste et entre équité, « fairness » et égalité b. Système d’IA non équitable, c’est-à-dire discriminatoire c. Biais algorithmique dans la prédiction
Vie privée et intimité a. Distinction entre intimité, extimité, opposition privé public et vie privée b. Éthique des données, loi sécurité et liberté, groupe PAPA, RGPD etc. c. Notions d’anonymisation (de k-anonymat), de pseudonymisation, d’identité multiples etc.
Transparence et explicabilité a. Besoin de transparence et d’explication pour la prise de décision : pourquoi ? b. Tension entre transparence (c’est-à-dire ﬁdélité) et interprétabilité c. Résumé de diﬀérentes approches

Bio : Ingénieur et philosophe de formation initiale, Jean-Gabriel Ganascia s’est ensuite orienté vers l’informatique et l’intelligence artiﬁcielle. Il a soutenu en 1983, à l’université Paris-Sud (Orsay), une thèse de doctorat sur les systèmes à base de connaissance puis en 1987, toujours à l’université ParisSud, une thèse d’État sur l’apprentissage symbolique. Professeur d’informatique à la faculté des sciences de Sorbonne Université depuis 1988, il poursuit ses recherches au LIP6 où il dirige l’équipe ACASA. Spécialiste d’intelligence artiﬁcielle (EurAI Fellow), d’apprentissage machine et de fouille de données, ses recherches actuelles portent sur le versant littéraire des humanités numériques, sur l’éthique computationnelle et sur l’éthique des technologies de l’information et de la communication.
Il est membre du CNPEN (Comité National Pilote d’Éthique du Numérique), président du comité d’éthique de pôle emploi et du comité d’orientation du CHEC (Cycle des Hautes Études de la Culture). Enﬁn, il a présidé le comité d’éthique du CNRS de 2016 à 2021.

5. Natural Language Generation: Advances and Challenges (2 h)
Conférencier : Claire Gardent, Directrice de Recherche, CNRS, Laboratoire Loria

Résumé : Neural language models and pre-training approaches have revolutionised the ﬁeld of Natural Language Generation (NLG) and generating high quality text is now possible. Still, multiple challenges remain such as: How to generate from multiple documents? How to retrieve, select and exploit knowledge that can improve dialog models? How to generate into languages other than English? How to generate factually correct, long form text?
In this tutorial, I will start with a brief introduction to neural Natural Language Generation (NLG). I will then present some work we did to adress these challenges and highlight some societal and ethical issues that arises from Neural approaches to Natural Language Generation.
Joint work with Angela Fan (Facebook AI Research), Antoine Bordes (Facebook AI Research) and Chloé Braud (CNRS/IRIT)

Bio : Claire Gardent is a senior research scientist at the French National Center for Scientiﬁc Research (CNRS). She is aﬃliated with the LORIA Computer Science research unit in Nancy, France.
Her research focuses on computational models of natural language. She has worked on a variety of task such as syntactic, semantic and discourse parsing, question answering, Human-Machine dialog and computer assisted language learning. Over the last decade however, she has mostly been working on natural language generation (NLG). In 2017, she launched the WebNLG challenge, a shared task where the goal is to generate text from DBPedia data. She has proposed neural models for simpliﬁcation and summarisation; for generation for long form, multidocument input question answering; for multilingual generation from Abstract Meaning Representations and for response generation for dialog. Claire Gardent currently heads the AI xNLG Chair on multi-lingual, multi-source NLG and the CNRS LIFT Research Network on Computational, Formal and Field Linguistics. She is also a member of the H2020 NL4XAI Innovative Training Network on Natural Language Technology for Explainable AI and of the ANR funded QUANTUM project on question generation.

6. Accélération matérielle des réseaux de neurones, contraintes énergétiques et performance (2 h)
Conférencier : Frédéric Pétrot, Professeur, INPG, Laboratoire TIMA

Résumé : Les calculs effectués pour réaliser l'inférence avec les réseaux de neurones profonds vont de centaines de MFLOPs à des dizaines de GFLOPs, et nécessitent un accès à un nombre de paramètres qui va du million à plus d'une centaine de millions. Dans ces conditions, les CPUs classiques ont montré leur limites, et des architectures matérielles programmables avec un haut niveau de parallélisme (type GPU) sont donc massivement utilisées. La surface de silicium et la consommation énergétique de ces solutions est cependant très élevée, et de nombreux travaux tentent de trouver des solutions plus économes, en jouant sur de multiples facteurs. Le cours portera sur les approches qui sont actuellement poursuivies pour améliorer performances temporelles et consommation tout en tentant de minimiser la perte de précision.

Bio : Frédéric Pétrot a soutenu sa thèse en informatique à la défunte Université Pierre et Marie Curie (maintenant Sorbonne-Université) en 1994. Il a été maître de conférences dans cette université qu’il a quitté en 2004 pour devenir professeur à L’Ensimag, l’école d’informatique de Grenoble INP, maintenant institut d’ingénierie de (la forme actuelle de) l’Université Grenoble Alpes. Ses travaux portent sur la spéciﬁcation, la simulation et l’implantation de systèmes intégrés incluant des processeurs, parfois en grand nombre. Il est le porteur de la chaire "Digital Hardware AI Architectures" de l’institut MIAI à Grenoble.

Présentation des cours

Programme provisoire