1. Challenging deep learning models in real-world applications: learning with few or no data and looking for explainaibility (5 h)
Conférenciere :
Céline Hudelot, Professeure, CentraleSupelec, Laboratoire MICS
Résumé : In this course, after a brief review of the principles at the core of deep models and their success, we will address two of the main limitations to their deployment in real applications: their greedy need in annotated data and their lack of interpretability and explicability. Then, in the first part of the course, we will discuss and present effective recent methods for learning with limited annotated data such as few-shot learning, self-supervised learning and data augmentation and generation. In the second part, we will address the different approaches of the state of the art on explainable artificial intelligence that were proposed to improve the understanding of deep models and thus to favor their deployment in decision making scenarios.
Bio : Céline Hudelot is professor at the MICS ( Mathematics Interacting with Computer Science) Laboratory of CentraleSupélec (Université Paris Saclay) in which she was first assistant professor. She obtained her HDR in september 2014. She was previously a postdoctoral fellow at TELECOMParisTech in Isabelle Bloch team (2005 and 2006). Before, she received a PhD in electrical and computer engineering from I.N.R.I.A and the University of Nice Sophia Antipolis in 2005. Her PhD work was done under the supervision of Monique Thonnat in the ORION (PULSAR) Team.
She is the head of the MICS Laboratory since October 2021 and in charge of the research axis on artificial intelligence. Her research is focused in learning semantic representations for unstructured data understanding tasks. It involves works on representation and machine learning, knowledge representation and reasoning and explainable artificial intelligence.
2. Domain adaptation in deep learning (5 h)
Conférencier :
Rémi Flamary, Maitre de Conférences, Ecole Polytechnique, Laboratoire CMAP
Résumé : In this course we will introduce the problem of domain adaptation (DA) that occurs often in practical machine learning applications and that consists in learning a classifier on a new unlabeled dataset using a related but different labeled dataset. We will first discuss the classical methods that have been proposed to solve DA. Next we will introduce modern DA methods relying on deep learning and representation learning with a focus on divergence and optimal transport based methods.Finally the natural extensions of domain adaptation, that are multi-source and multi target DA and heterogeneous DA will be introduced with a short discussion about the state of the art methods.
Bio : Remi Flamary is Monge Assistant Professor at École Polytechnique in the Centre de Mathématiques Appliquées (CMAP) and holder of a Chair in Artificial Intelligence from 3IA Côte d’Azur. He was previously Assistant Professor at Université Cote d’Azur (UCA) and a member of Lagrange Laboratory, Observatoire de la Cote d’Azur. He received the Dipl.-Ing. in electrical engineering and the M.S. degree in image processing from the Institut National de Sciences Appliquees de Lyon in 2008, and a Ph.D. degree from the University of Rouen in 2011. His current research interests include signal and image processing, and machine learning with a recent focus on application of Optimal Transport theory to machine learning problems.
3. Privacy-preserving machine learning (5 h)
Conférencier :
Aurélien Bellet, Chargé de Recherche, INRIA, Laboratoire CRIStAL
Résumé : Personal data is being collected at an unprecedented scale by businesses and public organizations, driven by the progress of Machine Learning (ML) and AI. While training ML models on such personal or otherwise confidential data can be beneficial in many applications, this can also lead to undesirable (sometimes catastrophic) disclosure of sensitive information. In particular, ML models often contain precise information about individual data points that were used to train them. We must therefore deal with two conflicting objectives: maximizing the utility of the ML model while protecting the privacy of individuals whose data is used in the analysis. Unfortunately, recent years have shown that standard data anonymization techniques cannot reliably prevent leakage without largely destroying utility.
So how can we achieve both utility and privacy, or rather, obtain a good trade-off between the two? This course focuses on Differential Privacy (DP), a mathematical definition of privacy which comes with rigorous guarantees as well as an algorithmic framework that allows the design of practical privacy-preserving algorithms for data analytics and ML. In recent years, DP has become the gold standard in various fields and has recently seen several real-world deployments by companies and government agencies. We will introduce the formal definition of DP and analyze its key properties. We will then turn to the design of differentially private algorithms in the standard centralized model, where a trusted curator wants to release the result of an analysis in a privacy-preserving way. We will introduce key algorithmic tools for privately answering simple queries, which will serve as building blocks to construct private ML algorithms. We will also consider decentralized models of DP, where individuals or data owners do not trust a curator to handle their private data, and present applications to federated learning.
Bio : Aurélien Bellet is a researcher at Inria (France). He obtained his Ph.D. from the University of Saint-Etienne (France) in 2012 and was a postdoctoral researcher at the University of Southern California (USA) and at Télécom Paris (France). His current research focuses on the design of privacypreserving machine learning algorithms in federated and decentralized settings. Aurélien has served as area chair for ICML (since 2019) and NeurIPS (since 2020). He co-organized several international workshops on machine learning and privacy at NIPS/NeurIPS’16’18’20 and CCS’21. He also coorganizes FLOW, an online seminar on federated learning with 900+ registered attendees.
4. Enjeux éthiques du numérique et de l’intelligence artificielle – Différence entre éthique, régulation et déontologie (2 h)
Conférencier :
Jean-Gabriel Ganascia, Professeur, Sorbonne Université, Laboratoire LIP6
Résumé : Après une introduction générale à l’éthique, ce cours s’organisera autour de quatre concepts majeurs qui prennent une dimension particulière dans le cas du numérique et des applications de l’intelligence articielle, l’autonomie, la justice (et l’équité, mais nous verrons que c’est presque la même chose, et que tout est dans le presque. . . ), la vie privée et enfin la transparence et l’explicabilité. Plus précisément voici la structure du cours :
Bio : Ingénieur et philosophe de formation initiale, Jean-Gabriel Ganascia s’est ensuite orienté vers l’informatique et l’intelligence artificielle. Il a soutenu en 1983, à l’université Paris-Sud (Orsay), une thèse de doctorat sur les systèmes à base de connaissance puis en 1987, toujours à l’université ParisSud, une thèse d’État sur l’apprentissage symbolique. Professeur d’informatique à la faculté des sciences de Sorbonne Université depuis 1988, il poursuit ses recherches au LIP6 où il dirige l’équipe ACASA. Spécialiste d’intelligence artificielle (EurAI Fellow), d’apprentissage machine et de fouille de données, ses recherches actuelles portent sur le versant littéraire des humanités numériques, sur l’éthique computationnelle et sur l’éthique des technologies de l’information et de la communication.
Il est membre du CNPEN (Comité National Pilote d’Éthique du Numérique), président du comité d’éthique de pôle emploi et du comité d’orientation du CHEC (Cycle des Hautes Études de la Culture). Enfin, il a présidé le comité d’éthique du CNRS de 2016 à 2021.
5. Natural Language Generation: Advances and Challenges (2 h)
Conférencier :
Claire Gardent, Directrice de Recherche, CNRS, Laboratoire Loria
Résumé : Neural language models and pre-training approaches have revolutionised the field of Natural Language Generation (NLG) and generating high quality text is now possible. Still, multiple challenges remain such as: How to generate from multiple documents? How to retrieve, select and exploit knowledge that can improve dialog models? How to generate into languages other than English? How to generate factually correct, long form text?
In this tutorial, I will start with a brief introduction to neural Natural Language Generation (NLG). I will then present some work we did to adress these challenges and highlight some societal and ethical issues that arises from Neural approaches to Natural Language Generation.
Joint work with Angela Fan (Facebook AI Research), Antoine Bordes (Facebook AI Research) and Chloé Braud (CNRS/IRIT)
Bio : Claire Gardent is a senior research scientist at the French National Center for Scientific Research (CNRS). She is affiliated with the LORIA Computer Science research unit in Nancy, France.
Her research focuses on computational models of natural language. She has worked on a variety of task such as syntactic, semantic and discourse parsing, question answering, Human-Machine dialog and computer assisted language learning. Over the last decade however, she has mostly been working on natural language generation (NLG). In 2017, she launched the WebNLG challenge, a shared task where the goal is to generate text from DBPedia data. She has proposed neural models for simplification and summarisation; for generation for long form, multidocument input question answering; for multilingual generation from Abstract Meaning Representations and for response generation for dialog. Claire Gardent currently heads the AI xNLG Chair on multi-lingual, multi-source NLG and the CNRS LIFT Research Network on Computational, Formal and Field Linguistics. She is also a member of the H2020 NL4XAI Innovative Training Network on Natural Language Technology for Explainable AI and of the ANR funded QUANTUM project on question generation.
6. Accélération matérielle des réseaux de neurones, contraintes
énergétiques et performance (2 h)
Conférencier :
Frédéric Pétrot, Professeur, INPG, Laboratoire TIMA
Résumé : Les calculs effectués pour réaliser l'inférence avec les réseaux de neurones profonds vont de centaines de MFLOPs à des dizaines de GFLOPs, et nécessitent un accès à un nombre de paramètres qui va du million à plus d'une centaine de millions. Dans ces conditions, les CPUs classiques ont montré leur limites, et des architectures matérielles programmables avec un haut niveau de parallélisme (type GPU) sont donc massivement utilisées. La surface de silicium et la consommation énergétique de ces solutions est cependant très élevée, et de nombreux travaux tentent de trouver des solutions plus économes, en jouant sur de multiples facteurs. Le cours portera sur les approches qui sont actuellement poursuivies pour améliorer performances temporelles et consommation tout en tentant de minimiser la perte de précision.
Bio : Frédéric Pétrot a soutenu sa thèse en informatique à la défunte Université Pierre et Marie Curie (maintenant Sorbonne-Université) en 1994. Il a été maître de conférences dans cette université qu’il a quitté en 2004 pour devenir professeur à L’Ensimag, l’école d’informatique de Grenoble INP, maintenant institut d’ingénierie de (la forme actuelle de) l’Université Grenoble Alpes. Ses travaux portent sur la spécification, la simulation et l’implantation de systèmes intégrés incluant des processeurs, parfois en grand nombre. Il est le porteur de la chaire "Digital Hardware AI Architectures" de l’institut MIAI à Grenoble.