Lecturers

Each Lecturer will hold two or three lessons.
The Lecturers below are confirmed.

Ioannis Antonoglou

Google DeepMind, UK

Topics

General Reinforcement Learning Algorithms

Lectures

Lecture 1: Model-based reinforcement learning I

The field of deep reinforcement learning combines deep learning techniques with reinforcement learning algorithms to develop intelligent agents that can tackle a wide variety of challenging tasks. Recent years have seen the development of a wide class of deep reinforcement learning agents that have been successfully employed in complex environments such as video games, board games and robotics. However, most current state-of-the-art agents use methods from the model-free class of RL algorithms. In this lecture we will look at a different class of RL agents: model-based algorithms. These agents use an internal model of the world to optimise their acting policy. The lecture will present different model-based approaches and consider their pros and cons compared with their model-free counterparts.

Lecture 2: Model-based reinforcement learning II

In this lecture we will dive deeper into the different approaches to model-based reinforcement learning. We will examine different modes of model learning (pixel-based, implicit and stochastic models, etc.) and how the learned models can be used for planning, for augmenting real experience, or as auxiliary tasks.
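As a minimal illustration of the model-based idea (learn a model from experience, then plan with it), the sketch below fits a tabular transition and reward model from observed (s, a, r, s') tuples and runs value iteration on the learned model. The toy three-state chain, the function names `learn_model` and `plan`, and the discount `gamma=0.9` are hypothetical choices for illustration only; this is not the method of any particular agent covered in the lectures.

```python
from collections import defaultdict

def learn_model(transitions):
    """Estimate P(s' | s, a) and the mean reward R(s, a) from (s, a, r, s') tuples."""
    next_counts = defaultdict(lambda: defaultdict(int))
    reward_sums = defaultdict(float)
    visits = defaultdict(int)
    for s, a, r, s_next in transitions:
        next_counts[(s, a)][s_next] += 1
        reward_sums[(s, a)] += r
        visits[(s, a)] += 1
    return {
        sa: ({s2: c / n for s2, c in next_counts[sa].items()}, reward_sums[sa] / n)
        for sa, n in visits.items()
    }

def q_value(model, values, s, a, gamma):
    """One-step lookahead through the learned model."""
    probs, r = model[(s, a)]
    return r + gamma * sum(p * values[s2] for s2, p in probs.items())

def plan(model, states, actions, gamma=0.9, iters=100):
    """Value iteration on the learned model; (s, a) pairs never observed are skipped."""
    values = {s: 0.0 for s in states}
    for _ in range(iters):
        values = {
            s: max((q_value(model, values, s, a, gamma)
                    for a in actions if (s, a) in model), default=0.0)
            for s in states
        }
    policy = {
        s: max((a for a in actions if (s, a) in model),
               key=lambda a: q_value(model, values, s, a, gamma))
        for s in states if any((s, a) in model for a in actions)
    }
    return values, policy

transitions = [
    (0, "right", 0.0, 1),
    (1, "right", 1.0, 2),   # reaching state 2 pays reward 1
    (1, "left", 0.0, 0),
    (0, "left", 0.0, 0),
]
model = learn_model(transitions)
values, policy = plan(model, states=[0, 1, 2], actions=["left", "right"])
print(policy)  # {0: 'right', 1: 'right'}
```

The same learned model could also be sampled to generate extra simulated transitions (augmenting real experience), which is the other use mentioned above.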

Lecture 3: AlphaZero: A general model-based planning reinforcement learning algorithm for board games

Board games have been widely used in the field of artificial intelligence as test-beds for the development of new algorithms. Chess and Go are among the most studied games in AI because they are complicated, self-contained environments that are ideal for AI research. Previous attempts to achieve super-human performance in these domains led to the development of highly specialised, domain-specific methods. In this lecture we will examine AlphaZero, a reinforcement learning algorithm that has mastered the board games of Go, Chess and Shogi, achieving state-of-the-art performance without requiring any domain-specific adaptations or human data. Unlike AlphaGo, AlphaZero is trained entirely from scratch through self-play, without using any prior human knowledge.
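The tree search inside AlphaZero chooses which move to explore with a PUCT-style rule. That rule balances the current value estimate Q(s, a) against an exploration bonus driven by the policy network's prior P(s, a) and the visit counts N. The sketch below shows only that selection step. It assumes the simplified form U(s, a) = c_puct · P(s, a) · sqrt(Σ_b N(s, b)) / (1 + N(s, a)) with a fixed constant. The value `c_puct = 1.5`, the `EdgeStats` layout and the convention that unvisited edges have Q = 0 are illustrative assumptions, not DeepMind's implementation.

```python
import math
from dataclasses import dataclass

@dataclass
class EdgeStats:
    prior: float            # P(s, a) from the policy network
    visits: int = 0         # N(s, a)
    value_sum: float = 0.0  # W(s, a): sum of values backed up through this edge

    @property
    def q(self) -> float:
        # Mean action value Q(s, a); 0 for unvisited edges (an assumption; variants differ).
        return self.value_sum / self.visits if self.visits else 0.0

def puct_select(edges: dict, c_puct: float = 1.5):
    """Return the action maximising Q(s, a) + U(s, a)."""
    total_visits = sum(e.visits for e in edges.values())

    def score(action):
        e = edges[action]
        u = c_puct * e.prior * math.sqrt(total_visits) / (1 + e.visits)
        return e.q + u

    return max(edges, key=score)

edges = {"a": EdgeStats(prior=0.7, visits=10, value_sum=3.0),
         "b": EdgeStats(prior=0.3)}
print(puct_select(edges))  # the unvisited move "b" gets a large exploration bonus
```

As visit counts grow, the bonus term shrinks and selection is increasingly driven by the backed-up values Q.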

Giuseppe Di Fatta

University of Reading, UK

Topics

Data Science & KNIME

Biography

Dr. Giuseppe Di Fatta is an Associate Professor of Computer Science and the Head of the Department of Computer Science at the University of Reading, UK. In 1999, he was a research fellow at the International Computer Science Institute (ICSI), Berkeley, CA, USA. From 2000 to 2004, he was with the High-Performance Computing and Networking Institute of the National Research Council, Italy. From 2004 to 2006, he was with the University of Konstanz, Germany, where he was a member of the initial KNIME development team up to the first release of KNIME 1.0 in 2006. His research interests include data mining, machine learning, distributed and parallel computing, and data-driven multidisciplinary applications. He has published over 100 articles in peer-reviewed conferences and journals, is the founder of the IEEE ICDM Workshop on Data Mining in Networks and has chaired several other international events.

Lectures

Lecture 1: KNIME training session 1

The first session presents an overview of the leading data science and machine learning platforms and the trends among them. Gartner defines this type of platform as “a cohesive software application that offers a mixture of basic building blocks essential for creating all kinds of data science solution, and for incorporating those solutions into business processes, surrounding infrastructure and products.” Among these platforms, the KNIME Analytics Platform is free, user-friendly and open-source software. In this first session KNIME is introduced in detail by looking at its framework and its most useful functionalities. KNIME can be a suitable tool for individual data scientists and teams as well as for machine learning experts and developers.

Lecture 2: KNIME training session 2

KNIME is an open platform: it can integrate many other useful and popular tools, software packages and libraries, for example for developing specific machine learning solutions or for deploying a solution to a Big Data infrastructure. It can also be extended by means of software components (plugins) that add a specific feature or a new algorithm. This second session dedicated to KNIME introduces its advanced functionalities. These include embedded programming structures, extensions, the KNIME Server and the Web portal. Finally, we take a brief look at the KNIME programming framework from a developer's perspective: how to extend KNIME and incorporate specific functionalities (e.g. data import and integration, novel algorithms, visualisation methods), turning a general-purpose tool into a customised environment tailored to a specific context.

Lecture 3: Data-intensive Knowledge Discovery from Brain Imaging of Alzheimer’s Disease Patients

Alzheimer’s disease (AD) is a chronic neurodegenerative disease which is largely responsible for dementia in around 6% of the population aged 65 and above. The growing availability of human brain data generated by imaging techniques, such as Magnetic Resonance Imaging (MRI), has created an opportunity to apply data science approaches to the diagnosis of neurological disorders. The knowledge discovery process typically involves complex data workflows that combine pre-processing techniques, statistical methods, machine learning and data mining algorithms, and post-processing and visualisation techniques. This talk presents specific research efforts in this direction and highlights some open issues and challenges that are typical of real-world applications of Data Science and Artificial Intelligence.

Phillip Isola

MIT, USA

Topics

Generative Adversarial Networks

Lectures

Lecture 1: Introduction to Generative Adversarial Networks

I will cover the basic theory and practice of generative adversarial networks (GANs), a powerful and currently popular class of generative models. In these models, a generator network is trained to map random noise to an output distribution that is indistinguishable from the distribution of natural photos, according to an adversary that tries to tell these two distributions apart. GANs can synthesize remarkably realistic photos, sounds, and other kinds of data, but the realism often comes at the cost of limited diversity in the samples. I will explain how this can be caused by “mode collapse” and cover some ways to ameliorate it. I will also talk about a variety of architectures and objectives used in current GAN practice.
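The adversarial game described above is usually written as the minimax objective min_G max_D E_x[log D(x)] + E_z[log(1 − D(G(z)))]. As a hedged sketch, the functions below compute the per-batch discriminator loss and the commonly used non-saturating generator loss from discriminator probabilities. They illustrate the objectives only, not a training loop. The function names and the `EPS` guard are illustrative choices.

```python
import math

EPS = 1e-12  # guards against log(0)

def discriminator_loss(d_real, d_fake):
    """Binary cross-entropy: push D(x) toward 1 on real data and D(G(z)) toward 0 on fakes."""
    real_term = sum(-math.log(p + EPS) for p in d_real) / len(d_real)
    fake_term = sum(-math.log(1.0 - p + EPS) for p in d_fake) / len(d_fake)
    return real_term + fake_term

def generator_loss_non_saturating(d_fake):
    """Non-saturating variant: maximise log D(G(z)) instead of minimising log(1 - D(G(z)))."""
    return sum(-math.log(p + EPS) for p in d_fake) / len(d_fake)
```

A discriminator that outputs 0.5 everywhere, as at the theoretical equilibrium, gives a discriminator loss of 2 log 2 ≈ 1.386. The non-saturating generator loss is preferred in practice because it gives stronger gradients early in training, when the discriminator easily rejects generated samples.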

Lecture 2: Conditional GANs and Data Prediction

GANs can hallucinate realistic photos, but what if we don’t want to just make up data from scratch? More often, we are given some data and wish to make predictions based on it: for example, given the current view of the world, predict what the future will look like. This is an application for conditional GANs, which condition on observed data and make a prediction about unobserved data. Unlike most traditional predictors, conditional GANs are adept at dealing with both high-dimensional input observations and high-dimensional output predictions. I will show a number of applications in vision and robotics.
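One widely used way to train such a conditional predictor, popularised by the pix2pix image-to-image translation work, adds an L1 reconstruction term to the conditional adversarial term. The L1 term keeps the prediction close to the observed target. The sketch below computes that generator objective on flattened toy vectors. The weight `lam`, its default value and the function name are illustrative assumptions, and the discriminator score D(x, G(x)) is passed in as a number rather than computed by a network.

```python
import math

def cgan_generator_loss(d_score_on_fake, target, prediction, lam=100.0, eps=1e-12):
    """Conditional GAN generator objective with an added L1 term.

    d_score_on_fake: D(x, G(x)), the discriminator's probability that the
        (condition, prediction) pair is real.
    target, prediction: flattened ground-truth output y and generated output G(x).
    """
    adversarial = -math.log(d_score_on_fake + eps)  # non-saturating adversarial term
    l1 = sum(abs(t - p) for t, p in zip(target, prediction)) / len(target)
    return adversarial + lam * l1
```

The adversarial term rewards outputs the discriminator finds realistic for the given input, while the L1 term anchors them to the ground truth. Relying on L1 alone tends to produce blurry averages.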

Lecture 3: GANs for Domain Translation

GANs can be understood as a tool for mapping one data distribution into another. In this third lecture on GANs, we will adopt this perspective and see how it leads to powerful applications for domain adaptation and translation. The idea is to learn a mapping from a source domain to a target domain such that the output follows the same distribution as the target domain. I will show how this can be used to translate between various visual styles (e.g., turn photos into Monet paintings), make predictions from very limited supervision (e.g., in medical imaging, where supervision is expensive), and achieve “sim2real” transfer, where we train a robotic policy on simulated data but apply it in the real world.
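A common way to learn such mappings without paired examples, used for example in CycleGAN, is to add a cycle-consistency term. Translating from domain X to Y with G and back with F should reconstruct the input, and vice versa. The sketch below computes that L1 cycle loss for toy scalar mappings. `G`, `F_inverse` and `F_biased` are hypothetical hand-written stand-ins for learned generators. The adversarial terms that make the outputs match the target distribution are omitted.

```python
def cycle_consistency_loss(xs, ys, G, F):
    """Mean L1 error of F(G(x)) ≈ x plus mean L1 error of G(F(y)) ≈ y, on unpaired samples."""
    forward = sum(abs(F(G(x)) - x) for x in xs) / len(xs)
    backward = sum(abs(G(F(y)) - y) for y in ys) / len(ys)
    return forward + backward

G = lambda x: 2.0 * x + 1.0            # toy X -> Y mapping
F_inverse = lambda y: (y - 1.0) / 2.0  # exact inverse of G: zero cycle loss
F_biased = lambda y: y / 2.0           # not an inverse: positive cycle loss

xs, ys = [0.0, 1.0, 2.0], [1.0, 3.0, 5.0]
print(cycle_consistency_loss(xs, ys, G, F_inverse), cycle_consistency_loss(xs, ys, G, F_biased))
```

Note that xs and ys are never paired with each other. The loss only asks that each round trip returns to its starting point.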

Leslie Kaelbling

MIT - Computer Science & Artificial Intelligence Lab, USA

Topics

Artificial Intelligence & Machine Learning, Robotics

Biography

Leslie Pack Kaelbling is Professor of Computer Science and Engineering at MIT. She has previously held positions at Brown University, the Artificial Intelligence Center of SRI International, and at Teleos Research.

Prof. Kaelbling has done substantial research on designing situated agents, mobile robotics, reinforcement learning, and decision-theoretic planning. In 2000, she founded the Journal of Machine Learning Research, a high-quality journal that is both freely available electronically and published in archival form; she currently serves as editor-in-chief.

She is an NSF Presidential Faculty Fellow, a former member of the AAAI Executive Council, the 1997 recipient of the IJCAI Computers and Thought Award, a trustee of IJCAII and a fellow of the AAAI.

She received an A.B. in Philosophy in 1983 and a Ph.D. in Computer Science in 1990, both from Stanford University.

Her goal is to make intelligent robots: she did some of the earliest work on reinforcement learning and partially observable Markov decision processes (POMDPs) in robotics, and is currently focused on integrating geometric, probabilistic, and logical reasoning.

Current Projects:

Robust Intelligent Robots, Task and Motion Planning for Autonomous Robots, Learning and Optimization.

Research Areas:

AI & Machine Learning

Graphics & Vision

Robotics

Impact Areas:

Manufacturing

Lectures

Lecture 1: Learning in the factory and in the wild: designing robot systems that learn

We examine the general problem of designing robot systems from a decision-theoretic perspective, which makes clear when an individual robot needs to learn while it is actually performing its tasks (in the wild) and when the AI engineers need to use learning methods to design a good robot learner (in the factory). We will examine several learning paradigms from this perspective and talk about methods for meta-learning, including modular meta-learning and graph element networks.

Lecture 2: Learning factored transition models for planning in complex hybrid spaces

Many robotics problem distributions are better addressed by learning models and using them to do online reasoning (an approach also known as model-predictive control) than by learning a policy or value function. We begin by discussing this claim, and then study the forms of models that are most appropriate for different types of planning problems. We then examine two new approaches for learning models that are appropriate for planning in complex hybrid (mixed discrete and continuous) problems, such as robot task and motion planning. One approach is based on Gaussian-process active learning and another on an extension of graph neural networks.
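The "learn a model, then reason online" loop can be illustrated with random-shooting model-predictive control. At each step the controller samples candidate action sequences, rolls each one out in the model, executes only the first action of the cheapest sequence, and then re-plans. In the sketch below the "learned" model is a hypothetical hand-coded 1-D point mass standing in for a learned transition model. The horizon, sample count, time step and cost are illustrative assumptions. The lecture's own model-learning approaches (Gaussian-process active learning and graph neural networks) are not shown.

```python
import random

DT = 0.1  # time step of the toy system (assumed)

def toy_dynamics(state, action):
    """Hypothetical stand-in for a learned model: 1-D point mass, state = (position, velocity)."""
    pos, vel = state
    vel = vel + DT * action
    return (pos + DT * vel, vel)

def cost(state, goal=1.0):
    """Penalise distance from the goal position and, lightly, residual velocity."""
    pos, vel = state
    return (pos - goal) ** 2 + 0.1 * vel ** 2

def mpc_action(state, horizon=10, n_samples=200, rng=random):
    """Random-shooting MPC: return the first action of the cheapest sampled plan."""
    best_seq, best_cost = None, float("inf")
    for _ in range(n_samples):
        seq = [rng.uniform(-1.0, 1.0) for _ in range(horizon)]
        s, total = state, 0.0
        for a in seq:                # roll the candidate plan out in the model
            s = toy_dynamics(s, a)
            total += cost(s)
        if total < best_cost:
            best_seq, best_cost = seq, total
    return best_seq[0]               # execute only the first action, then re-plan

rng = random.Random(0)
state = (0.0, 0.0)
for _ in range(100):                 # closed loop; here the "real" system is the same toy model
    state = toy_dynamics(state, mpc_action(state, rng=rng))
print(state)
```

Random shooting is the simplest planner that fits this loop. In hybrid task-and-motion problems the inner search is far more structured, but the pattern of re-planning from each new state with a model is the same.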

Lecture 3: Learning to speed up planning in complex hybrid spaces

An important role for learning is to speed up search: this is the critical role that learning plays in methods such as AlphaZero. We will examine several different mechanisms that can be used (including learning heuristic or static evaluation functions and learning to bias the action-sampling distribution), with a focus on problems that require choosing actions from a continuous or hybrid space.

Raniero Romagnoli

Almawave, Italy

Lectures

Building Iride: how to mix deep learning and ontology techniques to understand language

Almawave is an Italian AI company based in Rome, specialized in Natural Language Processing and speech analytics. NLP is a multidisciplinary science that mixes computer science, psychology and linguistics. Deep learning techniques have achieved important results in a wide variety of NLP tasks, but language understanding “still requires a deep knowledge of the technological domain”. In this session we present IRIDE, a proprietary NLP engine that combines an ontology-based approach with the most recent DL techniques to solve NLP tasks like sentiment analysis and named entity recognition.

Speakers:

Raniero Romagnoli – CTO Almawave & Vincenzo Sciacca – AI Technical Evangelist Almawave

Dolores Romero Morales

Copenhagen Business School, Denmark

Topics

Big Data

Lectures

Lecture 1: Latest advances in enhancing Interpretability in Data Science by means of Mathematical Optimization (Part 1)

Data Science aims to develop models that extract knowledge from complex data and represent it to aid Data-Driven Decision Making. Mathematical Optimization has played a crucial role across the three main pillars of Data Science, namely Supervised Learning, Unsupervised Learning and Information Visualization. For instance, Quadratic Programming is used in Support Vector Machines, a Supervised Learning tool; Mixed-Integer Programming is used in Clustering, an Unsupervised Learning task; and Global Optimization is used in Multidimensional Scaling, an Information Visualization tool.

Data Science models should strike a balance between accuracy and interpretability. Interpretability is desirable, for instance, in medical diagnosis; it is required by regulators for models aiding, for instance, credit scoring; and since 2018 the EU has extended this requirement by imposing the so-called right to explanation. In the first lecture, we show that Mathematical Optimization is the natural tool to model the trade-off between accuracy and interpretability.

Lecture 2: Latest advances in enhancing Interpretability in Data Science by means of Mathematical Optimization (Part 2)

In the second lecture, we zoom in and talk about the optimization of classification trees, to enhance their accuracy without harming interpretability.
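At its smallest scale, optimizing a classification tree means choosing splits to maximise accuracy exactly rather than greedily. The sketch below does this by exhaustive search for a depth-1 tree (a decision stump) on a toy dataset. The lecture concerns mathematical-optimization formulations for deeper trees, which this brute-force example does not reproduce. The function name `best_stump`, the toy data and the convention that ties keep the first split found are illustrative assumptions.

```python
from collections import Counter

def best_stump(X, y):
    """Return (feature, threshold, left_label, right_label, accuracy) maximising training accuracy.

    Samples with x[feature] <= threshold go left; each leaf predicts its majority label.
    """
    n, d = len(X), len(X[0])
    best = None
    for j in range(d):
        for t in sorted({row[j] for row in X}):  # candidate thresholds: observed values
            left = [y[i] for i in range(n) if X[i][j] <= t]
            right = [y[i] for i in range(n) if X[i][j] > t]
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0] if right else left_label
            correct = (sum(lbl == left_label for lbl in left)
                       + sum(lbl == right_label for lbl in right))
            accuracy = correct / n
            if best is None or accuracy > best[4]:
                best = (j, t, left_label, right_label, accuracy)
    return best

X = [[1, 5], [2, 3], [3, 8], [4, 1]]
y = [0, 0, 1, 1]
print(best_stump(X, y))  # (0, 2, 0, 1, 1.0)
```

A depth-1 tree stays trivially interpretable. Exhaustive search over all splits becomes intractable as depth grows, which is where optimization formulations come in.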

Lecture 3: Latest advances in enhancing Interpretability in Data Science by means of Mathematical Optimization (Part 3)

Finally, in the third lecture we discuss black-box methods such as support vector machines and how we can enhance their interpretability.

Ruslan Salakhutdinov

Carnegie Mellon University

AI Research at Apple, USA

Topics

Deep Learning (lectures via video)

Lectures

Lecture 1: Unsupervised Learning: Learning Deep Generative Models

In this tutorial lecture, I will introduce the mathematical basics of many popular deep generative models, including Variational Autoencoders (VAEs), Generative Adversarial Networks (GANs), Deep Energy-based Models, and Deep Boltzmann Machines (DBMs), and show that they can learn useful hierarchical representations from large volumes of high-dimensional data. Throughout the tutorial I will also discuss application areas, including visual object recognition, video analysis, and language understanding.
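One of the mathematical basics behind VAEs is the evidence lower bound, ELBO = E_q[log p(x | z)] − KL(q(z | x) ‖ p(z)). Its regularisation term has a closed form when the encoder outputs a diagonal Gaussian N(μ, diag(σ²)) and the prior is a standard normal: KL = ½ Σ_i (μ_i² + σ_i² − 1 − log σ_i²). The sketch below computes that term from μ and log σ². The reconstruction term and the neural encoder and decoder are omitted, and the function name is illustrative.

```python
import math

def gaussian_kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over latent dimensions."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, log_var))
```

Parameterising the variance through its logarithm keeps σ² positive without constraining the encoder output.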

Lecture 2: Deep Learning for Natural Language Processing/Reading Comprehension

In this lecture, I will first provide an overview of various deep learning models that can find semantically meaningful representations of words, learn to read documents and answer questions about their content. I will show how we can encode external linguistic knowledge as an explicit memory in recurrent neural networks, and use it to model co-reference relations in text. I will further discuss neural architectures, such as Transformer-XL, that enable us to learn dependencies beyond a fixed length without disrupting temporal coherence.

Lecture 3: Integrating Domain-Knowledge into Deep Learning

In this talk, I will discuss various ways of incorporating domain knowledge within deep learning models, including relational and logical knowledge. I will introduce methods that can augment neural representations of text with relational data from Knowledge Bases for question answering, and show how we can use structured prior knowledge from Knowledge Graphs for image classification. Finally, I will introduce the notion of structured memory as a crucial part of an intelligent agent’s ability to plan and reason in partially observable environments, and demonstrate a deep reinforcement learning agent that can learn to store arbitrary information about the environment over long time lags. I will show that on several tasks these models significantly improve upon many of the existing techniques.

Josh Tenenbaum

MIT, USA

Topics

Computational Cognitive Science, probabilistic generative models, and probabilistic programming

Lectures

Lecture 1

Lecture 2

Lecture 3

Naftali Tishby

Hebrew University, Israel

Topics

Theory of Deep Learning – Information Bottleneck

Biography

Dr. Naftali Tishby is a professor of Computer Science, and the incumbent of the Ruth and Stan Flinkman Chair for Brain Research at the Edmond and Lily Safra Center for Brain Science (ELSC) at the Hebrew University of Jerusalem. He is one of the leaders of machine learning research and computational neuroscience in Israel, and many of his former students hold key academic and industrial research positions all over the world. Prof. Tishby was the founding chair of the new computer-engineering program, and a director of the Leibnitz research center in computer science, at the Hebrew University. Tishby received his PhD in theoretical physics from the Hebrew University in 1985 and was a research staff member at MIT and Bell Labs from 1985 to 1991. Prof. Tishby was also a visiting professor at Princeton NECI, University of Pennsylvania, UCSB, and IBM research.

His current research is at the interface between computer science, statistical physics, and computational neuroscience. He pioneered various applications of statistical physics and information theory in computational learning theory. More recently, he has been working on the foundations of biological information processing and deep learning and the connections between dynamics and information. With his colleagues, he has introduced new theoretical frameworks for optimal adaptation and efficient information representation in biology, such as the Information Bottleneck method and the Minimum Information principle for neural coding. In 2019 Prof. Tishby received the prestigious IBT Award in Mathematical Neuroscience.

Lectures

Lecture 1: The Information Theory of Deep Learning: Towards Interpretable Deep Neural Networks – Rethinking Computational Learning theory

In the past several years we have developed a comprehensive theory of large-scale learning with Deep Neural Networks (DNNs) optimized with Stochastic Gradient Descent (SGD). The theory is built on three theoretical components: (1) rethinking the standard (PAC-like) distribution-independent worst-case generalisation bounds, turning them into problem-dependent, typical (in the Information Theory sense) bounds that are independent of the model architecture.

(2) The Information Plane theorem: for large-scale typical learning, the sample-complexity and accuracy tradeoff is characterized by only two numbers: the mutual information that the representation (a layer in the network) maintains about the input patterns, and the mutual information each layer has about the desired output label. The information-theoretic optimal tradeoff between these encoder and decoder information values is given by the Information Bottleneck (IB) bound for the rule-specific input-output distribution. (3) The layers of the DNN reach this optimal bound via standard SGD training, in high (input and layer) dimension.
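The two axes of the information plane are the mutual informations I(X;T) and I(T;Y) between a layer's representation T and the input X or the label Y. Binning-based estimates have been used in information-plane experiments: activations are discretised into bins and a plug-in estimate is computed from empirical frequencies. The sketch below implements that plug-in estimate for discrete samples, a simple scalar binning helper, and the IB Lagrangian I(X;T) − β I(T;Y). The bin count, range, β and toy data are illustrative assumptions, not the lecture's method. Plug-in estimates of this kind are known to be sensitive to the binning choice.

```python
import math
from collections import Counter

def mutual_information(a, b):
    """Plug-in estimate of I(A;B) in bits from paired discrete samples."""
    n = len(a)
    joint = Counter(zip(a, b))
    pa, pb = Counter(a), Counter(b)
    mi = 0.0
    for (u, v), c in joint.items():
        p_uv = c / n
        mi += p_uv * math.log2(p_uv / ((pa[u] / n) * (pb[v] / n)))
    return mi

def discretise(activations, n_bins=30, lo=-1.0, hi=1.0):
    """Map scalar activations in [lo, hi] (e.g. tanh outputs) to bin indices."""
    width = (hi - lo) / n_bins
    return [min(int((v - lo) / width), n_bins - 1) for v in activations]

def ib_lagrangian(x, t, y, beta):
    """IB objective to minimise: compress T about X while keeping information about Y."""
    return mutual_information(x, t) - beta * mutual_information(t, y)

x = [0, 1, 2, 3]   # four distinct inputs (2 bits)
y = [0, 0, 1, 1]   # binary label
t = list(y)        # a representation that keeps only the label information
print(mutual_information(x, t), mutual_information(t, y))  # 1.0 1.0
```

Here T discards one of the two bits of X while keeping everything about Y, which is the kind of compression the IB bound describes.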

In this series of three lectures I will review these results and discuss two new outcomes of this theory: (1) the computational benefit of the hidden layers, and (2) the emerging understanding of the features encoded by each layer, which follows from the convergence to the IB bound.

Talk 1: Rethinking Computational Learning theory: from worst-case, distribution-independent bounds to typical-case, distribution-dependent but algorithm-independent generalization bounds. The information plane theorems and the importance of large-scale machine learning.

News

From Quanta Magazine:

New Theory Cracks Open the Black Box of Deep Learning: A new idea called the “information bottleneck” is helping to explain the puzzling success of today’s artificial-intelligence algorithms — and might also explain how human brains learn.

https://www.quantamagazine.org/new-theory-cracks-open-the-black-box-of-deep-learning-20170921/

The 2019 IBT Award in Mathematical Neuroscience

Lecture 2: The Information Theory of Deep Learning: Towards Interpretable Deep Neural Networks – The role of Stochastic Gradient Descent in achieving the Information Bottleneck optimal bound

The role of Stochastic Gradient Descent in achieving the Information Bottleneck optimal bound. The two phases of the gradients and the layer compression theorem.

Lecture 3: The Information Theory of Deep Learning: Towards Interpretable Deep Neural Networks – The computational benefits of the hidden layers and the role of symmetry for the interpretability of the layers

The computational benefits of the hidden layers and the role of symmetry for the interpretability of the layers. How the compression phase improves convergence times with more layers, and how the layers converge to critical points along the IB curve due to critical slowing down.

Joaquin Vanschoren

Eindhoven University of Technology, The Netherlands

Topics

Automatic machine learning

Biography

Joaquin Vanschoren is Assistant Professor in Machine Learning at the Eindhoven University of Technology. His research focuses on machine learning, meta-learning, and understanding and automating learning. He founded and leads OpenML.org, an open science platform for machine learning. He has received several demo and open data awards, has been a tutorial speaker at NeurIPS and ECMLPKDD, and has been an invited speaker at ECDA, StatComp, AutoML@ICML, CiML@NIPS, DEEM@SIGMOD, AutoML@PRICAI, MLOSS@NIPS, and many other events. He was general chair at LION 2016, program chair of Discovery Science 2018 and demo chair at ECMLPKDD 2013, and he co-organizes the AutoML and meta-learning workshop series at NIPS and ICML. He is also co-editor of the book ‘Automated Machine Learning: Methods, Systems, Challenges’.

Lectures

Lecture 1: Introduction to Automated Machine Learning (AutoML)

Automated machine learning is the science of building machine learning models in a data-driven, efficient, and objective way. It replaces manual trial-and-error with automated, guided processes. In the first lecture, we will examine the most prominent problem in automated machine learning: hyperparameter optimization. We will discuss model-free blackbox optimization methods, Bayesian optimization, as well as evolutionary and other techniques. We will also cover multi-fidelity techniques, such as multi-armed bandits, to speed up the optimization of machine learning models and pipelines.
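Multi-fidelity methods such as successive halving (the building block of Hyperband) evaluate many configurations on a small budget. They then repeatedly keep only the best fraction while increasing the budget. The sketch below implements successive halving against a hypothetical, cheap stand-in objective, `toy_validation_loss`. In practice the evaluation function would train a model for `budget` epochs (or on `budget` samples) and return a validation loss. The choice `eta=3` and the learning-rate grid are illustrative assumptions.

```python
import math

def successive_halving(configs, evaluate, min_budget=1, eta=3):
    """Keep the best 1/eta of the configurations each round, multiplying the budget by eta."""
    budget = min_budget
    survivors = list(configs)
    while len(survivors) > 1:
        # Sort by (loss, index) so configurations themselves never need to be compared.
        scored = sorted((evaluate(c, budget), i) for i, c in enumerate(survivors))
        keep = max(1, len(survivors) // eta)
        survivors = [survivors[i] for _, i in scored[:keep]]
        budget *= eta
    return survivors[0]

def toy_validation_loss(config, budget):
    """Hypothetical objective: best learning rate near 0.01; more budget lowers the loss."""
    return (math.log10(config["lr"]) + 2.0) ** 2 + 1.0 / budget

configs = [{"lr": 10 ** e} for e in range(-5, 4)]   # 9 candidate learning rates
best = successive_halving(configs, toy_validation_loss)
print(best)
```

Most of the total budget goes to the few configurations that survive the early, cheap rounds. This is what makes multi-fidelity methods faster than evaluating every configuration at full budget.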

Lecture 2: Meta-learning

When we learn new skills, we (humans) rarely start from scratch. We start from skills learned earlier in related tasks, and reuse experience accumulated over time. This allows us to learn faster, using much less data and trial-and-error. Learning how to build machine learning models based on prior experience is called meta-learning, or learning to learn. We will cover the spectrum from transferring knowledge about machine learning methods in general, via reasoning across tasks, to transferring previously trained machine learning models. We will also give practical tips on how to do meta-learning with OpenML.
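One simple form of transferring knowledge across tasks is warm-starting. Each dataset is described by meta-features, the most similar previously seen datasets are found, and the search starts from the configurations that worked best on them. The sketch below assumes hypothetical two-dimensional meta-features and a hand-written list of prior results. With OpenML such records could come from stored runs, but no OpenML API is used or implied here.

```python
import math

def warm_start_configs(new_meta, prior_tasks, k=2):
    """Return the best-known configurations of the k prior tasks closest in meta-feature space."""
    nearest = sorted(prior_tasks, key=lambda task: math.dist(new_meta, task["meta_features"]))
    return [task["best_config"] for task in nearest[:k]]

# Hypothetical prior experience: meta-features and the best configuration found on each task.
prior_tasks = [
    {"meta_features": (0.0, 0.0), "best_config": {"lr": 0.1}},
    {"meta_features": (10.0, 10.0), "best_config": {"lr": 0.001}},
    {"meta_features": (1.0, 1.0), "best_config": {"lr": 0.05}},
]
print(warm_start_configs((0.2, 0.2), prior_tasks))  # [{'lr': 0.1}, {'lr': 0.05}]
```

The returned configurations would seed a hyperparameter optimizer instead of random initial points. `math.dist` requires Python 3.8 or later.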

Lecture 3: AutoML and meta-learning for neural networks

Finally, we focus on the automated construction of neural networks. We will survey existing approaches for neural architecture search, including differentiable (gradient-based) techniques, Bayesian optimization, evolutionary techniques, and reinforcement learning. We will also revisit meta-learning in the context of neural networks, to transfer information about previously tried model architectures to new problems.

Oriol Vinyals

Google DeepMind, UK

Topics

Deep Learning & Reinforcement Learning

Lectures

Lecture 1: Advanced topics: Graph Neural Networks

Recurrent Neural Networks have been the model of choice for processing sequences, but dealing with other structures such as graphs or sets requires models that preserve the invariances of those structures, which presents some unique challenges. Transformers, on the other hand, have been proposed recently and have a strong connection with the Graph Neural Networks framework, which structures the computation in a neural network as a graph. In this lecture, I’ll discuss these recent advances and how they connect with one another, focusing on state-of-the-art architectures.
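The common core of many graph neural network architectures is a message-passing step. Each node aggregates messages from its neighbours with a permutation-invariant function (here a sum) and then updates its own state. The sketch below implements one such step on scalar node features, using hypothetical fixed weights `w_self` and `w_neigh` in place of learned ones and a tanh update. It then checks that relabelling the nodes simply relabels the outputs (permutation equivariance). The attention-weighted aggregation that connects this framework to Transformers is not shown.

```python
import math

def message_passing_step(features, edges, w_self=0.5, w_neigh=1.0):
    """One round of sum-aggregation message passing over a directed edge list (src, dst)."""
    aggregated = {node: 0.0 for node in features}
    for src, dst in edges:
        aggregated[dst] += features[src]   # message = the sender's feature
    return {node: math.tanh(w_self * h + w_neigh * aggregated[node])
            for node, h in features.items()}

features = {"a": 1.0, "b": -1.0, "c": 0.5}
edges = [("a", "b"), ("b", "a"), ("b", "c"), ("c", "b")]   # undirected path a-b-c
updated = message_passing_step(features, edges)

# Relabel the nodes and reorder the edges: each node's output should be unchanged.
relabel = {"a": "x", "b": "y", "c": "z"}
perm_features = {relabel[n]: h for n, h in features.items()}
perm_edges = [(relabel[s], relabel[d]) for s, d in reversed(edges)]
perm_updated = message_passing_step(perm_features, perm_edges)
print(all(abs(perm_updated[relabel[n]] - updated[n]) < 1e-12 for n in features))
```

Because the sum ignores the order in which neighbours are visited, the node ordering carries no information. This is the invariance that sequence models such as RNNs do not provide.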

Lecture 2: Reinforcement and Imitation Learning at Scale: AlphaStar and Beyond

Deep Reinforcement Learning has emerged as a sub-field of machine learning that extends the capabilities of Deep Learning systems beyond supervised and unsupervised learning. In the last few years, we have witnessed advances in domains in which complicated decisions must be carried out by an “agent” interacting with an “environment”. In this talk, I will summarise the state of deep RL, highlighting successes in StarCraft thanks in part to imitation learning and scaled-up self-play. The lecture will focus on imitation learning and scale, using AlphaStar as a motivating example.

Lecture 3: Representation Learning With Generative Models

Although in typical applications of machine learning data is seen as an _input_ to a model, generative models _output_ (or generate) it. These models have been extensively used to test our current capabilities and, in some cases, they help guide our intuitions towards better architectures. Since students should, at this point, be familiar with Generative Models, in this talk I’ll discuss recent advances in representation learning, which concerns the representations learned by the models themselves rather than the qualitative evaluation of the samples they generate.

Past Lecturers

The Lecturers of the previous editions:

Roman Belavkin, Middlesex University London, UK
Yoshua Bengio, Head of the Montreal Institute for Learning Algorithms (MILA) & University of Montreal, Canada
Sergiy Butenko, Texas A&M University, USA
Yi-Ke Guo, Imperial College London, UK & Founding Director of the Data Science Institute
Peter Norvig, Director of Research, Google
Panos Pardalos, University of Florida, USA
Alex 'Sandy' Pentland, MIT & Director of MIT’s Human Dynamics Laboratory, USA
Marc'Aurelio Ranzato, Facebook AI Research Lab, New York, USA
Fuad Aleskerov, National Research University Higher School of Economics, Russia