Lectures

Lecture 1: Model-based reinforcement learning I

The field of deep reinforcement learning combines deep learning techniques with reinforcement learning algorithms to develop intelligent agents that can tackle a wide variety of challenging tasks. Recent years have seen the development of a wide range of deep reinforcement learning agents that have been successfully deployed in complex environments such as video games, board games and robotics. However, most current state-of-the-art agents employ methods belonging to the model-free class of RL algorithms. In this lecture we will look at a different class of RL agents: model-based algorithms. These agents make use of an internal model of the world in order to optimise their acting policy. This talk will present different model-based approaches and consider their pros and cons in comparison to their model-free counterparts.

Lecture 2: Model-based reinforcement learning II

In this lecture we will dive deeper into the different approaches to model-based reinforcement learning. We will examine different modes of model learning (pixel-based, implicit, stochastic models, etc.) and how the learned models can be utilised for planning, for augmenting real experience, or as auxiliary tasks.

Lecture 3: AlphaZero: A general model-based planning reinforcement learning algorithm for board games

Board games have been widely used in the field of artificial intelligence as test-beds for the development of new algorithms. Chess and Go are among the most studied games in AI, as they represent a class of complicated and self-contained environments ideal for AI research. Previous attempts to achieve super-human performance in these domains led to the development of highly specialised and domain-specific methods. In this lecture we will examine AlphaZero, a reinforcement learning algorithm which has mastered the board games of Go, Chess and Shogi, achieving state-of-the-art performance without requiring any domain-specific adaptations or human data. Unlike AlphaGo, AlphaZero is trained purely through self-play, completely from scratch, without using any prior human knowledge.
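
As background on the training objective (the standard formulation from the AlphaZero papers, summarised here rather than taken from the abstract): a single network \(f_\theta(s) = (\mathbf{p}, v)\) predicts a move distribution \(\mathbf{p}\) and a value \(v\) for a position \(s\), and is trained from self-play games by minimising

\[
\ell = (z - v)^2 - \boldsymbol{\pi}^\top \log \mathbf{p} + c \lVert \theta \rVert^2,
\]

where \(z\) is the eventual game outcome, \(\boldsymbol{\pi}\) is the visit-count distribution produced by Monte Carlo tree search, and \(c\) controls L2 regularisation.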



Lecture 1: KNIME training session 1

The first session presents an overview of the leading data science and machine learning platforms and the trends around them. Gartner defines this type of platform as “a cohesive software application that offers a mixture of basic building blocks essential for creating all kinds of data science solution, and for incorporating those solutions into business processes, surrounding infrastructure and products.” Among them, the KNIME Analytics Platform is a free, user-friendly and open-source software package. In this first session KNIME is introduced in detail by looking at its framework and most useful functionalities. KNIME is a suitable tool for individual data science users and teams, as well as for machine learning experts and developers.

Lecture 2: KNIME training session 2

KNIME is an open platform: it can integrate many other popular tools, software packages and libraries, useful, for example, for developing specific machine learning solutions or for deploying a solution to a Big Data infrastructure. It can also be extended by means of software components (plugins) that add a specific feature or a new algorithm. In this second session dedicated to KNIME, the advanced KNIME functionalities are introduced. These include embedded programming structures, extensions, the KNIME Server and the Web portal. Finally, we take a brief look at the KNIME programming framework from a developer’s perspective, showing how to extend KNIME and incorporate specific functionalities (e.g. data import and integration, novel algorithms, visualisation methods) that can turn a general-purpose tool into a customised environment tailored to a specific context.

Lecture 3: Data-intensive Knowledge Discovery from Brain Imaging of Alzheimer’s Disease Patients

Alzheimer’s disease (AD) is a chronic neurodegenerative disease which is largely responsible for the dementia affecting around 6% of the population aged 65 and above. The growing availability of human brain data generated by imaging techniques, such as Magnetic Resonance Imaging (MRI), has created an opportunity to apply data science approaches to the diagnosis of neurological disorders. The knowledge discovery process typically involves complex data workflows that combine pre-processing techniques, statistical methods, machine learning and data mining algorithms, and post-processing and visualisation techniques. This talk presents specific research efforts in this direction and highlights some open issues and challenges that are typical of real-world applications of Data Science and Artificial Intelligence.



Lecture 1: The Principle of Least Cognitive Action

In this talk we introduce the principle of Least Cognitive Action with the purpose of understanding perceptual learning processes. The principle closely parallels related approaches in physics, and suggests regarding neural networks as systems whose weights are Lagrangian variables, namely functions of time. Interestingly, neural networks “conquer their own life”: there is no neat distinction between learning and test, and their behavior is characterized by the stationarity of the cognitive action, an appropriate functional which contains a potential and a kinetic term. While the potential term is somewhat related to the loss function used in supervised and unsupervised learning, the kinetic term represents the energy connected with the velocity of weight change. Unlike traditional gradient descent, the stationarity of the cognitive action yields differential equations in the connection weights, and gives rise to a dissipative process which is needed to yield ordered configurations. We give conditions under which this learning process reduces to stochastic gradient descent and to Backpropagation.
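
To make the structure of this functional concrete, a schematic form reads (our notation, an illustrative sketch rather than the exact functional of the talk):

\[
A(w) = \int_0^T \left( \frac{\mu}{2} \lVert \dot{w}(t) \rVert^2 + V(w(t), t) \right) dt,
\]

where the weights \(w(t)\) are the Lagrangian variables, the first (kinetic) term penalises the velocity of weight change, the potential \(V\) plays the role of a time-dependent loss, and imposing stationarity of \(A\) yields the Euler-Lagrange differential equations in the connection weights.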

Lecture 2: Quasi-Periodic Temporal Environments

The formulation of learning according to the principle of Least Cognitive Action requires an appropriate definition of the boundary conditions for the solution of the Euler-Lagrange (E-L) differential equations of learning. We claim that real-world learning problems can always be formulated so as to be consistent with boundary conditions that correspond to a brief reset of the input. Interestingly, this formulation does not correspond to classic optimization approaches like gradient descent over a given training set, since the E-L differential equations turn out to be a genuine learning law that consistently corresponds to the null variation of the action. Finally, the presence of statistical regularities is given a new form by assuming that processing takes place in “quasi-periodic temporal environments”, where any input is “nearly-repeated” somewhere in the future.

Lecture 3: Developmental Visual Agents

The principle of Least Cognitive Action can naturally be applied in vision, especially when one is interested in processing video rather than large collections of frames. We show that information-based and parsimony principles, joined with a motion-invariance constraint, lead to a computational model that can be uniformly used for visual features, objects and abstract visual entities. The model nicely addresses a number of questions that we pose for a good theory of vision, and provides the basis for the construction of new convolutional networks and models for computer vision. Experiments are presented to show the effectiveness of the theory.



Lecture 1: Introduction to Generative Adversarial Networks

I will cover the basic theory and practice of generative adversarial networks (GANs). These are a powerful and currently popular class of generative model. In these models, a generator network is trained to map random noise to an output distribution that is indistinguishable from the distribution of natural photos, according to an adversary that tries to classify between these two distributions. GANs can synthesize remarkably realistic photos, sounds, and other kinds of data, but the realism often comes at the cost of limited diversity in the samples. I will cover how this can be caused by “mode collapse” and some ways to ameliorate it. I will also talk about a variety of architectures and objectives used in current GAN practice.
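
For reference, the adversarial game described above is usually written as the minimax objective of Goodfellow et al. (a standard formula, included here as background):

\[
\min_G \max_D \; \mathbb{E}_{x \sim p_{\text{data}}} \big[ \log D(x) \big] + \mathbb{E}_{z \sim p_z} \big[ \log \big( 1 - D(G(z)) \big) \big],
\]

where the discriminator \(D\) learns to separate real samples from generated ones while the generator \(G\) learns to fool it.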

Lecture 2: Conditional GANs and Data Prediction

GANs can hallucinate realistic photos but what if we don’t want to just make up data from scratch? More often, we are given some data and wish to make predictions based on it. For example, given the current view of the world, predict what the future will look like. This is an application for conditional GANs, which condition on observed data and make a prediction about unobserved data. Unlike most traditional predictors, conditional GANs are adept at dealing with both high-dimensional input observations and high-dimensional output predictions. I will show a number of applications in vision and robotics.

Lecture 3: GANs for Domain Translation

GANs can be understood as a tool for mapping one data distribution into another. In this third lecture on GANs, we will adopt this perspective and see how it leads to powerful applications in domain adaptation and translation. The idea is to learn a mapping from a source domain to a target domain such that the output is distributed identically to the target domain. I will show how this can be used to translate between various visual styles (e.g., turning photos into Monet paintings), to make predictions from very limited supervision (e.g., in medical imaging, where supervision is expensive), and to achieve “sim2real” transfer, where we train a robotic policy on simulated data but apply it in the real world.
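
One widely used way to make such unpaired translation well-posed (the CycleGAN formulation, stated here as background) is to learn mappings in both directions, \(G: X \to Y\) and \(F: Y \to X\), with an adversarial loss in each domain plus a cycle-consistency loss

\[
\mathcal{L}_{\text{cyc}} = \mathbb{E}_x \big[ \lVert F(G(x)) - x \rVert_1 \big] + \mathbb{E}_y \big[ \lVert G(F(y)) - y \rVert_1 \big],
\]

which encourages each translation to be approximately invertible.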



Lecture 1: Learning in the factory and in the wild: designing robot systems that learn

We examine the general problem of designing robot systems from a decision-theoretic perspective that makes clear when an individual robot needs to learn while it is actually performing its tasks (in the wild) and when the AI engineers need to use learning methods to design a good robot learner (in the factory). We will examine several learning paradigms from this perspective and talk about methods for meta-learning, including modular meta-learning and graph element networks.

Lecture 2: Learning factored transition models for planning in complex hybrid spaces

Many robotics problem distributions are better addressed by learning models and using them to do online reasoning (an approach also known as model-predictive control) than by learning a policy or value function. We begin by discussing this claim, and then study the forms of models that are most appropriate for different types of planning problems. We then examine two new approaches for learning models that are appropriate for planning in complex hybrid (mixed discrete and continuous) problems, such as robot task and motion planning. One approach is based on Gaussian-process active learning and another on an extension of graph neural networks.
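
To make the model-predictive control loop concrete, here is a minimal random-shooting sketch in Python (an illustration under assumed interfaces, not the methods developed in the lecture; f and cost are hypothetical stand-ins for a learned transition model and a task cost):

# Minimal random-shooting MPC sketch: given a learned transition model
# f(s, a) -> s_next and a task cost cost(s, a) (both hypothetical
# interfaces), score sampled action sequences and pick the best first action.
import numpy as np

def plan_first_action(f, cost, s0, horizon=10, n_samples=100, action_dim=2):
    best_total, best_seq = np.inf, None
    for _ in range(n_samples):
        seq = np.random.uniform(-1.0, 1.0, size=(horizon, action_dim))
        s, total = s0, 0.0
        for a in seq:
            s = f(s, a)           # roll the learned model forward
            total += cost(s, a)   # accumulate predicted cost
        if total < best_total:
            best_total, best_seq = total, seq
    return best_seq[0]            # execute one action, then replan

At each control step the robot executes only the first action of the best sampled sequence and then replans, which is what makes the scheme model-predictive.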

Lecture 3: Learning to speed up planning in complex hybrid spaces

An important role for learning is to speed up search: this is the critical role that learning plays in methods such as AlphaZero. We will examine several different mechanisms that can be used (including learning heuristic or static evaluation functions and learning to bias the action sampling distribution), with a focus on problems that require choosing actions from a continuous or hybrid space.



Building Iride: how to mix deep learning and ontology techniques to understand language

Almawave is an Italian AI company based in Rome, specialized in natural language processing and speech analytics. NLP is a multidisciplinary science that mixes computer science, psychology and linguistics. Deep learning techniques have achieved important results in a wide variety of NLP tasks, but language understanding “still requires a deep knowledge of the technological domain”. In this session we present IRIDE, a proprietary NLP engine that embraces and mixes an ontology-based approach with the most recent DL techniques in order to solve NLP tasks like sentiment analysis and named entity recognition.

 

Speakers:

Raniero Romagnoli – CTO, Almawave & Vincenzo Sciacca – AI Technical Evangelist, Almawave

 



Lecture 1: Latest advances in enhancing Interpretability in Data Science by means of Mathematical Optimization (Part 1)

Data Science aims to develop models that extract knowledge from complex data and represent it to aid Data Driven Decision Making. Mathematical Optimization has played a crucial role across the three main pillars of Data Science, namely Supervised Learning, Unsupervised Learning and Information Visualization. For instance, Quadratic Programming is used in Support Vector Machines, a Supervised Learning tool. Mixed-Integer Programming is used in Clustering, an Unsupervised Learning task. Global Optimization is used in MultiDimensional Scaling, an Information Visualization tool.
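
As a concrete instance of the first pillar, the soft-margin SVM mentioned above is trained by solving a quadratic program (standard formulation, included for reference):

\[
\min_{w, b, \xi} \; \frac{1}{2} \lVert w \rVert^2 + C \sum_{i=1}^n \xi_i
\quad \text{s.t.} \quad y_i (w^\top x_i + b) \ge 1 - \xi_i, \;\; \xi_i \ge 0,
\]

where the slack variables \(\xi_i\) absorb margin violations and \(C\) trades off margin width against training error.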

Data Science models should strike a balance between accuracy and interpretability. Interpretability is desirable, for instance, in medical diagnosis; it is required by regulators for models aiding, for instance, credit scoring; and since 2018 the EU has extended this requirement by imposing the so-called right to explanation. In the first lecture, we show that Mathematical Optimization is the natural tool to model the trade-off between accuracy and interpretability.

Lecture 2: Latest advances in enhancing Interpretability in Data Science by means of Mathematical Optimization (Part 2)

In the second lecture, we zoom in and talk about the optimization of classification trees, to enhance their accuracy without harming interpretability.

Lecture 3: Latest advances in enhancing Interpretability in Data Science by means of Mathematical Optimization (Part 3)

Finally, in the third lecture we discuss black-box methods such as support vector machines and how we can enhance their interpretability.



Lecture 1: Unsupervised Learning: Learning Deep Generative Models

In this tutorial lecture, I will introduce the mathematical basics of many popular deep generative models, including Variational Autoencoders (VAEs), Generative Adversarial Networks (GANs), Deep Energy-based Models, and Deep Boltzmann Machines (DBMs), and show that they can learn useful hierarchical representations from large volumes of high-dimensional data. Throughout the tutorial I will also discuss application areas, including visual object recognition, video analysis, and language understanding.
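
As one example of the machinery covered, the VAE is trained by maximising the evidence lower bound (a standard identity, included here for reference):

\[
\log p_\theta(x) \;\ge\; \mathbb{E}_{q_\phi(z \mid x)} \big[ \log p_\theta(x \mid z) \big] - \mathrm{KL} \big( q_\phi(z \mid x) \,\Vert\, p(z) \big),
\]

where the encoder \(q_\phi\) approximates the posterior over latent codes \(z\) and the decoder \(p_\theta\) reconstructs the data.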

Lecture 2: Deep Learning for Natural Language Processing/Reading Comprehension

In this lecture, I will first provide an overview of various deep learning models that can find semantically meaningful representations of words, learn to read documents, and answer questions about their content. I will show how we can encode external linguistic knowledge as an explicit memory in recurrent neural networks, and use it to model co-reference relations in text. I will further discuss neural architectures, such as Transformer-XL, that enable us to learn dependencies beyond a fixed length without disrupting temporal coherence.

Lecture 3: Integrating Domain-Knowledge into Deep Learning

In this talk, I will discuss various ways of incorporating domain knowledge within deep learning models, including relational and logical knowledge. I will introduce methods that can augment neural representations of text with relational data from Knowledge Bases for question answering, and show how we can use structured prior knowledge from Knowledge Graphs for image classification. Finally, I will introduce the notion of structured memory as a crucial part of an intelligent agent’s ability to plan and reason in partially observable environments, and demonstrate a deep reinforcement learning agent that can learn to store arbitrary information about the environment over long time lags. I will show that on several tasks these models significantly improve upon many existing techniques.





Lecture 1: The Information Theory of Deep Learning: Towards Interpretable Deep Neural Networks – Rethinking Computational Learning Theory

In the past several years we have developed a comprehensive theory of large-scale learning with Deep Neural Networks (DNNs) optimized with Stochastic Gradient Descent (SGD). The theory is built on three theoretical components: (1) rethinking the standard (PAC-like) distribution-independent worst-case generalisation bounds, turning them into problem-dependent, typical (in the Information Theory sense) bounds that are independent of the model architecture; (2) the Information Plane theorem: for large-scale typical learning, the sample-complexity and accuracy tradeoff is characterized by only two numbers, the mutual information that the representation (a layer in the network) maintains about the input patterns and the mutual information each layer has about the desired output label, where the information-theoretically optimal tradeoff between these encoder and decoder information values is given by the Information Bottleneck (IB) bound for the rule-specific input-output distribution; (3) the layers of the DNN reach this optimal bound via standard SGD training, in high (input and layer) dimension.

In this series of three lectures I will review these results and discuss two new outcomes of the theory: (1) the computational benefit of the hidden layers, and (2) the emerging understanding of the features encoded by each layer, which follows from the convergence to the IB bound.

Talk 1: Rethinking computational learning theory: from worst-case, probability-independent bounds to typical-case, distribution-dependent but algorithm-independent generalization bounds. The Information Plane theorems and the importance of large-scale machine learning.
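
For reference, the Information Bottleneck tradeoff referred to above is commonly written as the following variational problem (standard IB notation, included as background rather than taken from the abstract):

\[
\min_{p(t \mid x)} \; I(X;T) - \beta \, I(T;Y),
\]

where \(T\) is the representation computed by a layer, \(I(\cdot;\cdot)\) denotes mutual information, and \(\beta > 0\) sets the tradeoff between compressing the input \(X\) and preserving information about the label \(Y\).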

News

New Theory Cracks Open the Black Box of Deep Learning: A new idea called the “information bottleneck” is helping to explain the puzzling success of today’s artificial-intelligence algorithms — and might also explain how human brains learn.

https://www.quantamagazine.org/new-theory-cracks-open-the-black-box-of-deep-learning-20170921/

Lecture 2: The Information Theory of Deep Learning: Towards Interpretable Deep Neural Networks – The role of Stochastic Gradient Descent in achieving the Information Bottleneck optimal bound

The role of Stochastic Gradient Descent in achieving the Information Bottleneck optimal bound: the two phases of the gradients and the layer compression theorem.

 


Lecture 3: The Information Theory of Deep Learning: Towards Interpretable Deep Neural Networks – The computational benefits of the hidden layers and the role of symmetry for the interpretability of the layers

The computational benefits of the hidden layers and the role of symmetry in the interpretability of the layers: how the compression phase speeds up convergence as more layers are added, and how the layers converge to critical points along the IB curve due to critical slowing down.

 




Lecture 1: Introduction to Automated Machine Learning (AutoML)

Automated machine learning is the science of building machine learning models in a data-driven, efficient, and objective way. It replaces manual trial-and-error with automated, guided processes. In the first lecture, we will examine the most prominent problem in automated machine learning: hyperparameter optimization. We will discuss model-free blackbox optimization methods, Bayesian optimization, as well as evolutionary and other techniques. We will also cover multi-fidelity techniques, such as multi-armed bandits, to speed up the optimization of machine learning models and pipelines.
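
As a minimal illustration of black-box hyperparameter optimization, here is a random-search sketch in Python (an assumption for illustration using scikit-learn, not the tooling used in the lecture; the search space is made up for the example):

# Minimal random-search sketch: sample hyperparameter configurations at
# random and keep the one with the best cross-validated score.
import random
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_digits(return_X_y=True)
best_score, best_cfg = -1.0, None
for _ in range(20):
    cfg = {
        "n_estimators": random.choice([50, 100, 200]),
        "max_depth": random.choice([4, 8, 16, None]),
        "max_features": random.choice(["sqrt", "log2"]),
    }
    score = cross_val_score(RandomForestClassifier(**cfg), X, y, cv=3).mean()
    if score > best_score:
        best_score, best_cfg = score, cfg
print("best configuration:", best_cfg, "score:", best_score)

Bayesian optimization improves on this loop by fitting a surrogate model to the observed (configuration, score) pairs and choosing the next configuration where the expected improvement is highest, rather than sampling blindly.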

Lecture 2: Meta-learning

When we learn new skills, we (humans) rarely start from scratch. We start from skills learned earlier in related tasks, and reuse experience accumulated over time. This allows us to learn faster, using much less data and trial-and-error. Learning how to build machine learning models based on prior experience is called meta-learning, or learning to learn. We will cover the spectrum from transferring knowledge about machine learning methods in general, via reasoning across tasks, to transferring previously trained machine learning models. We will also see practical tips on how to do meta-learning with OpenML.

Lecture 3: AutoML and meta-learning for neural networks

Finally, we focus on the automated construction of neural networks. We will survey existing approaches to neural architecture search, including differentiable (gradient-based) techniques, Bayesian optimization, evolutionary techniques, and reinforcement learning. We will also revisit meta-learning in the context of neural networks, to transfer information about previously tried model architectures to new problems.



Lecture 1: Advanced topics: Graph Neural Networks

Recurrent Neural Networks have been the model of choice for processing sequences, but dealing with other structures such as graphs or sets requires models which preserve the invariances present in those structures, and this presents some unique challenges. Transformers, proposed more recently, have a strong connection with the Graph Neural Network framework, which structures the computation in a neural network as a graph. In this lecture, I’ll discuss these recent advances and how they connect with one another. The focus will thus be on state-of-the-art architectures.
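
As a minimal sketch of the computation a Graph Neural Network structures over a graph (our illustrative notation, not the lecture’s exact formulation):

# One round of message passing: each node aggregates transformed features
# from its neighbours and combines them with its own state.
import numpy as np

def message_passing_step(H, A, W_msg, W_upd):
    """H: (n, d) node features; A: (n, n) adjacency; W_msg, W_upd: (d, d)."""
    M = A @ (H @ W_msg)            # sum of neighbours' transformed features
    return np.tanh(H @ W_upd + M)  # update each node's state

rng = np.random.default_rng(0)
n, d = 5, 8
H = rng.normal(size=(n, d))
A = (rng.random((n, n)) > 0.5).astype(float)
H = message_passing_step(H, A, rng.normal(size=(d, d)) / d, rng.normal(size=(d, d)) / d)

In this reading, a Transformer layer can be seen as the special case of a fully connected graph in which the aggregation weights are computed by attention rather than fixed by an adjacency matrix.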

Lecture 2: Reinforcement and Imitation Learning at Scale: AlphaStar and Beyond

Deep Reinforcement Learning has emerged as a sub-field of machine learning which extends the capabilities of Deep Learning systems beyond supervised and unsupervised learning. In the last few years, we have witnessed advances on domains in which complicated decisions must be carried out by an “agent” interacting with an “environment”. In this talk, I will summarise the state of deep RL, highlighting successes in StarCraft achieved thanks in part to imitation learning and scaled-up self-play. This lecture will focus on imitation learning and scale, using AlphaStar as a motivating example.

Lecture 3: Representation Learning With Generative Models

Although in typical applications of machine learning data is seen as an _input_ to a model, generative models _output_ (or generate) it. These models have been extensively used to test our current capabilities and, in some cases, they help guide our intuitions towards better architectures. Since students should, at this point, be familiar with generative models, my talk will cover recent advances in representation learning, which concerns the representations learned by the models themselves rather than the qualitative evaluation of the samples they generate.