Deep Learning Track

Our curriculum is now organized into modules rather than tracks; the modules are described here. For reference, we include below the description of the Deep Learning track as it stood before the reorganization.

Methods

Datasets

Data structures, representations, and transformations
Data sources for deep learning and reinforcement learning

Regression problems

Linear regression with regularization
Logistic regression with regularization
Non-linear regression and non-parametric methods, identifying model misspecification

Classification problems

Use of logistic regression for classification problems
Multinomial classification

Performance evaluation, result significance, and common mistakes

Performance metrics and their relationship to the problem’s practical objective

Dealing with imbalanced datasets

Choosing the right baselines

Ablation studies

Significance of results
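
As a small illustration of why the choice of performance metric matters on imbalanced datasets, the sketch below compares accuracy with precision and recall for a classifier that always predicts the majority class. The function name `binary_metrics` and the toy labels are assumptions made for this example and are not part of the original curriculum material.

```python
def binary_metrics(y_true, y_pred):
    """Return (accuracy, precision, recall) for binary labels, where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    # Convention assumed here: report 0.0 when a ratio is undefined.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Hypothetical imbalanced dataset: 95 negatives and 5 positives.
y_true = [0] * 95 + [1] * 5
# A trivial classifier that always predicts the negative (majority) class.
y_pred = [0] * 100

accuracy, precision, recall = binary_metrics(y_true, y_pred)
print(accuracy, precision, recall)
```

The trivial classifier reaches high accuracy while finding none of the positive examples, which is why accuracy alone can be a misleading measure of the practical objective.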

Supervised machine learning: basics

Traditional machine learning vs. deep learning: performance, advantages, and disadvantages
Methods of traditional machine learning: decision trees, random forests, support vector machines, and others
Overfitting and underfitting
Regularization methods, bias vs. variance
Dropout and early stopping as regularization methods
Training sets, validation sets, and test sets
Designing a loss function reflecting the project's practical objective
Model ensembles

Optimization methods

Convex vs. non-convex optimization
Stochastic gradient descent, momentum, adaptive optimizers, results of optimizer architecture search, gradient clipping
Backpropagation through a computation graph and related computational efficiency considerations, the importance of understanding the details of backpropagation
Second order methods
Non-gradient-based methods, including evolutionary methods
Influence of the choice of optimization methods on generalization performance
Optional: Variational perspective on momentum-based optimization methods
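
To make the momentum item above concrete, here is a minimal sketch of the classical (heavy-ball) momentum update applied to a one-dimensional quadratic. It uses the exact gradient rather than a stochastic estimate, and the function name `minimize_with_momentum`, the objective, and the hyperparameter values are assumptions chosen only for illustration.

```python
def minimize_with_momentum(grad, x0, lr=0.1, beta=0.9, steps=500):
    """Gradient descent with classical momentum on a scalar parameter."""
    x, velocity = x0, 0.0
    for _ in range(steps):
        # The velocity accumulates an exponentially decaying sum of past gradients.
        velocity = beta * velocity - lr * grad(x)
        x = x + velocity
    return x


# Illustrative objective f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
x_min = minimize_with_momentum(lambda x: 2.0 * (x - 3.0), x0=0.0)
print(x_min)
```

In stochastic gradient descent, `grad` would instead return a gradient estimated from a mini-batch; the update rule itself is unchanged.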

Deep learning libraries

Computation graphs

Graph-based computations vs. eager execution

TensorFlow and Keras

PyTorch

Production considerations

Research considerations

Practical aspects of training neural networks

Choices of activation functions and their effects on the gradient flow
Data preprocessing
Data augmentation, mixup, and manifold mixup
Weight initialization, transfer learning, fine-tuning
Batch normalization and reasons behind its good performance
Methods for choosing learning rates for faster optimization, including cyclical learning rates
Snapshot ensembling
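
The cyclical learning rates mentioned above can be sketched with the commonly used triangular schedule, in which the learning rate rises linearly from a lower bound to an upper bound and then falls back over a fixed number of iterations. The function name `triangular_lr` and the default values are assumptions for this example, not recommendations from the curriculum.

```python
import math


def triangular_lr(iteration, base_lr=0.001, max_lr=0.006, step_size=2000):
    """Triangular cyclical learning rate; one full cycle lasts 2 * step_size iterations."""
    cycle = math.floor(1 + iteration / (2 * step_size))
    # x runs from 1 down to 0 and back up to 1 within each cycle.
    x = abs(iteration / step_size - 2 * cycle + 1)
    return base_lr + (max_lr - base_lr) * max(0.0, 1.0 - x)


# Learning rates at the start, quarter, middle, and end of the first cycle.
for it in (0, 1000, 2000, 4000):
    print(it, triangular_lr(it))
```

In a training loop, the value returned for the current iteration would be assigned to the optimizer's learning rate before each update step.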

Practical aspects of training machine learning models

Data normalization and pre-processing
Data augmentation
Using imbalanced datasets
Transfer learning
Semi-supervised learning
Hyperparameter search
Implications of the extent of hyperparameter search for comparing the performance of different methods

Improving the performance of machine-learning models

Heuristic inspection of data
Dealing with a mismatch between collected data and real-world deployment data
Avoiding data leakage
Model ensembles, stacking, bagging, and boosting
Strategies for debugging and interpreting machine learning models

Neural networks based on fully connected layers

Properties and uses of neural networks based on fully connected layers

Convolutional neural networks (CNN)

First convolutional neural networks and their neuroscience motivation
Mechanics of the convolution layer
Strides, padding, pooling
Residual networks and other networks with skip connections
Spatial transformer networks
Image segmentation

Recurrent neural networks (RNN)

Simple recurrent neural networks
Backpropagation through time and the vanishing/exploding gradient problem
Long Short-Term Memory networks (LSTM) and Gated Recurrent Units (GRU)
Bi-directional recurrent neural networks

Neural networks with entity embedding layers

Visualization of high-dimensional spaces, t-Distributed Stochastic Neighbor Embedding (t-SNE)
Entity embedding, conditional entity embedding
Word vectors

Natural Language Processing (NLP)

Recurrent neural networks for NLP
Attention
Transformer networks for NLP
Models using contextual word representations

Unsupervised machine learning: basics

Clustering
Principal component analysis and its relationship to singular value decomposition, independent component analysis

Autoencoders

Autoencoders: a general introduction
Variational autoencoders (VAE)

Generative Adversarial Networks (GAN)

Generative adversarial networks: introduction
Wasserstein GAN
Image-to-image translation, conditional GANs, unpaired image-to-image translation
Adversarial domain adaptation
Combinations of GANs and VAEs

Vulnerabilities of machine-learning systems

Failure cases and their importance
Adversarial examples and possible defenses against them
Unintended biases in machine learning systems and methods for correcting them
Data privacy issues

Time-series analysis

Autoregressive processes, vector autoregression
Common mistakes when analyzing time-series data
Applying recurrent neural networks to continuous-variable time-series analysis

Causal inference

Instrumental variables: Two-stage least squares
Instrumental variables: Non-linear models/machine-learning models
Causal calculus ("do-calculus")

Reinforcement learning

Bandit problems
Dynamic programming, Bellman equations
Bootstrapping vs. sampling
Temporal difference learning: Sarsa, Q-learning, Deep Q-Networks (DQN)
Policy gradient methods: REINFORCE algorithm without and with a baseline, actor-critic methods
Deep Deterministic Policy Gradient (DDPG)
Trust Region Policy Optimization (TRPO), Proximal Policy Optimization (PPO)
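
As an illustration of the temporal difference methods listed above, the following minimal sketch performs a single tabular Q-learning update. The function name `q_learning_update`, the dictionary-based Q-table, and the toy states and actions are assumptions made for this example.

```python
def q_learning_update(q, state, action, reward, next_state,
                      alpha=0.5, gamma=0.9, done=False):
    """Apply one tabular Q-learning update in place and return the new Q-value."""
    # Bootstrapped estimate of the next state's value; zero if the episode ended.
    best_next = 0.0 if done else max(q[next_state].values())
    target = reward + gamma * best_next
    q[state][action] += alpha * (target - q[state][action])
    return q[state][action]


# Hypothetical Q-table with two states and two actions.
q = {
    "s0": {"left": 0.0, "right": 0.0},
    "s1": {"left": 1.0, "right": 2.0},
}
new_value = q_learning_update(q, "s0", "right", reward=1.0, next_state="s1")
print(new_value)
```

Sarsa differs only in the target: it uses the Q-value of the action actually taken in the next state instead of the maximum over actions.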

Benchmarks

Common benchmarks for specific machine-learning tasks
Improvements of benchmark performance over time

Computational aspects

Command line interfaces and operating systems

Data science environments on Linux, macOS, and Windows
Basics of the Ubuntu Linux operating system
Basic Bash commands, Bash scripts, lesser-known but highly convenient Bash commands

GPU computing

GPU architectures and types of processing units inside GPUs
GPU performance metrics and tradeoffs between them
Types of code that can and cannot be accelerated by GPUs
Low-level and high-level frameworks for GPU computing
GPU computing in the cloud
Systolic arrays, tensor cores, and TPUs

Optional: Building one's own GPU-powered computer

Price vs. performance considerations, component choice, assembly

Python: language structure

Guidance on code structure, style conventions, and naming of variables
Variable types and their properties
Object-oriented programming
Python packages, libraries, and frameworks
Python language pitfalls and common mistakes

Python: libraries

Libraries for data analysis and numerical computations on CPUs
RAPIDS for computations on GPUs
Visualization libraries
Deep learning libraries TensorFlow, Keras, and PyTorch
Web development frameworks
Web scraping libraries

Python: computation speed

Data structures and related tradeoffs
Broadcasting between variables of different dimensions
Vectorization, "Single Instruction Multiple Data", multiple cores, multiple processors
Python libraries designed to make computation speed comparable to C code
Dealing with insufficient memory
Profiling (processors and memory)

Data formats

Transforming and processing data in various standard and less standard formats
Relational databases, NoSQL databases, their advantages and disadvantages
Building data pipelines

Docker

Virtual environments in general
Docker properties and related tradeoffs
Multi-container Docker applications

Cloud computing

Comparison of cloud offerings of major providers
Building scalable web applications