## Deep Learning track

Here we provide a list of topics covered by the Deep Learning track, split into methods and computational aspects. The ordering of topics does not reflect the order in which they will be introduced.

As explained in the overview of courses, the track consists of four levels. If you would like to know how the curriculum is distributed across the four levels, please contact us, and we'll be happy to explain.

### Methods

#### Datasets

- Data structures, representations, and transformations
- Data sources for deep learning and reinforcement learning

#### Regression problems

- Linear regression with regularization
- Logistic regression with regularization
- Non-linear regression and non-parametric methods, identifying model misspecification

#### Classification problems

- Use of logistic regression for classification problems
- Multinomial classification

#### Performance evaluation, result significance, and common mistakes

#### Supervised machine learning: basics

- Traditional machine learning vs. deep learning: performance, advantages, and disadvantages
- Methods of traditional machine learning: decision trees, random forests, support vector machines, and others
- Overfitting and underfitting
- Regularization methods, bias vs. variance
- Dropout and early stopping as regularization methods
- Training sets, validation sets, and test sets
- Designing a loss function reflecting the project's practical objective
- Model ensembles

#### Optimization methods

- Convex vs. non-convex optimization
- Stochastic gradient descent, momentum, adaptive optimizers, results of optimizer architecture search, gradient clipping
- Backpropagation through a computation graph and related computational efficiency considerations, the importance of understanding the details of backpropagation
- Second order methods
- Non-gradient-based methods, including evolutionary methods
- Influence of the choice of optimization methods on generalization performance
- Optional: Variational perspective on momentum-based optimization methods
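
As a small illustration of two of the topics above (momentum and gradient clipping), here is a minimal NumPy sketch. It is not course material: the function names `clip_by_norm` and `sgd_momentum`, the toy quadratic objective, and all hyperparameter values are assumptions chosen only for this example.

```python
import numpy as np

def clip_by_norm(grad, max_norm):
    """Rescale grad so that its L2 norm is at most max_norm."""
    norm = np.linalg.norm(grad)
    if norm > max_norm:
        return grad * (max_norm / norm)
    return grad

def sgd_momentum(grad_fn, w0, lr=0.1, beta=0.9, max_norm=1.0, steps=500):
    """Gradient descent with heavy-ball momentum and norm clipping.

    grad_fn is a hypothetical callable returning the gradient at w.
    """
    w = np.asarray(w0, dtype=float).copy()
    v = np.zeros_like(w)  # velocity (running combination of past gradients)
    for _ in range(steps):
        g = clip_by_norm(grad_fn(w), max_norm)
        v = beta * v - lr * g
        w = w + v
    return w

# Toy convex objective f(w) = 0.5 * ||w - target||^2, whose gradient is w - target.
target = np.array([3.0, -2.0])
w_final = sgd_momentum(lambda w: w - target, w0=[0.0, 0.0])
print(w_final)
```

Here the gradient is exact rather than stochastic; in practice `grad_fn` would be evaluated on a mini-batch.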

#### Deep Learning libraries

#### Practical aspects of training neural networks

- Choices of activation functions and their effects on the gradient flow
- Data preprocessing
- Data augmentation, mixup, and manifold mixup
- Weight initialization, transfer learning, fine-tuning
- Batch normalization and reasons behind its good performance
- Methods for choosing the learning rate to speed up optimization, including cyclical learning rates
- Snapshot ensembling

#### Practical aspects of training machine learning models

- Data normalization and pre-processing
- Data augmentation
- Using imbalanced datasets
- Transfer learning
- Semi-supervised learning
- Hyperparameter search
- Implications of the extent of hyperparameter search for comparing the performance of different methods

#### Improving the performance of machine-learning models

- Heuristic inspection of data
- Dealing with a mismatch between collected data and real-world deployment data
- Avoiding data leakage
- Model ensembles, stacking, bagging, and boosting
- Strategies for debugging and interpreting machine learning models

#### Neural networks based on fully connected layers

- Properties and uses of neural networks based on fully connected layers

#### Convolutional neural networks (CNN)

- First convolutional neural networks and their neuroscience motivation
- Mechanics of the convolution layer
- Strides, padding, pooling
- Residual networks and other networks with skip connections
- Spatial transformer networks
- Image segmentation
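
To make the stride and padding topics above concrete, the following sketch computes the spatial output size of a convolution or pooling layer along one dimension, using the standard formula floor((n + 2p - k) / s) + 1. The helper name `conv_output_size` is hypothetical, and dilation is assumed to be 1.

```python
def conv_output_size(n, kernel, stride=1, padding=0):
    """Output length along one spatial dimension (dilation assumed to be 1)."""
    return (n + 2 * padding - kernel) // stride + 1

# A 3x3 convolution with padding 1 and stride 1 preserves a 32-pixel side.
same_size = conv_output_size(32, kernel=3, stride=1, padding=1)

# Using stride 2 instead halves the side length.
halved = conv_output_size(32, kernel=3, stride=2, padding=1)

# A 5x5 convolution without padding, followed by 2x2 pooling with stride 2.
after_conv = conv_output_size(28, kernel=5)
after_pool = conv_output_size(after_conv, kernel=2, stride=2)
print(same_size, halved, after_conv, after_pool)
```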

#### Recurrent neural networks (RNN)

- Simple recurrent neural networks
- Backpropagation through time and the vanishing/exploding gradient problem
- Long Short-Term Memory networks (LSTM) and Gated Recurrent Units (GRU)
- Bidirectional recurrent neural networks

#### Neural networks with entity embedding layers

- Visualization of high-dimensional spaces, t-Distributed Stochastic Neighbor Embedding (t-SNE)
- Entity embedding, conditional entity embedding
- Word vectors

#### Natural Language Processing (NLP)

- Recurrent neural networks for NLP
- Attention
- Transformer networks for NLP
- Models using contextual word representations

#### Unsupervised machine learning: basics

- Clustering
- Principal component analysis and its relationship to singular value decomposition, independent component analysis

#### Autoencoders

- Autoencoders: a general introduction
- Variational autoencoders (VAE)

#### Generative Adversarial Networks (GAN)

- Generative adversarial networks: introduction
- Wasserstein GAN
- Image-to-image translation, conditional GANs, unpaired image-to-image translation
- Adversarial domain adaptation
- Combinations of GANs and VAEs

#### Vulnerabilities of machine-learning systems

- Failure cases and their importance
- Adversarial examples and possible defenses against them
- Unintended biases in machine learning systems and methods for correcting them
- Data privacy issues

#### Time-series analysis

- Autoregressive processes, vector autoregression
- Common mistakes when analyzing time-series data
- Applying recurrent neural networks to continuous-variable time-series analysis

#### Causal inference

- Instrumental variables: Two-stage least squares
- Instrumental variables: Non-linear models/machine-learning models
- Causal calculus ("do-calculus")

#### Reinforcement learning

- Bandit problems
- Dynamic programming, Bellman equations
- Bootstrapping vs. sampling
- Temporal difference learning: Sarsa, Q-learning, Deep Q-Networks (DQN)
- Policy gradient methods: REINFORCE algorithm without and with a baseline, actor-critic methods
- Deep Deterministic Policy Gradient (DDPG)
- Trust Region Policy Optimization (TRPO), Proximal Policy Optimization (PPO)

#### Benchmarks

- Common benchmarks for specific machine-learning tasks
- Improvements of benchmark performance over time

### Computational aspects

#### Command line interfaces and operating systems

- Data science environments on Linux, macOS, and Windows
- Basics of the Ubuntu Linux operating system
- Basic Bash commands, Bash scripts, and lesser-known but highly convenient Bash commands

#### GPU computing

- GPU architectures and types of processing units inside GPUs
- GPU performance metrics and tradeoffs between them
- Types of code that can and cannot be accelerated by GPUs
- Low-level and high-level frameworks for GPU computing
- GPU computing in the cloud
- Systolic arrays, tensor cores, and TPUs

#### Optional: Building one's own GPU-powered computer

- Price vs. performance considerations, component choice, assembly

#### Python: language structure

- Guidance on code structure, style conventions, and naming of variables
- Variable types and their properties
- Object-oriented programming
- Python packages, libraries, and frameworks
- Python language pitfalls and common mistakes

#### Python: libraries

- Libraries for data analysis and numerical computations on CPUs
- RAPIDS for computations on GPUs
- Visualization libraries
- Deep learning libraries: TensorFlow, Keras, and PyTorch
- Web development frameworks
- Web scraping libraries

#### Python: computation speed

- Data structures and related tradeoffs
- Broadcasting between variables of different dimensions
- Vectorization, "Single Instruction, Multiple Data" (SIMD), multiple cores, multiple processors
- Python libraries designed to make computation speed comparable to C code
- Dealing with insufficient memory
- Profiling (processors and memory)
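
As a brief illustration of the broadcasting and vectorization topics above, the sketch below centers the columns of a matrix in two ways: with an explicit Python loop and with a single broadcast NumPy expression. The array contents and the function name `center_columns_loop` are assumptions made for this example only.

```python
import numpy as np

def center_columns_loop(X):
    """Subtract each column's mean using explicit Python loops (slow)."""
    out = np.empty_like(X)
    n_rows, n_cols = X.shape
    for j in range(n_cols):
        col_mean = sum(X[i, j] for i in range(n_rows)) / n_rows
        for i in range(n_rows):
            out[i, j] = X[i, j] - col_mean
    return out

X = np.arange(12, dtype=float).reshape(4, 3)

# Vectorized version: X has shape (4, 3) and the column means have shape (3,).
# Broadcasting stretches the means across the 4 rows without copying them.
X_centered = X - X.mean(axis=0)

X_centered_loop = center_columns_loop(X)
print(np.allclose(X_centered, X_centered_loop))
```

Both versions compute the same result; the broadcast expression delegates the loops to compiled code, which is typically much faster on large arrays.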

#### Data formats

- Transforming and processing data in various standard and less standard formats
- Relational databases, NoSQL databases, their advantages and disadvantages
- Building data pipelines

#### Docker

- Virtual environments in general
- Docker properties and related tradeoffs
- Multi-container Docker applications

#### Cloud computing

- Comparison of cloud offerings of major providers
- Building scalable web applications