Overview of Courses
What courses can I take and how is the whole curriculum structured?
The curriculum is split into courses, and the courses are split into topics, listed below. For example, “Mathematical background for machine learning” is a course and “Linear algebra for machine learning” is a topic. It is possible to take a given topic without taking the whole course.
What is the structure and timing of the curriculum?
Even though individual courses are cohort-based, the curriculum is structured very flexibly. You can take just one course/topic at a time, or several at once. You can pause your studies for a while and rejoin when you are less busy, or study intensively and reach your goal quickly.
Of course, not all courses/topics are offered simultaneously. The timing of each course/topic is determined by students’ requests. Courses on more popular subjects are offered more frequently than those that interest a smaller number of students.
What about graduation?
There is no concept of graduation, so no part of the curriculum is required. You can start by learning intensively, achieve your immediate goal, such as getting a job or getting into graduate school, and then continue taking our courses less intensively to achieve mastery in more subjects. You will receive guidance on which subjects are good to know to achieve your immediate goal.
What is the difficulty level of the courses? Are there any prerequisites?
Most of the courses are at the level of graduate school or of professional machine-learning engineers. Their content has been designed based on experience with many learners, including university students and machine-learning engineers in corporate settings.
But it is not a problem if you are unfamiliar with calculus, linear algebra, or coding. If you are willing to put in the effort, you can learn the necessary skills as you take our introductory courses.
What is the length of the class meetings?
Learning each topic is supported by a series of live (now online) lectures / class meetings. Each class meeting is 45–50 minutes long, to keep your mind fresh, which means you can join our school even if you have very little free time. If you have more time, you can of course join more meetings per day or per week. Our platform also lets you do a lot of work outside the class meetings, to make sure you master each topic.
Our courses are designed to teach skills that are hard or impossible to learn from massive open online courses. By working closely with the students in his classes at the University of Tokyo, Michal identified many gaps in students' knowledge that massive open online courses are unable to fill. We fix this problem the hard way: through a highly individualized approach with close mentorship, and through checks of knowledge, skill, and education quality throughout the entire program.
The curriculum is organized into modules (courses), which can be taken independently. Most of our students choose to study all of the modules, but those who are too busy or have highly focused interests can take only the modules most relevant to them. The whole curriculum is designed to take one to two years, but it is possible to proceed faster.
List of topics
This list is in no particular order, so finding a given topic may require some scrolling.

Neural network introduction

A first introduction to neural networks

An overview of types of neural networks and their applications


Machine learning introduction

A first introduction to machine learning methods


Data manipulation

Data cleaning, transformation, and standardization

Identification of outliers in the data

Data visualization

Data version control


Mathematical background for machine learning

Multivariate calculus for machine learning

Linear algebra for machine learning


Backpropagation for training neural networks

Automatic differentiation and the backpropagation algorithm


Neural network initialization and component normalization

Neural network weight initialization

Batch normalization

Batch, layer, instance, and group normalizations and their conditional and adaptive versions


Model regularization, underfitting, and overfitting

Generalized linear models and regularization

Regularization methods for neural networks


Maximizing the performance of models trained with limited data

Data augmentation techniques

Semi-supervised learning for computer vision


Hyperparameter optimization

Simple methods for hyperparameter optimization

Bayesian methods for hyperparameter optimization


Loss functions for machine learning

Designing loss functions for optimal training of neural networks


Optimization

Optimizers for training neural networks

Constrained optimization using Lagrange multipliers


Probability theory

Discrete, continuous, and mixed distributions

Expectations and moments of distributions

Conditional distributions and Bayesian statistics


Statistical testing

Hypothesis testing

Multiple hypothesis testing

Confusing aspects of hypothesis testing


Regression analysis

Introduction to types of regressions and their uses

Regressions under homoscedasticity

Regressions under heteroscedasticity


Maximum likelihood estimation

Discrete-outcome statistical models

Causal inference

Causal inference problems and directed acyclic graphs

Causal do-calculus

Treatment effect estimation

Instrumental variable methods


Entropy and information-theoretic concepts for machine learning

Entropy types and their uses for machine learning

Divergences between distributions and their uses for machine learning


Decision trees

Decision trees and random forests

Gradient boosted decision trees


Computer vision

Introduction to convolutional neural networks (CNNs) for computer vision

Object classification using CNNs

Transposed convolutions in CNNs and checkerboard artifacts

Object detection and image segmentation CNNs

Depthwise separable convolutions in CNNs

Computer vision using selfsupervised learning

Computer vision for video data processing

Computer vision using transformer neural networks

Adversarial attacks and possible defenses against them


Recurrent neural networks (RNN)

Recurrent neural networks and related simple statistical models

Recurrent neural networks with memory cells


Transformer neural networks

Transformer neural networks for computer vision

Transformer neural networks for image generation

Transformer neural networks for natural language processing

Transformer networks for heterogeneous data fusion


Natural language processing (NLP) and code processing

Introduction to natural language processing

Introduction to code processing similar to natural language processing

Language models

Natural language processing before transformers

Word embeddings

Natural language processing using transformers

NLP in combination with computer vision or image generation


Time-series prediction

Time-series prediction for tabular data

Time-series prediction for image/video data


Optimizing neural networks for edge device deployment

Assessing tradeoffs in speed, memory requirements, and computational cost

Neural network distillation and weight quantization

Software for edge-device or embedded-system deployment


High-dimensional spaces

Properties of high-dimensional statistical distributions

Data manifolds

Dimensionality reduction


Generative adversarial networks (GAN)

Basic generative adversarial networks

Generative adversarial networks and spectral normalization


Variational autoencoders (VAE)

Basic variational autoencoders

High-performance variational autoencoders

Variational autoencoders for anomaly detection


Generative modeling

Introduction to generative modeling beyond GANs and VAEs

Energy-based models

Autoregressive models

Normalizing flows


Graph neural networks

Graph data and graph neural networks

Advanced graph neural networks


Reinforcement learning

Bandit problems

Markov decision processes

Temporal difference learning

Function approximation for reinforcement learning

Value-based methods and policy gradient methods

Advanced methods for reinforcement learning


Machine-learning system development strategies and pipelines

Strategies to improve performance on training, validation, and test sets and at inference time

Strategies for handling edge cases (long tail events)

Strategies for domain adaptation

Model A/B testing and progressive delivery

Machine-learning pipelines for continuous integration, continuous delivery, and continuous training

Monitoring model performance and detecting and managing data drift


Security and privacy

Anonymization of data for privacypreserving model training

Generation of artificial data that follows the same distribution as a confidential dataset

Public-key and symmetric-key cryptography

Homomorphic encryption

Differential privacy

Federated learning


Python language

Data transformations and visualizations in Python

Introduction to object-oriented programming with Python

Techniques for increasing computational speed in Python


PyTorch deep learning framework

PyTorch variables, functions, and automatic differentiation

GPU computing with PyTorch

Libraries built on PyTorch


TensorFlow deep learning framework

TensorFlow variables, functions, and automatic differentiation

GPU computing with TensorFlow

Libraries built on TensorFlow


Virtual environments, Docker, and Kubernetes

Virtualenv and Conda environments

Single-container and multi-container Docker apps

Kubernetes


Spark

Using Apache Spark for distributed data processing


Notebooks for machine learning

Using Jupyter notebooks for experimenting and for preparing production-quality code
