Overview of Courses
What courses can I take and how is the whole curriculum structured?
The curriculum is split into courses, and the courses are split into topics, listed below. For example, “Mathematical background for machine learning” is a course and “Linear algebra for machine learning” is a topic. It is possible to take a given topic without taking the whole course.
What is the structure and timing of the curriculum?
Even though individual courses are cohort-based, the curriculum is structured very flexibly. You can take just one course/topic at a time, take several courses/topics at once, pause your studies and rejoin when you are less busy, or study intensively and reach your goal quickly.
Of course, not all courses/topics are offered simultaneously. The timing of each course/topic is determined by students’ requests. Courses on more popular subjects are offered more frequently than those that interest a smaller number of students.
What about graduation?
There is no concept of graduation, so you do not have to complete any required part of the curriculum. You can study intensively at first to achieve an immediate goal, such as getting a job or getting into graduate school, and then continue taking our courses less intensively to achieve mastery in more subjects. You will receive guidance on which subjects are good to know for your immediate goal.
What is the difficulty level of the courses? Are there any prerequisites?
Most of the courses are at the graduate-school level or at the level of professional machine-learning engineers. Their content has been designed based on experience with many learners, including university students and machine-learning engineers in corporate settings.
It is not a problem, however, if you are not familiar with calculus, linear algebra, or coding. If you are willing to make an effort, you can learn the necessary skills as you take our introductory courses.
What is the length of the class meetings?
Learning each topic is supported by a series of live (currently online) lectures/class meetings. Each class meeting is 45–50 minutes long, to keep your mind fresh. That means you can join our school even if you have very little free time. If you have more time, you can of course join more meetings per day or per week. Our platform also lets you do a lot of work outside the class meetings, to make sure you master each topic.
Our courses are designed to teach skills that are hard or impossible to learn from massive open online courses. By working closely with the students taking his classes at the University of Tokyo, Michal identified many kinds of gaps in students’ knowledge that massive open online courses are unable to fill. We close these gaps the hard way: through a highly individualized approach with close mentorship, and through checks of knowledge, skill, and education quality throughout the entire program.
The curriculum is organized into modules (courses), which can be taken independently. Most of our students choose to study all of the modules, but those who are too busy or who have highly focused interests can take only the modules most relevant to them. The whole curriculum is designed to take one to two years, but it is possible to proceed faster.
List of topics
This list is in no particular order, so finding a given topic may require some scrolling.
- Neural network introduction
  - A first introduction to neural networks
  - An overview of types of neural networks and their applications
- Machine learning introduction
  - A first introduction to machine learning methods
- Data manipulation
  - Data cleaning, transformation, and standardization
  - Identification of outliers in the data
  - Data visualization
  - Data version control
- Mathematical background for machine learning
  - Multivariate calculus for machine learning
  - Linear algebra for machine learning
- Backpropagation for training neural networks
  - Automatic differentiation and the backpropagation algorithm
- Neural network initialization and component normalization
  - Neural network weight initialization
  - Batch normalization
  - Batch, layer, instance, and group normalizations and their conditional and adaptive versions
- Model regularization, underfitting, and overfitting
  - Generalized linear models and regularization
  - Regularization methods for neural networks
- Maximizing the performance of models trained with limited data
  - Data augmentation techniques
  - Semi-supervised learning for computer vision
- Hyperparameter optimization
  - Simple methods for hyperparameter optimization
  - Bayesian methods for hyperparameter optimization
- Loss functions for machine learning
  - Designing loss functions for optimal training of neural networks
- Optimization
  - Optimizers for training neural networks
  - Constrained optimization using Lagrange multipliers
- Probability theory
  - Discrete, continuous, and mixed distributions
  - Expectations and moments of distributions
  - Conditional distributions and Bayesian statistics
- Statistical testing
  - Hypothesis testing
  - Multiple hypothesis testing
  - Confusing aspects of hypothesis testing
- Regression analysis
  - Introduction to types of regressions and their uses
  - Regressions under homoscedasticity
  - Regressions under heteroscedasticity
- Maximum likelihood estimation
  - Discrete-outcome statistical models
- Causal inference
  - Causal inference problems and directed acyclic graphs
  - Causal do-calculus
  - Treatment effect estimation
  - Instrumental variable methods
- Entropy and information-theoretic concepts for machine learning
  - Entropy types and their uses for machine learning
  - Divergences between distributions and their uses for machine learning
- Decision trees
  - Decision trees and random forests
  - Gradient boosted decision trees
- Computer vision
  - Introduction to convolutional neural networks (CNNs) for computer vision
  - Object classification using CNNs
  - Transposed convolutions in CNNs and checkerboard artifacts
  - Object detection and image segmentation CNNs
  - Depth-wise separable convolutions in CNNs
  - Computer vision using self-supervised learning
  - Computer vision for video data processing
  - Computer vision using transformer neural networks
  - Adversarial attacks and possible defenses against them
- Recurrent neural networks (RNN)
  - Recurrent neural networks and related simple statistical models
  - Recurrent neural networks with memory cells
- Transformer neural networks
  - Transformer neural networks for computer vision
  - Transformer neural networks for image generation
  - Transformer neural networks for natural language processing
  - Transformer networks for heterogeneous data fusion
- Natural language processing (NLP) and code processing
  - Introduction to natural language processing
  - Introduction to code processing similar to natural language processing
  - Language models
  - Natural language processing before transformers
  - Word embeddings
  - Natural language processing using transformers
  - NLP in combination with computer vision or image generation
- Time-series prediction
  - Time-series prediction for tabular data
  - Time-series prediction for image/video data
- Optimizing neural networks for edge device deployment
  - Assessing tradeoffs in speed, memory requirements, and computational cost
  - Neural network distillation and weight quantization
  - Software for edge-device or embedded system deployment
- High-dimensional spaces
  - Properties of high-dimensional statistical distributions
  - Data manifolds
  - Dimensionality reduction
- Generative adversarial networks (GAN)
  - Basic generative adversarial networks
  - Generative adversarial networks and spectral normalization
- Variational autoencoders (VAE)
  - Basic variational autoencoders
  - High-performance variational autoencoders
  - Variational autoencoders for anomaly detection
- Generative modeling
  - Introduction to generative modeling beyond GANs and VAEs
  - Energy-based models
  - Autoregressive models
  - Normalizing flows
- Graph neural networks
  - Graph data and graph neural networks
  - Advanced graph neural networks
- Reinforcement learning
  - Bandit problems
  - Markov decision processes
  - Temporal difference learning
  - Function approximation for reinforcement learning
  - Value-based methods and policy gradient methods
  - Advanced methods for reinforcement learning
- Machine-learning system development strategies and pipelines
  - Strategies to improve performance on training, validation, test, and inference sets
  - Strategies for handling edge cases (long tail events)
  - Strategies for domain adaptation
  - Model A/B testing and progressive delivery
  - Machine-learning pipelines for continuous integration, continuous delivery, and continuous training
  - Monitoring model performance and detecting and managing data drift
- Security and privacy
  - Anonymization of data for privacy-preserving model training
  - Generation of artificial data that follows the same distribution as a confidential dataset
  - Public-key and symmetric-key cryptography
  - Homomorphic encryption
  - Differential privacy
  - Federated learning
- Python language
  - Data transformations and visualizations in Python
  - Introduction to object-oriented programming with Python
  - Techniques for increasing computational speed in Python
- PyTorch deep learning framework
  - PyTorch variables, functions, and automatic differentiation
  - GPU computing with PyTorch
  - Libraries built on PyTorch
- TensorFlow deep learning framework
  - TensorFlow variables, functions, and automatic differentiation
  - GPU computing with TensorFlow
  - Libraries built on TensorFlow
- Virtual environments, Docker, and Kubernetes
  - Virtualenv and Conda environments
  - Single-container and multi-container Docker apps
  - Kubernetes
- Spark
  - Using Apache Spark for distributed data processing
- Notebooks for machine learning
  - Using Jupyter notebooks for experimenting and for preparing production-quality code