Data Science Track

Our curriculum is now organized into modules rather than tracks; the modules are described here. For reference, we include below a description of the Data Science track as it was before the reorganization.


Visualization and exploratory data analysis


Regression problems

Classification problems

Shapes of probability distributions and hypothesis testing

Nuances of probability theory

Performance evaluation, result significance, and common mistakes

Supervised machine learning

Optimization methods

Standard neural network architectures and related training methods

Practical aspects of training machine learning models

Improving the performance of ML models

Unsupervised learning

Autoencoders, generative models

Time-series analysis

Causal inference

Functional programming

Computational aspects

Command line interfaces and operating systems

GPU computing

Optional: Building one's own GPU-powered computer

Python: language structure

Python: libraries

Python: computation speed

  • Data structures and related tradeoffs
  • Broadcasting between variables of different dimensions
  • Vectorization, "Single Instruction Multiple Data", multiple cores, multiple processors
  • Python libraries designed to make computation speed comparable to C code
  • Dealing with insufficient memory
  • Profiling (processors and memory)
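
To illustrate the broadcasting and vectorization topics listed above, here is a minimal sketch. It assumes NumPy as the array library (the outline does not name one), and the array names and values are hypothetical. It centers each column of a small array, once with broadcasting and once with an explicit loop for comparison.

```python
import numpy as np

# Hypothetical data: 3 samples with 2 features each, shape (3, 2).
samples = np.array([[1.0, 2.0],
                    [3.0, 4.0],
                    [5.0, 6.0]])

# Per-feature means, shape (2,).
feature_means = samples.mean(axis=0)

# Broadcasting: the (2,) vector is applied to every row of the (3, 2) array,
# and the subtraction runs as one vectorized operation.
centered = samples - feature_means

# The same computation as an explicit Python loop, for comparison only.
centered_loop = np.empty_like(samples)
for i in range(samples.shape[0]):
    for j in range(samples.shape[1]):
        centered_loop[i, j] = samples[i, j] - feature_means[j]
```

Both versions produce the same array. The broadcast form avoids the per-element Python loop, which is typically where the speed difference comes from on larger arrays.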

Data formats


  • Virtual environments in general
  • Docker properties and related tradeoffs
  • Multi-container Docker applications

Production tools

  • Spark: Running parallel computations on a Spark cluster
  • Kubernetes: Basics of container orchestration with Kubernetes

Cloud computing

  • Comparison of cloud offerings of major providers
  • Building scalable web applications

C language