Data Science Track

Our curriculum is now organized into modules rather than tracks; the modules are described here. For reference, we include below a description of the Data Science track as it was before the reorganization.



Methods


Visualization and exploratory data analysis


Datasets


Regression problems


Classification problems


Shapes of probability distributions and hypothesis testing


Nuances of probability theory


Performance evaluation, result significance, and common mistakes


Supervised machine learning


Optimization methods


Standard neural network architectures and related training methods


Practical aspects of training machine learning models


Improving the performance of ML models


Unsupervised learning


Autoencoders, generative models


Time-series analysis


Causal inference


Functional programming



Computational aspects


Command line interfaces and operating systems


GPU computing


Optional: Building one's own GPU-powered computer


Python: language structure


Python: libraries


Python: computation speed

  • Data structures and related tradeoffs
  • Broadcasting between variables of different dimensions
  • Vectorization, "Single Instruction Multiple Data", multiple cores, multiple processors
  • Python libraries designed to make computation speed comparable to C code
  • Dealing with insufficient memory
  • Profiling (processors and memory)
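As a small illustration of the memory and profiling topics above, here is a minimal sketch using only Python's standard library. It compares the peak memory of building a full list with that of streaming the same values through a generator. The helper name `peak_bytes` and the problem size `N` are assumptions made for this example, not part of the course material.

```python
import tracemalloc

N = 200_000  # illustrative problem size (an assumption for this sketch)


def peak_bytes(fn):
    """Run fn() and return (result, peak traced memory in bytes)."""
    tracemalloc.start()
    try:
        result = fn()
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()  # also clears traces, so each run starts fresh
    return result, peak


def sum_with_list():
    # Materializes all N squares in memory before summing.
    squares = [i * i for i in range(N)]
    return sum(squares)


def sum_with_generator():
    # Streams one square at a time; no intermediate list is kept.
    return sum(i * i for i in range(N))


list_total, list_peak = peak_bytes(sum_with_list)
gen_total, gen_peak = peak_bytes(sum_with_generator)

print(f"list:      total={list_total}, peak={list_peak} bytes")
print(f"generator: total={gen_total}, peak={gen_peak} bytes")
```

Both versions compute the same sum, but the generator version should report a much smaller peak, which is one common way to cope with data that does not fit comfortably in memory.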

Data formats


Docker

  • Virtual environments in general
  • Docker properties and related tradeoffs
  • Multi-container Docker applications

Production tools

  • Spark: Running parallel computations on a Spark cluster
  • Kubernetes: Basics of container orchestration with Kubernetes

Cloud computing

  • Comparison of cloud offerings of major providers
  • Building scalable web applications

C language


Cryptography