Foundations of Data Science
CDT in Mathematics of Random Systems
Michaelmas (September 2023)
Mihai Cucuringu
- Lecture 1: Introduction & Roadmap (Slides)
- Lecture 2: Statistical Machine Learning (Slides)
- Lecture 3: Measures of Correlation and Dependence (i) (Slides)
- Lecture 4: Measures of Correlation and Dependence (ii) - Mutual Information & MIC (Slides)
- Lecture 5: Singular Value Decomposition and Principal Component Analysis (Slides)
- Read Section 10.2, Principal Components Analysis, in "An Introduction to Statistical Learning" (PDF), especially if you have never seen PCA before; it provides useful intuition before going over the proofs and derivations.
- Lecture 6: PCA in high dimensions, random matrix theory and financial applications (Slides)
- Lecture 7: Nonlinear dimensionality reduction: cMDS, ISOMAP, LLE, Laplacian Eigenmaps (Slides)
- Lecture 8: Nonlinear dimensionality reduction: Diffusion Maps; Graph Partitioning (Slides)
- Lecture 9: Spectral Graph Theory (Slides)
- Lectures 10-11: Network analysis (Slides) (overview, graph theory basics, network summaries, network models, centrality measures, modularity, miscellaneous)
- Lecture 12: Properties of random graphs (Slides)
- Lecture 13: Clustering point clouds and graphs: k-means, spectral clustering, isoperimetric number, Cheeger's inequality (Slides)
- Lecture 14: Stochastic Block Model: spectral & semidefinite programming relaxations (Slides)
- Lecture 15: Clustering signed and directed graphs (Slides)
- Lecture 16: Group synchronization and applications (Slides) (VDM)
- Lecture 17: Ranking from pairwise comparisons (Slides)
- Lecture 18: Linear Regression: OLS (Slides), Ridge, LASSO (Slides)
Two spectral problems with solution sketches: (Cheeger Inequality) and (The Signed Stochastic Block Model).
Applications to real data:
- A comparison of various correlation measures (PDF) (data in CSV) (data in R) (R code)
- Applications of random matrix theory (PDF) (data in CSV: SP500 2012-2014) (data in CSV: SP1500 2013-2019).
This is what the data looks like (first few rows and columns): (PNG)
- Diffusion Maps example (PDF) (2D data in CSV)
- Multidimensional Scaling (PDF) (CSV file with the distance matrix; 22 MB)
- A simple linear regression application to financial data (PDF) (R code)
© 2023 Mihai Cucuringu