Teaching: Statistical Data Mining (MS1b HT2013)
| Term: | Hilary Term, January 14 - March 8, 2013 |
| Lecturer: | Yee Whye Teh |
| Teaching Assistant: | Yuanyuan Liu |
| Lectures: | 1100-1200 Wednesdays (all weeks) and 1100-1200 Thursdays (odd weeks), Mathematical Institute L1 |
| Problem Sheets: | Due 1200 Mondays (Weeks 2-8) in 1 South Parks Road |
| Classes (Part C only): | 1500-1700 Wednesdays (Weeks 2-8). Week 2: Seminar Room, 2 SPR; Weeks 3-8: Seminar Room, 1 SPR |
| Practicals (Part C only): | 1100-1200 Thursdays (even weeks), 1 South Parks Road Computing Lab |
| Miniproject (MSc only): | TBA; to be carried out over the Easter break |
| Google Group: | https://groups.google.com/forum/?hl=en-GB#!forum/statistical-data-mining Not formally part of the course, but you may find it useful for asking other students questions. |
News
- Here is the solution to the EM question on Problem Sheet 3: EM solution.
- A specimen paper for the Part C exam.
Additional practice questions.
Note that these are for practice only and do not necessarily cover all topics relevant to the exams.
- Consultation classes:
- Friday Week 3, 3:30pm-5:30pm. L1, Mathematical Institute. All students welcome.
- Wednesday Week 5, 11:30am-1pm. Seminar Room, 1 SPR. Part C only.
- Tuesday Week 6, 11:30am-1pm. Seminar Room, 1 SPR. Part C only.
- Solutions to some problem sheet questions are here.
I have posted these solutions because I did not manage to go through them in detail in the Part C classes.
NOTE: These are NOT meant to be model solutions for you to focus on for your exams. They mostly cover programming questions, plus a few questions whose solutions are either very long or quite complicated.
- MSc Miniproject.
Data: X Y.
Errata:
- Note: the hand-in location has changed to 1 SPR.
- In Question 4, quantise the responses Y into 3 classes: <=8, 9 or 10, and >=11.
- Notice to MSc students: lab work on Tuesday, 2pm-6pm, in 1 SPR.
- I also encourage all students to use the Google Group to ask questions. I will monitor the group and answer when I can, but others can answer too, and all questions and answers will be visible to everyone.
Compiled Slides
- Unsupervised Learning
- Supervised Learning: Parametric
- Supervised Learning: Nonparametric
- Supervised Learning: Ensemble
Schedule
| 14/1 | 16/1 Lecture: Intro, PCA | 17/1 Lecture: SVD, MDS, Isomap |
| 21/1 Problem Sheet 1 | 23/1 Lecture: Linkage, K-means, VQ | 24/1 Part C Practical: Practical 1 |
| 28/1 Problem Sheet 2 | 30/1 Lecture: Mixtures | 31/1 Lecture: Decision theory |
| 04/2 Problem Sheet 3 | 06/2 Lecture: LDA, QDA, Naive Bayes; 06/2 5pm MSc Class | 07/2 Part C Practical: Practical 2 |
| 11/2 Problem Sheet 4 | 13/2 Lecture: Bayes, LogReg, Evaluating | 14/2 Lecture: kNN, LVQ |
| 18/2 Problem Sheet 5 | 20/2 Lecture: CART, ModelComplexity | 21/2 Part C Practical: Practical 3 |
| 25/2 Problem Sheet 6 | 27/2 Lecture: NeuralNets | 28/2 Lecture: Bagging |
| 04/3 Problem Sheet 7 | 06/3 Lecture: RandomForest, Boosting | 07/3 Part C Practical: Practical 4 |
Data
R
- An introduction to R.
- R itself can be downloaded at http://cran.r-project.org/.
- If you use Emacs, the ESS (Emacs Speaks Statistics) package is recommended for interacting with R.
Some Textbooks
There are (too?) many textbooks on data mining. Some popular ones:
- Ripley, Pattern Recognition and Neural Networks, Cambridge University Press.
- Bishop, Pattern Recognition and Machine Learning, Springer.
- Hastie, Tibshirani and Friedman, The Elements of Statistical Learning, Springer-Verlag. [ebook]
- Duda, Hart and Stork, Pattern Classification, Wiley-Interscience.