620-472 Data Mining

Data Mining refers to the management and analysis of large data sets. As it has matured, it has developed a more statistical flavour, but Data Mining still owes much of its character to disciplines such as machine learning, pattern recognition, database design and high performance computing.

Data Mining became possible with the advent of large-scale data collection and the computing power necessary to process it. Data Mining involves all of the following steps:

  1. Data Warehousing
  2. Data Cleaning
  3. Data Description and Visualisation
  4. Data Analysis and Interpretation

This course deals only with step 4 of the Data Mining process: data analysis and interpretation.

Techniques covered by the course include: Market Basket Analysis; Tree-based classification; Logistic Regression; Neural Networks; Hierarchical and K-means clustering; and Regression Splines.

The themes that run through the course are:

  1. Model fitting and selection, and how to avoid overfitting
  2. Scalable algorithms that can be used with very large data sets
  3. How to accommodate high-dimensional data
  4. Actionability and interpretability of models

Prerequisites

None are required; however, students would benefit from having completed an introductory probability or statistics unit, such as 620-131, 620-160, 260-201 or 620-370.

Lecturer

Dr Owen Jones, room 221 Richard Berry building.
Contact details are available on the departmental web site.

Lectures are on Mondays, 10 - 12, room 215 of the Richard Berry building.
A lab help session has been timetabled for Fridays, 3:15 - 4:15, room G70 (Wilson lab) of the Richard Berry building.

Course Material

The book by Kuhnert and Venables uses R to apply many of the techniques we cover.

Assessment

20% coursework (weekly/fortnightly assignments), 80% exam.

Past Exams

Online Resources

Two useful data mining websites.

An interactive and educational implementation of the k-means algorithm.

Here are links to some useful R resources:

