Lecture 6 - Machine Learning II
Musashi Harukawa, DPIR
6th Week Hilary 2021
- k-means
- PCA
Some general classes of supervised models include:
Although radically distinct from linear estimators such as OLS, decision trees offer a simple and intuitive approach to estimating values of \(y\) based on \(X\).
A decision tree can be understood as a mapping from the multi-dimensional feature space, \(X_{ij}\), to the label space \(y_i\).
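To make this concrete, here is a minimal sketch, assuming scikit-learn and a simulated dataset (neither is from the original slides). A fitted tree is precisely such a mapping, and `predict()` applies it:

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Simulated stand-in for a feature matrix X_ij and label vector y_i
X, y = make_classification(n_samples=500, n_features=5, random_state=0)

# Fit a shallow tree; each leaf corresponds to a region of the feature space
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# predict() applies the learned mapping: each row of X is routed down the
# tree to a leaf, whose majority label becomes the prediction
print(tree.predict(X[:5]))
```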
Advantages:
Disadvantages:
These are some of the criteria you may want to consider when choosing an algorithm:
There are various algorithms that aggregate decision trees, but here I outline the logic behind the most straightforward and common one: Random Forests (RFs).
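A minimal sketch of that logic, assuming a regression task and simulated data (scikit-learn's `RandomForestRegressor` packages the same steps):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)
rng = np.random.default_rng(0)

# (1) Grow many trees, each on a bootstrap resample of the data, and
# (2) decorrelate them by considering only a random subset of features
#     at each split (max_features="sqrt")
trees = []
for _ in range(100):
    idx = rng.integers(0, len(X), size=len(X))
    trees.append(DecisionTreeRegressor(max_features="sqrt").fit(X[idx], y[idx]))

# (3) The forest's prediction is the average of the trees' predictions
forest_pred = np.mean([t.predict(X) for t in trees], axis=0)
```

Averaging many decorrelated trees reduces the variance of the individual trees while leaving their low bias largely intact.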
A number of papers have been published recently that use ensemble methods to estimate heterogeneous treatment effects:
Both of these papers focus their innovations on the meta-learner.
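To illustrate what a meta-learner is, here is a sketch of one of the simplest, the T-learner, on simulated data. This is a generic illustration only, not the estimator from either paper:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def t_learner_cate(X, y, w):
    """Estimate unit-level treatment effects with a T-learner.

    Fit one outcome model on treated units and one on controls, then take
    the difference in predictions as the estimated effect for each unit."""
    mu1 = RandomForestRegressor(random_state=0).fit(X[w == 1], y[w == 1])
    mu0 = RandomForestRegressor(random_state=0).fit(X[w == 0], y[w == 0])
    return mu1.predict(X) - mu0.predict(X)

# Simulated data: the true effect of treatment w grows with the first feature
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
w = rng.integers(0, 2, size=1000)
y = X[:, 0] + w * (1 + X[:, 0]) + rng.normal(size=1000)

cate_hat = t_learner_cate(X, y, w)  # heterogeneous effect estimates
```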
Cross-validation is one such strategy. It consists of dividing the data into training and test sets:
There are multiple aggregate measures of prediction error, but a common one is mean squared (prediction) error: the average of the squared differences between the predictions and the test labels, \(\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2\).
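A minimal sketch of both steps, assuming scikit-learn and simulated data:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import cross_val_score, train_test_split

X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)

# Single train/test split: fit on one part, score predictions on the other
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = RandomForestRegressor(random_state=0).fit(X_train, y_train)
print(mean_squared_error(y_test, model.predict(X_test)))

# k-fold cross-validation repeats the split k times and averages the error
print(-cross_val_score(model, X, y, cv=5,
                       scoring="neg_mean_squared_error").mean())
```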
Another strategy for improving the predictive accuracy of an algorithm is choosing the right parameters.
Most, if not all, algorithms have parameters that affect predictions in non-obvious ways. For example (illustrated in the sketch below):
k-means
: number of clusters
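As an illustration of how this parameter is often chosen in practice, here is a sketch assuming scikit-learn and simulated data (the "elbow" heuristic shown is one common approach):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Simulated data with four true clusters
X, _ = make_blobs(n_samples=500, centers=4, random_state=0)

# Fit k-means over a range of k; inertia (within-cluster sum of squares)
# always falls as k grows, so look for the "elbow" where the gains flatten
for k in range(2, 9):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    print(k, round(km.inertia_, 1))
```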
These trade-offs are not linear, but they generally hold:
Ensemble Methods:
Elements of Statistical Learning: