Definition: The science/art of programming computers so that they can learn from data

• Linear & Polynomial Regression
• Logistic Regression
• k-Nearest Neighbors
• Support Vector Machines
• Decision Trees
• Random Forests
• Ensemble Methods

## Neural Network Architectures

• Feedforward Neural Nets
• Convolutional Nets
• Recurrent Nets
• Long short-term memory (LSTM) nets
• Autoencoders
• Multi-Layer Perceptrons (MLPs)

## Famous Papers

• Machine Learning on handwritten digits - 2006 - <www.cs.toronto.edu/~hinton>
• The Unreasonable Effectiveness of Data - 2009

## Hands-On Machine Learning Book

For the DL part see [Deep Learning]

## Pandas / Sklearn / Numpy / Scipy Cheatsheet

• `dt.describe()` → summary statistics per numerical column (count, mean, std, min, 25%/50%/75% percentiles, max)
• `dt.info()` → info about the DataFrame (index dtype, column dtypes, non-null counts, memory usage)
• `dt["a_col"].value_counts()` → count of occurrences of each distinct value in the column
• `dt.corr()` → standard (Pearson) correlation coefficients between columns, useful for spotting potential linear correlations
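A minimal sketch of these inspection calls on a toy DataFrame (the column names are made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({"a_col": [1, 2, 2, 4], "b_col": [0.5, 1.5, 2.5, 3.5]})

stats = df.describe()                 # count, mean, std, min, 25%, 50%, 75%, max per column
df.info()                             # dtypes, non-null counts, memory usage (printed)
counts = df["a_col"].value_counts()   # occurrences of each distinct value
corr = df.corr()                      # pairwise Pearson correlation matrix

print(stats.loc["mean", "a_col"])     # → 2.25
print(counts.loc[2])                  # → 2
```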

Apply a function to a DataFrame: `dt.apply(func)` applies `func` along an axis (column by column by default). For conditional replacement, use `dt.where(cond, other, inplace=True)`, which keeps values where `cond` holds and replaces the rest with `other`.
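A short sketch of both calls, on a made-up single-column DataFrame:

```python
import pandas as pd

df = pd.DataFrame({"x": [1, -2, 3, -4]})

# apply: run a function column by column
doubled = df.apply(lambda col: col * 2)

# where: keep values where the condition is True, replace the rest
clipped = df.copy()
clipped.where(clipped >= 0, 0, inplace=True)   # negatives become 0
```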

Use the viridis color palette: color-blind-friendly and prints better on greyscale!

https://cran.r-project.org/web/packages/viridis/vignettes/intro-to-viridis.html

### SkLearn - fill missing values in a dataset:

#### Strategies:

• Get rid of the corresponding rows (e.g., the districts in the housing example)
• Get rid of the whole attribute (column)
• Set the missing values to some value (zero, the mean, the median, etc.)
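The three strategies sketched with pandas; the `total_bedrooms` column echoes the housing example, but any DataFrame works. Scikit-Learn's `SimpleImputer` automates the third strategy.

```python
import pandas as pd

df = pd.DataFrame({"total_bedrooms": [3.0, None, 5.0], "rooms": [6, 7, 8]})

opt1 = df.dropna(subset=["total_bedrooms"])    # 1. drop the affected rows
opt2 = df.drop(columns=["total_bedrooms"])     # 2. drop the whole attribute
median = df["total_bedrooms"].median()
opt3 = df.fillna({"total_bedrooms": median})   # 3. fill with the median
```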

Pandas/SkLearn: Convert a string column/category to nums
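Two common ways to do this in pandas (the `ocean` column is a made-up example; Scikit-Learn's `OrdinalEncoder`/`OneHotEncoder` are the equivalents there):

```python
import pandas as pd

df = pd.DataFrame({"ocean": ["NEAR BAY", "INLAND", "NEAR BAY"]})

# Option 1: integer codes via pandas categoricals
codes = df["ocean"].astype("category").cat.codes

# Option 2: one-hot encoding (one column per category)
onehot = pd.get_dummies(df["ocean"])
```

Note that integer codes impose an ordering the categories may not have; one-hot encoding avoids that at the cost of extra columns.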

Get Numpy dense array from Scipy sparse matrix: `sparse_mat.toarray()`
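For example, with a small SciPy CSR matrix:

```python
import numpy as np
from scipy.sparse import csr_matrix

sparse_mat = csr_matrix(np.eye(3))   # sparse 3x3 identity
dense = sparse_mat.toarray()         # plain NumPy ndarray
```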

#### Feature Scaling

Machine Learning algorithms don’t perform well when the input numerical attributes have very different scales.

• min-max scaling / normalization
• standardization
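Both scalings sketched with NumPy; Scikit-Learn's `MinMaxScaler` and `StandardScaler` do the same per column:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])

minmax = (x - x.min()) / (x.max() - x.min())   # values squashed into [0, 1]
standard = (x - x.mean()) / x.std()            # mean 0, standard deviation 1
```

Min-max scaling is bounded but sensitive to outliers; standardization is unbounded but much less affected by them.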

### Definitions

• Attribute: a data type (e.g., Mileage)
• Feature: an attribute plus its value

#### Deep Neural Network

LTU (Linear Threshold Unit): a neuron that computes a weighted sum of its inputs, z = w1x1 + w2x2 + … + wnxn (i.e., z = w^Tx), then applies a step function to z, e.g., the Heaviside step function

Perceptron -> a single layer of LTUs, with each neuron connected to all inputs. The neurons are also fed an extra bias feature x0 = 1 (the `bias neuron`)
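A toy forward pass for such a layer of LTUs, assuming a Heaviside step activation; the weights and inputs are made-up illustrative numbers:

```python
import numpy as np

def heaviside(z):
    # step function: 1 where z >= 0, else 0
    return (z >= 0).astype(float)

X = np.array([[1.0, 2.0],
              [-1.0, 0.5]])    # two instances, two input features
W = np.array([[0.5, -1.0],
              [0.5, 1.0]])     # weights, shape (n_inputs, n_neurons)
b = np.array([0.0, -1.0])      # bias terms (weights of the bias neuron)

z = X @ W + b                  # weighted sums, one per neuron per instance
outputs = heaviside(z)         # step function decides which neurons fire
```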

`Passthrough Input Layer`: Inputs are represented by neurons that just propagate the input to the output

Activation function (`activation_fn`): The function that evaluates the neuron's inputs and decides whether the neuron triggers

`ReLU or Rectifier or Ramp` -> `max(0, z)`

Hint: The derivative of ReLU is the `Heaviside` step function (undefined at 0), and the derivative of SmoothReLU (softplus) is the `logistic function`
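A quick numerical check of the softplus half of that hint, comparing a finite-difference derivative of softplus, log(1 + e^z), against the logistic function:

```python
import numpy as np

def softplus(z):
    return np.log1p(np.exp(z))        # SmoothReLU: log(1 + e^z)

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

z, eps = 0.7, 1e-6
numeric = (softplus(z + eps) - softplus(z - eps)) / (2 * eps)  # central difference
assert abs(numeric - logistic(z)) < 1e-6
```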

## FAQ

• How do I tune the hyperparameters of my model?
  • Grid search with cross-validation to find the right hyperparameters
  • Randomised search
  • Use Oscar - http://oscar.calldesk.ai/
  • It helps to have an idea of what values are reasonable for each hyperparameter!
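A minimal sketch of exhaustive grid search; `cv_score` here is a made-up stand-in for "fit the model with these hyperparameters and return the mean cross-validation score" (Scikit-Learn's `GridSearchCV` does this for real estimators):

```python
from itertools import product

grid = {"lr": [0.01, 0.1], "depth": [2, 4]}   # hypothetical hyperparameters

def cv_score(lr, depth):
    # stand-in scoring function: peaks at lr=0.1, depth=4
    return -(lr - 0.1) ** 2 - (depth - 4) ** 2

best_params, best_score = None, float("-inf")
for lr, depth in product(grid["lr"], grid["depth"]):
    score = cv_score(lr, depth)
    if score > best_score:
        best_params, best_score = {"lr": lr, "depth": depth}, score

print(best_params)   # the combination with the highest validation score
```

Randomised search replaces the exhaustive `product` loop with a fixed number of random draws from the grid, which scales much better when there are many hyperparameters.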