Definition: The science/art of programming computers so that they can learn from data

  • Linear & Polynomial Regression
  • Logistic Regression
  • k-Nearest Neighbors
  • Support Vector Machines
  • Decision Trees
  • Random Forests
  • Ensemble methods

Neural Network Architectures

  • Feedforward Neural Nets
  • Convolutional Nets
  • Recurrent Nets
  • Long short-term memory (LSTM) nets
  • Autoencoders
  • Multi-Layer Perceptrons (MLPs)

Famous Papers

  • Machine Learning on handwritten digits - 2006 - <www.cs.toronto.edu/~hinton>
  • The Unreasonable Effectiveness of Data - 2009

Hands-On Machine Learning Book

For the DL part see [Deep Learning]

Pandas / Sklearn / Numpy / Scipy Cheatsheet

  • dt.describe() → statistics about each column (count, mean, min, max, 25%, 50%, etc.)
  • dt.info() → info about the dataframe (index dtype, column dtypes, non-null counts, memory usage)
  • dt["a_col"].value_counts() → all the values encountered in the column, with their counts
  • dt.corr() → standard correlation coefficients, for spotting potential linear correlations
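A toy run of the calls above (the data here is a hypothetical example):

import pandas as pd

dt = pd.DataFrame({"a_col": ["x", "y", "x"], "num": [1.0, 2.0, 3.0]})
dt.describe()               # per-column statistics for the numerical columns
dt.info()                   # dtypes, non-null counts, memory usage
dt["a_col"].value_counts()  # x: 2, y: 1
dt.corr(numeric_only=True)  # correlations between numerical columns (pandas >= 1.5)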

from pandas.plotting import scatter_matrix

# plots every numerical attribute against every other one (histograms on the diagonal)
scatter_matrix(dt, figsize=(12, 8))

Apply a function to a dataframe: dt.apply(fn) applies fn column by column (or row by row with axis=1); dt.where(cond, other, inplace=True) keeps the values where cond holds and replaces the rest.
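A minimal sketch of both (dt is hypothetical toy data):

import pandas as pd

dt = pd.DataFrame({"a": [1, -2, 3], "b": [-4, 5, -6]})
squares = dt.apply(lambda col: col ** 2)  # applies the function column by column
dt.where(dt >= 0, 0, inplace=True)        # keeps values >= 0, replaces the rest with 0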

Use the viridis color palette: color-blind-friendly and prints better on greyscale!

https://cran.r-project.org/web/packages/viridis/vignettes/intro-to-viridis.html

SkLearn - fill missing values in a dataset:

from sklearn.impute import SimpleImputer  # was sklearn.preprocessing.Imputer before sklearn 0.22
imputer = SimpleImputer(strategy="median")
imputer.fit(dt)  # dt must have numerical values only
X = imputer.transform(dt)  # replaces missing values with the learned medians

Strategies:

  • Get rid of the corresponding rows (e.g., the districts in the housing dataset)
  • Get rid of the whole attribute
  • Set the values to some value (zero, mean, median, etc.)
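In pandas the three strategies map roughly to the following (a sketch; "a_col" is a hypothetical column of dt):

dt.dropna(subset=["a_col"])    # 1: drop the rows with missing values
dt.drop(columns=["a_col"])     # 2: drop the whole attribute
median = dt["a_col"].median()
dt["a_col"].fillna(median)     # 3: fill with some value (here the median)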

Pandas/SkLearn: Convert a string column/category to nums

dt_encoded, dt_categories = dt["a_col"].factorize()  # factorize() works on a Series, not the whole frame

# OR use One-Hot Encoding - each string in the string category becomes a
# separate attribute
sklearn.preprocessing.OneHotEncoder

Get Numpy dense array from Scipy sparse matrix: sparse_mat.toarray()
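A minimal end-to-end sketch of the encoding steps above (the data is hypothetical):

import pandas as pd
from sklearn.preprocessing import OneHotEncoder

dt = pd.DataFrame({"a_col": ["red", "green", "red", "blue"]})
encoder = OneHotEncoder()
sparse_mat = encoder.fit_transform(dt[["a_col"]])  # SciPy sparse matrix
dense = sparse_mat.toarray()                       # NumPy dense array
encoder.categories_                                # the learned category values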

Feature Scaling

Machine Learning algorithms don’t perform well when the input numerical attributes have very different scales.

  • min-max scaling / normalization
  • standardization
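A minimal sketch of both scalers, assuming dt holds numerical columns only:

from sklearn.preprocessing import MinMaxScaler, StandardScaler

dt_minmax = MinMaxScaler().fit_transform(dt)  # min-max scaling: rescales each attribute to [0, 1]
dt_std = StandardScaler().fit_transform(dt)   # standardization: zero mean, unit variance; less outlier-sensitive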

Definitions

Attribute: A data type (e.g., Mileage)
Feature: Attribute + its value

Deep Neural Network

LTU (Linear Threshold Unit): a neuron that computes a weighted sum of its inputs, z = w1x1 + w2x2 + … + wnxn (w^Tx), then passes the result through a step function, e.g., the Heaviside step function

Perceptron -> a single layer of LTUs, where each neuron is connected to all inputs. The neurons are also fed an extra bias feature x0 = 1 (bias neuron)

Passthrough Input Layer: Inputs are represented by neurons that just propagate the input to the output

Activation function (activation_fn): The function that evaluates the neuron's inputs and decides on the triggering of the neuron

ReLU or Rectifier or Ramp -> max(0, z)

Hint: The derivative of ReLU is the Heaviside step function, and the derivative of the SmoothReLU (softplus) is the logistic function
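A tiny NumPy sketch of these definitions (the function names and values are mine):

import numpy as np

def heaviside(z):
    return (z >= 0).astype(float)  # step function: 0 below the threshold, 1 above

def relu(z):
    return np.maximum(0, z)        # max(0, z)

x = np.array([1.0, 2.0, 3.0])      # inputs (a bias feature x0 = 1 could be prepended)
w = np.array([0.5, -1.0, 0.8])     # weights
z = w @ x                          # z = w^T x, the LTU's weighted sum
out = heaviside(z)                 # the LTU's step-function output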

Deep Learning Theorems

FAQ

  • How do I tune the hyperparameters of my model?
    • Grid search with cross-validation to find the right hyperparameters (see the sketch below)
    • Randomised search
    • Use Oscar - http://oscar.calldesk.ai/
      • It helps to have an idea of what values are reasonable for each hyperparameter!
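A hedged grid-search sketch (the model, the grid values, and X_train/y_train are assumptions, not fixed recommendations):

from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV  # RandomizedSearchCV lives in the same module

param_grid = {"n_estimators": [10, 30, 100], "max_features": [2, 4, 6, 8]}  # hypothetical grid
grid_search = GridSearchCV(RandomForestRegressor(), param_grid, cv=5,
                           scoring="neg_mean_squared_error")
grid_search.fit(X_train, y_train)  # X_train / y_train assumed to exist
grid_search.best_params_           # the winning hyperparameter combination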