Machine Learning

Basic Principles

Hans Georg Schaathun

NTNU, Noregs Teknisk-Naturvitskaplege Universitet

10 March 2023

Machine Learning (Regression)

Dataset
Collection of pairs of observed variables $(\vec{x}_i,\vec{y}_i)$, for $i=1,2,\ldots,n$.
Independent variables

Observation $\vec{x}_i$ of a stochastic variable $\vec{X}$

  • Usually $\vec{X}$ is a vector or tensor.
Dependent variables

Observation $\vec{y}_i$ of a stochastic variable $\vec{Y}$

  • assumed to depend on $\vec{X}$
  • $\exists f$ such that $\vec{Y}=f(\vec{X})+\vec{\varepsilon}$
  • $\vec{\varepsilon}$ is a small stochastic error.
Goal
Find a computable function that approximates $f$.

In Practice

  • Choose family of parameterised functions $F$
    • For a given set of coefficients $\vec{c}$, we have a unique $f_{\vec{c}}\in F$
  • E.g. linear regression
    • Family of linear functions $f_{(\vec{a},b)}(\vec{x})=\vec{a}\cdot\vec{x}+b$
  • For a given dataset, each data point has an error $\vec{\epsilon}_i = \vec{y}_i-f_{\vec{c}}(\vec{x}_i)$
  • The error has a cost $g(\vec{\epsilon}_i)$, and we want to minimise the total cost $$\min_{\vec{c}}\sum_i g(\vec{y}_i-f_{\vec{c}}(\vec{x}_i))$$
  • E.g. least squares $\min_{\vec{c}}\sum_i ||\vec{y}_i-f_{\vec{c}}(\vec{x}_i)||^2$
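The least-squares case for the linear family can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the function name `fit_least_squares`, the synthetic data, and the chosen coefficients are all assumptions made for the example, and a scalar output $y_i$ is assumed for simplicity.

```python
import numpy as np

def fit_least_squares(X, y):
    """Return (a, b) minimising sum_i (y_i - (a . x_i + b))**2."""
    # Append a column of ones so the offset b is fitted together with a.
    A = np.hstack([X, np.ones((X.shape[0], 1))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef[:-1], coef[-1]

# Hypothetical synthetic dataset: y = 2*x1 - 3*x2 + 1 plus a small stochastic error.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = X @ np.array([2.0, -3.0]) + 1.0 + 0.01 * rng.normal(size=200)

a, b = fit_least_squares(X, y)

# Per-point errors epsilon_i and the total squared-error cost at the optimum.
residuals = y - (X @ a + b)
cost = np.sum(residuals ** 2)
print("a =", a, "b =", b, "cost =", cost)
```

Because the noise term is small, the fitted `a` and `b` should land close to the coefficients used to generate the data.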

$$\min_{\vec{c}}\sum_i g(\vec{y}_i-f_{\vec{c}}(\vec{x}_i))$$

  • Many choices for
    • Function families $\{f_{\vec{c}}\}$
    • Cost functions
    • Optimisation algorithms
  • Continuous and discrete variables
  • The coefficient vector $\vec{c}$ is a (trained) model
  • Two stages
    1. Training, to find $\vec{c}$
    2. Prediction, using $f_{\vec{c}}$ to predict $\vec{Y}$ for a given $\vec{X}$
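The two stages can be sketched as two separate functions. The example below assumes the linear family with a scalar output and uses plain gradient descent on the mean squared error as the optimisation algorithm; this is only one of the many possible choices, and the names `train` and `predict`, the learning rate, and the step count are illustrative assumptions. Minimising the mean rather than the sum of squared errors gives the same minimiser.

```python
import numpy as np

def train(X, y, lr=0.1, steps=500):
    """Training stage: find c = (a, b) by gradient descent on the mean squared error."""
    n, d = X.shape
    a, b = np.zeros(d), 0.0
    for _ in range(steps):
        err = y - (X @ a + b)             # error epsilon_i for every data point
        a += lr * 2.0 / n * (X.T @ err)   # step against the gradient w.r.t. a
        b += lr * 2.0 / n * err.sum()     # step against the gradient w.r.t. b
    return a, b

def predict(c, X):
    """Prediction stage: evaluate f_c on new inputs."""
    a, b = c
    return X @ a + b

# Hypothetical training data generated from known coefficients.
rng = np.random.default_rng(1)
X_train = rng.normal(size=(100, 3))
y_train = X_train @ np.array([1.0, -2.0, 0.5]) + 4.0 + 0.01 * rng.normal(size=100)

c = train(X_train, y_train)      # stage 1: training
X_new = rng.normal(size=(5, 3))
y_pred = predict(c, X_new)       # stage 2: prediction
```

Once `train` has returned, only the coefficients `c` are needed for prediction; the training data can be discarded.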

Classification problems

  • In regression we typically consider continuous output
  • In classification we want to find a class label for each $\vec{x}_i$
    • e.g. an image could be cat, dog, car, house
  • Common approach
    • $m$ classes labelled $j=1,2,\ldots,m$
    • $\vec{y}_i=(y_{i1},y_{i2},\ldots,y_{im})$, where $y_{ij}=1$ if $\vec{x}_i$ belongs to class $j$, and $y_{ik}=0$ for $k\neq j$
  • The outputs (prediction values) $\hat y_{ik}$ are interpreted as the fitness of $\vec{x}_i$ to class $k$
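The class encoding can be sketched as follows. The helper names `one_hot` and `predict_class` and the example prediction values are hypothetical. Picking the class with the largest prediction value is a common convention assumed here, not something the text prescribes. Note that the code uses zero-based labels $0,\ldots,m-1$, whereas the text labels the classes $1,\ldots,m$.

```python
import numpy as np

def one_hot(labels, m):
    """Encode integer labels 0..m-1 as vectors with a single 1 in the class position."""
    Y = np.zeros((len(labels), m))
    Y[np.arange(len(labels)), labels] = 1.0
    return Y

def predict_class(Y_hat):
    """Assumed decision rule: choose the class with the highest prediction value."""
    return np.argmax(Y_hat, axis=1)

# Four data points, three classes (zero-based labels).
labels = np.array([0, 2, 1, 2])
Y = one_hot(labels, m=3)

# Made-up prediction values, one row per data point.
Y_hat = np.array([
    [0.9, 0.1, 0.0],
    [0.2, 0.1, 0.7],
    [0.3, 0.6, 0.1],
    [0.1, 0.5, 0.4],
])
predicted = predict_class(Y_hat)
accuracy = np.mean(predicted == labels)
```

In this made-up example the last data point is misclassified, since its largest prediction value falls on a class other than its label.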

Machine Learning Algorithms

  • Statistical Regression
  • SVM
  • Neural Networks