Machine Learning

Basic Principles

Hans Georg Schaathun

NTNU, Noregs Teknisk-Naturvitskaplege Universitet

10 March 2023

Dataset

Collection of pairs of observed variables $(\vec{x}_i,\vec{y}_i)$, for $i=1,2,\ldots,n$.

Independent variables

Observation $\vec{x}_i$ of a stochastic variable $\vec{X}$

Dependent variables

Observation $\vec{y}_i$ of a stochastic variable $\vec{Y}$

Goal

Find a computable function $f$ such that $\vec{y}_i\approx f(\vec{x}_i)$.

Choose a family $F$ of parameterised functions
- For a given set of coefficients $\vec{c}$, we have a unique $f_{\vec{c}}\in F$
E.g. linear regression
- Family of linear functions $f_{(\vec{a},b)}(\vec{x})=\vec{a}\cdot\vec{x}+b$
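The minimal Python sketch below shows how each choice of coefficients picks out one member of the family. It assumes a scalar output, so $\vec{a}\cdot\vec{x}+b$ is a number, and it represents vectors as plain Python lists. The name `linear_family` is hypothetical and does not come from any library.

```python
from typing import Callable, Sequence

Vector = Sequence[float]


def linear_family(a: Vector, b: float) -> Callable[[Vector], float]:
    """Return f_(a,b)(x) = a . x + b, the member of the linear family
    selected by the coefficients c = (a, b). Scalar output assumed."""
    def f(x: Vector) -> float:
        if len(x) != len(a):
            raise ValueError("x and a must have the same dimension")
        # Dot product a . x, then add the intercept b.
        return sum(ai * xi for ai, xi in zip(a, x)) + b
    return f


# Each choice of coefficients gives a different function in the family.
f = linear_family(a=[2.0, -1.0], b=0.5)
print(f([1.0, 3.0]))  # 2*1 - 1*3 + 0.5 = -0.5
```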
For a given dataset, each data point has an error $\epsilon_i = \vec{y}_i-f_{\vec{c}}(\vec{x}_i)$
Each error has a cost $g(\epsilon_i)$, and we want to minimise the total cost $$\min_{\vec{c}}\sum_i g(\vec{y}_i-f_{\vec{c}}(\vec{x}_i))$$
E.g. least squares $\min_{\vec{c}}\sum_i ||\vec{y}_i-f_{\vec{c}}(\vec{x}_i)||^2$
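The errors and the total cost can be computed directly from these definitions. The sketch assumes scalar outputs, so $\|\epsilon_i\|^2=\epsilon_i^2$. The toy dataset and the candidate model $f(x)=2x+1$ are made up for illustration, and the helper names are hypothetical.

```python
from typing import Callable, Sequence

Vector = Sequence[float]
Model = Callable[[Vector], float]


def residuals(f: Model, xs: Sequence[Vector], ys: Sequence[float]) -> list[float]:
    """epsilon_i = y_i - f(x_i) for every data point."""
    return [y - f(x) for x, y in zip(xs, ys)]


def total_cost(f: Model, xs: Sequence[Vector], ys: Sequence[float],
               g: Callable[[float], float]) -> float:
    """sum_i g(epsilon_i), the quantity minimised over the coefficients."""
    return sum(g(e) for e in residuals(f, xs, ys))


def squared_error(e: float) -> float:
    """Least-squares cost g(e) = e^2 (scalar output, so ||e||^2 = e^2)."""
    return e * e


def f(x: Vector) -> float:
    """One fixed candidate model, f(x) = 2x + 1 (hypothetical)."""
    return 2.0 * x[0] + 1.0


# Toy dataset, invented for the example.
xs = [[0.0], [1.0], [2.0]]
ys = [1.0, 3.0, 4.0]
print(residuals(f, xs, ys))                  # [0.0, 0.0, -1.0]
print(total_cost(f, xs, ys, squared_error))  # 1.0
```

Training then amounts to searching over the coefficients for the model with the smallest `total_cost`.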


Many choices for
- Function families $\{f_{\vec{c}}\}$
- Cost functions
- Optimisation algorithms
Variables may be continuous or discrete
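The following example shows why the choice of cost function matters. It uses a deliberately simple, hypothetical family of constant functions $f_c(\vec{x})=c$, together with a crude optimisation algorithm: grid search over candidate values of $c$. With the squared cost $g(\epsilon)=\epsilon^2$, the minimiser is the mean of the $y_i$. With the absolute cost $g(\epsilon)=|\epsilon|$, it is the median. A single outlier therefore pulls the two fits apart. The data are invented for the example.

```python
from typing import Callable


def cost(c: float, ys: list[float], g: Callable[[float], float]) -> float:
    """Total cost sum_i g(y_i - c) for the constant model f_c(x) = c."""
    return sum(g(y - c) for y in ys)


ys = [1.0, 2.0, 3.0, 4.0, 15.0]            # toy data with one outlier
candidates = [i / 10 for i in range(201)]  # grid c = 0.0, 0.1, ..., 20.0

# Same family, same optimiser, two different cost functions.
best_sq = min(candidates, key=lambda c: cost(c, ys, lambda e: e * e))
best_abs = min(candidates, key=lambda c: cost(c, ys, abs))
print(best_sq, best_abs)  # 5.0 3.0 (the mean and the median of ys)
```

A real application would use a better optimiser than a grid. The grid is used here only because it keeps the comparison between the two costs transparent.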
The coefficient vector $\vec{c}$ is a (trained) model
Two stages
1. Training, to find $\vec{c}$
2. Prediction, using $f_{\vec{c}}$ to predict $\vec{Y}$ for a given $\vec{X}$
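The two stages can be sketched for linear regression with a single scalar input. In this case the least-squares coefficients have a well-known closed form: $a=\sum_i(x_i-\bar x)(y_i-\bar y)/\sum_i(x_i-\bar x)^2$ and $b=\bar y-a\bar x$. This is only one possible training method, chosen here because it needs no iterative optimiser. The function names `train` and `predict` are hypothetical, and the data form a made-up, noise-free line $y=2x+1$.

```python
def train(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Stage 1: find c = (a, b) minimising sum_i (y_i - (a*x_i + b))^2."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two data points")
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are equal; slope is undefined")
    a = sxy / sxx
    b = y_mean - a * x_mean
    return a, b


def predict(c: tuple[float, float], x: float) -> float:
    """Stage 2: evaluate the trained model f_c at a new input x."""
    a, b = c
    return a * x + b


c = train([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
print(c)                 # (2.0, 1.0)
print(predict(c, 10.0))  # 21.0
```

Once training has run, the data are no longer needed for prediction. The pair `c` alone is the trained model.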

In regression, we typically consider continuous output
In classification, we want to find a class label for each $\vec{x}_i$
- e.g. an image could be cat, dog, car, house
Common approach
- $m$ classes labelled $j=1,2,\ldots,m$
- $\vec{y}_i=(y_{i1},y_{i2},\ldots,y_{im})$ where, if $\vec{x}_i$ belongs to class $j$, $y_{ij}=1$ and $y_{ik}=0$ for $k\neq j$
The outputs (predicted values) $\hat y_{ij}$ are interpreted as the fitness of $\vec{x}_i$ to class $j$
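The following minimal sketch shows the one-hot encoding and one way to turn the outputs $\hat y_{ij}$ into a class label. It assumes the decision rule is to pick the class with the highest fitness (argmax). This rule is common in practice but is not stated above. The class list reuses the example labels, the helper names are hypothetical, and class indices are 1-based to match $j=1,2,\ldots,m$.

```python
CLASSES = ["cat", "dog", "car", "house"]  # m = 4 class labels, j = 1..m


def one_hot(j: int, m: int) -> list[int]:
    """Target vector y_i for an example in class j (1-indexed as in the text)."""
    if not 1 <= j <= m:
        raise ValueError("class index out of range")
    return [1 if k == j else 0 for k in range(1, m + 1)]


def predicted_class(y_hat: list[float]) -> int:
    """Assumed rule: pick the class j with the highest fitness y_hat_ij (1-indexed)."""
    return max(range(len(y_hat)), key=lambda k: y_hat[k]) + 1


print(one_hot(2, len(CLASSES)))  # [0, 1, 0, 0]
j = predicted_class([0.1, 0.7, 0.15, 0.05])
print(j, CLASSES[j - 1])         # 2 dog
```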