Stanford Practical Machine Learning - Linear Models
This chapter mainly introduces linear models!
Linear Regression
- A simple house price prediction
- Assume 3 features: $x_1 = \#\text{beds}$, $x_2 = \#\text{baths}$, $x_3 = \#\text{living sqft}$
- The predicted price is $\hat{y} = w_1x_1 + w_2x_2 + w_3x_3 + b$
- Weights $w_1, w_2, w_3$ and bias $b$ will be learnt from training data
- In general, given data $x = [x_1, x_2, \dots, x_p]$, linear regression predicts $\hat{y} = \langle \mathbf{w}, \mathbf{x} \rangle + b$
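A tiny NumPy sketch of this prediction for the house example above; the weight and bias values are made up purely for illustration:

```python
import numpy as np

# Made-up parameters for the 3-feature house example:
# w = [w1, w2, w3] for #beds, #baths, #living sqft; b is the bias.
w = np.array([50_000.0, 30_000.0, 200.0])
b = 100_000.0

x = np.array([3.0, 2.0, 1500.0])  # 3 beds, 2 baths, 1500 living sqft

# Prediction: y_hat = <w, x> + b
y_hat = np.dot(w, x) + b
print(y_hat)  # 610000.0
```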
Objective Function
- Objective: minimize the mean square error (MSE)
- Take the difference between the actual price and the predicted price, square it, sum over all training examples, then average: $\text{MSE} = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2$
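As a quick sketch, the same computation in NumPy (the price values here are made up):

```python
import numpy as np

# MSE: average of the squared differences between actual and predicted prices.
y     = np.array([610_000.0, 420_000.0, 305_000.0])  # actual prices (made up)
y_hat = np.array([600_000.0, 450_000.0, 300_000.0])  # predicted prices (made up)

mse = np.mean((y - y_hat) ** 2)
print(mse)
```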
Using linear regression for classification problems
Here the model predicts a real number, which is not ideal for classification and hard to tune. We only need the predicted probability of the correct class to be large enough; forcing real-valued outputs to match the targets exactly makes the model "care too much about the other classes".
- Regression: continuous output over the real numbers
- Multi-class classification:
- One-hot label $y = [y_1, y_2, \dots, y_m]$, where $y_i = 1$ if $i = y$ and 0 otherwise
- $\hat{y} = \mathbf{o}$, where the i-th output $o_i$ is the confidence score for class $i$
- Learn a linear model for each class: $o_i = \langle \mathbf{x}, \mathbf{w_i} \rangle + b$
- Minimize MSE loss $\frac{1}{m}||o - y||_{2}^{2}$
- Predict label $\operatorname{argmax}_i \{o_i\}_{i=1}^{m}$
- Waste model capacity on pushing $o_i$ near 0 for off labels
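A rough NumPy sketch of this recipe (one linear model per class, MSE against the one-hot label, argmax prediction); the feature/class counts and the random data are assumptions for illustration only:

```python
import numpy as np

np.random.seed(0)
p, m = 4, 3                        # assumed: 4 input features, 3 classes

W = 0.01 * np.random.randn(p, m)   # one weight vector per class (columns of W)
b = np.zeros(m)                    # one bias per class

x = np.random.randn(p)             # a single example (random, for illustration)
y = np.array([0.0, 1.0, 0.0])      # one-hot label: true class is 1

o = x @ W + b                      # confidence scores o_i = <x, w_i> + b_i
loss = np.mean((o - y) ** 2)       # MSE loss (1/m) * ||o - y||_2^2
pred = np.argmax(o)                # predicted label: argmax_i o_i

print(loss, pred)
```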
Softmax Regression
This works better: the other classes are still allowed some predicted score, as long as they don't add up to too much; it is enough for the correct class to get a sufficiently large probability!
One-hot label $y = [y_1, y_2, \dots, y_m]$, where $y_i = 1$ if $i = y$ and 0 otherwise
$\hat{y} = \mathrm{softmax}(\mathbf{o})$, where $\hat{y}_i = \frac{\exp(o_i)}{\sum_{k=1}^{m}\exp(o_k)}$
- Turns confidence scores into probabilities (non-negative, sum to 1)
- Ideally we want $\hat{y} = \text{one-hot}(\operatorname{argmax}_i o_i)$; softmax is a continuous approximation of that
- Still a linear model: the decision is made on a linear transformation of the input, since $\operatorname{argmax}_i \hat{y}_i = \operatorname{argmax}_i o_i$
Cross-entropy loss between two distributions $\hat{y}$ and $y$: $H(y, \hat{y}) = -\sum_{i} y_i \log(\hat{y}_i) = -\log \hat{y}_y$
- When the label class is $i$, the loss assigns little penalty to the other scores $o_j$ as long as $o_j \ll o_i$
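A minimal NumPy sketch of softmax followed by the cross-entropy loss above; the confidence scores and the label are made up:

```python
import numpy as np

def softmax(o):
    # Subtract the max for numerical stability; the result is non-negative and sums to 1.
    e = np.exp(o - np.max(o))
    return e / e.sum()

o = np.array([2.0, 0.5, -1.0])  # confidence scores for 3 classes (made up)
y = 0                            # true class index

y_hat = softmax(o)
loss = -np.log(y_hat[y])         # cross-entropy: H(y, y_hat) = -log y_hat_y
print(y_hat, loss)
```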
Exercise: think about how to handle examples with multiple labels.
Mini-batch Stochastic gradient descent (SGD)
- Train by mini-batch SGD (there are various other ways to train as well)
- Model parameters $\mathbf{w}_t$, batch size $b$, learning rate $\eta_t$ at time $t$
- Randomly initialize the model parameters $\mathbf{w}_1$
- Repeat for $t = 1, 2, \dots$ until convergence
- Randomly sample a mini-batch $I_t \subset \{1, \dots, n\}$ with $|I_t| = b$
- Update the model parameters $\mathbf{w}_{t+1} = \mathbf{w}_t - \eta_t \nabla_{\mathbf{w}_t} \ell(\mathbf{X}_{I_t}, \mathbf{y}_{I_t}, \mathbf{w}_t)$
- Pros: solve all objectives in this course except for trees
- Cons: sensitive to hyper-parameters batch size and learning rate
Code
- Train a linear regression model with mini-batch SGD
- Hyperparameters
- batch_size
- learning_rate
- num_epochs
- Code fragment
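A minimal sketch of such a training loop in NumPy, using synthetic data; the concrete hyperparameter values and the data-generating model below are assumptions for illustration:

```python
import numpy as np

# Hyperparameters (values are made up for this sketch)
batch_size, learning_rate, num_epochs = 32, 0.1, 20

# Synthetic data: n examples, p features, generated from a known linear model
np.random.seed(0)
n, p = 1000, 3
X = np.random.randn(n, p)
true_w, true_b = np.array([2.0, -3.4, 1.7]), 4.2
y = X @ true_w + true_b + 0.01 * np.random.randn(n)

# Initialize model parameters (zeros here; random init also works)
w = np.zeros(p)
b = 0.0

for epoch in range(num_epochs):
    idx = np.random.permutation(n)
    for start in range(0, n, batch_size):
        batch = idx[start:start + batch_size]
        Xb, yb = X[batch], y[batch]
        # Gradient of the mini-batch MSE loss w.r.t. w and b
        err = Xb @ w + b - yb
        grad_w = 2 * Xb.T @ err / len(batch)
        grad_b = 2 * err.mean()
        # SGD update
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    mse = np.mean((X @ w + b - y) ** 2)
    print(f"epoch {epoch + 1}, MSE {mse:.6f}")

print(w, b)  # should be close to true_w and true_b
```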
Summary
- Linear methods linearly combine inputs to obtain predictions
- Linear regression uses MSE as the loss function
- Softmax regression is used for multiclass classification
- Turn predictions into probabilities and use cross-entropy as loss
- Cross-entropy measures the loss between two probability distributions
- Mini-batch SGD can learn both models (and later neural networks as well)