Commit f85117f4 authored by Solange Emmenegger
Prepare for FS24

parent c4216495
%% Cell type:markdown id: tags:
# Linear Regression and Regularization
%% Cell type:code id: tags:
``` python
import pandas as pd
from sklearn.model_selection import train_test_split
import numpy as np
from matplotlib import pyplot as plt
from sklearn import preprocessing
from sklearn.metrics import r2_score
from sklearn.metrics import accuracy_score
from sklearn.metrics import f1_score
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.preprocessing import MinMaxScaler
from sklearn.preprocessing import PolynomialFeatures
from sklearn.preprocessing import RobustScaler
import sklearn
%matplotlib inline
import ipywidgets as widgets
from tqdm.notebook import tqdm
import warnings
# silence future deprecation warnings
warnings.filterwarnings('ignore')
```
%% Cell type:markdown id: tags:
## Prepare the data
%% Cell type:markdown id: tags:
Although linear regression is a linear machine learning method, you can model nonlinear dependencies by transforming some of the independent variables with a nonlinear function. Doing so can improve the fit of your model. Let us demonstrate this on a house price dataset from [Kaggle](https://www.kaggle.com/harlfoxem/housesalesprediction). Note that this dataset is not identical to the one you used in the linear regression exercise, since that dataset is too small and would lead to unreliable evaluation results.
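%% Cell type:markdown id: tags:
As a small synthetic sketch of this idea (the data here is made up for illustration and has nothing to do with the house prices yet): a model that is linear in its parameters can still capture a quadratic relationship if we feed it $x^2$ as an extra column.
%% Cell type:code id: tags:
``` python
from sklearn.linear_model import LinearRegression

# y depends quadratically on x, but the model stays linear in its parameters
# because x**2 is supplied as an additional feature column.
rng = np.random.default_rng(0)
x = rng.uniform(0, 3, size=200)
y_toy = 1.0 + 2.0 * x ** 2 + rng.normal(scale=0.3, size=200)
X_lin = x.reshape(-1, 1)                  # feature: x only
X_quad = np.column_stack([x, x ** 2])     # features: x and x^2
print("R2 with x only   :", LinearRegression().fit(X_lin, y_toy).score(X_lin, y_toy))
print("R2 with x and x^2:", LinearRegression().fit(X_quad, y_toy).score(X_quad, y_toy))
```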
%% Cell type:code id: tags:
``` python
df_house = pd.read_csv("kc_house_data.csv")
df_house.head()
```
%% Cell type:markdown id: tags:
We would like to have a simple linear regression problem with only one independent variable. Thus, we only keep *price* and *sqft_living*.
%% Cell type:code id: tags:
``` python
df_house = df_house[["price","sqft_living"]]
df_house.head()
```
%% Cell type:markdown id: tags:
### Split the data
%% Cell type:markdown id: tags:
We split the data into a training and test set
%% Cell type:code id: tags:
``` python
train_house, test_house = train_test_split(df_house, test_size=0.5, random_state=42)
```
%% Cell type:markdown id: tags:
### Normalize the data
Let us normalize the data by using *min-max normalization*
%% Cell type:code id: tags:
``` python
scaler = MinMaxScaler()
train_house = pd.DataFrame(scaler.fit_transform(train_house), columns=train_house.columns, index=train_house.index)
test_house = pd.DataFrame(scaler.transform(test_house), columns=test_house.columns, index=test_house.index)
train_house.head()
```
%% Cell type:code id: tags:
``` python
X_train_house = train_house[["sqft_living"]]
y_train_house = train_house[["price"]]
X_test_house = test_house[["sqft_living"]]
y_test_house = test_house[["price"]]
```
%% Cell type:markdown id: tags:
## Bias term
To account for the bias term, we add a column containing only ones.
%% Cell type:code id: tags:
``` python
X_train_house["bias"] = 1
X_test_house["bias"] = 1
# Force order
X_train_house = X_train_house[["bias", "sqft_living"]]
X_test_house = X_test_house[["bias", "sqft_living"]]
X_train_house.head()
```
%% Cell type:markdown id: tags:
## Fit a linear regression model
Define a linear regression function to estimate the parameters $\theta$ based on the normal equation:
$\Theta:=(X^{\top}X)^{-1}(X^{\top}y)$
%% Cell type:code id: tags:
``` python
def fit(X, y):
# START YOUR CODE
# END YOUR CODE
return thetas
```
%% Cell type:markdown id: tags:
*Click on the dots to display the solution*
%% Cell type:code id: tags:
``` python
def fit(X, y):
thetas = np.linalg.inv(X.T.dot(X)).dot(X.T).dot(y)
return thetas
```
%% Cell type:markdown id: tags:
Run the following code to check your implementation:
%% Cell type:code id: tags:
``` python
thetas = fit(X_train_house, y_train_house)
expected_thetas = np.array([[7.39560812e-05], [4.94185750e-01]])
np.testing.assert_array_almost_equal(thetas, expected_thetas, decimal=4)
```
%% Cell type:markdown id: tags:
## Predict prices
Using $X$ and the estimated $\theta$, predict the house prices on the training data
%% Cell type:code id: tags:
``` python
def predict(X, thetas):
# START YOUR CODE
# END YOUR CODE
return y_pred
```
%% Cell type:markdown id: tags:
*Click on the dots to display the solution*
%% Cell type:code id: tags:
``` python
def predict(X, thetas):
y_pred = np.dot(X, thetas)
return y_pred
```
%% Cell type:code id: tags:
``` python
y_pred_house = predict(X_train_house, thetas)
y_pred_house
```
%% Cell type:markdown id: tags:
## Visualize predictions
Let us plot house prices and predicted house prices
%% Cell type:code id: tags:
``` python
def plot_regression_line(X, thetas, ax=None):
if ax is None:
fig, ax = plt.subplots()
deg = len(thetas)-1
poly = PolynomialFeatures(deg)
xs = np.arange(X.min(), X.max()+0.1, 0.01).reshape(-1,1)
x = poly.fit_transform(xs)
y_pred = np.dot(x, thetas)
ax.plot(xs, y_pred, color="r")
```
%% Cell type:code id: tags:
``` python
fig, ax = plt.subplots()
ax.plot(X_train_house["sqft_living"].values, y_train_house.values, "bo", markersize=1)
plot_regression_line(X_train_house["sqft_living"].values, thetas, ax)
```
%% Cell type:markdown id: tags:
## Calculate model performance
Now let's check how good our model performs by calculating the $R^2$ score on the test set.
%% Cell type:code id: tags:
``` python
# r2 = ...
```
%% Cell type:markdown id: tags:
*Click on the dots to display the solution*
%% Cell type:code id: tags:
``` python
y_pred_test_house = predict(X_test_house, thetas)
r2_house = r2_score(y_test_house, y_pred_test_house)
print("R2: ", r2_house)
```
%% Cell type:markdown id: tags:
## Adding polynomial features
%% Cell type:markdown id: tags:
We aim to improve the fit by adding $x^2$ as an additional independent variable.
%% Cell type:code id: tags:
``` python
X_train_deg2 = X_train_house.copy()
X_train_deg2["sqft_living^2"] = X_train_deg2["sqft_living"] * X_train_deg2["sqft_living"]
X_test_deg2 = X_test_house.copy()
X_test_deg2["sqft_living^2"] = X_test_deg2["sqft_living"] * X_test_deg2["sqft_living"]
X_test_deg2.head()
```
%% Cell type:markdown id: tags:
### Fit the model with the additional features
%% Cell type:code id: tags:
``` python
thetas_deg2 = fit(X_train_deg2, y_train_house)
```
%% Cell type:markdown id: tags:
### Calculate the performance
%% Cell type:code id: tags:
``` python
# r2 =
```
%% Cell type:markdown id: tags:
*Click on the dots to display the solution*
%% Cell type:code id: tags:
``` python
y_pred_test_deg2 = predict(X_test_deg2, thetas_deg2)
r2_deg2 = r2_score(y_test_house, y_pred_test_deg2)
print("R2: ", r2_deg2)
```
%% Cell type:markdown id: tags:
As we can see, by adding $x^2$ as an additional independent variable we could slightly improve our performance.
%% Cell type:markdown id: tags:
Let's see whether we can further improve the performance by adding more polynomial features. To generate the polynomial features we will use the Scikit-Learn class [PolynomialFeatures](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.PolynomialFeatures.html).
%% Cell type:code id: tags:
``` python
@widgets.interact(poly_deg =(1,18,1))
def f(poly_deg=1):
poly = PolynomialFeatures(poly_deg)
X_train_deg = poly.fit_transform(X_train_house["sqft_living"].values.reshape(-1,1))
X_test_deg = poly.transform(X_test_house["sqft_living"].values.reshape(-1,1))
thetas_deg = fit(X_train_deg, y_train_house)
y_pred_test = predict(X_test_deg, thetas_deg)
y_pred_train = predict(X_train_deg, thetas_deg)
r2_test = r2_score(y_test_house, y_pred_test)
r2_train = r2_score(y_train_house, y_pred_train)
print("R2 Train {0:.5f}".format(r2_train))
print("R2 Test {0:.5f}".format(r2_test))
fig, (ax0, ax1) = plt.subplots(ncols=2, figsize=(20,10))
ax0.set_title("Training data - polynomial degree {}".format(poly_deg))
ax0.plot(X_train_house["sqft_living"], y_train_house["price"], "bo", markersize=1)
plot_regression_line(X_train_deg, thetas_deg, ax0)
ax1.set_title("Test data - polynomial degree {}".format(poly_deg))
ax1.plot(X_test_house["sqft_living"], y_test_house["price"], "bo", markersize=1)
plot_regression_line(X_test_deg, thetas_deg, ax1)
```
%% Cell type:markdown id: tags:
What do you observe when you increase the polynomial degree?
%% Cell type:markdown id: tags:
> Answer the question on ILIAS
%% Cell type:markdown id: tags:
## Regularization
%% Cell type:markdown id: tags:
The effect of overfitting can be reduced by regularization. Implement the regularized version of linear regression:
$\Theta:=\left(X^{\top}X+\lambda \begin{bmatrix}
0 & 0 & \ldots & 0 \\
0 & 1 & & \\
\vdots & & \ddots & \\
0 & & & 1
\end{bmatrix}\right)^{-1}(X^{\top}y)$
%% Cell type:code id: tags:
``` python
def fit_reg(X, y, lam):
# START YOUR CODE
# END YOUR CODE
return thetas
```
%% Cell type:markdown id: tags:
*Click on the dots to display the solution*
%% Cell type:code id: tags:
``` python
def fit_reg(X, y, lam):
Xt = np.transpose(X)
XtX = np.dot(Xt,X)
I = np.identity(XtX.shape[0])
I[0,0] = 0
XtX = XtX + (lam * I)
XtXm1 = np.linalg.inv(XtX)
Xty = np.dot(Xt,y)
thetas = np.dot(XtXm1,Xty)
return thetas
```
%% Cell type:markdown id: tags:
You can check your implementation by executing the following cell:
%% Cell type:code id: tags:
``` python
expected_thetas = np.array([[0.00178927], [0.48482755]])
actual_thetas = fit_reg(X_train_house, y_train_house, lam=2)
np.testing.assert_array_almost_equal(expected_thetas, actual_thetas)
```
%% Cell type:markdown id: tags:
We plot the graph using the regularized parameter vectors. As you can see, the effect of overfitting is strongly reduced.
%% Cell type:code id: tags:
``` python
@widgets.interact(poly_deg = (0,12,1), lam=(0,100,1))
def f(poly_deg=1, lam=4):
poly = PolynomialFeatures(poly_deg)
X_train_deg = poly.fit_transform(X_train_house["sqft_living"].values.reshape(-1,1))
X_test_deg = poly.transform(X_test_house["sqft_living"].values.reshape(-1,1))
thetas_deg = fit_reg(X_train_deg, y_train_house, lam=lam)
y_pred_test = predict(X_test_deg, thetas_deg)
y_pred_train = predict(X_train_deg, thetas_deg)
r2_test = r2_score(y_test_house, y_pred_test)
r2_train = r2_score(y_train_house, y_pred_train)
print("R2 Train", r2_train)
print("R2 Test", r2_test)
fig, (ax0, ax1) = plt.subplots(ncols=2, figsize=(20,10))
ax0.set_title("Training data - polynomial degree {}".format(poly_deg))
ax0.plot(X_train_house["sqft_living"], y_train_house["price"], "bo", markersize=1)
plot_regression_line(X_train_deg, thetas_deg, ax0)
ax0.set_title("Test data - polynomial degree {}".format(poly_deg))
ax1.plot(X_test_house["sqft_living"], y_test_house["price"], "bo", markersize=1)
plot_regression_line(X_test_deg, thetas_deg, ax1)
```
%% Cell type:markdown id: tags:
Find the best configuration of **polynomial degree** and $\lambda$
%% Cell type:markdown id: tags:
<font color='red'>PLEASE REPLACE TEXT WITH YOUR CONFIGURATION</font>
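%% Cell type:markdown id: tags:
If you prefer to search programmatically rather than with the slider, the cell below is a minimal sketch of a manual grid search over the two hyperparameters; the candidate ranges are illustrative assumptions, not prescribed values.
%% Cell type:code id: tags:
``` python
# Manual grid search over polynomial degree and lambda using the functions
# defined above; reports the combination with the best R2 on the test set.
best_deg, best_lam, best_r2 = None, None, -np.inf
for poly_deg in range(1, 13):
    poly = PolynomialFeatures(poly_deg)
    X_tr = poly.fit_transform(X_train_house["sqft_living"].values.reshape(-1, 1))
    X_te = poly.transform(X_test_house["sqft_living"].values.reshape(-1, 1))
    for lam in [0.001, 0.01, 0.1, 1, 10, 100]:
        thetas_grid = fit_reg(X_tr, y_train_house, lam=lam)
        r2_grid = r2_score(y_test_house, predict(X_te, thetas_grid))
        if r2_grid > best_r2:
            best_deg, best_lam, best_r2 = poly_deg, lam, r2_grid
print("Best degree: {}, best lambda: {}, test R2: {:.5f}".format(best_deg, best_lam, best_r2))
```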
%% Cell type:markdown id: tags:
## Regularization to help with numerical issues
%% Cell type:markdown id: tags:
Another benefit of regularization is that it can help in case of numerical issues. Let us consider our original dataset.
%% Cell type:code id: tags:
``` python
df_house_2 = pd.read_csv("kc_house_data.csv")
df_house_2 = df_house_2[["price","sqft_living","bedrooms"]]
df_house_2.head()
```
%% Cell type:code id: tags:
``` python
train_house_2, test_house_2 = train_test_split(df_house_2, test_size=0.5, random_state=42)
```
%% Cell type:code id: tags:
``` python
scaler = MinMaxScaler()
train_house_2 = pd.DataFrame(scaler.fit_transform(train_house_2), columns=train_house_2.columns, index=train_house_2.index)
test_house_2 = pd.DataFrame(scaler.transform(test_house_2), columns=test_house_2.columns, index=test_house_2.index)
test_house_2.head()
```
%% Cell type:markdown id: tags:
To make $X^{\top}X$ singular, we just add another independent variable (sqft_living2) to $X$
that is simply twice sqft_living.
%% Cell type:code id: tags:
``` python
train_house_2["sqft_living2"] = 2 * train_house_2["sqft_living"]
train_house_2["bias"] = 1
test_house_2["sqft_living2"]= 2 * test_house_2["sqft_living"]
test_house_2["bias"] = 1
test_house_2.head()
```
%% Cell type:code id: tags:
``` python
X_train_house_2 = train_house_2[["bias", "sqft_living", "bedrooms", "sqft_living2"]]
y_train_house_2 = train_house_2[["price"]]
X_test_house_2 = test_house_2[["bias", "sqft_living", "bedrooms", "sqft_living2"]]
y_test_house_2 = test_house_2[["price"]]
```
%% Cell type:markdown id: tags:
We see that the linear regression fails, since $X^{\top}X$ is not invertible.
%% Cell type:code id: tags:
``` python
thetas = fit(X_train_house_2, y_train_house_2)
```
%% Cell type:markdown id: tags:
There are two possibilities to tackle this issue: the first is to use the pseudoinverse instead of the inverse,
and the second is to use regularization.
> Try out both.
*Hint*: For conducting linear regression with the pseudoinverse, you have to slightly modify the `fit` function given further above.
The numpy function [np.linalg.pinv](https://numpy.org/doc/stable/reference/generated/numpy.linalg.pinv.html) comes in handy for this.
%% Cell type:code id: tags:
``` python
def fit_pseudoinverse(X,y):
# START YOUR CODE
# END YOUR CODE
return thetas
```
%% Cell type:markdown id: tags:
*Click on the dots to display the solution*
%% Cell type:code id: tags:
``` python
def fit_pseudoinverse(X, y):
thetas = np.linalg.pinv(X.T.dot(X)).dot(X.T).dot(y)
return thetas
```
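%% Cell type:markdown id: tags:
A note on this design: by the Moore-Penrose identity $X^{+} = (X^{\top}X)^{+}X^{\top}$, the same least-squares solution can also be obtained by applying the pseudoinverse to $X$ directly. A minimal sketch, assuming the same `X` and `y` as above:
%% Cell type:code id: tags:
``` python
def fit_pseudoinverse_direct(X, y):
    # Minimum-norm least-squares solution via the pseudoinverse of X itself
    thetas = np.linalg.pinv(X).dot(y)
    return thetas
```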
%% Cell type:markdown id: tags:
Run this code to check your implementation
%% Cell type:code id: tags:
``` python
thetas_pseudo_inverse = fit_pseudoinverse(X_train_house_2, y_train_house_2)
print ("thetas obtained by linear regression with pseudoinverse:\n", thetas_pseudo_inverse)
expected_thetas_pseudo_inverse = np.array([
[ 0.02902459],
[ 0.11220321],
[-0.12253607],
[ 0.22440641]])
np.testing.assert_array_almost_equal(thetas_pseudo_inverse, expected_thetas_pseudo_inverse, decimal=5)
```
%% Cell type:code id: tags:
``` python
thetas_regularization = fit_reg(X_train_house_2, y_train_house_2, lam=1)
print ("thetas obtained by linear regression with regularization:\n", thetas_regularization)
expected_thetas_regularization = np.array([
[ 0.02846346],
[ 0.11163748],
[-0.11932519],
[ 0.22327497]])
np.testing.assert_array_almost_equal(thetas_regularization, expected_thetas_regularization, decimal=5)
```
%% Cell type:markdown id: tags:
## Programming Assignment
> Solve the following programming assignment and check your solution in the Ilias Quiz **Linear Regression and Regularization - Notebook Verification**.
%% Cell type:markdown id: tags:
Previously you implemented Linear Regression from scratch. In this programming assignment you are asked to use the scikit-learn implementation of Linear Regression instead. See the [Scikit-learn Documentation](https://scikit-learn.org/stable/). Use the same data as before.
%% Cell type:code id: tags:
``` python
```
%% Cell type:markdown id: tags:
Use the following features: bedrooms, bathrooms, sqft_living, yr_built and grade
%% Cell type:code id: tags:
``` python
```
%% Cell type:markdown id: tags:
Use the same train/test split as in the previous examples
%% Cell type:code id: tags:
``` python
```
%% Cell type:markdown id: tags:
Import and train the sklearn Linear Regression
%% Cell type:code id: tags:
``` python
```
%% Cell type:code id: tags:
``` python
```
%% Cell type:markdown id: tags:
Calculate the train and test score (there is a function on the regressor for this)
%% Cell type:code id: tags:
``` python
```
%% Cell type:code id: tags:
``` python
```
%% Cell type:markdown id: tags:
Put the test score in the Ilias Quiz 04a Notebook Verification
%% Cell type:code id: tags:
``` python
```
%% Cell type:markdown id: tags:
Also answer whether this model performs better than the ones calculated previously.
%% Cell type:code id: tags:
``` python
```
%% Cell type:code id: tags:
``` python
```
%% Cell type:code id: tags:
``` python
```
%% Cell type:markdown id: tags:
**!!!!!Solution needs to be deleted and transferred before being published!!!!**
%% Cell type:code id: tags:
``` python
df_house3 = pd.read_csv("kc_house_data.csv")
```
%% Cell type:markdown id: tags:
Use the following Features: bedrooms, bathrooms, sqft_living, yr_built, grade
%% Cell type:code id: tags:
``` python
df_house3 = df_house3[["price","sqft_living", "bedrooms", "bathrooms", "yr_built", "grade"]]
```
%% Cell type:markdown id: tags:
Use the same train test split as in the previous examples.
%% Cell type:code id: tags:
``` python
train_house3, test_house3 = train_test_split(df_house3, test_size=0.5, random_state=42)
```
%% Cell type:markdown id: tags:
Train the sklearn Linear Regression
%% Cell type:code id: tags:
``` python
scaler = MinMaxScaler()
train_house3 = pd.DataFrame(scaler.fit_transform(train_house3), columns=train_house3.columns, index=train_house3.index)
test_house3 = pd.DataFrame(scaler.transform(test_house3), columns=test_house3.columns, index=test_house3.index)
```
%% Cell type:code id: tags:
``` python
X_train_house_3 = train_house3[["sqft_living", "bedrooms", "bathrooms","yr_built", "grade"]]
y_train_house_3 = train_house3[["price"]]
X_test_house_3 = test_house3[["sqft_living", "bedrooms", "bathrooms","yr_built", "grade"]]
y_test_house_3 = test_house3[["price"]]
```
%% Cell type:code id: tags:
``` python
from sklearn.linear_model import LinearRegression
```
%% Cell type:code id: tags:
``` python
reg = LinearRegression().fit(X_train_house_3, y_train_house_3)
```
%% Cell type:markdown id: tags:
Calculate the Train and Test score
%% Cell type:code id: tags:
``` python
print("Train score: ", reg.score(X_train_house_3,y_train_house_3))
```
%% Cell type:code id: tags:
``` python
print("Test score: ", reg.score(X_test_house_3,y_test_house_3))
```
%% Cell type:markdown id: tags:
**!!!! Possibly add a question about how the bias can be added automatically in the sklearn method !!!!**
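%% Cell type:markdown id: tags:
As a side note on that point: scikit-learn's `LinearRegression` estimates the intercept (bias) automatically because `fit_intercept=True` is the default, so no column of ones has to be added by hand. A minimal sketch, reusing the data from above:
%% Cell type:code id: tags:
``` python
# fit_intercept=True (the default) makes sklearn estimate the bias term itself,
# so no manual bias column (as in our from-scratch implementation) is required.
reg_with_bias = LinearRegression(fit_intercept=True).fit(X_train_house_3, y_train_house_3)
print("Intercept (bias):", reg_with_bias.intercept_)
print("Coefficients:", reg_with_bias.coef_)
```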
%% Cell type:code id: tags:
``` python
```
%% Cell type:code id: tags:
``` python
```
......
%% Cell type:markdown id: tags:
# Gradient Descent
This notebook demonstrates the gradient descent approach to determining the best-fitting parameters for linear regression.
%% Cell type:code id: tags:
``` python
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
import seaborn as sns
import sklearn
import sklearn.decomposition
import math
from sklearn import preprocessing
import matplotlib
import matplotlib.mlab as mlab
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import r2_score
from collections import defaultdict
import warnings
warnings.filterwarnings("ignore", category=FutureWarning)
from tqdm.notebook import tqdm
from ipywidgets import interact
%matplotlib inline
```
%% Cell type:markdown id: tags:
## Part 1 - Toy Example
Firstly, we demonstrate gradient descent on a simple linear regression problem with one dependent and one independent variable.
%% Cell type:code id: tags:
``` python
X = np.array([1,1,2,3,4,5,6,7,8,9,10,10])
y = np.array([1,2,3,1,4,5,6,4,7,10,15,9])
```
%% Cell type:markdown id: tags:
x and y values are plotted in a diagram.
%% Cell type:code id: tags:
``` python
plt.plot(X, y, 'bo')
plt.show()
```
%% Cell type:markdown id: tags:
We then try to fit the points by a straight line.
%% Cell type:code id: tags:
``` python
theta0 = -0.5
theta1 = 1
```
%% Cell type:code id: tags:
``` python
def predict(X, theta0, theta1):
y_pred = theta0 + theta1 * X
return y_pred
y_pred = predict(X, theta0, theta1)
```
%% Cell type:code id: tags:
``` python
def plot_regression_line(X, theta0, theta1, ax=None):
if ax is None:
fig, ax = plt.subplots()
x = np.arange(X.min()-1, X.max()+1, 1).reshape(-1,1)
y_pred = predict(x, theta0, theta1)
ax.plot(x, y_pred, color="r")
ax = sns.scatterplot(X, y)
plot_regression_line(X, theta0, theta1, ax)
plt.show()
```
%% Cell type:markdown id: tags:
This does not look so bad. Let's implement a gradient descent algorithm to do this automatically.
%% Cell type:markdown id: tags:
### Cost function
We define a cost function that computes the mean squared error between the predicted and the actual y values. To get rid of the factor 2 in the gradient
formula, we divide the sum by 2.
%% Cell type:markdown id: tags:
> Implement the MSE cost function
%% Cell type:code id: tags:
``` python
def cost(y, y_pred):
# START YOUR CODE
# END YOUR CODE
return cost
```
%% Cell type:markdown id: tags:
*Click on the dots to display the solution*
%% Cell type:code id: tags:
``` python
def cost(y, y_pred):
cost = np.sum((y_pred - y) ** 2) / (2 * len(y))
return cost
```
%% Cell type:code id: tags:
``` python
cost(y, y_pred)
```
%% Cell type:markdown id: tags:
### Calculate gradient
Next, let us determine the gradient of the cost function with respect to the parameters.
%% Cell type:markdown id: tags:
**Programming Assignment - Verification on Ilias**
%% Cell type:markdown id: tags:
> Implement the `gradient` function
%% Cell type:code id: tags:
``` python
def gradient(X, y, theta0, theta1):
# START YOUR CODE
# END YOUR CODE
return grad_theta0, grad_theta1
```
%% Cell type:markdown id: tags:
*Hint: To calculate the gradient, look carefully at the definition of the cost function of Linear Regression, and take care of the dimensions.*
%% Cell type:markdown id: tags:
**Report the value of the gradients in the Ilias Quiz 04B Notebook Verification**
%% Cell type:code id: tags:
``` python
gradient(X, y, theta0, theta1)
```
%% Cell type:markdown id: tags:
### Batch Gradient descent
%% Cell type:markdown id: tags:
> Now complete the `fit` function by iteratively updating our model parameters.
To visualize how the parameters and the cost change with each epoch, we store them in a dictionary.
%% Cell type:code id: tags:
``` python
def fit(X, y, alpha, num_epochs, display_every=10):
theta0 = 0.0
theta1 = np.random.randn()
hist = defaultdict(list)
for epoch in tqdm(range(1, num_epochs + 1)):
# START YOUR CODE
# END YOUR CODE
y_pred = predict(X, theta0, theta1)
curr_cost = cost(y, y_pred)
hist["cost"].append(curr_cost)
hist["theta0"].append(theta0)
hist["theta1"].append(theta1)
if epoch % display_every == 0:
print("Epoch {} - cost: {}".format(epoch, curr_cost))
return theta0, theta1, hist
```
%% Cell type:markdown id: tags:
*Click on the dots to display the solution*
%% Cell type:code id: tags:
``` python
def fit(X, y, alpha, num_epochs, display_every=10):
theta0 = 0.0
theta1 = np.random.randn()
hist = defaultdict(list)
for epoch in tqdm(range(1, num_epochs + 1)):
grad_theta0, grad_theta1 = gradient(X, y, theta0, theta1)
theta0 = theta0 - alpha * grad_theta0
theta1 = theta1 - alpha * grad_theta1
y_pred = predict(X, theta0, theta1)
curr_cost = cost(y, y_pred)
hist["cost"].append(curr_cost)
hist["theta0"].append(theta0)
hist["theta1"].append(theta1)
if epoch % display_every == 0:
print("Epoch {} - cost: {}".format(epoch, curr_cost))
return theta0, theta1, hist
```
%% Cell type:code id: tags:
``` python
alpha = 0.01
num_epochs = 20
theta0, theta1, hist = fit(X, y, alpha, num_epochs, display_every=2)
```
%% Cell type:markdown id: tags:
### Visualize learning
We can now visualize the learning process by plotting the validation curve. The validation curve shows how the cost decreases with an increasing number of epochs.
%% Cell type:code id: tags:
``` python
def plot_validation_curve(data, ax=None, ylim=None):
if ax is None:
fig, ax = plt.subplots()
ax.set_title("Validation Curve")
ax.set_ylabel("Cost")
if ylim is not None:
ax.set_ylim(ylim)
ax.set_xlabel("Epochs")
ax.plot(data)
plot_validation_curve(hist["cost"])
```
%% Cell type:markdown id: tags:
Using our history, we can now visualize how the parameters change with each epoch.
%% Cell type:code id: tags:
``` python
@interact(epoch=(1, len(hist["theta0"])))
def visualize_learning(epoch=1):
ax = sns.scatterplot(X, y)
plot_regression_line(X, hist["theta0"][epoch-1], hist["theta1"][epoch-1], ax)
plt.show()
```
%% Cell type:markdown id: tags:
### Contour plot
We can visualize how our model parameters $\Theta$ change after each epoch by displaying a contour plot.
%% Cell type:code id: tags:
``` python
def parallel_cost(Theta0, Theta1, X, y):
m = Theta0.shape[0]
n = Theta0.shape[1]
tot = np.zeros((m,n))
    for i in range(len(X)):
        tot += (Theta0 + Theta1 * X[i] - y[i]) ** 2
return tot/(2*len(X))
```
%% Cell type:code id: tags:
``` python
matplotlib.rcParams['xtick.direction'] = 'out'
matplotlib.rcParams['ytick.direction'] = 'out'
def contour_plot_zoomed(X, y, ax=None):
if ax is None:
fig, ax = plt.subplots(figsize=(12,8))
delta = 0.025
t0 = np.arange(-0.5, 0.5, delta)
t1 = np.arange(0.5, 1.5, delta)
T0, T1 = np.meshgrid(t0, t1)
Z = parallel_cost(T0, T1, X, y)
CS = ax.contour(T0, T1, Z, levels = [0.25,0.5,1,2,3])
ax.clabel(CS, inline=1, fontsize=10)
ax.set_title('Contour plot')
ax.set_xlabel(r'$\theta_0$')
ax.set_ylabel(r'$\theta_1$')
return ax
```
%% Cell type:code id: tags:
``` python
@interact(epoch=(1, len(hist["theta0"])))
def visualize_contour_plot(epoch=1):
ax = contour_plot_zoomed(X, y)
for i in range(epoch):
theta0 = hist["theta0"][i]
theta1 = hist["theta1"][i]
ax.plot(theta0, theta1, "ro", linewidth=9)
if i == 0:
continue
theta0_prev = hist["theta0"][i-1]
theta1_prev = hist["theta1"][i-1]
ax.annotate('', xy=[theta0, theta1], xytext=[theta0_prev, theta1_prev],
arrowprops={'arrowstyle': '->', 'color': 'r', 'lw': 1},
va='center', ha='center')
plt.show()
```
%% Cell type:markdown id: tags:
### Normalise data
Let's run the experiment above again but this time first normalise the data and see what happens.
We use the `StandardScaler` which implements z-normalisation.
%% Cell type:code id: tags:
``` python
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X.reshape(-1, 1)).reshape(-1)
X_scaled
```
%% Cell type:markdown id: tags:
#### Apply gradient descent algorithm on normalised data.
%% Cell type:code id: tags:
``` python
alpha = 0.01
num_epochs = 20
theta0, theta1, hist_scaled = fit(X_scaled, y, alpha, num_epochs, display_every=2)
plot_validation_curve(hist_scaled["cost"])
```
%% Cell type:markdown id: tags:
It seems like it did not converge yet. Let's increase the learning rate $\alpha$ and the number of epochs and run it again.
%% Cell type:code id: tags:
``` python
alpha = 0.1
num_epochs = 50
theta0, theta1, hist_scaled = fit(X_scaled, y, alpha, num_epochs, display_every=5)
plot_validation_curve(hist_scaled["cost"])
```
%% Cell type:markdown id: tags:
That looks much better now. Okay, let's plot the contours.
%% Cell type:code id: tags:
``` python
def contour_plot(X, y, ax=None):
if ax is None:
fig, ax = plt.subplots(figsize=(12,8))
delta = 0.025
t0 = np.arange(0, 9, delta)
t1 = np.arange(0, 9, delta)
T0, T1 = np.meshgrid(t0, t1)
Z = parallel_cost(T0, T1, X, y)
CS = ax.contour(T0, T1, Z, levels = [1,2,3,4,5,6])
ax.clabel(CS, inline=1, fontsize=10)
ax.set_title('Contour plot')
ax.set_xlabel(r'$\theta_0$')
ax.set_ylabel(r'$\theta_1$')
return ax
```
%% Cell type:code id: tags:
``` python
@interact(epoch=(1, len(hist_scaled["theta0"])))
def visualize_contour_plot(epoch=1):
ax = contour_plot(X_scaled, y)
for i in range(epoch):
theta0 = hist_scaled["theta0"][i]
theta1 = hist_scaled["theta1"][i]
ax.plot(theta0, theta1, "ro", linewidth=9)
if i == 0:
continue
theta0_prev = hist_scaled["theta0"][i-1]
theta1_prev = hist_scaled["theta1"][i-1]
ax.annotate('', xy=[theta0, theta1], xytext=[theta0_prev, theta1_prev],
arrowprops={'arrowstyle': '->', 'color': 'r', 'lw': 1},
va='center', ha='center')
plt.show()
```
%% Cell type:markdown id: tags:
The contours are not as narrow as before.
<span style="color:red">
Make sure that you never forget to scale your data before applying the gradient descent algorithm!</span>
%% Cell type:markdown id: tags:
## Part 2 - House prices data set
Now that we have tested our functions on our toy dataset, let's move on to the house price dataset.
%% Cell type:code id: tags:
``` python
df_house = pd.read_csv('house_prices.csv')
df_house.head()
```
%% Cell type:markdown id: tags:
We want to predict the price of a house based on its size.
%% Cell type:markdown id: tags:
Let's split the feature from the target variable.
%% Cell type:code id: tags:
``` python
X_house = df_house[["Size"]].values
y_house = df_house.Price.values
```
%% Cell type:markdown id: tags:
Next, we further split the data into a training and test set.
%% Cell type:code id: tags:
``` python
split = train_test_split(X_house, y_house, test_size=0.2, random_state=42)
(X_train_house, X_test_house, y_train_house, y_test_house) = split
X_train_house = X_train_house.reshape(-1)
X_test_house = X_test_house.reshape(-1)
```
%% Cell type:markdown id: tags:
Here we visualize our training data in a scatter plot.
%% Cell type:code id: tags:
``` python
sns.scatterplot(X_train_house.reshape(-1), y_train_house)
```
%% Cell type:markdown id: tags:
#### Apply Batch Gradient Descent
Let's use our implemented `fit` method to apply batch gradient descent to the house price dataset and see what happens.
%% Cell type:code id: tags:
``` python
alpha = 0.01
num_epochs = 300
theta0, theta1, hist_house = fit(X_train_house, y_train_house, alpha, num_epochs, display_every=20)
plot_validation_curve(hist_house["cost"])
```
%% Cell type:markdown id: tags:
It seems like our gradient descent algorithm does not converge!
> Why did that happen?
%% Cell type:markdown id: tags:
### Scaling the data
Let's try it again but this time we will scale the data accordingly.
%% Cell type:code id: tags:
``` python
X_house_scaled = df_house[["Size"]].values
y_house_scaled = df_house.Price.values
split = train_test_split(X_house_scaled, y_house_scaled, test_size=0.2, random_state=42)
(X_train_house_scaled, X_test_house_scaled, y_train_house_scaled, y_test_house_scaled) = split
```
%% Cell type:markdown id: tags:
> z-normalise the training and test data by using the [StandardScaler](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html)
%% Cell type:code id: tags:
``` python
# z-normalise the training and test data.
```
%% Cell type:markdown id: tags:
*Click on the dots to display the solution*
%% Cell type:code id: tags:
``` python
scaler = StandardScaler()
X_train_house_scaled = scaler.fit_transform(X_train_house_scaled).reshape(-1)
X_test_house_scaled = scaler.transform(X_test_house_scaled)
```
%% Cell type:markdown id: tags:
Now we apply the gradient descent algorithm again.
%% Cell type:code id: tags:
``` python
alpha = 0.01
num_epochs = 300
theta0, theta1, hist_house_scaled = fit(X_train_house_scaled, y_train_house_scaled, alpha,
num_epochs, display_every=20)
plot_validation_curve(hist_house_scaled["cost"])
```
%% Cell type:markdown id: tags:
Our validation curve looks much better now. We see that the cost converges after a few epochs.
%% Cell type:markdown id: tags:
Again we can visualize how our regression line looks after each epoch.
%% Cell type:code id: tags:
``` python
@interact(epoch=(1, len(hist_house_scaled["theta0"])))
def visualize_learning(epoch=1):
ax = sns.scatterplot(X_train_house_scaled, y_train_house_scaled)
plot_regression_line(X_train_house_scaled,
hist_house_scaled["theta0"][epoch-1],
hist_house_scaled["theta1"][epoch-1], ax)
plt.show()
```
%% Cell type:markdown id: tags:
### Calculate metrics on the test set
> Now calculate the $R^2$ score on the test set by using the previously implemented `predict` function.
%% Cell type:code id: tags:
``` python
```
%% Cell type:markdown id: tags:
*Click on the dots to display the solution*
%% Cell type:code id: tags:
``` python
y_pred_house_scaled = predict(X_test_house_scaled, theta0, theta1)
r2 = r2_score(y_test_house_scaled, y_pred_house_scaled)
print("R2:", r2)
```
%% Cell type:markdown id: tags:
## Part 3 - Autoscout data set
We extend our code for multiple linear regression. We will use the autoscout dataset from the previous exercises. First we apply the data cleaning and then z-Normalise our data.
%% Cell type:code id: tags:
``` python
df_autoscout = pd.read_csv('cars.csv')
df_autoscout.drop(['Name', 'Registration'], axis='columns', inplace=True)
df_autoscout.drop([17010, 7734, 47002, 44369, 24720, 50574, 36542, 42611,
22513, 12773, 21501, 2424, 52910, 29735, 43004, 47125], axis='rows', inplace=True)
df_autoscout.drop(df_autoscout.index[df_autoscout.EngineSize > 7500], axis='rows', inplace=True)
df_autoscout.drop_duplicates(inplace=True)
df_autoscout.head()
numerical_cols = ['Price', 'Mileage', 'Horsepower', 'EngineSize']
df_autoscout = pd.get_dummies(df_autoscout)
train_autoscout, test_autoscout = train_test_split(df_autoscout, test_size=0.4, random_state=42)
q3 = train_autoscout.loc[:, numerical_cols].describe().loc['75%']
iqr = q3 - train_autoscout.loc[:, numerical_cols].describe().loc['25%']
upper_boundary = q3 + 1.5*iqr
upper_boundary
# And here the outliers are removed
train_autoscout = train_autoscout[(train_autoscout.Price <= upper_boundary.Price) &
(train_autoscout.Mileage <= upper_boundary.Mileage) &
(train_autoscout.Horsepower <= upper_boundary.Horsepower) &
(train_autoscout.EngineSize <= upper_boundary.EngineSize)]
test_autoscout = test_autoscout[(test_autoscout.Price <= upper_boundary.Price) &
(test_autoscout.Mileage <= upper_boundary.Mileage) &
(test_autoscout.Horsepower <= upper_boundary.Horsepower) &
(test_autoscout.EngineSize <= upper_boundary.EngineSize)]
X_train_autoscout = train_autoscout.drop(columns=["Price"]).values
X_test_autoscout = test_autoscout.drop(columns=["Price"]).values
y_train_autoscout = train_autoscout.Price.values
y_test_autoscout = test_autoscout.Price.values
# z-Normalise the data
scaler = StandardScaler()
X_train_autoscout = scaler.fit_transform(X_train_autoscout)
X_test_autoscout = scaler.transform(X_test_autoscout)
```
%% Cell type:markdown id: tags:
We modify our `predict` function so that, instead of providing $\theta_0$ and $\theta_1$, we now provide the bias ($\theta_0$) and the remaining parameters $\Theta$ as an array.
> Implement the `predict` function
%% Cell type:code id: tags:
``` python
def predict(X, bias, thetas):
# START YOUR CODE
# END YOUR CODE
return y_pred
```
%% Cell type:markdown id: tags:
*Click on the dots to display the solution*
%% Cell type:code id: tags:
``` python
def predict(X, bias, thetas):
y_pred = bias + np.dot(X, thetas)
return y_pred
```
%% Cell type:markdown id: tags:
> Implement the `gradient` function
%% Cell type:code id: tags:
``` python
def gradient(X, y, bias, thetas):
# START YOUR CODE
# END YOUR CODE
return grad_bias, grad_thetas
```
%% Cell type:markdown id: tags:
*Click on the dots to display the solution*
%% Cell type:code id: tags:
``` python
def gradient(X, y, bias, thetas):
y_pred = predict(X, bias, thetas)
diff = y_pred - y
n = len(X)
grad_bias = np.sum(diff) / n
grad_thetas = np.dot(diff, X) / n
return grad_bias, grad_thetas
```
%% Cell type:markdown id: tags:
We extend our `fit` function by tracking not only the cost but also the $R^2$ score.
%% Cell type:code id: tags:
``` python
def fit(X_train, y_train, alpha, num_epochs, display_every=50):
bias = 0.0
thetas = np.random.randn(*(1, X_train.shape[1])).reshape(-1)
hist = defaultdict(list)
for epoch in tqdm(range(1, num_epochs+1)):
grad_bias, grad_thetas = gradient(X_train, y_train, bias, thetas)
bias = bias - alpha * grad_bias
thetas = thetas - alpha * grad_thetas
y_pred_train = predict(X_train, bias, thetas)
train_cost = cost(y_train, y_pred_train)
train_r2 = r2_score(y_train, y_pred_train)
hist["train_cost"].append(train_cost)
hist["train_r2"].append(train_r2)
if epoch % display_every == 0:
print("Epoch {0} - cost: {1:.2} - r2: {2:.4}"
.format(epoch, train_cost, train_r2))
return bias, thetas, hist
```
%% Cell type:code id: tags:
``` python
alpha = 0.01
num_epochs = 1000
bias, thetas, hist_autoscout = fit(X_train_autoscout, y_train_autoscout, alpha, num_epochs)
```
%% Cell type:code id: tags:
``` python
def plot_validation_curves(hist, ylim=None):
fig, ax = plt.subplots(ncols=2, figsize=(16,5))
ax[0].set_title("Train Cost")
ax[0].set_ylabel("Cost")
plot_validation_curve(hist["train_cost"], ax[0], ylim)
ax[1].set_title("Train R2")
ax[1].set_ylabel("R2")
ax[1].set_ylim(-1, 1)
plot_validation_curve(hist["train_r2"], ax[1])
plt.tight_layout()
plot_validation_curves(hist_autoscout)
```
%% Cell type:markdown id: tags:
### Calculate metrics on test set
Now we calculate the $R^2$ score on the test set.
%% Cell type:code id: tags:
``` python
y_pred_autoscout = predict(X_test_autoscout, bias, thetas)
r2 = r2_score(y_test_autoscout, y_pred_autoscout)
print("R2:", r2)
```
%% Cell type:markdown id: tags:
Compared to the previous exercise, where we calculated the estimates for $\Theta$ analytically using the normal equation, we get almost the same result with the gradient descent algorithm.
%% Cell type:markdown id: tags:
### Minibatch Gradient Descent
%% Cell type:markdown id: tags:
> Now modify our `fit` function to use minibatch gradient descent: instead of calculating the gradient on the whole dataset at each step, only use a subset of the data.
%% Cell type:code id: tags:
``` python
def fit(X_train, y_train, alpha, num_epochs, batch_size, display_every=50):
bias = 0.0
thetas = np.random.randn(*(1, X_train.shape[1])).reshape(-1)
hist = defaultdict(list)
indices_train = np.arange(len(X_train))
num_samples = len(X_train)
steps = int(num_samples/batch_size)
for epoch in tqdm(range(1, num_epochs + 1)):
# randomize inputs
np.random.shuffle(indices_train)
X_train_epoch = X_train[indices_train]
y_train_epoch = y_train[indices_train]
for step in range(steps):
start = step * batch_size
end = step * batch_size + batch_size
X_train_mini = X_train_epoch[start:end]
y_train_mini = y_train_epoch[start:end]
# START YOUR CODE
# Apply gradient descent
# END YOUR CODE
y_pred_train = predict(X_train, bias, thetas)
train_cost = cost(y_train, y_pred_train)
train_r2 = r2_score(y_train, y_pred_train)
hist["train_cost"].append(train_cost)
hist["train_r2"].append(train_r2)
if epoch % display_every == 0 or epoch == num_epochs:
print("Epoch {0} - train_cost: {1:.2} - train_r2: {2:.4}".format(epoch, train_cost, train_r2))
return bias, thetas, hist
```
%% Cell type:markdown id: tags:
*Click on the dots to display the solution*
%% Cell type:code id: tags:
``` python
def fit(X_train, y_train, alpha, num_epochs, batch_size, display_every=50):
bias = 0.0
thetas = np.random.randn(*(1, X_train.shape[1])).reshape(-1)
hist = defaultdict(list)
indices_train = np.arange(len(X_train))
num_samples = len(X_train)
steps = int(num_samples/batch_size)
for epoch in tqdm(range(1, num_epochs + 1)):
# randomize inputs
np.random.shuffle(indices_train)
X_train_epoch = X_train[indices_train]
y_train_epoch = y_train[indices_train]
for step in range(steps):
start = step * batch_size
end = step * batch_size + batch_size
X_train_mini = X_train_epoch[start:end]
y_train_mini = y_train_epoch[start:end]
grad_bias, grad_thetas = gradient(X_train_mini, y_train_mini, bias, thetas)
bias = bias - alpha * grad_bias
thetas = thetas - alpha * grad_thetas
y_pred_train = predict(X_train, bias, thetas)
train_cost = cost(y_train, y_pred_train)
train_r2 = r2_score(y_train, y_pred_train)
hist["train_cost"].append(train_cost)
hist["train_r2"].append(train_r2)
if epoch % display_every == 0 or epoch == num_epochs:
print("Epoch {0} - train_cost: {1:.2} - train_r2: {2:.4}".format(epoch, train_cost, train_r2))
return bias, thetas, hist
```
%% Cell type:markdown id: tags:
We have now introduced an additional hyperparameter `batch_size` (a short usage sketch follows after this list).
* If we set `batch_size` equal to 1, we use Stochastic Gradient Descent: we update our model parameters $\Theta$ for each training example.
* If we set `batch_size` equal to the number of training samples, we again have Batch Gradient Descent: we use all training samples to update the model parameters $\Theta$.
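%% Cell type:markdown id: tags:
The cell below sketches how the three regimes map onto calls to our `fit` function. The learning rate and the very small number of epochs are illustrative assumptions only, chosen because stochastic gradient descent is slow in pure Python.
%% Cell type:code id: tags:
``` python
# Illustration only: the regime is selected purely via the batch_size argument.
n_samples = len(X_train_autoscout)
regimes = {
    "Stochastic GD (batch_size=1)": 1,
    "Minibatch GD (batch_size=100)": 100,
    "Batch GD (batch_size=n)": n_samples,
}
for name, bs in regimes.items():
    _, _, h = fit(X_train_autoscout, y_train_autoscout, alpha=1e-2,
                  num_epochs=2, batch_size=bs, display_every=1)
    print("{} - train R2 after 2 epochs: {:.4f}".format(name, h["train_r2"][-1]))
```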
%% Cell type:markdown id: tags:
#### Batch Gradient Descent
We run batch gradient descent and see what happens
%% Cell type:code id: tags:
``` python
alpha = 1e-2
num_epochs = 50
batch_size = len(X_train_autoscout)
bias, thetas, hist_autoscout_batch = fit(X_train_autoscout, y_train_autoscout, alpha, num_epochs, batch_size)
plot_validation_curves(hist_autoscout_batch)
```
%% Cell type:markdown id: tags:
We can notice the following:
* The training did not converge after those 50 epochs. We would need more epochs.
* The training cost is strictly decreasing as we take all training samples per step
%% Cell type:markdown id: tags:
#### Minibatch Gradient Descent
Let's compare it to minibatch gradient descent with a `batch_size` of 100.
%% Cell type:code id: tags:
``` python
alpha = 1e-2
num_epochs = 50
batch_size = 100
bias, thetas, hist_autoscout_minibatch = fit(X_train_autoscout, y_train_autoscout, alpha, num_epochs, batch_size)
plot_validation_curves(hist_autoscout_minibatch)
```
%% Cell type:markdown id: tags:
* As we take only a subset of our data when applying gradient descent, the training cost is no longer strictly decreasing.
* We do not need as many epochs as before, since we perform many more parameter updates per epoch.
%% Cell type:markdown id: tags:
### Calculate the performance on the test set
%% Cell type:code id: tags:
``` python
y_pred_autoscout = predict(X_test_autoscout, bias, thetas)
r2 = r2_score(y_test_autoscout, y_pred_autoscout)
print("R2:", r2)
```
%% Cell type:markdown id: tags:
### Answer the ILIAS Quiz
%% Cell type:markdown id: tags:
> Now that you have implemented the gradient descent algorithm from scratch, you're ready to answer the ILIAS Quiz **Gradient Descent**.
%% Cell type:markdown id: tags:
**Remove solution**
%% Cell type:code id: tags:
``` python
def gradient(X, y, theta0, theta1):
y_pred = predict(X, theta0, theta1)
diff = y_pred - y
n = len(X)
grad_theta0 = np.sum(diff) / n
grad_theta1 = np.dot(diff, X.T) / n
return grad_theta0, grad_theta1
```
......
%% Cell type:markdown id: tags:
# Logistic Regression
%% Cell type:code id: tags:
``` python
%matplotlib inline
import matplotlib
import matplotlib.pyplot as plt
import numpy as np
from sklearn.linear_model import LogisticRegression
import math
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import RobustScaler
import sklearn
from collections import defaultdict
from tqdm.notebook import tqdm
from ipywidgets import interact
import seaborn as sns
from sklearn.metrics import accuracy_score, f1_score, confusion_matrix
from sklearn.dummy import DummyClassifier
from warnings import simplefilter
simplefilter(action='ignore', category=FutureWarning)
```
%% Cell type:markdown id: tags:
## Part 1 - 1D Toy example
Consider the case where random numbers are generated by two different Gaussian distributions with identical variance. We also know, for each number, the label of the distribution it originated from. The generated data could, for example, represent how many days a student spent learning for the ML exam, and the target variable is whether they passed.
%% Cell type:code id: tags:
``` python
students_passed = np.random.normal(5,0.7,100)
students_passed[1:20]
```
%% Cell type:code id: tags:
``` python
students_failed = np.random.normal(2,0.7,100)
students_failed[1:20]
```
%% Cell type:markdown id: tags:
To use this data for a logistic regression model, we combine the vectors `students_passed` and `students_failed` into a vector $X$ and create the corresponding labels $y$.
%% Cell type:code id: tags:
``` python
# label: failed
zeros = [0]*100
# label: passed
ones = [1]*100
X = np.concatenate((students_passed, students_failed))
y = np.concatenate((ones, zeros))
```
%% Cell type:markdown id: tags:
We plot both types of points in a scatter plot: the numbers of the students who failed are plotted at $y=0$, while those of the students who passed are plotted at $y=1$.
%% Cell type:code id: tags:
``` python
legend_map = {0: 'failed', 1: 'passed'}
ax = sns.scatterplot(X, y, hue=pd.Series(y).map(legend_map))
ax.set_xlabel('days spent learning for the ML exam')
ax.set_ylabel('if students passed')
plt.show()
```
%% Cell type:markdown id: tags:
Now we would like to determine whether an arbitrary, previously unseen point belongs to distribution 1 or distribution 2. For that, we want to employ logistic regression. Similar to linear regression, we first consider a model with a single independent variable and two parameters $\theta_0$ and $\theta_1$.
The probability that $x$ belongs to class 1 is determined using the sigmoid function.
$$
\sigma(x) = \frac{1}{1+e^{-(\theta_0 + \theta_1x)}}
$$
%% Cell type:markdown id: tags:
> Implement the sigmoid function
%% Cell type:code id: tags:
``` python
def sigmoid(z):
# START YOUR CODE
# END YOUR CODE
return s
```
%% Cell type:markdown id: tags:
*Click on the dots to display the solution*
%% Cell type:code id: tags:
``` python
def sigmoid(z):
s = 1 / (1 + np.exp(-z))
return s
```
%% Cell type:code id: tags:
``` python
x = np.linspace(-8, 8)
plt.plot(x, sigmoid(x))
```
%% Cell type:markdown id: tags:
**Programming Assignment - Verification on Ilias**
%% Cell type:markdown id: tags:
> Implement the `predict` function. On **Ilias**, report the i=7 entry of y_pred below and check if your implementation is correct.
%% Cell type:code id: tags:
``` python
def predict(X, theta0, theta1):
# START YOUR CODE
# END YOUR CODE
return y_pred
```
%% Cell type:markdown id: tags:
*Click on the dots to display the solution*
%% Cell type:code id: tags:
``` python
```
%% Cell type:code id: tags:
``` python
theta0 = 1.0
theta1 = 1.0
y_pred = predict(X, theta0, theta1)
y_pred[0:20]
```
%% Cell type:markdown id: tags:
**The value below is the answer for the Ilias Quiz "05A Supervised Learning: Classification"**
%% Cell type:code id: tags:
``` python
y_pred[7]
```
%% Cell type:markdown id: tags:
### Visualize Decision Boundary
The decision boundary is given by the $x$ for which $\theta_0+\theta_1 x=0$.
We can solve this equation for $x$: $x=-\frac{\theta_0}{\theta_1}$.
Now let us plot the decision boundary and the logistic function.
%% Cell type:code id: tags:
``` python
def plot_decision_boundary(X, theta0, theta1, ax=None):
if ax is None:
fig, ax = plt.subplots()
x = np.arange(X.min()-1, X.max()+1, 0.01).reshape(-1,1)
y_pred = predict(x, theta0, theta1)
ax.plot(x, y_pred, color="r")
ax.axvline(-theta0/theta1, color="g")
ax.set_title("Decision Boundary")
legend_map = {0: 'failed', 1: 'passed'}
ax = sns.scatterplot(X, y, hue=pd.Series(y).map(legend_map))
ax.set_xlabel('days spent learning for the ML exam')
ax.set_ylabel('if students passed')
plot_decision_boundary(X, theta0, theta1, ax)
plt.show()
```
%% Cell type:markdown id: tags:
### Cost function
The cross-entropy cost function $J(\boldsymbol\theta)$, where $\boldsymbol\theta=\left(\theta_0,\theta_1\right)$ is given by
$$
J(\boldsymbol\theta) =
- \frac{1}{n} \sum_{i=1}^n%
\left[y_i\log h(\boldsymbol\theta,\mathbf{X_i})
+ (1-y_i)\log\left(
1-h(\boldsymbol\theta,\mathbf{X_i})\right)\right]
$$
where $h(\boldsymbol\theta,\mathbf{X_i})=\sigma\left(\mathbf{X_i}^T\boldsymbol\theta\right)=\sigma\left(\theta_0+\theta_1 x\right)$ and $\sigma$ is the sigmoid function.
%% Cell type:markdown id: tags:
> Implement the cost function. Verify your code by running the next cell.
%% Cell type:code id: tags:
``` python
def cost_function(y, y_pred):
# START YOUR CODE
# END YOUR CODE
return cost
```
%% Cell type:markdown id: tags:
*Click on the dots to display the solution*
%% Cell type:code id: tags:
``` python
def cost_function(y, y_pred):
n = y.shape[0]
cost = -(1.0 / n) * np.sum(y * np.log(y_pred) + (1 - y) * np.log(1 - y_pred))
return cost
```
%% Cell type:markdown id: tags:
If your code is correct, you should be able to run the following cell:
%% Cell type:code id: tags:
``` python
y_assert = np.array([1, 0, 0])
y_pred = np.array([0.98, 0.2, 0.6])
expected_cost = 0.38654566350196135
actual_cost = cost_function(y_assert, y_pred)
np.testing.assert_almost_equal(actual_cost, expected_cost, decimal=3)
```
%% Cell type:markdown id: tags:
### Gradients
For applying gradient descent, we define the gradient.
%% Cell type:code id: tags:
``` python
def gradient(X, y, theta0, theta1):
y_pred = predict(X, theta0, theta1)
diff = y_pred - y
n = len(X)
grad_theta0 = np.sum(diff) / n
grad_theta1 = np.dot(diff, X.T) / n
return grad_theta0, grad_theta1
```
%% Cell type:markdown id: tags:
### Gradient Descent
Now we are ready to determine the optimal values for the parameters $\theta_0$ and $\theta_1$ using the gradient descent algorithm.
$$
\text{Repeat until convergence:}\qquad
\boldsymbol\theta_{k+1} = \boldsymbol\theta_{k}-\alpha\frac{1}{n}\sum_{i=1}^n
\left(h(\boldsymbol\theta_k,\mathbf{x}^{(i)})-y^{(i)}\right)\mathbf{x}^{(i)},
\quad k = 0,\,1,\,2,\,3,\,\ldots,\mathtt{kmax}
$$
> Implement the `fit` function
%% Cell type:code id: tags:
``` python
def fit(X, y, alpha, num_epochs, display_every=10):
theta0 = 0.0
theta1 = np.random.randn()
hist = defaultdict(list)
for i in tqdm(range(num_epochs)):
# START YOUR CODE
# calculate gradients
# update model parameters theta0 and theta1
# calculate the current costs
# END YOUR CODE
grad_theta0, grad_theta1 = gradient(X, y, theta0, theta1)
theta0 = theta0 - alpha * grad_theta0
theta1 = theta1 - alpha * grad_theta1
y_pred = predict(X, theta0, theta1)
curr_cost = cost_function(y, y_pred)
hist["cost"].append(curr_cost)
hist["theta0"].append(theta0)
hist["theta1"].append(theta1)
if i == 0 or (i+1) % display_every == 0:
print("Epoch {} - cost: {}".format(i+1, curr_cost))
return theta0, theta1, hist
```
%% Cell type:markdown id: tags:
*Click on the dots to display the solution*
%% Cell type:code id: tags:
``` python
def fit(X, y, alpha, num_epochs, display_every=10):
theta0 = 0.0
theta1 = np.random.randn()
hist = defaultdict(list)
for i in tqdm(range(num_epochs)):
# calculate gradients
grad_theta0, grad_theta1 = gradient(X, y, theta0, theta1)
# update model parameters theta0 and theta1
theta0 = theta0 - alpha * grad_theta0
theta1 = theta1 - alpha * grad_theta1
# calculate the current costs
y_pred = predict(X, theta0, theta1)
curr_cost = cost_function(y, y_pred)
hist["cost"].append(curr_cost)
hist["theta0"].append(theta0)
hist["theta1"].append(theta1)
if i == 0 or (i+1) % display_every == 0:
print("Epoch {} - cost: {}".format(i+1, curr_cost))
return theta0, theta1, hist
```
%% Cell type:markdown id: tags:
#### Plot validation curve
We implement a function that allows us to plot the validation curve.
%% Cell type:code id: tags:
``` python
def plot_validation_curve(costs, ax=None):
if ax is None:
fig, ax = plt.subplots()
ax.set_ylabel("Cost")
ax.set_title("Validation Curve")
ax.set_xlabel("Epochs")
ax.plot(costs)
```
%% Cell type:markdown id: tags:
#### Run gradient descent algorithm
Let's run the gradient descent algorithm!
%% Cell type:code id: tags:
``` python
alpha = 0.1
num_epochs = 10000
theta0, theta1, hist = fit(X, y, alpha, num_epochs, display_every=1000)
fig, ax = plt.subplots(ncols=2, figsize=(15,4))
# scatter plot
legend_map = {0: 'failed', 1: 'passed'}
ax[0] = sns.scatterplot(X, y, hue=pd.Series(y).map(legend_map), ax=ax[0])
ax[0].set_xlabel('days spent learning for the ML exam')
ax[0].set_ylabel('if students passed')
plot_decision_boundary(X, theta0, theta1, ax[0])
# validation curve
plot_validation_curve(hist["cost"], ax=ax[1])
```
%% Cell type:markdown id: tags:
### Visualize Learning
Let's visualize how the decision boundary changes over time.
%% Cell type:code id: tags:
``` python
@interact(epoch=(0, len(hist["theta0"]), 100))
def visualize_learning(epoch=100):
legend_map = {0: 'failed', 1: 'passed'}
ax = sns.scatterplot(X, y, hue=pd.Series(y).map(legend_map))
ax.set_xlabel('days spent learning for the ML exam')
ax.set_ylabel('if students passed')
if epoch == 0:
epoch += 1
plot_decision_boundary(X, hist["theta0"][epoch-1], hist["theta1"][epoch-1], ax)
plt.show()
```
%% Cell type:markdown id: tags:
### Metrics
Let's calculate the accuracy
%% Cell type:code id: tags:
``` python
y_pred = predict(X, theta0, theta1)
y_pred[0:10]
```
%% Cell type:markdown id: tags:
We label a point as 1 if the predicted value is larger than 0.5
%% Cell type:code id: tags:
``` python
# y_pred_class = ...
```
%% Cell type:markdown id: tags:
*Click on the dots to display the solution*
%% Cell type:code id: tags:
``` python
y_pred_class = y_pred > 0.5
```
%% Cell type:code id: tags:
``` python
accuracy = accuracy_score(y, y_pred_class)
print("Accuracy: ", accuracy)
```
%% Cell type:markdown id: tags:
## Part 2 - Multiple Logistic Regression - Toy example
In the second part, logistic regression is used in a 2D toy example. Here the data is loaded from a `.csv` file, but it was also generated artificially for illustration purposes. Here the data can, for example, correspond to
* feature 1: days spent learning for the ML exam
* feature 2: days spent working in the ML domain (prior experience)
* target variable: if students have passed the exam
%% Cell type:code id: tags:
``` python
df = pd.read_csv("classification_data.csv", header=None)
df.columns = ['days spent learning', 'prior experience', 'exam passed']
df.head()
```
%% Cell type:code id: tags:
``` python
n = len(df)
X_2d = df.values[:, 0:2]
y_2d = df.values[:,2]
```
%% Cell type:markdown id: tags:
Split the data into training and test set.
%% Cell type:code id: tags:
``` python
X_train, X_test, y_train, y_test = train_test_split(X_2d, y_2d, test_size=0.1)
print("X_train:", X_train.shape)
print("y_train:", y_train.shape)
```
%% Cell type:markdown id: tags:
### Predict function
The first step is to modify our `predict` function to handle multiple thetas.
%% Cell type:code id: tags:
``` python
def predict(X, bias, thetas):
# START YOUR CODE
# END YOUR CODE
return y_pred
```
%% Cell type:markdown id: tags:
*Click on the dots to display the solution*
%% Cell type:code id: tags:
``` python
def predict(X, bias, thetas):
z = bias + np.dot(X, thetas)
y_pred = sigmoid(z)
return y_pred
```
%% Cell type:markdown id: tags:
### Gradient function
Let's modify the `gradient` function.
%% Cell type:code id: tags:
``` python
def gradient(X, y, bias, thetas):
# START YOUR CODE
# END YOUR CODE
return grad_bias, grad_thetas
```
%% Cell type:markdown id: tags:
*Click on the dots to display the solution*
%% Cell type:code id: tags:
``` python
def gradient(X, y, bias, thetas):
y_pred = predict(X, bias, thetas)
diff = y_pred - y
n = len(X)
grad_bias = np.sum(diff) / n
grad_thetas = np.dot(diff, X) / n
return grad_bias, grad_thetas
```
%% Cell type:markdown id: tags:
### Gradient descent algorithm
%% Cell type:code id: tags:
``` python
def fit(X, y, alpha, num_epochs, display_every=100):
bias = 0.0
thetas = np.random.randn(*(1, X.shape[1])).reshape(-1)
hist = defaultdict(list)
for epoch in tqdm(range(1, num_epochs+1)):
# calculate gradients
grad_bias, grad_thetas = gradient(X, y, bias, thetas)
# update model parameters
bias = bias - alpha * grad_bias
thetas = thetas - alpha * grad_thetas
# calculate the current costs
y_pred = predict(X, bias, thetas)
curr_cost = cost_function(y, y_pred)
hist["cost"].append(curr_cost)
if epoch % display_every == 0:
print("Epoch {} - cost: {}".format(epoch, curr_cost))
return bias, thetas, hist
```
%% Cell type:markdown id: tags:
### Apply Gradient Descent
> Apply the gradient descent algorithm. Use the learning rate 0.1. Plot the validation curve.
%% Cell type:code id: tags:
``` python
```
%% Cell type:markdown id: tags:
*Click on the dots to display the solution*
%% Cell type:code id: tags:
``` python
bias_2d, thetas_2d, hist_2d = fit(X_train, y_train, alpha=0.1, num_epochs=10000, display_every=1000)
plot_validation_curve(hist_2d["cost"])
```
%% Cell type:markdown id: tags:
### Plot decision boundary
%% Cell type:code id: tags:
``` python
print("decision boundary: %.3f + %.3f * x1 + %.3f * x2 = 0"
% (bias_2d, thetas_2d[0], thetas_2d[1]))
x1 = np.array(X_train[:,0].T)
x2 = np.array(X_train[:,1].T)
fig, ax = plt.subplots(1,1, figsize=(6,6))
ax.set_xlabel('days spent learning')
ax.set_ylabel('prior knowledge (in days)')
color = ['blue' if l == 0 else 'green' for l in y_train]
scat = ax.scatter(x1, x2, color=color)
# inline function for decision boundary (unless vertical)
y_ = lambda x: ((-1)*(bias_2d + thetas_2d[0]*x) / thetas_2d[1])
def plot_line(y, data_pts):
x_vals = [i for i in
range(int(min(data_pts)-1),
int(max(data_pts))+2)]
y_vals = [y(x) for x in x_vals]
plt.plot(x_vals,y_vals, 'r')
plot_line(y_, x1)
plt.show()
```
%% Cell type:markdown id: tags:
### Evaluation
How should we evaluate our result? Of course this is highly dependent on both our original business problem and the data at hand. Questions to consider include:
* Does the evaluation result need to be explainable to management, without using formulas and technical terms?
* Do we have a high class imbalance?
* Are False Positives and False Negatives equally bad? Does one of the two incur a high cost for our business and needs to be avoided?
* How do we rate the confidence? Do we want to penalise a classifier when it classifies a sample wrongly but is very sure of this result?
We will look at the metrics Accuracy and F1-Score.
%% Cell type:markdown id: tags:
> Predict the data on the test set.
%% Cell type:code id: tags:
``` python
# y_pred = ...
```
%% Cell type:markdown id: tags:
*Click on the dots to display the solution*
%% Cell type:code id: tags:
``` python
y_pred = predict(X_test, bias_2d, thetas_2d)
y_pred = (y_pred > 0.5).astype(int)
y_pred
```
%% Cell type:markdown id: tags:
#### Confusion Matrix
First we compute and plot the confusion matrix using the utility methods `compute_confusion_matrix` and `plot_confusion_matrix`.
%% Cell type:code id: tags:
``` python
def compute_confusion_matrix(true, pred):
# number of classes
K = len(np.unique(true))
c_mat = np.zeros((K, K))
for i in range(len(true)):
c_mat[int(true[i])][int(pred[i])] += 1
return c_mat
def plot_confusion_matrix(cm):
fig, (ax1) = plt.subplots(ncols=1, figsize=(5,5))
sns.heatmap(cm,
xticklabels=['True', 'False'],
yticklabels=['True', 'False'],
annot=True,ax=ax1,
linewidths=.2,linecolor="Darkblue", cmap="Blues")
plt.title('Confusion Matrix', fontsize=14)
plt.show()
```
%% Cell type:code id: tags:
``` python
# cm = ...
# plot...
```
%% Cell type:markdown id: tags:
*Click on the dots to display the solution*
%% Cell type:code id: tags:
``` python
cm = compute_confusion_matrix(y_test, y_pred)
plot_confusion_matrix(cm)
```
%% Cell type:markdown id: tags:
> Finally calculate and print the accuracy and the f1 score.
%% Cell type:code id: tags:
``` python
def extract_scores(confusion_matrix):
"""
Extracts the tp, tn, fp, fn from the
confusion matrix.
"""
# true positive
tp = confusion_matrix[0][0]
# true negative
tn = confusion_matrix[1][1]
# false positive
fp = confusion_matrix[0][1]
# false negative
fn = confusion_matrix[1][0]
return tp, tn, fp, fn
def accuracy_score(confusion_matrix):
"""
Computes the accuracy from a confusion matrix.
"""
tp, tn, fp, fn = extract_scores(confusion_matrix)
acc = (tp + tn)/np.sum(confusion_matrix)
return acc
def f1_score(confusion_matrix):
"""
Computes the f1 score from a confusion matrix.
"""
tp, tn, fp, fn = extract_scores(confusion_matrix)
precision = tp/(tp+fp)
recall = tp/(tp+fn)
f1 = (2*precision*recall)/(precision+recall)
return f1
```
%% Cell type:code id: tags:
``` python
# accuracy = ...
# f1 = ...
```
%% Cell type:markdown id: tags:
*Click on the dots to display the solution*
%% Cell type:code id: tags:
``` python
accuracy = accuracy_score(cm)
f1 = f1_score(cm)
print ("test accuracy: %.2f" % accuracy)
print ("test f1 score: %.2f" % f1)
```
%% Cell type:code id: tags:
``` python
```
%% Cell type:code id: tags:
``` python
```
%% Cell type:markdown id: tags:
## Programming Assignment
> Solve the following programming assignment and check your solution in the Ilias Quiz **05A Supervised Learning: Classification - Notebook Verification**.
%% Cell type:markdown id: tags:
In the previous examples you implemented Logistic Regression from scratch. Now you are going to repeat the calculations using scikit-learn's implementation of Logistic Regression. Use the data of the Multiple Logistic Regression example, including the identical train/test split.
%% Cell type:markdown id: tags:
Train the Logistic Regression, what is the score on the test set?
%% Cell type:code id: tags:
``` python
```
%% Cell type:markdown id: tags:
What is the default metric of the `score` function of scikit-learn's implementation of LogisticRegression?
%% Cell type:code id: tags:
``` python
```
%% Cell type:markdown id: tags:
Check your answers in the Ilias Quiz.
%% Cell type:markdown id: tags:
**Solution needs to be removed later**
%% Cell type:code id: tags:
``` python
from sklearn.linear_model import LogisticRegression
```
%% Cell type:code id: tags:
``` python
reg = LogisticRegression().fit(np.asarray(X_train), y_train)
```
%% Cell type:code id: tags:
``` python
reg
```
%% Cell type:code id: tags:
``` python
X_test
```
%% Cell type:code id: tags:
``` python
reg.predict(X_test)
```
%% Cell type:code id: tags:
``` python
reg.score(X_test, y_test)
```
%% Cell type:markdown id: tags:
What is the default metric of the `score` function of scikit-learn's implementation of LogisticRegression?
%% Cell type:markdown id: tags:
Accuracy
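%% Cell type:markdown id: tags:
A quick check of this claim, reusing `reg`, `X_test` and `y_test` from above. The sklearn metric is imported under an alias here because Part 2 shadowed `accuracy_score` with our own confusion-matrix-based implementation.
%% Cell type:code id: tags:
``` python
from sklearn.metrics import accuracy_score as sk_accuracy_score
# LogisticRegression.score returns the mean accuracy on the given data,
# so both values below should be identical.
print("score()  :", reg.score(X_test, y_test))
print("accuracy :", sk_accuracy_score(y_test, reg.predict(X_test)))
```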
%% Cell type:code id: tags:
``` python
```
%% Cell type:markdown id: tags:
**Solution of predict implementation**
%% Cell type:code id: tags:
``` python
def predict(X, theta0, theta1):
z = theta0 + theta1 * X
y_pred = sigmoid(z)
return y_pred
```
%% Cell type:markdown id: tags:
Solution
%% Cell type:markdown id: tags:
0.99828623
%% Cell type:code id: tags:
``` python
```
......