Although linear regression is a linear machine learning method, you can model nonlinear dependencies by transforming some of the independent variables with a nonlinear function. By doing this, you can improve the fit of your model. Let us demonstrate this on a house price dataset from [Kaggle](https://www.kaggle.com/harlfoxem/housesalesprediction). Note that this dataset is not identical to the one you used in the linear regression exercise, since that dataset is too small and would lead to unreliable evaluation results.
%% Cell type:code id: tags:
``` python
df_house=pd.read_csv("kc_house_data.csv")
df_house.head()
```
%% Cell type:markdown id: tags:
We would like to have a simple linear regression problem with only one independent variable. Thus, we only keep *price* and *sqft_living*.
As we can see, by adding $x^2$ as an additional independent variable we could slightly improve our performance.
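A minimal sketch of this idea, assuming the `df_house` dataframe loaded above and using scikit-learn's `LinearRegression` (the variable names are ours and the omitted cells may fit the model differently):
%% Cell type:code id: tags:
``` python
from sklearn.linear_model import LinearRegression

# Single feature plus its square; the model is still linear in the parameters
X_poly = df_house[["sqft_living"]].copy()
X_poly["sqft_living_sq"] = X_poly["sqft_living"] ** 2
y_price = df_house["price"]

model = LinearRegression().fit(X_poly, y_price)
print(model.score(X_poly, y_price))  # R^2 on the training data
```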
%% Cell type:markdown id: tags:
Let's see whether we can further improve the performance by adding more polynomial features. To generate the polynomial features we will use the Scikit-Learn class [PolynomialFeatures](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.PolynomialFeatures.html).
What do you observe when you increase the polynomial degree?
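A minimal sketch of how `PolynomialFeatures` can be combined with a linear model (the use of a scikit-learn `Pipeline` and the chosen degrees are assumptions for illustration):
%% Cell type:code id: tags:
``` python
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

X_sqft = df_house[["sqft_living"]]
y_price = df_house["price"]

for degree in [1, 2, 3, 5]:
    # PolynomialFeatures expands x into [1, x, x^2, ..., x^degree]
    model = make_pipeline(PolynomialFeatures(degree=degree), LinearRegression())
    model.fit(X_sqft, y_price)
    print(degree, model.score(X_sqft, y_price))
```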
%% Cell type:markdown id: tags:
> Answer the question on ILIAS
%% Cell type:markdown id: tags:
## Regularization
%% Cell type:markdown id: tags:
The effect of overfitting can be reduced by regularization. Implement the regularized version of linear regression:

$$\Theta := \left(X^{\top}X + \lambda \begin{bmatrix}
0 & 0 & \cdots & 0 \\
0 & 1 & & \\
\vdots & & \ddots & \\
0 & & & 1
\end{bmatrix}\right)^{-1} X^{\top}y$$
%% Cell type:code id: tags:
``` python
def fit_reg(X, y, lam):
    # START YOUR CODE
    # END YOUR CODE
    return thetas
```
%% Cell type:markdown id: tags:
*Click on the dots to display the solution*
%% Cell type:code id: tags:
``` python
def fit_reg(X, y, lam):
    Xt = np.transpose(X)
    XtX = np.dot(Xt, X)
    # Regularization matrix: identity with a zero for the bias term
    I = np.identity(XtX.shape[0])
    I[0, 0] = 0
    XtX = XtX + (lam * I)
    XtXm1 = np.linalg.inv(XtX)
    Xty = np.dot(Xt, y)
    thetas = np.dot(XtXm1, Xty)
    return thetas
```
%% Cell type:markdown id: tags:
You can check your implementation by executing the following cell:
> Solve the following Programming assignment and check your solution in the Ilias Quiz **Linear Regression and Regularization - Notebook Verification**.
%% Cell type:markdown id: tags:
You previously implemented Linear Regression from scratch. In this Programming assignment you are asked to use the scikit-learn implementation of Linear Regression instead ([scikit-learn documentation](https://scikit-learn.org/stable/)). Use the same data as before.
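A minimal sketch of how this could look (the variable names `X_train`, `y_train`, `X_test`, `y_test` are assumptions; adapt them to the data preparation used above):
%% Cell type:code id: tags:
``` python
from sklearn.linear_model import LinearRegression

sk_model = LinearRegression()
sk_model.fit(X_train, y_train)
print(sk_model.intercept_, sk_model.coef_)  # theta_0 and the remaining parameters
print(sk_model.score(X_test, y_test))       # R^2 on the test set
```
%% Cell type:markdown id: tags: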
Firstly, we demonstrate gradient descent on a simple linear regression problem with one dependent and one independent variable.
%% Cell type:code id: tags:
``` python
X=np.array([1,1,2,3,4,5,6,7,8,9,10,10])
y=np.array([1,2,3,1,4,5,6,4,7,10,15,9])
```
%% Cell type:markdown id: tags:
The x and y values are plotted in a scatter plot.
%% Cell type:code id: tags:
``` python
plt.plot(X,y,'bo')
plt.show()
```
%% Cell type:markdown id: tags:
We then try to fit the points by a straight line.
%% Cell type:code id: tags:
``` python
theta0=-0.5
theta1=1
```
%% Cell type:code id: tags:
``` python
def predict(X, theta0, theta1):
    y_pred = theta0 + theta1 * X
    return y_pred

y_pred = predict(X, theta0, theta1)
```
%% Cell type:code id: tags:
``` python
def plot_regression_line(X, theta0, theta1, ax=None):
    if ax is None:
        fig, ax = plt.subplots()
    x = np.arange(X.min() - 1, X.max() + 1, 1).reshape(-1, 1)
    y_pred = predict(x, theta0, theta1)
    ax.plot(x, y_pred, color="r")

ax = sns.scatterplot(x=X, y=y)
plot_regression_line(X, theta0, theta1, ax)
plt.show()
```
%% Cell type:markdown id: tags:
This does not look so bad. Let's implement a gradient descent algorithm to do this automatically.
%% Cell type:markdown id: tags:
### Cost function
We define a cost function that determines the mean squared error of the predicted and the actual y coordinates. To get rid of the factor 2 in the gradient
formula, we divide the sum by 2.
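Concretely, with $n$ training examples and predictions $\hat{y}_i$, the cost is
$$J(\theta_0,\theta_1)=\frac{1}{2n}\sum_{i=1}^{n}\left(\hat{y}_i-y_i\right)^2.$$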
%% Cell type:markdown id: tags:
> Implement the MSE cost function
%% Cell type:code id: tags:
``` python
def cost(y, y_pred):
    # START YOUR CODE
    # END YOUR CODE
    return cost
```
%% Cell type:markdown id: tags:
*Click on the dots to display the solution*
%% Cell type:code id: tags:
``` python
def cost(y, y_pred):
    # Mean squared error, divided by an additional factor of 2
    cost = np.sum((y_pred - y) ** 2) / (2 * len(y))
    return cost
```
%% Cell type:code id: tags:
``` python
cost(y,y_pred)
```
%% Cell type:markdown id: tags:
### Calculate gradient
Next, let us determine the gradient of the cost function with respect to the parameters.
%% Cell type:markdown id: tags:
**Programming Assignment - Verification on Ilias**
%% Cell type:markdown id: tags:
> Implement the `gradient` function
%% Cell type:code id: tags:
``` python
def gradient(X, y, theta0, theta1):
    # START YOUR CODE
    # END YOUR CODE
    return grad_theta0, grad_theta1
```
%% Cell type:markdown id: tags:
*Hint: Carefully look at the definition of the cost function of Linear Regression, to calculate the gradient & take care of dimensions*
%% Cell type:markdown id: tags:
**Report the value of the gradients in the Ilias Quiz 04B Notebook Verification**
%% Cell type:code id: tags:
``` python
gradient(X,y,theta0,theta1)
```
%% Cell type:markdown id: tags:
### Batch Gradient Descent
%% Cell type:markdown id: tags:
> Now complete the `fit` function by iteratively updating our model parameters.
To visualize how the parameters and the cost change with each epoch, we store them in a dictionary.
We can then visualize the learning process by plotting the validation curve, which shows how the cost decreases as the number of epochs increases.
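Below is a minimal sketch of one possible batch gradient descent loop (the learning rate, the number of epochs and the keys of the `hist` dictionary are assumptions; the plotting code below only requires that `hist["cost"]` holds the cost per epoch):
%% Cell type:code id: tags:
``` python
def fit(X, y, theta0, theta1, lr=0.01, epochs=100):
    # Track the parameters and the cost after every epoch
    hist = {"theta0": [], "theta1": [], "cost": []}
    for _ in range(epochs):
        grad_theta0, grad_theta1 = gradient(X, y, theta0, theta1)
        theta0 = theta0 - lr * grad_theta0
        theta1 = theta1 - lr * grad_theta1
        hist["theta0"].append(theta0)
        hist["theta1"].append(theta1)
        hist["cost"].append(cost(y, predict(X, theta0, theta1)))
    return theta0, theta1, hist

# theta0, theta1, hist = fit(X, y, theta0=-0.5, theta1=1)
```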
%% Cell type:code id: tags:
``` python
def plot_validation_curve(data, ax=None, ylim=None):
    if ax is None:
        fig, ax = plt.subplots()
    ax.set_title("Validation Curve")
    ax.set_ylabel("Cost")
    if ylim is not None:
        ax.set_ylim(ylim)
    ax.set_xlabel("Epochs")
    ax.plot(data)

plot_validation_curve(hist["cost"])
```
%% Cell type:markdown id: tags:
Using our history, we can now visualize how the parameters change with each epoch.
%% Cell type:markdown id: tags:
We now extend our code to multiple linear regression. We will use the autoscout dataset from the previous exercises. First we apply the data cleaning and then z-normalise our data.
> z-normalise the training and test data by using the [StandardScaler](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html)
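A minimal sketch of this step (assuming the cleaned data has already been split into `X_train` and `X_test`):
%% Cell type:code id: tags:
``` python
from sklearn.preprocessing import StandardScaler

# Fit the scaler on the training data only, then apply it to both splits
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
```
%% Cell type:markdown id: tags: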
We modify our predict function so that instead of providing $\theta_0$ and $\theta_1$ we now provide the bias ($\theta_0$) and the remaining parameters $\Theta$ as an array.
> Implement the `predict` function
%% Cell type:code id: tags:
``` python
def predict(X, bias, thetas):
# START YOUR CODE
# END YOUR CODE
return y_pred
```
%% Cell type:markdown id: tags:
*Click on the dots to display the solution*
%% Cell type:code id: tags:
``` python
def predict(X, bias, thetas):
y_pred = bias + np.dot(X, thetas)
return y_pred
```
%% Cell type:markdown id: tags:
> Implement the `gradient` function
%% Cell type:code id: tags:
``` python
def gradient(X, y, bias, thetas):
# START YOUR CODE
# END YOUR CODE
return grad_bias, grad_thetas
```
%% Cell type:markdown id: tags:
*Click on the dots to display the solution*
%% Cell type:code id: tags:
``` python
def gradient(X, y, bias, thetas):
y_pred = predict(X, bias, thetas)
diff = y_pred - y
n = len(X)
grad_bias = np.sum(diff) / n
grad_thetas = np.dot(diff, X) / n
return grad_bias, grad_thetas
```
%% Cell type:markdown id: tags:
We extend our `fit` function by tracking not only the cost but also the $R^2$ score.
Compared to the previous exercise, where we calculated the estimates for $\Theta$ in closed form using the normal equation, we get almost the same result with the gradient descent algorithm.
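As a reminder, the $R^2$ score compares the residual sum of squares to the total sum of squares. A minimal sketch (the helper name `r2_score_manual` is an assumption):
%% Cell type:code id: tags:
``` python
def r2_score_manual(y, y_pred):
    # R^2 = 1 - SS_res / SS_tot
    ss_res = np.sum((y - y_pred) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1 - ss_res / ss_tot
```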
%% Cell type:markdown id: tags:
### Minibatch Gradient Descent
%% Cell type:markdown id: tags:
> Now modify our `fit` function to use mini-batch gradient descent. So instead of calculating the gradient on the whole dataset in each step, only use a subset of the data.
We have now introduced an additional hyperparameter `batch_size`.
* If we set `batch_size` equal to 1, we use Stochastic Gradient Descent: we update our model parameters $\Theta$ after each training example.
* If we set `batch_size` equal to the number of training samples, we have Batch Gradient Descent again: we use all training samples to update the model parameters $\Theta$.
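A minimal sketch of the mini-batch update loop (the learning rate, the number of epochs and the shuffling strategy are assumptions):
%% Cell type:code id: tags:
``` python
def fit_minibatch(X, y, bias, thetas, lr=0.01, epochs=100, batch_size=32):
    # X and y are assumed to be NumPy arrays
    n = len(X)
    for _ in range(epochs):
        # Shuffle so that every epoch sees the batches in a different order
        idx = np.random.permutation(n)
        for start in range(0, n, batch_size):
            batch = idx[start:start + batch_size]
            grad_bias, grad_thetas = gradient(X[batch], y[batch], bias, thetas)
            bias = bias - lr * grad_bias
            thetas = thetas - lr * grad_thetas
    return bias, thetas
```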
%% Cell type:markdown id: tags:
#### Batch Gradient Descent
We run batch gradient descent and see what happens.
%% Cell type:markdown id: tags:
We now turn to classification. Consider the case where random numbers are created by two different Gaussian distributions with identical variance, and we also know from which distribution each number originates. The generated data could, for example, represent how many days a student has studied for the ML exam, with the target variable being whether the student has passed.
%% Cell type:code id: tags:
``` python
students_passed=np.random.normal(5,0.7,100)
students_passed[1:20]
```
%% Cell type:code id: tags:
``` python
students_failed=np.random.normal(2,0.7,100)
students_failed[1:20]
```
%% Cell type:markdown id: tags:
To use this data for a logistic regression model, we combine the vectors `students_passed` and `students_failed` into a vector $X$ and create the corresponding labels $y$.
We plot both types of points in a scatter plot, where the points generated by the first distribution (plotted in blue) have the label $y=0$, while the points of the second distribution (plotted in orange) are at $y=1$.
%% Cell type:code id: tags:
``` python
# X and y are assumed to come from the (omitted) cell above,
# where students_passed and students_failed were combined.
fig, ax = plt.subplots()
sns.scatterplot(x=X, y=y, hue=y, ax=ax)
ax.set_xlabel('days spent learning for the ML exam')
ax.set_ylabel('if students passed')
plt.show()
```
%% Cell type:markdown id: tags:
Now we would like to determine whether an arbitrary, previously unseen point belongs to distribution 1 or to distribution 2. For that, we want to employ logistic regression. Similar to linear regression, we first consider a model with a single independent variable and two parameters $\theta_0$ and $\theta_1$.
The probability that $x$ belongs to either of the two classes is determined using the sigmoid function.
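As a minimal sketch of this model (the helper names are assumptions; the decision boundary plotted below is typically where this probability crosses 0.5):
%% Cell type:code id: tags:
``` python
def sigmoid(z):
    # Maps any real number into the interval (0, 1)
    return 1 / (1 + np.exp(-z))

def predict_proba(x, theta0, theta1):
    # Probability that a sample with feature value x belongs to class 1
    return sigmoid(theta0 + theta1 * x)
```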
%% Cell type:code id: tags:
``` python
# theta0, theta1 and plot_decision_boundary are assumed to be defined
# in the (omitted) cells above.
fig, ax = plt.subplots()
sns.scatterplot(x=X, y=y, hue=y, ax=ax)
ax.set_xlabel('days spent learning for the ML exam')
ax.set_ylabel('if students passed')
plot_decision_boundary(X, theta0, theta1, ax)
plt.show()
```
%% Cell type:markdown id: tags:
### Cost function
The cross-entropy cost function $J(\boldsymbol\theta)$, where $\boldsymbol\theta=\left(\theta_0,\theta_1\right)$ is given by
$$
J(\boldsymbol\theta) =
-\frac{1}{n} \sum_{i=1}^n%
\left[y_i\log h(\boldsymbol\theta,\mathbf{X_i})
+ (1-y_i)\log\left(
1-h(\boldsymbol\theta,\mathbf{X_i})\right)\right]
$$
where $h(\boldsymbol\theta,\mathbf{X_i})=\sigma\left(\mathbf{X_i}^T\boldsymbol\theta\right)=\sigma\left(\theta_0+\theta_1 x\right)$ and $\sigma$ is the sigmoid function.
%% Cell type:markdown id: tags:
> Implement the cost function. Verify your code by running the next cell.
We label a point as 1 if the predicted probability is larger than 0.5.
%% Cell type:code id: tags:
``` python
# y_pred_class = ...
```
%% Cell type:markdown id: tags:
*Click on the dots to display the solution*
%% Cell type:code id: tags:
``` python
y_pred_class=y_pred>0.5
```
%% Cell type:code id: tags:
``` python
accuracy=accuracy_score(y,y_pred_class)
print("Accuracy: ",accuracy)
```
%% Cell type:markdown id: tags:
## Part 2 - Multiple Logistic Regression - Toy example
In the second part, logistic regression is used in a 2D toy example. The data is loaded from a `.csv` file, but it was generated artificially for illustration purposes. It can, for example, correspond to
* feature 1: days spent learning for the ML exam
* feature 2: days spent working in the ML domain (prior experience)
* target variable: if students have passed the exam
How should we evaluate our result? Of course this is highly dependent on both our original business problem and the data at hand. Relevant questions include:
* Does the evaluation result need to be explainable to management, without using formulas and technical terms?
* Do we have a high class imbalance?
* Are False Positives and False Negatives equally bad? Does one of the two incur a high cost for our business and needs to be avoided?
* How do we rate the confidence? Do we want to penalise a classifier when it classifies a sample wrongly but is very sure of this result?
We will look at the metrics Accuracy and F1-Score.
%% Cell type:markdown id: tags:
> Predict the data on the test set.
%% Cell type:code id: tags:
``` python
# y_pred = ...
```
%% Cell type:markdown id: tags:
*Click on the dots to display the solution*
%% Cell type:code id: tags:
``` python
y_pred = predict(X_test, bias_2d, thetas_2d)
y_pred = (y_pred > 0.5).astype(int)
y_pred
```
%% Cell type:markdown id: tags:
#### Confusion Matrix
First we compute and plot the confusion matrix using the utility methods `compute_confusion_matrix` and `plot_confusion_matrix`.
%% Cell type:code id: tags:
``` python
def compute_confusion_matrix(true, pred):
    # number of classes
    K = len(np.unique(true))
    # rows correspond to the true class, columns to the predicted class
    c_mat = np.zeros((K, K))
    for i in range(len(true)):
        c_mat[int(true[i])][int(pred[i])] += 1
    return c_mat
def plot_confusion_matrix(cm):
fig, (ax1) = plt.subplots(ncols=1, figsize=(5,5))
sns.heatmap(cm,
xticklabels=['True', 'False'],
yticklabels=['True', 'False'],
annot=True,ax=ax1,
linewidths=.2,linecolor="Darkblue", cmap="Blues")
plt.title('Confusion Matrix', fontsize=14)
plt.show()
```
%% Cell type:code id: tags:
``` python
# cm = ...
# plot...
```
%% Cell type:markdown id: tags:
*Click on the dots to display the solution*
%% Cell type:code id: tags:
``` python
cm = compute_confusion_matrix(y_test, y_pred)
plot_confusion_matrix(cm)
```
%% Cell type:markdown id: tags:
> Finally calculate and print the accuracy and the f1 score.
%% Cell type:code id: tags:
``` python
def extract_scores(confusion_matrix):
"""
Extracts the tp, tn, fp, fn from the
confusion matrix.
"""
    # true positive (class 0 is treated as the positive class here)
    tp = confusion_matrix[0][0]
    # true negative
    tn = confusion_matrix[1][1]
    # false positive: true class 1 predicted as class 0
    fp = confusion_matrix[1][0]
    # false negative: true class 0 predicted as class 1
    fn = confusion_matrix[0][1]
return tp, tn, fp, fn
def accuracy_score(confusion_matrix):
"""
Computes the accuracy from a confusion matrix.
"""
tp, tn, fp, fn = extract_scores(confusion_matrix)
acc = (tp + tn)/np.sum(confusion_matrix)
return acc
def f1_score(confusion_matrix):
"""
Computes the f1 score from a confusion matrix.
"""
tp, tn, fp, fn = extract_scores(confusion_matrix)
precision = tp/(tp+fp)
recall = tp/(tp+fn)
f1 = (2*precision*recall)/(precision+recall)
return f1
```
%% Cell type:code id: tags:
``` python
# accuracy = ...
# f1 = ...
```
%% Cell type:markdown id: tags:
*Click on the dots to display the solution*
%% Cell type:code id: tags:
``` python
accuracy = accuracy_score(cm)
f1 = f1_score(cm)
print ("test accuracy: %.2f" % accuracy)
print ("test f1 score: %.2f" % f1)
```
%% Cell type:markdown id: tags:
## Programming Assignment
> Solve the following Programming assignment and check your solution in the Ilias Quiz **05A Supervised Learning: Classification - Notebook Verification**.
%% Cell type:markdown id: tags:
In the previous examples you implemented Logistic Regression from scratch. Now you are going to repeat the calculations using scikit-learn's implementation of Logistic Regression. Use the data of the Multiple Logistic Regression example, also using the identical train/test splits.
%% Cell type:markdown id: tags:
Train the Logistic Regression. What is the score on the test set?
%% Cell type:code id: tags:
``` python
```
%% Cell type:markdown id: tags:
What is the default metric of the `score` function of scikit-learn's implementation of LogisticRegression?
%% Cell type:code id: tags:
``` python
```
%% Cell type:markdown id: tags:
Check your answers in the Ilias Quiz.
%% Cell type:markdown id: tags:
**Solution needs to be removed later**
%% Cell type:code id: tags:
``` python
from sklearn.linear_model import LogisticRegression