Although linear regression is a linear machine learning method, you can model nonlinear dependencies by transforming some of the independent variables with a nonlinear function. By doing this, you can improve the fit of your model. Let us demonstrate this on a house price dataset from [Kaggle](https://www.kaggle.com/harlfoxem/housesalesprediction). Note that this dataset is not identical to the one you used in the linear regression exercise, since that dataset is too small and would lead to unreliable evaluation results.
%% Cell type:code id: tags:
``` python
df_house=pd.read_csv("kc_house_data.csv")
df_house.head()
```
%% Cell type:markdown id: tags:
We would like to have a simple linear regression problem with only one independent variable. Thus, we only keep *price* and *sqft_living*.
As we can see, by adding $x^2$ as an additional independent variable we could slightly improve our performance.
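A minimal sketch of this idea, assuming the `df_house` dataframe loaded above and using scikit-learn's `LinearRegression` (the variable names are ours and the omitted cells may fit the model differently):
%% Cell type:code id: tags:
``` python
from sklearn.linear_model import LinearRegression

# Single feature plus its square; the model is still linear in the parameters
X_poly = df_house[["sqft_living"]].copy()
X_poly["sqft_living_sq"] = X_poly["sqft_living"] ** 2
y_price = df_house["price"]

model = LinearRegression().fit(X_poly, y_price)
print(model.score(X_poly, y_price))  # R^2 on the training data
```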
%% Cell type:markdown id: tags:
Let's see whether we can further improve the performance by adding more polynomial features. To generate the polynomial features we will use the Scikit-Learn class [PolynomialFeatures](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.PolynomialFeatures.html).
What do you observe when you increase the polynomial degree?
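A minimal sketch of how `PolynomialFeatures` can be combined with a linear model (the use of a scikit-learn `Pipeline` and the chosen degrees are assumptions for illustration):
%% Cell type:code id: tags:
``` python
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

X_sqft = df_house[["sqft_living"]]
y_price = df_house["price"]

for degree in [1, 2, 3, 5]:
    # PolynomialFeatures expands x into [1, x, x^2, ..., x^degree]
    model = make_pipeline(PolynomialFeatures(degree=degree), LinearRegression())
    model.fit(X_sqft, y_price)
    print(degree, model.score(X_sqft, y_price))
```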
%% Cell type:markdown id: tags:
> Answer the question on ILIAS
%% Cell type:markdown id: tags:
## Regularization
%% Cell type:markdown id: tags:
The effect of overfitting can be reduced by regularization. Implement the regularized version of linear regression:

$$\Theta := \left(X^{\top}X + \lambda \begin{bmatrix}
0 & 0 & \cdots & 0 \\
0 & 1 & & \\
\vdots & & \ddots & \\
0 & & & 1
\end{bmatrix}\right)^{-1} X^{\top}y$$
%% Cell type:code id: tags:
``` python
def fit_reg(X, y, lam):
    # START YOUR CODE
    # END YOUR CODE
    return thetas
```
%% Cell type:markdown id: tags:
*Click on the dots to display the solution*
%% Cell type:code id: tags:
``` python
def fit_reg(X, y, lam):
    Xt = np.transpose(X)
    XtX = np.dot(Xt, X)
    # Regularization matrix: identity with a zero for the bias term
    I = np.identity(XtX.shape[0])
    I[0, 0] = 0
    XtX = XtX + (lam * I)
    XtXm1 = np.linalg.inv(XtX)
    Xty = np.dot(Xt, y)
    thetas = np.dot(XtXm1, Xty)
    return thetas
```
%% Cell type:markdown id: tags:
You can check your implementation by executing the following cell:
> Solve the following Programming assignment and check your solution in the Ilias Quiz **Linear Regression and Regularization - Notebook Verification**.
%% Cell type:markdown id: tags:
You previously implemented Linear Regression from scratch. In this Programming assignment you are asked to use the scikit-learn implementation of Linear Regression instead ([scikit-learn documentation](https://scikit-learn.org/stable/)). Use the same data as before.
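A minimal sketch of how this could look (the variable names `X_train`, `y_train`, `X_test`, `y_test` are assumptions; adapt them to the data preparation used above):
%% Cell type:code id: tags:
``` python
from sklearn.linear_model import LinearRegression

sk_model = LinearRegression()
sk_model.fit(X_train, y_train)
print(sk_model.intercept_, sk_model.coef_)  # theta_0 and the remaining parameters
print(sk_model.score(X_test, y_test))       # R^2 on the test set
```
%% Cell type:markdown id: tags: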
Firstly, we demonstrate gradient descent on a simple linear regression problem with one dependent and one independent variable.
%% Cell type:code id: tags:
``` python
X=np.array([1,1,2,3,4,5,6,7,8,9,10,10])
y=np.array([1,2,3,1,4,5,6,4,7,10,15,9])
```
%% Cell type:markdown id: tags:
The x and y values are plotted in a scatter plot.
%% Cell type:code id: tags:
``` python
plt.plot(X,y,'bo')
plt.show()
```
%% Cell type:markdown id: tags:
We then try to fit the points by a straight line.
%% Cell type:code id: tags:
``` python
theta0=-0.5
theta1=1
```
%% Cell type:code id: tags:
``` python
def predict(X, theta0, theta1):
    y_pred = theta0 + theta1 * X
    return y_pred

y_pred = predict(X, theta0, theta1)
```
%% Cell type:code id: tags:
``` python
def plot_regression_line(X, theta0, theta1, ax=None):
    if ax is None:
        fig, ax = plt.subplots()
    x = np.arange(X.min() - 1, X.max() + 1, 1).reshape(-1, 1)
    y_pred = predict(x, theta0, theta1)
    ax.plot(x, y_pred, color="r")

ax = sns.scatterplot(x=X, y=y)
plot_regression_line(X, theta0, theta1, ax)
plt.show()
```
%% Cell type:markdown id: tags:
This does not look so bad. Let's implement a gradient descent algorithm to do this automatically.
%% Cell type:markdown id: tags:
### Cost function
We define a cost function that determines the mean squared error of the predicted and the actual y coordinates. To get rid of the factor 2 in the gradient
formula, we divide the sum by 2.
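Concretely, with $n$ training examples and predictions $\hat{y}_i$, the cost is
$$J(\theta_0,\theta_1)=\frac{1}{2n}\sum_{i=1}^{n}\left(\hat{y}_i-y_i\right)^2.$$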
%% Cell type:markdown id: tags:
> Implement the MSE cost function
%% Cell type:code id: tags:
``` python
def cost(y, y_pred):
    # START YOUR CODE
    # END YOUR CODE
    return cost
```
%% Cell type:markdown id: tags:
*Click on the dots to display the solution*
%% Cell type:code id: tags:
``` python
def cost(y, y_pred):
    # Mean squared error, divided by an additional factor of 2
    cost = np.sum((y_pred - y) ** 2) / (2 * len(y))
    return cost
```
%% Cell type:code id: tags:
``` python
cost(y,y_pred)
```
%% Cell type:markdown id: tags:
### Calculate gradient
Next, let us determine the gradient of the cost function with respect to the parameters.
%% Cell type:markdown id: tags:
**Programming Assignment - Verification on Ilias**
%% Cell type:markdown id: tags:
> Implement the `gradient` function
%% Cell type:code id: tags:
``` python
def gradient(X, y, theta0, theta1):
    # START YOUR CODE
    # END YOUR CODE
    return grad_theta0, grad_theta1
```
%% Cell type:markdown id: tags:
*Hint: Carefully look at the definition of the cost function of Linear Regression, to calculate the gradient & take care of dimensions*
%% Cell type:markdown id: tags:
**Report the value of the gradients in the Ilias Quiz 04B Notebook Verification**
%% Cell type:code id: tags:
``` python
gradient(X,y,theta0,theta1)
```
%% Cell type:markdown id: tags:
### Batch Gradient Descent
%% Cell type:markdown id: tags:
> Now complete the `fit` function by iteratively updating our model parameters.
To visualize how the parameters and the cost change with each epoch, we store them in a dictionary.
We can then visualize the learning process by plotting the validation curve, which shows how the cost decreases as the number of epochs increases.
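Below is a minimal sketch of one possible batch gradient descent loop (the learning rate, the number of epochs and the keys of the `hist` dictionary are assumptions; the plotting code below only requires that `hist["cost"]` holds the cost per epoch):
%% Cell type:code id: tags:
``` python
def fit(X, y, theta0, theta1, lr=0.01, epochs=100):
    # Track the parameters and the cost after every epoch
    hist = {"theta0": [], "theta1": [], "cost": []}
    for _ in range(epochs):
        grad_theta0, grad_theta1 = gradient(X, y, theta0, theta1)
        theta0 = theta0 - lr * grad_theta0
        theta1 = theta1 - lr * grad_theta1
        hist["theta0"].append(theta0)
        hist["theta1"].append(theta1)
        hist["cost"].append(cost(y, predict(X, theta0, theta1)))
    return theta0, theta1, hist

# theta0, theta1, hist = fit(X, y, theta0=-0.5, theta1=1)
```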
%% Cell type:code id: tags:
``` python
def plot_validation_curve(data, ax=None, ylim=None):
    if ax is None:
        fig, ax = plt.subplots()
    ax.set_title("Validation Curve")
    ax.set_ylabel("Cost")
    if ylim is not None:
        ax.set_ylim(ylim)
    ax.set_xlabel("Epochs")
    ax.plot(data)

plot_validation_curve(hist["cost"])
```
%% Cell type:markdown id: tags:
Using our history, we can now visualize how the parameters change with each epoch.
%% Cell type:markdown id: tags:
We now extend our code to multiple linear regression. We will use the autoscout dataset from the previous exercises. First we apply the data cleaning and then z-normalise our data.
> z-normalise the training and test data by using the [StandardScaler](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html)
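A minimal sketch of this step (assuming the cleaned data has already been split into `X_train` and `X_test`):
%% Cell type:code id: tags:
``` python
from sklearn.preprocessing import StandardScaler

# Fit the scaler on the training data only, then apply it to both splits
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
```
%% Cell type:markdown id: tags: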
We modify our predict function so that instead of providing $\theta_0$ and $\theta_1$ we now provide the bias ($\theta_0$) and the remaining parameters $\Theta$ as an array.
> Implement the `predict` function
%% Cell type:code id: tags:
``` python
def predict(X, bias, thetas):
# START YOUR CODE
# END YOUR CODE
return y_pred
```
%% Cell type:markdown id: tags:
*Click on the dots to display the solution*
%% Cell type:code id: tags:
``` python
def predict(X, bias, thetas):
y_pred = bias + np.dot(X, thetas)
return y_pred
```
%% Cell type:markdown id: tags:
> Implement the `gradient` function
%% Cell type:code id: tags:
``` python
def gradient(X, y, bias, thetas):
# START YOUR CODE
# END YOUR CODE
return grad_bias, grad_thetas
```
%% Cell type:markdown id: tags:
*Click on the dots to display the solution*
%% Cell type:code id: tags:
``` python
def gradient(X, y, bias, thetas):
y_pred = predict(X, bias, thetas)
diff = y_pred - y
n = len(X)
grad_bias = np.sum(diff) / n
grad_thetas = np.dot(diff, X) / n
return grad_bias, grad_thetas
```
%% Cell type:markdown id: tags:
We extend our `fit` function by tracking not only the cost but also the $R^2$ score.
Compared to the previous exercise, where we calculated the estimates for $\Theta$ in closed form using the normal equation, we get almost the same result with the gradient descent algorithm.
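As a reminder, the $R^2$ score compares the residual sum of squares to the total sum of squares. A minimal sketch (the helper name `r2_score_manual` is an assumption):
%% Cell type:code id: tags:
``` python
def r2_score_manual(y, y_pred):
    # R^2 = 1 - SS_res / SS_tot
    ss_res = np.sum((y - y_pred) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1 - ss_res / ss_tot
```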
%% Cell type:markdown id: tags:
### Minibatch Gradient Descent
%% Cell type:markdown id: tags:
> Now modify our `fit` function to use mini-batch gradient descent. So instead of calculating the gradient on the whole dataset in each step, only use a subset of the data.
We have now introduced an additional hyperparameter `batch_size`.
* If we set `batch_size` equal to 1, we use Stochastic Gradient Descent: we update our model parameters $\Theta$ after each training example.
* If we set `batch_size` equal to the number of training samples, we have Batch Gradient Descent again: we use all training samples to update the model parameters $\Theta$.
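A minimal sketch of the mini-batch update loop (the learning rate, the number of epochs and the shuffling strategy are assumptions):
%% Cell type:code id: tags:
``` python
def fit_minibatch(X, y, bias, thetas, lr=0.01, epochs=100, batch_size=32):
    # X and y are assumed to be NumPy arrays
    n = len(X)
    for _ in range(epochs):
        # Shuffle so that every epoch sees the batches in a different order
        idx = np.random.permutation(n)
        for start in range(0, n, batch_size):
            batch = idx[start:start + batch_size]
            grad_bias, grad_thetas = gradient(X[batch], y[batch], bias, thetas)
            bias = bias - lr * grad_bias
            thetas = thetas - lr * grad_thetas
    return bias, thetas
```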
%% Cell type:markdown id: tags:
#### Batch Gradient Descent
We run batch gradient descent and see what happens.
%% Cell type:markdown id: tags:
We now turn to classification. Consider the case where random numbers are created by two different Gaussian distributions with identical variance, and we also know from which distribution each number originates. The generated data could, for example, represent how many days a student has studied for the ML exam, with the target variable being whether the student has passed.
%% Cell type:code id: tags:
``` python
students_passed=np.random.normal(5,0.7,100)
students_passed[1:20]
```
%% Cell type:code id: tags:
``` python
students_failed=np.random.normal(2,0.7,100)
students_failed[1:20]
```
%% Cell type:markdown id: tags:
To use this data for a logistic regression model, we combine the vectors `students_passed` and `students_failed` into a vector $X$ and create the corresponding labels $y$.
We plot both types of points in a scatter plot, where the points generated by the first distribution (plotted in blue) have the label $y=0$, while the points of the second distribution (plotted in orange) are at $y=1$.
%% Cell type:code id: tags:
``` python
# X and y are assumed to come from the (omitted) cell above,
# where students_passed and students_failed were combined.
fig, ax = plt.subplots()
sns.scatterplot(x=X, y=y, hue=y, ax=ax)
ax.set_xlabel('days spent learning for the ML exam')
ax.set_ylabel('if students passed')
plt.show()
```
%% Cell type:markdown id: tags:
Now we would like to determine whether an arbitrary, previously unseen point belongs to distribution 1 or to distribution 2. For that, we want to employ logistic regression. Similar to linear regression, we first consider a model with a single independent variable and two parameters $\theta_0$ and $\theta_1$.
The probability that $x$ belongs to either of the two classes is determined using the sigmoid function.
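As a minimal sketch of this model (the helper names are assumptions; the decision boundary plotted below is typically where this probability crosses 0.5):
%% Cell type:code id: tags:
``` python
def sigmoid(z):
    # Maps any real number into the interval (0, 1)
    return 1 / (1 + np.exp(-z))

def predict_proba(x, theta0, theta1):
    # Probability that a sample with feature value x belongs to class 1
    return sigmoid(theta0 + theta1 * x)
```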
%% Cell type:code id: tags:
``` python
# theta0, theta1 and plot_decision_boundary are assumed to be defined
# in the (omitted) cells above.
fig, ax = plt.subplots()
sns.scatterplot(x=X, y=y, hue=y, ax=ax)
ax.set_xlabel('days spent learning for the ML exam')
ax.set_ylabel('if students passed')
plot_decision_boundary(X, theta0, theta1, ax)
plt.show()
```
%% Cell type:markdown id: tags:
### Cost function
The cross-entropy cost function $J(\boldsymbol\theta)$, where $\boldsymbol\theta=\left(\theta_0,\theta_1\right)$ is given by
$$
J(\boldsymbol\theta) =
-\frac{1}{n} \sum_{i=1}^n%
\left[y_i\log h(\boldsymbol\theta,\mathbf{X_i})
+ (1-y_i)\log\left(
1-h(\boldsymbol\theta,\mathbf{X_i})\right)\right]
$$
where $h(\boldsymbol\theta,\mathbf{X_i})=\sigma\left(\mathbf{X_i}^T\boldsymbol\theta\right)=\sigma\left(\theta_0+\theta_1 x\right)$ and $\sigma$ is the sigmoid function.
%% Cell type:markdown id: tags:
> Implement the cost function. Verify your code by running the next cell.
We label a point as 1 if the predicted probability is larger than 0.5.
%% Cell type:code id: tags:
``` python
# y_pred_class = ...
```
%% Cell type:markdown id: tags:
*Click on the dots to display the solution*
%% Cell type:code id: tags:
``` python
y_pred_class=y_pred>0.5
```
%% Cell type:code id: tags:
``` python
accuracy=accuracy_score(y,y_pred_class)
print("Accuracy: ",accuracy)
```
%% Cell type:markdown id: tags:
## Part 2 - Multiple Logistic Regression - Toy example
In the second part, logistic regression is used in a 2D toy example. The data is loaded from a `.csv` file, but it was generated artificially for illustration purposes. It can, for example, correspond to
* feature 1: days spent learning for the ML exam
* feature 2: days spent working in the ML domain (prior experience)
* target variable: if students have passed the exam
How should we evaluate our result? Of course this is highly dependent on both our original business problem and the data at hand. Relevant questions include:
* Does the evaluation result need to be explainable to management, without using formulas and technical terms?
* Do we have a high class imbalance?
* Are False Positives and False Negatives equally bad? Does one of the two incur a high cost for our business and needs to be avoided?
* How do we rate the confidence? Do we want to penalise a classifier when it classifies a sample wrongly but is very sure of this result?
We will look at the metrics Accuracy and F1-Score.
%% Cell type:markdown id: tags:
> Predict the data on the test set.
%% Cell type:code id: tags:
``` python
# y_pred = ...
```
%% Cell type:markdown id: tags:
*Click on the dots to display the solution*
%% Cell type:code id: tags:
``` python
y_pred = predict(X_test, bias_2d, thetas_2d)
y_pred = (y_pred > 0.5).astype(int)
y_pred
```
%% Cell type:markdown id: tags:
#### Confusion Matrix
First we compute and plot the confusion matrix using the utility methods `compute_confusion_matrix` and `plot_confusion_matrix`.
%% Cell type:code id: tags:
``` python
def compute_confusion_matrix(true, pred):
    # number of classes
    K = len(np.unique(true))
    # rows correspond to the true class, columns to the predicted class
    c_mat = np.zeros((K, K))
    for i in range(len(true)):
        c_mat[int(true[i])][int(pred[i])] += 1
    return c_mat
def plot_confusion_matrix(cm):
fig, (ax1) = plt.subplots(ncols=1, figsize=(5,5))
sns.heatmap(cm,
xticklabels=['True', 'False'],
yticklabels=['True', 'False'],
annot=True,ax=ax1,
linewidths=.2,linecolor="Darkblue", cmap="Blues")
plt.title('Confusion Matrix', fontsize=14)
plt.show()
```
%% Cell type:code id: tags:
``` python
# cm = ...
# plot...
```
%% Cell type:markdown id: tags:
*Click on the dots to display the solution*
%% Cell type:code id: tags:
``` python
cm = compute_confusion_matrix(y_test, y_pred)
plot_confusion_matrix(cm)
```
%% Cell type:markdown id: tags:
> Finally calculate and print the accuracy and the f1 score.
%% Cell type:code id: tags:
``` python
def extract_scores(confusion_matrix):
"""
Extracts the tp, tn, fp, fn from the
confusion matrix.
"""
    # true positive (class 0 is treated as the positive class here)
    tp = confusion_matrix[0][0]
    # true negative
    tn = confusion_matrix[1][1]
    # false positive: true class 1 predicted as class 0
    fp = confusion_matrix[1][0]
    # false negative: true class 0 predicted as class 1
    fn = confusion_matrix[0][1]
return tp, tn, fp, fn
def accuracy_score(confusion_matrix):
"""
Computes the accuracy from a confusion matrix.
"""
tp, tn, fp, fn = extract_scores(confusion_matrix)
acc = (tp + tn)/np.sum(confusion_matrix)
return acc
def f1_score(confusion_matrix):
"""
Computes the f1 score from a confusion matrix.
"""
tp, tn, fp, fn = extract_scores(confusion_matrix)
precision = tp/(tp+fp)
recall = tp/(tp+fn)
f1 = (2*precision*recall)/(precision+recall)
return f1
```
%% Cell type:code id: tags:
``` python
# accuracy = ...
# f1 = ...
```
%% Cell type:markdown id: tags:
*Click on the dots to display the solution*
%% Cell type:code id: tags:
``` python
accuracy = accuracy_score(cm)
f1 = f1_score(cm)
print ("test accuracy: %.2f" % accuracy)
print ("test f1 score: %.2f" % f1)
```
%% Cell type:markdown id: tags:
## Programming Assignment
> Solve the following Programming assignment and check your solution in the Ilias Quiz **05A Supervised Learning: Classification - Notebook Verification**.
%% Cell type:markdown id: tags:
In the previous examples you implemented Logistic Regression from scratch. Now you are going to repeat the calculations using scikit-learn's implementation of Logistic Regression. Use the data of the Multiple Logistic Regression example, also using the identical train/test splits.
%% Cell type:markdown id: tags:
Train the Logistic Regression. What is the score on the test set?
%% Cell type:code id: tags:
``` python
```
%% Cell type:markdown id: tags:
What is the default metric of the `score` function of scikit-learn's implementation of LogisticRegression?
%% Cell type:code id: tags:
``` python
```
%% Cell type:markdown id: tags:
Check your answers in the Ilias Quiz.
%% Cell type:markdown id: tags:
**Solution needs to be removed later**
%% Cell type:code id: tags:
``` python
from sklearn.linear_model import LogisticRegression