%% Cell type:markdown id: tags:
# Introduction to Neural Networks
This tutorial is inspired by an excellent blog post from [Victor Zhou](https://victorzhou.com/blog/intro-to-neural-networks/). We encourage everyone to check out the post as well as Victor's blog for great and easy-to-understand resources.
**DISCLAIMER**:
>The code below is intended to be educational and straightforward, **not** optimal. Do not use this implementation for projects. A deeper understanding of neural networks is paramount for this course and can best be developed by implementing a neural network and its learning procedure from scratch. We leave out all unnecessary complexity, i.e. there is no error handling, output formatting, fancy visualization, etc. Just beautifully minimal code :-)
%% Cell type:code id: tags:
``` python
import numpy as np
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings("ignore", category=FutureWarning)
```
%% Cell type:markdown id: tags:
# A single Neuron
![neuron.jpg](attachment:neuron.jpg)
%% Cell type:markdown id: tags:
### Activation Function
We take the [sigmoid function](https://en.wikipedia.org/wiki/Sigmoid_function) as activation function $\varphi$ in this tutorial.
%% Cell type:code id: tags:
``` python
def sigmoid(x):
# START YOUR CODE
# END YOUR CODE
pass
```
%% Cell type:markdown id: tags:
*Click on the dots to display the solution*
%% Cell type:code id: tags:
``` python
def sigmoid(x):
# activation function: f(x) = 1 / (1 + e^(-x))
return 1 / (1 + np.exp(-x))
```
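%% Cell type:markdown id: tags:
As a quick visual check, the following sketch plots the sigmoid over the interval $[-10, 10]$ (the range is an arbitrary choice) to show how it squashes any input into $(0, 1)$.
%% Cell type:code id: tags:
``` python
# Plot the sigmoid over an arbitrary range to visualize its S-shape
xs = np.linspace(-10, 10, 200)
plt.plot(xs, sigmoid(xs))
plt.title("Sigmoid activation function")
plt.xlabel("x")
plt.ylabel("sigmoid(x)")
plt.show()
```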
%% Cell type:markdown id: tags:
### Implementation of a single Neuron
%% Cell type:code id: tags:
``` python
class Neuron:
def __init__(self, weights, bias):
# weights as vector, bias as number, weights and input have same length
self.weights = weights
self.bias = bias
def feedforward(self, inputs):
# weight inputs, add bias and apply the activation function
# START YOUR CODE
# END YOUR CODE
pass
```
%% Cell type:markdown id: tags:
*Click on the dots to display the solution*
%% Cell type:code id: tags:
``` python
class Neuron:
def __init__(self, weights, bias):
# weights as vector, bias as number, weights and input have same length
self.weights = weights
self.bias = bias
def feedforward(self, inputs):
# weight inputs, add bias and apply the activation function
total = np.dot(self.weights, inputs) + self.bias
return sigmoid(total)
```
%% Cell type:markdown id: tags:
For testing, let us define a two-input neuron with weights $w = [w_1, w_2] = [4,5]$ and bias $b = 2$. We take input $x = [x_1, x_2] = [2,3]$.
%% Cell type:code id: tags:
``` python
# Define the neuron as specified above
#bias = ...
#weights = ...
#neuron = ...
# Run the feed-forward pass with the given input
#x = ...
#output_neuron = ...
#print("Result of the feed-forward pass: {:.6f}".format(output_neuron))
```
%% Cell type:markdown id: tags:
*Click on the dots to display the solution*
%% Cell type:code id: tags:
``` python
# Define the neuron as specified above
bias = 2
weights = np.asarray([4,5])
neuron = Neuron(weights, bias)
# Run the feed-forward pass with the given input
x = np.asarray([2,3])
output_neuron = neuron.feedforward(x)
print("Result of the feed-forward pass: {:.6f}".format(output_neuron))
```
%%%% Output: stream
Result of the feed-forward pass: 1.000000
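%% Cell type:markdown id: tags:
The result is easy to verify by hand: the weighted sum is $w \cdot x + b = 4 \cdot 2 + 5 \cdot 3 + 2 = 25$, and $\varphi(25) = \frac{1}{1 + e^{-25}} \approx 1$, because the sigmoid saturates for large inputs.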
%% Cell type:markdown id: tags:
# A not so Deep Neural Network
![deepnet.jpg](attachment:deepnet.jpg)
%% Cell type:code id: tags:
``` python
class FirstNeuralNetwork:
"""
Neural Network consisting of:
- 2 inputs (x1, x2)
- 1 hidden layer, with 2 Neurons (h1, h2)
- 1 output layer, with 1 Neuron (o1)
All three neurons have the same weight and bias for the moment.
This is really just for illustration. The more general case with trainable weights follows below.
"""
def __init__(self, weights, bias):
self.weights = weights
self.bias = bias
# define neurons of the hidden layer
self.h1 = Neuron(self.weights, self.bias)
self.h2 = Neuron(self.weights, self.bias)
# define neuron of the output layer
self.o1 = Neuron(self.weights, self.bias)
def feedforward(self, x):
# calculate the output of the hidden layer
out_h1 = self.h1.feedforward(x)
out_h2 = self.h2.feedforward(x)
# calculate the output of the neural network
out = self.o1.feedforward(np.asarray([out_h1, out_h2]))
return out
```
%% Cell type:markdown id: tags:
For testing, let us define weights $w = [w_1, w_2] = [0,1]$ and bias $b = 0$. We take input $x = [x_1, x_2] = [2,3]$.
%% Cell type:code id: tags:
``` python
# Define the neural network as specified above
#bias = ...
#weights = ...
#network = ...
# Run the feed-forward pass through the network
#x = ...
#output_nn = ...
#print("Result of the feed-forward pass: {:.6}".format(output_nn))
```
%% Cell type:markdown id: tags:
*Click on the dots to display the solution*
%% Cell type:code id: tags:
``` python
# Define the neural network as specified above
bias = 0
weights = np.asarray([0,1])
network = FirstNeuralNetwork(weights, bias)
# Run the feed-forward pass through the network
x = np.asarray([2,3])
output_nn = network.feedforward(x)
print("Result of the feed-forward pass: {:.6}".format(output_nn))
```
%%%% Output: stream
Result of the feed-forward pass: 0.721633
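%% Cell type:markdown id: tags:
Again, this can be verified by hand: both hidden neurons compute $h_1 = h_2 = \varphi(0 \cdot 2 + 1 \cdot 3 + 0) = \varphi(3) \approx 0.9526$, and the output neuron computes $o_1 = \varphi(0 \cdot 0.9526 + 1 \cdot 0.9526 + 0) = \varphi(0.9526) \approx 0.7216$.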
%% Cell type:markdown id: tags:
# Training a Neural Network
We implement back-propagation to determine the weights and biases that minimize a loss (or cost) function.
%% Cell type:markdown id: tags:
### Mean Squared Error as Loss Function
%% Cell type:code id: tags:
``` python
def mse_loss(y_true, y_pred):
# START YOUR CODE
# END YOUR CODE
pass
```
%% Cell type:markdown id: tags:
*Click on the dots to display the solution*
%% Cell type:code id: tags:
``` python
def mse_loss(y_true, y_pred):
# y_true and y_pred are arrays of the same length
return ((y_true - y_pred) ** 2).mean()
```
%% Cell type:code id: tags:
``` python
y_true = np.array([1, 0, 0, 1])
y_pred = np.array([0, 0, 1, 0])
print("Calculated loss: {}".format(mse_loss(y_true, y_pred)))
```
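%% Cell type:markdown id: tags:
Worked by hand: the squared errors are $(1-0)^2, (0-0)^2, (0-1)^2, (1-0)^2 = 1, 0, 1, 1$, so the mean squared error is $\frac{3}{4} = 0.75$.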
%% Cell type:markdown id: tags:
### Derivative of the Activation Function
%% Cell type:code id: tags:
``` python
def deriv_sigmoid(x):
# START YOUR CODE
# END YOUR CODE
pass
```
%% Cell type:markdown id: tags:
*Click on the dots to display the solution*
%% Cell type:code id: tags:
``` python
def deriv_sigmoid(x):
# derivative of the sigmoid: f'(x) = f(x) * (1 - f(x))
fx = sigmoid(x)
return fx * (1 - fx)
```
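%% Cell type:markdown id: tags:
As an optional sanity check, the following sketch compares the analytic derivative with a central finite-difference approximation at a single point (the test point and step size are arbitrary choices).
%% Cell type:code id: tags:
``` python
# Compare the analytic sigmoid derivative with a finite-difference estimate
x0 = 0.5      # arbitrary test point
eps = 1e-6    # arbitrary step size
numeric = (sigmoid(x0 + eps) - sigmoid(x0 - eps)) / (2 * eps)
analytic = deriv_sigmoid(x0)
print("Finite difference: {:.8f}".format(numeric))
print("Analytic:          {:.8f}".format(analytic))
```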
%% Cell type:markdown id: tags:
### Trainable Neural Network
%% Cell type:code id: tags:
``` python
class FullNeuralNetwork:
"""
Neural Network consisting of:
- 2 inputs (x1, x2)
- 1 hidden layer, with 2 neurons (h1, h2)
- 1 output layer, with 1 neuron (o1)
"""
def __init__(self):
raise NotImplementedError()
def feedforward(self, x):
raise NotImplementedError()
def train(self, data, labels):
raise NotImplementedError()
```
%% Cell type:markdown id: tags:
We implement the `__init__()` function by setting all parameters (weights and biases) to random values.
%% Cell type:code id: tags:
``` python
class FullNeuralNetwork(FullNeuralNetwork):
def __init__(self):
# Define the weights in the network
self.w1 = np.random.normal()
self.w2 = np.random.normal()
self.w3 = np.random.normal()
self.w4 = np.random.normal()
self.w5 = np.random.normal()
self.w6 = np.random.normal()
# Define the biases in the network
self.b1 = np.random.normal()
self.b2 = np.random.normal()
self.b3 = np.random.normal()
# Step-size in gradient descent as hyperparameter
self.learn_rate = 0.1
# Number of loops over the entire dataset as hyperparameter
self.epochs = 1000
```
%% Cell type:markdown id: tags:
We implement the `feedforward()` function just like above.
%% Cell type:code id: tags:
``` python
class FullNeuralNetwork(FullNeuralNetwork):
def feedforward(self, x):
# START YOUR CODE
# END YOUR CODE
pass
```
%% Cell type:markdown id: tags:
*Click on the dots to display the solution*
%% Cell type:code id: tags:
``` python
class FullNeuralNetwork(FullNeuralNetwork):
def feedforward(self, x):
# Output of the hidden layers (h1, h2)
out_h1 = sigmoid(self.w1 * x[0] + self.w2 * x[1] + self.b1)
out_h2 = sigmoid(self.w3 * x[0] + self.w4 * x[1] + self.b2)
# Output of the neural network (o1)
out_o1 = sigmoid(self.w5 * out_h1 + self.w6 * out_h2 + self.b3)
return out_o1
```
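%% Cell type:markdown id: tags:
As a quick check (the seed and the sample input below are arbitrary choices), the randomly initialized network should already produce an output between $0$ and $1$ for any input:
%% Cell type:code id: tags:
``` python
# Feed a sample input through the untrained, randomly initialized network
np.random.seed(0)  # arbitrary seed, only for reproducibility
untrained_network = FullNeuralNetwork()
print("Output for input [2, 3]: {:.6f}".format(untrained_network.feedforward(np.asarray([2, 3]))))
```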
%% Cell type:markdown id: tags:
Training with stochastic gradient descent and back-propagation is split into three parts. For each sample in the dataset:
1. Run a forward pass through the network and save the intermediate results
2. Calculate the partial derivatives
3. Update the weights and biases
But first, we will look at how we can calculate partial derivatives and use them to perform back-propagation.
Imagine we wanted to tweak $w_1$. How would the loss $L$ change if we changed $w_1$? That’s a question the partial derivative $\frac{\partial L}{\partial w_1}$ can answer. But how do we calculate it?
To start, let’s rewrite the partial derivative in terms of $\frac{\partial y_{pred}}{\partial w_1}$ instead, using the [Chain Rule](https://en.wikipedia.org/wiki/Chain_rule):
$$\frac{\partial L}{\partial w_1} = \frac{\partial L}{\partial y_{pred}} * \frac{\partial y_{pred}}{\partial w_1}$$
We can calculate $\frac{\partial L}{\partial y_{pred}}$ because, for a single sample with true label $y_{true} = 1$, our loss function (the mean squared error, MSE) reduces to $L = (1 - y_{pred})^2$:
$$\frac{\partial L}{\partial y_{pred}} = \frac{\partial (1 - y_{pred})^2}{\partial y_{pred}} = -2(1 - y_{pred})$$
Now, let’s figure out what to do with $\frac{\partial y_{pred}}{\partial w_1}$. Let $h_1, h_2, o_1$ be the outputs of the neurons they represent and $f$ is the sigmoid activation function. Then
$$y_{pred} = o_1 = f(w_5h_1 + w_6h_2 + b_3)$$
And since $w_1$ only affects $h_1$ (not $h_2$), we can write
$$\frac{\partial y_{pred}}{\partial w_1} = \frac{\partial y_{pred}}{\partial h_1} * \frac{\partial h_1}{\partial w_1}$$
$$\frac{\partial y_{pred}}{\partial h_1} = w_5 * f'(w_5h_1 + w_6h_2 + b_3)$$
We do the same thing for $\frac{\partial h_1}{\partial w_1}$:
$$h_1 = f(w_1x_1 + w_2x_2 + b_1)$$
$$\frac{\partial h_1}{\partial w_1} = x_1 * f'(w_1x_1 + w_2x_2 + b_1)$$
This is the second time we’ve seen $f'(x)$ (the derivative of the sigmoid function), so let’s derive it:
$$ f(x) = \frac{1}{1 + e^{-x}}$$
$$f'(x) = \frac{e^{-x}}{(1 + e^{-x})^2} = f(x) * (1 - f(x))$$
We’re done! We’ve managed to break down $\frac{\partial L}{\partial w_1}$ into several parts we can calculate:
$$\boxed{\frac{\partial L}{\partial w_1} = \frac{\partial L}{\partial y_{pred}} * \frac{\partial y_{pred}}{\partial h_1} * \frac{\partial h_1}{\partial w_1}}$$
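%% Cell type:markdown id: tags:
Before wiring this into the training loop, the following sketch numerically verifies the boxed expression on a single sample with $y_{true} = 1$ (as in the derivation above): the analytic $\frac{\partial L}{\partial w_1}$ from the chain rule is compared against a finite-difference estimate obtained by nudging $w_1$. All concrete values (seed, input, step size) are arbitrary choices.
%% Cell type:code id: tags:
``` python
# Numerical check of the chain-rule expression for dL/dw1 on one sample
np.random.seed(1)                      # arbitrary seed
net = FullNeuralNetwork()
x, y_true = np.array([2.0, 3.0]), 1.0  # arbitrary sample, y_true = 1 as in the derivation

# Analytic gradient, assembled exactly as in the boxed chain rule
sum_h1 = net.w1 * x[0] + net.w2 * x[1] + net.b1
sum_h2 = net.w3 * x[0] + net.w4 * x[1] + net.b2
sum_o1 = net.w5 * sigmoid(sum_h1) + net.w6 * sigmoid(sum_h2) + net.b3
out_o1 = sigmoid(sum_o1)
d_L_d_ypred = -2 * (y_true - out_o1)
d_ypred_d_h1 = net.w5 * deriv_sigmoid(sum_o1)
d_h1_d_w1 = x[0] * deriv_sigmoid(sum_h1)
analytic = d_L_d_ypred * d_ypred_d_h1 * d_h1_d_w1

# Finite-difference estimate: nudge w1 up and down and observe the loss
eps = 1e-6
net.w1 += eps
loss_plus = (y_true - net.feedforward(x)) ** 2
net.w1 -= 2 * eps
loss_minus = (y_true - net.feedforward(x)) ** 2
net.w1 += eps  # restore the original weight
numeric = (loss_plus - loss_minus) / (2 * eps)

print("Analytic dL/dw1:          {:.8f}".format(analytic))
print("Finite-difference dL/dw1: {:.8f}".format(numeric))
```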
%% Cell type:code id: tags:
``` python
class FullNeuralNetwork(FullNeuralNetwork):
def train(self, data, labels):
"""
- data: array of size (n x 2), n = number of samples in the dataset, 2 input features
- labels: numpy array with n elements, one label per sample
"""
history_loss = []
# Perform several loops over the entire training set (epochs)
for epoch in range(self.epochs):
# Loop over the entire dataset
for x, y_true in zip(data, labels):
# 1. Forward pass through the network
sum_h1 = self.w1 * x[0] + self.w2 * x[1] + self.b1
out_h1 = sigmoid(sum_h1)
sum_h2 = self.w3 * x[0] + self.w4 * x[1] + self.b2
out_h2 = sigmoid(sum_h2)
sum_o1 = self.w5 * out_h1 + self.w6 * out_h2 + self.b3
out_o1 = sigmoid(sum_o1) # output of the neuron (prediction)
                # 2. Calculate the partial derivatives.
# Naming: d_L_d_w1 stands for derivative of L with respect to w1
# Derivative of loss with respect to neuron o1
d_L_d_o1 = -2 * (y_true - out_o1)
# For neuron o1
d_o1_d_w5 = out_h1 * deriv_sigmoid(sum_o1)
d_o1_d_w6 = out_h2 * deriv_sigmoid(sum_o1)
d_o1_d_b3 = deriv_sigmoid(sum_o1)
d_o1_d_h1 = self.w5 * deriv_sigmoid(sum_o1)
d_o1_d_h2 = self.w6 * deriv_sigmoid(sum_o1)
# For neuron h1
d_h1_d_w1 = x[0] * deriv_sigmoid(sum_h1)
d_h1_d_w2 = x[1] * deriv_sigmoid(sum_h1)
d_h1_d_b1 = deriv_sigmoid(sum_h1)
# For neuron h2
d_h2_d_w3 = x[0] * deriv_sigmoid(sum_h2)
d_h2_d_w4 = x[1] * deriv_sigmoid(sum_h2)
d_h2_d_b2 = deriv_sigmoid(sum_h2)
# 3. Update weights and biases
# For neuron h1
self.w1 -= self.learn_rate * d_L_d_o1 * d_o1_d_h1 * d_h1_d_w1
self.w2 -= self.learn_rate * d_L_d_o1 * d_o1_d_h1 * d_h1_d_w2
self.b1 -= self.learn_rate * d_L_d_o1 * d_o1_d_h1 * d_h1_d_b1
# For neuron h2
self.w3 -= self.learn_rate * d_L_d_o1 * d_o1_d_h2 * d_h2_d_w3
self.w4 -= self.learn_rate * d_L_d_o1 * d_o1_d_h2 * d_h2_d_w4
self.b2 -= self.learn_rate * d_L_d_o1 * d_o1_d_h2 * d_h2_d_b2
# For neuron o1
self.w5 -= self.learn_rate * d_L_d_o1 * d_o1_d_w5
self.w6 -= self.learn_rate * d_L_d_o1 * d_o1_d_w6
self.b3 -= self.learn_rate * d_L_d_o1 * d_o1_d_b3
            # Every tenth epoch, record the loss over the entire dataset
if epoch % 10 == 0:
# Prediction of the network for the dataset
y_preds = np.apply_along_axis(self.feedforward, axis=1, arr=data)
loss = mse_loss(y_true=labels, y_pred=y_preds)
history_loss.append(loss)
print("Epoch: {}, Loss: {:.6f}".format(epoch, loss))
return history_loss
```
%% Cell type:markdown id: tags:
### Example
Let us apply our neural network to a simple dataset. From the weight ($x_1$, in kg) and height ($x_2$, in cm) of a person we want to predict the gender ($y$). Female is encoded as $1$, male as $0$. The dataset contains 4 people.
%% Cell type:code id: tags:
``` python
data = np.array([
[60, 165], # Alice
[72, 182], # Bob
[68, 177], # Charlie
[54, 152], # Diana
])
labels = np.array([
1, # Alice
0, # Bob
0, # Charlie
1, # Diana
])
# Mean of the weight and height
mean_weight = data[:, 0].mean()
mean_height = data[:, 1].mean()
# Apply mean shift to the dataset.
# Remember, one does not train a machine learning model without normalization.
data[:, 0] = (data[:, 0] - mean_weight).astype(int)
data[:, 1] = (data[:, 1] - mean_height).astype(int)
data
```
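%% Cell type:markdown id: tags:
As a side note, a more common preprocessing step is full standardization (subtracting the mean and dividing by the standard deviation). The sketch below applies it to a copy of the raw measurements for illustration only; the rest of the notebook keeps working with the mean-shifted `data` from above.
%% Cell type:code id: tags:
``` python
# Z-score standardization on a copy of the raw measurements (illustrative only)
raw = np.array([
    [60, 165],  # Alice
    [72, 182],  # Bob
    [68, 177],  # Charlie
    [54, 152],  # Diana
], dtype=float)
standardized = (raw - raw.mean(axis=0)) / raw.std(axis=0)
standardized
```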
%% Cell type:markdown id: tags:
We now have all the building blocks and can train the model.
%% Cell type:code id: tags:
``` python
network = FullNeuralNetwork()
history_loss = network.train(data=data, labels=labels)
```
%% Cell type:markdown id: tags:
### Visualization of Loss during Training
We stored the loss value every 10 epochs in order to visualize its behavior.
%% Cell type:code id: tags:
``` python
plt.plot(history_loss)
plt.title("Loss value over epochs")
plt.xlabel("Epochs (in 10th)")
plt.ylabel("Loss value")
plt.show()
```
%% Cell type:markdown id: tags:
Finally, we can now use our trained neural network to predict the gender of previously unseen people from their weight and height.
%% Cell type:code id: tags:
``` python
emily = np.array([58, 160]) # 58 kg, 160 cm
frank = np.array([70, 172]) # 70 kg, 172 cm
# predict
# ...
```
%% Cell type:markdown id: tags:
*Click on the dots to display the solution*
%% Cell type:code id: tags:
``` python
emily = np.array([58, 160]) # 58 kg, 160 cm
frank = np.array([70, 172]) # 70 kg, 172 cm
# Apply mean shift
emily = (emily - np.array([mean_weight, mean_height])).astype(int)
frank = (frank - np.array([mean_weight, mean_height])).astype(int)
print("Emily - Probability of being female is: {:0.3f}".format(network.feedforward(emily)))
print("Frank - Probability of being female is: {:0.3f}".format(network.feedforward(frank)))
```
%% Cell type:markdown id: tags:
# Assignment
%% Cell type:markdown id: tags:
>Now answer the Ilias Quiz 08B Backpropagation - Notebook Verification using this notebook.
%% Cell type:code id: tags:
``` python
```