This tutorial is inspired by an excellent blog post by [Victor Zhou](https://victorzhou.com/blog/intro-to-neural-networks/). We encourage everyone to check out the post, as well as Victor's blog, for great and easy-to-understand resources.
**DISCLAIMER**:
>The code below is intended to be educational and straightforward, **not** optimal. Do not use this implementation for real projects. A deeper understanding of neural networks is paramount for this course and is best developed by implementing a neural network and its learning procedure from scratch. We leave out all unnecessary complexity, i.e. there is no error handling, output formatting, fancy visualization, etc. Just beautifully minimal code :-)
Training with stochastic gradient descent and back-propagation is split into three parts (sketched in code right after the list). For each sample in the dataset:
1. Run a forward pass through the network and save the intermediate results
2. Calculate the partial derivatives
3. Update the weights and biases
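To see how these three steps fit together before we derive anything, here is a minimal sketch of the per-sample loop on a toy one-weight model $y_{pred} = \text{sigmoid}(w \cdot x)$ with a squared-error loss $L = (y_{true} - y_{pred})^2$. The toy model, the loss, and all names (`sigmoid`, `learn_rate`, ...) are illustrative assumptions, not the notebook's implementation; the gradient in step 2 is exactly the kind of partial derivative we derive next for the full network.
%% Cell type:code id: tags:
``` python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Toy data: one feature per sample, two samples per class
x_samples = np.array([-2.0, -1.0, 1.0, 2.0])
y_trues = np.array([1.0, 1.0, 0.0, 0.0])

w = 0.0  # the single trainable weight
learn_rate = 0.1

for epoch in range(100):
    for x, y_true in zip(x_samples, y_trues):
        # 1. Forward pass, saving the intermediate result z
        z = w * x
        y_pred = sigmoid(z)

        # 2. Partial derivative of L = (y_true - y_pred)**2 via the chain rule
        d_L_d_ypred = -2 * (y_true - y_pred)
        d_ypred_d_z = y_pred * (1 - y_pred)  # derivative of the sigmoid
        d_z_d_w = x
        d_L_d_w = d_L_d_ypred * d_ypred_d_z * d_z_d_w

        # 3. Update the weight with one gradient descent step
        w -= learn_rate * d_L_d_w

print(w)  # clearly negative after training, separating the two classes
```
%% Cell type:markdown id: tags: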
But first, we will look at how we can calculate partial derivatives and use them to perform back-propagation.
Imagine we wanted to tweak $w_1$. How would the loss $L$ change if we changed $w_1$? That’s a question the partial derivative $\frac{\partial L}{\partial w_1}$ can answer. This raises the next question: how do we calculate it?
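As an aside (this check is our addition, not part of the original derivation), such a derivative can always be approximated numerically: nudge $w_1$ by a tiny $\epsilon$, rerun the network, and see how much the loss moves,
$$\frac{\partial L}{\partial w_1} \approx \frac{L(w_1 + \epsilon) - L(w_1)}{\epsilon}, \qquad \text{e.g. } \epsilon = 10^{-4}.$$
This is a useful sanity check for the analytic gradients, but far too slow for training, since every weight would need its own extra forward pass. Back-propagation gives us the same quantities exactly and all at once.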
To start, let’s rewrite the partial derivative in terms of $\frac{\partial y_{pred}}{\partial w_1}$ instead, using the [Chain Rule](https://en.wikipedia.org/wiki/Chain_rule):
$$\frac{\partial L}{\partial w_1} = \frac{\partial L}{\partial y_{pred}} \cdot \frac{\partial y_{pred}}{\partial w_1}$$
Now, let’s figure out what to do with $\frac{\partial y_{pred}}{\partial w_1}$. Let $h_1, h_2, o_1$ be the outputs of the neurons they represent and $f$ is the sigmoid activation function. Then
$$y_{pred} = o_1 = f(w_5h_1 + w_6h_2 + b_3)$$
And since $w_1$ only affects $h_1$ (not $h_2$), we can write
$$\frac{\partial y_{pred}}{\partial w_1} = \frac{\partial y_{pred}}{\partial h_1} \cdot \frac{\partial h_1}{\partial w_1}$$
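Evaluating these factors requires the derivative of the sigmoid, which has the convenient closed form $f'(x) = f(x)\,(1 - f(x))$. Below is a minimal sketch of the two helpers; the names `sigmoid` and `deriv_sigmoid` are our choice and may differ from the rest of the notebook.
%% Cell type:code id: tags:
``` python
import numpy as np

def sigmoid(x):
    # f(x) = 1 / (1 + e^(-x))
    return 1 / (1 + np.exp(-x))

def deriv_sigmoid(x):
    # f'(x) = f(x) * (1 - f(x))
    fx = sigmoid(x)
    return fx * (1 - fx)
```
%% Cell type:markdown id: tags:
For example, applied to the formula for $y_{pred}$ above, this gives $\frac{\partial y_{pred}}{\partial h_1} = w_5 \cdot f'(w_5h_1 + w_6h_2 + b_3)$.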
Let us apply our neural network to a simple dataset. From the weight ($x_1$) and height ($x_2$) of a person we want to predict their gender ($y$). Female is encoded as $1$, male as $0$. The dataset contains 4 people.
%% Cell type:code id: tags:
``` python
import numpy as np

# Each row is one person: [weight, height]
data = np.array([
    [60, 165],  # Alice
    [72, 182],  # Bob
    [68, 177],  # Charlie
    [54, 152],  # Diana
])
# Gender labels: 1 = female, 0 = male
labels = np.array([
    1,  # Alice
    0,  # Bob
    0,  # Charlie
    1,  # Diana
])

# Mean of the weight and height columns
mean_weight = data[:, 0].mean()
mean_height = data[:, 1].mean()

# Apply the mean shift to the dataset.
# Remember, one does not train a machine learning model without normalization.
# (data has an integer dtype, so the shifted values are cast back to int.)
data[:, 0] = (data[:, 0] - mean_weight).astype(int)
data[:, 1] = (data[:, 1] - mean_height).astype(int)
data
```
%% Cell type:markdown id: tags:
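Note that the mean shift centers both features around zero. With raw values (e.g. heights around $170$), the weighted sums entering the sigmoid would be large, the activations would saturate, and the gradients would be close to zero, making training needlessly hard.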
We now have all the building blocks and can train the model.