# IOGS Machine Learning course - Neural Networks

## Objective

The objectives of this practical session are:
* an introduction to the neural network library PyTorch
* the creation of a simple neural network for classification
* the implementation or use of metrics and visualization tools to evaluate training performance

It includes the implementation of:
* the neural network
* the optimization loop
* the test and evaluation

In [None]:
# imports for the practical session
import numpy as np
import torch
import matplotlib.pyplot as plt
from tqdm import tqdm, tnrange

## Language and libraries

This session is a first contact with [PyTorch](https://pytorch.org/), one of the most widely used deep learning frameworks (along with TensorFlow).

We use it as a tensor library. Note that Numpy would be sufficient to do everything in this session.


PyTorch implements a tensor library, mathematical functions, deep learning layers and utilities for designing and learning complex models.
While most of the following practical session could be coded using Numpy, we will focus on PyTorch for several reasons:
- we can switch to GPU computation if needed (parallel operations on the GPU make computation more efficient; this is not needed for this course)
- many optimizers, dataloaders and layers are already implemented, which (after familiarization with the library) speeds up the coding of deep learning models
- coding with at least one deep learning framework is a common job/position requirement
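
To illustrate the first point, here is a minimal sketch of how a computation can be placed on the GPU when one is available, falling back to the CPU otherwise. The variable names (`device`, `x`, `y`) are ours and are not used elsewhere in this session.

```python
import torch

# Pick the GPU when one is visible to PyTorch, otherwise stay on the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Tensors are created on (or moved to) the chosen device explicitly.
x = torch.ones((10, 7), device=device)
y = torch.tanh(x)  # computed on the same device as x

# Bring the result back to the CPU before converting it to Numpy.
y_np = y.cpu().numpy()
print(device, y_np.shape)
```

The rest of the session runs on the CPU, so this step is optional.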

The [PyTorch documentation](https://pytorch.org/docs/stable/index.html) is your friend.

<font color='red'>Question</font>: Understand the two versions of the same code, one using Numpy, the other using PyTorch.

In [None]:
print("NUMPY ----")
data_np = np.ones((10,7), dtype=np.float32) # definition of a matrix of shape (10,7) filled with ones
data_np[0] = 0 # set the first row to 0
data_np[5:7, 2:3] = 5 # slicing to set the values
print(data_np)
print(np.tanh(data_np)) # call a function
data_np = np.expand_dims(data_np, axis=2) # adding a dimension
print(data_np.shape)
# convert from float to long int
data_np = data_np.astype(np.int64)
# convert from int to float
data_np = data_np.astype(np.float32)

print("PYTORCH ----")
data_th = torch.ones((10,7), dtype=torch.float)
data_th[0] = 0
data_th[5:7, 2:3] = 5
print(data_th)
print(torch.tanh(data_th)) # call a function
data_th = data_th.unsqueeze(2) # adding a dimension
print(data_th.shape)
# convert from float to int
data_th = data_th.long()
# convert from int to float
data_th = data_th.float()

# conversion from PyTorch to Numpy
data = data_th.numpy()

# conversion from Numpy to PyTorch
data = torch.tensor(data_np)
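
A detail worth knowing about these conversions: `torch.tensor(...)` copies the data, whereas `torch.from_numpy(...)` (and `.numpy()` on a CPU tensor) shares memory with the original array. The short sketch below, with variable names of our own, shows the difference.

```python
import numpy as np
import torch

a = np.zeros(3, dtype=np.float32)

shared = torch.from_numpy(a)  # shares memory with `a`
copied = torch.tensor(a)      # copies the data

a[0] = 1.0  # modify the Numpy array in place

print(shared)  # the change is visible in the shared tensor
print(copied)  # the copy is unchanged
```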

## Data generation functions

We will create three functions to generate different types of data.
The data consists of two PyTorch tensors (the visualization functions below call `.numpy()` on them): the input points $x$, of floating-point type and shape `(npts, dim)`, and the labels $y$ for each point, of shape `(npts, 1)`.

We will generate random points in $[-1,1] \times [-1,1]$.

1. **Linear classification**: $y = 1$ if $x_1 + x_2 > 0$, $y=0$ otherwise
2. **Bar classification**: $y = 1$ if $0.25 < x_1 < 0.75$, $y=0$ otherwise
3. **Donut classification**: $y = 1$ if $0.09<(x_1)^2 + (x_2)^2 < 0.49$, $y=0$ otherwise
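
All three rules follow the same pattern: sample points, build a boolean mask, and turn the mask into labels. As a sketch of this pattern, the function below uses a rule that is deliberately *not* one of the three above ($y = 1$ when $x_1$ and $x_2$ have the same sign); the function name `generate_data_quadrant` is hypothetical.

```python
import torch

def generate_data_quadrant(npts, dim=2):
    # Hypothetical example rule (not one of the three exercises):
    # y = 1 when x_1 and x_2 have the same sign.
    data = torch.rand((npts, dim)) * 2 - 1   # uniform samples in [-1, 1)
    mask = (data[:, 0] * data[:, 1]) > 0     # boolean tensor, shape (npts,)
    target = mask.long().unsqueeze(1)        # labels 0/1, shape (npts, 1)
    return data, target

data, target = generate_data_quadrant(1000)
print(data.shape, target.shape)
```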

<font color='red'>Question</font>: convert the commented Numpy function for linear data to PyTorch.

**Note:** visualization code is provided (and commented out); you can uncomment it to check that your code is correct.

In [None]:
# linear
# def generate_data_linear(npts, dim=2):
#     data = np.random.rand(npts, dim) * 2 - 1  # points in [-1, 1] x [-1, 1]
#     target = ((data[:,0] + data[:,1]) > 0).astype(int)
#     return data, target

def generate_data_linear(npts, dim=2): # Torch version
    # code goes here
    pass

# visualization function
def visualization_simple(data, target):
    data_np = data.numpy()
    target_np = target.numpy()
    fig = plt.figure()
    fig.set_size_inches(5, 5)
    plt.scatter(data_np[:,0], data_np[:,1], c=target_np)

# visualization of groundtruth and predictions
def visualization_gt_pred(data, predictions, target):
    data_np = data.numpy()
    pred_np = predictions.numpy()
    target_np = target.numpy()
    fig, (ax1, ax2) = plt.subplots(1,2)
    fig.set_size_inches(11, 5)
    ax1.scatter(data_np[:,0], data_np[:,1], c=target_np)
    ax2.scatter(data_np[:,0], data_np[:,1], c=pred_np)

# visu linear
# data, target = generate_data_linear(1000)
# visualization_simple(data, target)

<font color='red'>Question</font>: fill in the functions for the bar labels and the donut labels, then generate the data and display it.

In [None]:
# Question: generate data bar
def generate_data_bar(npts, dim=2):
    # code goes here
    pass

# Question: generate data donut
def generate_data_donut(npts, dim=2):
    # code goes here
    pass

# visu bar
# code goes here

# visu donut
# code goes here


## Network definition

<font color='red'>Question</font>: study the following code and comment on it.

<font color='red'>Question</font>: run the code and comment on the output.

In [None]:
# Definition of the network
class NetworkExample(torch.nn.Module):

    def __init__(self) -> None:
        super().__init__()
        self.l1 = torch.nn.Linear(2,1)

    def forward(self, x):

        y = self.l1(x)

        return y


train_pts, train_labels = generate_data_linear(5000)
test_pts, test_labels = generate_data_linear(1000)

num_epoch = 10

net = NetworkExample()

lr = 1e-2
optimizer = torch.optim.SGD(net.parameters(), lr)

criterion = torch.nn.BCEWithLogitsLoss()

losses = []
iterations = []
total_iter_counter = 0
for epoch in range(num_epoch):

    net.train()

    t = tqdm(torch.randperm(train_pts.shape[0]))

    total_loss = 0

    epoch_iter_counter = 0

    for i in t:
        
        x = train_pts[i].reshape(1, 2)
        target = train_labels[i].reshape(1,1).float()
        
        y2 = net(x)

        loss = criterion(y2, target)

        optimizer.zero_grad()

        loss.backward()

        optimizer.step()
        
        total_loss += loss.item()
        epoch_iter_counter += 1

        t.set_description_str(f"Loss={total_loss/epoch_iter_counter:.4e}")

        total_iter_counter += 1
    
    total_loss /= train_pts.shape[0]
    losses.append(total_loss)
    iterations.append(total_iter_counter)

plt.figure()
plt.plot(iterations, losses)

net.eval()
with torch.no_grad():
    predicted_logits = net(test_pts)
    predicted_probas = torch.sigmoid(predicted_logits)
    predicted_labels = (predicted_probas > 0.5).long()

visualization_gt_pred(test_pts, predicted_labels, test_labels)
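
Besides the visual comparison, a simple metric such as accuracy summarizes test performance in one number. The helper below is a minimal sketch (the name `accuracy` is ours); with the variables of the previous cell it could be called as `accuracy(predicted_labels, test_labels)`.

```python
import torch

def accuracy(predicted_labels, target_labels):
    # Flatten both tensors so that shapes (N, 1) and (N,) are compared
    # element-wise instead of being broadcast to (N, N).
    pred = predicted_labels.reshape(-1)
    target = target_labels.reshape(-1)
    return (pred == target).float().mean().item()

# Toy check with hand-written labels (placeholders, not results).
pred = torch.tensor([[1], [0], [1], [1]])
gt = torch.tensor([1, 0, 0, 1])
print(accuracy(pred, gt))  # 3 correct predictions out of 4
```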


## Different data


<font color='red'>Question</font>: copy-paste the previous code and switch to the bar data generation.

In [None]:
# copy code here


<font color='red'>Question</font>: comment on the results and explain them.

*Answer*: (double-click on the cell to edit)

<font color='red'>Question</font>: copy-paste the previous code and change the network definition to a network with three linear layers. The activation functions (between the linear layers) are hyperbolic tangents. The parameters of the network are the input size, the hidden size and the output size. Train the newly defined network with 1) the bar data, 2) the donut data.
Comment on the choice of parameters.
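
When discussing the choice of the hidden size, it can help to know how many trainable parameters a model has. The helper below is a small sketch for that purpose (the name `count_parameters` is ours); it works on any `torch.nn.Module`.

```python
import torch

def count_parameters(model):
    # Sum the number of elements of every trainable tensor of the model.
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

# A single Linear(2, 1) layer has 2 weights and 1 bias.
print(count_parameters(torch.nn.Linear(2, 1)))

# One hidden layer of size 16 followed by tanh: 2*16 weights + 16 biases.
hidden = torch.nn.Sequential(torch.nn.Linear(2, 16), torch.nn.Tanh())
print(count_parameters(hidden))
```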

In [None]:
# copy code here

## Batch size

<font color='red'>Question</font>: copy-paste the previous code and modify it to use mini-batches of size 16 instead of a single point.
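
One possible way to iterate over mini-batches is to rely on the dataloaders provided by PyTorch. The sketch below only shows the mechanics on random placeholder data (`pts` and `labels` are stand-ins of our own, not the training set); slicing a random permutation of the indices by hand would work equally well.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Placeholder data standing in for the training set (random, for shapes only).
pts = torch.rand((100, 2)) * 2 - 1
labels = torch.randint(0, 2, (100, 1))

# shuffle=True draws a new random order of the points at every epoch.
loader = DataLoader(TensorDataset(pts, labels), batch_size=16, shuffle=True)

for x, target in loader:
    # x has shape (B, 2) and target (B, 1); B is 16 except for the last batch.
    print(x.shape, target.shape)
```

Note that `torch.nn.BCEWithLogitsLoss` averages the loss over the batch by default, so the loss value stays comparable across batch sizes.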

In [None]:
# copy code here

### Adding a momentum

Depending on the batch size, the gradient descent may be unstable. 

One solution is to increase the batch size (see above), but depending on the problem this is not always possible.

Another approach is to use **momentum** during the optimization. 

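$$v_t = \gamma v_{t-1} + (1-\gamma) \Delta w $$
$$w_t = w_{t-1} + \alpha v_t $$

where $\Delta w$ is the update direction computed on the current batch (the negative gradient of the loss with respect to $w$), $\gamma \in [0, 1)$ is the momentum coefficient and $\alpha$ is the learning rate.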
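
The two update equations can be tried on a toy problem. The sketch below applies them to $f(w) = w^2$ in plain Python, assuming that $\Delta w$ is the negative gradient; the function name `momentum_descent` and the default values are ours.

```python
def momentum_descent(grad_fn, w0, alpha=0.1, gamma=0.9, steps=200):
    # Minimizes a 1D function with the momentum update written above.
    # Assumption: delta_w is the negative gradient (descent direction).
    w, v = w0, 0.0
    for _ in range(steps):
        delta_w = -grad_fn(w)
        v = gamma * v + (1 - gamma) * delta_w  # running average of the directions
        w = w + alpha * v                      # step along the averaged direction
    return w

# Toy objective f(w) = w^2, whose gradient is 2w and whose minimum is at w = 0.
w_final = momentum_descent(lambda w: 2 * w, w0=1.0)
print(w_final)
```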

<font color='red'>Question</font>: what is the intuition behind momentum?

*answer here (double click on the cell)*

<font color='red'>Question</font>: copy-paste the previous code and modify it to use momentum. In PyTorch, momentum is a parameter of the optimizer.
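
As a pointer, `torch.optim.SGD` accepts a `momentum` keyword argument. The sketch below builds such an optimizer on a placeholder model; the value `0.9` is only an example.

```python
import torch

net = torch.nn.Linear(2, 1)  # placeholder model

# `momentum` plays the role of gamma and `lr` the role of alpha.
optimizer = torch.optim.SGD(net.parameters(), lr=1e-2, momentum=0.9)

print(optimizer.defaults["momentum"], optimizer.defaults["dampening"])
```

PyTorch's formulation differs slightly from the equations above: the gradient is weighted by `1 - dampening`, and `dampening` defaults to `0`, so the factor $(1-\gamma)$ is absent unless `dampening` is set explicitly.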

In [None]:
# copy code here

### Influence of the batch and momentum

<font color='red'>Question</font>: plot the loss curves for batch size 1, batch size 16, and batch size 16 with momentum. Comment on the results.
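
To compare several runs, the curves can be drawn on the same axes with a legend. The helper below is a sketch (the name `plot_loss_curves` and the dictionary layout are ours); the numbers are made-up placeholders that only show the expected structure.

```python
import matplotlib.pyplot as plt

def plot_loss_curves(curves):
    # `curves` maps a label to a pair (x_values, losses).
    fig, ax = plt.subplots()
    for label, (x_values, losses) in curves.items():
        ax.plot(x_values, losses, label=label)
    ax.set_xlabel("epoch")
    ax.set_ylabel("average training loss")
    ax.legend()
    return ax

# Made-up placeholder values, not experimental results.
curves = {
    "batch size 1": ([1, 2, 3], [0.9, 0.7, 0.6]),
    "batch size 16": ([1, 2, 3], [0.8, 0.5, 0.4]),
}
ax = plot_loss_curves(curves)
```

Since the number of optimizer steps per epoch depends on the batch size, plotting against epochs (rather than iterations) keeps the curves comparable.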

In [None]:
# code here

## Multi-class classification

<font color='red'>Question</font>: we give a new data generation code for multi-class classification. Copy-paste the previous code and adapt it.

<font color='red'>Question</font>: comment on the results with respect to the previous binary classification results/architecture/parameters.

In [None]:
def generate_data_multi_class(npts, n_classes):
    data = torch.rand((npts, 2))*2-1
    target = torch.zeros(npts, dtype=torch.long)
    for i in range(n_classes-1):
        mask = torch.logical_and(
                ((data[:,0])**2 + (data[:,1])**2) < (i+1)/(n_classes),
                ((data[:,0])**2 + (data[:,1])**2) > i/(n_classes)
                )
        target[mask] = i+1
        
    return data, target

data, target = generate_data_multi_class(1000, 4)
visualization_simple(data, target)
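
With more than two classes, a common choice is `torch.nn.CrossEntropyLoss`, which expects raw logits of shape `(N, n_classes)` and integer class indices of shape `(N,)`. The sketch below only illustrates these shapes on a placeholder linear model and random inputs; it is not a full training loop.

```python
import torch

n_classes = 4
net = torch.nn.Linear(2, n_classes)          # placeholder model: one logit per class
criterion = torch.nn.CrossEntropyLoss()

x = torch.rand((8, 2)) * 2 - 1               # batch of 8 points
target = torch.randint(0, n_classes, (8,))   # class indices, dtype long, shape (8,)

logits = net(x)                              # shape (8, n_classes)
loss = criterion(logits, target)             # scalar tensor
predicted_labels = logits.argmax(dim=1)      # most likely class, shape (8,)
print(loss.item(), predicted_labels.shape)
```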

In [None]:
# copy code here