{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"provenance": []
},
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"accelerator": "GPU"
},
"cells": [
{
"metadata": {
"id": "my7Myevmy3G0"
},
"cell_type": "markdown",
"source": [
"# IOGS - Machine learning and pattern recognition\n",
"\n",
"## Class objective\n",
"\n",
"The objective of this class is to build from scratch a convolutional neural network (CNN) using PyToch framework.\n",
"\n",
"To make the optimization efficient, will work with the GPU acceleration provided by Colab.\n",
"To set the GPU acceleration:\n",
"* Edit\n",
"* Notebook settings\n",
"* Hardware accelerator: **GPU**\n"
]
},
{
"metadata": {
"id": "lxzVjTgYUYdW"
},
"cell_type": "code",
"source": [
"from tqdm import tqdm # for progress bars\n",
"from sklearn.metrics import confusion_matrix\n",
"from IPython.display import clear_output\n",
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"\n",
"# Pytorch\n",
"import torch # deep learning framework\n",
"import torchvision\n",
"import torchvision.transforms as transforms\n",
"import torch.nn.functional as F\n",
"import torch.nn as nn"
],
"execution_count": null,
"outputs": []
},
{
"metadata": {
"id": "VPuamL2uKKM_"
},
"cell_type": "markdown",
"source": [
"## Data\n",
"\n",
"We will use the CIFAR10 dataset for image classification. It is one of the most used classification dataset for first experiences.\n",
"It is composed of 60000 images (50000 for training, 10000 for testing).\n",
"\n",
"### Dataloaders\n",
"\n",
"PyTorch and TorchVision offer classes for an easy data usage.\n",
"\n",
"The Dataset class, allow to apply data transformation such as normalization or data augmentation.\n",
"\n",
"For the CIFAR10 dataset, a specific dataset class is coded.\n",
"It will download the data if needed."
]
},
{
"metadata": {
"id": "-KLKxMqjXLnx"
},
"cell_type": "code",
"source": [
"# transformations for images (convert to pytorch tensor and center data)\n",
"transform = transforms.Compose(\n",
" [transforms.ToTensor(),\n",
" transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])\n",
"\n",
"# create the train dataset and test dataset\n",
"trainset = torchvision.datasets.CIFAR10(root=\".\", train=True,\n",
" download=True, transform=transform)\n",
"testset = torchvision.datasets.CIFAR10(root=\".\", train=False,\n",
" download=True, transform=transform)"
],
"execution_count": null,
"outputs": []
},
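{
"metadata": {},
"cell_type": "markdown",
"source": [
"As a sanity check on the `Normalize` call above, here is a minimal sketch (plain NumPy, not the TorchVision implementation) of the arithmetic it applies to each channel: `(x - mean) / std`. Since `ToTensor` scales pixel values to [0, 1], a mean and a standard deviation of 0.5 map them to [-1, 1]. The helper name `normalize` and the sample values are illustrative assumptions.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"def normalize(x, mean=0.5, std=0.5):\n",
"    # same arithmetic as a per-channel normalization: (x - mean) / std\n",
"    return (x - mean) / std\n",
"\n",
"# illustrative pixel values, already scaled to [0, 1]\n",
"pixels = np.array([0.0, 0.25, 0.5, 1.0])\n",
"normalized = normalize(pixels)\n",
"print(normalized)\n",
"```"
]
},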
{
"metadata": {
"id": "zeGScW7iFvxl"
},
"cell_type": "markdown",
"source": [
"The datasets are then used in dataloaders.\n",
"The dataloaders are multithreaded and create the batches of data used for optimization at the given batch size.\n",
"They can shuffle the data, if specified."
]
},
{
"metadata": {
"id": "q68R3RWwFvxr"
},
"cell_type": "markdown",
"source": [
"## Classes\n",
"\n",
"CIFAR10 as 10 classes for classifications."
]
},
{
"metadata": {
"id": "ERvzJrjEFvxu"
},
"cell_type": "code",
"source": [
"\n",
"classes = ('plane', 'car', 'bird', 'cat',\n",
" 'deer', 'dog', 'frog', 'horse', 'ship', 'truck')"
],
"execution_count": null,
"outputs": []
},
{
"metadata": {
"id": "TIQL-wl3FvyA"
},
"cell_type": "markdown",
"source": [
"## Network definition\n",
"\n",
"The state of the art for image classification is convolutional neural networks (CNN).\n",
"These CNNs have become more and more complex and involve today a high number of convolutional layers.\n",
"In this class to keep the optimization possible in the time of the class, we will use a very simple network based on LeNet-5.\n",
"\n",
"Question Create a network with 3 convolutions and 2 linear layers.\n",
"\n",
"The network will be:\n",
"```\n",
"conv (3 -> 16, kernel 3x3)\n",
"relu \n",
"max_pooling\n",
"conv (16 -> 32, kernel 3x3)\n",
"relu \n",
"max_pooling\n",
"conv (32 -> 64, kernel 3x3)\n",
"relu \n",
"max_pooling\n",
"Linear (256 (64x2x2) -> 128)\n",
"relu\n",
"Linear (128 -> 10)\n",
"```\n",
"In the ```__init__``` function, you need to declare the layers with parameters (convolutional and linear layers).\n",
"In the ```forward``` function, you describe the information flow from the input (x) to the final output.\n",
"\n",
"The object with parameters are:\n",
"* ```nn.Conv2d(a,b,c)``` where ```a``` is the input channel number, ```b``` the output channel number and ```c``` the kernel size.\n",
"* ```nn.Linear(a,b)``` where ```a``` is the input size, ```b``` the output size\n",
"\n",
"Here are some useful functions:\n",
"* ```F.relu(a)```: apply a relu on ```a```\n",
"* ```F.max_pool2d(a,2)```: apply a max pooling of size 2 on ```a```\n",
"* ```b = a.view(a.size(0), -1)```: flattens ```a``` to be usable with linear layer (shoud be used between 2d operations and 1d operations)"
]
},
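{
"metadata": {},
"cell_type": "markdown",
"source": [
"To see where the 256 (64x2x2) input size of the first linear layer comes from, the sketch below traces the spatial size of a 32x32 image through the three conv / max pooling stages. It assumes the `nn.Conv2d` defaults (stride 1, no padding) and a max pooling of size 2, which rounds down. `conv_out` and `pool_out` are hypothetical helper names, not PyTorch functions.\n",
"\n",
"```python\n",
"def conv_out(size, kernel=3, stride=1, padding=0):\n",
"    # spatial size after a convolution\n",
"    return (size + 2 * padding - kernel) // stride + 1\n",
"\n",
"def pool_out(size, window=2):\n",
"    # spatial size after a max pooling (rounds down)\n",
"    return size // window\n",
"\n",
"size = 32  # CIFAR10 images are 32x32\n",
"trace = [size]\n",
"for _ in range(3):  # three conv + max pooling stages\n",
"    size = pool_out(conv_out(size))\n",
"    trace.append(size)\n",
"\n",
"flat_features = 64 * size * size  # 64 channels after the last convolution\n",
"print(trace, flat_features)\n",
"```\n",
"\n",
"If you change the kernel size or add padding, the same arithmetic tells you which input size the first linear layer needs."
]
},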
{
"metadata": {
"id": "pJ7PqoP4rPyJ"
},
"cell_type": "code",
"source": [
"# network class\n",
"class SimpleCNN(nn.Module):\n",
" \n",
" def __init__(self):\n",
" super().__init__()\n",
"\n",
" raise NotImplementedError\n",
" \n",
" def forward(self, x):\n",
" # layer after layer, feature computation\n",
" \n",
" raise NotImplementedError \n",
"\n",
"\n",
"# net_test = SimpleCNN()\n",
"# x = torch.rand(1, 3, 32, 32)\n",
"# print(x.shape)\n",
"# output = net_test(x)\n",
"# print(output.shape)\n"
],
"execution_count": null,
"outputs": []
},
{
"metadata": {
"id": "lJW94zotFvyJ"
},
"cell_type": "markdown",
"source": [
"## Metrics\n",
"\n",
"In this practical session, we will only use the global accuracy, defined by the number of correctly classified elements over the total number of sample.\n",
"\n",
"Question: define the accuracy function, from a confusion matrix."
]
},
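{
"metadata": {},
"cell_type": "markdown",
"source": [
"As a reminder of how a confusion matrix is laid out, here is a minimal NumPy sketch on made-up labels for 3 classes. It assumes the convention also used by `sklearn.metrics.confusion_matrix`: rows are true classes and columns are predicted classes, so correctly classified samples sit on the diagonal. The labels are purely illustrative.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"num_classes = 3\n",
"y_true = np.array([0, 0, 1, 1, 2, 2])  # made-up ground-truth labels\n",
"y_pred = np.array([0, 1, 1, 1, 2, 0])  # made-up predictions\n",
"\n",
"cm = np.zeros((num_classes, num_classes), dtype=int)\n",
"# add 1 at (true class, predicted class) for every sample\n",
"np.add.at(cm, (y_true, y_pred), 1)\n",
"print(cm)\n",
"```\n",
"\n",
"When accumulating batch by batch with `confusion_matrix`, passing `labels=list(range(10))` keeps the result 10x10 even if a batch does not contain every class."
]
},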
{
"metadata": {
"id": "stQvG7UUFvyK"
},
"cell_type": "code",
"source": [
"def accuracy(cm):\n",
" raise NotImplementedError"
],
"execution_count": null,
"outputs": []
},
{
"metadata": {
"id": "W1E9tkFkFvyR"
},
"cell_type": "markdown",
"source": [
"## Main optimization loop\n",
"\n",
"This is the main loop for optimization.\n",
"It follows the same principles as the MLP of the previous class.\n",
"\n",
"#### Train\n",
"\n",
"Question: Set the gradients to zero\n",
"\n",
"Question: Compute the outputs\n",
"\n",
"Question: Compute the cross entropy loss\n",
"\n",
"Question: Call backward on the loss\n",
"\n",
"Question: Call step on the optimizer\n",
"\n",
"Question: Compute the predictions on the outputs (in \n",
"numpy format), it is the argmax of the prediction vector\n",
"\n",
"Question: Update the confusion matrix\n",
"\n",
"#### Test\n",
"Question: Compute the outputs\n",
"\n",
"Question: Compute the predictions on the outputs (in numpy format), it is the argmax of the prediction vector\n",
"\n",
"Question: Update the confusion matrix\n",
"\n",
"#### Display results\n",
"Question: Compute train and test accuracies, display them\n",
"\n",
"Question: Save these accuracy in the corresponding lists\n",
"\n",
"#### Misc\n",
"Do not forget to put your variables on the GPU with `variable.cuda()`"
]
},
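{
"metadata": {},
"cell_type": "markdown",
"source": [
"The training questions follow the usual PyTorch pattern. The sketch below runs a single optimization step on a hypothetical toy linear model with random data (not the CNN of this class), and falls back to the CPU when no GPU is available. `toy_model` and the tensor shapes are illustrative assumptions.\n",
"\n",
"```python\n",
"import torch\n",
"import torch.nn as nn\n",
"import torch.nn.functional as F\n",
"\n",
"device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')\n",
"\n",
"toy_model = nn.Linear(8, 3).to(device)  # hypothetical stand-in model\n",
"optimizer = torch.optim.Adam(toy_model.parameters(), lr=1e-4)\n",
"\n",
"inputs = torch.rand(4, 8, device=device)  # batch of 4 random samples\n",
"targets = torch.randint(0, 3, (4,), device=device)  # random class labels\n",
"\n",
"optimizer.zero_grad()  # reset the gradients\n",
"outputs = toy_model(inputs)  # forward pass\n",
"loss = F.cross_entropy(outputs, targets)  # classification loss\n",
"loss.backward()  # backpropagation\n",
"optimizer.step()  # parameter update\n",
"\n",
"# predicted class = argmax of the output vector, as a numpy array\n",
"predictions = outputs.argmax(dim=1).cpu().numpy()\n",
"print(loss.item(), predictions)\n",
"```\n",
"\n",
"Using `.to(device)` instead of `.cuda()` is a design choice that lets the same code run with or without a GPU."
]
},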
{
"metadata": {
"id": "JB3GuOqn6OlQ"
},
"cell_type": "code",
"source": [
"# create the network\n",
"network = SimpleCNN()\n",
"\n",
"# put network weights on gpu\n",
"network.cuda()\n",
"\n",
"# create an optimizer\n",
"optimizer = torch.optim.Adam(network.parameters(), lr=1e-4)\n",
"\n",
"# epoch --> we sse each piece of data\n",
"max_epoch = 10\n",
"\n",
"# list for saving accuracies\n",
"train_accs = []\n",
"test_accs = []\n",
"\n",
"# train / test loaders\n",
"trainloader = torch.utils.data.DataLoader(trainset, batch_size=16, shuffle=True)\n",
"testloader = torch.utils.data.DataLoader(testset, batch_size=16, shuffle=False)\n",
"\n",
"# iterate over epochs\n",
"for epoch in range(max_epoch):\n",
" \n",
" # set the network in train mode\n",
" network.train()\n",
" \n",
" # create a zero confuction matrix\n",
" cm = np.zeros((10,10))\n",
" \n",
" # epoch loop\n",
" for inputs, targets in tqdm(trainloader, ncols=80, desc=\"Epoch {}\".format(epoch)):\n",
"\n",
" # put data on gpu\n",
" inputs = inputs.cuda()\n",
" targets = targets.cuda()\n",
"\n",
" # set the network to evaluatio mode\n",
" network.eval()\n",
" \n",
" # create the confusion matrix\n",
" cm_test = np.zeros((10,10))\n",
"\n",
" # tell not to reserve memory space for gradients (much faster)\n",
" with torch.no_grad():\n",
" for inputs, targets in tqdm(testloader, ncols=80, desc=\" Test {}\".format(epoch)):\n",
"\n",
" # put data on gpu\n",
" inputs = inputs.cuda()\n",
" targets = targets.cuda()\n",
" \n",
" clear_output()\n",
" \n",
" # compute accuracies and display them\n",
" \n",
" # add accuracies to the lists oa_train and oa_test\n"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"### Analysis\n",
"\n",
"Question: copy paste the accuracy list in new variables (to save the results)\n",
"\n",
"Question: display the training and testing curves"
],
"metadata": {
"id": "Q701vLLjrU_Q"
}
},
{
"metadata": {
"id": "-GkhKBnp5Owp"
},
"cell_type": "code",
"source": [
"# display the curves\n"
],
"execution_count": null,
"outputs": []
},
{
"metadata": {
"id": "gAsITlVgFvym"
},
"cell_type": "markdown",
"source": [
"### Improvements\n",
"\n",
"In this part, the objective is play with the network to improve the results.\n",
"You can add more convolutional layers, batchnorm, train for longer...\n",
"\n",
"Question: each time you do a modification, save the results along with the previous results and display the curves.\n",
"\n",
"**Note**: the state of the art on this dataset is around 95% for the test. Reaching that is difficult and would require training for a long time, but you can try to get as close as possible !"
]
}
]
}