Skip to content

Instantly share code, notes, and snippets.

@rafaelvareto
Created March 6, 2024 13:14
Show Gist options
  • Save rafaelvareto/cc272b7134081e8d9a9cc1535e1abd03 to your computer and use it in GitHub Desktop.
Save rafaelvareto/cc272b7134081e8d9a9cc1535e1abd03 to your computer and use it in GitHub Desktop.
04_Problemas_conv_pytorch.ipynb
Display the source blob
Display the rendered blob
Raw
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"provenance": [],
"include_colab_link": true
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
},
"accelerator": "GPU"
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/gist/rafaelvareto/cc272b7134081e8d9a9cc1535e1abd03/04_problemas_conv_pytorch.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "6OghyhHyXsVi"
},
"source": [
"# Problemas\n",
"\n",
"Nesta prática iremos usar tudo que aprendemos durante o módulo.\n",
"Logo, **seu objetivo é determinar e implementar um modelo para cada problema.**\n",
"\n",
"Lembre-se de definir:\n",
"\n",
"1. o Dataloader, tratando a forma de ler as imagens de cada dataset, experimentando transformações diferentes (resize, crop, flips e etc.)\n",
"1. uma arquitetura (tentem usar tanto arquiteturas existentes como propor novas usando camadas de convolução, pooling, e densas),\n",
"1. uma função de custo\n",
"1. um algoritmo de otimização (agora, como os problemas são maiores, será possível notar mais claramente a diferença entre diferentes algoritmos).\n",
"\n",
"\n",
"\n"
]
},
{
"cell_type": "code",
"metadata": {
"id": "bJkH8GxmXzuC"
},
"source": [
"import time, os, sys, numpy as np\n",
"import torch\n",
"import torchvision\n",
"import torch.nn as nn\n",
"import torch.nn.functional as F\n",
"\n",
"import PIL\n",
"from PIL import Image\n",
"from torch import optim\n",
"from torchsummary import summary\n",
"\n",
"# Test if GPU is avaliable, if not, use cpu instead\n",
"device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')\n",
"n = torch.cuda.device_count()\n",
"devices_ids= list(range(n))"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "udlf9BxBX3Vx"
},
"source": [
"# funções básicas\n",
"\n",
"# Função usada para calcular acurácia\n",
"def evaluate_accuracy(data_iter, net, loss):\n",
" \"\"\"Evaluate accuracy of a model on the given data set.\"\"\"\n",
"\n",
" acc_sum, n, l = torch.Tensor([0]), 0, 0\n",
" net.eval()\n",
" with torch.no_grad():\n",
" for X, y in data_iter:\n",
" #y = y.astype('float32')\n",
" X, y = X.to(device), y.to(device)\n",
" y_hat = net(X)\n",
" l += loss(y_hat, y).sum()\n",
" acc_sum += (y_hat.argmax(axis=1) == y).sum().item()\n",
" n += y.size()[0]\n",
"\n",
" return acc_sum.item() / n, l.item() / len(data_iter)\n",
"\n",
"# Função usada no treinamento e validação da rede\n",
"def train_validate(net, train_iter, test_iter, batch_size, trainer, loss,\n",
" num_epochs):\n",
" print('training on', device)\n",
" for epoch in range(num_epochs):\n",
" net.train()\n",
" train_l_sum, train_acc_sum, n, start = 0.0, 0.0, 0, time.time()\n",
" for X, y in train_iter:\n",
" X, y = X.to(device), y.to(device)\n",
" y_hat = net(X)\n",
" trainer.zero_grad()\n",
" l = loss(y_hat, y).sum()\n",
" l.backward()\n",
" trainer.step()\n",
" train_l_sum += l.item()\n",
" train_acc_sum += (y_hat.argmax(axis=1) == y).sum().item()\n",
" n += y.size()[0]\n",
" test_acc, test_loss = evaluate_accuracy(test_iter, net, loss)\n",
" print('epoch %d, train loss %.4f, train acc %.3f, test loss %.4f, '\n",
" 'test acc %.3f, time %.1f sec'\n",
" % (epoch + 1, train_l_sum / len(train_iter), train_acc_sum / n, test_loss,\n",
" test_acc, time.time() - start))"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "Su-ITUO4eJ7p"
},
"source": [
"\n",
"## Problema 1 - Exemplo\n",
"\n",
"Neste problema, classificaremos imagens histólogica do dataset [*Colorectal Histology*](https://www.kaggle.com/kmader/colorectal-histology-mnist).\n",
"Neste caso, vamos receber imagens com tamanho de $150\\times 150$ pixels e classificá-las entre 8 classes:\n",
"\n",
"1. tumor\n",
"1. stroma\n",
"1. complex\n",
"1. lympho\n",
"1. debris\n",
"1. mucosa\n",
"1. adipose\n",
"1. empty"
]
},
{
"cell_type": "code",
"metadata": {
"id": "n_d2-W5a0Kap",
"outputId": "9e6bf0de-dba7-4c37-b0ca-601a7ac2f36c",
"colab": {
"base_uri": "https://localhost:8080/"
}
},
"source": [
"# Baixando o dataset\n",
"!wget https://www.dropbox.com/s/k0f6vxyhcr6gh1r/Kather_texture_2016_image_tiles_5000.zip\n",
"!unzip -q Kather_texture_2016_image_tiles_5000.zip"
],
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"--2022-03-28 13:49:12-- https://www.dropbox.com/s/k0f6vxyhcr6gh1r/Kather_texture_2016_image_tiles_5000.zip\n",
"Resolving www.dropbox.com (www.dropbox.com)... 162.125.5.18, 2620:100:6018:18::a27d:312\n",
"Connecting to www.dropbox.com (www.dropbox.com)|162.125.5.18|:443... connected.\n",
"HTTP request sent, awaiting response... 301 Moved Permanently\n",
"Location: /s/raw/k0f6vxyhcr6gh1r/Kather_texture_2016_image_tiles_5000.zip [following]\n",
"--2022-03-28 13:49:12-- https://www.dropbox.com/s/raw/k0f6vxyhcr6gh1r/Kather_texture_2016_image_tiles_5000.zip\n",
"Reusing existing connection to www.dropbox.com:443.\n",
"HTTP request sent, awaiting response... 302 Found\n",
"Location: https://uc8e592ebc7bbb85965655cbbe26.dl.dropboxusercontent.com/cd/0/inline/BiVBzMriv6Vp3W1XCiHhzClO1FpWl4bp9CAidCIooh49l76HgvJhCiENpj9ks6Hp8bmdqVTA0HxewAHpNE6AlVxhV-t8aJKT7__zkF2XIP3-60wwZGGEx0b4WbPMmAtFGGSdJmIP2LvKJw6KckbKEBX_vOq3rWMk7pwNju7qhG7GIA/file# [following]\n",
"--2022-03-28 13:49:13-- https://uc8e592ebc7bbb85965655cbbe26.dl.dropboxusercontent.com/cd/0/inline/BiVBzMriv6Vp3W1XCiHhzClO1FpWl4bp9CAidCIooh49l76HgvJhCiENpj9ks6Hp8bmdqVTA0HxewAHpNE6AlVxhV-t8aJKT7__zkF2XIP3-60wwZGGEx0b4WbPMmAtFGGSdJmIP2LvKJw6KckbKEBX_vOq3rWMk7pwNju7qhG7GIA/file\n",
"Resolving uc8e592ebc7bbb85965655cbbe26.dl.dropboxusercontent.com (uc8e592ebc7bbb85965655cbbe26.dl.dropboxusercontent.com)... 162.125.3.15, 2620:100:6018:15::a27d:30f\n",
"Connecting to uc8e592ebc7bbb85965655cbbe26.dl.dropboxusercontent.com (uc8e592ebc7bbb85965655cbbe26.dl.dropboxusercontent.com)|162.125.3.15|:443... connected.\n",
"HTTP request sent, awaiting response... 302 Found\n",
"Location: /cd/0/inline2/BiUr-AJtXEZS6dWmYx1cInhtlG6v2wPa1biur1535d0MYohqVcyUItHh5eCcKJAm5nTsnyXk3HEjqRvxRqNqGYcXKvD7qBveFnD1AypzKGIPp6-69pMXifAJv6iUQ7WBpnfpxdfO_wTYGFYnt1CqtBWXJsClT0jcDZJ3ZF59-1wIIaLQD3YSVhLgFZ1OozpZKYRkqrwe7q1ExtW5cY4JwnWszZXONEqLnQ7wd70RJUa6IboBsB_1b1cEjdogabQDDOwg5WztYd-0uXAk0W9K1J8bfunCbujRWE3CI_z2Jh6QGSGjNKzE6mr4PRfIQGOTOJigorz2MPw6zl9bm7L8NW04r6xKEyif2HBNEPPbSUk8qGkiDE7rqvRbadyrXd1q0GshXYyRx0z7jGjZOiP4LnqCGijEtyg8eMYfpS-RtzVJ4A/file [following]\n",
"--2022-03-28 13:49:13-- https://uc8e592ebc7bbb85965655cbbe26.dl.dropboxusercontent.com/cd/0/inline2/BiUr-AJtXEZS6dWmYx1cInhtlG6v2wPa1biur1535d0MYohqVcyUItHh5eCcKJAm5nTsnyXk3HEjqRvxRqNqGYcXKvD7qBveFnD1AypzKGIPp6-69pMXifAJv6iUQ7WBpnfpxdfO_wTYGFYnt1CqtBWXJsClT0jcDZJ3ZF59-1wIIaLQD3YSVhLgFZ1OozpZKYRkqrwe7q1ExtW5cY4JwnWszZXONEqLnQ7wd70RJUa6IboBsB_1b1cEjdogabQDDOwg5WztYd-0uXAk0W9K1J8bfunCbujRWE3CI_z2Jh6QGSGjNKzE6mr4PRfIQGOTOJigorz2MPw6zl9bm7L8NW04r6xKEyif2HBNEPPbSUk8qGkiDE7rqvRbadyrXd1q0GshXYyRx0z7jGjZOiP4LnqCGijEtyg8eMYfpS-RtzVJ4A/file\n",
"Reusing existing connection to uc8e592ebc7bbb85965655cbbe26.dl.dropboxusercontent.com:443.\n",
"HTTP request sent, awaiting response... 200 OK\n",
"Length: 258098431 (246M) [application/zip]\n",
"Saving to: ‘Kather_texture_2016_image_tiles_5000.zip’\n",
"\n",
"Kather_texture_2016 100%[===================>] 246.14M 77.9MB/s in 3.2s \n",
"\n",
"2022-03-28 13:49:18 (77.9 MB/s) - ‘Kather_texture_2016_image_tiles_5000.zip’ saved [258098431/258098431]\n",
"\n"
]
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "R7PqlFcLeG-h"
},
"source": [
"class HistologyDataset(torch.utils.data.Dataset):\n",
" def __init__(self, root, transform, train=False, calc_norm=True, has_norm=True):\n",
" self.root = root\n",
" self.train = train\n",
" self.calc_norm = calc_norm\n",
" self.has_norm = has_norm\n",
" self.le = {'tumor': 0, 'stroma': 1, 'complex': 2, 'lympho': 3,\n",
" 'debris': 4, 'mucosa': 5, 'adipose': 6, 'empty': 7} # dicionário para definir o label de cada classe\n",
" self.transform = transform\n",
" self.load_images()\n",
"\n",
" def load_images(self):\n",
" self.img_list, self.labels = self.read_images(root=self.root)\n",
"\n",
" def read_images(self, root):\n",
" # Leitura das imagens do dataset\n",
" # para este caso, o dataset divide em pastas as imagens de cada classe correspondente\n",
" # portanto, vamos percorrer essas pastas, adicionando as primeiras 500 imagens para o conjunto de treino\n",
" # o restante das imagens (a partir da 500) são adicionadas na validação\n",
" # o label é definido de acordo com o nome da pasta pelo dicionário self.le definido acima\n",
" # por exemplo: a pasta 01_TUMOR vai ser correspondente ao self.le['tumor'], que é igual a 0\n",
" img_list, labels = [], []\n",
" if self.train is True:\n",
" for folder in os.listdir(self.root):\n",
" for num, img_name in enumerate(os.listdir(os.path.join(self.root, folder))):\n",
" if num < 500:\n",
" img_list.append(os.path.join(self.root, folder, img_name))\n",
" labels.append(self.le[folder.split('_')[1].lower()])\n",
" else:\n",
" for folder in os.listdir(os.path.join(self.root)):\n",
" for num, img_name in enumerate(os.listdir(os.path.join(self.root, folder))):\n",
" if num >= 500:\n",
" img_list.append(os.path.join(self.root, folder, img_name))\n",
" labels.append(self.le[folder.split('_')[1].lower()])\n",
"\n",
" return img_list, labels\n",
"\n",
" def __getitem__(self, item):\n",
" # retorna uma imagem para o treino/teste\n",
" if self.has_norm is True:\n",
" # normaliza a imagem se has_norm for setado como True\n",
" cur_img = self.normalize_image(self.transform(Image.open(self.img_list[item])))\n",
" else:\n",
" # apenas converte a imagem para tensor, sem normalizar\n",
" cur_img = self.transform(Image.open(self.img_list[item]))\n",
" cur_label = self.labels[item]\n",
" return cur_img, cur_label\n",
"\n",
" def __len__(self):\n",
" return len(self.img_list)\n",
"\n",
" def normalize_image(self, img):\n",
" # normaliza uma imagem\n",
" # se calc_norm for True, normaliza pela subtração da média dividida pelo desvio para cada canal da imagem\n",
" # se calc_norm for False, normaliza pelos valores pré-definidos de média e desvio padrão\n",
" if self.calc_norm is True:\n",
" for i in range(img.shape[0]):\n",
" mu = img[i, :, :].mean()\n",
" std = img[i, :, :].std()\n",
" img[i, :, :] = ((img[i, :, :] - mu) / std)\n",
" else:\n",
" img = torchvision.transforms.functional.normalize(img,\n",
" mean=torch.Tensor([0.485, 0.456, 0.406]),\n",
" std=torch.Tensor([0.229, 0.224, 0.225]))\n",
" return img\n",
"\n",
"\n",
"def load_data(dataset, root, batch_size, resize=None):\n",
" # o transformer define a sequência de transformações que serão aplicadas na imagem\n",
" # neste caso, a sequência é um redimensionamento da imagem (caso a variável resize seja definida)\n",
" # seguido de uma transformação para tensor\n",
" # várias outras transformações estão disponíveis no Pytorch, como crops, flips, espelhamento e etc.\n",
" transformer = []\n",
" if resize is not None:\n",
" transformer += [torchvision.transforms.Resize(size=(resize,resize))]\n",
" transformer += [torchvision.transforms.ToTensor()]\n",
" transformer = torchvision.transforms.Compose(transformer)\n",
"\n",
" train = dataset(root=root, transform=transformer, train=True) #obtem dataset de treino\n",
" test = dataset(root=root, transform=transformer, train=False) #obtem dataset de validação\n",
" num_workers = 0 if sys.platform.startswith('win32') else 4\n",
"\n",
" train_iter = torch.utils.data.DataLoader(train,\n",
" batch_size, shuffle=True,\n",
" num_workers=num_workers) # criação do dataloader de treino\n",
" test_iter = torch.utils.data.DataLoader(test,\n",
" batch_size, shuffle=False,\n",
" num_workers=num_workers) # criação do dataloader de teste\n",
" return train_iter, test_iter\n",
"\n",
"# carregamento do dado\n",
"batch_size = 64\n",
"train_iter, test_iter = load_data(HistologyDataset, 'Kather_texture_2016_image_tiles_5000', batch_size, resize=None)"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "MajgTJrh0PsR"
},
"source": [
"# IMPLEMENTE AQUI SUA REDE E DEFINIÇÃO DE LOSS E OTIMIZADOR\n",
"\n",
"# experimente criar redes do zero com o conhecimento adquirido no curso até agora\n",
"# experimente também replicar redes já estabelecidas (alexnet, lenet, vgg e etc)\n",
"# experimente também utilizar as redes pré-treinadas já implementadas no torchvision\n",
"# para o caso de rede pré-treinada, lembre-se de modificar a saída da rede para o número de classes do problema\n",
"\n",
"# treinamento e validação\n",
"train_validate(model, train_iter, test_iter, batch_size, optimizer, loss, num_epochs)"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "03GGRGweYCZm"
},
"source": [
"## Problema 2\n",
"\n",
"Neste problema, classificaremos imagens de sensoriamento remoto de plantações de café do dataset público [Brazilian Coffee Scenes](http://www.patreo.dcc.ufmg.br/2017/11/12/brazilian-coffee-scenes-dataset/).\n",
"Neste caso, , vamos receber imagens de $64\\times 64$ pixels e classificá-las entre duas classes:\n",
"\n",
"1. café, e\n",
"2. não café."
]
},
{
"cell_type": "code",
"metadata": {
"id": "Pb9SyKmrY_c2"
},
"source": [
"# Baixando o dataset\n",
"!wget http://www.patreo.dcc.ufmg.br/wp-content/uploads/2017/11/brazilian_coffee_dataset.zip\n",
"!unzip -q brazilian_coffee_dataset.zip"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "znSdowm6YAtq"
},
"source": [
"class CoffeeDataset(torch.utils.data.Dataset):\n",
" def __init__(self, root, transform, train=False, calc_norm=True, has_norm=True):\n",
" self.root = root\n",
" self.train = train\n",
" self.calc_norm = calc_norm\n",
" self.has_norm = has_norm\n",
" self.load_images()\n",
" self.transform = transform\n",
"\n",
" def load_images(self):\n",
" self.img_list, self.labels = self.read_images(root=self.root)\n",
"\n",
" def read_images(self, root):\n",
" # para este dataset, existem 5 pastas (fold1, fold2, ..., fold5) com as imagens\n",
" # e existem 5 arquivos txts (fold1.txt, fold2.txt, ..., fold5.txt) com o nome das imagens correspondentes\n",
" # nos arquivos txts, cada linha representa uma imagem seguindo o formato {classe}.{nome da img}\n",
" # sendo classe igual a coffee ou noncoffee (0 ou 1)\n",
" # tratamos o nome das imagens de acordo com cada linha do arquivo (não esquecendo de adicionar o .jpg)\n",
" # convertemos o label em 0 ou 1 dependendo da classe\n",
" # Vamos utilizar o fold 1 como validação e o restante dos folds como treino\n",
" img_list, labels = [], []\n",
" if self.train is True:\n",
" for i in range(1,5):\n",
" data_file = open(os.path.join(root, 'fold' + str(i+1) + '.txt'), \"r\") # arquivo com nome das imagens\n",
" data_list = [i.replace('\\n', '') for i in data_file.readlines()]\n",
" for row in data_list:\n",
" img_name = '.'.join(row.split('.')[1:])\n",
" img_list.append(os.path.join(root, 'fold' + str(i+1), img_name + '.jpg'))\n",
" labels.append(0 if row.split('.')[0] == 'coffee' else 1)\n",
" else:\n",
" data_file = open(os.path.join(root, 'fold1.txt'), \"r\") # arquivo com nome das imagens\n",
" data_list = [i.replace('\\n', '') for i in data_file.readlines()]\n",
" for row in data_list:\n",
" img_name = '.'.join(row.split('.')[1:])\n",
" img_list.append(os.path.join(root, 'fold1', img_name + '.jpg'))\n",
" labels.append(0 if row.split('.')[0] == 'coffee' else 1)\n",
"\n",
" return img_list, labels\n",
"\n",
" def __getitem__(self, item):\n",
" # retorna uma imagem para o treino/teste\n",
" if self.has_norm is True:\n",
" # normaliza a imagem se has_norm for setado como True\n",
" cur_img = self.normalize_image(self.transform(Image.open(self.img_list[item])))\n",
" else:\n",
" # apenas converte a imagem para tensor, sem normalizar\n",
" cur_img = self.transform(Image.open(self.img_list[item]))\n",
" cur_label = self.labels[item]\n",
" return cur_img, cur_label\n",
"\n",
" def __len__(self):\n",
" return len(self.img_list)\n",
"\n",
" def normalize_image(self, img):\n",
" # normaliza uma imagem\n",
" # se calc_norm for True, normaliza pela subtração da média dividida pelo desvio para cada canal da imagem\n",
" # se calc_norm for False, normaliza pelos valores pré-definidos de média e desvio padrão\n",
" if self.calc_norm is True:\n",
" for i in range(img.shape[0]):\n",
" mu = img[i, :, :].mean()\n",
" std = img[i, :, :].std()\n",
" img[i, :, :] = ((img[i, :, :] - mu) / std)\n",
" else:\n",
" img = torchvision.transforms.functional.normalize(img,\n",
" mean=torch.Tensor([0.485, 0.456, 0.406]),\n",
" std=torch.Tensor([0.229, 0.224, 0.225]))\n",
" return img\n",
"\n",
"\n",
"def load_data(dataset, root, batch_size, resize=None):\n",
" # o transformer define a sequência de transformações que serão aplicadas na imagem\n",
" # neste caso, a sequência é um redimensionamento da imagem (caso a variável resize seja definida)\n",
" # seguido de uma transformação para tensor\n",
" # várias outras transformações estão disponíveis no Pytorch, como crops, flips, espelhamento e etc.\n",
" transformer = []\n",
" if resize is not None:\n",
" transformer += [torchvision.transforms.Resize(size=(resize,resize))]\n",
" transformer += [torchvision.transforms.ToTensor()]\n",
" transformer = torchvision.transforms.Compose(transformer)\n",
"\n",
" train = dataset(root=root, transform=transformer, train=True) #obtem dataset de treino\n",
" test = dataset(root=root, transform=transformer, train=False) #obtem dataset de validação\n",
" num_workers = 0 if sys.platform.startswith('win32') else 4\n",
"\n",
" train_iter = torch.utils.data.DataLoader(train,\n",
" batch_size, shuffle=True,\n",
" num_workers=num_workers) # criação do dataloader de treino\n",
" test_iter = torch.utils.data.DataLoader(test,\n",
" batch_size, shuffle=False,\n",
" num_workers=num_workers) # criação do dataloader de teste\n",
" return train_iter, test_iter\n",
"\n",
"# carregamento do dado\n",
"batch_size = 64\n",
"train_iter, test_iter = load_data(CoffeeDataset, 'brazilian_coffee_scenes', batch_size, resize=None)"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "E-4J_GJRXLMq"
},
"source": [
"# IMPLEMENTE AQUI SUA REDE E DEFINIÇÃO DE LOSS E OTIMIZADOR\n",
"\n",
"# experimente criar redes do zero com o conhecimento adquirido no curso até agora\n",
"# experimente também replicar redes já estabelecidas (alexnet, lenet, vgg e etc)\n",
"# experimente também utilizar as redes pré-treinadas já implementadas no torchvision\n",
"# para o caso de rede pré-treinada, lembre-se de modificar a saída da rede para o número de classes do problema\n",
"\n",
"# treinamento e validação\n",
"train_validate(model, train_iter, test_iter, batch_size, optimizer, loss, num_epochs)"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "O59tc-hAdOm5"
},
"source": [
"## Problema 3\n",
"\n",
"Neste problema, classificaremos imagens gerais de sensoriamento remoto do dataset público [UCMerced](http://weegee.vision.ucmerced.edu/datasets/landuse.html).\n",
"Neste caso, vamos receber imagens de $256\\times 256$ pixels e classificá-las entre 21 classes:\n",
"\n",
"1. agricultural\n",
"1. airplane\n",
"1. baseballdiamond\n",
"1. beach\n",
"1. buildings\n",
"1. chaparral\n",
"1. denseresidential\n",
"1. forest\n",
"1. freeway\n",
"1. golfcourse\n",
"1. harbor\n",
"1. intersection\n",
"1. mediumresidential\n",
"1. mobilehomepark\n",
"1. overpass\n",
"1. parkinglot\n",
"1. river\n",
"1. runway\n",
"1. sparseresidential\n",
"1. storagetanks\n",
"1. tenniscourt"
]
},
{
"cell_type": "code",
"metadata": {
"id": "8IA0iam6pXbZ"
},
"source": [
"# Baixando o dataset\n",
"!wget http://weegee.vision.ucmerced.edu/datasets/UCMerced_LandUse.zip\n",
"!unzip -q UCMerced_LandUse.zip"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "mlTW4qgWdM6y"
},
"source": [
"class UCMercedDataset(torch.utils.data.Dataset):\n",
" def __init__(self, root, transform, train=False, calc_norm=False, has_norm=True):\n",
" self.root = root\n",
" self.train = train\n",
" self.calc_norm = calc_norm\n",
" self.has_norm = has_norm\n",
" self.transform = transform\n",
" self.load_images()\n",
"\n",
" def load_images(self):\n",
" self.img_list, self.labels = self.read_images(root=self.root)\n",
"\n",
" def read_images(self, root):\n",
" # IMPLEMENTE AQUI A LEITURA DAS IMAGENS\n",
"\n",
" # para este dataset, as imagens estão dividas em pastas com o nome da classe\n",
" # uma sugestão é usar o enumerate do Python para percorrer essas pastas, atribuindo o valor do enumerate como label da classe\n",
" # outra opção é definir um dicionário com o label de cada classe como no exemplo do Colerectal histology\n",
" # para cada pasta, selecione as primeiras 80 imagens para o treino e o restante para o teste\n",
" img_list, labels = [], []\n",
"\n",
" #...\n",
"\n",
" return img_list, labels\n",
"\n",
" def __getitem__(self, item):\n",
" # IMPLEMENTE AQUI O RETORNO E TRATAMENTO DE CADA IMAGEM\n",
"\n",
" # lembre-se de aplicar as transformações enviadas ao dataloader (principalmente o ToTensor)\n",
"\n",
" return img, label\n",
"\n",
" def __len__(self):\n",
" return len(self.img_list)\n",
"\n",
" def normalize_image(self, img):\n",
" # normaliza uma imagem\n",
" # se calc_norm for True, normaliza pela subtração da média dividida pelo desvio para cada canal da imagem\n",
" # se calc_norm for False, normaliza pelos valores pré-definidos de média e desvio padrão\n",
" if self.calc_norm is True:\n",
" for i in range(img.shape[0]):\n",
" mu = img[i, :, :].mean()\n",
" std = img[i, :, :].std()\n",
" img[i, :, :] = ((img[i, :, :] - mu) / std)\n",
" else:\n",
" img = torchvision.transforms.functional.normalize(img,\n",
" mean=torch.Tensor([0.485, 0.456, 0.406]),\n",
" std=torch.Tensor([0.229, 0.224, 0.225]))\n",
" return img\n",
"\n",
"\n",
"def load_data(dataset, root, batch_size, resize=None):\n",
" # IMPLEMENTE AQUI A DEFINIÇÃO DAS TRANSFORMAÇÕES E DO DATALOADER\n",
"\n",
" # o transformer define a sequência de transformações que serão aplicadas na imagem\n",
" # a principal para o nosso caso é o ToTensor, que converte a imagem no formato lido para um tensor\n",
" # experimente transformações diferentes, como crops e flips\n",
" # o resize pode ser necessário para datasets com imagems de tamanhos variados\n",
"\n",
" # defina também o dataloader de treino e teste\n",
"\n",
" return train_iter, test_iter\n",
"\n",
"# carregamento do dado\n",
"batch_size = 64\n",
"train_iter, test_iter = load_data(UCMercedDataset, os.path.join('UCMerced_LandUse', 'Images'), batch_size, resize=256)"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "KTCvFXC2q4Xa"
},
"source": [
"# IMPLEMENTE AQUI SUA REDE E DEFINIÇÃO DE LOSS E OTIMIZADOR\n",
"\n",
"# experimente criar redes do zero com o conhecimento adquirido no curso até agora\n",
"# experimente também replicar redes já estabelecidas (alexnet, lenet, vgg e etc)\n",
"# experimente também utilizar as redes pré-treinadas já implementadas no torchvision\n",
"# para o caso de rede pré-treinada, lembre-se de modificar a saída da rede para o número de classes do problema\n",
"\n",
"# treinamento e validação\n",
"train_validate(model, train_iter, test_iter, batch_size, optimizer, loss, num_epochs)"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "S_26IvT9d5nF"
},
"source": [
"## Problema 4\n",
"\n",
"Neste problema, classificaremos imagens genéricas de textura do dataset público [*Describable Textures Dataset*](http://www.robots.ox.ac.uk/~vgg/data/dtd/).\n",
"Neste caso, vamos receber imagens com tamanho variado (de $300\\times 300$ pixels até $640\\times 640$) e classificá-las entre 47 classes:\n",
"\n",
"1. banded\n",
"1. blotchy\n",
"1. braided\n",
"1. bubbly\n",
"1. bumpy\n",
"1. chequered\n",
"1. cobwebbed\n",
"1. cracked\n",
"1. crosshatched\n",
"1. crystalline\n",
"1. dotted\n",
"1. fibrous\n",
"1. flecked\n",
"1. freckled\n",
"1. frilly\n",
"1. gauzy\n",
"1. grid\n",
"1. grooved\n",
"1. honeycombed\n",
"1. interlaced\n",
"1. knitted\n",
"1. lacelike\n",
"1. lined\n",
"1. marbled\n",
"1. matted\n",
"1. meshed\n",
"1. paisley\n",
"1. perforated\n",
"1. pitted\n",
"1. pleated\n",
"1. polka-dotted\n",
"1. porous\n",
"1. potholed\n",
"1. scaly\n",
"1. smeared\n",
"1. spiralled\n",
"1. sprinkled\n",
"1. stained\n",
"1. stratified\n",
"1. striped\n",
"1. studded\n",
"1. swirly\n",
"1. veined\n",
"1. waffled\n",
"1. woven\n",
"1. wrinkled\n",
"1. zigzagged"
]
},
{
"cell_type": "code",
"metadata": {
"id": "WBeE2iv9yI-_"
},
"source": [
"# Download do dataset\n",
"!wget http://www.robots.ox.ac.uk/~vgg/data/dtd/download/dtd-r1.0.1.tar.gz\n",
"!tar -xzf dtd-r1.0.1.tar.gz"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "OgWH1s5ndbhx"
},
"source": [
"class TextureDataset(torch.utils.data.Dataset):\n",
" def __init__(self, root, transform, train=False, calc_norm=False, has_norm=True):\n",
" self.root = root\n",
" self.train = train\n",
" self.calc_norm = calc_norm\n",
" self.has_norm = has_norm\n",
" self.le = {'banded': 0, 'blotchy': 1, 'braided': 2, 'bubbly': 3, 'bumpy': 4,\n",
" 'chequered': 5, 'cobwebbed': 6, 'cracked': 7, 'crosshatched': 8,\n",
" 'crystalline': 9, 'dotted': 10, 'fibrous': 11, 'flecked': 12,\n",
" 'freckled': 13, 'frilly': 14, 'gauzy': 15, 'grid': 16, 'grooved': 17,\n",
" 'honeycombed': 18, 'interlaced': 19, 'knitted': 20, 'lacelike': 21, 'lined': 22,\n",
" 'marbled': 23, 'matted': 24, 'meshed': 25, 'paisley': 26, 'perforated': 27,\n",
" 'pitted': 28, 'pleated': 29, 'polka-dotted': 30, 'porous': 31, 'potholed': 32,\n",
" 'scaly': 33, 'smeared': 34, 'spiralled': 35, 'sprinkled': 36, 'stained': 37,\n",
" 'stratified': 38, 'striped': 39, 'studded': 40, 'swirly': 41, 'veined': 42,\n",
" 'waffled': 43, 'woven': 44, 'wrinkled': 45, 'zigzagged': 46} # dicionário definindo o label de cada classe\n",
" self.transform = transform\n",
" self.load_images()\n",
"\n",
" def load_images(self):\n",
" self.img_list, self.labels = self.read_images(root=self.root)\n",
"\n",
" def read_images(self, root):\n",
" # IMPLEMENTE AQUI A LEITURA DAS IMAGENS\n",
"\n",
" # para este caso, na pasta images estão as imagens separadas por pastas relacionadas a classe\n",
" # na pasta label existem txts definindo a divisão das imagens em treino, teste e validação\n",
" # utilize as imagens nos arquivos train1.txt e val1.txt como treino\n",
" # utilize as imagens nos arquivos teste1.txt como validação\n",
" # lembre-se de atribuir o label de acordo com o dicionário self.le definido acima\n",
"\n",
" img_list, labels = [], []\n",
"\n",
" #...\n",
"\n",
" return img_list, labels\n",
"\n",
" def __getitem__(self, item):\n",
" # IMPLEMENTE AQUI O RETORNO E TRATAMENTO DE CADA IMAGEM\n",
"\n",
" # lembre-se de aplicar as transformações enviadas ao dataloader (principalmente o ToTensor)\n",
"\n",
" return img, label\n",
"\n",
" def __len__(self):\n",
" return len(self.img_list)\n",
"\n",
" def normalize_image(self, img):\n",
" # normaliza uma imagem\n",
" # se calc_norm for True, normaliza pela subtração da média dividida pelo desvio para cada canal da imagem\n",
" # se calc_norm for False, normaliza pelos valores pré-definidos de média e desvio padrão\n",
" if self.calc_norm is True:\n",
" for i in range(img.shape[0]):\n",
" mu = img[i, :, :].mean()\n",
" std = img[i, :, :].std()\n",
" img[i, :, :] = ((img[i, :, :] - mu) / std)\n",
" else:\n",
" img = torchvision.transforms.functional.normalize(img,\n",
" mean=torch.Tensor([0.485, 0.456, 0.406]),\n",
" std=torch.Tensor([0.229, 0.224, 0.225]))\n",
" return img\n",
"\n",
"\n",
"def load_data(dataset, root, batch_size, resize=None):\n",
" # IMPLEMENTE AQUI A DEFINIÇÃO DAS TRANSFORMAÇÕES E DO DATALOADER\n",
"\n",
" # o transformer define a sequência de transformações que serão aplicadas na imagem\n",
" # a principal para o nosso caso é o ToTensor, que converte a imagem no formato lido para um tensor\n",
" # experimente transformações diferentes, como crops e flips\n",
" # o resize pode ser necessário para datasets com imagems de tamanhos variados\n",
"\n",
" # defina também o dataloader de treino e teste\n",
"\n",
" return train_iter, test_iter\n",
"\n",
"# carregamento do dado\n",
"batch_size = 32\n",
"train_iter, test_iter = load_data(TextureDataset, os.path.join('dtd'), batch_size, resize=None)"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "bY5x343ZyuAu"
},
"source": [
"# IMPLEMENTE AQUI SUA REDE E DEFINIÇÃO DE LOSS E OTIMIZADOR\n",
"\n",
"# experimente criar redes do zero com o conhecimento adquirido no curso até agora\n",
"# experimente também replicar redes já estabelecidas (alexnet, lenet, vgg e etc)\n",
"# experimente também utilizar as redes pré-treinadas já implementadas no torchvision\n",
"# para o caso de rede pré-treinada, lembre-se de modificar a saída da rede para o número de classes do problema\n",
"\n",
"# treinamento e validação\n",
"train_validate(model, train_iter, test_iter, batch_size, optimizer, loss, num_epochs)"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "LTrCVEiiekAy"
},
"source": [
"# função auxiliar para plotar imagens do dataset\n",
"\n",
"from matplotlib import pyplot as plt\n",
"\n",
"def show_images(imgs, num_rows, num_cols, titles=None, scale=1.5):\n",
" \"\"\"Plot a list of images.\"\"\"\n",
" figsize = (num_cols * scale, num_rows * scale)\n",
" axes = plt.subplots(num_rows, num_cols, figsize=figsize)[1].flatten()\n",
" for i, (ax, img) in enumerate(zip(axes, imgs)):\n",
" ax.imshow(img.numpy())\n",
" ax.axes.get_xaxis().set_visible(False)\n",
" ax.axes.get_yaxis().set_visible(False)\n",
" if titles:\n",
" ax.set_title(titles[i])\n",
" return axes"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "OllvZWX6epl9"
},
"source": [
"#plota imagens do dataset\n",
"\n",
"imgs = []\n",
"for X, y in train_iter:\n",
" X = np.swapaxes(X, 1, 3)\n",
" imgs = X[0:18]\n",
" break\n",
"\n",
"\n",
"show_images(imgs, 3, 6, titles=None, scale=3)"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "RTihfBqEerPE"
},
"source": [],
"execution_count": null,
"outputs": []
}
]
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment