Created November 29, 2022 12:31
Gist: vikeshpandey/c3dfe841b02cf986dbeadbfc839b5635
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"jupyter": {
"outputs_hidden": false
}
},
"source": [
"## Move your code from local Jupyter to Amazon SageMaker Studio\n",
"\n",
"This notebook is taken directly from the [Official PyTorch QuickStart Tutorial Guide](https://pytorch.org/tutorials/beginner/basics/quickstart_tutorial.html) and adapted to add Amazon SageMaker code in the last section of the notebook."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## NOTE!!\n",
"\n",
"The section below is taken as-is from the [Official PyTorch QuickStart Tutorial Guide](https://pytorch.org/tutorials/beginner/basics/quickstart_tutorial.html)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%matplotlib inline"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Quickstart\n",
"This section runs through the API for common tasks in machine learning. Refer to the links in each section to dive deeper.\n",
"\n",
"## Working with data\n",
"PyTorch has two [primitives to work with data](https://pytorch.org/docs/stable/data.html):\n",
"``torch.utils.data.DataLoader`` and ``torch.utils.data.Dataset``.\n",
"``Dataset`` stores the samples and their corresponding labels, and ``DataLoader`` wraps an iterable around\n",
"the ``Dataset``.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [],
"source": [
"import torch\n",
"from torch import nn\n",
"from torch.utils.data import DataLoader\n",
"from torchvision import datasets\n",
"from torchvision.transforms import ToTensor"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"PyTorch offers domain-specific libraries such as [TorchText](https://pytorch.org/text/stable/index.html),\n",
"[TorchVision](https://pytorch.org/vision/stable/index.html), and [TorchAudio](https://pytorch.org/audio/stable/index.html),\n",
"all of which include datasets. For this tutorial, we will be using a TorchVision dataset.\n",
"\n",
"The ``torchvision.datasets`` module contains ``Dataset`` objects for many real-world vision datasets such as\n",
"CIFAR and COCO ([full list here](https://pytorch.org/vision/stable/datasets.html)). In this tutorial, we\n",
"use the FashionMNIST dataset. Every TorchVision ``Dataset`` accepts two arguments, ``transform`` and\n",
"``target_transform``, to modify the samples and labels respectively.\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [],
"source": [
"# Download training data from open datasets.\n",
"training_data = datasets.FashionMNIST(\n",
"    root=\"data\",\n",
"    train=True,\n",
"    download=True,\n",
"    transform=ToTensor(),\n",
")\n",
"\n",
"# Download test data from open datasets.\n",
"test_data = datasets.FashionMNIST(\n",
"    root=\"data\",\n",
"    train=False,\n",
"    download=True,\n",
"    transform=ToTensor(),\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We pass the ``Dataset`` as an argument to ``DataLoader``. This wraps an iterable over our dataset, and supports\n",
"automatic batching, sampling, shuffling and multiprocess data loading. Here we define a batch size of 64, i.e. each element\n",
"in the dataloader iterable will return a batch of 64 features and labels.\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [],
"source": [
"batch_size = 64\n",
"\n",
"# Create data loaders.\n",
"train_dataloader = DataLoader(training_data, batch_size=batch_size)\n",
"test_dataloader = DataLoader(test_data, batch_size=batch_size)\n",
"\n",
"for X, y in test_dataloader:\n",
"    print(f\"Shape of X [N, C, H, W]: {X.shape}\")\n",
"    print(f\"Shape of y: {y.shape} {y.dtype}\")\n",
"    break"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Read more about [loading data in PyTorch](https://pytorch.org/tutorials/beginner/basics/data_tutorial.html).\n",
"\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"--------------\n",
"\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Creating Models\n",
"To define a neural network in PyTorch, we create a class that inherits\n",
"from [nn.Module](https://pytorch.org/docs/stable/generated/torch.nn.Module.html). We define the layers of the network\n",
"in the ``__init__`` function and specify how data will pass through the network in the ``forward`` function. To accelerate\n",
"operations in the neural network, we move it to the GPU if available.\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [],
"source": [
"# Get cpu or gpu device for training.\n",
"device = \"cuda\" if torch.cuda.is_available() else \"cpu\"\n",
"print(f\"Using {device} device\")\n",
"\n",
"# Define model\n",
"class NeuralNetwork(nn.Module):\n",
"    def __init__(self):\n",
"        super().__init__()\n",
"        self.flatten = nn.Flatten()\n",
"        self.linear_relu_stack = nn.Sequential(\n",
"            nn.Linear(28*28, 512),\n",
"            nn.ReLU(),\n",
"            nn.Linear(512, 512),\n",
"            nn.ReLU(),\n",
"            nn.Linear(512, 10)\n",
"        )\n",
"\n",
"    def forward(self, x):\n",
"        x = self.flatten(x)\n",
"        logits = self.linear_relu_stack(x)\n",
"        return logits\n",
"\n",
"model = NeuralNetwork().to(device)\n",
"print(model)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Read more about [building neural networks in PyTorch](https://pytorch.org/tutorials/beginner/basics/buildmodel_tutorial.html).\n",
"\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"--------------\n",
"\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Optimizing the Model Parameters\n",
"To train a model, we need a [loss function](https://pytorch.org/docs/stable/nn.html#loss-functions)\n",
"and an [optimizer](https://pytorch.org/docs/stable/optim.html).\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [],
"source": [
"loss_fn = nn.CrossEntropyLoss()\n",
"optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In a single training loop, the model makes predictions on the training dataset (fed to it in batches), and\n",
"backpropagates the prediction error to adjust the model's parameters.\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [],
"source": [
"def train(dataloader, model, loss_fn, optimizer):\n",
"    size = len(dataloader.dataset)\n",
"    model.train()\n",
"    for batch, (X, y) in enumerate(dataloader):\n",
"        X, y = X.to(device), y.to(device)\n",
"\n",
"        # Compute prediction error\n",
"        pred = model(X)\n",
"        loss = loss_fn(pred, y)\n",
"\n",
"        # Backpropagation\n",
"        optimizer.zero_grad()\n",
"        loss.backward()\n",
"        optimizer.step()\n",
"\n",
"        if batch % 100 == 0:\n",
"            loss, current = loss.item(), batch * len(X)\n",
"            print(f\"loss: {loss:>7f} [{current:>5d}/{size:>5d}]\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We also check the model's performance against the test dataset to ensure it is learning.\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [],
"source": [
"def test(dataloader, model, loss_fn):\n",
"    size = len(dataloader.dataset)\n",
"    num_batches = len(dataloader)\n",
"    model.eval()\n",
"    test_loss, correct = 0, 0\n",
"    with torch.no_grad():\n",
"        for X, y in dataloader:\n",
"            X, y = X.to(device), y.to(device)\n",
"            pred = model(X)\n",
"            test_loss += loss_fn(pred, y).item()\n",
"            correct += (pred.argmax(1) == y).type(torch.float).sum().item()\n",
"    test_loss /= num_batches\n",
"    correct /= size\n",
"    print(f\"Test Error: \\n Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f} \\n\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The training process is conducted over several iterations (*epochs*). During each epoch, the model learns\n",
"parameters to make better predictions. We print the model's accuracy and loss at each epoch; we'd like to see the\n",
"accuracy increase and the loss decrease with every epoch.\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [],
"source": [
"epochs = 5\n",
"for t in range(epochs):\n",
"    print(f\"Epoch {t+1}\\n-------------------------------\")\n",
"    train(train_dataloader, model, loss_fn, optimizer)\n",
"    test(test_dataloader, model, loss_fn)\n",
"print(\"Done!\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Read more about [Training your model](https://pytorch.org/tutorials/beginner/basics/optimization_tutorial.html).\n",
"\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"--------------\n",
"\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Saving Models\n",
"A common way to save a model is to serialize the internal state dictionary (containing the model parameters).\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [],
"source": [
"torch.save(model.state_dict(), \"model.pth\")\n",
"print(\"Saved PyTorch Model State to model.pth\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Loading Models\n",
"\n",
"The process for loading a model includes re-creating the model structure and loading\n",
"the state dictionary into it.\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [],
"source": [
"model = NeuralNetwork()\n",
"model.load_state_dict(torch.load(\"model.pth\"))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This model can now be used to make predictions.\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [],
"source": [
"classes = [\n",
"    \"T-shirt/top\",\n",
"    \"Trouser\",\n",
"    \"Pullover\",\n",
"    \"Dress\",\n",
"    \"Coat\",\n",
"    \"Sandal\",\n",
"    \"Shirt\",\n",
"    \"Sneaker\",\n",
"    \"Bag\",\n",
"    \"Ankle boot\",\n",
"]\n",
"\n",
"model.eval()\n",
"x, y = test_data[0][0], test_data[0][1]\n",
"with torch.no_grad():\n",
"    pred = model(x)\n",
"    predicted, actual = classes[pred[0].argmax(0)], classes[y]\n",
"    print(f'Predicted: \"{predicted}\", Actual: \"{actual}\"')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## The official PyTorch QuickStart tutorial ends here\n",
"\n",
"\n",
"# Introducing the Amazon SageMaker code"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In this section, we add the Amazon SageMaker code that runs the training as a SageMaker training job."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"First, we import the SageMaker Python SDK and the SageMaker-managed PyTorch framework Estimator. We also fetch the IAM execution role that the training job will use."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import sagemaker  # SageMaker Python SDK\n",
"from sagemaker.pytorch.estimator import PyTorch  # PyTorch Estimator class\n",
"from sagemaker import get_execution_role  # function that fetches the execution role\n",
"\n",
"# Store the execution role.\n",
"# This is the same role that was used to create the SageMaker Studio user profile.\n",
"execution_role = get_execution_role()\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now it's time to define the Estimator and start the training job using the SageMaker Training APIs. Note that training runs on a SageMaker-managed training cluster, not on the notebook itself."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Create the estimator object for PyTorch\n",
"estimator = PyTorch(\n",
"    entry_point=\"train.py\",  # training script\n",
"    framework_version=\"1.12\",  # PyTorch framework version; keep it the same as in the default example\n",
"    py_version=\"py38\",  # compatible Python version\n",
"    instance_count=1,  # number of EC2 instances used for training\n",
"    instance_type=\"ml.c5.xlarge\",  # type of EC2 instance(s) used for training\n",
"    disable_profiler=True,  # disable the profiler, as it is not needed here\n",
"    role=execution_role,  # execution role used by the training job\n",
")\n",
"\n",
"# Start the training\n",
"estimator.fit()"
]
}
],
"metadata": {
"instance_type": "ml.t3.medium",
"kernelspec": {
"display_name": "Python 3 (PyTorch 1.12 Python 3.8 CPU Optimized)",
"language": "python",
"name": "python3__SAGEMAKER_INTERNAL__arn:aws:sagemaker:eu-west-1:470317259841:image/pytorch-1.12-cpu-py38"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.13"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
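
The estimator in the last cell points at an `entry_point` named `train.py`, which this notebook does not include. As a rough sketch only, the argument handling of such a script might look like the block below. The flag names (`--epochs`, `--batch-size`, `--lr`, `--model-dir`) and the helper `model_path` are hypothetical and not taken from the original gist; the defaults simply mirror the values used in the notebook. The one SageMaker-specific piece is the `SM_MODEL_DIR` environment variable, which SageMaker training containers set to the directory whose contents are uploaded as the model artifact.

```python
import argparse
import os


def parse_args(argv=None):
    """Parse hyperparameters and the model output directory (hypothetical flags)."""
    parser = argparse.ArgumentParser()
    # Defaults mirror the notebook: 5 epochs, batch size 64, learning rate 1e-3.
    parser.add_argument("--epochs", type=int, default=5)
    parser.add_argument("--batch-size", type=int, default=64)
    parser.add_argument("--lr", type=float, default=1e-3)
    # Inside a SageMaker training container SM_MODEL_DIR is set;
    # fall back to a local "model" directory when running elsewhere.
    parser.add_argument(
        "--model-dir", default=os.environ.get("SM_MODEL_DIR", "model")
    )
    # parse_known_args ignores any extra arguments the launcher may pass.
    args, _unknown = parser.parse_known_args(argv)
    return args


def model_path(args):
    """Return the file the trained state dict would be saved to."""
    return os.path.join(args.model_dir, "model.pth")


# Example: parse an explicit argument list instead of sys.argv.
args = parse_args(["--epochs", "2"])
print(args.epochs, args.batch_size, args.lr, model_path(args))
```

In a real script, the training and evaluation functions from the notebook would run after parsing, and the final `torch.save(model.state_dict(), ...)` call would write to `model_path(args)` rather than to the working directory, so that the saved weights end up in the job's output.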