ayulockin/preparing_MNIST_data.ipynb

## preparing_MNIST_data.ipynb
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Utility to prepare get MNIST dataset as csv\n",
    "\n",
    "### Instructions:\n",
    "\n",
    "- Download dataset from [here](http://yann.lecun.com/exdb/mnist/).\n",
    "- Extract them into `MNIST_dataset/data` dir.\n",
    "- Set the path to the files if required.\n",
    "\n",
    "Hurray you have the csv dataset. Now enjoy!"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Imports"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import gzip\n",
    "import os"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### About MNIST Hand Written Dataset\n",
    "\n",
    "- train-images-idx3-ubyte: training set images<br>\n",
    "- train-labels-idx1-ubyte: training set labels<br>\n",
    "- t10k-images-idx3-ubyte:  test set images<br>\n",
    "- t10k-labels-idx1-ubyte:  test set labels<br>\n",
    "\n",
    "The training set contains 60000 examples, and the test set 10000 examples"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Setting path"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "path_to_dataset = os.getcwd()+'/MNIST_dataset/data'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "files = os.listdir(path_to_dataset)\n",
    "files"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Helper function"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def MNISTtocsv(path, img_file, label_file, out_csv, n):\n",
    "    img = open(path+'/'+img_file, \"rb\")\n",
    "    label = open(path+'/'+label_file, \"rb\")\n",
    "    out = open(out_csv, \"w\")\n",
    "\n",
    "    img.read(16)\n",
    "    label.read(8)\n",
    "    images = []\n",
    "\n",
    "    for i in range(n):\n",
    "        image = [ord(label.read(1))]\n",
    "        for j in range(28*28):\n",
    "            image.append(ord(img.read(1)))\n",
    "        images.append(image)\n",
    "\n",
    "    for image in images:\n",
    "        out.write(\",\".join(str(pix) for pix in image)+\"\\n\")\n",
    "        \n",
    "    img.close()\n",
    "    label.close()\n",
    "    out.close()\n",
    "\n",
    "MNISTtocsv(path_to_dataset, files[0], files[1], 'mnist_test.csv',10000)\n",
    "MNISTtocsv(path_to_dataset, files[2], files[3], 'mnist_train.csv', 60000)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
	{
	"cells": [
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"## Utility to prepare get MNIST dataset as csv\n",
	"\n",
	"### Instructions:\n",
	"\n",
	"- Download dataset from [here](http://yann.lecun.com/exdb/mnist/).\n",
	"- Extract them into `MNIST_dataset/data` dir.\n",
	"- Set the path to the files if required.\n",
	"\n",
	"Hurray you have the csv dataset. Now enjoy!"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"### Imports"
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"metadata": {},
	"outputs": [],
	"source": [
	"import gzip\n",
	"import os"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"### About MNIST Hand Written Dataset\n",
	"\n",
	"- train-images-idx3-ubyte: training set images<br>\n",
	"- train-labels-idx1-ubyte: training set labels<br>\n",
	"- t10k-images-idx3-ubyte: test set images<br>\n",
	"- t10k-labels-idx1-ubyte: test set labels<br>\n",
	"\n",
	"The training set contains 60000 examples, and the test set 10000 examples"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"### Setting path"
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"metadata": {},
	"outputs": [],
	"source": [
	"path_to_dataset = os.getcwd()+'/MNIST_dataset/data'"
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"metadata": {},
	"outputs": [],
	"source": [
	"files = os.listdir(path_to_dataset)\n",
	"files"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"### Helper function"
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"metadata": {},
	"outputs": [],
	"source": [
	"def MNISTtocsv(path, img_file, label_file, out_csv, n):\n",
	" img = open(path+'/'+img_file, \"rb\")\n",
	" label = open(path+'/'+label_file, \"rb\")\n",
	" out = open(out_csv, \"w\")\n",
	"\n",
	" img.read(16)\n",
	" label.read(8)\n",
	" images = []\n",
	"\n",
	" for i in range(n):\n",
	" image = [ord(label.read(1))]\n",
	" for j in range(28*28):\n",
	" image.append(ord(img.read(1)))\n",
	" images.append(image)\n",
	"\n",
	" for image in images:\n",
	" out.write(\",\".join(str(pix) for pix in image)+\"\\n\")\n",
	" \n",
	" img.close()\n",
	" label.close()\n",
	" out.close()\n",
	"\n",
	"MNISTtocsv(path_to_dataset, files[0], files[1], 'mnist_test.csv',10000)\n",
	"MNISTtocsv(path_to_dataset, files[2], files[3], 'mnist_train.csv', 60000)"
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"metadata": {},
	"outputs": [],
	"source": []
	}
	],
	"metadata": {
	"kernelspec": {
	"display_name": "Python 3",
	"language": "python",
	"name": "python3"
	},
	"language_info": {
	"codemirror_mode": {
	"name": "ipython",
	"version": 3
	},
	"file_extension": ".py",
	"mimetype": "text/x-python",
	"name": "python",
	"nbconvert_exporter": "python",
	"pygments_lexer": "ipython3",
	"version": "3.7.1"
	}
	},
	"nbformat": 4,
	"nbformat_minor": 2
	}