Created
December 5, 2023 07:42
Keras Autoencoder Jupyter Notebook Tutorials
{
"cells": [
{
"cell_type": "markdown",
"id": "1b2cf03f-8eaf-4adf-80ab-0d5d5c661deb",
"metadata": {},
"source": [
"# Building Autoencoders in Keras\n",
"The following `Jupyter Notebook` has been *adapted* from the [Keras blog article](https://blog.keras.io/building-autoencoders-in-keras.html) written by *F. Chollet* on [autoencoders](https://en.wikipedia.org/wiki/Autoencoder)."
]
},
{
"cell_type": "markdown",
"id": "06d935a8-5cb4-466b-ad97-aedf12265338",
"metadata": {},
"source": [
"## Convolutional Autoencoder\n",
"Since our inputs are images, it makes sense to use convolutional neural networks (convnets) as encoders and decoders. In practical settings, autoencoders applied to images are almost always convolutional autoencoders, because they simply perform much better.\n",
"\n",
"Let's implement one. The encoder will consist of a stack of `Conv2D` and `MaxPooling2D` layers (max pooling performing the spatial down-sampling), while the decoder will consist of a stack of `Conv2D` and `UpSampling2D` layers."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b5f302f4-45bc-482c-8016-63a0e698d6f6",
"metadata": {},
"outputs": [],
"source": [
"# get initial libs\n",
"import keras\n",
"from keras import layers\n",
"from keras.datasets import mnist\n",
"import numpy as np\n",
"\n",
"# local helper module shipped alongside this notebook, used below to plot\n",
"# original vs. reconstructed digits side by side\n",
"import visualization.image"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e743e762-1c41-4b32-b6db-a63db7c25c7c",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"input_img = keras.Input(shape=(28, 28, 1))\n",
"\n",
"# encoder: three conv/max-pool stages, (28, 28, 1) -> (4, 4, 8)\n",
"x = layers.Conv2D(16, (3, 3), activation='relu', padding='same')(input_img)\n",
"x = layers.MaxPooling2D((2, 2), padding='same')(x)\n",
"x = layers.Conv2D(8, (3, 3), activation='relu', padding='same')(x)\n",
"x = layers.MaxPooling2D((2, 2), padding='same')(x)\n",
"x = layers.Conv2D(8, (3, 3), activation='relu', padding='same')(x)\n",
"encoded = layers.MaxPooling2D((2, 2), padding='same')(x)\n",
"\n",
"# at this point the representation is (4, 4, 8) i.e. 128-dimensional\n",
"# decoder: mirror the encoder, upsampling back to (28, 28, 1)\n",
"x = layers.Conv2D(8, (3, 3), activation='relu', padding='same')(encoded)\n",
"x = layers.UpSampling2D((2, 2))(x)\n",
"x = layers.Conv2D(8, (3, 3), activation='relu', padding='same')(x)\n",
"x = layers.UpSampling2D((2, 2))(x)\n",
"# deliberately no padding here: (16, 16, 8) -> (14, 14, 16), so the final\n",
"# upsampling restores the 28x28 spatial size\n",
"x = layers.Conv2D(16, (3, 3), activation='relu')(x)\n",
"x = layers.UpSampling2D((2, 2))(x)\n",
"decoded = layers.Conv2D(1, (3, 3), activation='sigmoid', padding='same')(x)\n",
"\n",
"autoencoder = keras.Model(input_img, decoded)\n",
"autoencoder.compile(optimizer='adam', loss='binary_crossentropy')"
]
},
{
"cell_type": "markdown",
"id": "58b56c97-6532-41a2-96bc-a8f589b20a84",
"metadata": {},
"source": [
"To train it, we will use the original MNIST digits reshaped to (samples, 28, 28, 1), and we will just normalize pixel values between 0 and 1."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "96970532-a0b4-40f7-8225-a770bff0d42b",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"(x_train, _), (x_test, _) = mnist.load_data()\n",
"\n",
"x_train = x_train.astype('float32') / 255.\n",
"x_test = x_test.astype('float32') / 255.\n",
"x_train = np.reshape(x_train, (len(x_train), 28, 28, 1))\n",
"x_test = np.reshape(x_test, (len(x_test), 28, 28, 1))"
]
},
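{
"cell_type": "markdown",
"id": "f0e1d2c3-0001-4abc-9def-000000000001",
"metadata": {},
"source": [
"*(Added in this adaptation.)* A quick sanity check: MNIST ships 60,000 training and 10,000 test images of 28x28 pixels, so after reshaping both tensors should carry a trailing channel axis."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f0e1d2c3-0002-4abc-9def-000000000002",
"metadata": {},
"outputs": [],
"source": [
"# verify the shapes the convolutional model expects\n",
"assert x_train.shape == (60000, 28, 28, 1)\n",
"assert x_test.shape == (10000, 28, 28, 1)"
]
},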
{
"cell_type": "code",
"execution_count": null,
"id": "2c10a00e-73ad-469e-869b-85f902d09533",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"autoencoder.fit(x_train, x_train,\n",
"                epochs=50,\n",
"                batch_size=128,\n",
"                shuffle=True,\n",
"                validation_data=(x_test, x_test))"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9b7e1872-ad71-4d3b-bbd9-b9b6efb09982",
"metadata": {},
"outputs": [],
"source": [
"visualization.image.compare_results(x_test, autoencoder.predict(x_test));"
]
},
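{
"cell_type": "markdown",
"id": "f0e1d2c3-0003-4abc-9def-000000000003",
"metadata": {},
"source": [
"*(Added in this adaptation.)* We can also report the final reconstruction loss on the held-out test set; the exact number varies with initialization, but it should land near the value quoted below."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f0e1d2c3-0004-4abc-9def-000000000004",
"metadata": {},
"outputs": [],
"source": [
"# mean binary cross-entropy over the test images\n",
"autoencoder.evaluate(x_test, x_test, batch_size=128)"
]
},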
{
"cell_type": "markdown",
"id": "a1955fa8-c2d9-42dc-98b8-84e82616dc9f",
"metadata": {},
"source": [
"The model converges to a loss of about 0.094, significantly better than the fully connected models from earlier in the original tutorial; this is in large part due to the higher entropic capacity of the encoded representation, 128 dimensions vs. 32 previously."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.11"
}
},
"nbformat": 4,
"nbformat_minor": 5
}