cgawron/HC2020_practice_round.ipynb

## HC2020_practice_round.ipynb
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Problem background\n",
    "\n",
    "Let $n$ denote the number of pizzas, $w_i$ the number of slices of pizza $i$ and $c$ the number of slices we want to order.\n",
    "\n",
    "Then we want to solve the following optimization problem:\n",
    "\n",
    "Maximize\n",
    "$$\n",
    " z = \\sum_{j=1}^n w_jx_j\n",
    "$$\n",
    "subject to\n",
    "$$ \n",
    " \\sum_{j=1}^n w_jx_j \\leq c \n",
    "$$\n",
    "\n",
    "$$\n",
    "x_j \\in \\{0, 1\\}, j \\in \\{1, \\dots, n\\}\n",
    "$$\n",
    "\n",
    "This problem is known as the *subset sum problem* (SSP), a special case of the *knapsack problem*. Both the SSP and the knapsack problem are well studied, see for example [Knapsack Problems Algorithms and Computer Implementations](http://www.or.deis.unibo.it/kp/KnapsackProblems.pdf).\n",
    "\n",
    "## Exact solutions\n",
    "\n",
    "Using *dynamic programming* – that is by calculating solutions of sub problems with fewer pizzas and lower bounds – the \n",
    "problem can be solved in time and space complexity of $O(nc)$.\n",
    "\n",
    "For the larger instances, however, this approach is infeasible: We do not have enough memory!\n",
    "\n",
    "## Approximations\n",
    "\n",
    "An obvious approach is to use the greedy algorithm: Start chosing large pizzas as long as we keep below the bound $c$.\n",
    "\n",
    "## Hybrid approach\n",
    "\n",
    "In the course of the greedy algorithm, the remaining capacity and the number of remaining pizzas to choose from is decreasing. Once these numbers are small enough, we can use dynamic programming to find the exact solution for the small \n",
    "problem. \n",
    "\n",
    "It turns out that this leeds to optimal solutions, i.e. we can order exactly $c$ slices for all but the example input (for which it is not hard to see the 16 slices is the best possible result)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 196,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "from collections import defaultdict, namedtuple\n",
    "from tqdm.autonotebook import tqdm\n",
    "from os import path\n",
    "\n",
    "class HC2020(dict):\n",
    "    \"\"\"Solve the practice problem of Hash Coce 2020\"\"\"\n",
    "    \n",
    "    class record:\n",
    "        def __init__(self):\n",
    "            self.selection = []\n",
    "    \n",
    "    \n",
    "    Data = namedtuple('Data', ['target', 'slices'])\n",
    "    \n",
    "    IN = \"in\"\n",
    "    OUT = \"out\"\n",
    "    FILES = ( 'a_example.in', 'b_small.in', 'c_medium.in', 'd_quite_big.in',  'e_also_big.in' )\n",
    "    \n",
    "    def __init__(self):\n",
    "        \"\"\"Initialize data structures and read input files.\"\"\"\n",
    "        for file in self.FILES:\n",
    "            self[file] = self.record()\n",
    "            data = self.Data(*self.load_file(file))\n",
    "            self[file].data = data\n",
    "            \n",
    "    def __repr__(self):\n",
    "        return str({ file: self[file].selection for file in self.FILES })\n",
    "            \n",
    "    def load_file(self, file):\n",
    "        with open(path.join(self.IN, file), 'r') as input:\n",
    "            (capacity, count) = map(int, input.readline().split())\n",
    "            slices = tuple(map(int, input.readline().split()))\n",
    "                \n",
    "            # print(f\"file: {file}\\ntarget: {capacity}\\nslices: {slices}\\n\")\n",
    "            return (capacity, slices)\n",
    "        \n",
    "    def save_file(self, file):\n",
    "        with open(path.join(self.OUT, file), 'w') as output:\n",
    "            selection = self[file].selection\n",
    "            output.write(f\"{len(selection)}\\n\")\n",
    "            for s in selection:\n",
    "                output.write(f\"{s} \")\n",
    "                \n",
    "    def optimize(self, file, dp_limit=10000):\n",
    "        \"\"\"Optimize pizza selection with a hybrid approach.\n",
    "        \n",
    "        Start with a greedy approach until the remaining problem size is small enough for \n",
    "        an exact solution with dynamic programming.\n",
    "        \"\"\"\n",
    "        \n",
    "        weights = list(self[file].data.slices)\n",
    "        bound = self[file].data.target\n",
    "        select = []\n",
    "        score = 0\n",
    "        \n",
    "        for i in reversed(range(len(weights))):\n",
    "            if bound - score < dp_limit:\n",
    "                _select, _score = self.ssp_dp(weights[:i+1], bound-score)\n",
    "                # print(f\"dp sub-score: {_score} {bound-score} {bound-score-_score} {_select}\")\n",
    "                score += _score\n",
    "                select += _select\n",
    "                break\n",
    "                \n",
    "            if score + weights[i] < bound:\n",
    "                score += weights[i]\n",
    "                select.append(i)\n",
    "        \n",
    "        print(f\"result: {score} {bound} {bound-score}\")\n",
    "        return select, score\n",
    "    \n",
    "    def ssp_dp(self, weights, bound):\n",
    "        \"\"\"Solve subset sum problem with dynamic programming\n",
    "        \n",
    "        See http://www.or.deis.unibo.it/kp/Chapter4.pdf\n",
    "        \"\"\"\n",
    "\n",
    "        n = len(weights)\n",
    "        r = np.zeros((n, bound+1), dtype=int)\n",
    "        \n",
    "        # fill matrix with partial solutions, i.e. with reduced bound and number of weights\n",
    "        #   first dimension: maximum index of weights included\n",
    "        #   second dimension: bound of sub problem\n",
    "        # we need a matrix of size len(weights) * bound for this - way too much for the large instances\n",
    "        for j in range(bound + 1):\n",
    "            if weights[0] <= j:\n",
    "                r[0][j] = weights[0]\n",
    "        \n",
    "        for i in range(1, n):\n",
    "            for j in range(bound + 1):\n",
    "                if weights[i] <= j:\n",
    "                    r[i][j] = max(weights[i] + r[i-1][j - weights[i]], r[i-1][j])\n",
    "                else:\n",
    "                    r[i][j] = r[i-1][j]\n",
    "        \n",
    "        # Now use backtracking to get the selected weights\n",
    "        select = []\n",
    "        # The optimum value is r[n-1][bound]\n",
    "        # Now we track the \"calculation path\", i.e. the cases where the first term \n",
    "        # in the max(...) was selected.\n",
    "        j = r[n-1][bound]\n",
    "        for i in reversed(range(n)):\n",
    "            if weights[i] <= j and weights[i] + r[i-1][j - weights[i]] > r[i-1][j]:\n",
    "                j -= weights[i]\n",
    "                select.append(i)\n",
    "        if weights[0] <= j:\n",
    "            select.append(0)\n",
    "      \n",
    "        return select, r[n-1][bound]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 197,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "result: 16 17 1\n",
      "result: 100 100 0\n",
      "result: 4500 4500 0\n",
      "result: 999999725 1000000000 275\n",
      "result: 505000000 505000000 0\n"
     ]
    }
   ],
   "source": [
    "hc = HC2020()\n",
    "    \n",
    "for file in hc.FILES:\n",
    "    selection, score = hc.optimize(file)\n",
    "    hc[file].selection = selection\n",
    "    hc[file].score = score\n",
    "    hc.save_file(file)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For file d, we need to stop the greedy algorithm earlier to find an optimal solition."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 199,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "result: 1000000000 1000000000 0\n"
     ]
    }
   ],
   "source": [
    "file = 'd_quite_big.in'\n",
    "selection, score = hc.optimize(file, 200000)\n",
    "hc[file].selection = selection\n",
    "hc[file].score = score\n",
    "hc.save_file(file)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 194,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "a_example.in: 16\n",
      "b_small.in: 100\n",
      "c_medium.in: 4500\n",
      "d_quite_big.in: 1000000000\n",
      "e_also_big.in: 505000000\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "1505004616"
      ]
     },
     "execution_count": 194,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "for file in hc.FILES:\n",
    "    print(f\"{file}: {hc[file].score}\")\n",
    "    \n",
    "sum([ hc[file].score for file in hc.FILES ])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Except for the example file – where 16 is the best possible result – we have found solutions with the maximum number of slices allowed. \n",
    "\n",
    "We are done!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
	{
	"cells": [
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"# Problem background\n",
	"\n",
	"Let $n$ denote the number of pizzas, $w_i$ the number of slices of pizza $i$ and $c$ the number of slices we want to order.\n",
	"\n",
	"Then we want to solve the following optimization problem:\n",
	"\n",
	"Maximize\n",
	"$$\n",
	" z = \\sum_{j=1}^n w_jx_j\n",
	"$$\n",
	"subject to\n",
	"$$ \n",
	" \\sum_{j=1}^n w_jx_j \\leq c \n",
	"$$\n",
	"\n",
	"$$\n",
	"x_j \\in \\{0, 1\\}, j \\in \\{1, \\dots, n\\}\n",
	"$$\n",
	"\n",
	"This problem is known as the subset sum problem (SSP), a special case of the knapsack problem. Both the SSP and the knapsack problem are well studied, see for example [Knapsack Problems Algorithms and Computer Implementations](http://www.or.deis.unibo.it/kp/KnapsackProblems.pdf).\n",
	"\n",
	"## Exact solutions\n",
	"\n",
	"Using dynamic programming – that is by calculating solutions of sub problems with fewer pizzas and lower bounds – the \n",
	"problem can be solved in time and space complexity of $O(nc)$.\n",
	"\n",
	"For the larger instances, however, this approach is infeasible: We do not have enough memory!\n",
	"\n",
	"## Approximations\n",
	"\n",
	"An obvious approach is to use the greedy algorithm: Start chosing large pizzas as long as we keep below the bound $c$.\n",
	"\n",
	"## Hybrid approach\n",
	"\n",
	"In the course of the greedy algorithm, the remaining capacity and the number of remaining pizzas to choose from is decreasing. Once these numbers are small enough, we can use dynamic programming to find the exact solution for the small \n",
	"problem. \n",
	"\n",
	"It turns out that this leeds to optimal solutions, i.e. we can order exactly $c$ slices for all but the example input (for which it is not hard to see the 16 slices is the best possible result)."
	]
	},
	{
	"cell_type": "code",
	"execution_count": 196,
	"metadata": {},
	"outputs": [],
	"source": [
	"import numpy as np\n",
	"from collections import defaultdict, namedtuple\n",
	"from tqdm.autonotebook import tqdm\n",
	"from os import path\n",
	"\n",
	"class HC2020(dict):\n",
	" \"\"\"Solve the practice problem of Hash Coce 2020\"\"\"\n",
	" \n",
	" class record:\n",
	" def __init__(self):\n",
	" self.selection = []\n",
	" \n",
	" \n",
	" Data = namedtuple('Data', ['target', 'slices'])\n",
	" \n",
	" IN = \"in\"\n",
	" OUT = \"out\"\n",
	" FILES = ( 'a_example.in', 'b_small.in', 'c_medium.in', 'd_quite_big.in', 'e_also_big.in' )\n",
	" \n",
	" def __init__(self):\n",
	" \"\"\"Initialize data structures and read input files.\"\"\"\n",
	" for file in self.FILES:\n",
	" self[file] = self.record()\n",
	" data = self.Data(*self.load_file(file))\n",
	" self[file].data = data\n",
	" \n",
	" def __repr__(self):\n",
	" return str({ file: self[file].selection for file in self.FILES })\n",
	" \n",
	" def load_file(self, file):\n",
	" with open(path.join(self.IN, file), 'r') as input:\n",
	" (capacity, count) = map(int, input.readline().split())\n",
	" slices = tuple(map(int, input.readline().split()))\n",
	" \n",
	" # print(f\"file: {file}\\ntarget: {capacity}\\nslices: {slices}\\n\")\n",
	" return (capacity, slices)\n",
	" \n",
	" def save_file(self, file):\n",
	" with open(path.join(self.OUT, file), 'w') as output:\n",
	" selection = self[file].selection\n",
	" output.write(f\"{len(selection)}\\n\")\n",
	" for s in selection:\n",
	" output.write(f\"{s} \")\n",
	" \n",
	" def optimize(self, file, dp_limit=10000):\n",
	" \"\"\"Optimize pizza selection with a hybrid approach.\n",
	" \n",
	" Start with a greedy approach until the remaining problem size is small enough for \n",
	" an exact solution with dynamic programming.\n",
	" \"\"\"\n",
	" \n",
	" weights = list(self[file].data.slices)\n",
	" bound = self[file].data.target\n",
	" select = []\n",
	" score = 0\n",
	" \n",
	" for i in reversed(range(len(weights))):\n",
	" if bound - score < dp_limit:\n",
	" _select, _score = self.ssp_dp(weights[:i+1], bound-score)\n",
	" # print(f\"dp sub-score: {_score} {bound-score} {bound-score-_score} {_select}\")\n",
	" score += _score\n",
	" select += _select\n",
	" break\n",
	" \n",
	" if score + weights[i] < bound:\n",
	" score += weights[i]\n",
	" select.append(i)\n",
	" \n",
	" print(f\"result: {score} {bound} {bound-score}\")\n",
	" return select, score\n",
	" \n",
	" def ssp_dp(self, weights, bound):\n",
	" \"\"\"Solve subset sum problem with dynamic programming\n",
	" \n",
	" See http://www.or.deis.unibo.it/kp/Chapter4.pdf\n",
	" \"\"\"\n",
	"\n",
	" n = len(weights)\n",
	" r = np.zeros((n, bound+1), dtype=int)\n",
	" \n",
	" # fill matrix with partial solutions, i.e. with reduced bound and number of weights\n",
	" # first dimension: maximum index of weights included\n",
	" # second dimension: bound of sub problem\n",
	" # we need a matrix of size len(weights) * bound for this - way too much for the large instances\n",
	" for j in range(bound + 1):\n",
	" if weights[0] <= j:\n",
	" r[0][j] = weights[0]\n",
	" \n",
	" for i in range(1, n):\n",
	" for j in range(bound + 1):\n",
	" if weights[i] <= j:\n",
	" r[i][j] = max(weights[i] + r[i-1][j - weights[i]], r[i-1][j])\n",
	" else:\n",
	" r[i][j] = r[i-1][j]\n",
	" \n",
	" # Now use backtracking to get the selected weights\n",
	" select = []\n",
	" # The optimum value is r[n-1][bound]\n",
	" # Now we track the \"calculation path\", i.e. the cases where the first term \n",
	" # in the max(...) was selected.\n",
	" j = r[n-1][bound]\n",
	" for i in reversed(range(n)):\n",
	" if weights[i] <= j and weights[i] + r[i-1][j - weights[i]] > r[i-1][j]:\n",
	" j -= weights[i]\n",
	" select.append(i)\n",
	" if weights[0] <= j:\n",
	" select.append(0)\n",
	" \n",
	" return select, r[n-1][bound]"
	]
	},
	{
	"cell_type": "code",
	"execution_count": 197,
	"metadata": {},
	"outputs": [
	{
	"name": "stdout",
	"output_type": "stream",
	"text": [
	"result: 16 17 1\n",
	"result: 100 100 0\n",
	"result: 4500 4500 0\n",
	"result: 999999725 1000000000 275\n",
	"result: 505000000 505000000 0\n"
	]
	}
	],
	"source": [
	"hc = HC2020()\n",
	" \n",
	"for file in hc.FILES:\n",
	" selection, score = hc.optimize(file)\n",
	" hc[file].selection = selection\n",
	" hc[file].score = score\n",
	" hc.save_file(file)"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"For file d, we need to stop the greedy algorithm earlier to find an optimal solition."
	]
	},
	{
	"cell_type": "code",
	"execution_count": 199,
	"metadata": {
	"scrolled": true
	},
	"outputs": [
	{
	"name": "stdout",
	"output_type": "stream",
	"text": [
	"result: 1000000000 1000000000 0\n"
	]
	}
	],
	"source": [
	"file = 'd_quite_big.in'\n",
	"selection, score = hc.optimize(file, 200000)\n",
	"hc[file].selection = selection\n",
	"hc[file].score = score\n",
	"hc.save_file(file)"
	]
	},
	{
	"cell_type": "code",
	"execution_count": 194,
	"metadata": {},
	"outputs": [
	{
	"name": "stdout",
	"output_type": "stream",
	"text": [
	"a_example.in: 16\n",
	"b_small.in: 100\n",
	"c_medium.in: 4500\n",
	"d_quite_big.in: 1000000000\n",
	"e_also_big.in: 505000000\n"
	]
	},
	{
	"data": {
	"text/plain": [
	"1505004616"
	]
	},
	"execution_count": 194,
	"metadata": {},
	"output_type": "execute_result"
	}
	],
	"source": [
	"for file in hc.FILES:\n",
	" print(f\"{file}: {hc[file].score}\")\n",
	" \n",
	"sum([ hc[file].score for file in hc.FILES ])"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"Except for the example file – where 16 is the best possible result – we have found solutions with the maximum number of slices allowed. \n",
	"\n",
	"We are done!"
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"metadata": {},
	"outputs": [],
	"source": []
	}
	],
	"metadata": {
	"kernelspec": {
	"display_name": "Python 3",
	"language": "python",
	"name": "python3"
	},
	"language_info": {
	"codemirror_mode": {
	"name": "ipython",
	"version": 3
	},
	"file_extension": ".py",
	"mimetype": "text/x-python",
	"name": "python",
	"nbconvert_exporter": "python",
	"pygments_lexer": "ipython3",
	"version": "3.7.3"
	}
	},
	"nbformat": 4,
	"nbformat_minor": 2
	}