Verina-Armanyous/Preclass work.ipynb

## Preclass work.ipynb
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1. Multiprocessing deep dive\n",
    "\n",
    "So far, we have discussed several classes (e.g., Lock, Manager, Queue) from the multiprocessing library. If you are unsure when to use which one, don't worry: this exercise should help. You will be presented with various scenarios, and for each one you should decide which class(es) can be used to produce the desired output.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import multiprocessing \n",
    "import time "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1.a (estimated time 5-10 minutes)\n",
    "- Description: this code simulates a simplified bank account where the initial balance is set to £800, and two processes concurrently deposit and withdraw £500, respectively, £1 at a time. \n",
    "- Current behavior: the final balance should be £800, but if you run this code multiple times, you get different, wrong values. \n",
    "- Task: explain why the code is behaving this way, and change the code as needed to get the expected output (£800). "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "779\n"
     ]
    }
   ],
   "source": [
    "def deposit(ns):\n",
    "    for i in range(500):\n",
    "        time.sleep(0.1)\n",
    "        ns.balance += 1\n",
    "    \n",
    "def withdraw(ns):\n",
    "    for i in range(500):\n",
    "        time.sleep(0.1)\n",
    "        ns.balance -= 1 \n",
    "        \n",
    "if __name__ == '__main__':\n",
    "    manager = multiprocessing.Manager()\n",
    "    ns = manager.Namespace()\n",
    "    ns.balance = 800\n",
    "    processes = []\n",
    "    processes.append(multiprocessing.Process(target=deposit, args=(ns,)))\n",
    "    processes.append(multiprocessing.Process(target=withdraw, args=(ns,)))\n",
    "    \n",
    "    for process in processes:\n",
    "        process.start()\n",
    "    for process in processes:\n",
    "        process.join()\n",
    "    print(ns.balance) # change the code so that this prints 800 "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1.b (estimated time 5-10 minutes)\n",
    "- Description: this code simulates a simplified way to pay Minervans for their work-study hours. The expected output is a list of lists that shows how much each student should get paid. \n",
    "- Current behavior: the printed list is empty, but it should be a list of [studentId, paycheck] lists. \n",
    "- Task: explain why the code is behaving this way, and change the code as needed to get the expected output. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[]\n"
     ]
    }
   ],
   "source": [
    "list_of_paychecks = []\n",
    "\n",
    "def payStudent(studentId, numberOfHours):\n",
    "    paycheck = numberOfHours * 20 # pay 20 for each hour \n",
    "    list_of_paychecks.append([studentId,paycheck])\n",
    "\n",
    "if __name__ == '__main__':\n",
    "    # studentInfo[i][0] = studentId, studentInfo[i][1] = numberOfHours\n",
    "    studentInfo = [(1707, 23), (1708, 50), (1709, 43), (1710, 1), (1711, 20)]\n",
    "    with multiprocessing.Pool(5) as p:\n",
    "        # why do we use starmap here? \n",
    "        p.starmap(payStudent, studentInfo)\n",
    "    print(list_of_paychecks)\n",
    "        "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2. Parallel MapReduce (estimated time 30 minutes - 1.5 hours)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- Lilly owns several stores in various locations around the US. She wants to find out which payment method was used the most across all her stores. Help her get an answer by using MapReduce. \n",
    "- The data file is attached to the email. The format is as follows: date\\time\\store\\location\\item description\\cost\\method of payment\n",
    "- Feel free to design and code the system whatever way you see fit. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "# first: write a mapper function \n",
    "def mapper_func():\n",
    "    \"\"\" \n",
    "    INPUT: Transactions of products in multiple stores and locations; these can also be passed to STDIN\n",
    "            Format of each line is:  date\\time\\store\\location\\item description\\cost\\method of payment\n",
    "    OUTPUT: E.g. \n",
    "                Las Vegas       Visa\n",
    "                Miami   Cash\n",
    "                Tucson  MasterCard\n",
    "                San Francisco   Amex\n",
    "                Dallas  Amex\n",
    "                Tampa   Visa\n",
    "                Washington      Discover\n",
    "                San Jose        Amex\n",
    "                Newark  Cash\n",
    "                Memphis Cash\n",
    "                Jersey City     Discover\n",
    "                Plano   Cash\n",
    "                Buffalo MasterCard\n",
    "                Louisville      Cash\n",
    "                Miami   Discover\n",
    "                ...\n",
    "    - Feel free to design and code the system whatever way you see fit. \"\"\"\n",
    "    pass\n",
    "\n",
    "# second: write a reducer function \n",
    "def reducer_func():\n",
    "    \n",
    "    \"\"\" \n",
    "    OUTPUT: - should be sorted in decreasing order, so the most used method is at the top of the list  \n",
    "            E.g.\n",
    "                Cash  5\n",
    "                Amex  3\n",
    "                Discover  3\n",
    "                MasterCard  2\n",
    "                Visa     2\n",
    "  \n",
    "    \"\"\"\n",
    "    pass \n",
    "\n",
    "# third: change mapper and/or reducer function(s) to parallelize the process using the multiprocessing library\n",
    "\n",
    "# fourth: Can you make the code fault-tolerant? If one of the processes gets killed, how can you make sure\n",
    "# that the task gets reassigned? "
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}