erichannell/10M

## 10M
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We are working with comma-separated values (CSV) files, so we'll need Python's built-in `csv` module."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "import csv"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now, let's read the first ten rows of the file to see what data it contains."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Password              Username            \n",
      "c4$h                  money               \n",
      "this                  that                \n",
      "$ynbe$21              vhang_15            \n",
      "$YBWVdau              potato              \n",
      "car$                  car                 \n",
      "b0bb0b                bob                 \n",
      "Gaiu$                 Baltar              \n",
      "$tarbuck              BillAdama           \n",
      "Caprica               LauraRoslin         \n",
      "five                  number6             \n"
     ]
    }
   ],
   "source": [
    "with open(\"10M.csv\", \"r\") as inf:\n",
    "    reader = csv.reader(inf)\n",
    "    next(reader)  # skip the header row\n",
    "    print '{:<20}  {:<20}'.format('Password', 'Username')\n",
    "    # Show only the first ten data rows.\n",
    "    for i in range(10):\n",
    "        data = next(reader)\n",
    "        print '{:<20}  {:<20}'.format(data[0], data[1])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Great, but what about augmenting this very simple data with something else? How about the lengths of the password and the username?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Pass Len    Password              User Len    Username            \n",
      "4           c4$h                  5           money               \n",
      "4           this                  4           that                \n",
      "8           $ynbe$21              8           vhang_15            \n",
      "8           $YBWVdau              6           potato              \n",
      "4           car$                  3           car                 \n",
      "6           b0bb0b                3           bob                 \n",
      "5           Gaiu$                 6           Baltar              \n",
      "8           $tarbuck              9           BillAdama           \n",
      "7           Caprica               11          LauraRoslin         \n",
      "4           five                  7           number6             \n"
     ]
    }
   ],
   "source": [
    "with open(\"10M.csv\", \"r\") as inf:\n",
    "    reader = csv.reader(inf)\n",
    "    next(reader)  # skip the header row\n",
    "    print '{:<10}  {:<20}  {:<10}  {:<20}'.format('Pass Len', 'Password', 'User Len', 'Username')\n",
    "    for i in range(10):\n",
    "        data = next(reader)\n",
    "        pwd = data[0]  # column 0 holds the password\n",
    "        usr = data[1]  # column 1 holds the username\n",
    "        pwd_len = len(pwd)\n",
    "        usr_len = len(usr)\n",
    "        print '{:<10}  {:<20}  {:<10}  {:<20}'.format(pwd_len, pwd, usr_len, usr)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Perfect, but what else can we do? Unfortunately, it looks like some people pick really bad passwords. Let's see if we can tag the passwords that are simply English-language words. We'll use a library called \"enchant\" to do that."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "import enchant"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now, let's check each password and username to see whether it is in the US English (`en_US`) dictionary."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {
    "collapsed": false,
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Pass Len    Word?  Password         User Len    Word?  Username            \n",
      "4           0      c4$h             5           1      money               \n",
      "4           1      this             4           1      that                \n",
      "8           0      $ynbe$21         8           0      vhang_15            \n",
      "8           0      $YBWVdau         6           1      potato              \n",
      "4           0      car$             3           1      car                 \n",
      "6           0      b0bb0b           3           1      bob                 \n",
      "5           0      Gaiu$            6           0      Baltar              \n",
      "8           0      $tarbuck         9           0      BillAdama           \n",
      "7           0      Caprica          11          0      LauraRoslin         \n",
      "4           1      five             7           0      number6             \n"
     ]
    }
   ],
   "source": [
    "d = enchant.Dict(\"en_US\")\n",
    "\n",
    "with open(\"10M.csv\", \"r\") as inf:\n",
    "    reader = csv.reader(inf)\n",
    "    next(reader)  # skip the header row\n",
    "    print '{:<10}  {:<5}  {:<15}  {:<10}  {:<5}  {:<20}'.format('Pass Len', 'Word?', 'Password', 'User Len', 'Word?', 'Username')\n",
    "    for i in range(10):\n",
    "        data = next(reader)\n",
    "        pwd = data[0]\n",
    "        usr = data[1]\n",
    "        pwd_len = len(pwd)\n",
    "        usr_len = len(usr)\n",
    "        # check() reports whether the string is spelled like a dictionary word\n",
    "        pwd_word = d.check(pwd)\n",
    "        usr_word = d.check(usr)\n",
    "        print '{:<10}  {:<5}  {:<15}  {:<10}  {:<5}  {:<20}'.format(pwd_len, pwd_word, pwd, usr_len, usr_word, usr)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "OK, some passwords are terrible, but some look pretty strong. Let's calculate the strength of each password using a library called \"passwordmeter\", which returns a score between 0 (terrible password) and 1 (incredibly strong password).\n",
    "\n",
    "Let's also check whether the username string appears in the password string (building a password from the username is not a great idea)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Pass Len    Word?  Password         Strength    user in pwd?     User Len    Word?  Username            \n",
      "4           0      c4$h             0.36        0                5           1      money               \n",
      "4           1      this             0.1         0                4           1      that                \n",
      "8           0      $ynbe$21         0.4         0                8           0      vhang_15            \n",
      "8           0      $YBWVdau         0.48        0                6           1      potato              \n",
      "4           0      car$             0.24        1                3           1      car                 \n",
      "6           0      b0bb0b           0.24        0                3           1      bob                 \n",
      "5           0      Gaiu$            0.44        0                6           0      Baltar              \n",
      "8           0      $tarbuck         0.26        0                9           0      BillAdama           \n",
      "7           0      Caprica          0.35        0                11          0      LauraRoslin         \n",
      "4           1      five             0.1         0                7           0      number6             \n"
     ]
    }
   ],
   "source": [
    "import passwordmeter\n",
    "\n",
    "d = enchant.Dict(\"en_US\")\n",
    "\n",
    "with open(\"10M.csv\", \"r\") as inf:\n",
    "    reader = csv.reader(inf)\n",
    "    next(reader)  # skip the header row\n",
    "    # One shared format string keeps the header and the data rows aligned.\n",
    "    fmt = '{:<10}  {:<5}  {:<15}  {:<10}  {:<15}  {:<10}  {:<5}  {:<20}'\n",
    "    print fmt.format(\n",
    "        'Pass Len', 'Word?', 'Password', 'Strength', 'user in pwd?', 'User Len', 'Word?', 'Username')\n",
    "    for i in range(10):\n",
    "        data = next(reader)\n",
    "        pwd = data[0]\n",
    "        usr = data[1]\n",
    "        pwd_len = len(pwd)\n",
    "        usr_len = len(usr)\n",
    "        pwd_word = d.check(pwd)\n",
    "        usr_word = d.check(usr)\n",
    "        usr_in_pwd = usr in pwd  # case-sensitive substring test\n",
    "        pwd_strength = round(passwordmeter.test(pwd)[0], 2)  # element 0 is the strength score\n",
    "        print fmt.format(\n",
    "            pwd_len, pwd_word, pwd, pwd_strength, usr_in_pwd, usr_len, usr_word, usr)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now... to run this on 10M rows of data."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 2",
   "language": "python",
   "name": "python2"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}
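The notebook closes by pointing at the full 10M-row file, but every cell reads only ten rows. Below is a minimal sketch of how the same per-row features could be streamed over a whole file. It assumes Python 3 (the notebook itself targets Python 2) and the two-column password, username layout shown in the outputs. `iter_features` and `summarize` are hypothetical helper names, not part of the notebook, and the in-memory sample with its made-up header line stands in for `10M.csv`. The enchant and passwordmeter checks are left out so the sketch needs only the standard library.

```python
import csv
import io


def iter_features(rows):
    """Yield one feature dict per (password, username) row."""
    for row in rows:
        if len(row) < 2:
            continue  # skip blank or malformed lines
        pwd, usr = row[0], row[1]
        yield {
            "pwd": pwd,
            "usr": usr,
            "pwd_len": len(pwd),
            "usr_len": len(usr),
            # An empty username is "in" every string, so guard against it.
            "usr_in_pwd": bool(usr) and usr in pwd,
        }


def summarize(rows):
    """Make a single pass over rows, keeping only running totals in memory."""
    total = 0
    usr_in_pwd = 0
    pwd_len_sum = 0
    for feat in iter_features(rows):
        total += 1
        usr_in_pwd += feat["usr_in_pwd"]  # True counts as 1
        pwd_len_sum += feat["pwd_len"]
    mean_len = pwd_len_sum / total if total else 0.0
    return {"rows": total, "usr_in_pwd": usr_in_pwd, "mean_pwd_len": mean_len}


# Stand-in for 10M.csv: a hypothetical header plus three rows from the outputs above.
sample = io.StringIO("Password,Username\nc4$h,money\ncar$,car\nfive,number6\n")
reader = csv.reader(sample)
next(reader)  # skip the header row
stats = summarize(reader)
print(stats)
```

Because `csv.reader` yields one row at a time, the same `summarize(reader)` call could be pointed at a handle from `open("10M.csv", newline="")` without loading the file into memory.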