@erichannell
Created September 2, 2015 10:31
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We are working with comma-separated values (CSV) files, so we'll need Python's built-in `csv` module."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"import csv"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now, let's iterate over the first ten rows in the file to see what data we find."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Password             Username            \n",
"c4$h                 money               \n",
"this                 that                \n",
"$ynbe$21             vhang_15            \n",
"$YBWVdau             potato              \n",
"car$                 car                 \n",
"b0bb0b               bob                 \n",
"Gaiu$                Baltar              \n",
"$tarbuck             BillAdama           \n",
"Caprica              LauraRoslin         \n",
"five                 number6             \n"
]
}
],
"source": [
"with open(\"10M.csv\", \"r\") as inf:\n",
"    reader = csv.reader(inf)\n",
"    next(reader)  # skip the first row (assumed to be the file's header)\n",
"    print '{:<20} {:<20}'.format('Password', 'Username')\n",
"    for i in range(10):  # preview the first ten data rows\n",
"        data = next(reader)\n",
"        print '{:<20} {:<20}'.format(data[0], data[1])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Great, but what about augmenting this very simple data with something else? How about the lengths of the password and the username?"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Pass Len   Password             User Len   Username            \n",
"4          c4$h                 5          money               \n",
"4          this                 4          that                \n",
"8          $ynbe$21             8          vhang_15            \n",
"8          $YBWVdau             6          potato              \n",
"4          car$                 3          car                 \n",
"6          b0bb0b               3          bob                 \n",
"5          Gaiu$                6          Baltar              \n",
"8          $tarbuck             9          BillAdama           \n",
"7          Caprica              11         LauraRoslin         \n",
"4          five                 7          number6             \n"
]
}
],
"source": [
"with open(\"10M.csv\", \"r\") as inf:\n",
"    reader = csv.reader(inf)\n",
"    next(reader)  # skip the first row (assumed to be the file's header)\n",
"    print '{:<10} {:<20} {:<10} {:<20}'.format('Pass Len', 'Password', 'User Len', 'Username')\n",
"    for i in range(10):\n",
"        data = next(reader)\n",
"        pwd = data[0]\n",
"        usr = data[1]\n",
"        pwd_len = len(pwd)\n",
"        usr_len = len(usr)\n",
"        print '{:<10} {:<20} {:<10} {:<20}'.format(pwd_len, pwd, usr_len, usr)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Perfect, but what else can we do? Unfortunately, it looks like some people pick really bad passwords. Let's see if we can tag the passwords that are simply English-language words. We'll use a library called \"enchant\" to do that."
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"import enchant"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now, let's check each password and username to see if it is in the English-language dictionary."
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Pass Len   Word? Password        User Len   Word? Username            \n",
"4          0     c4$h            5          1     money               \n",
"4          1     this            4          1     that                \n",
"8          0     $ynbe$21        8          0     vhang_15            \n",
"8          0     $YBWVdau        6          1     potato              \n",
"4          0     car$            3          1     car                 \n",
"6          0     b0bb0b          3          1     bob                 \n",
"5          0     Gaiu$           6          0     Baltar              \n",
"8          0     $tarbuck        9          0     BillAdama           \n",
"7          0     Caprica         11         0     LauraRoslin         \n",
"4          1     five            7          0     number6             \n"
]
}
],
"source": [
"d = enchant.Dict(\"en_US\")\n",
"\n",
"with open(\"10M.csv\", \"r\") as inf:\n",
"    reader = csv.reader(inf)\n",
"    next(reader)  # skip the first row (assumed to be the file's header)\n",
"    print '{:<10} {:<5} {:<15} {:<10} {:<5} {:<20}'.format('Pass Len', 'Word?', 'Password', 'User Len', 'Word?', 'Username')\n",
"    for i in range(10):\n",
"        data = next(reader)\n",
"        pwd = data[0]\n",
"        usr = data[1]\n",
"        pwd_len = len(pwd)\n",
"        usr_len = len(usr)\n",
"        pwd_word = d.check(pwd)  # True if the string is in the en_US dictionary\n",
"        usr_word = d.check(usr)\n",
"        print '{:<10} {:<5} {:<15} {:<10} {:<5} {:<20}'.format(pwd_len, pwd_word, pwd, usr_len, usr_word, usr)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"OK, some passwords are terrible, but some look pretty strong. Let's calculate the strength of each password using a library called \"passwordmeter\", which returns a score between 0 (terrible password) and 1 (incredibly strong password).\n",
"\n",
"Let's also check whether the username string appears in the password string (reusing the username there is not a great idea)."
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Pass Len   Word? Password        Strength   user in pwd?    User Len   Word? Username            \n",
"4          0     c4$h            0.36       0               5          1     money               \n",
"4          1     this            0.1        0               4          1     that                \n",
"8          0     $ynbe$21        0.4        0               8          0     vhang_15            \n",
"8          0     $YBWVdau        0.48       0               6          1     potato              \n",
"4          0     car$            0.24       1               3          1     car                 \n",
"6          0     b0bb0b          0.24       0               3          1     bob                 \n",
"5          0     Gaiu$           0.44       0               6          0     Baltar              \n",
"8          0     $tarbuck        0.26       0               9          0     BillAdama           \n",
"7          0     Caprica         0.35       0               11         0     LauraRoslin         \n",
"4          1     five            0.1        0               7          0     number6             \n"
]
}
],
"source": [
"import passwordmeter\n",
"\n",
"d = enchant.Dict(\"en_US\")\n",
"\n",
"with open(\"10M.csv\", \"r\") as inf:\n",
"    reader = csv.reader(inf)\n",
"    next(reader)  # skip the first row (assumed to be the file's header)\n",
"    print '{:<10} {:<5} {:<15} {:<10} {:<15} {:<10} {:<5} {:<20}'.format(\n",
"        'Pass Len', 'Word?', 'Password', 'Strength', 'user in pwd?', 'User Len', 'Word?', 'Username')\n",
"    for i in range(10):\n",
"        data = next(reader)\n",
"        pwd = data[0]\n",
"        usr = data[1]\n",
"        pwd_len = len(pwd)\n",
"        usr_len = len(usr)\n",
"        pwd_word = d.check(pwd)\n",
"        usr_word = d.check(usr)\n",
"        usr_in_pwd = usr in pwd\n",
"        pwd_strength = round(passwordmeter.test(pwd)[0], 2)  # [0] is the strength score\n",
"        # column widths match the header row so the values line up under their titles\n",
"        print '{:<10} {:<5} {:<15} {:<10} {:<15} {:<10} {:<5} {:<20}'.format(\n",
"            pwd_len, pwd_word, pwd, pwd_strength, usr_in_pwd, usr_len, usr_word, usr)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now... to run this on 10M rows of data."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 2",
"language": "python",
"name": "python2"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 2
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython2",
"version": "2.7.7"
}
},
"nbformat": 4,
"nbformat_minor": 0
}
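
The notebook's closing cell leaves the full 10M-row run open. What follows is a minimal sketch of one way it might be done, written for Python 3 rather than the notebook's Python 2 kernel. The names `augment_row` and `augment_csv` and the output path are hypothetical, and the `is_word` and `strength` hooks are assumed to be supplied by the caller (for example `enchant.Dict("en_US").check` and a small wrapper around `passwordmeter.test`). It streams the file row by row instead of loading it into memory, and writes the derived columns to a new CSV.

```python
import csv


def augment_row(pwd, usr, is_word=None, strength=None):
    """Return the derived columns for one (password, username) pair."""
    row = [pwd, len(pwd), usr, len(usr), int(usr in pwd)]
    if is_word is not None:
        # Assumed hook, e.g. is_word = enchant.Dict("en_US").check
        row += [int(is_word(pwd)), int(is_word(usr))]
    if strength is not None:
        # Assumed hook, e.g. strength = lambda p: passwordmeter.test(p)[0]
        row.append(round(strength(pwd), 2))
    return row


def augment_csv(in_path, out_path, is_word=None, strength=None):
    """Stream in_path row by row and write augmented rows to out_path.

    Returns the number of data rows written.
    """
    header = ["password", "pwd_len", "username", "usr_len", "usr_in_pwd"]
    if is_word is not None:
        header += ["pwd_word", "usr_word"]
    if strength is not None:
        header.append("pwd_strength")

    count = 0
    with open(in_path, newline="") as inf, \
            open(out_path, "w", newline="") as outf:
        reader = csv.reader(inf)
        writer = csv.writer(outf)
        next(reader, None)  # skip the first row, as the cells above do
        writer.writerow(header)
        for data in reader:
            if len(data) < 2:
                continue  # skip blank or malformed rows
            writer.writerow(augment_row(data[0], data[1], is_word, strength))
            count += 1
    return count
```

Calling `augment_csv("10M.csv", "10M_augmented.csv")` with no hooks computes only the lengths and the username-in-password flag. Passing the hooks adds the dictionary and strength columns, which are likely to dominate the running time on the full file.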