@douglasgoodwin
Last active August 29, 2015 13:57
Some examples of the entropy of text measured with ZLIB
{
"metadata": {
"name": "Measuring the redundancy of text"
},
"nbformat": 3,
"nbformat_minor": 0,
"worksheets": [
{
"cells": [
{
"cell_type": "code",
"collapsed": false,
"input": "import zlib",
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 1
},
{
"cell_type": "code",
"collapsed": false,
"input": "chandler = \"\"\"It was about eleven o'clock in the morning, mid October, with the sun not shining and a look of hard wet \nrain in the clearness of the foothills. I was wearing my powder-blue suit, with dark blue shirt, tie and display \nhandkerchief, black brogues, black wool socks with dark blue clocks on them. I was neat, clean, shaved, and sober, \nand I didn't care who knew it. I was everything the well-dressed private detective ought to be. I was calling on four \nmillion dollars.\n\nThe main hallway of the Sternwood Place was two stories high. Over the entrance doors, which would \nhave let in a troop of Indian elephants, there was a broad stained-glass panel showing a knight in dark armor rescuing \na lady who was tied to a tree and didn't have any clothes on but some very long and convenient hair. The knight had \npushed the vizor of his helmet back to be sociable, and he was fiddling on the ropes that tied the lady to the tree \nand not getting anywhere. I stood there and thought that if I lived in the house, I would sooner or later have to climb \nup there and help him.\n\nThere were French doors at the back of the hall, beyond them a wide sweep of emerald grass \nto a white garage, in front of which a slim dark young chauffeur in shiny black leggings was dusting a maroon Packard \nconvertible. Beyond the garage were some decorative trees trimmed as carefully as poodle dogs. Beyond them a large \ngreenhouse with a domed roof. Then more trees and beyond everything the solid, uneven, comfortable line of the \nfoothills.\n\nOn the east side of the hall, a free staircase, tile-paved, rose to a gallery with a wrought-iron railing and another \npiece of stained-glass romance. Large hard chairs with rounded red plush seats were backed into the vacant spaces of \nthe wall round about. They didn't look as if anybody had ever sat in them. 
In the middle of the west wall there was a \nbig empty fireplace with a brass screen in four hinged panels, and over the fireplace a marble mantel with cupids at \nthe corners. Above the mantel there was a large oil portrait, and above the portrait two bullet-torn or moth-eaten \ncavalry pennants crossed in a glass frame. The portrait was a stiffly posed job of an officer in full regimentals of \nabout the time of the Mexican war. The officer had a neat black imperial, black moustachios, hot hard coal-black eyes, \nand the general look of a man it would pay to get along with. I thought this might be General Sternwood's grandfather. \nIt could hardly be the General himself, even though I had heard he was pretty far gone in years to have a couple of \ndaughters still in the dangerous twenties.\n\nI was still staring at the hot black eyes when a door opened far back \nunder the stairs. It wasn't the butler coming back. It was a girl.\"\"\"\n",
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 3
},
{
"cell_type": "code",
"collapsed": false,
"input": "compressed_chandler = zlib.compress(chandler)",
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 4
},
{
"cell_type": "code",
"collapsed": false,
"input": "print len(chandler)\nprint len(compressed_chandler)",
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": "2769\n1372\n"
}
],
"prompt_number": 5
},
{
"cell_type": "code",
"collapsed": false,
"input": "low=\"witch which has which witches wrist watch\"",
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 6
},
{
"cell_type": "code",
"collapsed": false,
"input": "low_compressed = zlib.compress(low)",
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 8
},
{
"cell_type": "code",
"collapsed": false,
"input": "print len(low_compressed)\nprint len(low)",
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": "37\n41\n"
}
],
"prompt_number": 9
},
{
"cell_type": "code",
"collapsed": false,
"input": "def getredundancy(atext):\n atext_z = zlib.compress(atext)\n redundancy= ( len(atext_z)*100 ) / len(atext)\n redundancy = 100-redundancy\n return redundancy",
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 20
},
{
"cell_type": "code",
"collapsed": false,
"input": "blargie=\"\"\"Redundancy, rather than poor grammar and spelling, is the biggest source of \nproblems in prose. Here are sets of exercises to sharpen your ability to identify redundancy. \nThe exercises tend to get harder as you progress through the page. Remember, you're trying to \ndevelop the habit of scrutinising the need for every word in a text. Undertaking these \nexercises can be the start of a longer project to tighten up your prose.\"\"\"\nblargie_z=zlib.compress(blargie)\n\nprint \"my len is %d and compressed len is %d\" %( len(blargie), len(blargie_z) )\nprint \"my redundancy is %d\" %(getredundancy(blargie))",
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": "my len is 427 and compressed len is 262\nmy redundancy is 39\n"
}
],
"prompt_number": 21
},
{
"cell_type": "code",
"collapsed": false,
"input": "witch=\"witch which has which witches wrist watch\"\nwitch_z = zlib.compress(low)\n\nprint \"my len is %d and compressed len is %d\" %( len(witch), len(witch_z) )\nprint \"my redundancy is %d\" %( getredundancy(witch) )",
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": "my len is 41 and compressed len is 37\nmy redundancy is 10\n"
}
],
"prompt_number": 23
},
{
"cell_type": "code",
"collapsed": false,
"input": "print \"Chandler's redundancy is %d\" %( getredundancy(chandler) )",
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": "Chandler's redundancy is 51\n"
}
],
"prompt_number": 24
},
{
"cell_type": "code",
"collapsed": false,
"input": "pi=\"\"\"3.1415926535897932384626433832795028841971693993751058209749445923078164\"\"\"",
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 27
},
{
"cell_type": "code",
"collapsed": false,
"input": "print \"Pi's redundancy is %d\" %( getredundancy(pi) )",
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": "Pi's redundancy is 21\n"
}
],
"prompt_number": 28
},
{
"cell_type": "code",
"collapsed": false,
"input": "len(pi)",
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 29,
"text": "72"
}
],
"prompt_number": 29
},
{
"cell_type": "code",
"collapsed": false,
"input": "latenight=\"wwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwww\"\nprint \"latenight redundancy is %d\" %( getredundancy(latenight) )",
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": "latenight redundancy is 85\n"
}
],
"prompt_number": 31
},
{
"cell_type": "code",
"collapsed": false,
"input": "DNA=\"\"\"ACAAGATGCCATTGTCCCCCGGCCTCCTGCTGCTGCTGCTCTCCGGGGCCACGGCCACCGCTGCCCTGCCCCTGGAGGGTGGCCCCACCGGCCGAGACAGCGAGCATATGCAGGAAGCGGCAGGAATAAGGAAAAGCAGCCTCCTGACTTTCCTCGCTTGGTGGTTTGAGTGGACCTCCCAGGCCAGTGCCGGGCCCCTCATAGGAGAGGAAGCTCGGGAGGTGGCCAGGCGGCAGGAAGGCGCACCCCCCCAGCAATCCGCGCGCCGGGACAGAATGCCCTGCAGGAACTTCTTCTGGAAGACCTTCTCCTCCTGCAAATAAAACCTCACCCATGAATGCTCACGCAAGTTTAATTACAGACCTGAA\"\"\"\n\nprint \"DNA length is %d\" %( len(DNA) )\nprint \"DNA's redundancy is %d\" %( getredundancy(DNA) )",
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": "DNA length is 368\nDNA's redundancy is 61\n"
}
],
"prompt_number": 35
},
{
"cell_type": "code",
"collapsed": false,
"input": "",
"language": "python",
"metadata": {},
"outputs": []
}
],
"metadata": {}
}
]
}
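The notebook cells above are Python 2, where a `str` is already a byte string. As a minimal sketch of the same measurement under Python 3, the text has to be encoded to bytes before `zlib.compress` will accept it. The function name `redundancy_percent`, its `encoding` parameter, and the sample strings are assumptions made for illustration; they are not part of the original notebook.

```python
import zlib


def redundancy_percent(text, encoding="utf-8"):
    """Whole-number percentage of the encoded text's size that zlib saves.

    Hypothetical Python 3 counterpart of the notebook's getredundancy().
    """
    data = text.encode(encoding)  # zlib.compress needs bytes in Python 3
    if not data:
        raise ValueError("cannot measure the redundancy of empty text")
    compressed = zlib.compress(data)
    # Same arithmetic as the notebook: 100 minus the compressed size
    # expressed as a floored percentage of the original size.
    return 100 - (len(compressed) * 100) // len(data)


# Illustrative inputs: the tongue twister from the notebook and a run of
# one repeated letter (the run length here is an arbitrary choice).
samples = {
    "witch": "witch which has which witches wrist watch",
    "repeated w": "w" * 80,
}

for name, text in samples.items():
    print("%s redundancy is %d" % (name, redundancy_percent(text)))
```

The zlib format wraps the compressed data in a 2-byte header and a 4-byte Adler-32 checksum, so for very short inputs this fixed overhead is a noticeable share of the compressed size and the figure understates how repetitive the text really is. The result can even be negative when the compressed form is longer than the input.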