@ltc-hotspot
Last active August 29, 2015 14:27
embedded data
handle = """From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008
From louis@media.berkeley.edu Fri Jan 4 18:10:48 2008
""".split("\n") # snippet file data: mbox-short.txt
count = dict()
#fname = raw_input("Enter file name: ")# insert # to add snippet file data
#handle = open (fname, 'r')# insert # to add snippet file data
for line in handle:
if line.startswith("From "):
time = line.split() # splitting the lines ->
# print time: ['From', 'stephen.marquard@uct.ac.za', 'Sat', 'Jan', '5', '09:14:16', '2008']
for hours in time: #getting the index pos of time ->
hours = line.split(":")[2] # splitting on ":" ->
line = line.rstrip()
count[hours] = count.get(hours, 0) + 1 # getting the index pos of hours.
lst = [(val,key) for key,val in count.items()] # find the most common words
lst.sort(reverse=True)
for key, val in lst[:12] :
print key, val
@ltc-hotspot
Author

The Assignment:

I'm trying to write Python code that reads through a data file and works out the distribution of its messages by hour of the day.

Python can pull the hour out of each 'From ' line by finding the time field and then splitting that string a second time on the colon. For example, in From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008 the time is 09:14:16 and the hour is 09.
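The two splits can be traced on the sample line. This is a minimal sketch; it assumes every 'From ' line has the time as its sixth whitespace-separated field, which holds for the two lines quoted in this gist. For contrast it also shows what splitting the whole line on ':' produces.

```python
# Worked example of the two-split approach on the sample 'From ' line.
line = "From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008"

words = line.split()        # first split, on whitespace
time = words[5]             # '09:14:16' (assumed sixth field of a 'From ' line)
hour = time.split(":")[0]   # second split, on ':' -> '09'

# For contrast: splitting the whole line on ':' gives three pieces,
# and index [2] is the seconds plus the year, not the hour.
whole = line.split(":")     # ['From stephen.marquard@uct.ac.za Sat Jan 5 09', '14', '16 2008']

print(hour)                 # prints 09
```

Splitting only the time field keeps the colon split from being confused by anything else on the line.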

Finally, accumulate the counts for each hour and print them out sorted by hour, as shown in the desired output below. The file is opened with:

name = raw_input("Enter file:")
if len(name) < 1 : name = "mbox-short.txt"
handle = open(name)
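One way to complete the program is sketched below. The function names count_hours and print_hours are hypothetical, chosen for illustration; the logic is a dictionary of counts keyed by hour, then a sort on the keys. The sample list uses the two 'From ' lines quoted in this gist plus an illustrative 'From:' header line, which the 'From ' check (note the trailing space) is meant to skip.

```python
def count_hours(lines):
    """Return {hour: count} for the 'From ' lines in an iterable of strings."""
    counts = {}
    for line in lines:
        line = line.rstrip()
        if not line.startswith("From "):
            continue  # skips 'From:' header lines and everything else
        words = line.split()
        if len(words) < 6:
            continue  # guard against malformed 'From ' lines
        hour = words[5].split(":")[0]  # '09:14:16' -> '09'
        counts[hour] = counts.get(hour, 0) + 1
    return counts


def print_hours(counts):
    """Print 'hour count' pairs, one per line, sorted by hour."""
    for hour, n in sorted(counts.items()):  # zero-padded hours sort correctly as strings
        print("%s %d" % (hour, n))


sample = [
    "From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008",
    "From: stephen.marquard@uct.ac.za",  # illustrative header line, ignored
    "From louis@media.berkeley.edu Fri Jan 4 18:10:48 2008",
]
hours = count_hours(sample)
print_hours(hours)  # prints "09 1" then "18 1"
```

With the starter code above, the same functions can be applied to the opened file with print_hours(count_hours(handle)), since a file object iterates line by line just like the sample list.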

Desired Output:

04 3
06 1
07 1
09 2
10 3
11 6
14 1
15 2
16 4
17 2
18 1
19 1
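The desired output is ordered by hour, not by count. The sketch below uses the counts from the desired output to contrast the two orderings; because the hour keys are two-digit, zero-padded strings, a plain string sort of the dictionary items already matches numeric hour order. Sorting by count, largest first, answers a different question ("most common hour") and does not reproduce the listing.

```python
# Counts taken from the desired output above, keyed by zero-padded hour strings.
counts = {"04": 3, "06": 1, "07": 1, "09": 2, "10": 3, "11": 6,
          "14": 1, "15": 2, "16": 4, "17": 2, "18": 1, "19": 1}

# Sorted by hour: the order the assignment asks for.
by_hour = sorted(counts.items())

# Sorted by count, largest first: a "most common" ranking instead.
by_count = sorted(((n, h) for h, n in counts.items()), reverse=True)

for hour, n in by_hour:
    print("%s %d" % (hour, n))
```

If the hours were stored as unpadded strings such as "4" and "10", the string sort would place "10" before "4", so keeping the two-digit form from the timestamp matters here.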
