@chenghan
Last active February 5, 2022 20:17
Instructor code that was shown on screen
import sys

salesTotal = 0
oldKey = None

for line in sys.stdin:
    data = line.strip().split("\t")
    if len(data) != 2:
        # Something has gone wrong. Skip this line.
        continue

    thisKey, thisSale = data
    if oldKey and oldKey != thisKey:
        print oldKey, "\t", salesTotal
        oldKey = thisKey
        salesTotal = 0

    oldKey = thisKey
    salesTotal += float(thisSale)

if oldKey != None:
    print oldKey, "\t", salesTotal
@piggybox

The ";" isn't needed on line 13.

"sys" needs to be imported at the beginning.

@sanoops

sanoops commented Mar 10, 2014

import sys
salesTotal = 0.0
oldKey = None
dummy_Data=["Miami 12.34","Miami 99.07","Miami 55.07","NYC 88.97","NYC 33.56"]

for line in dummy_Data:
    data = line.strip().split(" ")
    if len(data) != 2:
        # Something has gone wrong. Skip this line.
        continue

    thisKey, thisSale = data
    if oldKey and oldKey != thisKey:
        print oldKey, ":", salesTotal
        oldKey = thisKey
        salesTotal = 0

    oldKey = thisKey
    salesTotal += float(thisSale)

if oldKey != None:
    print oldKey, ":", salesTotal

reducer.py https://gist.github.com/sanoops/9471084

@spenceronuffer

Would it be cleaner to store this info in a dictionary? Then you wouldn't have to keep track of oldKey vs. thisKey, and it would also work if the sort is imperfect. I'm not sure whether there's anything MapReduce-specific it would break, though.

import sys

salesTotals = {}

for line in sys.stdin:
    data = line.strip().split("\t")
    if len(data) != 2:
        # Something has gone wrong. Skip this line.           
        continue

    store, sale = data
    salesTotals.setdefault(store, 0)
    salesTotals[store] += float(sale)

for store in salesTotals:
    print "{0}\t{1}".format(store, salesTotals[store])

@digitalmacgyver

digitalmacgyver commented Jan 23, 2017

Line 13:

if oldKey and oldKey != thisKey:

Would be better written with an explicit check against None:

if oldKey is not None and oldKey != thisKey:

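As it is, the truthiness test cannot tell an empty-string key from None, so the code malfunctions whenever a key is the empty string, e.g.:

NY\t100
\t200
SF\t300

If the empty key survives parsing, its group is never flushed: the output is NY 100 and SF 500, with the 200 folded into SF's total. (With line.strip() as written, the leading tab is stripped, so this particular line fails the len(data) != 2 check and is silently dropped instead.)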
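
A minimal Python 3 sketch of the two checks side by side. The function name `reduce_sales` and its `strict` flag are illustrative and not part of the gist, and the sketch assumes lines are parsed with rstrip("\n") rather than strip(), so that an empty key is preserved:

```python
def reduce_sales(lines, strict):
    """Sum sales per run of equal keys.

    strict=True uses `is not None`; strict=False uses the truthiness test.
    """
    results = []
    sales_total = 0.0
    old_key = None
    for line in lines:
        # Assumption: rstrip("\n") keeps a leading tab, so an empty key survives.
        data = line.rstrip("\n").split("\t")
        if len(data) != 2:
            continue
        this_key, this_sale = data
        seen = (old_key is not None) if strict else bool(old_key)
        if seen and old_key != this_key:
            results.append((old_key, sales_total))
            sales_total = 0.0
        old_key = this_key
        sales_total += float(this_sale)
    if old_key is not None:
        results.append((old_key, sales_total))
    return results


lines = ["NY\t100\n", "\t200\n", "SF\t300\n"]
print(reduce_sales(lines, strict=False))  # the empty key is merged into the next group
print(reduce_sales(lines, strict=True))   # each key gets its own total
```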

@senthil1988

Can anyone explain what the line below does? I understand one part, but I don't get the first condition.

"if oldKey and oldKey != thisKey"

I don't get what the first condition, "if oldKey and", does. Thanks in advance.

Senthil

@pabloalicante

Hi @senthil1988

The expression "if oldKey ..." tests whether oldKey is truthy. It is false while oldKey is still None (before the first line has been processed), but it is also false for an empty string, so it is not strictly a check against None.

It would be clearer and safer to write "if oldKey is not None ..." instead of "if oldKey ...".

Regards!

@wbl17

wbl17 commented Jul 2, 2017

Just tested the code locally. To me, line 15 is not necessary, is it? In this example (and presumably in general, with the keys sorted), when a new city gets processed, the assignment oldKey = thisKey is done in line 18 anyway. Setting salesTotal = 0 is necessary, though.

Happy coding!

@yashgyy

yashgyy commented Dec 27, 2017

I'm a bit confused here. The reducer script reads from sys.stdin, so how does the mapper pass its output on to it? The mapper code only prints each line; it doesn't store the lines anywhere for the reducer. Since the reducer reads from stdin, wouldn't it have to read from the keyboard rather than from the lines the mapper produced?
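
On the stdin question: sys.stdin is only the keyboard when nothing else is connected to it. When the stages are chained with a shell pipe, for example `python mapper.py < purchases.txt | sort | python reducer.py` (the script and file names here are hypothetical), whatever the mapper prints becomes the reducer's input. A minimal Python 3 sketch of the same idea, using io.StringIO as a stand-in for the piped stream; the `reducer` function and its `stream` parameter are illustrative, not the gist's code:

```python
import io
import sys


def reducer(stream=None):
    """Sum sales per key from any line-oriented stream (stdin by default)."""
    stream = sys.stdin if stream is None else stream
    totals = []
    old_key, sales_total = None, 0.0
    for line in stream:
        data = line.strip().split("\t")
        if len(data) != 2:
            continue  # malformed line: skip it
        this_key, this_sale = data
        if old_key is not None and old_key != this_key:
            totals.append((old_key, sales_total))
            sales_total = 0.0
        old_key = this_key
        sales_total += float(this_sale)
    if old_key is not None:
        totals.append((old_key, sales_total))
    return totals


# Stand-in for what a mapper would have printed, already sorted by key.
piped = io.StringIO("Miami\t12.5\nMiami\t7.5\nNYC\t30.0\n")
print(reducer(piped))
```

The loop never needs to know where the lines come from: a terminal, a file, or another program's output all look the same to `for line in stream`.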

@DeepanshKhurana

Hi, I'm very new to this and I was wondering why we need these lines of code at lines 21 and 22.

if oldKey != None: print oldKey, "\t", salesTotal

@Dikyashi

Hi, I'm very new to this and I was wondering why we need these lines of code at lines 21 and 22.

if oldKey != None: print oldKey, "\t", salesTotal

This is for printing the last line.

"oldKey != None" tests whether oldKey has a value. Inside the loop, a total is only printed when the next key shows up, so the total for the last key still has to be printed after the loop ends. Once at least one valid line has been processed, oldKey has a value and that last total gets printed here.
You might ask why the if condition is needed just to print the last line. This is where it gets interesting: if len(data) != 2 is true for every line (the input is empty or entirely malformed), oldKey is still None and the program simply won't print anything.
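
The role of the final check can be sketched in Python 3 as follows. The generator `totals` and its `flush_last` switch are hypothetical names, added only to show what is lost when the final step is skipped:

```python
def totals(lines, flush_last=True):
    """Yield (key, total) per run of equal keys; optionally skip the final flush."""
    old_key, sales_total = None, 0.0
    for line in lines:
        data = line.strip().split("\t")
        if len(data) != 2:
            continue  # malformed line: ignore it
        this_key, this_sale = data
        if old_key is not None and old_key != this_key:
            yield old_key, sales_total
            sales_total = 0.0
        old_key = this_key
        sales_total += float(this_sale)
    # Inside the loop a total is only emitted when the *next* key arrives,
    # so the last key is emitted here, provided any valid line was seen.
    if flush_last and old_key is not None:
        yield old_key, sales_total


good = ["Miami\t10\n", "Miami\t5\n", "NYC\t7\n"]
print(list(totals(good)))                    # includes the last key, NYC
print(list(totals(good, flush_last=False)))  # NYC is missing
print(list(totals(["garbage line\n"])))      # nothing valid was read: empty list
```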

@HabibBG88

I have implemented the code, and this is the output I get:
newyork 28
amazon 22
washdc 1
I wonder why the tab doesn't work and the numbers don't line up. Thanks.

@yvonnechanlove97

Can I use the groupby function in pandas? That was my first thought.

@zhujunqing1996

I think this is probably how the groupby function in pandas works.
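
For comparison, the standard library's itertools.groupby groups consecutive items with equal keys, which is the same assumption the reducer makes about sorted input. A minimal Python 3 sketch follows; the function name `sum_by_store` is illustrative, pandas itself is not used, and this is not a claim about how pandas implements groupby:

```python
from itertools import groupby


def sum_by_store(lines):
    """Sum sales per store over tab-separated lines sorted by store."""
    rows = (line.strip().split("\t") for line in lines)
    valid = (row for row in rows if len(row) == 2)  # skip malformed lines
    # groupby only merges *adjacent* equal keys, so the input must be sorted.
    return [
        (store, sum(float(sale) for _, sale in group))
        for store, group in groupby(valid, key=lambda row: row[0])
    ]


print(sum_by_store(["Miami\t10\n", "Miami\t5\n", "NYC\t7\n"]))
```

Like the reducer, this holds only one group in memory at a time, and like the reducer it reports a key twice if the input is not sorted.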
