Created September 4, 2013 22:25
Example solutions for CodeKata Four (Data Munging)
The following questions were posed by CodeKata 4. Answers are provided inline.

* To what extent did the design decisions you made when writing the original programs make it easier or harder to factor out common code?

Part 1 was very direct, with things like column numbers hard-coded inline. For a first-pass prototype this was the fastest approach, rather than adding a lot of variables and creating functions. Part 2, however, took the same basic structure as part 1 and made it more abstract, so that by the time part 3 rolled around it was extremely easy to factor out the common ideas.

* Was the way you wrote the second program influenced by writing the first?

Yes. Seeing the parallels between the two problems, it made sense to start abstracting some things in the second program already, to make the code more generic.

* Is factoring out as much common code as possible always a good thing? Did the readability of the programs suffer because of this requirement? How about the maintainability?

In this case, it was a good thing. The resulting program remained readable and was generic enough that it could easily be extended to other data files, and indeed might work for many of them without any changes.

HOWEVER, small distinctions like the extra asterisks in weather.dat, or the horizontal dividing line in football.dat, informed the process used to parse the data files. If this program were eventually expanded to handle data files with significantly different formats, the combined, refactored code could become cumbersome quite quickly as various filters and hacks were worked in to 'massage' the data into a more usable form. In that situation, a more robust data parsing library would be appropriate, possibly with some sort of configuration file describing the expected data format.
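
The configuration idea in the last answer might look roughly like the sketch below. It is written for Python 3 (the gist's own programs are Python 2) and is only an illustration under stated assumptions: the names `FORMATS`, `parse_rows` and `min_spread` are hypothetical, the column indices are copied from the part 3 calls, and the sample rows are made up rather than taken from the real data files.

```python
# Hypothetical per-file format descriptions. The column indices mirror the
# arguments passed to get_minimum_difference in part 3.
FORMATS = {
    "football": {"column_a": 6, "column_b": 8, "column_name": 1, "strip_chars": ""},
    "weather": {"column_a": 1, "column_b": 2, "column_name": 0, "strip_chars": "*"},
}


def parse_rows(lines, fmt):
    """Yield (name, difference) for each line that fits the given format."""
    needed = max(fmt["column_a"], fmt["column_b"], fmt["column_name"])
    for line in lines:
        fields = line.split()
        if len(fields) <= needed:
            continue  # too short to be a data row (blank line, divider, ...)
        try:
            a = float(fields[fmt["column_a"]].strip(fmt["strip_chars"]))
            b = float(fields[fmt["column_b"]].strip(fmt["strip_chars"]))
        except ValueError:
            continue  # header or other non-numeric row
        yield fields[fmt["column_name"]], abs(a - b)


def min_spread(lines, fmt):
    """Return the (name, difference) pair with the smallest difference."""
    return min(parse_rows(lines, fmt), key=lambda pair: pair[1])


# Made-up rows in the general shape of weather.dat, for illustration only.
sample = [
    "Dy MxT MnT AvT",
    "",
    "1 88 59 74",
    "2 79 63* 71",
    "3 77* 70 73",
]
result = min_spread(sample, FORMATS["weather"])
```

Supporting a new file would then mean adding one entry to the table instead of another special case in the parsing loop. Unlike part 3, this sketch skips non-numeric rows silently, so a real version would probably want to report them.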
'''
Example solution of CodeKata number Four: Data Munging, part 1
http://codekata.pragprog.com/2007/01/kata_four_data_.html#more
James Davidheiser
September 4, 2013
'''

filename = "weather.dat"

'''
The problem stated is to load a data file, determine the day which had the smallest difference
between the high and low temperatures, and print out that information.
'''

with open(filename) as delimfile:
    '''
    The data is stored in an ugly fashion, with headers, HTML tags, arbitrary blank lines, etc.
    A more sophisticated file reader add-on would be appropriate here,
    but for the purposes of the exercise, let's brute force it using only Python built-in commands.
    '''
    day = []
    spread = []
    for row in delimfile:
        # here we want to grab only rows with information in them, and then only the rows which
        # start with a number (for the day of the month).
        if len(row) > 1:
            tmp = row.split()
            if tmp[0].isdigit():
                day.append(tmp[0])
                # some temperatures are flagged with a trailing '*', so strip it before converting
                spread.append(float(tmp[1].strip('*')) - float(tmp[2].strip('*')))

# Find the day that corresponds to the minimum temperature difference.
# To do this much faster, we could use Numpy, but for this short data file this works well and is readable.
min_index = spread.index(min(spread))
print "The day with the smallest temperature spread was:", day[min_index], "with a spread of", spread[min_index], "degrees"
'''
Example solution of CodeKata number Four: Data Munging, part 2
http://codekata.pragprog.com/2007/01/kata_four_data_.html#more
James Davidheiser
September 4, 2013
'''

filename = "football.dat"

'''
The problem stated is to load a data file, determine the team which had the smallest difference between
the goals for and against, and print out that information. Here we define which columns of the data will
contain the relevant information, after being split by whitespace.
'''
column_for = 6
column_against = 8
column_name = 1
min_data_columns = 9  # a row must have more than this many entries to count as data; rows may still
                      # have extra entries beyond the columns we care about

with open(filename) as delimfile:
    team = []
    spread = []
    for row in delimfile:
        tmp = row.split()
        # The previous example checked the row length before splitting. It's more logical to check after the
        # split, because we can then count the columns and more easily exclude non-data rows.
        if len(tmp) > min_data_columns:
            if tmp[column_for].isdigit():
                # add absolute value here because we care about the smallest difference, regardless of who won
                spread.append(abs(float(tmp[column_for]) - float(tmp[column_against])))
                team.append(tmp[column_name])

index_min = spread.index(min(spread))
print "The team", team[index_min], "had the smallest difference in 'for' and 'against' goals, with a difference of", int(spread[index_min])
'''
Example solution of CodeKata number Four: Data Munging, part 3
http://codekata.pragprog.com/2007/01/kata_four_data_.html#more
James Davidheiser
September 4, 2013

The problem stated is to take the previous two examples, determining minimum scoring differential and minimum
temperature swings, and refactor them to work with some shared code. Typically this would be placed in a separate
module file which is imported, but for the sake of brevity for this exercise, we include everything in a single
file and simply define a function that is called twice at the end.
'''
import sys

# It's possible that data files could have extraneous characters in the columns corresponding to
# data output values. Strip those out of the column completely.
deletechars = '*'


def get_minimum_difference(filename, column_A, column_B, column_name, min_data_columns):
    '''
    get_minimum_difference finds the smallest difference between column A and column B in the text data
    file (filename), and returns a tuple containing the corresponding name from column_name, as well
    as the difference value
    '''
    with open(filename) as delimfile:
        name = []
        spread = []
        for row in delimfile:
            tmp = row.split()
            if len(tmp) > min_data_columns:
                try:
                    '''
                    Rather than checking manually whether one or both of the columns contains a digit,
                    use the Pythonic approach with try and except blocks.
                    If we fail to turn the two entries into floats, that means one of them wasn't in a format
                    capable of converting to float and we should fail gracefully.
                    HOWEVER - there is a caveat to this approach. We could potentially skip lines that are formatted
                    differently, so let's make sure we print out those instances to stderr and warn the user.
                    '''
                    spread.append(abs(float(tmp[column_A].translate(None, deletechars)) -
                                      float(tmp[column_B].translate(None, deletechars))))
                    name.append(tmp[column_name])
                except ValueError:
                    print >> sys.stderr, "Warning, ignoring row: ", row
    index_min = spread.index(min(spread))
    return (name[index_min], spread[index_min])


if __name__ == '__main__':
    score_tuple = get_minimum_difference('football.dat', column_A=6, column_B=8, column_name=1, min_data_columns=9)
    print "The team name and smallest point differential in football.dat are:", score_tuple
    print "\n"
    temp_tuple = get_minimum_difference('weather.dat', column_A=1, column_B=2, column_name=0, min_data_columns=14)
    print "The day of June 2002 with the smallest difference between the high and low temperature is:", temp_tuple