

@coke

coke/article.md Secret

Last active December 14, 2023 05:08
Why you should write some horrible Raku code this Christmas!

Draft article for 2023 Raku Advent of Code by Will Coleda

Writing some horrible Raku code this Christmas!

Santa only had a few days left to make sure everything was ready to go, but with all the stress of the season, he needed a break to recharge. He grabbed some cocoa and hid away in a nook to relax his body and distract his mind, tuning into one of his favorite YouTubers, Matt Parker. Parker finds interesting mathematical problems that he attempts to untangle and present to the audience in a tractable way, and as he analyzes the problems, he often has to write "some horrible Python code." Santa, of course, will use his favorite language instead: Raku!

Maybe working on one of these puzzles would help Santa stop thinking about all the other work he was supposed to be doing.

The Problem

So, what to work on in this precious downtime? Santa wants to work on something a little practical, so he doesn't feel too guilty about taking some time off - let's figure out how much we're going to have to expand the shop in the next few years!

A quick Google search turns up some UN data; surely that's a good start. Santa creates a sandbox folder, then manually downloads and unzips the data. For small projects like this, Santa likes to attack the problem in chunks rather than map out the whole project at once. First, he makes sure he can read the data at all:

my $data-file  = "data.csv".IO;
my $data = $data-file.lines;
my $headers = $data[0];
dd $headers.split(',');
("SortOrder", "LocID", "Notes", "ISO3_code", "ISO2_code", "SDMX_code", "LocTypeID", "LocTypeName", "ParentID", "Location", "VarID", "Variant", "Time", "MidPeriod", "AgeGrp", "AgeGrpStart", "AgeGrpSpan", "PopMale", "PopFemale", "PopTotal").Seq

Alright, the CSV starts with a row of headers, so we read the file in, grab the first row, and dump it with dd. We're ignoring all the possible complexity of CSV for now; we'll deal with that if we need to.

Filtering

We are only interested in getting estimates on kids, so let's filter through the data. Santa can ignore anything where the starting age is 15 or higher, at least for this project.

We peeked at the headers, so we know which columns hold the data we need; we'll hardcode those positions for now. Santa gets the age first since that's our filter, and only prints out the data if the row is good!
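As a quick sanity check on those hardcoded positions, here's a minimal sketch that looks them up from the header row instead. It assumes the header line is exactly the one dd printed above.

```raku
# Header row as printed by dd above, joined back into one line
my @headers = 'SortOrder,LocID,Notes,ISO3_code,ISO2_code,SDMX_code,LocTypeID,LocTypeName,ParentID,Location,VarID,Variant,Time,MidPeriod,AgeGrp,AgeGrpStart,AgeGrpSpan,PopMale,PopFemale,PopTotal'.split(',');

# antipairs flips index => name into name => index
my %col = @headers.antipairs;

say %col<Time AgeGrpStart PopTotal>;  # (12 15 19)
```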

my $data-file  = "data.csv".IO;
my $data = $data-file.lines;
my $headers = $data[0];
my $count = 0;
for @($data) -> $line {
    $count++;
    next if $count == 1; # skip the headers
    my @row = $line.split(',');

    my $age  = @row[15];
    next if $age >= 15;
    my $year = @row[12];
    my $pop  = @row[19];
    dd $year, $age, $pop;
}
Cannot convert string to number: imaginary part of complex number must be followed by 'i' or '\i' in '0-4⏏' (indicated by ⏏)

What? There are imaginary numbers in here? Santa adds some debug output to print each line before processing it, and sees:

15,934,g,,,,5,Development group,902,"Less developed regions, excluding least developed countries",2,Medium,1950,1950,0-4,0,5,113433.383,107834.33,221267.713
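Running the same naive split on just that line, as a small standalone sketch, makes the problem visible:

```raku
my $line = '15,934,g,,,,5,Development group,902,"Less developed regions, excluding least developed countries",2,Medium,1950,1950,0-4,0,5,113433.383,107834.33,221267.713';

# A plain split also breaks on the comma inside the quoted Location field
my @row = $line.split(',');

say @row.elems;  # 21, one more than the 20 headers
say @row[12];    # Medium (the Variant column, shifted right by one)
say @row[15];    # 0-4 (the AgeGrp value, not a plain number)
```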

Not so simple

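Ah, biscuits. The Location field is quoted because it contains a comma, and our naive split cuts it in two, shifting every later column one place to the right. Column 15 now holds the AgeGrp value "0-4", which Raku tries (and fails) to read as a complex number. Our horribly simple start has caught up with us: we do have to care about slightly more complicated CSV data after all. Rather than spending any more time improving our CSV "parser" (currently just a split), let's get out the big hammer: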

zef install Text::CSV

Santa quickly checks out the docs and updates his code:

use Text::CSV;

my $csv = Text::CSV.new;
my $io = open "data.csv", :r;

my @headers = $csv.header($io).column-names;

while (my @row = $csv.getline($io)) {
    my $age = @row[15];
    next if $age >= 15;
    my $year = @row[12];
    my $pop  = @row[19];
}

He's still using column numbers, but now that he's switched over to Text::CSV, at least we can process the whole file.

Speed?

The problem with this version is that it's a little slow. To be fair, the file has over 900,000 lines of CSV data. Santa is willing to cheat a little here: he's just looking for some estimates, after all.

Maybe Text::CSV does enough extra processing per line that it adds up, or maybe Raku's default line iteration is more efficient than calling getline over and over. We're impatient, so we'll try changing both at once: use .lines to walk through the file, and only fall back to the CSV parser when a simple split gives us the wrong column count. We may miss a line or two, but that's good enough for a rough estimate. Santa adds up the under-15 population for each year from 2024 on and prints out a couple of samples.
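The fast-path/slow-path idea doesn't depend on Text::CSV itself. In this sketch, `quoted-split` and `parse-row` are hypothetical helpers written for illustration; `quoted-split` is a deliberately minimal stand-in for a real CSV parser, not Text::CSV's API.

```raku
# Hypothetical slow path: a tiny quote-aware splitter standing in for Text::CSV.
# It handles double-quoted fields containing commas, but not escaped quotes ("")
# or fields that span several lines.
sub quoted-split(Str $line) {
    my @fields;
    my $field = '';
    my $in-quotes = False;
    for $line.comb -> $c {
        if $c eq '"' {
            $in-quotes = !$in-quotes;       # toggle on opening/closing quote
        }
        elsif $c eq ',' && !$in-quotes {
            @fields.push($field);           # an unquoted comma ends the field
            $field = '';
        }
        else {
            $field ~= $c;
        }
    }
    @fields.push($field);                   # the last field has no trailing comma
    @fields
}

# Fast path first; only fall back to the slower parser on a column-count mismatch
sub parse-row(Str $line, Int $cols) {
    my @row = $line.split(',');
    @row = quoted-split($line) if @row.elems != $cols;
    @row
}

my $line = '15,934,g,,,,5,Development group,902,"Less developed regions, excluding least developed countries",2,Medium,1950,1950,0-4,0,5,113433.383,107834.33,221267.713';
my @row = parse-row($line, 20);
say @row[12, 15, 19];  # (1950 0 221267.713)
```

Because the fallback only runs when the simple split comes out with the wrong column count, rows without quoted commas only pay for the cheap split.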

use Text::CSV;

my $csv = Text::CSV.new;

my @lines = "data.csv".IO.lines;
my $headers = @lines.shift.split(',');
my $cols = $headers.elems;

my %estimate;
for @lines {
   my @row = $_.split(','); # simple CSV
   if @row.elems != $cols {
       @row = $csv.getline($_); # real CSV
   }
   my $year = @row[12];
   next if $year <= 2023;
   my $age = @row[15];
   next if $age >= 15;
   my $pop = @row[19];
   %estimate{$year}+=$pop; 
}
say %estimate{2024};
say %estimate{2050};
19110349.077
19204147.428

Ah, much better. Now we can see we can expect a few more deliveries in 2050! Let's tidy up the formatting, print only one year per decade, and see how much we need to expand!
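The two pieces doing the work in the next version are the divisibility operator `%%` and `.fmt` with `'%i'`. In isolation (the number is the 2050 total from above):

```raku
# %% is true when the left side divides evenly by the right side
say (2024..2050).grep(* %% 10);   # (2030 2040 2050)

# '%i' formats the value as an integer
say 19204147.428.fmt('%i');       # 19204147
```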

Pretty print

use Text::CSV;

my $csv = Text::CSV.new;

my @lines = "data.csv".IO.lines;
my $headers = @lines.shift.split(',');
my $cols = $headers.elems;

my %estimate;
for @lines {
   my @row = $_.split(','); # simple CSV
   if @row.elems != $cols {
       @row = $csv.getline($_); # real CSV
   }
   my $year = @row[12];
   next if $year <= 2023;
   next unless $year %% 10; 
   my $age = @row[15];
   next if $age >= 15;
   my $pop = @row[19];
   %estimate{$year}+=$pop; 
}

for %estimate.keys.sort -> $year {
    say "$year: {%estimate{$year}.fmt('%i')}";
}
2030: 18838469
2040: 18926239
2050: 19204147
2060: 18816096
2070: 18281171
2080: 17819389
2090: 17111136
2100: 16315984

Oh! It's a good thing we checked, looks like 2050 was the peak, and then the projections go back down! Maybe we can avoid expanding the shop for a while!

Other improvements?

Having gotten the quick answer he was looking for, Santa throws together a TODO file for next year's estimate:

  • Pull the file from the UN and unzip it in code if we haven't already - and see if there's an updated file name each year
  • Investigate parallelism - would race help?
  • Switch to a full Text::CSV version and figure out the best API to use
  • Use column headers instead of numbers to future-proof against changes in the data file!
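A rough sketch of what the parallelism item might look like, using `.race`. The rows are made up in a simplified `year,age,pop` layout rather than the UN file's columns, so it runs on its own; whether this actually beats the plain loop on the real file is untested here.

```raku
# Made-up rows in a simplified "year,age,pop" layout, NOT the UN file's columns
my @lines = (^20_000).map: -> $i { "{2024 + $i % 10},{$i % 20},1.5" };

# Parse and filter in parallel. race may hand back results in any order,
# which is fine here because they are only summed afterwards.
my @pairs = @lines.race(:batch(1_000)).map: -> $line {
    my ($year, $age, $pop) = $line.split(',');
    $age < 15 ?? ($year => +$pop) !! Empty
};

# Accumulate on one thread so no two workers update the hash at once
my %estimate;
%estimate{.key} += .value for @pairs;

say %estimate{2024};  # 3000
```

Collecting pairs and summing afterwards sidesteps concurrent writes to `%estimate`, which a plain hash doesn't protect against.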

Wrapup

Now that Santa's exercised his brain on this code, he's ready to get back to the real work for the season!

Santa's recommendation to you is to write some "horrible" Raku code, just like Matt Parker would. Of course, it's not actually horrible, more "quick and dirty". Remember, it's OK to write something that just gets the job done, and not start with something polished.

It's OK if you don't understand all the nuances of the language (it's big!); you just need enough to get the job done. You can always go back later and polish or iteratively improve it. Raku even has this attitude baked in with gradual typing - you can add type constraints as you need them. Much like writing a blog post, it's easier to start with something and revise it than it is to face that blank file.
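A tiny illustration of that gradual typing; `add-loose` and `add-strict` are made-up names for this example:

```raku
# No type constraints: quick to write, and "2" is coerced to a number when added
sub add-loose($a, $b) { $a + $b }

# Constraints added later, once the code has settled down
sub add-strict(Int $a, Int $b --> Int) { $a + $b }

say add-loose("2", 3);    # 5
say add-strict(2, 3);     # 5

# The strict version rejects a Str at call time
my $input = "2";
say (try add-strict($input, 3)) // 'rejected: not an Int';  # rejected: not an Int
```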

Remember, when optimizing your project, sometimes it's OK to optimize for developer time!
