@jorinvo
Last active November 19, 2024 02:40
This is a little challenge to find out which tools programmers use to get their everyday tasks done quickly.

You got your hands on some data that was leaked from a social network and you want to help the poor people.

Luckily you know of a government service that automatically blocks a list of credit cards.

The service is a little old school, though, and you have to upload a CSV file in the exact format it expects. The upload fails if the CSV file contains invalid data.

The CSV file should have two columns, Name and Credit Card. It must also be named using the following pattern:

YYYYMMDD.csv.
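As a rough illustration of the naming pattern, here is a minimal Python sketch. The helper name `csv_filename` is made up for this example, and it assumes the date in the file name is simply the day the file is created.

```python
from datetime import date


def csv_filename(day: date) -> str:
    """Return the upload file name for the given day in YYYYMMDD.csv form."""
    # %Y%m%d gives a four-digit year and zero-padded month and day.
    return day.strftime("%Y%m%d") + ".csv"


print(csv_filename(date(2015, 4, 27)))  # prints 20150427.csv
print(csv_filename(date.today()))       # file name for the current day
```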

The leaked data doesn't have credit card details for every user and you need to pick only the affected users.

The data was published here:

https://gist.githubusercontent.com/jorinvo/7f19ce95a9a842956358/raw/e319340c2f6691f9cc8d8cc57ed532b5093e3619/data.json

You don't have much time to act.

What tools would you use to get the data, format it correctly and save it in the CSV file?
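To make the task concrete, here is one possible sketch in Python. It assumes the data is a JSON array of objects with `name` and `creditcard` keys, where `creditcard` is null for users without card details (this is inferred from the solutions in the comments, not stated in the description). The function name `write_blocklist` and the sample records are hypothetical.

```python
import csv
import io
import json


def write_blocklist(json_text: str, out) -> int:
    """Write Name / Credit Card rows for affected users; return the row count."""
    users = json.loads(json_text)
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["Name", "Credit Card"])
    count = 0
    for user in users:
        card = user.get("creditcard")
        if card:  # skip users whose creditcard is null, missing, or empty
            writer.writerow([user["name"], card])
            count += 1
    return count


# Hypothetical sample records, not taken from the leaked data.
sample = """[
{"name":"Ada Example","email":"ada@example.com","creditcard":"1234-5678-9012-3456"},
{"name":"Bob Example","email":"bob@example.com","creditcard":null}
]"""

buffer = io.StringIO()
rows = write_blocklist(sample, buffer)
print(rows)               # number of affected users written
print(buffer.getvalue())  # CSV text, header line first
```

Writing to any text stream keeps the sketch testable without touching the network or disk. For the real upload you would pass a file opened with `newline=""` and feed in the downloaded JSON text instead of the sample.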


Do you have a crazy vim configuration that allows you to do all of this inside your editor? Are you a shell power user and write this as a one-liner? How would you solve this in your favorite programming language?

Show me your solution in the comments below!

Update

Thank you all for participating!

I never thought so many people would be willing to submit a solution. This is exactly the overview of different technologies and ways of thinking I had hoped to get.

We have solutions without any coding, solutions in one line of code and solutions with over a hundred lines.

I hope everyone else also learned something new by looking at these different styles!

Make sure to also check out the solutions on Hacker News, Reddit (and /r/haskell) and dev.to!

Cheers, Jorin

@ebastos

ebastos commented Apr 27, 2015

Only with standard Unix tools:

file=$(date +"%Y%m%d").csv; echo name,creditcard > $file; curl https://gist.githubusercontent.com/jorin-vogel/7f19ce95a9a842956358/raw/e319340c2f6691f9cc8d8cc57ed532b5093e3619/data.json|sed -r 's/(.*name":)(.*)(,"email.*creditcard":")(.*)}/\2,\4/g' |egrep -v "null|\[|\]"|cut -d "," -f1,2|tr -d '"' >> $file

@rupa

rupa commented Apr 29, 2015

nice thing. quick and dirty as I did not have much time to act!

curl https://gist.githubusercontent.com/jorin-vogel/7f19ce95a9a842956358/raw/e319340c2f6691f9cc8d8cc57ed532b5093e3619/data.json | grep -v '"creditcard":null' | sed -e 's/.*name":"//' -e 's/".*creditcard":"/,/' -e 's/".*//' -e 's/^\[$/name,creditcard/' | grep -v '^]$' >  $(date +%Y%m%d).csv

@singareddyb

Using a combination of bash and Perl --

wget https://gist.githubusercontent.com/jorin-vogel/7f19ce95a9a842956358/raw/e319340c2f6691f9cc8d8cc57ed532b5093e3619/data.json; filename=`date +%Y%m%d`.csv; echo "Name , CreditCard" > $filename; perl -w -n -e 'print "$1 , $2\n" if(m/^{"name":"(.*)","email.*"creditcard":"(.*)"}[,]?$/)' data.json >> $filename

Then, I have a pure Perl approach --

#!/usr/bin/perl

use strict;
use warnings;

use IO::File;
use LWP::Simple;

my $file = 'data.json';
my $uri = 'https://gist.githubusercontent.com/jorin-vogel/7f19ce95a9a842956358/raw/e319340c2f6691f9cc8d8cc57ed532b5093e3619/data.json';

getstore($uri,$file);

my $rf = IO::File->new('data.json','r');
my $filename = calculateDateBasedFile();
my $wf = IO::File->new("$filename",'w');

$wf->print("Name , Credit\n");

while (defined($_ = $rf->getline)) {
    # Keep only records whose creditcard value is a quoted string (not null).
    if (m/^{"name":"(.*)","email.*"creditcard":"(.*)"}[,]?$/) {
        $wf->print("$1 , $2\n");
    }
}

# Build the output file name in YYYYMMDD.csv form from the local date.
sub calculateDateBasedFile {
    my @time  = localtime(time);
    my $year  = 1900 + $time[5];
    my $month = date_quirk($time[4] + 1);
    my $day   = date_quirk($time[3]);

    return $year . $month . $day . ".csv";
}

# Left-pad a single-digit month or day with a zero.
sub date_quirk {
    my $m_or_d = shift;

    if ($m_or_d < 10) {
        $m_or_d = "0" . $m_or_d;
    }
    return $m_or_d;
}

$wf->close;
$rf->close;

@xtradev

xtradev commented May 15, 2015

\B=name,creditcard\n
\L{"name"\:"<U>"*"creditcard"\:"<U>"=$1,$3\n

gema -match -f json2csv.gema data.json > 20150515.csv

@raine
Copy link

raine commented May 16, 2015

Solution using ramda-cli:

#!/usr/bin/env bash

data_url=https://gist.githubusercontent.com/jorin-vogel/7f19ce95a9a842956358/raw/e319340c2f6691f9cc8d8cc57ed532b5093e3619/data.json
file=`date "+%Y%m%d"`.csv
curl $data_url | R 'filter where creditcard: (!= null)' 'project [\name \creditcard]' -o csv > $file

@snahor

snahor commented May 24, 2015

The lovely awk:

curl -s https://gist.githubusercontent.com/jorin-vogel/7f19ce95a9a842956358/raw/e319340c2f6691f9cc8d8cc57ed532b5093e3619/data.json | awk -F '("[,:]"|"})' '{if ($12!="") print $2","$12}' > $(date +%Y%m%d).csv
