
@jorinvo
Last active November 19, 2024 02:40
This is a little challenge to find out which tools programmers use to get their everyday tasks done quickly.

You've got your hands on some data that was leaked from a social network, and you want to help the poor people.

Luckily, you know of a government service that automatically blocks a list of credit cards.

The service is a little old-school, though: you have to upload a CSV file in the exact format. The upload fails if the CSV file contains invalid data.
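The challenge doesn't say what the service treats as invalid. Assuming it follows RFC 4180, a field that contains a comma, a double quote or a line break has to be wrapped in double quotes, with any inner quotes doubled. A minimal Java sketch of that rule (the class and method names are made up for illustration, and the sample values are invented):

```java
import java.util.List;
import java.util.stream.Collectors;

public class CsvRow {
    // Quote a field only when RFC 4180 requires it: it contains a comma,
    // a double quote, CR or LF. Embedded double quotes are doubled.
    static String escape(String field) {
        boolean needsQuotes = field.contains(",") || field.contains("\"")
                || field.contains("\r") || field.contains("\n");
        if (!needsQuotes) {
            return field;
        }
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    // Join raw field values into one CSV line (without a line terminator).
    static String row(List<String> fields) {
        return fields.stream().map(CsvRow::escape).collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        System.out.println(row(List.of("Name", "Credit Card")));
        System.out.println(row(List.of("Doe, Jane", "1234-5678-9012-3456")));
    }
}
```

Quoting only when needed keeps ordinary rows identical to what the simple `name,card` one-liners in the thread produce.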

The CSV file should have two columns, Name and Credit Card. It must also be named according to the following pattern:

YYYYMMDD.csv.
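In Java, this pattern maps to `DateTimeFormatter.ofPattern("yyyyMMdd")`. Note the lowercase `yyyy`: uppercase `YYYY` is the week-based year, which can differ from the calendar year in the last days of December and the first days of January. A small sketch (the class and method names are hypothetical):

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

public class CsvFileName {
    // Lowercase "yyyy" is the calendar year; "YYYY" would be the
    // week-based year, which can differ around New Year.
    private static final DateTimeFormatter PATTERN = DateTimeFormatter.ofPattern("yyyyMMdd");

    // Build the YYYYMMDD.csv name for the given date.
    static String forDate(LocalDate date) {
        return date.format(PATTERN) + ".csv";
    }

    public static void main(String[] args) {
        System.out.println(forDate(LocalDate.now()));
    }
}
```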

The leaked data doesn't have credit card details for every user, so you need to pick only the affected users.
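A sketch of picking the affected users, assuming (as most of the line-based solutions below do) that the file holds one JSON object per line, that `name` comes before `creditcard`, and that values contain no escaped quotes. Users whose card is `null` simply don't match the pattern. The class and method names are hypothetical, and the sample lines are made up:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AffectedUsers {
    // Assumes one JSON object per line with plain string values.
    // "creditcard":null has no opening quote after the colon, so it never matches.
    private static final Pattern RECORD = Pattern.compile(
            "\"name\":\"([^\"]*)\".*\"creditcard\":\"([^\"]*)\"");

    // Return [name, creditcard] pairs for users that have card details.
    static List<String[]> affected(List<String> lines) {
        List<String[]> result = new ArrayList<>();
        for (String line : lines) {
            Matcher m = RECORD.matcher(line);
            if (m.find()) {
                result.add(new String[] {m.group(1), m.group(2)});
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> sample = List.of(
                "[",
                "{\"name\":\"Alice\",\"email\":\"a@example.com\",\"creditcard\":\"1234-5678\"},",
                "{\"name\":\"Bob\",\"email\":\"b@example.com\",\"creditcard\":null}",
                "]");
        for (String[] user : affected(sample)) {
            System.out.println(user[0] + "," + user[1]);
        }
    }
}
```

If the data may contain escaped characters, a real JSON parser (like the `Data.Aeson` decoding in the Haskell solution) is the more robust choice.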

The data was published here:

https://gist.githubusercontent.com/jorinvo/7f19ce95a9a842956358/raw/e319340c2f6691f9cc8d8cc57ed532b5093e3619/data.json

You don't have much time to act.

What tools would you use to get the data, format it correctly and save it in the CSV file?


Do you have a crazy vim configuration that allows you to do all of this inside your editor? Are you a shell power user and write this as a one-liner? How would you solve this in your favorite programming language?

Show me your solution in the comments below!

Update

Thank you all for participating!

I never thought so many people would be willing to submit a solution. This is exactly the overview of different technologies and ways of thinking I hoped to get.

We have solutions without any coding, solutions in one line of code and solutions with over a hundred lines.

I hope everyone else also learned something new by looking at these different styles!

Make sure to also check out the solutions on Hacker News, Reddit (and /r/haskell) and dev.to!

Cheers, Jorin

@Magnap

Magnap commented Apr 27, 2015

Here's my Haskell solution. By default it uses the supplied URL as the data source, but it can combine the data from any number of urls and/or files given as command line arguments.

{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE DeriveGeneric #-}

module Main (main) where

import qualified System.Environment as E
import qualified Network.HTTP.Conduit as N
import qualified Data.Aeson as A
import qualified Data.Text as T
import qualified Data.ByteString.Lazy as B
import qualified Data.Maybe as DM
import qualified Data.Csv as C
import Data.Csv ((.=))
import qualified Data.Vector as V
import GHC.Generics
import qualified Data.Time as DT
import qualified Data.List as DL
import qualified Network.URI as U
import qualified Control.Monad as M

data Person = Person
  { name :: T.Text
  , creditcard :: T.Text
  } deriving (Show, Generic)

instance A.FromJSON Person

instance C.ToNamedRecord Person where
  toNamedRecord (Person name creditcard) = C.namedRecord [
    "Name" .= name, "Credit Card" .= creditcard]

defaultURL :: String
defaultURL = "https://gist.githubusercontent.com/jorin-vogel/7f19ce95a9a842956358/raw/e319340c2f6691f9cc8d8cc57ed532b5093e3619/data.json"

main :: IO ()
main = do
  args <- E.getArgs
  let (urls,files) = DL.partition U.isURI (if null args then [defaultURL] else args)
  M.when (null args) (putStrLn "No arguments provided. Downloading from default URL. Any number of files and/or URLs can be provided as arguments.")
  M.unless (null files) (putStrLn "Reading the following files:" >> mapM_ putStrLn files)
  M.unless (null urls) (putStrLn "Downloading from the following URLs:" >> mapM_ putStrLn urls)
  contents <- M.liftM2 (++) (mapM N.simpleHttp urls) (mapM B.readFile files)
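  -- Split on newlines (byte 10) and drop a trailing comma (byte 44) only if present,
  -- so the last object keeps its closing brace and empty lines don't hit B.init
  let stripComma l = if not (B.null l) && B.last l == 44 then B.init l else l
      lines = concatMap (map stripComma . B.split 10) contents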
      parse = DM.mapMaybe A.decode lines :: [Person]
      csv = C.encodeByName (V.fromList ["Name", "Credit Card"]) parse
  now <- DT.getCurrentTime
  let fileName = DT.formatTime DT.defaultTimeLocale "%Y%m%d.csv" now
  putStrLn $ "Writing to " ++ fileName
  B.writeFile fileName csv

@cgp

cgp commented Apr 27, 2015

No love for Java? /s

import java.io.IOException;
import java.io.InputStream;
import java.io.PrintWriter;
import java.net.URL;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Scanner;


public class JavaChallenge {
    public static void main(String[] args) throws IOException {
        String url = "https://gist.githubusercontent.com/jorin-vogel/7f19ce95a9a842956358/raw/e319340c2f6691f9cc8d8cc57ed532b5093e3619/data.json";
        String in;
        // The "\\A" delimiter only matches at the start of input, so next() reads the whole stream
        try (InputStream stream = new URL(url).openStream();
             Scanner scanner = new Scanner(stream, "UTF-8")) {
            in = scanner.useDelimiter("\\A").next();
        }
        String[] entries = in.split("\n");
        StringBuilder sb = new StringBuilder("Name,Credit Card\n");
        for (String entry : entries) {
            // Splitting on '"' puts the name at index 3 and, second to last, either
            // the card number or the key "creditcard" itself when the value is null
            String[] fields = entry.split("\"");
            if (fields.length < 10) continue;
            if (!"creditcard".equals(fields[fields.length - 2])) {
                // Quote each field separately so the file has two columns
                sb.append('"').append(fields[3]).append("\",\"")
                  .append(fields[fields.length - 2]).append("\"\n");
            }
        }
        // "yyyy" is the calendar year; "YYYY" would be the week-based year
        String fileName = LocalDate.now().format(DateTimeFormatter.ofPattern("yyyyMMdd")) + ".csv";
        try (PrintWriter out = new PrintWriter(fileName, "UTF-8")) {
            out.print(sb);
        }
    }
}

@ebastos

ebastos commented Apr 27, 2015

Only with standard Unix tools:

file=$(date +"%Y%m%d").csv; echo name,creditcard > $file; curl https://gist.githubusercontent.com/jorin-vogel/7f19ce95a9a842956358/raw/e319340c2f6691f9cc8d8cc57ed532b5093e3619/data.json|sed -r 's/(.*name":)(.*)(,"email.*creditcard":")(.*)}/\2,\4/g' |egrep -v "null|\[|\]"|cut -d "," -f1,2|tr -d '"' >> $file

@rupa

rupa commented Apr 29, 2015

Nice one. Quick and dirty, as I did not have much time to act!

curl https://gist.githubusercontent.com/jorin-vogel/7f19ce95a9a842956358/raw/e319340c2f6691f9cc8d8cc57ed532b5093e3619/data.json | grep -v '"creditcard":null' | sed -e 's/.*name":"//' -e 's/".*creditcard":"/,/' -e 's/".*//' -e 's/^\[$/name,creditcard/' | grep -v '^]$' >  $(date +%Y%m%d).csv

@singareddyb

Using a combination of Bash and Perl --

wget https://gist.githubusercontent.com/jorin-vogel/7f19ce95a9a842956358/raw/e319340c2f6691f9cc8d8cc57ed532b5093e3619/data.json; filename=`date +%Y%m%d`.csv; echo "Name,Credit Card" > $filename; perl -w -n -e 'print "$1,$2\n" if(m/^\{"name":"(.*)","email.*"creditcard":"(.*)"\}[,]?$/)' data.json >> $filename

Then, I have a pure Perl approach --

#!/usr/bin/perl

use strict;
use warnings;

use IO::File;
use LWP::Simple;

my $file = 'data.json';
my $uri = 'https://gist.githubusercontent.com/jorin-vogel/7f19ce95a9a842956358/raw/e319340c2f6691f9cc8d8cc57ed532b5093e3619/data.json';

getstore($uri, $file);

my $rf = IO::File->new($file, 'r') or die "Cannot read $file: $!";
my $filename = calculateDateBasedFile();
my $wf = IO::File->new($filename, 'w') or die "Cannot write $filename: $!";

# No spaces around the comma: they would become part of the field values
$wf->print("Name,Credit Card\n");

# Only lines whose creditcard value is a quoted string match, so nulls are skipped
while (defined($_ = $rf->getline)) {
    if (m/^\{"name":"(.*)","email.*"creditcard":"(.*)"\}[,]?$/) {
        $wf->print("$1,$2\n");
    }
}

$wf->close;
$rf->close;

# Build today's file name as YYYYMMDD.csv
sub calculateDateBasedFile {
    my @time  = localtime(time);
    my $year  = 1900 + $time[5];
    my $month = date_quirk($time[4] + 1);
    my $day   = date_quirk($time[3]);
    return $year . $month . $day . ".csv";
}

# Zero-pad a month or day to two digits
sub date_quirk {
    my $m_or_d = shift;
    return $m_or_d < 10 ? "0" . $m_or_d : $m_or_d;
}

@xtradev

xtradev commented May 15, 2015

json2csv.gema:

\B=name,creditcard\n
\L{"name"\:"<U>"*"creditcard"\:"<U>"=$1,$3\n

gema -match -f json2csv.gema data.json > 20150515.csv

@raine

raine commented May 16, 2015

Solution using ramda-cli:

#!/usr/bin/env bash

data_url=https://gist.githubusercontent.com/jorin-vogel/7f19ce95a9a842956358/raw/e319340c2f6691f9cc8d8cc57ed532b5093e3619/data.json
file=`date "+%Y%m%d"`.csv
curl $data_url | R 'filter where creditcard: (!= null)' 'project [\name \creditcard]' -o csv > $file

@snahor

snahor commented May 24, 2015

The lovely awk:

curl -s https://gist.githubusercontent.com/jorin-vogel/7f19ce95a9a842956358/raw/e319340c2f6691f9cc8d8cc57ed532b5093e3619/data.json | awk -F '("[,:]"|"})' '{if ($12!="") print $2","$12}' > "$(date +%Y%m%d).csv"
