@jorinvo

This is a little challenge to find out which tools programmers use to get their everyday tasks done quickly.

You got your hands on some data that was leaked from a social network and you want to help the poor people.

Luckily you know a government service to automatically block a list of credit cards.

The service is a little old school though and you have to upload a CSV file in the exact format. The upload fails if the CSV file contains invalid data.

The CSV file should have two columns, Name and Credit Card. It must also be named after the following pattern:

YYYYMMDD.csv
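
For example, a file created on 27 April 2015 would have to be named 20150427.csv and could start like this (the data row is made up purely for illustration):

Name,Credit Card
Jane Doe,1234-5678-9012-3456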

The leaked data doesn't have credit card details for every user and you need to pick only the affected users.

The data was published here:

https://gist.githubusercontent.com/jorinvo/7f19ce95a9a842956358/raw/e319340c2f6691f9cc8d8cc57ed532b5093e3619/data.json

You don't have much time to act.

What tools would you use to get the data, format it correctly and save it in the CSV file?


Do you have a crazy vim configuration that allows you to do all of this inside your editor? Are you a shell power user who would write this as a one-liner? How would you solve this in your favorite programming language?

Show me your solution in the comments below!

Update

Thank you all for participating!

I never thought so many people would be willing to submit a solution. This is exactly the overview of different technologies and ways of thinking I was hoping to get.

We have solutions without any coding, solutions in one line of code and solutions with over a hundred lines.

I hope everyone else also learned something new by looking at these different styles!

Make sure to also check out the solutions on Hacker News, Reddit (and /r/haskell), and dev.to!

Cheers, Jorin

@stephen-smith

Probably just use awk. Download separately. Do a few test/refinement runs on the first 10/100 lines, output a CSV and call it good. At least if I only had to do it once.

For something longer-term, I'd grab a JSON (parser) and CSV (formatter) library in... whatever language (Haskell is my favorite right now, but its advantages are not leveraged here) and do it that way. Once I was happy with the results of a slurp/process/dump with everything in memory, I'd try to refactor to something streaming, in case future dumps were too large to process without swapping.
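
A minimal sketch of that slurp/process/dump step, shown here in Python rather than Haskell and assuming the dump has already been downloaded to data.json:

import csv
import json
from datetime import date

# Slurp: read the whole dump into memory.
with open("data.json") as f:
    people = json.load(f)

# Process: keep only the affected users, i.e. those with a credit card.
affected = [p for p in people if p.get("creditcard")]

# Dump: write the two required columns to a YYYYMMDD.csv file.
filename = date.today().strftime("%Y%m%d") + ".csv"
with open(filename, "w", newline="") as out:
    writer = csv.writer(out)
    writer.writerow(["Name", "Credit Card"])
    for p in affected:
        writer.writerow([p["name"], p["creditcard"]])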

@farzher

farzher commented Apr 27, 2015

Only takes a minute in sublime text. Also doesn't require much mental effort or planning.

@prikhi

prikhi commented Apr 27, 2015

curl, perl, sed, grep, head, date!

URL=https://gist.githubusercontent.com/jorin-vogel/7f19ce95a9a842956358/raw/e319340c2f6691f9cc8d8cc57ed532b5093e3619/data.json
curl -s $URL | perl -pe 's/{"name":"(.*?)".*creditcard":"?(.*?)"?},?/\1, \2/' | sed 's/\[/Name, Credit Card/' | grep -v ', null' | head -n -1 > `date +%Y%m%d`.csv

@varunity

Grabbed the file:

wget https://gist.githubusercontent.com/jorin-vogel/7f19ce95a9a842956358/raw/e319340c2f6691f9cc8d8cc57ed532b5093e3619/data.json

Opened in vim:

vim data.json
and then ran the following to remove all lines where "creditcard" is null: :g/reditcard":null/d

A couple of lines of Python took care of the rest:

import csv
import json

with open('data.json') as data_file:
    x = json.load(data_file)

f = csv.writer(open("20150425.csv", "wb+"))
f.writerow(["name", "creditcard"])
for z in x:
    f.writerow([z["name"],
                z["creditcard"]])

I like that though @farzher, pretty slick!

@prabhu1010

It is less than a minute's work in the vi editor, and I'm not even an admin guy.
:%s/^{"name":"//g
:%s/","email.*":/,/g
:%s/"{0,1}},$//g
Go to line 1, add "name,creditcard", then save the file with the desired filename.

Why use a script when you have such a powerful Swiss Army knife as vi's search-and-replace feature?

@varunity

Thanks @prabhu1010, I'm not an admin guy either. Yes, I was wondering how to do everything in vi(m). Sometimes I think more clearly in Python, so I just did that, but I agree, your technique is quite quick and elegant.

@draegtun

Rebol 3 solution:

import http://reb4.me/r3/altjson

json: load-json https://gist.githubusercontent.com/jorin-vogel/7f19ce95a9a842956358/raw/e319340c2f6691f9cc8d8cc57ed532b5093e3619/data.json
file: to-file format/pad [-4 -2 -2] reduce [now/year now/month now/day ".csv"] 0
csv:  open/write file

write csv join "name,creditcard" newline

foreach rec json [
    unless none? rec/creditcard [
        write csv ajoin [rec/name "," rec/creditcard newline]
    ]
]

close csv

For alternative solution see - https://gist.github.com/draegtun/d47a06a5d11370121539

@rocky-jaiswal

Here is a more verbose Ruby (scripty) solution; I liked the Clojure version best though.

require 'net/http'
require 'json'
require 'fileutils'
require 'date'
require 'csv'

uri = URI('https://gist.githubusercontent.com/jorin-vogel/7f19ce95a9a842956358/raw/e319340c2f6691f9cc8d8cc57ed532b5093e3619/data.json')
response = Net::HTTP.get(uri)

raw_data = JSON.parse(response)
users_with_cc = raw_data.select do |entry|
  !entry["creditcard"].nil? && !entry["creditcard"].strip.empty?
end

file_name = Date.today.strftime '%Y%m%d' + ".csv"
csv_file  = File.join(FileUtils.pwd, file_name)

CSV.open(csv_file, "wb", {force_quotes: true, headers: true}) do |csv|
  csv << ['Name', 'Credit Card']
  users_with_cc.each do |user|
    csv << [user["name"], user["creditcard"]]
  end
end

@localhost

Quick'n'dirty in Elixir (using poison to parse JSON)

require Poison.Parser

IO.puts "name,creditcard"
Poison.Parser.parse!(File.read!("data.json"), keys: :atoms)
  |> Enum.filter(fn(x) -> x[:creditcard] !== nil end)
  |> Enum.each(fn(x) -> IO.puts x[:name] <> "," <> x[:creditcard] end)

elixir -r poison/lib/poison/parser.ex cc.exs > 20150425.csv

@henrikbjorn

Something in Golang

package main

import (
    "encoding/csv"
    "encoding/json"
    "io/ioutil"
    "os"
    "time"
)

type Row struct {
    Name       string `json:"name"`
    CreditCard string `json:"creditcard"`
}

func main() {
    var rows []Row

    b, err := ioutil.ReadFile("data.json")

    if err != nil {
        os.Exit(1)
    }

    if err := json.Unmarshal(b, &rows); err != nil {
        os.Exit(1)
    }

    f, err := os.Create(time.Now().Format("20060102") + ".csv")

    if err != nil {
        os.Exit(1)
    }

    w := csv.NewWriter(f)

    defer w.Flush()

    w.Write([]string{
        "Name",
        "Credit Card",
    })

    for _, r := range rows {
        if r.CreditCard == "" {
            continue
        }

        w.Write([]string{
            r.Name,
            r.CreditCard,
        })
    }
}

@swuecho

swuecho commented Apr 27, 2015

json_xs <data.json -e '$_ = join "\n", "Name, Credit Card", map { $_->{creditcard} ?  "$_->{name}, $_->{creditcard}" : ()  } @$_'  -t string

This is also Perl, using the json_xs tool provided by JSON::XS.

@Magnap

Magnap commented Apr 27, 2015

Here's my Haskell solution. By default it uses the supplied URL as the data source, but it can combine the data from any number of URLs and/or files given as command-line arguments.

{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE DeriveGeneric #-}

module Main (main) where

import qualified System.Environment as E
import qualified Network.HTTP.Conduit as N
import qualified Data.Aeson as A
import qualified Data.Text as T
import qualified Data.ByteString.Lazy as B
import qualified Data.Maybe as DM
import qualified Data.Csv as C
import Data.Csv ((.=))
import qualified Data.Vector as V
import GHC.Generics
import qualified Data.Time as DT
import qualified Data.List as DL
import qualified Network.URI as U
import qualified Control.Monad as M

data Person = Person
  { name :: T.Text
  , creditcard :: T.Text
  } deriving (Show, Generic)

instance A.FromJSON Person

instance C.ToNamedRecord Person where
  toNamedRecord (Person name creditcard) = C.namedRecord [
    "Name" .= name, "Credit Card" .= creditcard]

defaultURL = "https://gist.githubusercontent.com/jorin-vogel/7f19ce95a9a842956358/raw/e319340c2f6691f9cc8d8cc57ed532b5093e3619/data.json"

main = do
  args <- E.getArgs
  let (urls,files) = DL.partition U.isURI (if null args then [defaultURL] else args)
  M.when (null args) (putStrLn "No arguments provided. Downloading from default URL. Any number of files and/or URLs can be provided as arguments.")
  M.unless (null files) (putStrLn "Reading the following files:" >> mapM_ putStrLn files)
  M.unless (null urls) (putStrLn "Downloading from the following URLs:" >> mapM_ putStrLn urls)
  contents <- M.liftM2 (++) (mapM N.simpleHttp urls) (mapM B.readFile files)
  let lines = concatMap (map B.init . B.split 10) contents
      parse = DM.mapMaybe A.decode lines :: [Person]
      csv = C.encodeByName (V.fromList ["Name", "Credit Card"]) parse
  now <- DT.getCurrentTime
  let fileName = DT.formatTime DT.defaultTimeLocale "%Y%m%d.csv" now
  putStrLn $ "Writing to " ++ fileName
  B.writeFile fileName csv

@cgp

cgp commented Apr 27, 2015

No love for Java? /s

import java.io.IOException;
import java.io.PrintWriter;
import java.net.MalformedURLException;
import java.net.URL;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Scanner;


public class JavaChallenge {
    public static void main(String[] args) throws MalformedURLException, IOException {
        String url = "https://gist.githubusercontent.com/jorin-vogel/7f19ce95a9a842956358/raw/e319340c2f6691f9cc8d8cc57ed532b5093e3619/data.json";
        String in = new Scanner(new URL(url).openStream(), "UTF-8").useDelimiter("\\A").next();
        String[] entries = in.split("\n");
        StringBuilder sb = new StringBuilder("Name,Credit Card\n");
        for(String entry:entries) {
            String[] fields = entry.split("\"");
            if (fields.length < 10) continue;           
            if (!"creditcard".equals(fields[fields.length-2])) {
                sb.append(fields[3] + "," + fields[fields.length-2] + "\n");
            }           
        }
        LocalDate ld = LocalDate.now();     
        PrintWriter out = new PrintWriter(ld.format(DateTimeFormatter.ofPattern("yyyyMMdd"))+".csv");
        out.println(sb.toString());
        out.close();         
    }
}

@ebastos

ebastos commented Apr 27, 2015

Only with standard Unix tools:

file=$(date +"%Y%m%d").csv; echo name,creditcard > $file; curl https://gist.githubusercontent.com/jorin-vogel/7f19ce95a9a842956358/raw/e319340c2f6691f9cc8d8cc57ed532b5093e3619/data.json|sed -r 's/(.*name":)(.*)(,"email.*creditcard":")(.*)}/\2,\4/g' |egrep -v "null|\[|\]"|cut -d "," -f1,2|tr -d '"' >> $file

@rupa

rupa commented Apr 29, 2015

Nice thing. Quick and dirty, as I did not have much time to act!

curl https://gist.githubusercontent.com/jorin-vogel/7f19ce95a9a842956358/raw/e319340c2f6691f9cc8d8cc57ed532b5093e3619/data.json | grep -v '"creditcard":null' | sed -e 's/.*name":"//' -e 's/".*creditcard":"/,/' -e 's/".*//' -e 's/^\[$/name,creditcard/' | grep -v '^]$' >  $(date +%Y%m%d).csv

@singareddyb

Using a combination of bash/perl --

wget https://gist.githubusercontent.com/jorin-vogel/7f19ce95a9a842956358/raw/e319340c2f6691f9cc8d8cc57ed532b5093e3619/data.json; filename=`date +%Y%m%d`.csv; echo "Name , CreditCard" > $filename; perl -w -n -e 'print "$1 , $2\n" if(m/^{"name":"(.*)","email.*"creditcard":"(.*)"}[,]?$/)' data.json >> $filename

Then, I have a pure Perl approach --

#!/usr/bin/perl

use strict;
use warnings;

use IO::File;
use LWP::Simple;

my $file = 'data.json';
my $uri = 'https://gist.githubusercontent.com/jorin-vogel/7f19ce95a9a842956358/raw/e319340c2f6691f9cc8d8cc57ed532b5093e3619/data.json';

getstore($uri,$file);

my $rf = IO::File->new('data.json','r');
my $filename = calculateDateBasedFile();
my $wf = IO::File->new("$filename",'w');

$wf->print("Name , Credit\n");

while (defined($_ = $rf->getline)) {
    if (m/^{"name":"(.*)","email.*"creditcard":"(.*)"}[,]?$/) {
        $wf->print("$1 , $2\n");
    }
}

sub calculateDateBasedFile {
    my @time  = localtime(time);
    my $year  = 1900 + $time[5];
    my $month = date_quirk($time[4] + 1);
    my $day   = date_quirk($time[3]);

    return $year . $month . $day . ".csv";
}

sub date_quirk {
    my $m_or_d = shift;

    if ($m_or_d < 10) {
        $m_or_d = "0" . $m_or_d;
    }

    return $m_or_d;
}

$wf->close;
$rf->close;

@xtradev

xtradev commented May 15, 2015

\B=name,creditcard\n
\L{"name"\:"<U>"*"creditcard"\:"<U>"=$1,$3\n

gema -match -f json2csv.gema data.json > 20150515.csv

@raine

raine commented May 16, 2015

Solution using ramda-cli:

#!/usr/bin/env bash

data_url=https://gist.githubusercontent.com/jorin-vogel/7f19ce95a9a842956358/raw/e319340c2f6691f9cc8d8cc57ed532b5093e3619/data.json
file=`date "+%Y%m%d"`.csv
curl $data_url | R 'filter where creditcard: (!= null)' 'project [\name \creditcard]' -o csv > $file

@snahor

snahor commented May 24, 2015

The lovely awk:

curl -s https://gist.githubusercontent.com/jorin-vogel/7f19ce95a9a842956358/raw/e319340c2f6691f9cc8d8cc57ed532b5093e3619/data.json | awk -F '("[,:]"|"})' '{if ($12!="") print $2","$12}' > $(date +%Y%m%d).csv
