Skip to content

Instantly share code, notes, and snippets.

@jorinvo
Last active April 21, 2023 17:14
Show Gist options
  • Star 156 You must be signed in to star a gist
  • Fork 3 You must be signed in to fork a gist
  • Save jorinvo/2e43ffa981a97bc17259 to your computer and use it in GitHub Desktop.
Save jorinvo/2e43ffa981a97bc17259 to your computer and use it in GitHub Desktop.
This is a little challenge to find out which tools programmers use to get their everyday tasks done quickly.

You got your hands on some data that was leaked from a social network and you want to help the poor people.

Luckily you know a government service to automatically block a list of credit cards.

The service is a little old school though and you have to upload a CSV file in the exact format. The upload fails if the CSV file contains invalid data.

The CSV files should have two columns, Name and Credit Card. Also, it must be named after the following pattern:

YYYYMMDD.csv.

The leaked data doesn't have credit card details for every user and you need to pick only the affected users.

The data was published here:

https://gist.githubusercontent.com/jorinvo/7f19ce95a9a842956358/raw/e319340c2f6691f9cc8d8cc57ed532b5093e3619/data.json

You don't have much time to act.

What tools would you use to get the data, format it correctly and save it in the CSV file?


Do you have a crazy vim configuration that allows you to do all of this inside your editor? Are you a shell power user and write this as a one-liner? How would you solve this in your favorite programming language?

Show me your solution in the comments below!

Update

Thank you all for participating!

I never thought so many people might be willing to submit a solution. This is exactly the overview about different technologies and ways of thinking I anticipated to get.

We have solutions without any coding, solutions in one line of code and solutions with over a hundred lines.

I hope everyone else also learned something new by looking at this different styles!

Make sure to also checkout the solutions on Hackernews, Reddit (and /r/haskell) and dev.to!

Cheers, Jorin

@graiz
Copy link

graiz commented Apr 26, 2015

Code isn't always the answer.
Change the given URL to end in .CSV. The server does the right thing.
Open in excel or numbers and remove the columns you don't need. Search replace on the keywords that are extra and you are done.

@jasonm23
Copy link

Yep, Excel / Google Sheet / OpenOffice spreadsheet can all get this done in a few mins.

@brnls
Copy link

brnls commented Apr 26, 2015

In F#, using the JSON Type provider.

open FSharp.Data;
open System.IO;
open System;

type data = JsonProvider<"https://gist.githubusercontent.com/jorin-vogel/7f19ce95a9a842956358/raw/e319340c2f6691f9cc8d8cc57ed532b5093e3619/data.json">

[| yield "Name,CreditCard"
   yield! 
      data.GetSamples()
      |> Seq.filter(fun x -> x.Creditcard.IsSome)
      |> Seq.map(fun x -> String.Format("{0},{1}", x.Name, x.Creditcard.Value)) |]
|> (fun csvLines -> File.WriteAllLines(DateTime.Today.ToString("yyyyMMdd") + ".csv", csvLines))

@dhruvasagar
Copy link

Using http://stedolan.github.io/jq/ :

jq '.[] | select(.creditcard) | [.name, .creditcard] | join(",")' data.json > `date +'%Y%m%d'`.csv

@Gabriella439
Copy link

Haskell, using maps instead of structured records (to be more analogous to the original Ruby solution):

{-# LANGUAGE OverloadedLists   #-}
{-# LANGUAGE OverloadedStrings #-}

import Control.Lens (view)
import Data.Aeson (decode)
import Data.ByteString (ByteString)
import qualified Data.ByteString.Lazy as Bytes
import Data.Map (Map)
import qualified Data.Map as Map
import Data.Csv (encodeByName)
import Data.Text (Text)
import Data.Text.Encoding (encodeUtf8)
import Data.Time (getCurrentTime, formatTime)
import Network.Wreq (get, responseBody)
import System.Locale (defaultTimeLocale)

uri = "https://gist.githubusercontent.com/jorin-vogel/7f19ce95a9a842956358/raw/e319340c2f6691f9cc8d8cc57ed532b5093e3619/data.json"

main = do
    -- Download and decode JSON from the given URI
    response <- get uri
    let Just records = decode (view responseBody response)

    -- CSV library only accepts bytes for headers, not text
    let toUtf8 :: [Map Text (Maybe Text)] -> [Map ByteString (Maybe Text)]
        toUtf8 = map (Map.mapKeys encodeUtf8)

    -- CSV-encode the result and write out to a timestamped file
    time <- getCurrentTime
    let file = formatTime defaultTimeLocale "%Y%m%d.csv" time
    Bytes.writeFile file (encodeByName ["name", "creditcard"] (toUtf8 records))

Also, this actually downloads from a URI instead of reading from a local file.

@ralt
Copy link

ralt commented Apr 26, 2015

Common Lisp

#!/usr/bin/sbcl --script

(load "~/.sbclrc")

(ql:quickload :drakma)
(ql:quickload :jsown)
(ql:quickload :local-time)

(defvar *url* "https://gist.githubusercontent.com/jorin-vogel/7f19ce95a9a842956358/raw/e319340c2f6691f9cc8d8cc57ed532b5093e3619/data.json")

(defun generate-filename ()
  (concatenate
   'string
   (local-time:format-timestring nil (local-time:now)
                                 :format '(:year (:month 2) (:day 2)))
   ".csv"))

(let ((data (jsown:parse (drakma:http-request *url*))))
  (with-open-file (f (generate-filename) :direction :output :if-does-not-exist :create)
    (format f "name,creditcard~%")
    (dolist (line data)
      (format f "~A,~A~%"
                   (jsown:val line "name")
                   (jsown:val line "creditcard")))))

@smihir
Copy link

smihir commented Apr 26, 2015

another way in R:

library('jsonlite')
url <- 'https://gist.githubusercontent.com/jorin-vogel/7f19ce95a9a842956358/raw/e319340c2f6691f9cc8d8cc57ed532b5093e3619/data.json'
cvec <- c('name', 'creditcard')
jdata <- fromJSON(url)[, cvec]
jdata <- jdata[complete.cases(jdata),]
colnames(jdata) <- c('Name', 'Credit Card')
write.csv(jdata, format(Sys.Date(), "%Y%m%d.csv"), row.names = FALSE)

@rendall
Copy link

rendall commented Apr 26, 2015

I cut and pasted this into the javascript console of an 'about:blank' in chrome. It worked even though I got some syntax 'end of line' errors in the data.

var xmlhttp = new XMLHttpRequest();
xmlhttp.onreadystatechange = function() {

var allData = JSON.parse(xmlhttp.response);

for (var i = 0; i < allData.length; i++){

    var entry = allData[i];
    if (entry.creditcard == null) continue;
    var elemDiv = document.createElement('div');
    elemDiv.innerText = entry.name +"," + entry.creditcard;
    document.body.appendChild(elemDiv);

}
    }
xmlhttp.open("GET", "https://gist.githubusercontent.com/jorin-vogel/7f19ce95a9a842956358/raw/e319340c2f6691f9cc8d8cc57ed532b5093e3619/data.json", true);
    xmlhttp.send();

@itamarhaber
Copy link

For this you must use Redis' Lua scripting and Sorted Sets to get it sorted by name of course (as well as wget & date):

wget -qO- https://gist.githubusercontent.com/jorin-vogel/7f19ce95a9a842956358/raw/e319340c2f6691f9cc8d8cc57ed532b5093e3619/data.json | redis-cli -x set j; redis-cli --raw EVAL 'redis.call("DEL", KEYS[2]) redis.call("ZADD", KEYS[2], 0, "Name, Credit Card") for i,l in pairs(cjson.decode(redis.call("GET", KEYS[1]))) do local cc = l["creditcard"] if type(cc) == "string" then redis.call("ZADD", KEYS[2], 1, l["name"] .. ", " .. cc) end end return(redis.call("ZRANGEBYSCORE", KEYS[2], 0, 1))' 2 j o > $(date +%Y%m%d).csv

@tel
Copy link

tel commented Apr 26, 2015

I quite like Gabriel's. Here's an example which uses a different kind of trick—lenses as opposed to very general automatic JSON decoding.

{-# LANGUAGE OverloadedStrings #-}

import Control.Lens
import Data.Aeson
import Data.Aeson.Lens
import Data.Monoid
import System.Locale (defaultTimeLocale)
import qualified Data.ByteString.Lazy as Sl
import qualified Data.Text as Tx
import qualified Data.Text.Encoding as TxE
import qualified Data.Time as Time
import qualified Network.Wreq as Wreq

-- "name" and "creditcard"
type Name = Tx.Text
type CC   = Tx.Text

relevant :: Value -> Maybe (Name, CC)
relevant v = do
  name <- v ^? key "name"       . _String
  cc   <- v ^? key "creditcard" . _String
  return (name, cc)

getData :: IO [(Name, CC)]
getData = do
  resp <- Wreq.get "https://gist.githubusercontent.com/jorin-vogel/7f19ce95a9a842956358/raw/e319340c2f6691f9cc8d8cc57ed532b5093e3619/data.json"
  return $ toListOf
    (Wreq.responseBody . _Array . each . to relevant . _Just)
    resp

mkFile :: [(Name, CC)] -> Sl.ByteString
mkFile pairs =
  "name, creditcard\n" <>
  foldr build mempty pairs
  where
    encode text = Sl.fromStrict (TxE.encodeUtf8 text)
    build (name, cc) rest = encode name <> ", " <> encode cc <> "\n" <> rest

writeOut :: [(Name, CC)] -> IO ()
writeOut pairs = do
  now <- Time.getCurrentTime
  let filename = Time.formatTime defaultTimeLocale "%Y%m%d.csv" now
  Sl.writeFile filename (mkFile pairs)

main :: IO ()
main = getData >>= writeOut

@opyate
Copy link

opyate commented Apr 26, 2015

A minute's looking around shows me that someone has already done the work: https://gist.githubusercontent.com/jorin-vogel/09c6e7dcd7a70a475fa7/raw/44bb65f63681211ce17bde16519d5ce3d343af8e/20150425.csv

(yes, I know, I know)

@stephen-smith
Copy link

Probably just use awk. Download separately. Do a few test/refinement runs on the first 10/100 lines, output a CSV and call it good. At least if I only had to do it once.

For something longer-term, I'd grab a JSON (parser) and CSV (formatter) library in... whatever language (Haskell is my favorite right now, but it's advantages are not leveraged here.) and do it that way. Once I was happy with the results of a slurp/process/dump with everything in memory, I'd try and refactor to something streaming, in case future dumps where to large to process without swapping.

@farzher
Copy link

farzher commented Apr 27, 2015

Only takes a minute in sublime text. Also doesn't require much mental effort or planning.

@prikhi
Copy link

prikhi commented Apr 27, 2015

curl, perl, sed ,grep, head, date!

URL=https://gist.githubusercontent.com/jorin-vogel/7f19ce95a9a842956358/raw/e319340c2f6691f9cc8d8cc57ed532b5093e3619/data.json
curl -s $URL | perl -pe 's/{"name":"(.*?)".*creditcard":"?(.*?)"?},?/\1, \2/' | sed 's/\[/Name, Credit Card/' | grep -v ', null' | head -n -1 > `date +%Y%m%d`.csv

@varunity
Copy link

Grabbed the file:

wget https://gist.githubusercontent.com/jorin-vogel/7f19ce95a9a842956358/raw/e319340c2f6691f9cc8d8cc57ed532b5093e3619/data.json

Opened in vim:

vim data.json
and then ran the following to remove all lines with "creditcard=null": g/reditcard":null/d

A couple lines of python took care of the rest:

import csv
import json

with open('data.json') as data_file:
    x = json.load(data_file)

f = csv.writer(open("20150425.csv", "wb+"))
f.writerow(["name", "creditcard"])
for z in x:
    f.writerow([z["name"],
                z["creditcard"]])

I like that though @farzher, pretty slick!

@prabhu1010
Copy link

It is less than a minute's work in vi editor and I'm not even an admin guy.
:%s/^{"name":"//g
:%s/","email.*":/,/g
:%s/"{0,1}},$//g
Goto line 1 and add "name,creditcard" then save the file with the desired filename.

Why use script when you have such a powerful Swiss Army Knife like vi 'search and replace' feature?

@varunity
Copy link

Thanks @prabhu1010, I'm not an admin guy either. Yes I was wondering how to just do everything in vi(m). Sometimes I think clearer in Python so I just did that but I agree, your technique is quite quick and elegant.

@draegtun
Copy link

Rebol 3 solution:

import http://reb4.me/r3/altjson

json: load-json https://gist.githubusercontent.com/jorin-vogel/7f19ce95a9a842956358/raw/e319340c2f6
file: to-file format/pad [-4 -2 -2] reduce [now/year now/month now/day ".csv"] 0
csv:  open/write file

write csv join "name,creditcard" newline

foreach rec json [
    unless none? rec/creditcard [
        write csv ajoin [rec/name "," rec/creditcard newline]
    ]
]

close csv

For alternative solution see - https://gist.github.com/draegtun/d47a06a5d11370121539

@rocky-jaiswal
Copy link

Here is more verbose Ruby (scripty) solution, I liked the Clojure version best though.

require 'net/http'
require 'json'
require 'fileutils'
require 'date'
require 'csv'

uri = URI('https://gist.githubusercontent.com/jorin-vogel/7f19ce95a9a842956358/raw/e319340c2f6691f9cc8d8cc57ed532b5093e3619/data.json')
response = Net::HTTP.get(uri)

raw_data = JSON.parse(response)
users_with_cc = raw_data.select do |entry|
  !entry["creditcard"].nil? && !entry["creditcard"].strip.empty?
end

file_name = Date.today.strftime '%Y%m%d' + ".csv"
csv_file  = File.join(FileUtils.pwd, file_name)

CSV.open(csv_file, "wb", {force_quotes: true, headers: true}) do |csv|
  csv << ['Name', 'Credit Card']
  users_with_cc.each do |user|
    csv << [user["name"], user["creditcard"]]
  end
end

@localhost
Copy link

Quick'n'dirty in Elixir (using poison to parse JSON)

require Poison.Parser

IO.puts "name,creditcard"
Poison.Parser.parse!(File.read!("data.json"), keys: :atoms)
  |> Enum.filter(fn(x) -> x[:creditcard] !== nil end)
  |> Enum.each(fn(x) -> IO.puts x[:name] <> "," <> x[:creditcard] end)

elixir -r poison/lib/poison/parser.ex cc.exs > 20150425.csv

@henrikbjorn
Copy link

Something in Golang

package main

import (
    "encoding/csv"
    "encoding/json"
    "io/ioutil"
    "os"
    "time"
)

type Row struct {
    Name       string `json:"name"`
    CreditCard string `json:"creditcard"`
}

func main() {
    var rows []Row

    b, err := ioutil.ReadFile("data.json")

    if err != nil {
        os.Exit(1)
    }

    if err := json.Unmarshal(b, &rows); err != nil {
        os.Exit(1)
    }

    f, err := os.Create(time.Now().Format("20060102") + ".csv")

    if err != nil {
        os.Exit(1)
    }

    w := csv.NewWriter(f)

    defer w.Flush()

    w.Write([]string{
        "Name",
        "Credit Card",
    })

    for _, r := range rows {
        if r.CreditCard == "" {
            continue
        }

        w.Write([]string{
            r.Name,
            r.CreditCard,
        })
    }
}

@swuecho
Copy link

swuecho commented Apr 27, 2015

json_xs <data.json -e '$_ = join "\n", "Name, Credit Card", map { $_->{creditcard} ?  "$_->{name}, $_->{creditcard}" : ()  } @$_'  -t string

this is also perl, using the json_xs provided by JSON::XS

@Magnap
Copy link

Magnap commented Apr 27, 2015

Here's my Haskell solution. By default it uses the supplied URL as the data source, but it can combine the data from any number of urls and/or files given as command line arguments.

{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE DeriveGeneric #-}

module Main (main) where

import qualified System.Environment as E
import qualified Network.HTTP.Conduit as N
import qualified Data.Aeson as A
import qualified Data.Text as T
import qualified Data.ByteString.Lazy as B
import qualified Data.Maybe as DM
import qualified Data.Csv as C
import Data.Csv ((.=))
import qualified Data.Vector as V
import GHC.Generics
import qualified Data.Time as DT
import qualified Data.List as DL
import qualified Network.URI as U
import qualified Control.Monad as M

data Person = Person
  { name :: T.Text
  , creditcard :: T.Text
  } deriving (Show, Generic)

instance A.FromJSON Person

instance C.ToNamedRecord Person where
  toNamedRecord (Person name creditcard) = C.namedRecord [
    "Name" .= name, "Credit Card" .= creditcard]

defaultURL = "https://gist.githubusercontent.com/jorin-vogel/7f19ce95a9a842956358/raw/e319340c2f6691f9cc8d8cc57ed532b5093e3619/data.json"

main = do
  args <- E.getArgs
  let (urls,files) = DL.partition U.isURI (if null args then [defaultURL] else args)
  M.when (null args) (putStrLn "No arguments provided. Downloading from default URL. Any number of files and/or URLs can be provided as arguments.")
  M.unless (null files) (putStrLn "Reading the following files:" >> mapM_ putStrLn files)
  M.unless (null urls) (putStrLn "Downloading from the following URLs:" >> mapM_ putStrLn urls)
  contents <- M.liftM2 (++) (mapM N.simpleHttp urls) (mapM B.readFile files)
  let lines = concatMap (map B.init . B.split 10) contents
      parse = DM.mapMaybe A.decode lines :: [Person]
      csv = C.encodeByName (V.fromList ["Name", "Credit Card"]) parse
  now <- DT.getCurrentTime
  let fileName = DT.formatTime DT.defaultTimeLocale "%Y%m%d.csv" now
  putStrLn $ "Writing to " ++ fileName
  B.writeFile fileName csv

@cgp
Copy link

cgp commented Apr 27, 2015

No love for Java? /s

import java.io.IOException;
import java.io.PrintWriter;
import java.net.MalformedURLException;
import java.net.URL;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Scanner;


public class JavaChallenge {
    public static void main(String[] args) throws MalformedURLException, IOException {
        String url = "https://gist.githubusercontent.com/jorin-vogel/7f19ce95a9a842956358/raw/e319340c2f6691f9cc8d8cc57ed532b5093e3619/data.json";
        String in = new Scanner(new URL(url).openStream(), "UTF-8").useDelimiter("\\A").next();
        String[] entries = in.split("\n");
        StringBuilder sb = new StringBuilder("");
        for(String entry:entries) {
            String[] fields = entry.split("\"");
            if (fields.length < 10) continue;           
            if (!"creditcard".equals(fields[fields.length-2])) {
                sb.append("\""+fields[3]+","+fields[fields.length-2]+"\"\n");
            }           
        }
        LocalDate ld = LocalDate.now();     
        PrintWriter out = new PrintWriter(ld.format(DateTimeFormatter.ofPattern("YYYYMMdd"))+".csv");
        out.println(sb.toString());
        out.close();         
    }
}

@ebastos
Copy link

ebastos commented Apr 27, 2015

Only with standard Unix tools:

file=$(date +"%Y%m%d").csv; echo name,creditcard > $file; curl https://gist.githubusercontent.com/jorin-vogel/7f19ce95a9a842956358/raw/e319340c2f6691f9cc8d8cc57ed532b5093e3619/data.json|sed -r 's/(.*name":)(.*)(,"email.*creditcard":")(.*)}/\2,\4/g' |egrep -v "null|\[|\]"|cut -d "," -f1,2|tr -d '"' >> $file

@rupa
Copy link

rupa commented Apr 29, 2015

nice thing. quick and dirty as I did not have much time to act!

curl https://gist.githubusercontent.com/jorin-vogel/7f19ce95a9a842956358/raw/e319340c2f6691f9cc8d8cc57ed532b5093e3619/data.json | grep -v '"creditcard":null' | sed -e 's/.*name":"//' -e 's/".*creditcard":"/,/' -e 's/".*//' -e 's/^\[$/name,creditcard/' | grep -v '^]$' >  $(date +%Y%m%d).csv

@singareddyb
Copy link

Using a combination of bash\perl --

wget https://gist.githubusercontent.com/jorin-vogel/7f19ce95a9a842956358/raw/e319340c2f6691f9cc8d8cc57ed532b5093e3619/data.json; filename=`date +%Y%m%d`.csv; echo "Name , CreditCard" > $filename; perl -w -n -e 'print "$1 , $2\n" if(m/^{"name":"(.*)","email.*"creditcard":"(.*)"}[,]?$/)' data.json >> $filename

Then, I have a pure Perl approach --

#!/usr/bin/perl

use strict;
use warnings;

use IO::File;
use LWP::Simple;

my $file = 'data.json';
my $uri = 'https://gist.githubusercontent.com/jorin-vogel/7f19ce95a9a842956358/raw/e319340c2f6691f9cc8d8cc57ed532b5093e3619/data.json';

getstore($uri,$file);

my $rf = IO::File->new('data.json','r');
my $filename = calculateDateBasedFile();
my $wf = IO::File->new("$filename",'w');

$wf->print("Name , Credit\n");

while(defined($_ = $rf->getline)){

if(m/^{"name":"(.*)","email.*"creditcard":"(.*)"}[,]?$/)
  {
      $wf->print("$1 , $2\n");
  }

}

sub calculateDateBasedFile{

 my @time=localtime(time);
 my $year = 1900 + $time[5];
 my $month = $time[4] + 1;

  $month=date_quirk($month);

 my $day = $time[3];
  $day = date_quirk($day);


return $year.$month.$day.".csv";

}
sub date_quirk{

  my $m_or_d = shift;

  if($m_or_d < 10){

      $m_or_d = "0".$m_or_d;

  }
  return $m_or_d;
}

$wf->close;
$rf->close;

@xtradev
Copy link

xtradev commented May 15, 2015

\B=name,creditcard\n
\L{"name"\:"<U>"*"creditcard"\:"<U>"=$1,$3\n

gema -match -f json2csv.gema data.json > 20150515.csv

@raine
Copy link

raine commented May 16, 2015

Solution using ramda-cli:

#!/usr/bin/env bash

data_url=https://gist.githubusercontent.com/jorin-vogel/7f19ce95a9a842956358/raw/e319340c2f6691f9cc8d8cc57ed532b5093e3619/data.json
file=`date "+%Y%m%d"`.csv
curl $data_url | R 'filter where creditcard: (!= null)' 'project [\name \creditcard]' -o csv > $file

@snahor
Copy link

snahor commented May 24, 2015

The lovely awk:

curl -s https://gist.githubusercontent.com/jorin-vogel/7f19ce95a9a842956358/raw/e319340c2f6691f9cc8d8cc57ed532b5093e3619/data.json | awk -F '("[,:]"|"})' '{if ($12!="") print $2","$12}' > (date +%Y%m%d).csv

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment