@tphummel
Last active January 16, 2018 03:53
Find Streaks in Daily Data with Sparse/Missing Days

install libraries

brew update
brew install tphummel/util/date-range
brew install tphummel/util/streak
brew install jq
brew install csvkit

create input dataset

cat << EOF > sparse.csv
"category 4",2018-01-01,3
"category 4",2018-01-02,1
"category 4",2018-01-04,0
"category 4",2018-01-05,1
"category 4",2018-01-08,1
"category 4",2018-01-10,5
"category 4",2018-01-11,6
"category 4",2018-01-12,5
"category 4",2018-01-13,7
"category 4",2018-01-15,1
"category 4",2018-01-19,1
"category 4",2018-01-21,5
"category 4",2018-01-22,6
"category 4",2018-01-23,5
"category 4",2018-01-24,5
"category 4",2018-01-25,7
"category 4",2018-01-27,5
"category 4",2018-01-28,9
"category 4",2018-01-30,1
"category 4",2018-01-31,2
EOF

compile list of streaks

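# 1. date-range + jq: emit every calendar day in the window, one per line.
# 2. csvjoin --outer: join sparse.csv (column 2) against that list (column 1),
#    so days missing from sparse.csv show up as rows with an empty value.
#    -H makes csvkit name the columns a, b, c (and a2 for the joined date).
# 3. csvcut + csvsort: keep date, category, value and order the rows by date.
# 4. csvjson + jq: reshape into a JSON array of [date, category, value] rows.
# 5. streak: find runs of consecutive rows whose value (column 2) is at
#    least 5, labelled by date (column 0).
# 6. Sort by streak length, descending, rename the header, render a table.
date-range 2018-01-01 2018-02-15 | \
  jq -r '.[]' | \
  csvjoin -H -c "2,1" --outer ./sparse.csv - 2>/dev/null | \
  csvcut -c 4,1,3 | \
  csvsort -c 1 | \
  csvjson | \
  jq 'map([.a2, .a, .c])' | \
  streak --label 0 --column 2 --min 5 | \
  jq -r ".[] | [.start, .end, .value] | @csv" | \
  csvsort -H -c 3 -r 2>/dev/null | \
  sed '1 s/.*/start,end,streak/' | \
  csvlook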

result

|      start |        end | streak |
| ---------- | ---------- | ------ |
| 2018-01-21 | 2018-01-25 |      5 |
| 2018-01-10 | 2018-01-13 |      4 |
| 2018-01-27 | 2018-01-28 |      2 |
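
cross-check with plain awk

As a sanity check on the table above, here is a minimal sketch that computes the same kind of streaks using only sh and awk. It is not how the streak tool is implemented. The function name find_streaks is hypothetical, and the sketch assumes the input is already sorted by date, has the category,date,value layout of sparse.csv, and has no commas inside fields. Instead of padding missing days with an outer join, it treats any gap between consecutive calendar dates as the end of a streak. Unlike the table above, it would also report a streak of length 1.

```shell
# Hypothetical helper (not part of any tool used above).
# Usage: find_streaks MIN < sparse.csv | sort -t, -k3,3nr
# Prints start,end,length for each run of consecutive calendar days
# whose value is at least MIN.
find_streaks() {
  awk -F, -v min="$1" '
    # Map a date to a running day number; only differences matter.
    function daynum(y, m, d) {
      if (m <= 2) { y -= 1; m += 12 }
      return 365 * y + int(y / 4) - int(y / 100) + int(y / 400) + int((153 * (m - 3) + 2) / 5) + d
    }
    # Print the open streak, if any, and close it.
    function emit() {
      if (len > 0) print first "," last "," len
      len = 0
    }
    {
      split($2, p, "-")
      today = daynum(p[1] + 0, p[2] + 0, p[3] + 0)
      # A calendar gap or a value below the threshold ends the streak.
      if (len > 0 && (today != prev + 1 || $3 + 0 < min + 0)) emit()
      if ($3 + 0 >= min + 0) {
        if (len == 0) first = $2
        last = $2
        len++
      }
      prev = today
    }
    END { emit() }
  '
}
```

Run against sparse.csv with a minimum of 5 and sorted by the third field in descending order, this should list the same three streaks as the table. Treating date gaps as streak breakers avoids the join entirely, at the cost of requiring sorted input.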