@arademaker
Last active February 25, 2021 02:18
simple data work
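# List the .raw files with fewer than 6 lines (skipping wc's "total" summary line).
pushd ../../raw
wc -l *.raw | awk '$1 < 6 && $2 != "total" {print $2}' > ../ner/wks/files.all
popd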
# Names already listed in documents.json, with the .txt extension mapped to .raw.
jq -r '.[].name' documents.json | awk '{sub(/\.txt$/, ".raw"); print}' > files.in
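;; Read a text file into a list of lines.
(defun read-lines (fn)
  (with-open-file (in fn)
    (loop for line = (read-line in nil nil)
          while line
          collect line)))

;; Draw 11 random names (with replacement, so duplicates are possible)
;; from the files in files.all that are not yet in files.in.
(let* ((al (read-lines "files.all"))
       (in (read-lines "files.in"))
       (d  (set-difference al in :test #'equal)))
  (loop with len = (length d)
        for x from 0 to 10
        collect (nth (random len) d)))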
# Copy each file named in $files to upload/, renaming .raw to .txt.
for f in $files; do cp "../../raw/$f" "upload/$(basename "$f" .raw).txt"; done
putStrLn $ show (1 + 1)