@DavidBerdik
Created April 20, 2020 00:12
Python script to generate a JGAAP-Compatible Corpus CSV from a CSV of authors and their text.
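# Example usage: python split.py yourInputCSVHere.csv
# Each input row is expected to be: author,text
import csv
import sys

# newline='' lets the csv module handle line breaks inside quoted fields.
with open(sys.argv[1], 'r', newline='') as incsv, \
        open('new-' + sys.argv[1], 'w') as outcsv:
    csvreader = csv.reader(incsv, delimiter=',', quotechar='"')
    # Number the generated documents starting at 1.
    for counter, row in enumerate(csvreader, start=1):
        filename = 'file' + str(counter) + '.txt'
        # Output row: author, file name, title ("fileN.txt by author").
        outcsv.write(row[0] + ',' + filename + ',' + filename + ' by ' + row[0] + '\n')
        # Write the text column to its own document file.
        with open(filename, 'w') as doc:
            doc.write(row[1])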
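
The same splitting logic can be sketched as a reusable function, which makes it easier to try on sample data. This is a minimal sketch, not part of the original script: `build_corpus`, its `out_dir` parameter, and the sample rows are hypothetical names introduced for illustration. Unlike the script above, it writes the corpus rows with `csv.writer`, so an author name containing a comma gets quoted. Whether JGAAP's loader accepts quoted fields is an assumption that has not been verified here.

```python
import csv
import os
import tempfile


def build_corpus(input_csv, out_dir):
    """Hypothetical helper: split an author,text CSV into one file per row.

    Returns the rows written to the corpus CSV as (author, filename, title).
    """
    rows_written = []
    corpus_path = os.path.join(out_dir, 'new-' + os.path.basename(input_csv))
    with open(input_csv, 'r', newline='') as incsv, \
            open(corpus_path, 'w', newline='') as outcsv:
        # Assumption: standard CSV quoting is acceptable in the corpus file.
        writer = csv.writer(outcsv)
        for counter, row in enumerate(csv.reader(incsv), start=1):
            author, text = row[0], row[1]
            filename = 'file' + str(counter) + '.txt'
            # Each text goes into its own document next to the corpus CSV.
            with open(os.path.join(out_dir, filename), 'w') as doc:
                doc.write(text)
            entry = (author, filename, filename + ' by ' + author)
            writer.writerow(entry)
            rows_written.append(entry)
    return rows_written


# Demo in a throwaway directory with made-up sample data.
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, 'sample.csv')
    with open(sample, 'w', newline='') as f:
        csv.writer(f).writerows([
            ['Alice', 'First text, with a comma.'],
            ['Bob', 'Second text.'],
        ])
    demo_rows = build_corpus(sample, tmp)
    demo_files = sorted(os.listdir(tmp))

print(demo_rows)
print(demo_files)
```

Taking `os.path.basename` of the input path before adding the `new-` prefix keeps the output name valid when the input is given with a directory component, a case where the plain `'new-' + sys.argv[1]` concatenation would produce a wrong path.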