@winter-code
Last active September 17, 2021 15:45
Converting PDF and plain-text files to JSONL format
# Usage: python2 <script> gs://<path_to_src_file> gs://<dest_bucket>/

# Convert a single file to JSONL format
# File name: src.pdf | GCS path for file: gs://test-bucket-for-automl-nlp/src.pdf
python2 input_helper_v2.py gs://test-bucket-for-automl-nlp/src.pdf gs://test-bucket-for-automl-nlp/

# Convert multiple files with the same extension to JSONL format
# (wildcard paths are quoted so the local shell passes them to the script unexpanded)
# File extension: *.pdf | GCS path for files: gs://test-bucket-for-automl-nlp/*.pdf
python2 input_helper_v2.py 'gs://test-bucket-for-automl-nlp/*.pdf' gs://test-bucket-for-automl-nlp/
# File extension: *.txt | GCS path for files: gs://test-bucket-for-automl-nlp/*.txt
python2 input_helper_v2.py 'gs://test-bucket-for-automl-nlp/*.txt' gs://test-bucket-for-automl-nlp/
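
When several extensions need converting, the per-extension commands above can be generated instead of typed by hand. The following is a minimal dry-run sketch, not part of the original gist: the `BUCKET` variable and the `build_convert_cmd` function are hypothetical names introduced for illustration, and the script only prints each command so it can be reviewed before being run.

```shell
#!/usr/bin/env bash
# Dry-run sketch: print one input_helper_v2.py command per file extension.
# BUCKET and build_convert_cmd are illustrative names (assumptions), not part of the gist.

BUCKET="gs://test-bucket-for-automl-nlp"

# Print the conversion command for one extension (e.g. "pdf" or "txt").
# The wildcard path is wrapped in single quotes so a shell that later runs
# the printed command does not try to expand it locally.
build_convert_cmd() {
  local ext="$1"
  printf "python2 input_helper_v2.py '%s/*.%s' '%s/'\n" "$BUCKET" "$ext" "$BUCKET"
}

# Print the commands for every extension of interest.
for ext in pdf txt; do
  build_convert_cmd "$ext"
done
```

Review the printed lines first, then paste them into a terminal to perform the actual conversion.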