@alrojo
Created March 2, 2018 11:56
# made by Alexander Rosenberg Johansen
# BSD-3 license
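#
# Extracts the plain-text sentences from the WMT newstest2014 German-English
# SGML test files: keep only the <seg ...> lines and strip every tag.
import subprocess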
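
# SGML inputs: English and German source files of newstest2014 (de-en)
test_paths_2014 = [
    'data/test-full/newstest2014-deen-src.en.sgm',
    'data/test-full/newstest2014-deen-src.de.sgm']
test_paths = test_paths_2014

# Plain-text outputs, one sentence per line, paired with test_paths by index
to_paths_2014 = [
    'data/test-full/newstest2014.deen.en',
    'data/test-full/newstest2014.deen.de']
to_paths = to_paths_2014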
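
for test_path, to_path in zip(test_paths, to_paths):
    # grep keeps only the <seg ...> lines; sed deletes every opening or closing
    # tag (\? and \+ are GNU extensions to basic regular expressions). The raw
    # string passes the backslashes through unchanged without Python warnings.
    call = r'grep "^<seg" %s | sed "s/<\/\?[^>]\+>//g" > %s' % (test_path, to_path)
    subprocess.call(call, shell=True)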
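The sed expression relies on `\?` and `\+`, which are not part of POSIX basic regular expressions, so non-GNU sed implementations may reject it. A minimal pure-Python sketch of the same grep/sed step is shown below as an alternative, not as part of the original script. It assumes the `.sgm` files are UTF-8 encoded and, like the grep pipeline, that each `<seg>` element sits on its own line; the names `TAG_RE`, `strip_segments`, and `convert` are hypothetical.

```python
import re
from typing import Iterable, Iterator

# Hypothetical equivalent of the sed pattern <\/\?[^>]\+>: "<", an optional
# "/", one or more characters other than ">", then ">".
TAG_RE = re.compile(r'</?[^>]+>')


def strip_segments(lines: Iterable[str]) -> Iterator[str]:
    """Yield the tag-free text of every line starting with '<seg'.

    Hypothetical helper mirroring the grep "^<seg" | sed pipeline.
    """
    for line in lines:
        if line.startswith('<seg'):  # same filter as grep "^<seg"
            yield TAG_RE.sub('', line.rstrip('\n')) + '\n'


def convert(test_path: str, to_path: str) -> None:
    """Write the extracted sentences of one .sgm file (assumed UTF-8)."""
    with open(test_path, encoding='utf-8') as src, \
            open(to_path, 'w', encoding='utf-8') as dst:
        dst.writelines(strip_segments(src))


if __name__ == '__main__':
    # Made-up sample lines, not taken from the real test set.
    sample = ['<p>\n', '<seg id="1">A first sentence.</seg>\n', '</p>\n']
    print(list(strip_segments(sample)))  # ['A first sentence.\n']
```

To use it in place of the shell loop, call `convert(test_path, to_path)` for each pair in `zip(test_paths, to_paths)`. Reading the file line by line keeps memory use flat, and avoiding `shell=True` removes the dependency on grep and sed being installed.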