package fixed2idiom {
  rule _MISC_EXTPOS {
    pattern { N[_MISC_EXTPOS] }
    commands {
      N._MISC_ExtPos = N._MISC_EXTPOS;
      del_feat N._MISC_EXTPOS;
      N._MISC_PhraseType = Idiom;
    }
  }
package fixed2idiom {
  rule EXTPOS {
    pattern { N[EXTPOS] }
    commands {
      N.ExtPos = N.EXTPOS;
      del_feat N.EXTPOS;
      N.PhraseType = Idiom;
    }
  }
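Both `fixed2idiom` rules perform the same rewrite, once on MISC features (the `_MISC_` prefix) and once on FEATS: when a node carries `EXTPOS`, the value is moved to `ExtPos` and the node is marked `PhraseType=Idiom`. As a minimal sketch outside Grew, assuming the column has been parsed into a dict and that every item has the form `key=value`, the rewrite could look like this (`parse_feats`, `format_feats` and `rewrite_extpos` are hypothetical helpers, not part of Grew):

```python
def parse_feats(column: str) -> dict:
    """Parse a CoNLL-U FEATS/MISC column ("A=1|B=2", or "_" when empty) into a dict."""
    if column == "_":
        return {}
    return dict(item.split("=", 1) for item in column.split("|"))


def format_feats(feats: dict) -> str:
    """Serialize back, sorted case-insensitively (the order UD requires for FEATS)."""
    if not feats:
        return "_"
    return "|".join(f"{k}={v}" for k, v in sorted(feats.items(), key=lambda kv: kv[0].lower()))


def rewrite_extpos(feats: dict) -> dict:
    """Mimic the Grew rule: EXTPOS -> ExtPos and add PhraseType=Idiom."""
    if "EXTPOS" not in feats:
        return feats  # the pattern N[EXTPOS] does not match: leave the node unchanged
    new = dict(feats)
    new["ExtPos"] = new.pop("EXTPOS")  # same effect as the assignment + del_feat
    new["PhraseType"] = "Idiom"
    return new


print(format_feats(rewrite_extpos(parse_feats("EXTPOS=ADP|SpaceAfter=No"))))
```

This prints `ExtPos=ADP|PhraseType=Idiom|SpaceAfter=No`. Nodes without `EXTPOS` are returned untouched, just as the rule does not fire on them.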
% GRS system to convert the French verbs "pouvoir", "devoir", "vouloir" and "aller" when they introduce a completive
% They were previously annotated as AUX, but are now plain VERB in all French corpora
% In all rules below, the "without" clause with the "AUX2" node ensures that the outermost AUX is considered first.
% This is needed when several transformations apply in the same sentence. Ex: `fr_partut-ud-851` "…la mère peut aller se reproduire…"
% Move negative adverbs onto the old AUX
rule neg {
  pattern {
    AUX[upos=AUX, lemma=pouvoir|devoir|vouloir|aller];
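The comments above explain that the `without` clause on `AUX2` makes the rules handle the outermost auxiliary first when several are stacked, as in "…la mère peut aller se reproduire…". The following Python sketch reproduces only that ordering, under an assumption the source does not state: "outermost" is taken to mean the leftmost matching AUX among the tokens attached to the same head. The `Token` type, the helpers, and the sentence annotation are all hypothetical, and the placeholder rewrite only changes UPOS.

```python
from typing import NamedTuple

# Lemmas targeted by the GRS, as listed in the rule comments.
MODAL_LEMMAS = {"pouvoir", "devoir", "vouloir", "aller"}


class Token(NamedTuple):
    id: int       # 1-based position in the sentence
    form: str
    lemma: str
    upos: str
    head: int     # 0 for the root
    deprel: str


def next_aux(tokens):
    """Return a matching AUX with no other matching AUX to its left under the
    same head (a stand-in for the `without { AUX2 ... }` clause), or None."""
    candidates = [t for t in tokens if t.upos == "AUX" and t.lemma in MODAL_LEMMAS]
    for t in candidates:
        if not any(o.head == t.head and o.id < t.id for o in candidates):
            return t
    return None


def processing_order(tokens):
    """Apply a placeholder rewrite until no candidate is left; return the order."""
    tokens = list(tokens)
    order = []
    while (aux := next_aux(tokens)) is not None:
        order.append(aux.form)
        # Placeholder for the real rules: only UPOS changes here, whereas the
        # GRS also restructures the dependency tree.
        tokens = [t._replace(upos="VERB") if t.id == aux.id else t for t in tokens]
    return order


# Hypothetical annotation of "la mère peut aller se reproduire" (not the corpus tree).
SENT = [
    Token(1, "la", "le", "DET", 2, "det"),
    Token(2, "mère", "mère", "NOUN", 6, "nsubj"),
    Token(3, "peut", "pouvoir", "AUX", 6, "aux"),
    Token(4, "aller", "aller", "AUX", 6, "aux"),
    Token(5, "se", "se", "PRON", 6, "expl"),
    Token(6, "reproduire", "reproduire", "VERB", 0, "root"),
]
print(processing_order(SENT))  # ['peut', 'aller']
```

Selecting one node at a time and re-running the search after each rewrite mirrors how a rewriting system applies a rule repeatedly until no match remains.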
rule det_N-Det {
  pattern {
    N[upos=NOUN];
    Det[upos=DET, PronType=DEM];
    %Det << N;
    e: N-[det]->Det;
    e.length = 2
  }
  commands {
    add_edge f: N->Det;
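The `det_N-Det` pattern matches a NOUN whose `det` dependent is a DET with `PronType=DEM` (the value exactly as written in the rule), with the edge constrained by `e.length = 2`; the commented-out `Det << N` would additionally require the determiner to precede the noun. A rough Python equivalent of the matching step over CoNLL-U token lines is sketched below. It assumes that `length` is the absolute distance between the two token positions, and `find_det_pairs` and the sample sentence are hypothetical. The `add_edge` command is not reproduced.

```python
def parse_feats(column):
    """Parse a FEATS column ("A=1|B=2", or "_") into a dict."""
    if column == "_":
        return {}
    return dict(item.split("=", 1) for item in column.split("|"))


def find_det_pairs(conllu_lines, pron_type="DEM"):
    """Yield (noun_id, det_id) for det edges NOUN -> DET[PronType=pron_type] of length 2."""
    rows = {}
    for line in conllu_lines:
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if not cols[0].isdigit():  # skip multiword ranges (1-2) and empty nodes (1.1)
            continue
        rows[int(cols[0])] = cols
    for det_id, cols in rows.items():
        head = int(cols[6])
        if cols[7] != "det" or cols[3] != "DET" or head not in rows:
            continue
        if rows[head][3] != "NOUN":
            continue
        if parse_feats(cols[5]).get("PronType") != pron_type:
            continue
        if abs(head - det_id) == 2:  # assumption: e.length = |position difference|
            yield head, det_id


# Made-up sentence, annotated with PronType=DEM to match the rule as written.
SAMPLE = [
    "1\tce\tce\tDET\t_\tPronType=DEM\t3\tdet\t_\t_",
    "2\tpetit\tpetit\tADJ\t_\t_\t3\tamod\t_\t_",
    "3\tchat\tchat\tNOUN\t_\t_\t4\tnsubj\t_\t_",
    "4\tdort\tdormir\tVERB\t_\t_\t0\troot\t_\t_",
]
print(list(find_det_pairs(SAMPLE)))  # [(3, 1)]
```

Note that standard UD spells the value `PronType=Dem`; passing `pron_type="Dem"` would target such data instead.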
rule r {
  pattern {
    N1 [];
    N2 [textform="_"];
    N1 < N2;
    N1 -> N2;
  }
  commands {
    N1.form = N1.textform;
    del_node N2;
# sent_id = en_partut-ud-1610
# text = they were unimpressed.
1	they	they	PRON	PE	Number=Plur|Person=3|PronType=Prs	3	nsubj	_	_
2	were	be	AUX	V	Mood=Ind|Number=Plur|Tense=Past|VerbForm=Fin	3	cop	_	_
3	unimpressed	unimpressed	ADJ	A	Degree=Pos	0	root	_	SpaceAfter=No
4	.	.	PUNCT	FS	_	3	punct	_	_
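Each token line above has the ten tab-separated CoNLL-U columns (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC), with `_` for empty values; note that the adjective *unimpressed* is the root and the copula *were* is its `cop` dependent. A minimal standard-library parser for this sentence might look like the sketch below; `parse_sentence` is a hypothetical helper that does not handle multiword-token ranges or empty nodes.

```python
COLUMNS = ["id", "form", "lemma", "upos", "xpos", "feats", "head", "deprel", "deps", "misc"]


def parse_sentence(block):
    """Split a CoNLL-U sentence into its `# key = value` metadata and token dicts."""
    meta, tokens = {}, []
    for line in block.strip().splitlines():
        if line.startswith("#"):
            key, sep, value = line[1:].partition("=")
            if sep:
                meta[key.strip()] = value.strip()
            continue
        token = dict(zip(COLUMNS, line.split("\t")))
        token["id"] = int(token["id"])      # would fail on ranges (1-2) and empty nodes (1.1)
        token["head"] = int(token["head"])
        tokens.append(token)
    return meta, tokens


SENTENCE = "\n".join([
    "# sent_id = en_partut-ud-1610",
    "# text = they were unimpressed.",
    "1\tthey\tthey\tPRON\tPE\tNumber=Plur|Person=3|PronType=Prs\t3\tnsubj\t_\t_",
    "2\twere\tbe\tAUX\tV\tMood=Ind|Number=Plur|Tense=Past|VerbForm=Fin\t3\tcop\t_\t_",
    "3\tunimpressed\tunimpressed\tADJ\tA\tDegree=Pos\t0\troot\t_\tSpaceAfter=No",
    "4\t.\t.\tPUNCT\tFS\t_\t3\tpunct\t_\t_",
])

meta, tokens = parse_sentence(SENTENCE)
root = next(t for t in tokens if t["head"] == 0)
print(meta["sent_id"], root["form"], root["upos"])  # en_partut-ud-1610 unimpressed ADJ
```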
{
  "META": [
    "sent_id",
    "text",
    "text_en",
    "text_ortho",
    "speaker_id",
    "sound_url"
  ],
  "UPOS": [
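The configuration lists sentence-level metadata keys under `"META"` and continues with a `"UPOS"` list that is cut off here. Assuming (the source does not say so) that `"META"` declares which `# key = value` comment lines a corpus may carry, a check like the following could flag undeclared keys. The JSON is reduced to its visible `"META"` part, and `undeclared_meta` is a hypothetical helper.

```python
import json

# Only the visible "META" part of the configuration; the rest is truncated in the source.
CONFIG = json.loads("""
{
  "META": ["sent_id", "text", "text_en", "text_ortho", "speaker_id", "sound_url"]
}
""")


def undeclared_meta(sentence_block, config=CONFIG):
    """Return metadata keys used in `# key = value` lines but absent from config["META"]."""
    allowed = set(config["META"])
    unknown = []
    for line in sentence_block.splitlines():
        if line.startswith("#"):
            key, sep, _ = line[1:].partition("=")
            if sep and key.strip() not in allowed:
                unknown.append(key.strip())
    return unknown


print(undeclared_meta("# sent_id = s1\n# text = bonjour\n# translit = bonjur"))  # ['translit']
```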
from bs4 import BeautifulSoup
import requests

# Fetch the UD validator page for specifying language-specific features
URL = 'https://quest.ms.mff.cuni.cz/udvalidator/cgi-bin/unidep/langspec/specify_feature.pl'
response = requests.get(URL)
response.raise_for_status()
soup = BeautifulSoup(response.text, features="lxml")
# Print one "<link text>\t<language code>" line per language link
for a in soup.find_all('a', href=True):
    if len(a.text) > 1:  # skip links to other similar pages
        lang_code = a['href'].split('=')[1]
        print(f'{a.text}\t{lang_code}')
import os
import sys
from xml.etree.ElementTree import iterparse

SENTENCE_BY_FILE = 1000
MAX_FILES = 1000

def us(s):
    # Replace an empty string by "_", the CoNLL-U placeholder for an empty field
    return s if s != "" else "_"

def to_conllu(sent_elem, fd=sys.stdout):
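The constants suggest that this script streams an XML corpus with `iterparse` and writes at most `MAX_FILES` output files of `SENTENCE_BY_FILE` sentences each. The body of `to_conllu` is not shown, so the sketch below is only a guess at the batching pattern. It assumes a hypothetical XML schema with `<s id="…">` sentences containing `<w lemma="…" pos="…">form</w>` words, and it leaves HEAD and DEPREL as `_`, so its output is not valid UD; `sent_to_conllu` and `batches` are hypothetical names.

```python
import io
import sys
from xml.etree.ElementTree import iterparse

SENTENCE_BY_FILE = 1000
MAX_FILES = 1000


def us(s):
    return s if s != "" else "_"


def sent_to_conllu(sent_elem):
    """Hypothetical schema: <s id="..."><w lemma="..." pos="...">form</w>...</s>."""
    lines = [f"# sent_id = {sent_elem.get('id', '')}"]
    for i, w in enumerate(sent_elem.iter("w"), start=1):
        cols = [str(i), us(w.text or ""), us(w.get("lemma", "")), us(w.get("pos", "")),
                "_", "_", "_", "_", "_", "_"]
        lines.append("\t".join(cols))
    return "\n".join(lines) + "\n\n"


def batches(source, size=SENTENCE_BY_FILE, max_files=MAX_FILES):
    """Yield lists of CoNLL-U sentences, `size` per batch, at most `max_files` batches."""
    batch, produced = [], 0
    for _, elem in iterparse(source, events=("end",)):
        if elem.tag != "s":
            continue
        batch.append(sent_to_conllu(elem))
        elem.clear()  # drop the sentence subtree so memory stays bounded while streaming
        if len(batch) == size:
            yield batch
            produced += 1
            batch = []
            if produced == max_files:
                return
    if batch:
        yield batch


XML = (b'<text><s id="s1"><w lemma="le" pos="DET">Le</w><w lemma="chat" pos="NOUN">chat</w></s>'
       b'<s id="s2"><w lemma="dormir" pos="VERB">dort</w></s></text>')
for n, batch in enumerate(batches(io.BytesIO(XML), size=1)):
    sys.stdout.write(f"== file {n} ==\n" + "".join(batch))
```

In a real script each batch would be written to its own file (for instance with a name built from the batch index); stopping after `max_files` batches keeps the number of output files bounded, as `MAX_FILES` suggests.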