@rdm
Created January 30, 2014 22:07
Draft: extract data from wavxml files (or similar xml files)
filename=: 'name'&extract_fromxmlfile_
directory=: 'hash'&extract_fromxmlfile_
sosgatetime=: ('utterance';'startofspeechgate';'time')&extract_fromxmlfile_
sosgatevalue=: ('utterance';'startofspeechgate';'value')&extract_fromxmlfile_
sostime=: ('utterance';'startofspeech';'time')&extract_fromxmlfile_
sosvalue=: ('utterance';'startofspeech';'value')&extract_fromxmlfile_
eostime=: ('utterance';'endofspeech';'time')&extract_fromxmlfile_
eosvalue=: ('utterance';'endofspeech';'value')&extract_fromxmlfile_
finalresult=: ('utterance';'finalresult';'value')&extract_fromxmlfile_
confidence=: ('utterance';'finalresult';'confidence')&extract_fromxmlfile_
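require 'xml/sax'
saxclass 'fromxmlfile'
NB. Path: boxed list of the currently open element names
NB. Result: text collected from the element whose path matches Target
startDocument=: 3 :0
Path=: ''
Result=: ''
)
startElement=: 4 :0
NB. y: element name; push it onto the current path
Path=: Path,<y
)
characters=: 3 :0
NB. keep text only while inside the targeted element
if. Target -: Path do.
Result=: Result, y
end.
)
endElement=: 3 :0
NB. pop the element that just closed
Path=: }: Path
)
endDocument=: 3 :0
NB. return the collected text
Result
)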
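extract=: 4 :0
NB. x: 'name', 'hash', or a boxed element path; y: file name
NB. jpathsep normalizes separators to '/' so the split works on any host
select. Target=: x
case. 'name' do. _1 {:: <;._1 '/',jpathsep y
case. 'hash' do. _2 {:: <;._1 '/',jpathsep y
case. do. process fread y
end.
)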

rdm commented Jan 30, 2014

To generate a csv file whose first row contains the column headers:

extractcore=:filename;directory;sosgatetime;sosgatevalue;sostime;sosvalue;eostime;eosvalue;finalresult;confidence

coreheaders=:'Filename';'Directory';'SOSGateTime';'SOSGateValue';'SOSTime';'SOSValue';'EOSTime';'EOSValue';'FinalResult';'Confidence'

require'csv'
generateCoreCsvFile=: 4 :0
NB. x: csv file name
NB. y: boxed list of file names to extract content from
x writecsv~ coreheaders,> extractcore each y
)
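
A usage sketch, assuming the definitions above are loaded. The input paths and the output name `core.csv` are hypothetical placeholders, not taken from the gist:

```j
NB. hypothetical input files; each sits in a folder whose name is taken as the "hash"
files=: 'c:\recordings\3f2a9c\utt001.xml';'c:\recordings\3f2a9c\utt002.xml'

NB. a single extractor takes one file name and returns that field as text
finalresult 0 {:: files

NB. left argument: csv file to write; right argument: boxed list of input files
'core.csv' generateCoreCsvFile files
```

Each input file should contribute one row below the header row. The Filename and Directory columns come from the last and second-to-last components of the path, so they do not depend on the XML content.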
