@caot
Last active August 29, 2015 14:04
Tcl script to parse and clean a data file. The language seems to be one of the worst.
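The script below never names its input format. However, the segment tags it tests, `PID` and `DG1`, are HL7 v2 segment names, so the data is presumably HL7 v2: segments separated by carriage returns, fields by `|`, repetitions by `~`, components by `^`. Under that assumption, this sketch (it needs Tcl 8.6+ for `file tempfile`, and the two-segment sample is invented) shows a detail that affects the script's double split. With Tcl's default input translation, `auto`, `read` turns every bare carriage return into `\n`, so splitting on `\n` alone already separates the segments.

```tcl
# Assumes Tcl 8.6+ (for [file tempfile]). The sample content is made up.
set chan [file tempfile path]
fconfigure $chan -translation binary   ;# write the \r bytes exactly as given
puts -nonewline $chan "PID|1||12345\rDG1|1||I10\r"
close $chan

set chan [open $path r]                ;# default input translation: auto
set text [read $chan]
close $chan
file delete $path

# "auto" has already turned each bare \r into \n, so one split on \n suffices.
set summary {}
foreach segment [split [string trimright $text "\n"] "\n"] {
    set fields [split $segment "|"]
    # Field 0 is the segment tag; the script addresses the rest by index.
    lappend summary [list [lindex $fields 0] [llength $fields]]
}
puts $summary   ;# {PID 4} {DG1 4}
```

This is why the inner `split` on `\x0d` in `clean` finds nothing to split unless the input channel is reconfigured, for example with `fconfigure $infile -translation lf`.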
#!/usr/bin/env tclsh
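proc clean {infilename outfilename} {
    set delimiter "|"            ;# field separator
    set char_to_be_removed "^"   ;# trailing character stripped from fields
    set cr "\x0d"                ;# carriage return (segment separator)
    set infile [open $infilename r]
    set outfile [open $outfilename w]
    # With the default "-translation auto", read already converts \r and
    # \r\n to \n, so the split on "\n" does the real work; the inner split
    # on $cr only matters if the input channel is reconfigured.
    set file_data [read $infile]
    close $infile
    foreach block [split $file_data "\n"] {
        foreach ln [split $block $cr] {
            # Drop trailing empty fields (trailing "|"), then split into fields.
            set ln [string trimright $ln $delimiter]
            set ln_out {}
            set i 0
            foreach x [split $ln $delimiter] {
                # PID field 13: strip trailing "^" from each "~" repetition.
                if {[string match "PID*" $ln] && $i == 13} {
                    set x_out {}
                    foreach y [split $x "~"] {
                        lappend x_out [string trimright $y $char_to_be_removed]
                    }
                    set x [join $x_out "~"]
                }
                # DG1 field 9: replace a literal "" (two double quotes) with an empty field.
                if {[string match "DG1*" $ln] && $i == 9 && $x eq {""}} {
                    set x ""
                }
                # Every field: strip trailing "^" characters (no-op if there are none).
                set x [string trimright $x $char_to_be_removed]
                lappend ln_out $x
                incr i
            }
            puts $outfile [join $ln_out $delimiter]
        }
    }
    close $outfile
}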
# Run the command-line interface only when this file is executed directly
# (as dataclean.tcl), not when it is sourced by another script.
if {[string match "*dataclean.tcl" $argv0]} {
    if {$argc == 1 || $argc == 2} {
        set infilename [lindex $argv 0]
        if {$argc == 1} {
            set outfilename "$infilename.cleaned"
        } else {
            set outfilename [lindex $argv 1]
        }
        clean $infilename $outfilename
        puts "Done with output file: $outfilename"
    } else {
        puts stderr "
How to use:
  ./dataclean.tcl input_filename
      (output is written to input_filename.cleaned)
  ./dataclean.tcl input_filename output_filename"
        exit 1
    }
}
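The per-field rules inside `clean` are tangled with file I/O, which makes them awkward to check on their own. As a sketch only, the hypothetical `clean_segment` below applies the same three rules to a single segment string and returns the result; its name, signature, and default arguments are assumptions, not part of the original script. Note that `string trimright $s "^"` removes every trailing `^`, not just one.

```tcl
# Hypothetical helper (not in the original script): applies the same rules as
# `clean` to one segment string, so they can be checked without any files.
proc clean_segment {segment {delimiter "|"} {trim_char "^"}} {
    # Drop trailing empty fields, then split into fields.
    set fields [split [string trimright $segment $delimiter] $delimiter]
    set out {}
    set i 0
    foreach field $fields {
        # PID field 13: trim trailing trim_char from each "~" repetition.
        if {[string match "PID*" $segment] && $i == 13} {
            set reps {}
            foreach rep [split $field "~"] {
                lappend reps [string trimright $rep $trim_char]
            }
            set field [join $reps "~"]
        }
        # DG1 field 9: a literal "" becomes an empty field.
        if {[string match "DG1*" $segment] && $i == 9 && $field eq {""}} {
            set field ""
        }
        # Every field: drop all trailing trim_char characters.
        lappend out [string trimright $field $trim_char]
        incr i
    }
    return [join $out $delimiter]
}
```

For example, `clean_segment "PID|1||12345^^|||"` returns `PID|1||12345`. With a helper like this, `clean` could call it once per segment and keep only the reading and writing itself.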