@aishfenton
Created June 19, 2013 20:50
Converts a TSV file with double-quoted fields into a format compatible with Hadoop Pig.
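#!/usr/bin/awk -f
# Note: "#!/usr/bin/env awk -f" fails on Linux, where everything after
# the interpreter path is passed to env as a single argument.
BEGIN {
    SEP = "\t"
    # Split on the quote-tab-quote sequence between fields, or on the
    # leading quote of the record. $1 is therefore always empty and the
    # data starts at $2.
    FS = "\"" SEP "\"|^\""
}
NR > 1 {    # skip the header row
    # Strip the closing quote (and an optional carriage return) from the
    # last column.
    sub(/"\r?$/, "", $NF)
    out = ""
    for (i = 2; i <= NF; i++) {
        # Replace tabs embedded in a field so they are not read as separators.
        gsub(/\t/, "_", $i)
        out = out (i > 2 ? SEP : "") $i
    }
    print out
}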
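A quick way to see what the conversion does is to pipe a small quoted TSV sample through it. The sketch below inlines a condensed variant of the script's logic in a shell function. The function name `convert` and the sample rows are made up for illustration and are not part of the gist.

```shell
#!/bin/sh
# Hypothetical helper: condensed variant of the gist's awk logic.
convert() {
    awk '
    BEGIN { SEP = "\t"; FS = "\"" SEP "\"|^\"" }
    NR > 1 {                            # drop the header row
        sub(/"\r?$/, "", $NF)           # strip closing quote (and CR)
        out = ""
        for (i = 2; i <= NF; i++) {     # $1 is the empty text before the first quote
            gsub(/\t/, "_", $i)         # embedded tabs become underscores
            out = out (i > 2 ? SEP : "") $i
        }
        print out
    }'
}

# Made-up sample: a header row, then one row whose second field contains a tab.
printf '"id"\t"name"\t"city"\n"1"\t"Ann\tLee"\t"Oslo"\n' | convert
# prints: 1<TAB>Ann_Lee<TAB>Oslo
```

The header row is dropped, the surrounding quotes are removed, and the tab inside "Ann	Lee" becomes an underscore so Pig's tab-delimited loader still sees three columns.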
@aishfenton

Also strips off the header row.
