tomsaleeba/README.md

## README.md

      
## ttlcat

When you have a series of *.ttl files in a directory and you want to concatenate them, you need to strip the @prefix lines out of every file and prepend them, de-duplicated, once at the top of the output.
Use the following commands:
# run *in* the directory with the TTL files
head -n 50 -q *.ttl | grep '^@prefix' | sort -u > header
time cat *.ttl | grep -v '^@prefix' | cat header - | gzip > $(basename $(pwd)).ttl.gz
rm header
echo "output is $(basename $(pwd)).ttl.gz"
Or, if you want a one-liner (formatted over multiple lines) that can pipe to S3, use:
head -n 50 -q *.ttl | grep '^@prefix' | sort -u > header && \
time cat *.ttl | grep -v '^@prefix' | cat header - | gzip | aws s3 cp - s3://<bucket>/$(basename $(pwd)).ttl.gz; \
rm -f header

Be sure to replace the <bucket> placeholder with your S3 bucket name.
## Limitations

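- Doesn't support spaces in the directory name, because the $(basename $(pwd)) substitutions are unquoted and the shell splits the result on whitespace.
- Doesn't support @prefix lines beyond the first 50 lines of a file. head only reads the first 50 lines of each file, so a later @prefix is stripped from the body but never makes it into the header. Bump up the -n arg to head if you need more.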
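
If either limitation is a problem, one possible workaround is sketched below. It is not part of the original commands: the function name `ttlcat_sketch` is hypothetical, and it assumes you are happy to read every file twice (once to collect the @prefix lines, once for the body) instead of relying on `head`. It quotes the directory path so spaces are preserved, and it writes to stdout so you can pipe the result to `gzip` or `aws s3 cp -` as before.

```shell
# ttlcat_sketch DIR: print the merged Turtle document for DIR to stdout.
# Hypothetical helper, not part of the original commands.
ttlcat_sketch() {
  dir=${1:-.}   # default to the current directory
  # First pass: collect every @prefix line from every file, de-duplicated.
  # Scanning whole files means there is no 50-line cut-off.
  cat "$dir"/*.ttl | grep '^@prefix' | sort -u
  # Second pass: emit everything that is not a @prefix line, in file order.
  cat "$dir"/*.ttl | grep -v '^@prefix'
}
```

For example, `ttlcat_sketch . | gzip > "$(basename "$(pwd)").ttl.gz"` writes the same file name as the commands above, with the quoting that lets it cope with spaces in the directory name.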