When you have a series of *.ttl files in a directory and you want to cat them all together, you need to strip the @prefix lines out of every file
and prepend the de-duplicated set of them once, at the top of the output.
Use the following commands:
# run *in* the directory with the TTL files
head -n 50 -q *.ttl | grep '^@prefix' | sort -u > header
time cat *.ttl | grep -v '^@prefix' | cat header - | gzip > $(basename $(pwd)).ttl.gz
rm header
echo "output is $(basename $(pwd)).ttl.gz"
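A quick sanity check on the result can be sketched as below. The function name `check_prefixes` is made up for illustration; it assumes `gunzip` and `awk` are available and only verifies that every @prefix line is unique and appears before the first non-prefix line. It is not a Turtle validator.

```shell
# Hypothetical helper, not part of the commands above.
# Returns 0 if all @prefix lines in the gzipped file come first and are unique.
check_prefixes() {
  gunzip -c -- "$1" | awk '
    /^@prefix/ {
      if (body) bad = 1        # a prefix line after the data has started
      if (seen[$0]++) bad = 1  # the same prefix line twice
      next
    }
    { body = 1 }               # any other line counts as data
    END { exit bad + 0 }
  '
}
```

For example: `check_prefixes "$(basename "$(pwd)").ttl.gz" && echo "prefixes look fine"`.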
Or, if you want a one-liner (formatted over multiple lines) that can pipe to S3, use:
head -n 50 -q *.ttl | grep '^@prefix' | sort -u > header && \
time cat *.ttl | grep -v '^@prefix' | cat header - | gzip | aws s3 cp - s3://<bucket>/$(basename $(pwd)).ttl.gz; \
rm -f header
...just be sure to replace the <bucket> placeholder with your S3 bucket name.
- Doesn't support spaces in the directory name.
- Only picks up @prefix lines that appear within the first 50 lines of each file. Just bump up the -n arg to head if you need more.
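If either limitation gets in the way, a variant along these lines might help. It is only a sketch: the function name `concat_ttl` is made up, it scans each file in full with `grep -h` instead of `head -n 50`, and it quotes the directory name so spaces survive. Like the original commands, it assumes every @prefix declaration sits on its own line starting at column one and that each file ends with a newline.

```shell
# Hypothetical helper: concat_ttl DIR
# Writes the gzipped concatenation of DIR/*.ttl to stdout.
concat_ttl() {
  local dir="${1:-.}"
  (
    cd "$dir" || exit 1
    # Header: every distinct @prefix line from every file, no line cap.
    grep -h '^@prefix' ./*.ttl | sort -u
    # Body: all remaining lines, in glob order.
    cat ./*.ttl | grep -v '^@prefix'
  ) | gzip
}
```

Usage would look like `concat_ttl "/data/my ttl dir" > "my ttl dir.ttl.gz"`, or pipe the output to `aws s3 cp - s3://<bucket>/...` as in the one-liner above. Grouping the two steps in a subshell also avoids the temporary header file.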