@cgravier
Created January 27, 2014 22:11
Generates BSBM (http://wifo5-03.informatik.uni-mannheim.de/bizer/berlinsparqlbenchmark/) datasets of 100k, 200k, 500k, 1M, 5M, 10M, 25M, 50M, and 100M triples in N-Triples format.
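#!/bin/bash
# Product counts (-pc) that yield roughly 100k to 100M triples.
datasetssize=( 256 527 1369 2808 14212 28453 71431 143700 288114 )
for dim in "${datasetssize[@]}"
do
  echo "Generating dataset for $dim products..."
  java -cp .:lib/bsbm.jar:lib/jdom.jar:lib/log4j-1.2.12.jar:lib/ssj.jar -Xmx256M benchmark.generator.Generator -pc "$dim" -s nt -fn datasettmp
  # The generated file has one triple per line, so the line count is the triple count.
  # tr strips the padding that some wc implementations add around the number.
  NB=$(wc -l < datasettmp.nt | tr -d ' ')
  mv datasettmp.nt "dataset_${NB}.nt"
  echo "done."
done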
@cgravier

For the settings in this gist, I lazily rename the generated files using:

mv dataset_99914.nt dataset_100k.nt
mv dataset_200007.nt dataset_200k.nt
mv dataset_500037.nt dataset_500k.nt
mv dataset_1000000.nt dataset_1M.nt
mv dataset_5000000.nt dataset_5M.nt
mv dataset_10000159.nt dataset_10M.nt
mv dataset_25000172.nt dataset_25M.nt
mv dataset_50000144.nt dataset_50M.nt
mv dataset_99999805.nt dataset_100M.nt
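
The renames above could also be derived from the triple count embedded in each file name instead of being typed by hand. The following is a minimal sketch, not part of the original gist: `size_label` is a hypothetical helper that rounds a count to the nearest thousand, or to the nearest million once it reaches that range, and the loop assumes the files follow the `dataset_<count>.nt` naming produced by the script.

```shell
#!/bin/bash
# Hypothetical helper (not in the original gist): turn a triple count
# into a short label such as 100k or 5M.
size_label() {
  local n=$1
  local k=$(( (n + 500) / 1000 ))        # round to the nearest thousand
  if (( k >= 1000 )); then
    echo "$(( (k + 500) / 1000 ))M"      # round to the nearest million
  else
    echo "${k}k"
  fi
}

for f in dataset_*.nt; do
  n=${f#dataset_}   # strip the prefix
  n=${n%.nt}        # strip the extension
  # Skip an unmatched glob and files that already carry a label.
  [[ -e $f && $n =~ ^[0-9]+$ ]] || continue
  mv -- "$f" "dataset_$(size_label "$n").nt"
done
```

Because the label is rounded, this only suits counts that sit close to a round size, as the ones listed here do; two files whose counts round to the same label would overwrite each other.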
