Last active July 9, 2020 07:47
Setup Virtuoso running in a Docker container: Load data and option for RAM disk

A Docker container running Virtuoso

Lately I've been using Virtuoso to run SPARQL queries. Here is my quick setup. (This has also been posted on my personal page.)

Set up the Docker container for Virtuoso

    docker pull openlink/virtuoso-opensource-7:latest

    mkdir -p database
    cp  virtuoso.ini.example database/virtuoso.ini

    mkdir -p import

    docker run --name vos -d \
               -v `pwd`/database:/database \
               -v `pwd`/import:/import \
               -t -p 1111:1111 -p 8890:8890 -i openlink/virtuoso-opensource-7:latest
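
Once the container is started, Virtuoso may need a little time before it accepts requests. A minimal sketch for waiting until the SPARQL endpoint answers, assuming `curl` is installed on the host and the endpoint is reachable at http://localhost:8890/sparql through the port mapping above; `wait_for_endpoint` is a hypothetical helper, not part of Docker or Virtuoso:

```shell
#!/bin/sh
# wait_for_endpoint URL [ATTEMPTS] [DELAY_SECONDS]
# Hypothetical helper: polls URL until it answers or the attempts run out.
wait_for_endpoint() {
    url=$1
    attempts=${2:-30}
    delay=${3:-2}
    i=0
    while [ "$i" -lt "$attempts" ]; do
        # -s: silent, -f: treat HTTP errors as failures, -o: discard the body
        if curl -sf --max-time 5 -o /dev/null "$url"; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}
```

For example, `wait_for_endpoint http://localhost:8890/sparql && echo "Virtuoso is up"`. If it keeps failing, `docker logs vos` usually shows why the server did not start.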

The commands above require a custom virtuoso.ini file (provided here). My main edits stem from the need to query a large dataset and process large result sets. More information on the parameters can be found in the official documentation.

My edits below are for a machine with ~64GB of RAM, and may not be optimal in general, so YMMV.

  1. Allow access to the /import folder, where the files to be imported are placed

    DirsAllowed		= ., /opt/virtuoso-opensource/vad, /import
  2. Change the memory size thresholds: uncomment the pair of lines that matches your RAM, and comment out the two default lines below them (a leading ; marks a comment)

    NumberOfBuffers  = 4000000	
    MaxDirtyBuffers  = 3000000
    ;NumberOfBuffers = 10000
    ;MaxDirtyBuffers = 6000

    A few lines earlier, you may also want to change

    MaxQueryMem    = 4G		; memory allocated to query processor
    VectorSize     = 2000		; initial parallel query vector (array of query operations) size
    MaxVectorSize  = 20000000	; query vector size threshold.
  3. Use a longer keep-alive timeout for large queries

    KeepAliveTimeout	= 30
  4. Allow larger result sets

    ResultSetMaxRows            = 50000
    MaxQueryCostEstimationTime  = 0	; in seconds
    MaxQueryExecutionTime       = 600	; in seconds
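
The buffer values in item 2 follow the table in the virtuoso.ini comments, where the 2 to 32 GB rows work out to 85000 buffers per GB of free memory and MaxDirtyBuffers is roughly three quarters of NumberOfBuffers. A minimal sketch of that arithmetic, assuming the same ratios are acceptable for other sizes; `suggest_buffers` is a hypothetical helper, not something Virtuoso ships:

```shell
#!/bin/sh
# suggest_buffers FREE_GB
# Hypothetical helper: prints buffer settings for FREE_GB of free memory.
# Assumption: 85000 buffers per GB (the ratio of the 2-32 GB rows in the
# virtuoso.ini table) and MaxDirtyBuffers at 3/4 of NumberOfBuffers.
suggest_buffers() {
    free_gb=$1
    buffers=$((free_gb * 85000))
    dirty=$((buffers * 3 / 4))
    printf 'NumberOfBuffers = %d\nMaxDirtyBuffers = %d\n' "$buffers" "$dirty"
}

suggest_buffers 64
# NumberOfBuffers = 5440000
# MaxDirtyBuffers = 4080000
```

The result is close to, but not identical to, the 5450000 / 4000000 row in the file, so treat it as a starting point rather than a tuned value.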

To use a RAM disk (8 GB in this example)

Use this for read-only workloads where query speed matters: the database lives in RAM, so all edits are lost when the RAM disk is unmounted or the machine restarts.

    sudo mkdir -p /media/ramdisk1
    sudo mount -t tmpfs -o size=8192M tmpfs /media/ramdisk1

    docker run --name vos -d \
               -v /media/ramdisk1/database:/opt/virtuoso-opensource/database \
               -v `pwd`/import:/import \
               -t -p 1111:1111 -p 8890:8890 -i openlink/virtuoso-opensource-7:latest
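
A freshly mounted tmpfs is empty, so an existing database has to be copied onto it before the container is started. A minimal sketch, assuming the database was built earlier on regular disk in ./database and that the earlier container has been stopped (for example with `docker stop vos`) so the files are consistent; `stage_database` is a hypothetical helper and both paths are assumptions:

```shell
#!/bin/sh
# stage_database SRC DST
# Hypothetical helper: copies a database directory onto the RAM disk.
stage_database() {
    src=$1   # e.g. ./database, built earlier on regular disk
    dst=$2   # e.g. /media/ramdisk1/database
    # Refuse to copy a directory that does not look like a Virtuoso database dir
    [ -f "$src/virtuoso.ini" ] || { echo "no virtuoso.ini in $src" >&2; return 1; }
    mkdir -p "$dst" || return 1
    # -a keeps permissions and timestamps; "/." copies the contents, dotfiles included
    cp -a "$src/." "$dst/"
}
```

For example, `sudo sh -c '. ./stage.sh; stage_database ./database /media/ramdisk1/database'`, where stage.sh is whatever file you saved the function in. Check that the database fits in the size given to the tmpfs mount before copying.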

Run the CLI

    docker exec -it vos /opt/virtuoso-opensource/bin/isql

Create graphs

    SPARQL create GRAPH <>;

Import data

    delete from DB.DBA.load_list;
    ld_dir ('/import', 'my_file.ttl', '');
    rdf_loader_run ();
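
The same import can be scripted from the host instead of typed into the CLI. A minimal sketch that prints the isql commands, adds a `checkpoint;` so the loaded data is persisted, and then lists the per-file outcome from DB.DBA.load_list (as far as I know, ll_state 2 means the file was loaded and ll_error is NULL on success). `bulk_load_script` is a hypothetical helper, and the graph IRI in the usage line is a placeholder, not a value from this setup:

```shell
#!/bin/sh
# bulk_load_script DIR PATTERN GRAPH_IRI
# Hypothetical helper: prints an isql script that registers the files,
# runs the loader, persists the result and reports the load status.
bulk_load_script() {
    dir=$1
    pattern=$2
    graph=$3
    cat <<EOF
ld_dir ('$dir', '$pattern', '$graph');
rdf_loader_run ();
checkpoint;
select ll_file, ll_state, ll_error from DB.DBA.load_list;
EOF
}
```

For example, `bulk_load_script /import '*.ttl' http://example.org/my-graph | docker exec -i vos /opt/virtuoso-opensource/bin/isql`. Note the `-i` without `-t`, so that the script is read from standard input.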

Check existing graphs

    SPARQL SELECT DISTINCT ?g WHERE { GRAPH ?g {?s ?p ?t} };

; virtuoso.ini
; Configuration file for the OpenLink Virtuoso VDBMS Server
; To learn more about this product, or any other product in our
; portfolio, please check out our web site at:
; or contact us at:
; If you have any technical questions, please contact our support
; staff at:
; Database setup
[Database]
DatabaseFile = virtuoso.db
ErrorLogFile = virtuoso.log
LockFile = virtuoso.lck
TransactionFile = virtuoso.trx
xa_persistent_file = virtuoso.pxa
ErrorLogLevel = 7
FileExtend = 200
MaxCheckpointRemap = 2000
Striping = 0
TempStorage = TempDatabase
[TempDatabase]
DatabaseFile = virtuoso-temp.db
TransactionFile = virtuoso-temp.trx
MaxCheckpointRemap = 2000
Striping = 0
; Server parameters
[Parameters]
ServerPort = 1111
LiteMode = 0
DisableUnixSocket = 1
DisableTcpSocket = 0
;SSLServerPort = 2111
;SSLCertificate = cert.pem
;SSLPrivateKey = pk.pem
;X509ClientVerify = 0
;X509ClientVerifyDepth = 0
;X509ClientVerifyCAFile = ca.pem
MaxClientConnections = 10
CheckpointInterval = 60
CaseMode = 2
MaxStaticCursorRows = 5000
CheckpointAuditTrail = 0
AllowOSCalls = 0
SchedulerInterval = 10
DirsAllowed = ., ../vad, /usr/share/proj, /import
ThreadCleanupInterval = 0
ThreadThreshold = 10
ResourcesCleanupInterval = 0
FreeTextBatchSize = 100000
SingleCPU = 0
VADInstallDir = ../vad/
PrefixResultNames = 0
RdfFreeTextRulesSize = 100
IndexTreeMaps = 256
MaxMemPoolSize = 200000000
PrefixResultNames = 0
MacSpotlight = 0
IndexTreeMaps = 64
MaxQueryMem = 4G ; memory allocated to query processor
VectorSize = 2000 ; initial parallel query vector (array of query operations) size
MaxVectorSize = 20000000 ; query vector size threshold.
AdjustVectorSize = 0
ThreadsPerQuery = 4
AsyncQueueMaxThreads = 10
;; When running with large data sets, one should configure the Virtuoso
;; process to use between 2/3 to 3/5 of free system memory and to stripe
;; storage on all available disks.
;; Uncomment next two lines if there is 2 GB system memory free
;NumberOfBuffers = 170000
;MaxDirtyBuffers = 130000
;; Uncomment next two lines if there is 4 GB system memory free
;NumberOfBuffers = 340000
; MaxDirtyBuffers = 250000
;; Uncomment next two lines if there is 8 GB system memory free
;NumberOfBuffers = 680000
;MaxDirtyBuffers = 500000
;; Uncomment next two lines if there is 16 GB system memory free
;NumberOfBuffers = 1360000
;MaxDirtyBuffers = 1000000
;; Uncomment next two lines if there is 32 GB system memory free
;NumberOfBuffers = 2720000
;MaxDirtyBuffers = 2000000
;; Uncomment next two lines if there is 48 GB system memory free
;NumberOfBuffers = 4000000
;MaxDirtyBuffers = 3000000
;; Uncomment next two lines if there is 64 GB system memory free
NumberOfBuffers = 5450000
MaxDirtyBuffers = 4000000
;; Note the default settings will take very little memory
;; but will not result in very good performance
;NumberOfBuffers = 10000
;MaxDirtyBuffers = 6000
[HTTPServer]
ServerPort = 8890
ServerRoot = ../vsp
MaxClientConnections = 10
DavRoot = DAV
EnabledDavVSP = 0
HTTPProxyEnabled = 0
TempASPXDir = 0
DefaultMailServer = localhost:25
ServerThreads = 10
MaxKeepAlives = 30
KeepAliveTimeout = 30
MaxCachedProxyConnections = 10
ProxyConnectionCacheTimeout = 15
HTTPThreadSize = 280000
HttpPrintWarningsInOutput = 0
Charset = UTF-8
;HTTPLogFile = logs/http.log
MaintenancePage = atomic.html
EnabledGzipContent = 1
[AutoRepair]
BadParentLinks = 0
[VDB]
ArrayOptimization = 0
NumArrayParameters = 10
VDBDisconnectTimeout = 1000
KeepConnectionOnFixedThread = 0
[Replication]
ServerName = db-CENTOS5-PORT
ServerEnable = 1
QueueMax = 50000
; Striping setup
; These parameters have only effect when Striping is set to 1 in the
; [Database] section, in which case the DatabaseFile parameter is ignored.
; With striping, the database is spawned across multiple segments
; where each segment can have multiple stripes.
; Format of the lines below:
; Segment<number> = <size>, <stripe file name> [, <stripe file name> .. ]
; <number> must be ordered from 1 up.
; The <size> is the total size of the segment which is equally divided
; across all stripes forming the segment. Its specification can be in
; gigabytes (g), megabytes (m), kilobytes (k) or in database blocks
; (b, the default)
; Note that the segment size must be a multiple of the database page size
; which is currently 8k. Also, the segment size must be divisible by the
; number of stripe files forming the segment.
; The example below creates a 200 meg database striped on two segments
; with two stripes of 50 meg and one of 100 meg.
; You can always add more segments to the configuration, but once
; added, do not change the setup.
[Striping]
Segment1 = 100M, db-seg1-1.db, db-seg1-2.db
Segment2 = 100M, db-seg2-1.db
;[TempStriping]
;Segment1 = 100M, db-seg1-1.db, db-seg1-2.db
;Segment2 = 100M, db-seg2-1.db
;[Ucms]
;UcmPath = <path>
;Ucm1 = <file>
;Ucm2 = <file>
[Zero Config]
ServerName = virtuoso (CENTOS5-PORT)
;ServerDSN = ZDSN
;SSLServerName =
;SSLServerDSN =
[Mono]
;MONO_PATH = <path_here>
;MONO_ROOT = <path_here>
;MONO_CFG_DIR = <path_here>
;virtclr.dll =
[URIQA]
DynamicLocal = 0
DefaultHost = localhost:8890
[SPARQL]
;ExternalQuerySource = 1
;ExternalXsltSource = 1
;DefaultGraph = http://localhost:8890/dataspace
;ImmutableGraphs = http://localhost:8890/dataspace
ResultSetMaxRows = 100000
MaxQueryCostEstimationTime = 600 ; in seconds
MaxQueryExecutionTime = 260 ; in seconds
DefaultQuery = select distinct ?Concept where {[] a ?Concept} LIMIT 100
DeferInferenceRulesInit = 0 ; controls inference rules loading
;PingService =
[Plugins]
LoadPath = ../hosting
Load1 = plain, geos
Load2 = plain, proj4
Load3 = plain, shapefileio
