@justin2004
Last active September 8, 2023 17:26
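Using SPARQL Anything with Parquet files

This approach relies on the `fx:command` property: instead of reading a file, SPARQL Anything runs an external command and parses its standard output, here declared as CSV via `fx:media-type`. The command uses pandas to convert the Parquet file to CSV (pandas needs `pyarrow` or `fastparquet` installed to read Parquet). First, check that the conversion works on its own: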

justin@X~/github/sparql.anything$ bash -c "python3 -c \"import pandas as pd ; pd.read_parquet(\\\"some.parquet\\\").to_csv(\\\"/dev/stdout\\\")\""
,name,age
0,bob,45
1,fred,3
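The leading comma in the header comes from pandas writing the DataFrame index as an unnamed first column (`to_csv` defaults to `index=True`; pass `index=False` if the row numbers are not wanted). The sketch below only parses the sample output shown above with Python's `csv` module, to make the empty column name visible; it does not read Parquet. That unnamed column is what later appears as the bare `http://sparql.xyz/facade-x/data/` predicate in the query results.

```python
import csv
import io

# CSV printed by the pandas one-liner above: the index is written as an unnamed column.
SAMPLE = ",name,age\n0,bob,45\n1,fred,3\n"


def read_rows(text: str) -> list[dict[str, str]]:
    """Parse CSV text with a header row into one dict per data row."""
    return list(csv.DictReader(io.StringIO(text)))


rows = read_rows(SAMPLE)
for row in rows:
    # The first key is the empty string, i.e. the unnamed index column.
    print(row)
```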

Then send the query to a running SPARQL Anything server with curl:

curl --silent  'http://localhost:3000/sparql.anything'  \
--header "Accept: text/csv" \
--data-urlencode 'query=
PREFIX  xyz:  <http://sparql.xyz/facade-x/data/>
PREFIX  fx:   <http://sparql.xyz/facade-x/ns/>
SELECT  *
WHERE
  { SERVICE <x-sparql-anything:>
      { fx:properties
                  fx:command     "python3 -c \"import pandas as pd ; pd.read_parquet(\\\"some.parquet\\\").to_csv(\\\"/dev/stdout\\\")\"" ;
                  fx:media-type  "text/csv" ;
                  fx:csv.headers "true" .
        ?s        ?p             ?o
      }
  }
'
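For scripting, the same request can be built with Python's standard library. This is a minimal sketch that builds (but does not send) a POST equivalent to the curl call above; it assumes, as `curl --data-urlencode` implies, that the server accepts a form-encoded `query` parameter. The endpoint URL is copied from the curl example, and the raw string keeps the backslash escapes inside `fx:command` exactly as written.

```python
import urllib.parse
import urllib.request

# Endpoint from the curl example above; change it to match your server.
ENDPOINT = "http://localhost:3000/sparql.anything"

# Raw string so the backslash escapes inside fx:command reach the server unchanged.
QUERY = r'''
PREFIX  xyz:  <http://sparql.xyz/facade-x/data/>
PREFIX  fx:   <http://sparql.xyz/facade-x/ns/>
SELECT  *
WHERE
  { SERVICE <x-sparql-anything:>
      { fx:properties
                  fx:command     "python3 -c \"import pandas as pd ; pd.read_parquet(\\\"some.parquet\\\").to_csv(\\\"/dev/stdout\\\")\"" ;
                  fx:media-type  "text/csv" ;
                  fx:csv.headers "true" .
        ?s        ?p             ?o
      }
  }
'''


def build_request(query: str, endpoint: str = ENDPOINT) -> urllib.request.Request:
    """Build a POST like `curl --data-urlencode 'query=...'` asking for CSV results."""
    body = urllib.parse.urlencode({"query": query}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={
            "Accept": "text/csv",
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )


request = build_request(QUERY)

# Sending it requires a running SPARQL Anything server, for example:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```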

The server responds with CSV results:

s,p,o
_:b0,http://www.w3.org/1999/02/22-rdf-syntax-ns#type,http://sparql.xyz/facade-x/ns/root
_:b0,http://www.w3.org/1999/02/22-rdf-syntax-ns#_1,_:b1
_:b0,http://www.w3.org/1999/02/22-rdf-syntax-ns#_2,_:b2
_:b2,http://sparql.xyz/facade-x/data/,1
_:b2,http://sparql.xyz/facade-x/data/name,fred
_:b2,http://sparql.xyz/facade-x/data/age,3
_:b1,http://sparql.xyz/facade-x/data/,0
_:b1,http://sparql.xyz/facade-x/data/name,bob
_:b1,http://sparql.xyz/facade-x/data/age,45
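The results follow the Facade-X shape for CSV with `fx:csv.headers "true"`: one root node typed `fx:root` links to each row through `rdf:_1`, `rdf:_2`, and so on, and each row node carries one `xyz:<column name>` triple per cell. The sketch below is not SPARQL Anything's implementation, only a small model that reproduces the same nine triples from the CSV sample; the blank-node labels are illustrative (the engine chooses its own), and values are kept as plain strings because the CSV results do not show datatypes.

```python
import csv
import io

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FX = "http://sparql.xyz/facade-x/ns/"
XYZ = "http://sparql.xyz/facade-x/data/"

# CSV produced by the pandas command (index written as an unnamed first column).
SAMPLE = ",name,age\n0,bob,45\n1,fred,3\n"


def facade_x_triples(text: str) -> list[tuple[str, str, str]]:
    """Model the Facade-X view of CSV read with a header row.

    The root container links to each row via rdf:_N; each row node gets
    one xyz:<header> triple per cell. Blank-node labels are illustrative.
    """
    triples = [("_:b0", RDF + "type", FX + "root")]
    for n, row in enumerate(csv.DictReader(io.StringIO(text)), start=1):
        row_node = f"_:b{n}"
        triples.append(("_:b0", f"{RDF}_{n}", row_node))
        for header, value in row.items():
            # An empty header (the pandas index) yields the bare xyz: namespace IRI.
            triples.append((row_node, XYZ + header, value))
    return triples


triples = facade_x_triples(SAMPLE)
for s, p, o in triples:
    print(s, p, o, sep=",")
```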
The query on its own:

# this assumes `some.parquet` is in the current working directory of the process running SPARQL Anything
PREFIX  xyz:  <http://sparql.xyz/facade-x/data/>
PREFIX  fx:   <http://sparql.xyz/facade-x/ns/>
SELECT  *
WHERE
  { SERVICE <x-sparql-anything:>
      { fx:properties
                  fx:command     "python3 -c \"import pandas as pd ; pd.read_parquet(\\\"some.parquet\\\").to_csv(\\\"/dev/stdout\\\")\"" ;
                  fx:media-type  "text/csv" ;
                  fx:csv.headers "true" .
        ?s        ?p             ?o
      }
  }
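The triple backslashes in `fx:command` come from two layers of quoting. After SPARQL parsing, the command string is `python3 -c "import pandas as pd ; pd.read_parquet(\"some.parquet\").to_csv(\"/dev/stdout\")"`; writing that inside a double-quoted SPARQL literal escapes each backslash as `\\` and each double quote as `\"`. A small helper applying the SPARQL string escapes can generate the literal instead of escaping by hand. This is a sketch covering only backslash, double quote, and line-break/tab escapes; the helper name `sparql_literal` is hypothetical.

```python
# SPARQL 1.1 escape sequences used in a double-quoted string literal.
_ESCAPES = {
    "\\": "\\\\",
    '"': '\\"',
    "\n": "\\n",
    "\r": "\\r",
    "\t": "\\t",
}


def sparql_literal(value: str) -> str:
    """Return value as a double-quoted SPARQL string literal (hypothetical helper)."""
    return '"' + "".join(_ESCAPES.get(ch, ch) for ch in value) + '"'


# The command as it should reach SPARQL Anything after SPARQL parsing.
COMMAND = r'python3 -c "import pandas as pd ; pd.read_parquet(\"some.parquet\").to_csv(\"/dev/stdout\")"'

# Prints the fx:command line exactly as it appears in the query above.
print(f"fx:command {sparql_literal(COMMAND)} ;")
```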
@justin2004 (author) commented:

NOTE:
this requires a SPARQL Anything release later than 0.8.2. Alternatively, run the Docker container once it has been updated with the fix.
