Work with DataHub GraphQL API

Prerequisites

  • Python (tested here with 3.12.2)
  • Docker
  • make
  • curl
  • jq

Usage

  1. Set up DataHub

    make init
  2. Create an access token for making API requests.

    1. Go to http://localhost:9002/settings/tokens and log in (username/password are datahub/datahub)
    2. Create a token, then in your terminal, export it as the DATAHUB_API_KEY environment variable.
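
      For example (the token value is a placeholder; paste your own):

        export DATAHUB_API_KEY='<your-token>'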
  3. Create sample data in DataHub

    datahub docker ingest-sample-data --token "$DATAHUB_API_KEY"
  4. Ready to experiment with DataHub!

    To learn about the DataHub GraphQL API, take advantage of its GraphiQL interface at http://localhost:9002/api/graphiql (an example query is shown after this list).

  5. You can add the Account Balance glossary term to all fields in the SampleKafkaDataset dataset.

    1. First, check the fields of SampleKafkaDataset before making changes at http://localhost:9002/dataset/urn:li:dataset:(urn:li:dataPlatform:kafka,SampleKafkaDataset,PROD)/Schema
    2. Run bash add-glossary-terms.sh (the script is included below)
    3. Check the fields again to confirm the change.
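
As a first experiment in GraphiQL, you can paste a search query like the one below. This is a minimal sketch based on DataHub's public GraphQL schema (search, SearchInput, searchResults); if your DataHub version rejects a field, use GraphiQL's schema explorer to adjust it.

    query sampleSearch {
      search(input: { type: DATASET, query: "*", start: 0, count: 5 }) {
        total
        searchResults {
          entity {
            urn
            type
          }
        }
      }
    }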

add-glossary-terms.sh

#!/bin/bash
set -euo pipefail

# Wrap a (possibly multi-line) GraphQL document in a JSON string:
# escape double quotes, flatten newlines, and collapse runs of spaces.
function to_json_string() {
  echo "\"$(echo "$1" | sed 's/"/\\"/g' | tr '\n' ' ' | sed 's/  */ /g')\""
}

# POST a GraphQL query/mutation to the DataHub GMS endpoint,
# authenticating with the DATAHUB_API_KEY token.
function send_query() {
  curl -L -X POST http://localhost:8080/api/graphql \
    --silent \
    -H "Authorization: Bearer $DATAHUB_API_KEY" \
    -H 'Content-Type: application/json' \
    -d "$(cat << EOF
{
  "query": $(to_json_string "$1"),
  "variables": {}
}
EOF
    )"
}

# Build a mutation that attaches the AccountBalance glossary term
# to one schema field (passed as $1) of SampleKafkaDataset.
function build_q_add_glossaryterm() {
  echo "$(cat << EOF
mutation addTerms {
  addTerms(input: {
    termUrns: ["urn:li:glossaryTerm:AccountBalance"],
    resourceUrn: "urn:li:dataset:(urn:li:dataPlatform:kafka,SampleKafkaDataset,PROD)",
    subResourceType: DATASET_FIELD,
    subResource: "$1"
  })
}
EOF
  )"
}

# Query that lists the field paths of SampleKafkaDataset's schema.
q_field_paths='{
  dataset(urn: "urn:li:dataset:(urn:li:dataPlatform:kafka,SampleKafkaDataset,PROD)") {
    urn
    schemaMetadata {
      fields {
        schemaFieldEntity {
          fieldPath
        }
      }
    }
  }
}'

# Fetch the field paths, then add the glossary term to each field.
field_paths="$(send_query "$q_field_paths" | jq -r '.data.dataset.schemaMetadata.fields[].schemaFieldEntity.fieldPath')"
for field_path in $field_paths ; do
  send_query "$(build_q_add_glossaryterm "$field_path")"
done
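
Each send_query call prints the raw GraphQL response. If you want the script to fail fast when a mutation is rejected, one option (a sketch; it assumes addTerms returns a Boolean, which is what DataHub's GraphQL schema declares) is to change the loop body to:

    send_query "$(build_q_add_glossaryterm "$field_path")" | jq -e '.data.addTerms' > /dev/null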
docker-compose.override.yaml

# Need to turn on `METADATA_SERVICE_AUTH_ENABLED` so that we can access
# DataHub with API keys.
# c.f. https://datahubproject.io/docs/authentication/introducing-metadata-service-authentication#configuring-metadata-service-authentication
#
# To check how docker-compose merges its settings files, read:
# https://docs.docker.com/compose/multiple-compose-files/merge/#merging-rules
services:
  datahub-frontend-react:
    environment:
      - METADATA_SERVICE_AUTH_ENABLED=true
  datahub-gms:
    environment:
      - METADATA_SERVICE_AUTH_ENABLED=true
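
To inspect the merged configuration before starting anything, you can ask Compose to print it (assuming the Docker Compose plugin is installed, and using the same file pair the Makefile passes to the quickstart):

    docker compose -f docker-compose.yaml -f docker-compose.override.yaml config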
Makefile

.PHONY: init use-virtual-env

init:
	pip install 'acryl-datahub[datahub-rest]'
	curl -L -o docker-compose.yaml https://raw.githubusercontent.com/datahub-project/datahub/master/docker/quickstart/docker-compose-without-neo4j-m1.quickstart.yml
	datahub docker quickstart -f docker-compose.yaml -f docker-compose.override.yaml

# Each recipe line runs in its own shell, so sourcing the activate script
# inside make cannot affect your terminal; activate it yourself instead.
use-virtual-env:
	python -m venv .venv
	@echo "Now run: source .venv/bin/activate"