Extract metadata (Reports/Data Sources) from Data Studio

Extract metadata from Data Studio

This is example code I used to extract metadata (REPORT/DATA_SOURCE) from Data Studio.

Warning: the information provided via the Data Studio API is very limited. If you are looking for more detail, such as how a REPORT is constructed or which backend a DATA_SOURCE connects to, unfortunately this will not help. That said, it is still better than nothing: you at least learn which assets exist within the organization.

The extracted information could be cataloged in another service such as Data Catalog, but that is beyond what this sample covers.

products in use

  • Data Studio
  • Service Accounts with G Suite domain-wide delegation

could potentially be expanded to also use (not within the scope of the examples here)

  • G Suite Audit logs
  • Data Catalog

files layout

.
├── assets2json.py
├── dsclient
│   ├── __init__.py
│   ├── auth.py
│   └── datastudio.py
└── key.json // your service account key file

usage

Run the script and redirect its output to a file, something like this:

python assets2json.py > data.json

output

The output is an array of objects, each holding the metadata of either a REPORT or a DATA_SOURCE.

[
{
   "name": "fcadaf00-6a5e-4c99-b56d-XXXXXXXXXXXX",
   "title": "gcp_billing_export_v1_XXXXXX_XXXXXX_XXXXXX",
   "assetType": "DATA_SOURCE",
   "updateTime": "2020-11-09T14:24:38Z",
   "updateByMeTime": "2020-11-09T14:24:38Z",
   "createTime": "2020-11-06T16:37:35Z",
   "lastViewByMeTime": "2020-11-09T14:24:38Z",
   "owner": "AAAAAAAA@imfeelinglucky.dev",
   "permissions": {
     "OWNER": {
       "members": [
         "user:AAAAAAAA@imfeelinglucky.dev"
       ]
     },
     "VIEWER": {
       "members": [
         "user:BBBBBBBB@imfeelinglucky.dev"
       ]
     }
   }
 },
 {
   "name": "7578a73a-4c39-4947-88b5-XXXXXXXXXXXX",
   "title": "Copy of [Sample] Google Analytics Marketing Website",
   "assetType": "REPORT",
   "updateTime": "2020-11-06T10:03:27Z",
   "updateByMeTime": "2020-11-06T09:36:55Z",
   "createTime": "2020-11-06T08:46:22Z",
   "lastViewByMeTime": "2020-11-06T10:03:39Z",
   "owner": "CCCCCCCC@imfeelinglucky.dev",
   "permissions": {
     "OWNER": {
       "members": [
         "user:CCCCCCCC@imfeelinglucky.dev"
       ]
     },
     "VIEWER": {
       "members": [
         "user:DDDDDDDD@imfeelinglucky.dev"
       ]
     }
   }
 }
......
]
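
As a rough illustration of how the output above could be consumed, the sketch below builds an index from each member to the assets they can access. The function name `assets_by_member` is made up for this example and is not part of the gist; it only assumes the JSON shape shown above, and it tolerates assets that have no `permissions` key.

```python
from collections import defaultdict


def assets_by_member(assets):
    """Map each member to a list of (role, assetType, title) tuples.

    `assets` is the parsed content of data.json, i.e. a list of dicts
    shaped like the sample output. Hypothetical helper, not part of the gist.
    """
    index = defaultdict(list)
    for asset in assets:
        # Assets whose permissions could not be fetched have no such key.
        for role, grant in asset.get("permissions", {}).items():
            for member in grant.get("members", []):
                index[member].append((role, asset["assetType"], asset["title"]))
    return dict(index)


# Possible usage once data.json exists:
#   import json
#   with open("data.json") as f:
#       print(assets_by_member(json.load(f)))
```

This makes it easy to answer questions such as "which reports can this user view?" without calling the API again.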

considerations

Data Studio API

  • The API can list assets, but doing so can be very inefficient. The number of requests is at least (number of users) * (number of reports each) * 2, where the extra call fetches the permissions on each report. This code will therefore be pretty slow in large orgs, as it has to fetch the list of reports per user. It would be better implemented as a parallel fetch.
  • The user list should be fetched from the G Suite directory, or expanded from Groups.
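
The parallel fetch mentioned above could look roughly like the sketch below. It is not what the gist does; it is a minimal version using a thread pool from the standard library. The name `list_assets_parallel` and the `fetch_assets_for_user` parameter are assumptions introduced for this example, so that the per-user work can be swapped in from outside.

```python
from concurrent.futures import ThreadPoolExecutor


def list_assets_parallel(users, fetch_assets_for_user, max_workers=8):
    """Fetch assets for many users concurrently and return one flat list.

    `fetch_assets_for_user` is any callable that takes a user email and
    returns a list of asset dicts (hypothetical parameter, not in the gist).
    """
    assets = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as `users`.
        for user_assets in pool.map(fetch_assets_for_user, users):
            assets.extend(user_assets)
    return assets
```

With the functions in this gist, the callable could be something like `lambda u: listAssetsByUser(u, auth.getAccessTokenAsUser(u))`. Threads fit here because the work is dominated by waiting on HTTP responses; keep `max_workers` modest to stay within API quotas.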

# assets2json.py
import json

from dsclient import auth
from dsclient import datastudio as ds


def listUsers():
    # Hard-coded for this example; replace with your own list of users.
    return [
        "moritani@imfeelinglucky.dev",
        "ciadmin@imfeelinglucky.dev"
    ]


def listAssetsByUser(user, token):
    # Fetch every asset owned by `user` and attach its permissions, if any.
    assets = []
    for asset in ds.searchAssetsByOwner(user, token):
        p = ds.getPermissions(asset['name'], token)
        if p:
            asset['permissions'] = p
        assets.append(asset)
    return assets


def listAssets(users):
    # Impersonate each user in turn and collect the assets into one flat
    # list, so the output is a single array as described above.
    assets = []
    for user in users:
        assets.extend(listAssetsByUser(user, auth.getAccessTokenAsUser(user)))
    return assets


result = listAssets(listUsers())
print(json.dumps(result, indent=2))

# dsclient/auth.py
import google.auth.transport.requests
from google.oauth2 import service_account

scopes = [
    'https://www.googleapis.com/auth/datastudio',
    'https://www.googleapis.com/auth/userinfo.email',
    'https://www.googleapis.com/auth/userinfo.profile',
    'openid'
]


def getCredentialsAsUser(user):
    # Impersonate `user` via G Suite domain-wide delegation.
    credentials = service_account.Credentials.from_service_account_file(
        'key.json',  # service account key file
        scopes=scopes,
        subject=user
    )
    # Refresh once so that credentials.token is populated.
    credentials.refresh(google.auth.transport.requests.Request())
    return credentials


def getAccessTokenAsUser(user):
    return getCredentialsAsUser(user).token

# dsclient/datastudio.py
import requests


# ref : https://developers.google.com/datastudio/api/reference/assets/search
def _searchAssets(params, token):
    r = requests.get(
        "https://datastudio.googleapis.com/v1/assets:search",
        params=params,
        headers={'Authorization': "Bearer {}".format(token)}
    )
    # Only the first page (up to pageSize results) is returned.
    return r.json().get('assets', []) if r.ok and r.json() else []


def searchReportsByOwner(user, token):
    params = {
        'assetTypes': ["REPORT"],
        'owner': user,
        'pageSize': 100
    }
    return _searchAssets(params, token)


def searchDatasourcesByOwner(user, token):
    params = {
        'assetTypes': ["DATA_SOURCE"],
        'owner': user,
        'pageSize': 100
    }
    return _searchAssets(params, token)


def searchAssetsByOwner(user, token):
    return searchReportsByOwner(user, token) + searchDatasourcesByOwner(user, token)


def getPermissions(reportname, token):
    assetId = reportname
    r = requests.get(
        "https://datastudio.googleapis.com/v1/assets/{}/permissions".format(
            assetId),
        headers={'Authorization': "Bearer {}".format(token)}
    )
    return r.json()['permissions'] if r.ok else []


def addMembers(reportname, newmembers, token):
    assetId = reportname
    # note
    # The API documentation does not seem to be aligned with the behaviour:
    # the call below has to be "post", not "patch".
    r = requests.post(
        "https://datastudio.googleapis.com/v1/assets/{}/permissions:addMembers".format(
            assetId),
        data=newmembers,
        headers={'Authorization': "Bearer {}".format(token)}
    )
    print(r.status_code)
    print(r.text)


def main():
    print("nothing here")


if __name__ == '__main__':
    main()
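
The search helpers above request a `pageSize` of 100 and read a single response, so a user owning more assets than that would be truncated. A possible way to follow result pages is sketched below. This is an assumption-laden sketch: it assumes the search response carries a `nextPageToken` field and that the request accepts a `pageToken` parameter, as is common for Google APIs, so check both names against the API reference before relying on it. The function `search_all_assets` and its `http_get` parameter are made up for this example.

```python
SEARCH_URL = "https://datastudio.googleapis.com/v1/assets:search"


def search_all_assets(params, token, http_get):
    """Collect assets across all result pages.

    `http_get` is any callable with the same calling convention as
    requests.get (hypothetical parameter, so the loop can be exercised
    without network access). The `pageToken` / `nextPageToken` names are
    assumptions and should be verified against the API reference.
    """
    assets = []
    page_params = dict(params)  # copy so the caller's dict is not modified
    while True:
        r = http_get(
            SEARCH_URL,
            params=page_params,
            headers={"Authorization": "Bearer {}".format(token)},
        )
        if not r.ok:
            break
        body = r.json() or {}
        assets.extend(body.get("assets", []))
        next_token = body.get("nextPageToken")
        if not next_token:
            break
        page_params["pageToken"] = next_token
    return assets
```

In `datastudio.py` this could be called as `search_all_assets(params, token, requests.get)` in place of `_searchAssets(params, token)`.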