@lmeyerov
Last active May 29, 2020 01:18
Graphistry 2.0 REST API Tutorial: Login, create dataset, and upload nodes/edges as json, csv, parquet, arrow, and more
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Demos: New Graphistry Upload API\n",
"\n",
"**NOTE**: A production version of the below `Uploader` reference helper class will be built into PyGraphistry\n",
"\n",
"The initial upload API release is REST-only. It enables faster and larger uploads. We recommend using a language client if one is available for you as it will be upgraded to automatically use it and keep your code more maintainable.\n",
"\n",
"The below example shares how to use via the Python `requests` package to directly call the raw REST API for several data types:\n",
"\n",
"**Reference: PyGraphistry API**\n",
"\n",
"**In-memory**:\n",
"* PyGraphistry object\n",
"* JSON dictionary\n",
"* pandas dataframe\n",
"* arrow (fastest)\n",
"\n",
"**File**:\n",
"* json (multiple formats)\n",
"* csv\n",
"* arrow\n",
"* parquet\n",
"* NodeXL\n",
"\n",
"In the demos, `http://nginx` is the name of the upload server, such as an internal servername visible to a notebook kernel, and `http://localhost` is the public web server, which should be accessible through the viewer's browser.\n"
]
},
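{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Raw REST flow (reference sketch)\n",
"\n",
"For orientation, the raw flow that the `Uploader` helper (attached below as `uploader.py`) wraps is: authenticate for a JWT, create a dataset with column bindings, then POST node/edge data. A minimal sketch with `requests`, assuming the `http://nginx` upload server and the `creds` dict from the Config cell below:\n",
"\n",
"```python\n",
"import io, requests, pandas as pd, pyarrow as pa\n",
"\n",
"base = 'http://nginx'\n",
"\n",
"# 1. Login: exchange credentials for a JWT bearer token\n",
"tok = requests.post(f'{base}/api-token-auth/', json=creds).json()['token']\n",
"hdr = {'Authorization': f'Bearer {tok}'}\n",
"\n",
"# 2. Create a dataset and declare edge source/destination bindings\n",
"dataset_id = requests.post(f'{base}/api/v2/upload/datasets/', headers=hdr, json={\n",
"    'node_encodings': {'bindings': {}},\n",
"    'edge_encodings': {'bindings': {'source': 's', 'destination': 'd'}},\n",
"    'metadata': {}, 'name': 'mytestviz'\n",
"}).json()['data']['dataset_id']\n",
"\n",
"# 3. Upload edges as Arrow IPC file bytes in the request body\n",
"table = pa.Table.from_pandas(pd.DataFrame({'s': ['a'], 'd': ['b']}), preserve_index=False)\n",
"buf = io.BytesIO()\n",
"writer = pa.RecordBatchFileWriter(buf, table.schema)\n",
"writer.write_table(table)\n",
"writer.close()\n",
"requests.post(f'{base}/api/v2/upload/datasets/{dataset_id}/edges/arrow',\n",
"              headers=hdr, data=buf.getvalue()).json()\n",
"```"
]
},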
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Config"
]
},
{
"cell_type": "code",
"execution_count": 38,
"metadata": {},
"outputs": [],
"source": [
"creds = {'username': 'my_account', 'password': 'my_pwd'}"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Helpers"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [],
"source": [
"import graphistry, io, json, pandas as pd, pyarrow as pa, requests\n",
"from uploader import Uploader"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Sample Data"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [],
"source": [
"df_small = pd.DataFrame({'s': ['a', 'b', 'c'], 'd': ['b', 'c', 'a']})\n",
"df_med = pd.DataFrame({'s': [0, 1, 2, 3, 4] * 90000, 'd': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] * 45000})"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [],
"source": [
"g = graphistry.bind(source='s', destination='d').settings(url_params={'play': 0})\n",
"\n",
"g_small = g.edges(df_small)\n",
"g_med = g.edges(df_med)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Old API (reference)\n",
"\n",
"Subsequent Graphistry releases will use the new API internally"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 30.8 ms, sys: 237 µs, total: 31.1 ms\n",
"Wall time: 83.7 ms\n"
]
},
{
"data": {
"text/plain": [
"'http://localhost/graph/graph.html?dataset=07f6f10a123854d0a261e693198c0e5b&type=vgraph&viztoken=8ca64dd957bd419392a902a9145433cd&usertag=834f74a8-pygraphistry-0.10.6&splashAfter=1590714139&info=true&play=0&play=0'"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%%time\n",
"graphistry.register(api=1)\n",
"'http://localhost' + g_small.plot(render=False) + '&play=0'"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 1.04 s, sys: 19.5 ms, total: 1.05 s\n",
"Wall time: 1.12 s\n"
]
},
{
"data": {
"text/plain": [
"'http://localhost/graph/graph.html?dataset=137b170328e753a79020a7a7301b9e53&type=jsonMeta&viztoken=c092f5e75cab4398a04b1a60c189d9de&usertag=834f74a8-pygraphistry-0.10.6&splashAfter=1590714141&info=true&play=0&play=0'"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%%time\n",
"graphistry.register(api=2)\n",
"'http://localhost' + g_med.plot(render=False) + '&play=0'"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## From a PyGraphistry object"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"4f9a8eca2d3448cd8ded525655ff2e48\n",
"CPU times: user 22.7 ms, sys: 11.9 ms, total: 34.6 ms\n",
"Wall time: 540 ms\n"
]
},
{
"data": {
"text/plain": [
"'http://localhost/graph/graph.html?dataset=4f9a8eca2d3448cd8ded525655ff2e48&info=true&play=0&play=0'"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%%time\n",
"\n",
"u = Uploader('http://nginx').login(**creds).post_g(g_med)\n",
"\n",
"print(u.dataset_id)\n",
"u.to_url('http://localhost') + '&play=0'"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Manual with in-memory API: Pandas to Arrow\n",
"\n",
"Convert to an Arrow object, such as from cudf, pandas, or spark"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{'data': {'dataset_id': '0705c67c4571457cbad6edead36b9445'}, 'message': 'Dataset created', 'success': True}\n",
"{'data': {'dataset_id': '0705c67c4571457cbad6edead36b9445', 'dtypes': {'d': 'int32', 's': 'int32'}, 'num_cols': 2, 'num_rows': 450000, 'time_parsing_s': 0}, 'message': 'Dataset edges created', 'success': True}\n",
"CPU times: user 18.7 ms, sys: 3.84 ms, total: 22.5 ms\n",
"Wall time: 488 ms\n"
]
},
{
"data": {
"text/plain": [
"'http://localhost/graph/graph.html?dataset=0705c67c4571457cbad6edead36b9445&play=0'"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%%time\n",
"\n",
"u = Uploader('http://nginx').login(**creds)\n",
"\n",
"out = u.create_dataset({\n",
" \"node_encodings\": {\"bindings\": {}},\n",
" \"edge_encodings\": {\"bindings\": {\"source\": \"s\", \"destination\": \"d\"}},\n",
" \"metadata\": {},\n",
" \"name\": \"mytestviz\"\n",
"})\n",
"print(out)\n",
"\n",
"arr = pa.Table.from_pandas(g_med._edges, preserve_index=False).replace_schema_metadata({})\n",
"out = u.post_edges_arrow(arr)\n",
"print(out)\n",
"\n",
"u.to_url('http://localhost') + '&play=0'"
]
},
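{
"cell_type": "markdown",
"metadata": {},
"source": [
"The same `post_edges_arrow` call accepts any Arrow table, so the source dataframe library does not matter. As a sketch (assuming the corresponding libraries are installed; `my_cudf_df` and `my_spark_df` are hypothetical dataframes):\n",
"\n",
"```python\n",
"# cudf GPU dataframes convert to Arrow natively\n",
"arr = my_cudf_df.to_arrow()\n",
"\n",
"# Spark dataframes can be collected to pandas, then converted\n",
"arr = pa.Table.from_pandas(my_spark_df.toPandas(), preserve_index=False)\n",
"```"
]
},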
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Manual with in-memory API: Both nodes and edges"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>n</th>\n",
" <th>some_val</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>0</td>\n",
" <td>aa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>1</td>\n",
" <td>aa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>2</td>\n",
" <td>aa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>3</td>\n",
" <td>aa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>4</td>\n",
" <td>aa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>5</td>\n",
" <td>aa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>6</td>\n",
" <td>aa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>7</td>\n",
" <td>aa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>8</td>\n",
" <td>aa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>9</td>\n",
" <td>aa</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" n some_val\n",
"0 0 aa\n",
"1 1 aa\n",
"2 2 aa\n",
"3 3 aa\n",
"4 4 aa\n",
"5 5 aa\n",
"6 6 aa\n",
"7 7 aa\n",
"8 8 aa\n",
"9 9 aa"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"ns = pd.concat([\n",
" g_med._edges[g_med._source],\n",
" g_med._edges[g_med._destination]\n",
" ], ignore_index=True, sort=False).unique()\n",
"nodes_df_med = pd.DataFrame({'n': ns, 'some_val': ['aa'] * len(ns)})\n",
"nodes_df_med"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"dataset_id 321e80d5b8ca43228eb962de8c494df8\n",
"CPU times: user 21.6 ms, sys: 11.5 ms, total: 33.1 ms\n",
"Wall time: 1.32 s\n"
]
},
{
"data": {
"text/plain": [
"'http://localhost/graph/graph.html?dataset=321e80d5b8ca43228eb962de8c494df8&play=0'"
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%%time\n",
"\n",
"u = Uploader('http://nginx').login(**creds)\n",
"\n",
"u.create_dataset({\n",
" \"node_encodings\": {\"bindings\": {\"node\": \"n\"}},\n",
" \"edge_encodings\": {\"bindings\": {\"source\": \"s\", \"destination\": \"d\"}},\n",
" \"metadata\": {},\n",
" \"name\": \"mytestviz\"\n",
"})\n",
"print('dataset_id', u.dataset_id)\n",
"\n",
"arr = pa.Table.from_pandas(g_med._edges, preserve_index=False).replace_schema_metadata({})\n",
"u.post_edges_arrow(arr)\n",
"\n",
"arr = pa.Table.from_pandas(nodes_df_med, preserve_index=False).replace_schema_metadata({})\n",
"u.post_nodes_arrow(arr)\n",
"\n",
"u.to_url('http://localhost') + '&play=0'"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Manual with in-memory API: JSON to Arrow"
]
},
{
"cell_type": "code",
"execution_count": 39,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[{'s': 0, 'd': 0}, {'s': 1, 'd': 1}, {'s': 2, 'd': 2}]"
]
},
"execution_count": 39,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"sample_json = df_med.to_dict(orient='rows')\n",
"sample_json[:3]"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>s</th>\n",
" <th>d</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>2</td>\n",
" <td>2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>3</td>\n",
" <td>3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>4</td>\n",
" <td>4</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>449995</th>\n",
" <td>0</td>\n",
" <td>5</td>\n",
" </tr>\n",
" <tr>\n",
" <th>449996</th>\n",
" <td>1</td>\n",
" <td>6</td>\n",
" </tr>\n",
" <tr>\n",
" <th>449997</th>\n",
" <td>2</td>\n",
" <td>7</td>\n",
" </tr>\n",
" <tr>\n",
" <th>449998</th>\n",
" <td>3</td>\n",
" <td>8</td>\n",
" </tr>\n",
" <tr>\n",
" <th>449999</th>\n",
" <td>4</td>\n",
" <td>9</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>450000 rows × 2 columns</p>\n",
"</div>"
],
"text/plain": [
" s d\n",
"0 0 0\n",
"1 1 1\n",
"2 2 2\n",
"3 3 3\n",
"4 4 4\n",
"... .. ..\n",
"449995 0 5\n",
"449996 1 6\n",
"449997 2 7\n",
"449998 3 8\n",
"449999 4 9\n",
"\n",
"[450000 rows x 2 columns]"
]
},
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df = pd.DataFrame(sample_json)\n",
"df"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{'data': {'dataset_id': 'b3c045c759cd433ea33a447cbc619b41'}, 'message': 'Dataset created', 'success': True}\n",
"{'data': {'dataset_id': 'b3c045c759cd433ea33a447cbc619b41', 'dtypes': {'d': 'int32', 's': 'int32'}, 'num_cols': 2, 'num_rows': 450000, 'time_parsing_s': 0}, 'message': 'Dataset edges created', 'success': True}\n",
"CPU times: user 19.4 ms, sys: 4.41 ms, total: 23.8 ms\n",
"Wall time: 480 ms\n"
]
},
{
"data": {
"text/plain": [
"'http://localhost/graph/graph.html?dataset=b3c045c759cd433ea33a447cbc619b41&play=0'"
]
},
"execution_count": 22,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%%time\n",
"\n",
"u = Uploader('http://nginx').login(**creds)\n",
"\n",
"out = u.create_dataset({\n",
" \"node_encodings\": {\"bindings\": {}},\n",
" \"edge_encodings\": {\"bindings\": {\"source\": \"s\", \"destination\": \"d\"}},\n",
" \"metadata\": {},\n",
" \"name\": \"mytestviz\"\n",
"})\n",
"print(out)\n",
"\n",
"arr = pa.Table.from_pandas(g_med._edges, preserve_index=False).replace_schema_metadata({})\n",
"out = u.post_edges_arrow(arr)\n",
"print(out)\n",
"\n",
"u.to_url('http://localhost') + '&play=0'"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## From a file"
]
},
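{
"cell_type": "markdown",
"metadata": {},
"source": [
"File uploads send the raw file bytes to `/api/v2/upload/datasets/<dataset_id>/{edges|nodes}/<format>`; the helper's `post_edges_file` / `post_nodes_file` wrap this. A sketch for uploading a node table alongside edges (assuming a dataset was created with a node binding, as in the 'Both nodes and edges' section, and that a `nodes.csv` file exists):\n",
"\n",
"```python\n",
"u.post_edges_file('./edges.csv', 'csv')\n",
"u.post_nodes_file('./nodes.csv', 'csv')\n",
"```"
]
},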
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### CSV"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"outputs": [],
"source": [
"df_med.to_csv('./edges.csv', index=False)"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{'data': {'dataset_id': '73ba4f91df344ceb9bddd183bb39c0a0'}, 'message': 'Dataset created', 'success': True}\n",
"{'data': {'dataset_id': '73ba4f91df344ceb9bddd183bb39c0a0', 'dtypes': {'d': 'int32', 's': 'int32'}, 'num_cols': 2, 'num_rows': 450000, 'time_parsing_s': 0}, 'message': 'Dataset edges created', 'success': True}\n",
"CPU times: user 16.1 ms, sys: 4.01 ms, total: 20.1 ms\n",
"Wall time: 524 ms\n"
]
},
{
"data": {
"text/plain": [
"'http://localhost/graph/graph.html?dataset=73ba4f91df344ceb9bddd183bb39c0a0&play=0'"
]
},
"execution_count": 24,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%%time\n",
"\n",
"u = Uploader('http://nginx').login(**creds)\n",
"\n",
"out = u.create_dataset({\n",
" \"node_encodings\": {\"bindings\": {}},\n",
" \"edge_encodings\": {\"bindings\": {\"source\": \"s\", \"destination\": \"d\"}},\n",
" \"metadata\": {},\n",
" \"name\": \"mytestviz\"\n",
"})\n",
"print(out)\n",
"\n",
"out = u.post_edges_file('./edges.csv', 'csv')\n",
"print(out)\n",
"\n",
"u.to_url('http://localhost') + '&play=0'"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### JSON"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Columnar"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'s': ['a', 'b', 'c'], 'd': ['b', 'c', 'a'], 'ccc': [5, 5, 5]}"
]
},
"execution_count": 25,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"json_dict = df_small.assign(ccc=5).to_dict(orient='list')\n",
"json_dict"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {},
"outputs": [],
"source": [
"with open('df_med.json', 'w') as outfile:\n",
" json.dump(json_dict, outfile)"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{'data': {'dataset_id': 'f8b94415c1ca4241a68dc8cfdc89ea32'}, 'message': 'Dataset created', 'success': True}\n",
"{'data': {'dataset_id': 'f8b94415c1ca4241a68dc8cfdc89ea32', 'dtypes': {'ccc': 'int32', 'd': 'object', 's': 'object'}, 'num_cols': 3, 'num_rows': 3, 'time_parsing_s': 0}, 'message': 'Dataset edges created', 'success': True}\n",
"CPU times: user 16.1 ms, sys: 2 µs, total: 16.1 ms\n",
"Wall time: 526 ms\n"
]
},
{
"data": {
"text/plain": [
"'http://localhost/graph/graph.html?dataset=f8b94415c1ca4241a68dc8cfdc89ea32&play=0'"
]
},
"execution_count": 27,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%%time\n",
"\n",
"u = Uploader('http://nginx').login(**creds)\n",
"\n",
"out = u.create_dataset({\n",
" \"node_encodings\": {\"bindings\": {}},\n",
" \"edge_encodings\": {\"bindings\": {\"source\": \"s\", \"destination\": \"d\", \"edge_color\": \"ccc\"}},\n",
" \"metadata\": {},\n",
" \"name\": \"mytestviz\"\n",
"})\n",
"print(out)\n",
"\n",
"out = u.post_edges_file('./df_med.json', 'json')\n",
"print(out)\n",
"\n",
"u.to_url('http://localhost') + '&play=0'"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Rows"
]
},
{
"cell_type": "code",
"execution_count": 40,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[{'s': 0, 'd': 0}, {'s': 1, 'd': 1}, {'s': 2, 'd': 2}]"
]
},
"execution_count": 40,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"json_rows = df_med.to_dict(orient='rows')\n",
"json_rows[:3]"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {},
"outputs": [],
"source": [
"with open('df_med_rows.json', 'w') as outfile:\n",
" json.dump(json_rows, outfile)"
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{'data': {'dataset_id': '5a1110c07aae43109027df39bf8fc645'}, 'message': 'Dataset created', 'success': True}\n",
"{'data': {'dataset_id': '5a1110c07aae43109027df39bf8fc645', 'dtypes': {'d': 'int32', 's': 'int32'}, 'num_cols': 2, 'num_rows': 450000, 'time_parsing_s': 2}, 'message': 'Dataset edges created', 'success': True}\n"
]
},
{
"data": {
"text/plain": [
"'http://localhost/graph/graph.html?dataset=5a1110c07aae43109027df39bf8fc645&play=0'"
]
},
"execution_count": 30,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"u = Uploader('http://nginx').login(**creds)\n",
"\n",
"out = u.create_dataset({\n",
" \"node_encodings\": {\"bindings\": {}},\n",
" \"edge_encodings\": {\"bindings\": {\"source\": \"s\", \"destination\": \"d\"}},\n",
" \"metadata\": {},\n",
" \"name\": \"mytestviz\"\n",
"})\n",
"print(out)\n",
"\n",
"out = u.post_edges_file('./df_med_rows.json', 'json')\n",
"print(out)\n",
"\n",
"u.to_url('http://localhost') + '&play=0'"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Parquet"
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {},
"outputs": [],
"source": [
"df_med.to_parquet('./edges.parquet')"
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"-rw-r--r-- 1 graphistry graphistry 21K May 29 01:02 edges.parquet\r\n"
]
}
],
"source": [
"! ls -alh edges.parquet"
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{'data': {'dataset_id': '51b51f6b9f664c50ba1f6c4ca073e200'}, 'message': 'Dataset created', 'success': True}\n",
"{'data': {'dataset_id': '51b51f6b9f664c50ba1f6c4ca073e200', 'dtypes': {'d': 'int32', 's': 'int32'}, 'num_cols': 2, 'num_rows': 450000, 'time_parsing_s': 0}, 'message': 'Dataset edges created', 'success': True}\n",
"CPU times: user 24.4 ms, sys: 4.02 ms, total: 28.4 ms\n",
"Wall time: 624 ms\n"
]
},
{
"data": {
"text/plain": [
"'http://localhost/graph/graph.html?dataset=51b51f6b9f664c50ba1f6c4ca073e200&play=0'"
]
},
"execution_count": 33,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%%time\n",
"\n",
"u = Uploader('http://nginx').login(**creds)\n",
"\n",
"out = u.create_dataset({\n",
" \"node_encodings\": {\"bindings\": {}},\n",
" \"edge_encodings\": {\"bindings\": {\"source\": \"s\", \"destination\": \"d\"}},\n",
" \"metadata\": {},\n",
" \"name\": \"mytestviz\"\n",
"})\n",
"print(out)\n",
"\n",
"out = u.post_edges_file('./edges.parquet', 'parquet')\n",
"print(out)\n",
"\n",
"u.to_url('http://localhost') + '&play=0'"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Arrow"
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {},
"outputs": [],
"source": [
"arr = pa.Table.from_pandas(g_med._edges, preserve_index=False).replace_schema_metadata({})\n",
"writer = pa.RecordBatchFileWriter('./edges.arrow', arr.schema)\n",
"writer.write_table(arr)\n",
"writer.close()"
]
},
{
"cell_type": "code",
"execution_count": 35,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"-rw-r--r-- 1 graphistry graphistry 6.9M May 29 01:02 ./edges.arrow\r\n"
]
}
],
"source": [
"! ls -alh ./edges.arrow "
]
},
{
"cell_type": "code",
"execution_count": 36,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{'data': {'dataset_id': '59f20c76617a4da2a44dc844d75351ac'}, 'message': 'Dataset created', 'success': True}\n",
"{'data': {'dataset_id': '59f20c76617a4da2a44dc844d75351ac', 'dtypes': {'d': 'int32', 's': 'int32'}, 'num_cols': 2, 'num_rows': 450000, 'time_parsing_s': 0}, 'message': 'Dataset edges created', 'success': True}\n"
]
},
{
"data": {
"text/plain": [
"'http://localhost/graph/graph.html?dataset=59f20c76617a4da2a44dc844d75351ac&play=0'"
]
},
"execution_count": 36,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"u = Uploader('http://nginx').login(**creds)\n",
"\n",
"out = u.create_dataset({\n",
" \"node_encodings\": {\"bindings\": {}},\n",
" \"edge_encodings\": {\"bindings\": {\"source\": \"s\", \"destination\": \"d\"}},\n",
" \"metadata\": {},\n",
" \"name\": \"mytestviz\"\n",
"})\n",
"print(out)\n",
"\n",
"out = u.post_edges_file('./edges.arrow', 'arrow')\n",
"print(out)\n",
"\n",
"u.to_url('http://localhost') + '&play=0'"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### NodeXL: From a public URL\n",
"See also:\n",
" * `POST /upload/datasets/<dataset_id>/nodexl/file`\n",
" * `POST /upload/datasets/<dataset_id>/nodexl/url`\n",
" * `GET+POST /upload/nodexl/url`"
]
},
{
"cell_type": "code",
"execution_count": 37,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"request header {'User-Agent': 'python-requests/2.23.0', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'Connection': 'keep-alive', 'Authorization': 'Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJ1c2VybmFtZSI6Imxlb3Rlc3QiLCJpYXQiOjE1OTA3MTQxNTUsImV4cCI6MTU5MDcxNzc1NSwidXNlcl9pZCI6MSwib3JpZ19pYXQiOjE1OTA3MTQxNTV9.6CX2TpZOlppKBUQQ-4hiI8b3xQNh8nSl6cutwOgVmNA'}\n",
"{'data': {'dataset_id': '554756890172494c9692a1f71c893574', 'edges': {'dtypes': {'Add Your Own Columns Here': 'float32', 'Added By Extended Analysis': 'object', 'Color': 'object', 'ColorInt': 'int32', 'Corrected By Extended Analysis': 'object', 'Date': 'datetime64[ns]', 'Domains in Tweet': 'object', 'Dynamic Filter': 'float32', 'Edge Content Word Count': 'object', 'Edge Weight': 'object', 'Favorite Count': 'object', 'Favorited': 'bool', 'Hashtags in Tweet': 'object', 'ID': 'object', 'Imported ID': 'object', 'Imported Tweet Type': 'object', 'In-Reply-To Tweet ID': 'object', 'In-Reply-To User ID': 'object', 'Is Quote Status': 'bool', 'Label': 'float32', 'Label Font Size': 'float32', 'Label Text Color': 'float32', 'Language': 'object', 'Latitude': 'float32', 'Longitude': 'float32', 'Media in Tweet': 'object', 'Non-categorized Word Count': 'object', 'Non-categorized Word Percentage (%)': 'object', 'Opacity': 'object', 'Place Bounding Box': 'object', 'Place Country': 'object', 'Place Country Code': 'object', 'Place Full Name': 'object', 'Place ID': 'object', 'Place Name': 'object', 'Place Type': 'object', 'Place URL': 'object', 'Possibly Sensitive': 'float32', 'Quoted Status ID': 'object', 'Reciprocated?': 'object', 'Relationship': 'object', 'Relationship Date (UTC)': 'datetime64[ns]', 'Retweet Count': 'object', 'Retweet ID': 'object', 'Retweeted': 'bool', 'Sentiment List #1: List1 Word Count': 'object', 'Sentiment List #1: List1 Word Percentage (%)': 'object', 'Sentiment List #2: List2 Word Count': 'object', 'Sentiment List #2: List2 Word Percentage (%)': 'object', 'Sentiment List #3: List3 Word Count': 'object', 'Sentiment List #3: List3 Word Percentage (%)': 'object', 'Source': 'object', 'Style': 'object', 'Time': 'datetime64[ns]', 'Truncated': 'bool', 'Tweet': 'object', 'Tweet Date (UTC)': 'datetime64[ns]', 'Tweet Image File': 'object', 'Twitter Page for Tweet': 'object', 'URLs in Tweet': 'object', 'Unified Twitter ID': 'object', 'Vertex 1': 'object', 'Vertex 1 Group': 'object', 'Vertex 2': 'object', 'Vertex 2 Group': 'object', 'Visibility': 'float32', 'Width': 'object'}, 'num_cols': 67, 'num_rows': 3142}, 'nodes': {'dtypes': {'Add Your Own Columns Here': 'float32', 'Betweenness Centrality': 'object', 'Closeness Centrality': 'object', 'Clustering Coefficient': 'object', 'Color': 'float32', 'Color2': 'int32', 'Custom Menu Item': 'object', 'Default Profile': 'bool', 'Default Profile Image': 'bool', 'Degree': 'float32', 'Description': 'object', 'Domains in Tweet by Count': 'object', 'Domains in Tweet by Salience': 'object', 'Dynamic Filter': 'float32', 'Eigenvector Centrality': 'object', 'Favorites': 'object', 'Followed': 'object', 'Followers': 'object', 'Geo Enabled': 'bool', 'Hashtags in Tweet by Count': 'object', 'Hashtags in Tweet by Salience': 'object', 'ID': 'object', 'Image File': 'object', 'In-Degree': 'object', 'Joined Twitter Date (UTC)': 'datetime64[ns]', 'Label': 'object', 'Label Fill Color': 'float32', 'Label Position': 'object', 'Language': 'object', 'Layout Order': 'object', 'Listed Count': 'object', 'Location': 'object', 'Locked?': 'float32', 'Name': 'object', 'Non-categorized Word Count': 'object', 'Non-categorized Word Percentage (%)': 'object', 'Opacity': 'float32', 'Out-Degree': 'object', 'PageRank': 'object', 'Polar Angle': 'float32', 'Polar R': 'float32', 'Profile Background Image Url': 'object', 'Profile Banner Url': 'object', 'Reciprocated Vertex Pair Ratio': 'object', 'Sentiment List #1: List1 Word Count': 'object', 'Sentiment List #1: List1 Word Percentage (%)': 
'object', 'Sentiment List #2: List2 Word Count': 'object', 'Sentiment List #2: List2 Word Percentage (%)': 'object', 'Sentiment List #3: List3 Word Count': 'object', 'Sentiment List #3: List3 Word Percentage (%)': 'object', 'Shape': 'object', 'Size': 'object', 'Time Zone': 'object', 'Time Zone UTC Offset (Seconds)': 'object', 'Tooltip': 'object', 'Top Word Pairs in Tweet by Count': 'object', 'Top Word Pairs in Tweet by Salience': 'object', 'Top Words in Tweet by Count': 'object', 'Top Words in Tweet by Salience': 'object', 'Tweeted Search Term?': 'object', 'Tweets': 'object', 'URLs in Tweet by Count': 'object', 'URLs in Tweet by Salience': 'object', 'Verified': 'bool', 'Vertex': 'object', 'Vertex Content Word Count': 'object', 'Vertex Group': 'object', 'Visibility': 'float32', 'Web': 'object', 'x': 'object', 'y': 'object'}, 'num_cols': 71, 'num_rows': 529}, 'url': '/graph/graph.html?dataset=554756890172494c9692a1f71c893574&splashAfter=1590714179&play=0'}, 'message': 'Dataset created', 'success': True}\n",
"CPU times: user 20.4 ms, sys: 2.92 ms, total: 23.3 ms\n",
"Wall time: 23.7 s\n"
]
},
{
"data": {
"text/plain": [
"'http://localhost/graph/graph.html?dataset=554756890172494c9692a1f71c893574&splashAfter=1590714179&play=0'"
]
},
"execution_count": 37,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%%time\n",
"\n",
"import urllib.parse\n",
"#urllib.parse.quote(\n",
"\n",
"file_url = urllib.parse.quote(\"https://nodexlgraphgallery.org/Pages/Workbook.ashx?graphID=227114\", safe='')\n",
"base_path = 'http://nginx'\n",
"\n",
"u = Uploader(base_path).login(**creds)\n",
"tok = u.token\n",
"\n",
"resp = requests.get(\n",
" f'{base_path}/api/v2/upload/nodexl/url?template=twitter&url={file_url}',\n",
" headers={'Authorization': f'Bearer {tok}'})\n",
"\n",
"print('request header', resp.request.headers)\n",
"\n",
"out = resp.json()\n",
"print(out)\n",
"\n",
"subpath = out['data']['url']\n",
"f'http://localhost{subpath}'"
]
},
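{
"cell_type": "markdown",
"metadata": {},
"source": [
"The per-dataset variant `POST /upload/datasets/<dataset_id>/nodexl/file` listed above takes a local workbook instead of a URL. A sketch by analogy with `post_file`, after creating a dataset as in the earlier cells (the exact body format is an assumption; `my_workbook.xlsx` is a hypothetical file):\n",
"\n",
"```python\n",
"with open('./my_workbook.xlsx', 'rb') as f:\n",
"    out = requests.post(\n",
"        f'{base_path}/api/v2/upload/datasets/{u.dataset_id}/nodexl/file?template=twitter',\n",
"        headers={'Authorization': f'Bearer {u.token}'},\n",
"        data=f.read()).json()\n",
"```"
]
},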
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3.7 (RAPIDS)",
"language": "python",
"name": "rapids"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.3"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
# uploader.py -- reference helper imported above via `from uploader import Uploader`
import graphistry, io, json, logging, pandas as pd, pyarrow as pa, requests

logger = logging.getLogger('Uploader')


class Uploader:

    @property
    def token(self) -> str:
        if self.__token is None:
            raise Exception("Not logged in")
        return self.__token

    @property
    def dataset_id(self) -> str:
        if self.__dataset_id is None:
            raise Exception("Must first create a dataset")
        return self.__dataset_id

    @property
    def base_path(self) -> str:
        return self.__base_path

    @property
    def view_base_path(self) -> str:
        return self.__view_base_path

    @property
    def url_params(self) -> dict:
        if self.__url_params is None:
            return {}
        else:
            return self.__url_params

    def settings(self, url_params=None):
        if url_params is not None:
            self.__url_params = url_params
        return self

    def __init__(self, base_path='http://nginx', view_base_path='http://localhost'):
        self.__base_path = base_path
        self.__view_base_path = view_base_path
        self.__token = None
        self.__dataset_id = None
        self.__url_params = None

    def login(self, username, password):
        base_path = self.base_path
        out = requests.post(
            f'{base_path}/api-token-auth/',
            json={'username': username, 'password': password})
        json_response = None
        try:
            json_response = out.json()
            if not ('token' in json_response):
                raise Exception(out.text)
        except Exception as e:
            logger.error('Error: %s', out)
            raise Exception(out.text)
        self.__token = out.json()['token']
        return self

    def create_dataset(self, json):
        tok = self.token
        out = requests.post(
            self.base_path + '/api/v2/upload/datasets/',
            headers={'Authorization': f'Bearer {tok}'},
            json=json).json()
        if not out['success']:
            raise Exception(out)
        self.__dataset_id = out['data']['dataset_id']
        return out

    # PyArrow's table.getvalues().to_pybytes() fails to hydrate for some reason,
    # so work around it by consolidating into a virtual file and sending that
    def arrow_to_buffer(self, table: pa.Table):
        b = io.BytesIO()
        writer = pa.RecordBatchFileWriter(b, table.schema)
        writer.write_table(table)
        writer.close()
        return b.getvalue()

    def post_g(self, g, name=None):
        def maybe_bindings(g, bindings):
            out = {}
            for old_field_name, new_field_name in bindings:
                try:
                    val = getattr(g, old_field_name)
                    if val is None:
                        continue
                    else:
                        out[new_field_name] = val
                except AttributeError:
                    continue
            logger.debug('bindings: %s', out)
            return out
        self.__url_params = g._url_params if g._url_params is not None else {}
        node_encodings = maybe_bindings(
            g,
            [
                ['_node', 'node'],
                ['_point_color', 'node_color'],
                ['_point_label', 'node_label'],
                ['_point_opacity', 'node_opacity'],
                ['_point_size', 'node_size'],
                ['_point_title', 'node_title'],
                ['_point_weight', 'node_weight']
            ])
        if g._nodes is not None:
            if 'x' in g._nodes:
                node_encodings['x'] = 'x'
            if 'y' in g._nodes:
                node_encodings['y'] = 'y'
        self.create_dataset({
            "node_encodings": {"bindings": node_encodings},
            "edge_encodings": {"bindings": maybe_bindings(
                g,
                [
                    ['_source', 'source'],
                    ['_destination', 'destination'],
                    ['_edge_color', 'edge_color'],
                    ['_edge_label', 'edge_label'],
                    ['_edge_opacity', 'edge_opacity'],
                    ['_edge_size', 'edge_size'],
                    ['_edge_title', 'edge_title'],
                    ['_edge_weight', 'edge_weight']
                ])
            },
            "metadata": {},
            "name": ("mytestviz" if name is None else name)
        })
        self.g_post_edges(g)
        if g._nodes is not None:
            self.g_post_nodes(g)
        return self

    def to_url(self, view_base_path=None):
        path = view_base_path if view_base_path is not None else self.view_base_path
        dataset_id = self.dataset_id
        params = [str(k) + '=' + str(v) for k, v in self.url_params.items()]
        url_params = ('&' + '&'.join(params)) if len(params) > 0 else ''
        return f'{path}/graph/graph.html?dataset={dataset_id}{url_params}'

    def plot(self, render=True):
        if render:
            try:
                from IPython.core.display import display, HTML
                url = self.to_url()
                logger.debug('url: %s', url)
                return display(HTML(f'<iframe src="{url}" width="100%" height="600"/>'))
            except Exception as e:
                logger.debug(e)
        return self.to_url()

    def g_post_edges(self, g):
        arr = pa.Table.from_pandas(g._edges, preserve_index=False).replace_schema_metadata({})
        buf = self.arrow_to_buffer(arr)
        dataset_id = self.dataset_id
        tok = self.token
        base_path = self.base_path
        out = requests.post(
            f'{base_path}/api/v2/upload/datasets/{dataset_id}/edges/arrow',
            headers={'Authorization': f'Bearer {tok}'},
            data=buf).json()
        if not out['success']:
            raise Exception(out)
        return out

    def g_post_nodes(self, g):
        arr = pa.Table.from_pandas(g._nodes, preserve_index=False).replace_schema_metadata({})
        buf = self.arrow_to_buffer(arr)
        dataset_id = self.dataset_id
        tok = self.token
        base_path = self.base_path
        out = requests.post(
            f'{base_path}/api/v2/upload/datasets/{dataset_id}/nodes/arrow',
            headers={'Authorization': f'Bearer {tok}'},
            data=buf).json()
        if not out['success']:
            raise Exception(out)
        return out

    def post_edges_arrow(self, arr, opts=''):
        return self.post_arrow(arr, 'edges', opts)

    def post_nodes_arrow(self, arr, opts=''):
        return self.post_arrow(arr, 'nodes', opts)

    def post_arrow(self, arr, graph_type, opts=''):
        buf = self.arrow_to_buffer(arr)
        dataset_id = self.dataset_id
        tok = self.token
        base_path = self.base_path
        url = f'{base_path}/api/v2/upload/datasets/{dataset_id}/{graph_type}/arrow'
        if len(opts) > 0:
            url = f'{url}?{opts}'
        out = requests.post(
            url,
            headers={'Authorization': f'Bearer {tok}'},
            data=buf).json()
        if not out['success']:
            raise Exception(out)
        return out

    def post_edges_file(self, file_path, file_type='csv'):
        return self.post_file(file_path, 'edges', file_type)

    def post_nodes_file(self, file_path, file_type='csv'):
        return self.post_file(file_path, 'nodes', file_type)

    def post_file(self, file_path, graph_type='edges', file_type='csv'):
        dataset_id = self.dataset_id
        tok = self.token
        base_path = self.base_path
        with open(file_path, 'rb') as file:
            out = requests.post(
                f'{base_path}/api/v2/upload/datasets/{dataset_id}/{graph_type}/{file_type}',
                headers={'Authorization': f'Bearer {tok}'},
                data=file.read()).json()
        if not out['success']:
            raise Exception(out)
        return out
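# Example usage (sketch; assumes a reachable upload server and valid credentials,
# as in the notebook above; `my_edges_df` is a hypothetical edge dataframe):
#
#   u = Uploader('http://nginx', 'http://localhost').login('my_account', 'my_pwd')
#   u.post_g(graphistry.bind(source='s', destination='d').edges(my_edges_df))
#   print(u.to_url())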