@bumatic
Last active January 4, 2022 06:19
Notebook with widgets for interactively using PyCatFlow in a Jupyter Notebook.
{
"cells": [
{
"cell_type": "markdown",
"id": "eb905db4-6b79-4018-b454-5dcd370985e7",
"metadata": {},
"source": [
"# [PyCatFlow](https://github.com/bumatic/PyCatFlow) \n",
"\n",
"This notebook allows you to create a [PyCatFlow](https://github.com/bumatic/PyCatFlow) visualization. Run the notebook cell by cell and adjust the required information. **Users do not have to change any code, but make inputs with so-called widgets.** Most of the code in the this notebook implements these interface elements. Using the tool in plain Python reqires far less code. An example of this is included in the code repository of [PyCatFlow](https://github.com/bumatic/PyCatFlow).\n",
"\n",
"A file with sample data can be downloaded [here](https://raw.githubusercontent.com/bumatic/PyCatFlow/main/example/sample_data_ChatterBot_Requirements.csv). In case you want to play around with this data make sure that it is saved with the extension '.csv'."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "65622b3d-803e-40cd-b2fe-95e4231e080e",
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"import pycatflow as pcf\n",
"import ipywidgets as widgets\n",
"from io import StringIO\n",
"from IPython.display import display"
]
},
{
"cell_type": "markdown",
"id": "e7cfb77d-36b4-4afc-98b0-9427c9a81276",
"metadata": {},
"source": [
"## Step 1: Loading data"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a0606af9-1c1f-4436-98a6-78c3f9e92ecb",
"metadata": {},
"outputs": [],
"source": [
"uploader = widgets.FileUpload(accept='.csv', description='Select csv file')\n",
"display(uploader)\n",
"\n",
"separator = widgets.Dropdown(options=[('Tabulator', '\\t'), ('Comma',','), ('Semicolon', ';')],\n",
" value='\\t',\n",
" description='Separator:',\n",
" disabled=False)\n",
"display(separator)\n",
"\n",
"# Once you ran the code and selected a file for upload with the interactive widget\n",
"# DON'T rerun this cell because it resets the selection. Just proceed with the next cell."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "da54714d-6aaa-4f38-8972-ce7d6ec8e26f",
"metadata": {},
"outputs": [],
"source": [
"data_loaded = False\n",
"if len(list(uploader.value.keys())) == 1:\n",
" data_file = list(uploader.value.keys())[0]\n",
" raw_data = uploader.value[list(uploader.value.keys())[0]]['content'].decode('UTF-8')\n",
" data = pd.read_csv(StringIO(raw_data), sep=separator.value)\n",
" columns = list(data.columns)\n",
" print('First 5 rows of loaded data:')\n",
" print()\n",
" print(data.head(5))\n",
" print()\n",
" data_loaded = True\n",
"else: \n",
" print('Please select a data file before running this cell.')"
]
},
{
"cell_type": "markdown",
"id": "c551a553-c1db-4fba-8899-98b5de51a4fe",
"metadata": {},
"source": [
"## Step 2: Mapping data columns to the visualization"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9d4f61d6-738a-4d6b-916c-b7b5d4e50c7a",
"metadata": {},
"outputs": [],
"source": [
"if data_loaded:\n",
" style = {'description_width': 'initial'}\n",
" print()\n",
" print('Select data columns for visualization. (Leave blank, if it does not apply.)')\n",
" print()\n",
" viz_columns = widgets.Dropdown(options=columns, value=columns[0], description='Viz columns*: ', disabled=False, style=style)\n",
" display(viz_columns)\n",
"\n",
" viz_nodes = widgets.Dropdown(options=columns, value=columns[1], description='Viz nodes*: ', disabled=False, style=style)\n",
" display(viz_nodes)\n",
" \n",
" viz_category = widgets.Dropdown(options=columns, value=None, description='Viz category: ', disabled=False, style=style)\n",
" display(viz_category)\n",
" \n",
" viz_col_order = widgets.Dropdown(options=columns, value=None, description='Column order: ', disabled=False, style=style)\n",
" display(viz_col_order)\n",
"else: \n",
" print('No data loaded. Start in the cells above.')"
]
},
{
"cell_type": "markdown",
"id": "c02676cb-d223-4ca8-8f44-cee891a871ce",
"metadata": {},
"source": [
"## Step 3: Set properties of the visualization"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9a387d29-b2b5-4eea-b361-1c9310b6cdda",
"metadata": {},
"outputs": [],
"source": [
"data = pcf.read(raw_data, columns=viz_columns.value, \n",
" nodes=viz_nodes.value, categories=viz_category.value, \n",
" column_order=viz_col_order.value, delimiter=separator.value)\n",
"\n",
"if data:\n",
" viz_width = widgets.IntText(value=800, description='Width:', disabled=False, style=style)\n",
" display(viz_width)\n",
" viz_node_min = widgets.IntText(value=2, description='Node min size:', disabled=False, style=style) \n",
" display(viz_node_min)\n",
" viz_node_max = widgets.IntText(value=20, description='Node max size:', disabled=False, style=style) \n",
" display(viz_node_max)\n",
" viz_spacing = widgets.IntText(value=20, description='Node spacing:', disabled=False, style=style)\n",
" display(viz_spacing)\n",
" viz_connection_type = widgets.Dropdown(options=['semi-curved', 'curved', 'straight'], description='Connection type: ', \n",
" disabled=False, style=style )\n",
" display(viz_connection_type)\n",
" viz_order = widgets.Dropdown(options=['frequency', 'alphabetical', 'category'], description='Sort nodes by: ', \n",
" disabled=False, style=style )\n",
" display(viz_order)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "169a5d01-f810-4448-8f4d-fae36ee19beb",
"metadata": {},
"outputs": [],
"source": [
"viz = pcf.visualize(data, spacing=viz_spacing.value, width=viz_width.value, maxValue=viz_node_max.value, \n",
" minValue=viz_node_min.value, connection_type=viz_connection_type.value, sort_by=viz_order.value)\n",
"viz"
]
},
{
"cell_type": "markdown",
"id": "2a4981eb-0f39-48aa-9d62-54df0a20c550",
"metadata": {},
"source": [
"## Step 4: Save the result"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4938f8be-e04d-40bf-94e9-a4a207c1e3f4",
"metadata": {},
"outputs": [],
"source": [
"viz_file_name = widgets.Text(value='CatFlow_visual', description='File name: ')\n",
"display(viz_file_name)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c4e8f24a-6162-4c07-913a-f10bd94269c0",
"metadata": {},
"outputs": [],
"source": [
"fname = viz_file_name.value+'.svg'\n",
"viz.saveSvg(fname)\n",
"fname = viz_file_name.value+'.png'\n",
"viz.savePng(fname) "
]
},
{
"cell_type": "markdown",
"id": "21faf2f0-be22-4b78-ac66-6aa42f820667",
"metadata": {},
"source": [
"**For dowloading the results file select it in the file list on the left hand side, open the context menu with a \"right click\" or \"two finger tap\" and choose \"Download\" from the options.**"
]
},
{
"cell_type": "markdown",
"id": "491323c0-3dbf-4ba6-a5dd-29a5290b44f6",
"metadata": {},
"source": [
"## Advanced settings\n",
"\n",
"PyCatFlow offers more settings to adjust the visualization. If you want to make use of them you can genrate graphs by invoking the visualize funktion with your custom settings. The following example contains all parameters that can be passed to adjust the graph. Copy the code in a new code cell for running it and customizing your visualization.\n",
"\n",
"\n",
"```Python\n",
"viz = pcf. visualize(data, spacing=50, node_size=10, width=None, height=None, minValue=1, \n",
" maxValue=10, node_scaling=\"linear\", connection_type=\"semi-curved\", \n",
" color_startEnd=True, color_categories=True, nodes_color=\"gray\",\n",
" start_node_color=\"green\", end_node_color=\"red\", palette=None, show_labels=True,\n",
" label_text=\"item\", label_font=\"sans-serif\", label_color=\"black\", label_size=5,\n",
" label_shortening=\"clip\", label_position=\"nodes\", line_opacity=0.5, \n",
" line_stroke_color=\"white\", line_stroke_width=0.5, \n",
" line_stroke_thick=0.5, legend=True, sort_by=\"frequency\")\n",
"```\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7dd17601-33cc-4fb2-bc8f-ff7315c63ce1",
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"id": "6279beef-a789-424d-a5f0-189fb96749d2",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.8"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
pycatflow
pandas
@ihmon

ihmon commented Dec 12, 2021

Thank you very much for your notebook.
I think I will have the opportunity to use pycatflow in a graph application, so I borrowed your notebook.
I ran Step 3 of your notebook with your sample dataset in my environment, but the following "list index out of range" error occurred.

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
/tmp/ipykernel_12628/3875842020.py in <module>
----> 1 data = pcf.read(raw_data, columns=viz_columns.value, 
      2                 nodes=viz_nodes.value, categories=viz_category.value,
      3                 column_order=viz_col_order.value)
      4 
      5 if data:

~/miniconda3/envs/edalab1/lib/python3.8/site-packages/pycatflow/input.py in read(data, columns, nodes, categories, column_order, orientation, delimiter, line_delimiter, prefix)
    211         for h in headers:
    212             # data[h] = [line.split(delimiter)[headers.index(h)] for line in lines]
--> 213             data[h.replace('\r', '')] = [line.split(delimiter)[headers.index(h)].replace('\r', '') for line in lines]
    214     if type(data) == list:
    215         headers = data[0]

~/miniconda3/envs/edalab1/lib/python3.8/site-packages/pycatflow/input.py in <listcomp>(.0)
    211         for h in headers:
    212             # data[h] = [line.split(delimiter)[headers.index(h)] for line in lines]
--> 213             data[h.replace('\r', '')] = [line.split(delimiter)[headers.index(h)].replace('\r', '') for line in lines]
    214     if type(data) == list:
    215         headers = data[0]

IndexError: list index out of range

Do you have any idea how to solve this error?
Thank you,

FYI: the following is assigned to the raw_data variable. I changed the delimiter to comma, but tab did not work either.

column,items,category,column order\n2015-09-08,fuzzywuzzy,A_Requirements,1\n2015-09-08,requests,A_Requirements,1\n2015-09-08,requests-oauthlib,A_Requirements,1\n2015-09-08,pymongo,A_Requirements,1\n2015-09-08,jsondatabase,A_Requirements,1\n2016-09-08,jsondatabase...
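
A note on reading the traceback: the expression `line.split(delimiter)[headers.index(h)]` raises an `IndexError` whenever a line splits into fewer fields than the header has, for example a truncated row or a trailing empty line. A minimal sketch for locating such rows in a string like `raw_data` could look as follows. The helper name `find_short_rows` and the malformed sample rows are hypothetical and not part of PyCatFlow.

```python
def find_short_rows(raw_text, delimiter=","):
    """Return (line_number, field_count) for rows with fewer fields than the header."""
    lines = raw_text.split("\n")
    expected = len(lines[0].split(delimiter))
    short = []
    # Line numbers are 1-based; the header is line 1, so data starts at line 2.
    for number, line in enumerate(lines[1:], start=2):
        count = len(line.split(delimiter))
        if count < expected:
            short.append((number, count))
    return short


# Hypothetical input: the third line is truncated and the text ends with a newline,
# which leaves an empty last line after splitting.
sample = (
    "column,items,category,column order\n"
    "2015-09-08,fuzzywuzzy,A_Requirements,1\n"
    "2015-09-08,requests\n"
)
print(find_short_rows(sample))
```

An empty result would suggest that every row has at least as many fields as the header, so the cause would have to be looked for elsewhere.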

@bumatic
Author

bumatic commented Dec 17, 2021

Thanks for your interest and for pointing me to the error. However, running the Notebook in mybinder.org and on my local machine, I was unable to replicate it.

Taking a closer look at step 3, I noticed that the delimiter was not passed on to the pcf.read function. This has been fixed. However, it shouldn’t have caused an error with the default setup and the example data, since pcf.read tries to identify the delimiter based on heuristics, and these heuristics work at least for the sample data.
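
To illustrate what a delimiter heuristic can look like in general, here is a minimal sketch that picks the candidate occurring most often in the header line. This is an assumption for illustration only and not the heuristic that pcf.read actually implements; the function name `guess_delimiter` is hypothetical. The candidates mirror the options of the separator dropdown in the notebook.

```python
def guess_delimiter(raw_text, candidates=("\t", ",", ";")):
    """Guess the delimiter by counting candidate characters in the header line."""
    header = raw_text.split("\n", 1)[0]
    counts = {candidate: header.count(candidate) for candidate in candidates}
    best = max(counts, key=counts.get)
    # Return None when no candidate appears at all, instead of guessing blindly.
    return best if counts[best] > 0 else None


header_and_row = (
    "column,items,category,column order\n"
    "2015-09-08,fuzzywuzzy,A_Requirements,1"
)
print(repr(guess_delimiter(header_and_row)))
```

Passing the delimiter explicitly, as the fixed notebook cell does with `delimiter=separator.value`, avoids relying on any such guess.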

One possible cause of your error could be that you run an outdated version of the package. Please make sure that you are running the current version of PyCatFlow.
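
One way to check which version is installed from inside the notebook is the standard library's `importlib.metadata` (available from Python 3.8). The sketch below assumes the distribution is installed under the name `pycatflow`; the helper name `installed_version` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string of a package, or None if it is not installed."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


print(installed_version("pycatflow"))
```

If the printed version is older than the current release, upgrading the package and restarting the kernel before rerunning the cells is a reasonable first step.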

For providing further assistance I need more detailed information.

Have you checked the output of the second code cell of step 1 (Execution counter 3 in the below image)? It should look something like this:

[Image: pycatflow]

@ihmon

ihmon commented Jan 4, 2022

Thank you for your response.
I ran all cells one by one, and I don't think I missed anything, because your notebook is very user friendly.
I reran the same code on Colab, which I used at that time, and it now runs without any error. Everything works perfectly. I have not changed anything; the only difference is the version of pycatflow.
I do not remember which version it was back then, but this time it is the following.

"Successfully installed cairoSVG-2.5.2 cairocffi-1.3.0 cssselect2-0.4.1 drawSVG-1.8.3 pycatflow-0.0.8 tinycss2-1.1.1"

As you mentioned, the version might have been outdated at that time.
I also checked the same code in my conda env with pycatflow-0.0.8, and it is processed without problems.

Thank you for your kind support.
B.R,
