git filter to strip the thumbnails from a Tableau workbook.
<?xml version='1.0' encoding='utf-8' ?>
<!-- build 9100.15.1013.2200 -->
<workbook source-platform='mac' version='9.1' xmlns:user='http://www.tableausoftware.com/xml/user'>
<preferences>
<preference name='ui.encoding.shelf.height' value='24' />
<preference name='ui.shelf.height' value='26' />
</preferences>
<datasources>
<datasource caption='Earnings - ICT (ICT_Vetted)' inline='true' name='excel-direct.42297.702443067130' version='9.1'>
<connection class='excel-direct' cleaning='no' compat='no' dataRefreshTime='' filename='/Users/sechilds/Dropbox/Work/Space/Tableau Tax/ICT_Vetted.xlsx' interpretationMode='0' password='' server='' username='' validate='no'>
<relation name='&apos;Earnings - ICT$&apos;' table='[&apos;Earnings - ICT$&apos;]' type='table'>
<columns gridOrigin='A1:H183:no:A1:H183' header='yes' outcome='6'>
<column datatype='string' name='Graduation Year' ordinal='0' />
<column datatype='integer' name='Years Since Graduation' ordinal='1' />
<column datatype='string' name='ICT Flag' ordinal='2' />
<column datatype='integer' name='Mean' ordinal='3' />
<column datatype='integer' name='Mean Low' ordinal='4' />
<column datatype='integer' name='Mean High' ordinal='5' />
<column datatype='integer' name='Median' ordinal='6' />
<column datatype='integer' name='Rounded Count' ordinal='7' />
</columns>
</relation>
<metadata-records>
<metadata-record class='column'>
The root of the XML tree is a workbook tag
with the attribute {'version': '9.1', 'source-platform': 'mac'}
The children of the root are:
preferences
datasources
worksheets
windows
thumbnails
Looking at each individual thumbnail.
Printing first three lines only.
thumbnail {'width': '192', 'name': 'ICT Graduates', 'height': '192'}
iVBORw0KGgoAAAANSUhEUgAAAMAAAADACAYAAABS3GwHAAAACXBIWXMAAAsTAAALEwEAmpwY
AAAgAElEQVR4nO2deZycVZX3v7XvWy/Ve3d6S9LZSCchEBYRgXHEGQERB0HAZWTGEXl1RmZG
35Hx1XF0dMAR0XGBQRBl3wZEERCQkASydfalk/TeXb1Wde1Vz3LfP550gJClkt6qU883n/50
...
thumbnail {'width': '192', 'name': 'Faculty Groups', 'height': '192'}
iVBORw0KGgoAAAANSUhEUgAAAMAAAADACAYAAABS3GwHAAAACXBIWXMAAAsTAAALEwEAmpwY
AAAgAElEQVR4nOy9d3xcV5n//57eR9M00ox6b7Ysdye2Yyex00MCSVhI2MAXFlgg/HbZpZcN
9Uv7wrLAUneBhYRkgUASElKdOLbjbsu2rN6lURlpem/33t8fsiYxTpG7Q/R+ve5LI2lm7p07
...
#! /usr/bin/python
"""
git filter to strip the thumbnails from a Tableau workbook.

To install, store a copy of this file in a directory on your $PATH.
Then you need to tell git about the filter:

    git config --global filter.tableau.smudge "tableau_clean_thumnails.py"

Then you need to specify which files to filter.
In this case, you want to filter Tableau workbooks (.twb):

    echo '*.twb filter=tableau' >> .gitattributes

You can learn more about git filters and .gitattributes here:
https://git-scm.com/book/en/v2/Customizing-Git-Git-Attributes
"""
import sys
from xml.etree import ElementTree


def smudge():
    # Parse the workbook XML that git pipes in on standard input.
    tree = ElementTree.parse(sys.stdin)
    root = tree.getroot()
    # A workbook may have no <thumbnails> element; leave it unchanged then.
    thumbnails = root.find('thumbnails')
    if thumbnails is not None:
        for thumbnail in thumbnails.findall('thumbnail'):
            thumbnails.remove(thumbnail)
    # Write the remaining tree to standard output as text.
    tree.write(sys.stdout, encoding="unicode")


if __name__ == "__main__":
    smudge()

Filter Tableau Workbook Thumbnails

This is a git filter to strip the thumbnails from a Tableau workbook.

To install, store a copy of this script in a directory on your $PATH and make it executable. Git runs the filter as a command that reads the file from standard input and writes the filtered text to standard output.

Then tell git about the filter:

git config --global filter.tableau.smudge "tableau_clean_thumnails.py"

Then specify which files to filter. In this case, you want to filter Tableau workbooks (.twb):

echo '*.twb filter=tableau' >> .gitattributes

You can learn more about git filters and .gitattributes in the Git book: https://git-scm.com/book/en/v2/Customizing-Git-Git-Attributes
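The standard input/standard output contract can be tried out without involving git at all. The sketch below pipes text through a command and reads back whatever the command prints. The helper name `run_filter` and the upper-casing stand-in command are hypothetical, chosen only for illustration; in practice the command would be the filter script itself.

```python
import subprocess
import sys


def run_filter(command, text):
    """Pipe `text` through `command` and return what it writes to stdout."""
    result = subprocess.run(
        command,
        input=text,
        capture_output=True,
        text=True,
        check=True,  # raise if the command exits with a non-zero status
    )
    return result.stdout


if __name__ == "__main__":
    # Hypothetical stand-in filter: upper-cases whatever it reads.
    stand_in = [sys.executable, "-c",
                "import sys; sys.stdout.write(sys.stdin.read().upper())"]
    print(run_filter(stand_in, "<workbook />"))
```

Git runs a filter's `clean` command when a file is staged and its `smudge` command when a file is checked out, so which of the two settings the script is attached to decides when the thumbnails are stripped.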

The script uses the xml.etree.ElementTree module from Python's standard library. It parses the XML tree of a Tableau workbook, removes all the thumbnails, and writes the remaining XML tree to standard output.
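A minimal sketch of the same idea on an in-memory string may make the steps easier to follow. The `SAMPLE` document is a made-up miniature workbook and `strip_thumbnails` is a hypothetical helper name; neither comes from the script itself, and real .twb files are far larger.

```python
from xml.etree import ElementTree

# Made-up miniature workbook, for illustration only.
SAMPLE = """<workbook version='9.1'>
  <worksheets />
  <thumbnails>
    <thumbnail height='192' name='Sheet 1' width='192'>iVBORw0KGgo=</thumbnail>
    <thumbnail height='192' name='Sheet 2' width='192'>iVBORw0KGgo=</thumbnail>
  </thumbnails>
</workbook>"""


def strip_thumbnails(xml_text):
    """Return the workbook XML with every <thumbnail> element removed."""
    root = ElementTree.fromstring(xml_text)
    for container in root.findall('thumbnails'):
        # findall() returns a list, so removing elements while looping is safe.
        for thumb in container.findall('thumbnail'):
            container.remove(thumb)
    return ElementTree.tostring(root, encoding='unicode')


if __name__ == "__main__":
    print(strip_thumbnails(SAMPLE))
```

The empty `thumbnails` element is left in place, so the top-level structure of the workbook stays the same and only the base64 image data disappears.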

"""
Simple Python script to look at the top level of an
XML file.
"""
from xml.etree import ElementTree
import sys


def xml_hierarchy(filename):
    with open(filename, 'r') as f:
        tree = ElementTree.parse(f)
    root = tree.getroot()
    print('The root of the XML tree is a {tag} tag\nwith the attribute {attrib}\n'.format(tag=root.tag, attrib=root.attrib))
    print('The children of the root are:\n')
    for child in root:
        print(child.tag)
    print('\nLooking at each individual thumbnail.')
    print('Printing first three lines only.\n')
    thumbnails = list(root.findall('thumbnails'))[0]
    for tb in thumbnails.findall('thumbnail'):
        print(tb.tag, tb.attrib)
        for line in tb.text.split('\n', 5)[1:4]:
            print(line)
        print(' ...\n')


if __name__ == '__main__':
    xml_hierarchy(sys.argv[1])