@paulk-asert
Created December 11, 2019 16:06
{"cells":[{"metadata":{},"cell_type":"markdown","source":"# Whiskey clustering\n\n[Tablesaw](https://tablesaw.tech/) makes it easy to transform, summarize, and filter data, and to compute descriptive statistics. It also works well with libraries such as Smile, which provides fundamental machine learning algorithms.\n\nThis notebook gives a basic demo of Tablesaw and displays its results using the BeakerX interactive visualization APIs. Tablesaw also provides its own visualization APIs if you wish to do visualization outside of BeakerX. The notebook covers loading a table and running k-means clustering on it."},{"metadata":{"trusted":true},"cell_type":"code","source":"%%classpath add mvn\ntech.tablesaw tablesaw-beakerx 0.36.0\ncom.github.haifengl smile-core 1.5.3","execution_count":11,"outputs":[{"output_type":"display_data","data":{"method":"display_data","application/vnd.jupyter.widget-view+json":{"version_minor":0,"model_id":"","version_major":2}},"metadata":{}}]},{"metadata":{"trusted":true},"cell_type":"code","source":"%import static tech.tablesaw.aggregate.AggregateFunctions.*\n%import tech.tablesaw.api.*\n%import tech.tablesaw.columns.*\n%import smile.clustering.*\n%import smile.regression.*\n\n// display Tablesaw tables with BeakerX table display widget\ntech.tablesaw.beakerx.TablesawDisplayer.register()","execution_count":12,"outputs":[{"output_type":"execute_result","execution_count":12,"data":{"text/plain":"null"},"metadata":{}}]},{"metadata":{},"cell_type":"markdown","source":"## K-means clustering\n\nK-means is the most common form of “centroid” clustering. Unlike classification, clustering is an unsupervised learning method. The categories are not predetermined. Instead, the goal is to search for natural groupings in the dataset, such that the members of each group are similar to each other and different from the members of the other groups. The K represents the number of groups to find.\n\nWe’ll use a well-known Scotch whiskey dataset, which is used to cluster whiskeys according to their taste based on data collected from tasting notes. As always, we start by loading the data and printing its structure."},{"metadata":{"trusted":true},"cell_type":"code","source":"whiskeyData = Table.read().csv(\"../resources/whiskey.csv\")\nwhiskeyData.shape()","execution_count":13,"outputs":[{"output_type":"execute_result","execution_count":13,"data":{"text/plain":"86 rows X 14 cols"},"metadata":{}}]},{"metadata":{"trusted":true},"cell_type":"code","source":"whiskeyData.structure()","execution_count":14,"outputs":[{"output_type":"display_data","data":{"method":"display_data","application/vnd.jupyter.widget-view+json":{"version_minor":0,"model_id":"78a540bb-4fbe-4435-b738-acb3adee7ab1","version_major":2}},"metadata":{}}]},{"metadata":{"trusted":true},"cell_type":"code","source":"// cluster the whiskeys on the 12 taste columns into K = 5 groups\nkMeans = new KMeans(whiskeyData.as().doubleMatrix(\"Body\", \"Sweetness\", \"Smoky\", \"Medicinal\", \"Tobacco\", \"Honey\", \"Spicy\", \"Winey\", \"Nutty\", \"Malty\", \"Fruity\", \"Floral\"), 5)","execution_count":15,"outputs":[{"output_type":"execute_result","execution_count":15,"data":{"text/plain":"K-Means distortion: 388.69993\nClusters of 86 data points of dimension 12:\n 0\t 38 (44.2%)\n 1\t 22 (25.6%)\n 2\t 9 (10.5%)\n 3\t 14 (16.3%)\n 4\t 3 ( 3.5%)\n"},"metadata":{}}]},{"metadata":{"trusted":true},"cell_type":"code","source":"// pair each distillery with its cluster label, then sort by cluster and name\nTable whiskeyClusters = Table.create(\"Clusters\", whiskeyData.stringColumn(\"Distillery\"), DoubleColumn.create(\"Cluster\", kMeans.getClusterLabel()))\nwhiskeyClusters = whiskeyClusters.sortAscendingOn(\"Cluster\", \"Distillery\")","execution_count":16,"outputs":[{"output_type":"display_data","data":{"method":"display_data","application/vnd.jupyter.widget-view+json":{"version_minor":0,"model_id":"85d6f754-1211-4b3e-9b40-e7e7fdc628f1","version_major":2}},"metadata":{}}]},{"metadata":{"trusted":false},"cell_type":"code","source":"","execution_count":null,"outputs":[]}],"metadata":{"anaconda-cloud":{},"kernelspec":{"name":"groovy","display_name":"Groovy","language":"groovy"},"language_info":{"nbconverter_exporter":"","codemirror_mode":"groovy","name":"Groovy","mimetype":"","file_extension":".groovy","version":"2.5.6"},"toc":{"nav_menu":{},"number_sections":false,"sideBar":false,"skip_h1_title":false,"base_numbering":1,"title_cell":"Table of Contents","title_sidebar":"Contents","toc_cell":false,"toc_position":{},"toc_section_display":false,"toc_window_display":false}},"nbformat":4,"nbformat_minor":2}
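
The K-means section of the notebook describes the goal of centroid clustering but leaves the procedure itself to Smile. As a rough illustration of the idea, here is a minimal plain-Java sketch of Lloyd's algorithm, the classic k-means iteration. It is not Smile's implementation: the class name `KMeansSketch`, the choice of seeding the centroids with the first `k` rows, and the toy 2-D points are all assumptions made for this example.

```java
import java.util.Arrays;

public class KMeansSketch {
    // Squared Euclidean distance between two points of equal dimension.
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    // Index of the centroid closest to the given point.
    static int nearest(double[] point, double[][] centroids) {
        int best = 0;
        for (int c = 1; c < centroids.length; c++) {
            if (squaredDistance(point, centroids[c]) < squaredDistance(point, centroids[best])) {
                best = c;
            }
        }
        return best;
    }

    // Lloyd's algorithm. Assumption for illustration: the first k rows seed the centroids.
    static int[] cluster(double[][] data, int k, int maxIterations) {
        double[][] centroids = new double[k][];
        for (int c = 0; c < k; c++) {
            centroids[c] = data[c].clone();
        }
        int[] labels = new int[data.length];
        Arrays.fill(labels, -1);
        for (int iter = 0; iter < maxIterations; iter++) {
            // Assignment step: attach every point to its nearest centroid.
            boolean changed = false;
            for (int i = 0; i < data.length; i++) {
                int label = nearest(data[i], centroids);
                if (label != labels[i]) {
                    labels[i] = label;
                    changed = true;
                }
            }
            if (!changed) {
                break; // no label moved, so the clustering has converged
            }
            // Update step: move each centroid to the mean of its members.
            double[][] sums = new double[k][data[0].length];
            int[] counts = new int[k];
            for (int i = 0; i < data.length; i++) {
                counts[labels[i]]++;
                for (int j = 0; j < data[i].length; j++) {
                    sums[labels[i]][j] += data[i][j];
                }
            }
            for (int c = 0; c < k; c++) {
                if (counts[c] == 0) {
                    continue; // an empty cluster keeps its previous centroid
                }
                for (int j = 0; j < sums[c].length; j++) {
                    centroids[c][j] = sums[c][j] / counts[c];
                }
            }
        }
        return labels;
    }

    public static void main(String[] args) {
        // Hypothetical toy data: two well-separated groups in 2-D.
        double[][] points = {
            {1.0, 1.0}, {9.0, 9.0}, {1.5, 2.0}, {8.0, 8.5}, {1.0, 0.5}, {9.5, 8.0}
        };
        System.out.println(Arrays.toString(cluster(points, 2, 100)));
        // prints [0, 1, 0, 1, 0, 1]
    }
}
```

The returned array plays the same role as the labels from `kMeans.getClusterLabel()` in the notebook: one cluster index per input row. Seeding with the first `k` rows keeps the sketch deterministic; a production library would normally pick starting centroids more carefully.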