paulk-asert/Whiskey.ipynb

## Whiskey.ipynb
{"cells":[{"metadata":{},"cell_type":"markdown","source":"# Whiskey clustering\n\n[Tablesaw](https://tablesaw.tech/) makes it easy to transform, summarize, and filter data, and to compute descriptive statistics. It also works well with libraries like Smile, which provides fundamental machine learning algorithms.\n\nThis notebook is a basic demo of Tablesaw in BeakerX: it loads a table, inspects its structure, and runs k-means clustering on it with Smile. Results are shown with the BeakerX table display widget; Tablesaw also provides its own visualization APIs if you wish to do visualization outside of BeakerX."},{"metadata":{"trusted":true},"cell_type":"code","source":"%%classpath add mvn\ntech.tablesaw tablesaw-beakerx 0.36.0\ncom.github.haifengl smile-core 1.5.3","execution_count":11,"outputs":[{"output_type":"display_data","data":{"method":"display_data","application/vnd.jupyter.widget-view+json":{"version_minor":0,"model_id":"","version_major":2}},"metadata":{}}]},{"metadata":{"trusted":true},"cell_type":"code","source":"%import static tech.tablesaw.aggregate.AggregateFunctions.*\n%import tech.tablesaw.api.*\n%import tech.tablesaw.columns.*\n%import smile.clustering.*\n%import smile.regression.*\n\n// display Tablesaw tables with BeakerX table display widget\ntech.tablesaw.beakerx.TablesawDisplayer.register()","execution_count":12,"outputs":[{"output_type":"execute_result","execution_count":12,"data":{"text/plain":"null"},"metadata":{}}]},{"metadata":{},"cell_type":"markdown","source":"## K-means clustering\n\nK-means is the most common form of “centroid” clustering. Unlike classification, clustering is an unsupervised learning method: the categories are not predetermined. Instead, the goal is to search for natural groupings in the dataset, such that the members of each group are similar to each other and different from the members of the other groups. 
The K represents the number of groups to find.\n\nWe’ll use a well-known Scotch Whiskey dataset to cluster whiskeys by taste, based on data collected from tasting notes. As always, we start by loading the data and printing its shape and structure."},{"metadata":{"trusted":true},"cell_type":"code","source":"whiskeyData = Table.read().csv(\"../resources/whiskey.csv\")\nwhiskeyData.shape()","execution_count":13,"outputs":[{"output_type":"execute_result","execution_count":13,"data":{"text/plain":"86 rows X 14 cols"},"metadata":{}}]},{"metadata":{"trusted":true},"cell_type":"code","source":"whiskeyData.structure()","execution_count":14,"outputs":[{"output_type":"display_data","data":{"method":"display_data","application/vnd.jupyter.widget-view+json":{"version_minor":0,"model_id":"78a540bb-4fbe-4435-b738-acb3adee7ab1","version_major":2}},"metadata":{}}]},{"metadata":{"trusted":true},"cell_type":"code","source":"kMeans = new KMeans(whiskeyData.as().doubleMatrix(\"Body\", \"Sweetness\", \"Smoky\", \"Medicinal\", \"Tobacco\", \"Honey\", \"Spicy\", \"Winey\", \"Nutty\", \"Malty\", \"Fruity\", \"Floral\"), 5)","execution_count":15,"outputs":[{"output_type":"execute_result","execution_count":15,"data":{"text/plain":"K-Means distortion: 388.69993\nClusters of 86 data points of dimension 12:\n  0\t   38 (44.2%)\n  1\t   22 (25.6%)\n  2\t    9 (10.5%)\n  3\t   14 (16.3%)\n  4\t    3 ( 3.5%)\n"},"metadata":{}}]},{"metadata":{"trusted":true},"cell_type":"code","source":"Table whiskeyClusters = Table.create(\"Clusters\", whiskeyData.stringColumn(\"Distillery\"), DoubleColumn.create(\"Cluster\", kMeans.getClusterLabel()))\nwhiskeyClusters = whiskeyClusters.sortAscendingOn(\"Cluster\", 
\"Distillery\")","execution_count":16,"outputs":[{"output_type":"display_data","data":{"method":"display_data","application/vnd.jupyter.widget-view+json":{"version_minor":0,"model_id":"85d6f754-1211-4b3e-9b40-e7e7fdc628f1","version_major":2}},"metadata":{}}]},{"metadata":{"trusted":false},"cell_type":"code","source":"","execution_count":null,"outputs":[]}],"metadata":{"anaconda-cloud":{},"kernelspec":{"name":"groovy","display_name":"Groovy","language":"groovy"},"language_info":{"nbconverter_exporter":"","codemirror_mode":"groovy","name":"Groovy","mimetype":"","file_extension":".groovy","version":"2.5.6"},"toc":{"nav_menu":{},"number_sections":false,"sideBar":false,"skip_h1_title":false,"base_numbering":1,"title_cell":"Table of Contents","title_sidebar":"Contents","toc_cell":false,"toc_position":{},"toc_section_display":false,"toc_window_display":false}},"nbformat":4,"nbformat_minor":2}