- Summary of what they are and how they are created.
  - Scatter plots are a type of visualization where each record in the dataset is shown by a shape placed at a certain pos
- How they are used.
  - The primary use of scatter plots is typically to compare two continuous features. This is done by using one continuous feature value as the x coordinate of each point and the other continuous feature value as the y coordinate.
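The coordinate mapping described above can be sketched in a few lines. This is only an illustration under assumptions: the records, the feature names `temperature` and `humidity`, and the 400 by 300 pixel canvas are hypothetical, and the linear scaling shown is one common choice rather than the method of any particular plotting library.

```python
def linear_scale(value, domain, pixel_range):
    """Map value from domain (lo, hi) onto pixel_range (start, end) linearly."""
    lo, hi = domain
    start, end = pixel_range
    if hi == lo:  # degenerate domain: place the point mid-range
        return (start + end) / 2
    return start + (value - lo) / (hi - lo) * (end - start)


def scatter_positions(records, x_feature, y_feature, width=400, height=300):
    """Return one (x, y) pixel position per record."""
    xs = [r[x_feature] for r in records]
    ys = [r[y_feature] for r in records]
    x_domain = (min(xs), max(xs))
    y_domain = (min(ys), max(ys))
    return [
        (
            linear_scale(x, x_domain, (0, width)),
            # Screen y grows downward, so the pixel range is flipped
            linear_scale(y, y_domain, (height, 0)),
        )
        for x, y in zip(xs, ys)
    ]


# Hypothetical records with two continuous features
records = [
    {"temperature": 20.0, "humidity": 30.0},
    {"temperature": 22.0, "humidity": 25.0},
    {"temperature": 21.0, "humidity": 35.0},
]
positions = scatter_positions(records, "temperature", "humidity")
```

Each tuple in `positions` is where the shape for the corresponding record would be drawn.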
The dataset is a GeoJSON file that specifies the shape of all the counties in the United States. I got shapefiles from the census website and converted them into GeoJSON using the QGIS software.
The color attribute is mapped to the length of the name of the county. The buttons at the top allow for zooming. The sliders along the left and bottom of the map allow for panning.
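A color mapping like the one described above could look roughly like this. It is a sketch under assumptions: the property key `NAME`, the two sample features, and the grayscale ramp are hypothetical and not taken from the original map.

```python
def name_length_color(name, min_len, max_len):
    """Return a grayscale hex color; longer names map to darker shades."""
    if max_len == min_len:
        t = 0.0
    else:
        t = (len(name) - min_len) / (max_len - min_len)
    level = round(255 * (1 - t))  # 255 = white (shortest), 0 = black (longest)
    return "#{0:02x}{0:02x}{0:02x}".format(level)


# Hypothetical GeoJSON-like features; real features also carry a geometry
features = [
    {"type": "Feature", "properties": {"NAME": "Lee"}},
    {"type": "Feature", "properties": {"NAME": "Washington"}},
]
lengths = [len(f["properties"]["NAME"]) for f in features]
colors = {
    f["properties"]["NAME"]: name_length_color(
        f["properties"]["NAME"], min(lengths), max(lengths)
    )
    for f in features
}
```

The resulting `colors` dictionary maps each county name to the fill color for its shape.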
The Occupancy Detection Dataset
Experimental data used for binary classification (room occupancy) from Temperature, Humidity, Light and CO2. Ground-truth occupancy was obtained from time stamped pictures that were taken every minute.
Dataset contains 20,560 records.
Source: https://archive.ics.uci.edu/ml/datasets/Occupancy+Detection+
{"nodes":[{"id":3466,"group":8},{"id":10310,"group":13},{"id":5052,"group":29},{"id":5346,"group":20},{"id":15159,"group":4},{"id":19640,"group":25},{"id":10243,"group":14},{"id":18648,"group":4},{"id":16470,"group":1},{"id":17822,"group":1},{"id":14265,"group":37},{"id":19738,"group":5},{"id":8612,"group":18},{"id":10822,"group":2},{"id":16258,"group":13},{"id":21194,"group":1},{"id":14123,"group":13},{"id":2710,"group":33},{"id":18757,"group":8},{"id":16148,"group":18},{"id":10794,"group":2},{"id":7050,"group":6},{"id":4846,"group":22},{"id":824,"group":13},{"id":2133,"group":12},{"id":6610,"group":68},{"id":6700,"group":31},{"id":11082,"group":12},{"id":14419,"group":14},{"id":17330,"group":17},{"id":18487,"group":27},{"id":22779,"group":11},{"id":23382,"group":30},{"id":12928,"group":11},{"id":13740,"group":11},{"id":13096,"group":22},{"id":22393,"group":5},{"id":3872,"group":8},{"id":23096,"group":1},{"id":8862,"group":7},{"id":22598,"group":18},{"id":8254,"group":13},{"id":17309,"group":1},{"id":24833," |
Spark has two stateful locations that are used to manage tables when running locally.
- The spark_warehouse is a directory specified by the session setting spark.sql.warehouse.dir. This is the location where every table's data files are stored.
- The metastore is an in-memory database that stores metadata about each table (its location, etc.)
This gist documents an issue I have had when performing Spark interop from Clojure. When higher order functions are used, a serialization error is thrown that I can't make sense of.
- not_working.clj has the minimal Clojure to reproduce the issue.
- working.scala has a direct translation of the Clojure code into Scala. It does not throw the exception.
- logs_and_exception.log has the Spark logs and exception trace that are produced when running not_working.clj.
Below is additional information about when the exception does/doesn't occur.
- The exception is not raised (and -main behaves correctly) when:
  - not_working.clj is compiled into an uberjar.
import ast
import operator as op
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List, Callable, Dict, Any, Optional, Tuple
import astor
from pyrsistent import pvector, PVector
(ns build
  (:require [clojure.tools.build.api :as b]))
(def lib 'com.nortia-solutions/ppi-core)
(def version "0.0.1")
(def class-dir "target/classes")
(def uber-file (format "target/%s-%s-standalone.jar" (name lib) version))
(ns upush
  (:require [clojure.math :refer [log]]
            [clojure.string :as str]
            [clojure.math.combinatorics :refer [selections]]))
(def instructions
  {'+ {:fn +
       :arity 2
       :invariant (fn [a b]