pete-murphy/README.md

## README.md

      
Instructor notebooks:
https://observablehq.com/d/fa16a9680714478d?collection=@observablehq/data-vis-course
YouTube recordings:

Session 1 https://www.youtube.com/watch?v=WJ1c54Ab-o8


## session-1.md

      
Scatterplot

Simple(st?) chart: just mapping two values from dataset to see relationship (e.g. fuel economy (mpg) vs power (hp))
If not seeing continuous values, probably not the chart to use (e.g. cylinder count, though a number, is not continuous)
Difference between bar chart and scatterplot

A bar chart can have its category axis sorted arbitrarily; a scatterplot needs a continuous scale on both axes
Bar charts

Precision doesn't matter so much for data vis
Good for quickly comparing values by category
Bad practice: starting at non-zero for bar chart value axis misrepresents data (makes it hard to make comparison between bars)
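A quick worked example of why a non-zero baseline misleads. This is only a sketch: the function name and the numbers are made up for illustration and are not from the session. It compares the ratio of the drawn bar lengths to the true ratio of the values.

```javascript
// Ratio of drawn bar lengths when the value axis starts at `baseline`.
// With baseline 0 this equals the true ratio of the two values.
function apparentRatio(a, b, baseline = 0) {
  return (b - baseline) / (a - baseline);
}

const trueRatio = apparentRatio(100, 110);          // axis starts at 0
const truncatedRatio = apparentRatio(100, 110, 90); // axis starts at 90

console.log(trueRatio, truncatedRatio); // prints: 1.1 2
```

The values differ by 10%, but with the axis starting at 90 one bar is drawn twice as long as the other.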
Pie charts

Unlike bar charts, a pie chart needs to show the whole dataset because it represents parts of a whole
Not good for comparison between the parts (use bar chart for that)
Pie chart & bar chart are similar in that they both have two mappings, to category and to value
Line chart

Generally "continuous process" that changes over time
Almost always the horizontal axis is time
When can line charts go wrong?

Compressing the aspect ratio can skew the representation of slope

Rule of thumb for line chart (not based on research, but common practice): "bank to 45 degrees"
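One way to turn "bank to 45 degrees" into a number is to pick the chart's height/width ratio so that the median absolute slope of the line segments is drawn at 45°. This is just one common variant of the heuristic and an assumption on my part, not something specified in the session; the helper names are hypothetical.

```javascript
// Median of a non-empty numeric array.
function median(values) {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Height/width ratio at which the median absolute segment slope is drawn at 45°.
// `points` is an array of [x, y] pairs sorted by x, with distinct x values.
// A completely flat series has median slope 0, so the result is not finite.
function bankTo45(points) {
  const xs = points.map(([x]) => x);
  const ys = points.map(([, y]) => y);
  const xRange = Math.max(...xs) - Math.min(...xs);
  const yRange = Math.max(...ys) - Math.min(...ys);
  const slopes = [];
  for (let i = 1; i < points.length; i++) {
    const [x0, y0] = points[i - 1];
    const [x1, y1] = points[i];
    slopes.push(Math.abs((y1 - y0) / (x1 - x0)));
  }
  // On screen, a data slope s becomes s * (height / yRange) / (width / xRange).
  // Setting that to 1 for the median slope and solving for height / width:
  return yRange / (xRange * median(slopes));
}

// A single shallow peak: the chart should be about half as tall as it is wide.
const aspect = bankTo45([[0, 0], [10, 1], [20, 0]]);
```

The resulting ratio could then be used to choose the chart's width and height.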
Other

Instructor answered question about "radar charts": "haven't seen a good use case for radar chart" :)
Bad practices are collected in "How to Lie with Statistics"

  
## session-2.md

      
Charts are not meant to be used for high-precision, just to get a rough sense of what the numbers are
Using retinal variables to encode data: e.g. simple scatterplot, one variable would be position
Retinal variables also called visual encodings

... discussion of JavaScript, how Observable works, and Plot API ...

Marks

Marks are specified by function calls. For example, Plot.dot:
Plot.plot({
  marks: [
//                   👇 This is mapping cars data to the position retinal variable
    Plot.dot(cars, { x: "power (hp)", y: "economy (mpg)" })
  ]
})
Plot & vega-lite (and similar libraries) share terminology (such as "marks") from common ancestry in:

Grammar of Graphics by Leland Wilkinson (1999)
Semiology of Graphics by Jacques Bertin (1967)

Plot.plot({
  marks: [
//                                                       👇 This is a constant color
    Plot.dot(cars, { x: "power (hp)", y: "economy (mpg)", fill: "steelblue" })
  ]
})
Plot.plot({
  marks: [
    //                                                     👇 This is another encoding
    Plot.dot(cars, { x: "power (hp)", y: "economy (mpg)", fill: "cylinders" })
  ]
})
In general, it's not a good idea to map too many data variables to visual variables; the chart becomes impossible to read
Distinguishing between continuous and categorical scales
Plot.plot({
  marks: [
    Plot.dot(cars, { x: "power (hp)", y: "economy (mpg)", fill: "cylinders" })
  ],
  color: {
    // By default this will be a _continuous_ color scheme
    // (because Plot inspects the cylinders values and sees numbers)
    legend: true
  }
})
Plot.plot({
  marks: [
    Plot.dot(cars, { x: "power (hp)", y: "economy (mpg)", fill: "cylinders" })
  ],
  color: {
    legend: true,
    // This makes the categories of color scheme explicit
    // (and changes the color scheme, because no longer continuous)
    domain: [3, 4, 6, 8]
  }
})
Dot plot

Different from scatterplot: has a categorical axis (like car make) and, within each category, a continuous scale that the values in that category are mapped to
Looking at cars dataset, can see the breakdown of different "tiers" of performance within each car make: for Pontiac, a couple at the high-end, three or so in mid-high-range, then rest clustered towards the lower end
Table view

Observable's table view is really nice! Gives DB-table view but with some smart histograms/filtering/sorting baked in.
Line chart

Different from dot: for any given date (x value) there can only be one (and should be exactly one) y value
Force a line chart to include 0 with Plot.ruleY([0]) mark
Data types (categorical vs continuous)

Important distinction between different types of data:

There's nothing between Kia & Honda, or nothing between letters "A" & "B", they're just names
can be unsorted, lots of ways they can be sorted, but they don't have an inherent ordering
this is categorical data

People tend to get this wrong (e.g., showing alphabet frequency as a line chart doesn't make sense, because the line implies continuity, as if there were values between individual letters)
One of the reasons radar charts are bad: reordering the axes changes the shape of the chart
Distinction is not always clear: for example, when aggregating continuous data, you might want to show large bins as categorical (years as bars)

  
## session-3.md

      
Session 3

Pop-out effect (pre-attentive vision): its importance is overstated in data vis
The thought is that a single colored bar stands out and is processed before anything else
Motion is effective, as is a distinctive color (single blue bar in an otherwise-gray bar chart)
Specifically useful for presentation, when pointing people to something
In early phases of data exploration don't want that (want to avoid any kind of "bias")
Early stages of data vis (exploration) should be boring
"Don't use presets because they often have colors" (this is true of a lot of chart libraries, Plot's defaults try to be boring)
Unusual charts are memorable, but take longer to process (e.g., Sankey)
Comparison

Not biasing, but helping to understand the chart: e.g., adding a 1/26 line for the alphabet frequency (those bars extending above are more frequent than if they were all equally probable)
or adding a reference line, such as the average, among many lines in a line chart
For comparisons between spiky data, useful to smooth the values (makes higher-level pattern easier to parse)
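The 1/26 reference line for letter frequencies can be sketched in plain JavaScript as below. The sample sentence and the helper name are my own choices for illustration, not data from the session.

```javascript
// Relative frequency of each letter a-z in `text` (case-insensitive),
// in order of first appearance.
function letterFrequencies(text) {
  const letters = text.toLowerCase().replace(/[^a-z]/g, "");
  const counts = new Map();
  for (const ch of letters) counts.set(ch, (counts.get(ch) ?? 0) + 1);
  return [...counts].map(([letter, count]) => ({
    letter,
    frequency: count / letters.length
  }));
}

// Where the reference line would sit if all 26 letters were equally likely.
const uniform = 1 / 26;

const freqs = letterFrequencies("the quick brown fox jumps over the lazy dog");

// Letters whose bars would extend above the reference line.
const aboveUniform = freqs
  .filter((d) => d.frequency > uniform)
  .map((d) => d.letter);
```

In a chart, `uniform` would be drawn as a horizontal rule across the bars, so the reader has something to compare each bar against without any single bar being highlighted.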

  
## session-4.md

      
Session 4: Transformations

Grouping

Binning & Histogram

Binning creates a histogram, shows distribution of data
If looking at a bunch of overlapping data, could use dodge, which "pushes apart" the plotted data points (dots) so they stack instead of overlapping
Sometimes affected by quantization (or reveals quantization, you can see higher aggregations around specific points, like multiples of 10)
(Same as beeswarm chart? I think, but I missed what he said there)
Binning does something similar to dodge, but slices the data into ranges (bins); you can control the granularity/size of the bins to reduce the spikiness/noise of the raw data
Can also pass in explicit threshold values instead of number of bins
"Noise can be interesting (might point to errors in data gathering), but could just be noise"
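To make the idea of bins and explicit thresholds concrete, here is a minimal hand-rolled sketch. It is not Plot's bin transform; the function names and sample values are hypothetical.

```javascript
// Count values into half-open bins [t[i], t[i + 1]) defined by sorted thresholds.
function bin(values, thresholds) {
  const bins = [];
  for (let i = 0; i + 1 < thresholds.length; i++) {
    bins.push({ x0: thresholds[i], x1: thresholds[i + 1], count: 0 });
  }
  for (const v of values) {
    const b = bins.find((d) => v >= d.x0 && v < d.x1);
    if (b) b.count++; // values outside every bin are ignored
  }
  return bins;
}

// Evenly spaced thresholds: `count` bins between min and max.
function evenThresholds(min, max, count) {
  return Array.from(
    { length: count + 1 },
    (_, i) => min + ((max - min) * i) / count
  );
}

const values = [1, 2, 3, 11, 12, 25];
const coarse = bin(values, evenThresholds(0, 30, 3)); // three bins of width 10
const explicit = bin(values, [0, 5, 30]);             // explicit thresholds
```

Fewer, wider bins smooth out the noise; narrower bins show more of it.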
Stacking

Stacking can be a bad idea.
(Mentions bad idea to have too many categories)
Comparison can be hard with stacked bar charts (bars aren't starting at same point on axis).
People have a harder time with stacked bar charts than pie charts (comparison also difficult with pie charts).
Stacked area: same issue, can read the area on the bottom pretty well, but once they're stacked it's hard to compare.
Hard to read the vertical height of each band (which is the actual value), especially on steep slopes, where the angle distorts how tall the band looks
Other views (silhouette, wiggle) try to make the stack easier to interpret
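A small sketch of what stacking does to the data, showing why only the bottom series is easy to compare: every other series gets a baseline that moves with whatever is underneath it. The function and the sample rows are hypothetical.

```javascript
// Stack series in order: each value sits on top of the running total so far.
// `rows` is an array of objects; `keys` lists the series from bottom to top.
function stack(rows, keys) {
  return rows.map((row) => {
    let total = 0;
    return keys.map((key) => {
      const y1 = total; // where this segment starts
      total += row[key];
      return { key, y1, y2: total }; // where it ends
    });
  });
}

const stacked = stack(
  [{ a: 1, b: 2 }, { a: 3, b: 2 }],
  ["a", "b"]
);
// Series "b" has the same value (2) in both rows,
// but its baseline y1 differs (1 in the first row, 3 in the second).
```

The reader has to compare `y2 - y1` across segments that start at different heights, which is the hard part.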
Windowing

For continuous value, smooth by taking average over window (neighborhood of values)
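Windowed smoothing can be sketched as a centered moving average. This is a simplified stand-in, not Plot's window transform: the function name is hypothetical, and shrinking the window at the edges is my own choice for handling the ends of the series.

```javascript
// Centered moving average over a window of `k` values (k should be odd).
// Near the ends of the series the window shrinks to the values that exist.
function movingAverage(values, k) {
  const half = Math.floor(k / 2);
  return values.map((_, i) => {
    const start = Math.max(0, i - half);
    const end = Math.min(values.length, i + half + 1);
    const window = values.slice(start, end);
    return window.reduce((sum, v) => sum + v, 0) / window.length;
  });
}

const spiky = [1, 9, 2, 8, 3, 7];
const smooth = movingAverage(spiky, 3);
```

A larger `k` gives a smoother line but hides more of the short-term variation.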

  
## session-5.md

      
Session 5:

(I was very tired and didn't take many notes on this one)
Facets
Grouped bar charts give you something that is difficult in other representations:

comparison within a group
comparison across groups

Index chart
Stock prices, e.g.: all series start from one shared point (0% change, i.e. 1× the initial value), and each is then plotted as a multiple of its initial value
Shows change (not comparison of absolute value)
In Plot, you can use Plot.normalizeY for an index chart
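The arithmetic behind an index chart is just dividing each series by its own first value. A minimal sketch follows, with made-up prices and a hypothetical helper name; as far as I understand, this is roughly what normalizing against the first value does.

```javascript
// Express each value as a multiple of the first value in its series.
function indexSeries(values) {
  const [first] = values;
  return values.map((v) => v / first);
}

const cheapStock = indexSeries([50, 100, 25]);       // [1, 2, 0.5]
const priceyStock = indexSeries([2000, 2500, 3000]); // [1, 1.25, 1.5]
```

Both series now start at 1, so the chart compares relative change even though the absolute prices differ by a factor of 40.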