agness/F19 W4995 Dataviz Midterm Topics.md

Why Visualize?


The goals of visualization

Exploratory: to uncover a relationship in the data, to analyze data
Explanatory: to communicate a relationship in the data, to present data


Anscombe’s Quartet

Datasets that have identical summary statistics can appear very different when graphed.


Designing


Divergent vs. convergent idea generation
Example critique: Tufte's criticisms of the Challenger O-ring charts

Part of the chart (the legend) was on the previous slide, so the chart could not be read on its own
Chartjunk: good design directs attention to the data (the rocket shapes distract when dots would suffice)
Obscures Cause and Effect: O-ring damage is depicted as scattered location data when a summary such as “% damage” would be clearer
Wrong Order: ordering by launch sequence hides the relationship between damage and temperature


How we see

We don’t view in a fixed order
We see first what stands out
We see only a few things at once
We seek meaning and make connections
We rely on conventions and metaphors


Ingredients of a good critique

Encourage the designer to clarify their thought process.
Challenge the designer's assumptions.
Encourage the designer to consider alternative perspectives.
Encourage the designer to spell out the implications and consequences of their design.


How to practice design

Define the goal of the graphic

What story you want to tell
The key points to be made
What readers will be able to understand/accomplish with the viz


Diversify

Gather inspiration
Sketch or storyboard ideas


Refine

Synthesize best ideas in prioritized order
Move design to the computer and complete it


Data Models


Data types (Q, O, C) are semantic models of data (a.k.a. “levels of measurement” in statistics)
Programming “data type” vs. our conceptual model

Data types like int, float... are formal descriptions
Conceptual models are mental constructions: they include semantics and support reasoning
Examples (data type vs. conceptual model):

1D floats vs. temperatures
3D vector of floats vs. spatial location


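Data types (Q ⊃ O ⊃ C: each type supports every operation of the types to its right)

C: Categorical
O: Ordinal
Q-interval: location of zero is arbitrary (e.g. °C, calendar dates), so differences are meaningful but ratios are not
Q-ratio: zero is fixed (e.g. length, mass), so ratios are meaningful too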
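The nesting can be made concrete by listing which operations each level supports (Stevens's levels of measurement). `Level` and the operator strings are illustrative names, not part of any library.

```python
from enum import IntEnum

class Level(IntEnum):
    """Levels of measurement, ordered so that each level adds operations."""
    CATEGORICAL = 1  # C
    ORDINAL = 2      # O
    INTERVAL = 3     # Q-interval
    RATIO = 4        # Q-ratio

# Operations that first become meaningful at each level.
_NEW_OPS = {
    Level.CATEGORICAL: {"==", "!="},  # same / different category
    Level.ORDINAL: {"<", ">"},        # ordering
    Level.INTERVAL: {"-"},            # differences
    Level.RATIO: {"/"},               # ratios need a true zero
}

def allowed_ops(level: Level) -> set[str]:
    """All operations meaningful at `level`, including those inherited from lower levels."""
    return set().union(*(ops for lvl, ops in _NEW_OPS.items() if lvl <= level))

print(sorted(allowed_ops(Level.ORDINAL)))
# 20 °C - 10 °C is meaningful, but 20 °C is not "twice as hot"; 20 kg is twice 10 kg.
print("/" in allowed_ops(Level.INTERVAL), "/" in allowed_ops(Level.RATIO))
```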


Taxonomy of viz types

Ordinal-ordinal
Ordinal-quantitative
Quantitative-quantitative


Tour of Visualization Zoo

Stacked / Stream graphs
Small multiples
Horizon graph
Stem & leaf plot
Scatterplot Matrix
Parallel coordinates
Flow / Sankey diagram
Choropleth map
Graduated symbol map
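A horizon graph folds a time series into a short strip. The value range is cut into equal bands, negative values are mirrored upward (usually in a second hue), and the bands are layered on top of one another with higher bands in darker shades. Below is a minimal sketch of the banding step only. It assumes a fixed band height and band count, which are hypothetical parameters.

```python
def horizon_bands(value: float, band_height: float, n_bands: int) -> tuple[int, list[float]]:
    """Split one value into horizon-graph bands (drawing not shown).

    Returns (sign, fills): sign is +1 or -1 (negative values are mirrored and
    usually drawn in a second hue), and fills[i] is the filled fraction of band i.
    All bands share the same vertical strip; higher bands get darker shades.
    """
    sign = -1 if value < 0 else 1
    remaining = abs(value)
    fills = []
    for _ in range(n_bands):
        fills.append(min(remaining, band_height) / band_height)
        remaining = max(remaining - band_height, 0.0)
    return sign, fills  # anything beyond n_bands * band_height is clipped

for v in [0.5, 2.5, -1.2]:
    print(v, horizon_bands(v, band_height=1.0, n_bands=3))
```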


Visualization tools, listed in order of decreasing ease of use and increasing expressiveness:

Chart typologies: e.g. Excel
Visual analysis grammars: ggplot, Vega, Tableau
Visualization grammars: D3
Graphics APIs: Processing, OpenGL


Visualization Grammar

Data: Input data to visualize
(Data) Transforms: grouping, binning, statistics
Marks: Data-representative graphics
Scales: Map data values to visual values
Guides: Axes & legends to show scales
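The five components can be sketched as a small pipeline. This is not D3 or Vega. `bin_counts`, `linear_scale`, `bar_marks` and the mark dictionaries are hypothetical helpers that only show how data flows through a transform, scales, marks and a guide. `linear_scale` loosely mirrors what a linear scale such as D3's `d3.scaleLinear().domain(...).range(...)` computes.

```python
import math

def bin_counts(values, width):
    """Transform: group values into equal-width bins -> {bin_start: count}."""
    counts = {}
    for v in values:
        start = math.floor(v / width) * width
        counts[start] = counts.get(start, 0) + 1
    return dict(sorted(counts.items()))

def linear_scale(domain, rng):
    """Scale: map a data value in `domain` linearly onto the visual interval `rng`."""
    (d0, d1), (r0, r1) = domain, rng
    return lambda v: r0 + (v - d0) / (d1 - d0) * (r1 - r0)

def bar_marks(binned, x, y, width):
    """Marks: one rectangle per bin, positioned and sized through the scales."""
    return [
        {"mark": "rect", "x": x(start), "width": x(start + width) - x(start),
         "y": y(count), "height": y(0) - y(count)}
        for start, count in binned.items()
    ]

data = [1, 2, 2, 3, 7, 8, 8, 9]                        # Data
binned = bin_counts(data, width=5)                     # Transform -> {0: 4, 5: 4}
x = linear_scale((0, 10), (0, 200))                    # Scale: data x -> pixels
y = linear_scale((0, max(binned.values())), (100, 0))  # Scale: count -> pixels (SVG y grows downward)
marks = bar_marks(binned, x, y, width=5)
guides = {"x-axis ticks": [(t, x(t)) for t in (0, 5, 10)]}  # Guide: the axis shows the x scale
print(marks)
```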


Summary: Data Models

Data can be shaped and transformed into types
The type of the data partially determines the acceptable forms of visualization
The communication goals determine the data and its type


Summary: Visualization Tools

Grammar of graphics defines a modular and scalable way to create expressive graphics
Different tools are designed for different uses (e.g. are you doing exploratory or explanatory vis?)


Exploratory Data Analysis


EDA is an approach/philosophy for data analysis using graphical methods to

uncover underlying structure
detect outliers and anomalies
test underlying assumptions


EDA was introduced by the statistician John Tukey, along with new techniques for visualizing and summarizing data:

5-number summary
box plots (visual 5-number)
stem & leaf diagrams
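Tukey's summaries can be sketched with the standard library. Quartile conventions vary (Tukey's hinges, different interpolation methods), so the quartiles from `statistics.quantiles(..., method="inclusive")` may differ slightly from another tool's. The 1.5 × IQR fence is the usual box-plot rule for flagging outliers. The sample values are made up for illustration.

```python
import statistics

def five_number(data):
    """Five-number summary: min, lower quartile, median, upper quartile, max."""
    xs = sorted(data)
    q1, med, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    return xs[0], q1, med, q3, xs[-1]

def tukey_outliers(data, k=1.5):
    """Box-plot rule: flag points more than k * IQR outside the quartiles."""
    _, q1, _, q3, _ = five_number(data)
    iqr = q3 - q1
    return [v for v in data if v < q1 - k * iqr or v > q3 + k * iqr]

def stem_and_leaf(data):
    """Stem-and-leaf lines for non-negative integers: tens digit | units digits."""
    xs = sorted(data)
    lines = []
    for stem in range(xs[0] // 10, xs[-1] // 10 + 1):  # keep empty stems to show gaps
        leaves = [str(v % 10) for v in xs if v // 10 == stem]
        lines.append(f"{stem} | {' '.join(leaves)}".rstrip())
    return lines

scores = [12, 15, 21, 24, 24, 30, 38, 41, 45, 47, 52]  # hypothetical sample
print(five_number(scores))
print(tukey_outliers(scores + [120]))
print("\n".join(stem_and_leaf(scores)))
```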


Difference from classical analysis

Exploratory data analysis ~ detective work ~ gathering evidence
Confirmatory data analysis ~ court trial ~ evaluating evidence


Iterative Hypotheses Refinement during EDA

Formulate hypotheses > gather support to later confirm, refute, or drop them > repeat.


Characteristics of exploratory graphs

made quickly
a large number are made
goal is for personal understanding
axes/legends are generally cleaned up (later)
color/size are primarily used for information


Overview of process steps

import/clean
single variable exploration
pair-wise exploration
multivariate analysis


Visual Encoding


Marks = basic graphical element in image

Point (0-dimensional)
Line (1-dimensional)
Area (2-dimensional)


Channels = ways to control appearance of marks

position
shape
color
tilt
size


Visual Encoding = Mapping data to visual variables

Assign data fields (Q, O, C) to visual channels (x, y, color, size, etc.) for a graphical mark (point, bar, line, etc.)
...Also, choose encoding parameters (log scale, sorting, etc.) and data transformations (bin, aggregate, etc.)
...To maximize expressiveness and effectiveness.


Expressiveness

Tell the truth and nothing but the truth (don’t lie, and don’t lie by omission)
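An encoding can be written down as a spec that assigns fields to channels. The layout below is loosely modeled on Vega-Lite's `mark`/`encoding` JSON, but it uses this sheet's Q/O/C type codes. The channel table is a simplified, assumed set of expressiveness rules and does not come from any library. For example, shape has no inherent order, so it should not carry ordered data, and size implies order, so it should not carry categories. The field names are hypothetical.

```python
# Data types each channel can show without implying structure the data lacks
# (simplified assumption for illustration).
CHANNEL_TYPES = {
    "x": {"Q", "O", "C"},
    "y": {"Q", "O", "C"},
    "size": {"Q", "O"},
    "tilt": {"Q", "O"},
    "color": {"Q", "O", "C"},  # assumes a sequential scheme for Q/O, qualitative for C
    "shape": {"C"},
}

def expressiveness_problems(spec):
    """List channel assignments that could misrepresent the field's data type."""
    problems = []
    for channel, enc in spec["encoding"].items():
        if enc["type"] not in CHANNEL_TYPES.get(channel, set()):
            problems.append(f"{enc['field']} ({enc['type']}) should not be encoded with {channel}")
    return problems

spec = {
    "mark": "point",
    "encoding": {
        "x": {"field": "weight", "type": "Q"},
        "y": {"field": "mpg", "type": "Q"},
        "shape": {"field": "cylinders", "type": "O"},  # ordered data on an unordered channel
    },
}
print(expressiveness_problems(spec))
```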


Effectiveness

Use encodings that people decode better (where better = faster and/or more accurate)
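Effectiveness is often turned into a ranking of channels for each data type. Mackinlay's and Munzner's rankings both put position first. The lists below are a simplified ranking restricted to the channels named on this sheet. The greedy assignment is an illustrative heuristic, not an algorithm from the course.

```python
# Simplified effectiveness rankings (best first), limited to channels on this sheet.
# Ordered data: position, then tilt, then size, then color lightness.
# Categorical data: position, then color hue, then shape.
RANKING = {
    "Q": ["x", "y", "tilt", "size", "color"],
    "O": ["x", "y", "tilt", "size", "color"],
    "C": ["x", "y", "color", "shape"],
}

def assign_channels(fields):
    """Greedily give each (name, type) field, most important first, its best unused channel."""
    used, encoding = set(), {}
    for name, dtype in fields:
        channel = next((c for c in RANKING[dtype] if c not in used), None)
        if channel is None:
            raise ValueError(f"no effective channel left for {name}")
        used.add(channel)
        encoding[name] = channel
    return encoding

print(assign_channels([("price", "Q"), ("rating", "O"), ("brand", "C"), ("stock", "Q")]))
```

Listing the most important field first gives it the channel that readers decode most accurately.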


Interaction


Color

Color Spaces

RGB (additive) is used for screens, CMYK (subtractive) for print
HSV stands for

Hue: red vs. blue vs. green
Saturation: amount of color
Value: lightness between black and white


CIELAB: designed to be perceptually uniform with regard to human vision (equal distances correspond roughly to equal perceived color differences)
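Python's standard `colorsys` module converts RGB to HSV. The sRGB-to-CIELAB conversion can be written out from the standard formulas: undo the sRGB gamma, apply the sRGB/D65 matrix to get XYZ, then apply the L*a*b* nonlinearity. The sketch below uses rounded constants. It shows that the same RGB step can produce very different steps in perceived lightness L*.

```python
import colorsys

def srgb_to_lab(rgb):
    """Convert an 8-bit sRGB triple to CIELAB (D65 white), using rounded standard constants."""
    def linearize(c8):
        c = c8 / 255
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4  # undo sRGB gamma

    r, g, b = (linearize(c) for c in rgb)
    # Linear sRGB -> CIE XYZ, each normalized by the D65 reference white.
    x = (0.4124 * r + 0.3576 * g + 0.1805 * b) / 0.95047
    y = (0.2126 * r + 0.7152 * g + 0.0722 * b) / 1.00000
    z = (0.0193 * r + 0.1192 * g + 0.9505 * b) / 1.08883

    def f(t):
        delta = 6 / 29
        return t ** (1 / 3) if t > delta ** 3 else t / (3 * delta ** 2) + 4 / 29

    fx, fy, fz = f(x), f(y), f(z)
    return 116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz)  # L*, a*, b*

# HSV with hue, saturation, and value each scaled to 0..1 by colorsys.
print(colorsys.rgb_to_hsv(1.0, 0.0, 0.0))  # pure red

# The same RGB step (+60 in one channel, starting from black) gives very different L*.
print(srgb_to_lab((0, 60, 0))[0], srgb_to_lab((0, 0, 60))[0])
```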


Color schemes

sequential: perceptually equidistant steps for ordered (O or Q) data
divergent: perceivable “break” in middle of ordered range
qualitative aka Categorical: equidistant “bag of colors”, no order
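A diverging scheme needs its neutral color at a meaningful midpoint. The minimal sketch below assumes a blue-white-red ramp with assumed endpoint colors. For brevity it uses plain linear interpolation in RGB. Interpolating in a perceptually uniform space such as CIELAB would give more even steps.

```python
def lerp_rgb(c0, c1, t):
    """Linearly interpolate between two RGB triples, t in [0, 1]."""
    return tuple(round(a + (b - a) * t) for a, b in zip(c0, c1))

def diverging_color(value, vmin, center, vmax,
                    low=(33, 102, 172), mid=(247, 247, 247), high=(178, 24, 43)):
    """Map value onto a diverging ramp whose neutral color sits exactly at `center`.

    Each side is scaled separately, so the break stays at the meaningful midpoint
    even when the range is asymmetric (e.g. -10 .. 0 .. +40).
    """
    value = min(max(value, vmin), vmax)  # clamp out-of-range values
    if value < center:
        return lerp_rgb(mid, low, (center - value) / (center - vmin))
    return lerp_rgb(mid, high, (value - center) / (vmax - center))

print(diverging_color(0, -10, 0, 40))   # neutral midpoint
print(diverging_color(-5, -10, 0, 40))  # halfway toward the low end
```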


Summary: Color

Equal distances in digital color space may not be perceived as equidistant colors.
Appropriate color scheme depends on

data model (Q, O, C)
whether a fixed point is meaningful
whether both ends of data range are meaningful (single vs. multi-hue)
whether there should be positive/negative associations (e.g. red vs. green) or brand associations (e.g. red for the Republican Party, blue for the Democratic Party)


Color choice affects the accuracy, readability, and emotional impact of your visualization.


Gulfs of Execution and Evaluation

Good user interfaces minimize how much the user has to think to use the software. Specifically, we want to minimize:

Gulf of Execution = the difference between what you want to do and the actions the software allows you to take
Gulf of Evaluation = the amount of effort it takes you to figure out whether what the software did is what you wanted to do


User interface ideal: “direct manipulation”

visual representation of objects and actions (vs. command line or text-menu)
selection by pointing (vs. typing in SQL)
Rapid, incremental and reversible actions (easier to understand/revise allowable actions at each state; gulf of execution)
Immediate and continuous display of results (so user knows what system is doing at each state; gulf of evaluation)


Taxonomy of Interactions

Data and View Specification

Visualize, Filter, Sort, Derive


View Manipulation

Select, Navigate, Coordinate, Organize


Process and Provenance

Record, Annotate, Share, Guide


Geometric vs. Semantic zoom

Geometric zoom does not give new information at different zoom levels
Semantic zoom yields more detailed information at deeper zoom levels (zooms into the information hierarchy as well as visually zooming)
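The difference fits in a few lines. Geometric zoom only rescales coordinates, while semantic zoom changes what is drawn as the zoom factor crosses thresholds. The detail levels and thresholds below are hypothetical.

```python
def geometric_zoom(points, k):
    """Geometric zoom: scale every coordinate by k; the content itself never changes."""
    return [(x * k, y * k) for x, y in points]

# Hypothetical information hierarchy for a map: (minimum zoom factor, what to draw).
DETAIL_LEVELS = [(1, "countries"), (4, "states"), (16, "cities")]

def semantic_zoom(k):
    """Semantic zoom: pick the deepest level of the hierarchy unlocked at zoom factor k."""
    visible = [label for min_k, label in DETAIL_LEVELS if k >= min_k]
    return visible[-1] if visible else DETAIL_LEVELS[0][1]

print(geometric_zoom([(1, 2), (3, 4)], 2))  # same two points, just farther apart
print(semantic_zoom(1), semantic_zoom(5), semantic_zoom(20))
```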


Selection Methods

Point Selection

Mouse Hover
Click / Touch / Tap


Brushing = Region Selection

Rubber-band (rectangular) or Lasso (freehand)


Area cursors

Bubble Cursor, Voronoi selection
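For point-like targets, a bubble cursor selects whichever target is nearest to the pointer. This is the same as asking which Voronoi cell contains the pointer. Below is a minimal sketch. The function name is hypothetical, and `math.dist` needs Python 3.8+.

```python
import math

def bubble_cursor_pick(cursor, targets):
    """Return the index of the target nearest to the cursor (point targets assumed).

    Picking the nearest point equals testing which Voronoi cell the cursor lies in,
    so in this sketch every position selects something, unlike point selection,
    which requires hitting a small mark exactly.
    """
    return min(range(len(targets)), key=lambda i: math.dist(cursor, targets[i]))

points = [(10, 10), (50, 40), (90, 15)]
print(bubble_cursor_pick((60, 30), points))  # index of (50, 40)
```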


Cross-filtering = Brushing, Linking & Highlighting

Brush to choose a subgroup of the data points. Connect data from one part of the display to another; indicate this connection visually to make selected items stand out.
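Cross-filtering comes down to two steps. A brush in one view produces a set of selected record ids, and every linked view highlights the records in that set. Below is a minimal sketch with a rectangular brush over hypothetical toy records.

```python
def brush(records, x_field, y_field, x_range, y_range):
    """Rectangular (rubber-band) brush: ids of records whose (x, y) falls inside the box."""
    (x0, x1), (y0, y1) = sorted(x_range), sorted(y_range)
    return {r["id"] for r in records
            if x0 <= r[x_field] <= x1 and y0 <= r[y_field] <= y1}

def linked_view(records, selected_ids, field):
    """A second view of the same records, flagging the brushed ones for highlighting."""
    return [(r[field], r["id"] in selected_ids) for r in records]

# Hypothetical records shown in two views: a scatterplot (hp vs. mpg) and a list by origin.
cars = [
    {"id": 1, "hp": 90, "mpg": 32, "origin": "Japan"},
    {"id": 2, "hp": 150, "mpg": 20, "origin": "USA"},
    {"id": 3, "hp": 110, "mpg": 28, "origin": "Europe"},
    {"id": 4, "hp": 200, "mpg": 15, "origin": "USA"},
]
selection = brush(cars, "hp", "mpg", x_range=(80, 120), y_range=(25, 35))
print(linked_view(cars, selection, "origin"))
```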


Principles of Interaction in Visualization

Rapid, reversible feedback
Immediate and continuous display of results
Overview first, then details on demand (Shneiderman: “overview first, zoom and filter, then details-on-demand”)


Evaluation & Perception


Evaluation

Four-level Evaluation Framework by Munzner

Domain situation
Data/task abstraction
Vis encoding/interaction idiom
Algorithm


Purpose of levels: separately analyze whether the goals of each level have been met. With a poor choice at the abstraction level, even perfect choices at the idiom and algorithm levels will not result in a vis that solves the problem.
Domain: the field of interest of the target users of a vis tool.
Task Abstraction: questions from different domains can map to the same abstract vis tasks, e.g. browsing, comparing, summarizing.
Two approaches for vis design

problem-driven work: start at understanding domain and then identify appropriate abstraction
technique-driven work: invent new algorithms or idioms and then identify domains in which they would be useful


Perception

Pre-attentive attributes = Visual attributes whose detection precedes conscious attention

Form: length, width, orientation, size, shape
Color: hue, intensity
spatial grouping, motion


Perception is always relative

Stevens’s law: “doubling the physical brightness results in a perception that is considerably less than twice as bright”

(Definition: the perceived magnitude S of a stimulus is a power function of its physical intensity I: S = k·I^n, where the exponent n ranges from sublinear (0.5 for brightness) to superlinear (3.5 for electric current).)
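Stevens's power law is usually written with a scaling constant, S = k·I^n. Because k cancels, the ratio of two perceived magnitudes depends only on the intensity ratio: S₂/S₁ = (I₂/I₁)^n. A one-line check with the exponents quoted above:

```python
def perceived_ratio(intensity_ratio: float, n: float) -> float:
    """Stevens's power law: perceived-magnitude ratio when intensity is multiplied by
    intensity_ratio (the constant k cancels out)."""
    return intensity_ratio ** n

print(perceived_ratio(2, 0.5))  # brightness: doubling looks about 1.41x as bright
print(perceived_ratio(2, 3.5))  # electric current: doubling feels about 11.3x as strong
```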


Weber’s law = “the amount of length difference we can detect is a percentage of the object’s length.”

(Definition: the just-noticeable difference δI in stimulus intensity I is a fixed fraction K of that intensity: δI/I = K.)
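Weber's law predicts whether a difference will be visible. The sketch below uses a hypothetical Weber fraction k = 0.02, chosen only for illustration. Real fractions depend on the stimulus and the viewer.

```python
def just_noticeable_difference(intensity: float, k: float) -> float:
    """Weber's law: the smallest detectable change is a fixed fraction k of the intensity."""
    return k * intensity

def is_noticeable(a: float, b: float, k: float) -> bool:
    """Would a viewer likely notice the difference between magnitudes a and b?"""
    return abs(a - b) >= just_noticeable_difference(min(a, b), k)

# With k = 0.02 (hypothetical), a 1 mm difference is detectable between 10 mm bars
# but falls below threshold between 100 mm bars.
print(is_noticeable(10, 11, k=0.02), is_noticeable(100, 101, k=0.02))
```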


Gestalt (from German “form”): patterns that transcend the stimuli used to create them:

Proximity
Similarity
Continuity
Connectedness


Summary: Perception

We do not see 1:1, and we do not attend to everything that we see
We’re drawn to patterns we know and expect
Our working memory is limited
Use these techniques to direct attention in your visualizations and to build a visual hierarchy


Design Principles

Above all else, show the data. Maximize the data-ink ratio.
Clear labeling and explanations on the graphic.
Number of information-carrying dimensions should not exceed number of dimensions in data.
The representation of numbers, as physically measured on the surface of the graphic itself, should be directly proportional to the numerical quantities measured.
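Tufte measures violations of this principle with the Lie Factor: (size of effect shown in graphic) / (size of effect in data), where an effect is the relative change between two values. A value near 1 is honest. The sketch below applies it to a common mistake: scaling a circle's radius, rather than its area, with the data.

```python
import math

def relative_change(first: float, second: float) -> float:
    """Size of an effect: relative change from the first value to the second."""
    return (second - first) / first

def lie_factor(data_first, data_second, graphic_first, graphic_second):
    """Tufte's Lie Factor: size of effect in the graphic / size of effect in the data."""
    return relative_change(graphic_first, graphic_second) / relative_change(data_first, data_second)

# The data doubles (1 -> 2). If the radius is scaled with the value, the circle's
# area on the page quadruples: pi * 1**2 -> pi * 2**2.
radius_scaled = lie_factor(1, 2, math.pi * 1 ** 2, math.pi * 2 ** 2)
# If the area is scaled with the value instead, the effect sizes match.
area_scaled = lie_factor(1, 2, math.pi * 1, math.pi * 2)
print(radius_scaled, area_scaled)
```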


Skills


Be able to critique pros/cons of a visualization.
Be able to deconstruct/reconstruct a visualization to/from a set of visual encodings.

Components: Data types, data transformations, marks, channels, scales
Communication Goals: the underlying data/information priority intended by the vis author, and the corresponding visual hierarchy


For the lists of definitions/taxonomies above, given a screenshot of a visualization, be able to identify which one it shows.
Be able to interpret D3 code containing basic functions on this sheet.