agness/W4995.004 F2018 Midterm Topics.md

## W4995.004 F2018 Midterm Topics.md

      
Why Visualize?


The goals of visualization

Exploratory: to uncover a relationship in the data, to analyze data
Explanatory: to communicate a relationship in the data, to present data


Anscombe’s Quartet

Datasets that have identical summary statistics can appear very different when graphed.
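As a quick check of this claim, the sketch below computes a few summary statistics for the four datasets of the quartet using only the standard library. The helper names (`pearson_r`, `summarize`) are illustrative and not part of the course material; values are rounded to two decimals for comparison.

```python
from math import sqrt
from statistics import mean, variance

# Anscombe's quartet; datasets I-III share the same x values.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def pearson_r(xs, ys):
    """Pearson correlation coefficient, computed by hand."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def summarize(xs, ys):
    """Summary statistics rounded to two decimals."""
    return {
        "mean_x": round(mean(xs), 2),
        "var_x": round(variance(xs), 2),  # sample variance
        "mean_y": round(mean(ys), 2),
        "r": round(pearson_r(xs, ys), 2),
    }


summaries = {name: summarize(xs, ys) for name, (xs, ys) in quartet.items()}
for name, stats in summaries.items():
    print(name, stats)
```

The printed rows agree with each other at this precision, yet a scatterplot of each dataset shows a linear trend, a curve, a line with one outlier, and a vertical cluster with one leverage point.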


Designing


Divergent vs. convergent idea generation
Tufte Challenger Criticisms

Disappearing Legend: part of the chart (legend) was on previous slide
Chartjunk: Good design brings absolute attention to the data (depicting rocket shapes prominently misplaces priorities)
Lack of Clarity in Depicting Cause and Effect: O-ring damage depicted as scattered location data when a summary “% damage” would be clearer
Wrong Order: ordering by sequence of launch conceals the possible relationship between O-ring damage and temperature


How we see

We don’t view in a fixed order
We see first what stands out
We see only a few things at once
We seek meaning and make connections
We rely on conventions and metaphors


Ingredients of a good critique

Encourage the designer to clarify their thought process.
Challenge the designer's assumptions.
Encourage the designer to consider alternative perspectives.
Encourage the designer to spell out the implications and consequences of their design.


How to practice design

Define the goal of the graphic

The story you want to tell
The key points to be made
What readers will be able to understand/accomplish with the viz


Diversify

Gather inspiration
Sketch or storyboard ideas


Refine

Synthesize best ideas in prioritized order
Move design to the computer and complete it


Data Models


Data types (Q, O, C) are semantic models of data
Data model vs. Conceptual model

Data models are formal descriptions: Math: sets with operations on them
Conceptual models are mental constructions: Include semantics and support reasoning
Examples (data model vs. conceptual model)

1D floats vs. temperatures
3D vector of floats vs. spatial location


Data types (Q ⊃ O ⊃ C)

C: Categorical
O: Ordinal
Q-interval: the zero point is arbitrary, so differences are meaningful but ratios are not
Q-ratio: the zero point is fixed, so ratios are meaningful
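A small sketch of the interval vs. ratio distinction, using temperature as the example: Celsius is an interval scale (its zero is a convention), while Kelvin is a ratio scale (its zero is absolute). The specific readings are made up for illustration.

```python
def celsius_to_kelvin(celsius):
    """Convert a Celsius reading (interval scale) to Kelvin (ratio scale)."""
    return celsius + 273.15


# Two hypothetical temperature readings in Celsius.
a_c, b_c = 10.0, 20.0
a_k, b_k = celsius_to_kelvin(a_c), celsius_to_kelvin(b_c)

# Differences survive the change of zero point.
diff_celsius = b_c - a_c
diff_kelvin = b_k - a_k

# Ratios do not: 20 C is not "twice as hot" as 10 C.
ratio_celsius = b_c / a_c
ratio_kelvin = b_k / a_k

print(f"difference: {diff_celsius:.2f} C vs {diff_kelvin:.2f} K")
print(f"ratio: {ratio_celsius:.3f} (Celsius) vs {ratio_kelvin:.3f} (Kelvin)")
```

The difference is the same on both scales, while the ratio changes with the choice of zero, which is why encodings that invite ratio judgments (such as bar length from a zero baseline) suit Q-ratio data better than Q-interval data.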


Taxonomy of viz types

Ordinal-ordinal
Ordinal-quantitative
Quantitative-quantitative


Tour of Visualization Zoo

Stacked graphs
Small multiples
Horizon graph
Stem & leaf plot
Scatterplot Matrix
Parallel coordinates
Flow / Sankey diagram
Choropleth map
Graduated symbol map


Visualization Tools

Chart typologies: e.g. Excel
Visual analysis grammars: ggplot, Vega, Tableau
Visualization grammars: D3
Graphics APIs: Processing, OpenGL


Visualization Grammar

Data: Input data to visualize
(Data) Transforms: grouping, binning, stats, (later: projection, layout)
Marks: Data-representative graphics
Scales: Map data values to visual values
Guides: Axes & legends to show scales
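The five components above can be sketched as a tiny text-based pipeline. This is only an illustration of how the pieces fit together, not how ggplot, Vega, or D3 implement them; the data values and the helper names `bin_counts` and `linear_scale` are hypothetical.

```python
from collections import Counter

# Data: hypothetical input values to visualize.
data = [1, 2, 2, 3, 3, 3, 7, 8, 8, 9]


# Transform: bin the values and count how many fall in each bin.
def bin_counts(values, width):
    counts = Counter((v // width) * width for v in values)
    return sorted(counts.items())


# Scale: build a linear map from a data domain to a visual range.
def linear_scale(domain, visual_range):
    (d0, d1), (r0, r1) = domain, visual_range

    def scale(value):
        return r0 + (value - d0) / (d1 - d0) * (r1 - r0)

    return scale


binned = bin_counts(data, width=5)
max_count = max(count for _, count in binned)
bar_length = linear_scale((0, max_count), (0, 30))  # counts -> characters

# Marks: one text bar per bin. Guides: the bin label and the count
# printed next to each bar play the role of axis and legend.
for start, count in binned:
    label = f"{start}-{start + 4}"
    print(f"{label:>5} | {'#' * round(bar_length(count))} {count}")
```

Keeping the scale separate from the marks is the modular part: swapping `linear_scale` for a log scale, or the text bars for another mark, would leave the other stages unchanged.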


Summary: Data Models

Data can be shaped and transformed into types
The type of the data partially determines acceptable format of the visualization
The communication goals determine the data and its type


Summary: Visualization Tools

Grammar of graphics defines a modular and scalable way to create expressive graphics
Different tools are designed for different uses (e.g. are you doing exploratory or explanatory vis?)


Exploratory Data Analysis


EDA is an approach/philosophy for data analysis using graphical methods to

uncover underlying structure
detect outliers and anomalies
test underlying assumptions


EDA was introduced by Tukey (statistician) with new techniques for visualizing and summarizing data:

5-number summary
box plots (visual 5-number)
stem & leaf diagrams
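A minimal sketch of the 5-number summary (minimum, lower quartile, median, upper quartile, maximum), which is also what a box plot draws. Quartile conventions differ between textbooks and tools; this version assumes the `inclusive` method of Python's `statistics.quantiles`, which may not match Tukey's hinges exactly on every dataset. The sample values are made up.

```python
from statistics import quantiles


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) for a non-empty sequence."""
    data = sorted(values)
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    return (data[0], q1, median, q3, data[-1])


summary = five_number_summary([7, 1, 9, 3, 5, 2, 8, 4, 6])
print(summary)  # (1, 3.0, 5.0, 7.0, 9)
```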


Difference from classical analysis

Exploratory data analysis ~ detective work ~ gathering evidence
Confirmatory data analysis ~ court trial ~ evaluating evidence


Iterative Hypotheses Refinement during EDA:

Formulate hypotheses > find support for them (to confirm later), refute them, or drop them > repeat.


Characteristics of exploratory graphs

made quickly
a large number are made
goal is for personal understanding
axes/legends are generally cleaned up (later)
color/size are primarily used for information


Overview of process steps

import/clean
single variable exploration
pair-wise exploration
multivariate analysis


Visual Encoding


Marks = basic graphical element in image

Point (0-dimensional)
Line (1-dimensional)
Area (2-dimensional)


Channels = ways to control appearance of marks

position
shape
color
tilt
size


Visual Encoding = Mapping data to visual variables

Assign data fields (Q, O, C) to visual channels (x, y, color, size, etc.) for a graphical mark (point, bar, line, etc.)
Also, choose encoding parameters (log scale, sorting, etc.) and data transformations (bin, aggregate, etc.)
The goal is to maximize expressiveness and effectiveness.


Expressiveness

Tell the truth and nothing but the truth (don’t lie, and don’t lie by omission)


Effectiveness

Use encodings that people decode better (where better = faster and/or more accurate)


Evaluation & Perception


Evaluation

Four-level Evaluation Framework by Munzner

Domain situation
Data/task abstraction
Vis encoding/interaction idiom
Algorithm


Purpose of levels: to analyze separately whether the goals of each level have been met. With a poor choice at the abstraction level, even perfect choices at the idiom and algorithm levels will not result in a vis that solves the problem.
Domain: the field of interest of the target users of a vis tool.
Task Abstraction: questions from different domains can map to the same abstract vis tasks, e.g. browsing, comparing, summarizing.
Two approaches for vis design

problem-driven work: start at understanding domain and then identify appropriate abstraction
technique-driven work: invent new algorithms or idioms and then identify domains in which they would be useful


Perception

Pre-attentive attributes = Visual attributes whose detection precedes conscious attention

Form: length, width, orientation, size, shape
Color: hue, intensity
spatial grouping, motion


Perception is always relative

Stevens’ law: “doubling the physical brightness results in a perception that is considerably less than twice as bright”

(Definition: the perceived magnitude S of a physical stimulus follows a power function of its actual intensity I: S = I^n, where n ranges from the sublinear 0.5 for brightness to the superlinear 3.5 for electric current.)
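To make the power function concrete, the sketch below evaluates S = I^n for a doubling of the stimulus, using the two exponents quoted above. Intensities are in arbitrary units and the function name is illustrative.

```python
def perceived_magnitude(intensity, exponent):
    """Perceived magnitude under the power law S = I ** n."""
    return intensity ** exponent


BRIGHTNESS_N = 0.5  # sublinear exponent quoted for brightness
ELECTRIC_CURRENT_N = 3.5  # superlinear exponent quoted for electric current

# How much stronger does a doubled stimulus feel?
brightness_gain = perceived_magnitude(2.0, BRIGHTNESS_N) / perceived_magnitude(1.0, BRIGHTNESS_N)
current_gain = perceived_magnitude(2.0, ELECTRIC_CURRENT_N) / perceived_magnitude(1.0, ELECTRIC_CURRENT_N)

print(f"brightness x2 is perceived as x{brightness_gain:.2f}")
print(f"electric current x2 is perceived as x{current_gain:.2f}")
```

With n = 0.5, doubling brightness is perceived as roughly a 1.4x increase, which is one reason brightness is a weak channel for quantitative comparisons.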


Weber’s law = “the amount of length difference we can detect is a percentage of the object’s length.”

(Definition: the just-detectable difference δI in stimulus intensity is a fixed percentage K of the stimulus magnitude I: δI/I = K)
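A short sketch of what δI/I = K implies for comparing lengths. The value of K used here (5%) is a made-up placeholder, not a measured constant, and the function names are illustrative.

```python
def just_noticeable_difference(magnitude, k):
    """Smallest detectable change for a stimulus of the given magnitude."""
    return k * magnitude


def is_detectable(reference, comparison, k):
    """True if the difference from the reference reaches the threshold."""
    return abs(comparison - reference) >= just_noticeable_difference(reference, k)


K = 0.05  # hypothetical Weber fraction, for illustration only

# The same absolute difference of 3 units, at two different scales.
short_pair = is_detectable(10, 13, K)  # threshold is 0.5 units
long_pair = is_detectable(100, 103, K)  # threshold is 5 units

print(short_pair, long_pair)  # True False
```

The same absolute difference is easy to see between short bars and invisible between long ones, which is why aligning marks or framing them against a common reference helps comparison.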


Gestalt (from German “form”): patterns that transcend the stimuli used to create them.

Proximity
Similarity
Continuity
Connectedness


Color

Equal distances in digital color space may not be perceived as equidistant colors.
Color choice affects the accuracy, readability, and emotional impact of your visualization.
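One way to see the first point is to compare distance in RGB space with relative luminance, computed here with the standard sRGB linearization and luminance coefficients. Luminance is only one aspect of perceived color difference, so this is a partial illustration rather than a full perceptual model; the function names are illustrative.

```python
def srgb_to_linear(channel):
    """Convert an 8-bit sRGB channel value to linear light in [0, 1]."""
    c = channel / 255
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb):
    """Relative luminance of an sRGB color (0 = black, 1 = white)."""
    r, g, b = (srgb_to_linear(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def rgb_distance(a, b):
    """Euclidean distance between two colors in 8-bit RGB space."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


black, green, blue = (0, 0, 0), (0, 255, 0), (0, 0, 255)

# Both colors are the same RGB distance from black...
print(rgb_distance(black, green), rgb_distance(black, blue))
# ...but their luminance differs by roughly a factor of ten.
print(relative_luminance(green), relative_luminance(blue))
```

Pure green and pure blue sit at equal distances from black in RGB space, yet green appears far lighter, so a color ramp built from equal RGB steps will not look evenly spaced.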


Summary: Perception

We do not see 1:1, and we do not attend to everything that we see
We’re drawn to patterns we know and expect
Our working memory is limited
Use these techniques to direct attention in your visualizations, to build visual hierarchy


Design Principles

Above all else, show the data. Maximize the data-ink ratio.
Clear labeling and explanations on the graphic.
Number of information-carrying dimensions should not exceed number of dimensions in data.
The representation of numbers, as physically measured on the surface of the graphic itself, should be directly proportional to the numerical quantities measured.
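The proportionality principle in the last item is commonly quantified with Tufte's "lie factor": the size of the effect shown in the graphic divided by the size of the effect in the data. The sketch below assumes the effect is measured as a relative change between two values; the numbers are made up for illustration.

```python
def relative_change(first, second):
    """Relative change from the first value to the second."""
    return (second - first) / first


def lie_factor(data_first, data_second, drawn_first, drawn_second):
    """Effect shown in the graphic divided by the effect in the data."""
    shown = relative_change(drawn_first, drawn_second)
    actual = relative_change(data_first, data_second)
    return shown / actual


# Hypothetical chart: the data doubles (10 -> 20).
honest = lie_factor(10, 20, drawn_first=1.0, drawn_second=2.0)
exaggerated = lie_factor(10, 20, drawn_first=1.0, drawn_second=4.0)

print(honest, exaggerated)  # 1.0 3.0
```

A value of 1 means the drawn sizes are proportional to the data; values well above or below 1 indicate that the graphic overstates or understates the change.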


Be able to critique & deconstruct a visualization


Components: Data types, data transformations, marks, channels, scales
Communication Goals: the underlying data/information priority intended by the vis author, and the corresponding visual hierarchy