- The goals of visualization
- Exploratory: to uncover a relationship in the data, to analyze data
- Explanatory: to communicate a relationship in the data, to present data
- Anscombe’s Quartet
- Datasets that have identical summary statistics can appear very different when graphed.
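
A quick numeric check of this point, as a minimal sketch. The values are the four datasets published by Anscombe (1973); the helper name `pearson` is ours, introduced only for illustration.

```python
from statistics import mean

# Anscombe's quartet. Sets I-III share the same x values.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
quartet = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": (x4, [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

def pearson(xs, ys):
    """Pearson correlation coefficient, written out to stay stdlib-only."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# (mean of x, mean of y, correlation), each rounded to two decimals.
summary = {
    name: (round(mean(xs), 2), round(mean(ys), 2), round(pearson(xs, ys), 2))
    for name, (xs, ys) in quartet.items()
}
for name, stats in summary.items():
    print(name, stats)
```

All four rows print the same rounded statistics, yet the scatterplots look nothing alike, which is why the summary alone is not enough.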
- Divergent vs. convergent idea generation
- How to critique, by example: Tufte’s criticisms of the Challenger chart
- Part of the chart (the legend) was on the previous slide
- Chartjunk: Good design brings absolute attention to the data (rocket shapes distract when dots would suffice)
- Obscures Cause and Effect: O-ring damage depicted as scattered location data when a summary “% damage” would be clearer
- Wrong Order: ordering by sequence of launch hides relationship between damage and temperature
- How we see
- We don’t view in a fixed order
- We see first what stands out
- We see only a few things at once
- We seek meaning and make connections
- We rely on conventions and metaphors
- Ingredients of a good critique
- Encourage the designer to clarify their thought process.
- Challenge the designer's assumptions.
- Encourage the designer to consider alternative perspectives.
- Encourage the designer to spell out the implications and consequences of their design.
- How to practice design
- Define the goal of the graphic
- The story you want to tell
- The key points to be made
- What readers will be able to understand/accomplish with the viz
- Diversify
- Gather inspiration
- Sketch or storyboard ideas
- Refine
- Synthesize best ideas in prioritized order
- Move design to the computer and complete it
- Data types (Q, O, C) are semantic models of data (a.k.a. “levels of measurement” in statistics)
- Programming “data type” vs. our conceptual model
- Data types like int, float... are formal descriptions
- Conceptual models are mental constructions that include semantics and support reasoning
- Examples: data type vs. conceptual model
- 1D floats vs. temperatures
- 3D vector of floats vs. spatial location
- Data types (Q ⊃ O ⊃ C)
- C: Categorical
- O: Ordinal
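- Q-interval: the zero point is arbitrary (e.g. dates, temperature in °C), so only differences are meaningful
- Q-ratio: zero is fixed (e.g. length, counts), so ratios are meaningful too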
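
One way to read the nesting Q ⊃ O ⊃ C is that each level keeps the operations of the level below and adds one more. The sketch below encodes that reading; the `Level` enum and the `INTRODUCED` table are hypothetical names for illustration, not part of any library.

```python
from enum import IntEnum

class Level(IntEnum):
    """Levels of measurement, ordered so that each level adds operations."""
    CATEGORICAL = 1  # C: labels only
    ORDINAL = 2      # O: labels with an order
    INTERVAL = 3     # Q-interval: differences meaningful, zero arbitrary
    RATIO = 4        # Q-ratio: zero fixed, ratios meaningful

# Operations that each level introduces (illustrative table).
INTRODUCED = {
    Level.CATEGORICAL: {"==", "!="},
    Level.ORDINAL: {"<", ">"},
    Level.INTERVAL: {"-"},
    Level.RATIO: {"/"},
}

def supported_ops(level):
    """All operations that are meaningful at the given level."""
    ops = set()
    for lvl in Level:
        if lvl <= level:
            ops |= INTRODUCED[lvl]
    return ops

print(sorted(supported_ops(Level.ORDINAL)))
# Dividing two Celsius temperatures is not meaningful, so "/" is absent here.
print("/" in supported_ops(Level.INTERVAL))
```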
- Taxonomy of viz types
- Ordinal-ordinal
- Ordinal-quantitative
- Quantitative-quantitative
- Tour of Visualization Zoo
- Stacked / Stream graphs
- Small multiples
- Horizon graph
- Stem & leaf plot
- Scatterplot Matrix
- Parallel coordinates
- Flow / Sankey diagram
- Choropleth map
- Graduated symbol map
- Visualization Tools, listed in order of decreasing ease-of-use, and increasing expressiveness:
- Chart typologies: e.g. Excel
- Visual analysis grammars: ggplot, Vega, Tableau
- Visualization grammars: D3
- Graphics APIs: Processing, OpenGL
- Visualization Grammar
- Data: Input data to visualize
- (Data) Transforms: grouping, binning, statistics
- Marks: Data-representative graphics
- Scales: Map data values to visual values
- Guides: Axes & legends to show scales
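
The five components above can be sketched as a tiny pipeline. This is not D3 or Vega code, only a stdlib illustration under our own assumptions: the data values, the helper names `bin_counts` and `linear_scale`, and the pixel ranges are all made up.

```python
from collections import Counter

# Data: raw input values (made-up numbers, for illustration only).
data = [1.2, 1.9, 2.4, 2.5, 3.7, 4.1, 4.4, 4.8]

def bin_counts(values, width):
    """Transform: group values into bins of `width` and count each bin."""
    counts = Counter(int(v // width) * width for v in values)
    return sorted(counts.items())

def linear_scale(domain, visual_range):
    """Scale: map a data interval linearly onto a visual interval (e.g. pixels)."""
    (d0, d1), (r0, r1) = domain, visual_range
    return lambda v: r0 + (v - d0) / (d1 - d0) * (r1 - r0)

bins = bin_counts(data, width=1)    # [(bin_start, count), ...]
x = linear_scale((1, 5), (0, 400))  # bin start -> horizontal pixel
h = linear_scale((0, 4), (0, 200))  # count -> bar height in pixels

# Marks: one rectangle per bin, with geometry computed through the scales.
marks = [
    {"mark": "rect", "x": x(start), "width": x(start + 1) - x(start), "height": h(count)}
    for start, count in bins
]

# Guides: axis ticks are the same scale applied to round data values.
x_ticks = [(x(t), str(t)) for t in range(1, 6)]

for m in marks:
    print(m)
```

Keeping scales as separate functions is what lets the same mapping drive both the marks and the guides.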
- Summary: Data Models
- Data can be shaped and transformed into types
- The type of the data partially determines acceptable format of the visualization
- The communication goals determine the data and its type
- Summary: Visualization Tools
- Grammar of graphics defines a modular and scalable way to create expressive graphics
- Different tools are designed for different uses (e.g. are you doing exploratory or explanatory vis?)
- Exploratory data analysis (EDA) is an approach/philosophy for data analysis that uses graphical methods to:
- uncover underlying structure
- detect outliers and anomalies
- test underlying assumptions
- EDA was introduced by Tukey (statistician) with new techniques for visualizing and summarizing data:
- 5-number summary
- box plots (visual 5-number)
- stem & leaf diagrams
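
A minimal sketch of two of these techniques. Quartile conventions differ between textbooks and tools; here we assume the `inclusive` method of Python's `statistics.quantiles`, and the sample `scores` is made up. The stem-and-leaf helper assumes non-negative two-digit integers and skips empty stems.

```python
from collections import defaultdict
from statistics import quantiles

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using 'inclusive' quartiles."""
    q1, med, q3 = quantiles(values, n=4, method="inclusive")
    return (min(values), q1, med, q3, max(values))

def stem_and_leaf(values):
    """Return text rows with tens digits as stems and ones digits as leaves."""
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // 10].append(v % 10)
    return [
        f"{stem} | {''.join(str(leaf) for leaf in leaves)}"
        for stem, leaves in sorted(stems.items())
    ]

scores = [12, 15, 21, 23, 23, 28, 34, 37, 41]  # made-up sample

print(five_number_summary(scores))
for row in stem_and_leaf(scores):
    print(row)
```

A box plot draws exactly the five numbers returned by `five_number_summary`, while the stem-and-leaf rows keep every individual value visible.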
- Difference from classical analysis
- Exploratory data analysis ~ detective work ~ gathering evidence
- Confirmatory data analysis ~ court trial ~ evaluating evidence
- Iterative Hypotheses Refinement during EDA
- Formulate hypotheses > gather evidence that supports them (to confirm later), refutes them, or leads you to drop them > repeat.
- Characteristics of exploratory graphs
- made quickly
- a large number are made
- goal is for personal understanding
- axes/legends are generally cleaned up (later)
- color/size are primarily used for information
- Overview of process steps
- import/clean
- single variable exploration
- pair-wise exploration
- multivariate analysis
- Marks = basic graphical element in image
- Point (0-dimensions)
- Line (1-dimensional)
- Area (2-dimensional)
- Channels = ways to control appearance of marks
- position
- shape
- color
- tilt
- size
- Visual Encoding = Mapping data to visual variables
- Assign data fields (Q, O, C) to visual channels (x, y, color, size, etc.) for a graphical mark (point, bar, line, etc.)
- ...Also, choose encoding parameters (log scale, sorting, etc.) and data transformations (bin, aggregate, etc.)
- ...To maximize expressiveness and effectiveness.
- Expressiveness
- Tell the truth and nothing but the truth (don’t lie, and don’t lie by omission)
- Effectiveness
- Use encodings that people decode better (where better = faster and/or more accurate)
- Color
- Color Spaces
- RGB invented for screens, CMYK for print
- HSV stands for
- Hue: red v. blue v. green
- Saturation: amount of color
- Value: lightness between black and white
- CIELAB is designed to be perceptually uniform with regard to human vision
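
To make the HSV components concrete, here is a small sketch using Python's standard `colorsys` module. The helper `describe` and the sample colors are ours; hue is reported in degrees for readability.

```python
import colorsys

def describe(r, g, b):
    """Convert RGB components in [0, 1] to (hue in degrees, saturation, value)."""
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    return (round(h * 360), round(s, 2), round(v, 2))

print(describe(1.0, 0.0, 0.0))  # pure red
print(describe(0.5, 0.0, 0.0))  # darker red: same hue, lower value
print(describe(1.0, 0.5, 0.5))  # pink: same hue, lower saturation
print(describe(0.0, 1.0, 0.0))  # pure green: hue moves to 120 degrees
```

Note that HSV is a rearrangement of RGB, so it is not perceptually uniform; that property is what CIELAB is designed for.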
- Color schemes
- sequential: perceptually equidistant steps for Ordinal data
- divergent: perceivable “break” in middle of ordered range
- qualitative aka Categorical: equidistant “bag of colors”, no order
- Summary: Color
- Equal distances in digital color space may not be perceived as equidistant colors.
- Appropriate color scheme depends on
- data model (Q, O, C)
- whether a fixed point is meaningful
- whether both ends of data range are meaningful (single vs. multi-hue)
- whether there should be positive/negative associations (e.g. red v. green) or brand associations (e.g. red for Republican Party, blue for Democratic)
- Color choice affects the accuracy, readability, and emotional impact of your visualization.
- Gulfs of Execution and Evaluation
- Good user interfaces minimize how much the user has to think to use the software. Specifically, we want to minimize:
- Gulf of Execution = the gap between what you want to do and the actions the software allows you to take
- Gulf of Evaluation = the amount of effort it takes you to figure out whether what the software did is what you wanted to do
- User interface ideal: “direct manipulation”
- visual representation of objects and actions (vs. command line or text-menu)
- selection by pointing (vs. typing in SQL)
- Rapid, incremental and reversible actions (easier to understand/revise allowable actions at each state; gulf of execution)
- Immediate and continuous display of results (so user knows what system is doing at each state; gulf of evaluation)
- Taxonomy of Interactions
- Data and View Specification
- Visualize, Filter, Sort, Derive
- View Manipulation
- Select, Navigate, Coordinate, Organize
- Process and Provenance
- Record, Annotate, Share, Guide
- Geometric vs. Semantic zoom
- Geometric zoom does not give new information at different zoom levels
- Semantic zoom yields more detailed information at deeper zoom levels (zooms into the information hierarchy as well as visually zooming)
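
The distinction can be sketched in a few lines. Everything here is hypothetical: the two-level country/city hierarchy, the numbers, and the function names are chosen only to illustrate the idea.

```python
# Hypothetical hierarchy: populations of cities grouped by country.
DATA = {
    "Country A": {"City A1": 5, "City A2": 3},
    "Country B": {"City B1": 4},
}

def geometric_zoom(sizes, factor):
    """Geometric zoom: the same items are shown, only drawn larger or smaller."""
    return {label: size * factor for label, size in sizes.items()}

def semantic_zoom(data, level):
    """Semantic zoom: a deeper level reveals more detailed items."""
    if level == 0:
        # Zoomed out: one aggregate value per country.
        return {country: sum(cities.values()) for country, cities in data.items()}
    # Zoomed in: the individual cities become visible.
    return {city: pop for cities in data.values() for city, pop in cities.items()}

overview = semantic_zoom(DATA, level=0)
print(geometric_zoom(overview, factor=2))  # same two items, doubled in size
print(semantic_zoom(DATA, level=1))        # three items, new information
```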
- Selection Methods
- Point Selection
- Mouse Hover
- Click / Touch / Tap
- Brushing = Region Selection
- Rubber-band (rectangular) or Lasso (freehand)
- Area cursors
- Bubble Cursor, Voronoi selection
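
Area-cursor techniques such as Voronoi selection remove the need for an exact hit by selecting whichever mark is nearest to the cursor. A minimal sketch of that nearest-point rule follows; the function name and pixel coordinates are made up.

```python
import math

def nearest_point(cursor, points):
    """Return the index of the point closest to the cursor (Voronoi-style selection)."""
    return min(range(len(points)), key=lambda i: math.dist(cursor, points[i]))

points = [(10, 10), (50, 40), (90, 15)]  # hypothetical mark positions in pixels
selected = nearest_point((60, 30), points)
print(selected)  # the cursor is not on any point, yet one is still selected
```

A brute-force scan like this is fine for a handful of marks; real implementations typically precompute a spatial index or the Voronoi diagram itself.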
- Cross-filtering = Brushing, Linking & Highlighting
- Brush to choose a subgroup of the data points. Connect data from one part of the display to another; indicate this connection visually to make selected items stand out.
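
A sketch of brushing and linking with plain data structures, assuming two views (a scatterplot and a per-group bar chart) that share records by `id`. The records and function names are hypothetical.

```python
# Hypothetical records shared by a scatterplot view and a bar-chart view.
records = [
    {"id": 0, "x": 1.0, "y": 2.0, "group": "a"},
    {"id": 1, "x": 2.5, "y": 3.5, "group": "a"},
    {"id": 2, "x": 3.0, "y": 1.0, "group": "b"},
    {"id": 3, "x": 4.0, "y": 4.0, "group": "b"},
]

def brush(records, x_range, y_range):
    """Rectangular brush: ids of the records inside the selected region."""
    (x0, x1), (y0, y1) = x_range, y_range
    return {r["id"] for r in records if x0 <= r["x"] <= x1 and y0 <= r["y"] <= y1}

def linked_counts(records, selected_ids):
    """Linked view: per-group (selected, total) counts used to highlight bars."""
    counts = {}
    for r in records:
        selected, total = counts.get(r["group"], (0, 0))
        counts[r["group"]] = (selected + (r["id"] in selected_ids), total + 1)
    return counts

selection = brush(records, x_range=(2.0, 4.5), y_range=(3.0, 5.0))
print(selection)
print(linked_counts(records, selection))
```

The set of selected ids is the only state the views need to share, which is what makes the linking cheap to recompute on every brush move.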
- Principles of Interaction in Visualization
- Rapid, reversible feedback
- Immediate and continuous
- First give an overview, then details on demand
- Evaluation
- Four-level Evaluation Framework by Munzner
- Domain situation
- Data/task abstraction
- Vis encoding/interaction idiom
- Algorithm
- Purpose of levels: separately analyze whether the goals of each level have been met. With a poor choice at the abstraction level, even perfect choices at the idiom and algorithm levels will not yield a vis that solves the problem.
- Domain: the field of interest of the target users of a vis tool.
- Task Abstraction: questions from different domains can map to the same abstract vis tasks, e.g. browsing, comparing, summarizing.
- Two approaches for vis design
- problem-driven work: start at understanding domain and then identify appropriate abstraction
- technique-driven work: invent new algorithms or idioms and then identify domains in which they would be useful
- Perception
- Pre-attentive attributes = Visual attributes whose detection precedes conscious attention
- Form: length, width, orientation, size, shape
- Color: hue, intensity
- spatial grouping, motion
- Perception is always relative
- Stevens’ law: “doubling the physical brightness results in a perception that is considerably less than twice as bright”
- (Definition: the perceived magnitude S of a stimulus is a power function of its physical intensity I: S = k · I^n, where k is a scaling constant and n ranges from the sublinear 0.5 for brightness to the superlinear 3.5 for electric current.)
- Weber’s law = “the amount of length difference we can detect is a percentage of the object’s length.”
- (Definition: the just-detectable difference δI in stimulus intensity is a fixed fraction K of the intensity I: δI / I = K)
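
Both laws are easy to check numerically. The sketch below uses the exponents quoted above; the Weber fraction `k = 0.02` is an arbitrary value chosen for illustration, not a measured constant.

```python
def perceived_ratio(stimulus_ratio, exponent):
    """Stevens' power law: perceived magnitude scales as stimulus ** exponent."""
    return stimulus_ratio ** exponent

def just_noticeable_difference(intensity, k):
    """Weber's law: the smallest detectable change is a fixed fraction k of the intensity."""
    return k * intensity

# Doubling brightness (n = 0.5) is perceived as far less than doubling.
print(round(perceived_ratio(2, 0.5), 2))
# Doubling electric current (n = 3.5) is perceived as far more than doubling.
print(round(perceived_ratio(2, 3.5), 2))
# With k = 0.02, a bar twice as long needs twice the change to be noticed.
print(just_noticeable_difference(100, 0.02), just_noticeable_difference(200, 0.02))
```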
- Gestalt (from German “form”): patterns that transcend the stimuli used to create them:
- Proximity
- Similarity
- Continuity
- Connectedness
- Summary: Perception
- We do not see 1:1, and we do not attend to everything that we see
- We’re drawn to patterns we know and expect
- Our working memory is limited
- Use these techniques to direct attention in your visualizations, to build visual hierarchy
- Design Principles
- Above all else, show the data. Maximize the data-ink ratio.
- Clear labeling and explanations on the graphic.
- Number of information-carrying dimensions should not exceed number of dimensions in data.
- The representation of numbers, as physically measured on the surface of the graphic itself, should be directly proportional to the numerical quantities measured.
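
The proportionality principle can be quantified with Tufte's Lie Factor: the size of the effect shown in the graphic divided by the size of the effect in the data. The sketch below uses the figures from Tufte's fuel-economy example; the function names are ours.

```python
def relative_change(first, second):
    """Size of an effect as a fraction of the starting value."""
    return abs(second - first) / abs(first)

def lie_factor(graphic_first, graphic_second, data_first, data_second):
    """Lie Factor: effect shown in the graphic / effect present in the data."""
    return relative_change(graphic_first, graphic_second) / relative_change(data_first, data_second)

# Tufte's fuel-economy example: a line grows from 0.6 to 5.3 inches
# while the data grows from 18 to 27.5 miles per gallon.
factor = lie_factor(0.6, 5.3, 18.0, 27.5)
print(round(factor, 1))  # a faithful graphic would score close to 1
```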
- Be able to critique pros/cons of a visualization.
- Be able to deconstruct/reconstruct a visualization to/from a set of visual encodings.
- Components: Data types, data transformations, marks, channels, scales
- Communication Goals: the underlying data/information priority intended by the vis author, and the corresponding visual hierarchy
- For the lists of definitions/taxonomies above, given a screenshot of a visualization, be able to identify which of them it is.
- Be able to interpret D3 code containing basic functions on this sheet.