- The goals of visualization
- Exploratory: to uncover a relationship in the data, to analyze data
- Explanatory: to communicate a relationship in the data, to present data
- Anscombe’s Quartet
- Datasets that have identical summary statistics can appear very different when graphed.
- Divergent (generate many ideas) vs. convergent (narrow down to the best ones) idea generation
- Tufte’s criticisms of the Challenger O-ring charts
- Disappearing legend: the legend explaining the chart was on a previous slide
- Chartjunk: good design draws attention to the data itself (prominently depicting rocket shapes misplaces priorities)
- Lack of Clarity in Depicting Cause and Effect: O-ring damage depicted as scattered location data when a summary “% damage” would be clearer
- Wrong Order: ordering by sequence of launch conceals the possible relationship between O-ring damage and temperature
- How we see
- We don’t view in a fixed order
- We see first what stands out
- We see only a few things at once
- We seek meaning and make connections
- We rely on conventions and metaphors
- Ingredients of a good critique
- Encourage the designer to clarify their thought process.
- Challenge the designer's assumptions.
- Encourage the designer to consider alternative perspectives.
- Encourage the designer to spell out the implications and consequences of their design.
- How to practice design
- Define the goal of the graphic
- What story you want to tell
- The key points to be made
- What readers will be able to understand/accomplish with the viz
- Diversify
- Gather inspiration
- Sketch or storyboard ideas
- Refine
- Synthesize best ideas in prioritized order
- Move design to the computer and complete it
- Data types (Q, O, C) are semantic models of data
- Data model vs. conceptual model
- Data models are formal descriptions (mathematically: sets with operations on them)
- Conceptual models are mental constructions that include semantics and support reasoning
- Examples (data model vs. conceptual model)
- 1D floats vs. temperatures
- 3D vector of floats vs. spatial location
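- Data types (Q ⊃ O ⊃ C: each type supports all the operations of the types after it)
- C: Categorical (unordered labels; only equal / not equal is meaningful)
- O: Ordinal (ordered values; comparisons such as < and > are meaningful)
- Q-interval: location of zero is arbitrary (e.g., temperature in °C); differences are meaningful, ratios are not
- Q-ratio: zero is fixed and meaningful (e.g., length); ratios are meaningful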
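One way to make the nesting concrete is to list which operations are meaningful for each type. The sketch below is a hypothetical encoding (the `DataType` enum, `NEW_OPS` table and `allowed_ops` function are illustrative names, not from any library) that follows the common levels-of-measurement view: each type inherits the operations of the types before it and adds its own.

```python
from enum import Enum

class DataType(Enum):
    CATEGORICAL = "C"
    ORDINAL = "O"
    INTERVAL = "Q-interval"
    RATIO = "Q-ratio"

# Operations each type adds on top of the types defined before it.
NEW_OPS = {
    DataType.CATEGORICAL: {"==", "!="},  # identity only: same or different
    DataType.ORDINAL: {"<", ">"},        # ordering
    DataType.INTERVAL: {"+", "-"},       # differences; zero is arbitrary
    DataType.RATIO: {"*", "/"},          # ratios; zero is fixed
}

def allowed_ops(dtype):
    """Return every operation that is meaningful for values of `dtype`."""
    order = list(DataType)  # enum members in definition order: C, O, Q-interval, Q-ratio
    ops = set()
    for t in order[: order.index(dtype) + 1]:
        ops |= NEW_OPS[t]
    return ops

# Temperature in °C is interval data: 20 °C is not "twice as warm" as 10 °C.
print("/" in allowed_ops(DataType.INTERVAL))  # False
```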
- Taxonomy of viz types
- Ordinal-ordinal
- Ordinal-quantitative
- Quantitative-quantitative
- A Tour Through the Visualization Zoo
- Stacked graphs
- Small multiples
- Horizon graph
- Stem & leaf plot
- Scatterplot Matrix
- Parallel coordinates
- Flow / Sankey diagram
- Choropleth map
- Graduated symbol map
- Visualization Tools
- Chart typologies: e.g. Excel
- Visual analysis grammars: ggplot2, Vega-Lite, Tableau
- Visualization grammars: D3
- Graphics APIs: Processing, OpenGL
- Visualization Grammar
- Data: Input data to visualize
- (Data) Transforms: grouping, binning, statistics (later: projection, layout)
- Marks: Data-representative graphics
- Scales: Map data values to visual values
- Guides: Axes & legends to show scales
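To see how the five components fit together, here is a minimal pure-Python sketch of a histogram built from them. It is not the API of ggplot2, Vega or D3; the functions `bin_transform`, `linear_scale`, `bar_marks` and `axis_guide` and the input values are hypothetical, chosen only to show data flowing through a transform, scales, marks and a guide.

```python
from collections import Counter

# Data: a hypothetical list of numeric values.
values = [1, 3, 7, 12, 14, 15]

def bin_transform(vals, width):
    """Transform: count values per fixed-width bin, keyed by bin start."""
    return sorted(Counter((v // width) * width for v in vals).items())

def linear_scale(domain, rng):
    """Scale: map data values in `domain` linearly onto visual values in `rng`."""
    (d0, d1), (r0, r1) = domain, rng
    return lambda v: r0 + (v - d0) * (r1 - r0) / (d1 - d0)

def bar_marks(bins, width, x, y):
    """Marks: one rectangle per bin, positioned and sized through the scales."""
    return [{"x": x(start), "width": x(start + width) - x(start), "height": y(count)}
            for start, count in bins]

def axis_guide(domain, scale, n_ticks):
    """Guide: evenly spaced tick values paired with their scaled pixel positions."""
    d0, d1 = domain
    step = (d1 - d0) / (n_ticks - 1)
    return [(d0 + i * step, scale(d0 + i * step)) for i in range(n_ticks)]

bins = bin_transform(values, width=5)                     # [(0, 2), (5, 1), (10, 2), (15, 1)]
x = linear_scale((0, 20), (0, 200))                       # data units -> horizontal pixels
y = linear_scale((0, max(c for _, c in bins)), (0, 100))  # counts -> bar heights in pixels
marks = bar_marks(bins, 5, x, y)
ticks = axis_guide((0, 20), x, n_ticks=5)
```

Keeping each component a separate function is the point of a grammar: swapping the transform (e.g., a different bin width) or the scale (e.g., logarithmic) does not require touching the mark code.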
- Summary: Data Models
- Data can be shaped and transformed into types
- The type of the data partially determines which visualization formats are appropriate
- The communication goals determine the data and its type
- Summary: Visualization Tools
- Grammar of graphics defines a modular and scalable way to create expressive graphics
- Different tools are designed for different uses (e.g. are you doing exploratory or explanatory vis?)
- EDA (exploratory data analysis) is an approach/philosophy for data analysis that uses graphical methods to
- uncover underlying structure
- detect outliers and anomalies
- test underlying assumptions
- EDA was introduced by the statistician John Tukey, along with new techniques for visualizing and summarizing data:
- 5-number summary (minimum, lower quartile, median, upper quartile, maximum)
- box plots (a visual 5-number summary)
- stem & leaf diagrams
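Two of Tukey's techniques are simple enough to compute by hand. The sketch below is an assumption-laden illustration: `five_number_summary` uses Python's `statistics.quantiles` with `method="inclusive"` for the quartiles (Tukey's original "hinges" can differ slightly for some sample sizes), and `stem_and_leaf` handles only non-negative integers, using the tens as the stem and the units digit as the leaf. Both function names are hypothetical.

```python
import statistics

def five_number_summary(values):
    """(min, Q1, median, Q3, max); quartiles via the 'inclusive' quantile method."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return (data[0], q1, median, q3, data[-1])

def stem_and_leaf(values):
    """Text stem-and-leaf plot for non-negative integers (stem = tens, leaf = units)."""
    stems = {}
    for v in sorted(values):
        stems.setdefault(v // 10, []).append(v % 10)
    lines = []
    # Include empty stems so gaps in the distribution stay visible.
    for stem in range(min(stems), max(stems) + 1):
        leaves = stems.get(stem, [])
        lines.append(f"{stem} |" + "".join(f" {leaf}" for leaf in leaves))
    return lines

print(five_number_summary(range(1, 10)))  # (1, 3.0, 5.0, 7.0, 9)
print("\n".join(stem_and_leaf([12, 15, 21, 23, 23, 37, 55])))
```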
- Difference from classical (confirmatory) analysis
- Exploratory data analysis ~ detective work ~ gathering evidence
- Confirmatory data analysis ~ court trial ~ evaluating evidence
- Iterative hypothesis refinement during EDA:
- Formulate hypotheses > gather support to later confirm, refute, or drop them > repeat.
- Characteristics of exploratory graphs
- made quickly
- a large number are made
- goal is for personal understanding
- axes/legends are generally cleaned up later
- color/size are primarily used for information
- Overview of process steps
- import/clean
- single variable exploration
- pair-wise exploration
- multivariate analysis
- Marks = basic graphical elements in an image
- Point (0-dimensional)
- Line (1-dimensional)
- Area (2-dimensional)
- Channels = ways to control appearance of marks
- position
- shape
- color
- tilt
- size
- Visual Encoding = Mapping data to visual variables
- Assign data fields (Q, O, C) to visual channels (x, y, color, size, etc.) for a graphical mark (point, bar, line, etc.)
- Also, choose encoding parameters (log scale, sorting, etc.) and data transformations (bin, aggregate, etc.)
- Goal: maximize expressiveness and effectiveness.
- Expressiveness
- Tell the truth and nothing but the truth (don’t lie, and don’t lie by omission)
- Effectiveness
- Use encodings that people decode better (where better = faster and/or more accurate)
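Expressiveness and effectiveness can be turned into a simple automatic rule: for each field, pick the most effective channel that is still free, considering only channels that are expressive for its type. The sketch below is a hypothetical simplification; the `RANKINGS` table is an assumption loosely inspired by Cleveland & McGill and Mackinlay, not a published ranking, and the field names are invented.

```python
# Hypothetical, simplified channel rankings (most to least effective per type).
# This sketch only lists channels it treats as expressive for each type: it
# omits hue/shape for ordered data (no inherent order) and size/lightness for
# categories (they would suggest an order or magnitude that isn't there).
RANKINGS = {
    "Q": ["x", "y", "size", "lightness"],
    "O": ["x", "y", "lightness", "size"],
    "C": ["x", "y", "hue", "shape"],
}

def assign_channels(fields):
    """Greedily give each (name, type) field its best channel not yet used."""
    used, encoding = set(), {}
    for name, dtype in fields:
        for channel in RANKINGS[dtype]:
            if channel not in used:
                encoding[name] = channel
                used.add(channel)
                break
        else:
            # No expressive channel remains: refuse rather than encode misleadingly.
            raise ValueError(f"no expressive channel left for {name!r}")
    return encoding

print(assign_channels([("temperature", "Q"), ("damage", "Q"), ("site", "C")]))
# {'temperature': 'x', 'damage': 'y', 'site': 'hue'}
```

Field order matters in this greedy version: listing the most important field first gives it the most effective channel, which mirrors ordering fields by communication priority.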
- Evaluation
- Four-level Evaluation Framework by Munzner
- Domain situation
- Data/task abstraction
- Vis encoding/interaction idiom
- Algorithm
- Purpose of levels: separately analyze whether the goals of each level have been met; with a poor choice at the abstraction level, even perfect choices at the idiom and algorithm levels will not result in a vis that solves the problem.
- Domain: the field of interest of the target users of a vis tool.
- Task Abstraction: questions from different domains can map to the same abstract vis tasks, e.g. browsing, comparing, summarizing.
- Two approaches for vis design
- problem-driven work: start at understanding domain and then identify appropriate abstraction
- technique-driven work: invent new algorithms or idioms and then identify domains in which they would be useful
- Perception
- Pre-attentive attributes = Visual attributes whose detection precedes conscious attention
- Form: length, width, orientation, size, shape
- Color: hue, intensity
- Spatial position (grouping) and motion
- Perception is always relative
- Stevens’ law: “doubling the physical brightness results in a perception that is considerably less than twice as bright”
- (Definition: the relationship between the physical intensity I of a stimulus and its perceived magnitude S follows a power function, S = I^n, where the exponent n ranges from a sublinear 0.5 for brightness to a superlinear 3.5 for electric current.)
- Weber’s law = “the amount of length difference we can detect is a percentage of the object’s length.”
- (Definition: the just-noticeable difference ΔI in stimulus intensity is a fixed fraction K of the stimulus magnitude I: ΔI / I = K)
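Both laws reduce to one-line formulas. The sketch below evaluates Stevens' power law S = I^n (with the scaling constant omitted, as in the definition above) for the two quoted exponents, and Weber's law ΔI = K·I. The function names are hypothetical, and K = 0.05 is an illustrative value, not a measured constant for any particular stimulus.

```python
def stevens_perceived(intensity, exponent):
    """Stevens' power law: perceived magnitude S = I ** n (scaling constant omitted)."""
    return intensity ** exponent

def weber_jnd(magnitude, k):
    """Weber's law: just-noticeable difference ΔI = K * I."""
    return k * magnitude

# Doubling the physical intensity, for the two exponents quoted above.
for label, n in [("brightness", 0.5), ("electric current", 3.5)]:
    ratio = stevens_perceived(2.0, n) / stevens_perceived(1.0, n)
    print(f"{label}: 2x stimulus -> {ratio:.2f}x perceived")

# With an illustrative K = 0.05, a 200 px bar needs a ~10 px change to be noticed,
# while a 100 px bar needs only ~5 px: detectability is relative, not absolute.
print(weber_jnd(200, 0.05), weber_jnd(100, 0.05))
```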
- Gestalt (from German “form”): patterns that transcend the stimuli used to create them.
- Proximity
- Similarity
- Continuity
- Connectedness
- Color
- Equal distances in digital color space may not be perceived as equidistant colors.
- Color choice affects the accuracy, readability, and emotional impact of your visualization.
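The first point can be demonstrated by comparing distances in RGB with distances in CIELAB, a color space designed to be closer to perceptually uniform. The sketch assumes sRGB input with a D65 white point, uses the standard sRGB-to-XYZ matrix rounded to four decimals, and measures perceptual difference with the simple CIE76 ΔE (Euclidean distance in CIELAB); CIELAB itself is only approximately uniform, and later ΔE formulas refine it. Black-to-green and black-to-blue are the same RGB distance (255.0), yet their CIELAB distances differ by roughly ten units.

```python
import math

def _srgb_to_linear(c):
    """Undo the sRGB transfer curve for one 0-255 channel value."""
    c /= 255
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

def srgb_to_lab(rgb):
    """Convert an sRGB triple (0-255) to CIELAB, assuming a D65 white point."""
    r, g, b = (_srgb_to_linear(c) for c in rgb)
    x = 0.4124 * r + 0.3576 * g + 0.1805 * b
    y = 0.2126 * r + 0.7152 * g + 0.0722 * b
    z = 0.0193 * r + 0.1192 * g + 0.9505 * b

    def f(t):
        # Cube root above a small threshold, linear segment near black.
        return t ** (1 / 3) if t > (6 / 29) ** 3 else t / (3 * (6 / 29) ** 2) + 4 / 29

    fx, fy, fz = f(x / 0.95047), f(y / 1.0), f(z / 1.08883)
    return (116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz))

def lab_distance(c1, c2):
    """CIE76 color difference: Euclidean distance in CIELAB."""
    return math.dist(srgb_to_lab(c1), srgb_to_lab(c2))

black, green, blue = (0, 0, 0), (0, 255, 0), (0, 0, 255)
print(math.dist(black, green), math.dist(black, blue))  # 255.0 255.0
print(lab_distance(black, green), lab_distance(black, blue))
```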
- Summary: Perception
- We do not see 1:1, and we do not attend to everything that we see
- We’re drawn to patterns we know and expect
- Our working memory is limited
- Use these techniques to direct attention in your visualizations and to build visual hierarchy
- Design Principles
- Above all else, show the data. Maximize the data-ink ratio.
- Use clear labeling and explanations on the graphic itself.
- Number of information-carrying dimensions should not exceed number of dimensions in data.
- The representation of numbers, as physically measured on the surface of the graphic itself, should be directly proportional to the numerical quantities measured.
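Tufte quantifies two of these principles. The data-ink ratio is the share of a graphic's ink that encodes data, and the proportionality principle is measured by the Lie Factor: the size of the effect shown in the graphic divided by the size of the effect in the data, where a value near 1 means the graphic is proportional. The sketch below implements both; the function names and the example measurements (e.g., ink counted in pixels) are hypothetical.

```python
def effect_size(first, second):
    """Relative change from the first value to the second."""
    return (second - first) / first

def lie_factor(data_first, data_second, shown_first, shown_second):
    """Tufte's Lie Factor: effect size shown in the graphic / effect size in the data."""
    return effect_size(shown_first, shown_second) / effect_size(data_first, data_second)

def data_ink_ratio(data_ink, total_ink):
    """Tufte's data-ink ratio: fraction of the graphic's ink that encodes data."""
    return data_ink / total_ink

# A value that doubles (1 -> 2) but is drawn with bars of length 1 -> 4
# shows a 300% increase for a 100% change in the data.
print(lie_factor(1, 2, 1, 4))  # 3.0
print(data_ink_ratio(60, 80))  # 0.75
```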
- Components: Data types, data transformations, marks, channels, scales
- Communication Goals: the underlying data/information priority intended by the vis author, and the corresponding visual hierarchy