@agness
Last active October 13, 2019 21:16

Why Visualize?

  • The goals of visualization
    • Exploratory: to uncover a relationship in the data, to analyze data
    • Explanatory: to communicate a relationship in the data, to present data
  • Anscombe’s Quartet
    • Datasets that have identical summary statistics can appear very different when graphed.
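
As a rough illustration of the Anscombe's Quartet point above, the sketch below computes a few summary statistics for the four datasets using only the Python standard library. The data values are the standard published ones; the names `QUARTET`, `pearson`, and `summarize` are made up for this example.

```python
from math import sqrt
from statistics import mean

# Anscombe's quartet. Sets I-III share the same x values.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def pearson(xs, ys):
    """Pearson correlation coefficient, computed from its definition."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def summarize(xs, ys):
    """Summary statistics rounded to two decimals."""
    return {
        "mean_x": round(mean(xs), 2),
        "mean_y": round(mean(ys), 2),
        "r": round(pearson(xs, ys), 2),
    }


SUMMARIES = {name: summarize(xs, ys) for name, (xs, ys) in QUARTET.items()}

for name, summary in SUMMARIES.items():
    print(name, summary)
```

At two-decimal precision the four summaries come out the same, which is why the differences only show up once the points are plotted.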

Designing

  • Divergent vs. convergent idea generation
  • Critique example: Tufte's criticisms of the Challenger O-ring charts
    • Split across slides: part of the chart (the legend) was on the previous slide
    • Chartjunk: Good design brings absolute attention to the data (rocket shapes distract when dots would suffice)
    • Obscures Cause and Effect: O-ring damage depicted as scattered location data when a summary “% damage” would be clearer
    • Wrong Order: ordering by sequence of launch hides relationship between damage and temperature
  • How we see
    • We don’t view in a fixed order
    • We see first what stands out
    • We see only a few things at once
    • We seek meaning and make connections
    • We rely on conventions and metaphors
  • Ingredients of a good critique
    • Encourage the designer to clarify their thought process.
    • Challenge the designer's assumptions.
    • Encourage the designer to consider alternative perspectives.
    • Encourage the designer to spell out the implications and consequences of their design.
  • How to practice design
    • Define the goal of the graphic
      • What story you want to tell
      • The key points to be made
      • What readers will be able to understand/accomplish with the viz
    • Diversify
      • Gather inspiration
      • Sketch or storyboard ideas
    • Refine
      • Synthesize best ideas in prioritized order
      • Move design to the computer and complete it

Data Models

  • Data types (Q, O, C) are semantic models of data (a.k.a. “levels of measurement” in statistics)
  • Programming “Data type” vs. Our conceptual model
    • Data types like int, float... are formal descriptions
    • Conceptual models are mental constructions: Include semantics and support reasoning
    • Examples: data type vs. conceptual model
      • 1D floats vs. temperatures
      • 3D vector of floats vs. spatial location
  • Data types (Q ⊃ O ⊃ C)
    • C: Categorical
    • O: Ordinal
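    • Q-interval: the zero point is arbitrary, so differences are meaningful but ratios are not (e.g. dates, temperature in °C)
    • Q-ratio: the zero point is fixed and meaningful, so ratios can be compared (e.g. length, counts)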
  • Taxonomy of viz types
    • Ordinal-ordinal
    • Ordinal-quantitative
    • Quantitative-quantitative
  • Tour of Visualization Zoo
    • Stacked / Stream graphs
    • Small multiples
    • Horizon graph
    • Stem & leaf plot
    • Scatterplot Matrix
    • Parallel coordinates
    • Flow / Sankey diagram
    • Choropleth map
    • Graduated symbol map
  • Visualization Tools, listed in order of decreasing ease-of-use, and increasing expressiveness:
    • Chart typologies: e.g. Excel
    • Visual analysis grammars: ggplot, Vega, Tableau
    • Visualization grammars: D3
    • Graphics APIs: Processing, OpenGL
  • Visualization Grammar
    • Data: Input data to visualize
    • (Data) Transforms: grouping, binning, statistics
    • Marks: Data-representative graphics
    • Scales: Map data values to visual values
    • Guides: Axes & legends to show scales
  • Summary: Data Models
    • Data can be shaped and transformed into types
    • The type of the data partially determines acceptable format of the visualization
    • The communication goals determine the data and its type
  • Summary: Visualization Tools
    • Grammar of graphics defines a modular and scalable way to create expressive graphics
    • Different tools are designed for different uses (e.g. are you doing exploratory or explanatory vis?)
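
The "Scales" component of the visualization grammar above (mapping data values to visual values) can be sketched in a few lines. This is a minimal illustration in the spirit of a linear scale as found in grammar-based tools, not any library's actual implementation. The name `linear_scale` and the pixel ranges are assumptions chosen for the example.

```python
def linear_scale(domain, range_):
    """Build a function that maps data values (domain) to visual values (range).

    Hypothetical helper: `domain` and `range_` are (start, end) pairs.
    """
    d0, d1 = domain
    r0, r1 = range_
    if d0 == d1:
        raise ValueError("domain must span a non-zero interval")

    def scale(value):
        # Normalize the value to [0, 1] within the domain, then stretch
        # that fraction across the visual range.
        t = (value - d0) / (d1 - d0)
        return r0 + t * (r1 - r0)

    return scale


# Data values 0..100 mapped onto a 500-pixel-wide x axis.
x = linear_scale(domain=(0, 100), range_=(0, 500))
# Screen y coordinates usually grow downward, so the range is reversed.
y = linear_scale(domain=(0, 10), range_=(300, 0))

print(x(50), y(10))
```

Guides (axes and legends) are then just a visual rendering of the same domain-to-range mapping.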

Exploratory Data Analysis

  • EDA is an approach/philosophy for data analysis using graphical methods to
    • uncover underlying structure
    • detect outliers and anomalies
    • test underlying assumptions
  • EDA was introduced by Tukey (statistician) with new techniques for visualizing and summarizing data:
    • 5-number summary
    • box plots (visual 5-number)
    • stem & leaf diagrams
  • Difference from classical analysis
    • Exploratory data analysis ~ detective work ~ gathering evidence
    • Confirmatory data analysis ~ court trial ~ evaluating evidence
  • Iterative Hypotheses Refinement during EDA
    • Formulate hypotheses > gather evidence for them, so they can later be confirmed, refuted, or dropped > repeat.
  • Characteristics of exploratory graphs
    • made quickly
    • a large number are made
    • goal is for personal understanding
    • axes/legends are generally left rough and cleaned up later
    • color/size are primarily used for information
  • Overview of process steps
    • import/clean
    • single variable exploration
    • pair-wise exploration
    • multivariate analysis
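
The 5-number summary mentioned above (the numbers a box plot draws) can be computed as follows. This sketch assumes one common quartile convention: the median of each half of the sorted data, excluding the overall median when the count is odd. Other conventions give slightly different quartiles. `five_number_summary` is a name invented for this example.

```python
from statistics import median


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) for a non-empty sequence of numbers."""
    data = sorted(values)
    if not data:
        raise ValueError("need at least one value")
    n = len(data)
    half = n // 2
    lower = data[:half]
    # For odd n, skip the middle element so it belongs to neither half.
    upper = data[half + 1:] if n % 2 else data[half:]
    q1 = median(lower) if lower else data[0]
    q3 = median(upper) if upper else data[-1]
    return (data[0], q1, median(data), q3, data[-1])


print(five_number_summary([1, 2, 3, 4, 5, 6, 7, 8, 9]))
```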

Visual Encoding

  • Marks = basic graphical element in image
    • Point (0-dimensional)
    • Line (1-dimensional)
    • Area (2-dimensional)
  • Channels = ways to control appearance of marks
    • position
    • shape
    • color
    • tilt
    • size
  • Visual Encoding = Mapping data to visual variables
    • Assign data fields (Q, O, C) to visual channels (x, y, color, size, etc.) for a graphical mark (point, bar, line, etc.)
    • ...Also, choose encoding parameters (log scale, sorting, etc.) and data transformations (bin, aggregate, etc.)
    • ...To maximize expressiveness and effectiveness.
  • Expressiveness
    • Tell the truth and nothing but the truth (don’t lie, and don’t lie by omission)
  • Effectiveness
    • Use encodings that people decode better (where better = faster and/or more accurate)
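
The idea of assigning data fields (Q, O, C) to visual channels for a mark can be written down as a small specification. The sketch below is only an illustration of that assignment step. The field names, the sample row, and the helpers `validate` and `encode` are all hypothetical, and a real pipeline would still pass each value through a scale before drawing.

```python
# Hypothetical encoding spec: data field -> (data type, visual channel).
ENCODING = {
    "weight": ("Q", "x"),
    "mileage": ("Q", "y"),
    "origin": ("C", "color"),
}

VALID_TYPES = {"Q", "O", "C"}


def validate(encoding):
    """Check that types are Q/O/C and that no channel is assigned twice."""
    bad_types = [t for t, _channel in encoding.values() if t not in VALID_TYPES]
    if bad_types:
        raise ValueError(f"unknown data types: {bad_types}")
    channels = [channel for _type, channel in encoding.values()]
    if len(channels) != len(set(channels)):
        raise ValueError("each channel can carry only one field")


def encode(row, encoding):
    """Return the channel -> data value assignments for one point mark."""
    validate(encoding)
    return {channel: row[field] for field, (_type, channel) in encoding.items()}


# Made-up data row, for illustration only.
row = {"weight": 1200, "mileage": 30, "origin": "A"}
print(encode(row, ENCODING))
```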

Interaction

  • Color
    • Color Spaces
      • RGB invented for screens, CMYK for print
      • HSV stands for
        • Hue: red v. blue v. green
        • Saturation: amount of color
        • Value: lightness between black and white
      • CIELAB is designed to be perceptually uniform with regard to human vision
    • Color schemes
      • sequential: perceptually equidistant steps for Ordinal data
      • divergent: perceivable “break” in middle of ordered range
      • qualitative aka Categorical: equidistant “bag of colors”, no order
  • Summary: Color
    • Equal distances in digital color space may not be perceived as equidistant colors.
    • Appropriate color scheme depends on
      • data model (Q, O, C)
      • whether a fixed point is meaningful
      • whether both ends of data range are meaningful (single vs. multi-hue)
      • whether there should be positive/negative associations (e.g. red v. green) or brand associations (e.g. red for Republican Party, blue for Democratic)
    • Color choice affects the accuracy, readability, and emotional impact of your visualization.
  • Gulfs of Execution and Evaluation
    • Good user interfaces minimize how much the user has to think to use the software. Specifically we want to minimize:
      • Gulf of Execution = the difference between what you want to do and what actions the software allows you to do
      • Gulf of Evaluation = the amount of effort it takes you to figure out whether what the software did is what you wanted to do
    • User interface ideal: “direct manipulation”
      • visual representation of objects and actions (vs. command line or text-menu)
      • selection by pointing (vs. typing in SQL)
      • Rapid, incremental and reversible actions (easier to understand/revise allowable actions at each state; narrows the gulf of execution)
      • Immediate and continuous display of results (so user knows what system is doing at each state; narrows the gulf of evaluation)
  • Taxonomy of Interactions
    • Data and View Specification
      • Visualize, Filter, Sort, Derive
    • View Manipulation
      • Select, Navigate, Coordinate, Organize
    • Process and Provenance
      • Record, Annotate, Share, Guide
  • Geometric vs. Semantic zoom
    • Geometric zoom does not give new information at different zoom levels
    • Semantic zoom yields more detailed information at deeper zoom levels (zooms into the information hierarchy as well as visually zooming)
  • Selection Methods
    • Point Selection
      • Mouse Hover
      • Click / Touch / Tap
    • Brushing = Region Selection
      • Rubber-band (rectangular) or Lasso (freehand)
    • Area cursors
      • Bubble Cursor, Voronoi selection
  • Cross-filtering = Brushing, Linking & Highlighting
    • Brush to choose a subgroup of the data points. Connect data from one part of the display to another; indicate this connection visually to make selected items stand out.
  • Principles of Interaction in Visualization
    • Rapid, reversible feedback
    • Immediate and continuous
    • Overview first, then details on demand
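
To make the HSV color space and the sequential color scheme from the Color notes above concrete, the sketch below builds a single-hue ramp with Python's standard `colorsys` module. `sequential_ramp` is a name invented for this example. The steps are equally spaced in HSV saturation, which (as the color summary notes) does not guarantee that they look equally spaced; a perceptually uniform space such as CIELAB would be needed for that.

```python
import colorsys


def sequential_ramp(hue, steps):
    """Return `steps` hex colors of one hue, from white to fully saturated.

    `hue` is in [0, 1], as expected by colorsys (0.0 is red).
    """
    if steps < 2:
        raise ValueError("need at least two steps")
    colors = []
    for i in range(steps):
        saturation = i / (steps - 1)  # 0.0 (white) up to 1.0 (pure hue)
        r, g, b = colorsys.hsv_to_rgb(hue, saturation, 1.0)
        colors.append(
            "#{:02x}{:02x}{:02x}".format(round(r * 255), round(g * 255), round(b * 255))
        )
    return colors


print(sequential_ramp(hue=0.0, steps=5))
```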

Evaluation & Perception

  • Evaluation
    • Four-level Evaluation Framework by Munzner
      • Domain situation
      • Data/task abstraction
      • Vis encoding/interaction idiom
      • Algorithm
    • Purpose of levels: separately analyze whether the goals of each level have been met. With a poor choice at the abstraction level, even perfect choices at the idiom and algorithm levels will not result in a vis that solves the problem.
    • Domain: the field of interest of the target users of a vis tool.
    • Task Abstraction: questions from different domains can map to the same abstract vis tasks, e.g. browsing, comparing, summarizing.
    • Two approaches for vis design
      • problem-driven work: start at understanding domain and then identify appropriate abstraction
      • technique-driven work: invent new algorithms or idioms and then identify domains in which they would be useful
  • Perception
    • Pre-attentive attributes = Visual attributes whose detection precedes conscious attention
      • Form: length, width, orientation, size, shape
      • Color: hue, intensity
      • spatial grouping, motion
    • Perception is always relative
      • Stevens’ law: “doubling the physical brightness results in a perception that is considerably less than twice as bright”
        • (Definition: the perceived sensation S is a power function of the physical stimulus intensity I: S = I^n, where n ranges from the sublinear 0.5 for brightness to the superlinear 3.5 for electric current.)
      • Weber’s law: “the amount of length difference we can detect is a percentage of the object’s length.”
        • (Definition: the just-detectable difference δI in stimulus intensity I is a fixed percentage K of the object magnitude: δI/I = K)
    • Gestalt (from German “form”): patterns that transcend the stimuli used to create them:
      • Proximity
      • Similarity
      • Continuity
      • Connectedness
    • Summary: Perception
      • We do not see 1:1, and we do not attend to everything that we see
      • We’re drawn to patterns we know and expect
      • Our working memory is limited
      • Use these techniques to direct attention in your visualizations, to build visual hierarchy
  • Design Principles
    • Above all else, show the data. Maximize the data-ink ratio.
    • Clear labeling and explanations on the graphic.
    • Number of information-carrying dimensions should not exceed number of dimensions in data.
    • The representation of numbers, as physically measured on the surface of the graphic itself, should be directly proportional to the numerical quantities measured.
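
The two perception laws above can be checked with quick arithmetic. The sketch below uses the exponents quoted in the notes (0.5 for brightness, 3.5 for electric current). The function names and the Weber fraction of 0.02 are assumptions made for illustration only.

```python
def perceived_ratio(stimulus_ratio, exponent):
    """Stevens' power law S = I^n: perceived change for a given physical change."""
    return stimulus_ratio ** exponent


def just_noticeable_difference(magnitude, weber_fraction):
    """Weber's law dI / I = K: smallest detectable change at a given magnitude."""
    return weber_fraction * magnitude


# Doubling physical brightness (n = 0.5) is perceived as well under 2x.
print(round(perceived_ratio(2, 0.5), 2))
# Doubling electric current (n = 3.5) is perceived as far more than 2x.
print(round(perceived_ratio(2, 3.5), 2))
# With a hypothetical K of 0.02, a longer bar needs a larger absolute change.
print(just_noticeable_difference(100, 0.02), just_noticeable_difference(400, 0.02))
```

The practical reading is that channels with an exponent near 1 (such as length) are decoded more accurately than sublinear or superlinear ones.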

Skills

  • Be able to critique pros/cons of a visualization.
  • Be able to deconstruct/reconstruct a visualization to/from a set of visual encodings.
    • Components: Data types, data transformations, marks, channels, scales
    • Communication Goals: the underlying data/information priority intended by the vis author, and the corresponding visual hierarchy
  • For the lists of definitions/taxonomies above, given a screenshot of a visualization, be able to identify which of them it is.
  • Be able to interpret D3 code containing basic functions on this sheet.