@agness
Last active October 13, 2018 18:22

Why Visualize?

  • The goals of visualization
    • Exploratory: to uncover a relationship in the data, to analyze data
    • Explanatory: to communicate a relationship in the data, to present data
  • Anscombe’s Quartet
    • Datasets that have identical summary statistics can appear very different when graphed.
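
The Anscombe's Quartet point can be checked numerically. The sketch below uses the four datasets as published by Anscombe (1973); the helper names `pearson_r` and `summarize` are made up for this illustration. It only computes the summary statistics. Plotting each dataset as a scatterplot is what reveals how different they are.

```python
from statistics import mean, variance

# Anscombe's four datasets; sets I-III share the same x values.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def pearson_r(xs, ys):
    """Pearson correlation coefficient, written out to avoid extra dependencies."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def summarize(xs, ys):
    """Summary statistics that are (nearly) identical across the quartet."""
    return {
        "mean_x": mean(xs),
        "mean_y": mean(ys),
        "var_x": variance(xs),  # sample variance
        "var_y": variance(ys),
        "r": pearson_r(xs, ys),
    }


summaries = {name: summarize(xs, ys) for name, (xs, ys) in quartet.items()}
for name, s in summaries.items():
    print(name, {k: round(v, 2) for k, v in s.items()})
```

All four rows should print nearly the same numbers, which is why summary statistics alone are not enough to tell the datasets apart.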

Designing

  • Divergent vs. convergent idea generation
  • Tufte's criticisms of the Challenger launch charts
    • Disappearing Legend: part of the chart (legend) was on previous slide
    • Chartjunk: Good design brings absolute attention to the data (depicting rocket shapes prominently misplaces priorities)
    • Lack of Clarity in Depicting Cause and Effect: O-ring damage depicted as scattered location data when a summary “% damage” would be clearer
    • Wrong Order: ordering by sequence of launch conceals the possible relationship between O-ring damage and temperature
  • How we see
    • We don’t view in a fixed order
    • We see first what stands out
    • We see only a few things at once
    • We seek meaning and make connections
    • We rely on conventions and metaphors
  • Ingredients of a good critique
    • Encourage the designer to clarify their thought process.
    • Challenge the designer's assumptions.
    • Encourage the designer to consider alternative perspectives.
    • Encourage the designer to spell out the implications and consequences of their design.
  • How to practice design
    • Define the goal of the graphic
      • The story you want to tell
      • The key points to be made
      • What readers will be able to understand/accomplish with the viz
    • Diversify
      • Gather inspiration
      • Sketch or storyboard ideas
    • Refine
      • Synthesize best ideas in prioritized order
      • Move design to the computer and complete it

Data Models

  • Data types (Q, O, C) are semantic models of data
  • Data model vs. Conceptual model
    • Data models are formal descriptions (math: sets with operations on them)
    • Conceptual models are mental constructions (they include semantics and support reasoning)
    • Examples (data model vs. conceptual model)
      • 1D floats vs. temperatures
      • 3D vector of floats vs. spatial location
  • Data types (Q ⊃ O ⊃ C)
    • C: Categorical
    • O: Ordinal
    • Q-interval: the location of zero is arbitrary (differences are meaningful, ratios are not)
    • Q-ratio: zero is fixed and meaningful (ratios can be compared)
  • Taxonomy of viz types
    • Ordinal-ordinal
    • Ordinal-quantitative
    • Quantitative-quantitative
  • Tour of Visualization Zoo
    • Stacked graphs
    • Small multiples
    • Horizon graph
    • Stem & leaf plot
    • Scatterplot Matrix
    • Parallel coordinates
    • Flow / Sankey diagram
    • Choropleth map
    • Graduated symbol map
  • Visualization Tools
    • Chart typologies: e.g. Excel
    • Visual analysis grammars: ggplot, Vega, Tableau
    • Visualization grammars: D3
    • Graphics APIs: Processing, OpenGL
  • Visualization Grammar
    • Data: Input data to visualize
    • (Data) Transforms: grouping, binning, stats, (later: projection, layout)
    • Marks: Data-representative graphics
    • Scales: Map data values to visual values
    • Guides: Axes & legends to show scales
  • Summary: Data Models
    • Data can be shaped and transformed into types
    • The type of the data partially determines acceptable format of the visualization
    • The communication goals determine the data and its type
  • Summary: Visualization Tools
    • Grammar of graphics defines a modular and scalable way to create expressive graphics
    • Different tools are designed for different uses (e.g. are you doing exploratory or explanatory vis?)
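
The visualization grammar components listed above (data, transforms, marks, scales, guides) can be sketched as a tiny histogram pipeline. This is an illustration only, not how ggplot, Vega, or D3 implement them. The data values, pixel sizes, and the function names `bin_counts` and `linear_scale` are all assumptions made up for the example.

```python
# Data: raw quantitative values (hypothetical temperature readings).
data = [61, 64, 66, 70, 71, 75, 78, 79, 81, 88]


def bin_counts(values, start, width, n_bins):
    """Transform: count how many values fall into each fixed-width bin."""
    counts = [0] * n_bins
    for v in values:
        i = int((v - start) // width)
        if 0 <= i < n_bins:
            counts[i] += 1
    return counts


def linear_scale(domain, range_):
    """Scale: build a function mapping data values to visual values."""
    (d0, d1), (r0, r1) = domain, range_
    return lambda v: r0 + (v - d0) / (d1 - d0) * (r1 - r0)


# Transform: three bins covering 60-69, 70-79, and 80-89.
counts = bin_counts(data, start=60, width=10, n_bins=3)

# Scales: bin index -> horizontal pixels, count -> bar height in pixels.
x = linear_scale((0, 3), (0, 300))
height = linear_scale((0, max(counts)), (0, 100))

# Marks: one rectangle description per bin.
bars = [
    {"x": x(i), "width": x(i + 1) - x(i), "height": height(c)}
    for i, c in enumerate(counts)
]

# Guides: axis ticks as (pixel position, label) pairs at the bin edges.
ticks = [(x(i), 60 + 10 * i) for i in range(4)]

print(counts)
print(bars)
print(ticks)
```

Keeping each stage a separate function is what makes the approach modular: swapping the linear scale for another scale, or the binning for another transform, leaves the other stages untouched.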

Exploratory Data Analysis

  • EDA is an approach/philosophy for data analysis using graphical methods to
    • uncover underlying structure
    • detect outliers and anomalies
    • test underlying assumptions
  • EDA was introduced by Tukey (statistician) with new techniques for visualizing and summarizing data:
    • 5-number summary
    • box plots (visual 5-number)
    • stem & leaf diagrams
  • Difference from classical analysis
    • Exploratory data analysis ~ detective work ~ gathering evidence
    • Confirmatory data analysis ~ court trial ~ evaluating evidence
  • Iterative Hypotheses Refinement during EDA:
    • Formulate hypotheses > gather evidence to support them (for later confirmation), refute them, or drop them > repeat.
  • Characteristics of exploratory graphs
    • made quickly
    • a large number are made
    • goal is for personal understanding
    • axes/legends are left rough and generally cleaned up only later
    • color/size are primarily used for information
  • Overview of process steps
    • import/clean
    • single variable exploration
    • pair-wise exploration
    • multivariate analysis
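
As a small illustration of the single-variable step, here is a minimal sketch of the 5-number summary that a box plot draws. Quartile conventions differ between textbooks and tools. This version uses `statistics.quantiles` with the "inclusive" method, which is an assumption and not necessarily the definition Tukey used. The 1.5 × IQR outlier rule is the common box plot convention and does not come from these notes. The sample values are made up.

```python
from statistics import quantiles


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) for a sequence of numbers."""
    data = sorted(values)
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    return (data[0], q1, median, q3, data[-1])


def tukey_outliers(values, k=1.5):
    """Flag values more than k * IQR outside the quartiles (assumed convention)."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 40]  # hypothetical measurements
print(five_number_summary(sample))
print(tukey_outliers(sample))
```

The summary gives the box and whisker positions, and the flagged values are the points a box plot would draw individually.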

Visual Encoding

  • Marks = basic graphical element in image
    • Point (0-dimensional)
    • Line (1-dimensional)
    • Area (2-dimensional)
  • Channels = ways to control appearance of marks
    • position
    • shape
    • color
    • tilt
    • size
  • Visual Encoding = Mapping data to visual variables
    • Assign data fields (Q, O, C) to visual channels (x, y, color, size, etc.) for a graphical mark (point, bar, line, etc.)
    • Also, choose encoding parameters (log scale, sorting, etc.) and data transformations (bin, aggregate, etc.)
    • To maximize expressiveness and effectiveness.
  • Expressiveness
    • Tell the truth and nothing but the truth (don’t lie, and don’t lie by omission)
  • Effectiveness
    • Use encodings that people decode better (where better = faster and/or more accurate)
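
A visual encoding can be written down as data: a table from channel to (field, data type). The sketch below shows one way to do that, loosely in the spirit of declarative tools. The records, field names, palette, and the `encode` function are all hypothetical. Scales that would turn the x and y values into pixel positions are left out to keep the example short.

```python
# Hypothetical records; field names and values are made up for illustration.
records = [
    {"year": 2016, "sales": 120, "region": "east"},
    {"year": 2017, "sales": 150, "region": "west"},
    {"year": 2018, "sales": 90, "region": "east"},
]

# Encoding: channel -> (data field, data type), using Q/O/C as in the notes.
encoding = {
    "x": ("year", "O"),
    "y": ("sales", "Q"),
    "color": ("region", "C"),
}

PALETTE = ["steelblue", "orange", "green"]  # arbitrary categorical colors


def encode(records, encoding, mark="point"):
    """Return one mark description per record with a value for every channel."""
    categories = {}  # per channel: category value -> assigned palette color
    marks = []
    for rec in records:
        m = {"mark": mark}
        for channel, (field, dtype) in encoding.items():
            value = rec[field]
            if dtype == "C":
                # Categorical fields get distinct colors in order of appearance.
                seen = categories.setdefault(channel, {})
                if value not in seen:
                    seen[value] = PALETTE[len(seen) % len(PALETTE)]
                value = seen[value]
            m[channel] = value
        marks.append(m)
    return marks


marks = encode(records, encoding)
for m in marks:
    print(m)
```

Changing the design then means editing the `encoding` table (for example moving `region` from `color` to `x`) rather than rewriting drawing code, which makes it cheap to compare alternatives for expressiveness and effectiveness.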

Evaluation & Perception

  • Evaluation
    • Four-level Evaluation Framework by Munzner
      • Domain situation
      • Data/task abstraction
      • Vis encoding/interaction idiom
      • Algorithm
    • Purpose of levels: separately analyze whether the goals of each level have been met. With a poor choice at the abstraction level, even perfect choices at the idiom and algorithm levels will not result in a vis that solves the problem.
    • Domain: the field of interest of the target users of a vis tool.
    • Task Abstraction: questions from different domains can map to the same abstract vis tasks, e.g. browsing, comparing, summarizing.
    • Two approaches for vis design
      • problem-driven work: start at understanding domain and then identify appropriate abstraction
      • technique-driven work: invent new algorithms or idioms and then identify domains in which they would be useful
  • Perception
    • Pre-attentive attributes = Visual attributes whose detection precedes conscious attention
      • Form: length, width, orientation, size, shape
      • Color: hue, intensity
      • spatial grouping, motion
    • Perception is always relative
      • Stevens’ law: “doubling the physical brightness results in a perception that is considerably less than twice as bright”
        • (Definition: the perceived magnitude S of a stimulus is a power function of its physical intensity I: S = I^n, where the exponent n ranges from the sublinear 0.5 for brightness to the superlinear 3.5 for electric current.)
      • Weber’s law: “the amount of length difference we can detect is a percentage of the object’s length.”
        • (Definition: the just-detectable difference δI in stimulus intensity I is a fixed fraction K of that intensity: δI/I = K.)
    • Gestalt (German for “form”): patterns that transcend the stimuli used to create them.
      • Proximity
      • Similarity
      • Continuity
      • Connectedness
    • Color
      • Equal distances in digital color space may not be perceived as equidistant colors.
      • Color choice affects the accuracy, readability, and emotional impact of your visualization.
    • Summary: Perception
      • We do not see 1:1, and we do not attend to everything that we see
      • We’re drawn to patterns we know and expect
      • Our working memory is limited
      • Use these principles to direct attention in your visualizations and to build visual hierarchy
  • Design Principles
    • Above all else, show the data. Maximize the data-ink ratio.
    • Clear labeling and explanations on the graphic.
    • The number of information-carrying dimensions should not exceed the number of dimensions in the data.
    • The representation of numbers, as physically measured on the surface of the graphic itself, should be directly proportional to the numerical quantities measured.
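
The two perception laws noted under "Perception is always relative" are easy to work through numerically. The sketch below applies the power function S = I^n and the ratio δI/I = K as given in the notes. The exponents 0.5 and 3.5 come from the notes, while the Weber fraction K = 0.05 is a made-up value used only for illustration.

```python
def perceived_magnitude(intensity, n):
    """Stevens' power law: perceived magnitude S = I ** n."""
    return intensity ** n


def just_noticeable_difference(intensity, k):
    """Weber's law: the detectable change is a fixed fraction k of the intensity."""
    return k * intensity


# Doubling the physical intensity: how much does the perception change?
brightness_ratio = perceived_magnitude(2.0, 0.5) / perceived_magnitude(1.0, 0.5)
current_ratio = perceived_magnitude(2.0, 3.5) / perceived_magnitude(1.0, 3.5)
print(f"brightness (n=0.5): x{brightness_ratio:.2f}")     # sublinear
print(f"electric current (n=3.5): x{current_ratio:.2f}")  # superlinear

# With a hypothetical Weber fraction of 5%, longer lines need a larger
# absolute change before the difference becomes detectable.
K = 0.05
for length in (10, 100):
    print(length, just_noticeable_difference(length, K))
```

Doubled brightness is perceived as roughly 1.4 times as bright, while doubled electric current feels more than eleven times as strong. This is one reason channels with an exponent far from 1 distort the quantities they encode.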

Be able to critique & deconstruct a visualization

  • Components: Data types, data transformations, marks, channels, scales
  • Communication Goals: the underlying data/information priority intended by the vis author, and the corresponding visual hierarchy