@tomires
Last active June 10, 2018 22:32
VIZ

1 - introduction to visualization

  • visualization - use of computer-supported and interactive visual representations of data to amplify cognition

  • significant chunk of visual information processing occurs at the pre-attentive level (ex. popout)

  • visualization pipeline

    • data
    • enrichment - interpolating/approximating raw data, thereby creating a model
      • interpolation or approximation
    • filtering - choosing portion of data we want to analyze
      • remove irrelevant data and outliers (portion caused by measurement error), smooth data
    • mapping - data onto visual parameters
      • arrows, glyphs, colors, trees, ...
    • rendering - creating an image
      • 2D/3D, problem with interactivity
  • interactivity is needed to overcome the limitations of computers, humans, and displays

  • kinds of visualization

    • scientific visualization
      • visualization of data with spatial attributes (coordinates)
      • ex. map plots
    • information visualization
      • visualization of abstract data structures
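
A toy sketch of the pipeline stages listed above (enrichment, filtering, mapping, rendering), assuming 1D samples and an ASCII "renderer". All function names and the character ramp are made up for illustration, not taken from the lecture.

```python
def enrich(samples, x):
    """Enrichment: linearly interpolate between sorted (x, value) samples."""
    for (x0, v0), (x1, v1) in zip(samples, samples[1:]):
        if x0 <= x <= x1:
            t = (x - x0) / (x1 - x0)
            return v0 + t * (v1 - v0)
    raise ValueError("x lies outside the sampled range")


def keep(values, lo, hi):
    """Filtering: keep only the values inside the range of interest."""
    return [v for v in values if lo <= v <= hi]


def to_gray(values, lo, hi):
    """Mapping: map each value onto a gray level in 0..255."""
    return [round(255 * (v - lo) / (hi - lo)) for v in values]


def render(grays):
    """Rendering: draw each gray level as one character (hypothetical ramp)."""
    ramp = " .:-=+*#%@"
    return "".join(ramp[g * (len(ramp) - 1) // 255] for g in grays)


samples = [(0, 0.0), (2, 10.0)]                      # raw data
model = [enrich(samples, x) for x in (0, 0.5, 1, 1.5, 2)]
image = render(to_gray(keep(model, 0, 10), 0, 10))
```

Interaction would then mean re-running the later stages with changed parameters (a different filter range, a different mapping).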

2 - data and task classification

  • dataset types

    • spatial data
      • spatial fields
        • grid (vertices, edges, cells between edges), attributes (vertices and cells can both contain several values)
        • grid tells us from which samples to interpolate
      • geometry
        • vertices, edges (contain data attributes), faces
    • abstract data
      • tabular data (items, attributes, cells containing value of attribute for item)
      • relational data (nodes, relations, attributes)
        • ex. relational database
      • text
      • interpolation and other enrichment techniques do not work!
    • mix of both
      • geographical data
        • geometry + abstract data (ex. population)
        • we can combine both in our queries
  • ordering direction

    • sequential (height)
    • diverging (temperature)
    • cyclic (hours)

3 - visual encoding of attributes

  • marks

    • points, lines, areas
  • visual channels (position rocks because it can be used with all types of abstract data)

    • nominal
      • spatial region
      • shape
    • ordered
      • color hue / saturation / lightness
      • length / area / volume
      • angle
    • grouping
      • gestalt grouping
        • containment
        • similarity
        • connection
        • proximity
  • some visual channels are not completely independent from one another (width x height, hue x saturation, shape x size)

  • popout

    • is pre-attentive
    • reduces cognitive load, no need for focus, fast (~ 200ms)
  • size

    • length is perceived accurately, area somewhat accurately, volume inaccurately
  • orientation

    • can be used for ordered attributes
    • accuracy of perception isn't uniform (acute angles)
  • shape

    • high discriminability
    • can only be used for categorical attributes
  • color

    • use hue for discriminability, saturation and luminance for ordering
    • perception is relative (it depends on surrounding colors)
    • color blindness
  • grouping (ascending order of magnitude)

    • similarity
    • connectedness
    • containment
  • 3D challenges

    • depth perception is poor (stereoscopic 3D / VR as a solution?)
    • occlusion (interaction is required)
    • perspective distortion
    • shading interferes with color channel

4 - interaction in visualization

  • methods of interaction

    • changing/transforming data
    • changing visualization technique
    • changing data enrichment
    • modifying the filter
    • changing mapping to graphical elements
  • brushing

    • selecting subset of data items with input device to emphasize them
  • linking

    • highlight brushed data items in different views or parts of the visualization
  • rearrangement and sorting (ex. parallel coordinates)

  • navigation

    • overview / detail (ex. minimap)
      • one detailed view, one complete view
      • two separate views, spatial separation
    • pan and zoom
      • infinite plane, allow the user to pan and zoom
      • temporal separation - the user needs to remember certain information
    • focus+context (ex. fish eye)
      • display most data with less details and small portion of data with a lot of detail in same view
      • deformation
  • reduction of data

    • filtering
      • dynamic queries
        • deliver continuous updates
        • low latency
        • easy to use, visualizes bounds
        • allow the user to change query by moving sliders and other basic UI elements
    • aggregation
      • binning
        • divide range of attributes into bins
        • count number of items in each bin and map the number to a channel
      • clustering
        • group data items based on similarity
        • calculate average and map the result onto channels
  • reduction of attributes

    • filtering - remove the attribute altogether
    • aggregation - via dimension reduction
  • placement of multiple views

    • juxtaposition
      • side by side
      • requires brushing/linking or coordination
      • large number of views - small size
    • superimposition
      • position views on top of one another
    • embedding
      • embed one view into another (ex. focus+context)
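
The binning steps under aggregation above (divide the range into bins, count items per bin, map the count to a channel) could look roughly like this. Equal-width bins and a 0..1 lightness channel are assumptions made for the sketch; the function names are hypothetical.

```python
def bin_counts(values, lo, hi, n_bins):
    """Count how many values fall into each of n_bins equal-width bins on [lo, hi]."""
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        if lo <= v <= hi:
            # Clamp the index so that v == hi lands in the last bin.
            idx = min(int((v - lo) / width), n_bins - 1)
            counts[idx] += 1
    return counts


def counts_to_lightness(counts):
    """Map each bin count onto a 0..1 channel value (e.g. lightness)."""
    peak = max(counts) or 1  # avoid dividing by zero when every bin is empty
    return [c / peak for c in counts]


counts = bin_counts([1, 2, 2.5, 9, 10], lo=0, hi=10, n_bins=5)
lightness = counts_to_lightness(counts)
```

A dynamic query would simply recompute this whenever a slider changes `lo`, `hi` or `n_bins`.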

5 - visualization of scalar fields

  • colormap

    • changes in value are perceived uniformly across the colormap
    • map implies correct ordering
    • it should work in grayscale and for color blind people
    • colors should be selected intuitively (water - blue, terrain - green,...)
    • allow for inversion of mapping
  • contour line

    • all points in a dataset that have the same scalar value
    • boundaries between regions
    • represented by curves in 2D (isolines), surfaces in 3D (isosurfaces)
    • contours cannot intersect
    • distance between two contours indicates magnitude of gradient (speed of change in data)
  • marching squares/cubes

    • 2^|F| ways to divide the cell
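
A minimal sketch of the two basic steps of marching squares, assuming the exponent above counts the four corner samples of a square cell (so 2^4 = 16 cases). The corner order and function names are choices made for this example only.

```python
def cell_case(corners, iso):
    """Return the 4-bit case index (0..15) of a square cell.

    corners: scalar values at (bottom-left, bottom-right, top-right, top-left).
    A bit is set when the corresponding corner is at or above the iso value.
    """
    index = 0
    for bit, value in enumerate(corners):
        if value >= iso:
            index |= 1 << bit
    return index


def edge_crossing(p0, p1, v0, v1, iso):
    """Linearly interpolate where the isoline crosses the edge p0-p1.

    Only meaningful when v0 and v1 lie on opposite sides of iso.
    """
    t = (iso - v0) / (v1 - v0)
    return (p0[0] + t * (p1[0] - p0[0]), p0[1] + t * (p1[1] - p0[1]))
```

The case index would normally be used to look up which edges the contour crosses; that lookup table is left out here.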

6 - visualization of volumetric data

  • volumetric data

    • spatial field
    • grid is in 3D
  • texture based volume rendering

    • use planes with 2D textures
    • slicing plane switching needed when changing viewpoints

7 - visualization of vector fields

  • data enrichment via bilinear/trilinear interpolation

  • glyphs

    • displayed at sampling points
    • direction mapped on orientation of arrow
    • magnitude mapped on length / color
    • challenges
      • overlapping
      • in 3D occlusion, direction interpretation ambiguity
        • shading, more complex objects
      • we as humans suck at interpolating glyphs
  • alleviate occlusion by subsampling, opacity

  • stream objects

    • choose seed points and trace them in field for a number of steps
    • visualize trajectories using vectors
  • stream ribbon

    • color mapping vorticity (tendency of something to rotate, local spinning motion)
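
Seeding and tracing a stream object, combined with the bilinear interpolation mentioned at the top of this section, might be sketched like this. Forward Euler integration, a regular 2D grid with unit spacing (at least 2x2 samples) and the step size `h` are all assumptions; the notes do not say which integrator is used.

```python
import math


def bilinear(grid, x, y):
    """Bilinearly interpolate a 2D vector; grid[j][i] = (u, v) sampled at integer (i, j)."""
    i0 = min(int(math.floor(x)), len(grid[0]) - 2)
    j0 = min(int(math.floor(y)), len(grid) - 2)
    fx, fy = x - i0, y - j0

    def lerp(a, b, t):
        return (a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1]))

    bottom = lerp(grid[j0][i0], grid[j0][i0 + 1], fx)
    top = lerp(grid[j0 + 1][i0], grid[j0 + 1][i0 + 1], fx)
    return lerp(bottom, top, fy)


def trace(grid, seed, steps, h=0.1):
    """Trace a trajectory from a seed point using forward Euler steps of size h."""
    x, y = seed
    path = [(x, y)]
    for _ in range(steps):
        # Stop once the point has left the sampled domain.
        if not (0 <= x <= len(grid[0]) - 1 and 0 <= y <= len(grid) - 1):
            break
        u, v = bilinear(grid, x, y)
        x, y = x + h * u, y + h * v
        path.append((x, y))
    return path
```

Euler steps drift on curved flows, so a real implementation would likely use a higher-order scheme such as Runge-Kutta.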

8 - tabular data

  • tabular data

    • rows - items
    • columns - attributes
    • cells - scalar values
  • attribute types

    • nominal (categories)
    • ordinal (S, M, L) - not measurable intervals
    • quantitative - we can do arithmetic
      • discrete, continuous
  • abstract data

    • no spatial coordinates at which data was measured
    • impossible to do data enrichment, no relation between data
  • axis layout

    • orthogonal
    • non-orthogonal ("basis" vectors are not lin. independent -> hard to interpret imo)
  • glyphs

    • geometry that changes shape with data
  • identification tasks

    • identify attribute (range, distribution, outliers, value for given item)
    • identify item (for given attribute)
    • identify attributes (is there correlation between the attributes? clustering)
  • techniques

    • 2 attributes - scatterplot
    • 3 attributes - 3D scatterplot
      • stereoscopy, VR, rotation
    • 4-5 attributes - colour, shapes in addition to 3 spatial dimensions
  • interaction

    • data manipulation
      • selection, view transformation
    • data reduction
      • filtering, clustering
    • view organization
      • juxtaposition, brushing, linking
  • faceting

    • visualising every combination of attributes
    • allows us to spot correlation between attributes
    • cluster identification
      • brushing - selection of a subset of data using input devices (emphasising or de-emphasising it)
      • linking - selecting an item across multiple plots highlighting each of the item's attributes
  • parallel coordinates

    • place axes in parallel and join the dots
    • can be used to identify correlation between neighbouring attributes
    • hierarchical approach - organize data into clusters and ignore outliers -> visual clarity on upper layers
    • pipeline: data -> binning -> outlier detection -> trend mapping (ignore outliers) -> graph (combined with interaction and outliers)
  • star glyphs

    • similar to parallel coords but mapped onto a polyline
    • attributes are spaced out at equal angles around a circle
    • saves space
      • less screen space for items closer to centre
    • can be projected into scatterplot to map 5+-dimensional data
  • star coordinates

    • distribute vectors evenly on unit circle
      • "base" vectors are not lin. independent, but we still project into 2D space -> fuck linear algebra -> leads to ambiguity
    • create points via linear combination of attributes
    • apparently letting the user decide on the orientation of "base" vectors can help with interpreting such abomination
  • bargrams

    • works with nominal, ordinal attributes
    • proportion of each category is mapped onto length of line
    • parallel set
      • enhancement of bargrams
      • visualizes relations between attributes
      • interaction
        • reordering categories, brushing
      • bundled layout
        • we only connect neighbouring categories
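
The star coordinates projection described above (axis vectors spread evenly on the unit circle, points created as a linear combination of attributes) in a few lines. It assumes attributes are already normalised to 0..1; the function names are invented for this sketch.

```python
import math


def star_axes(n):
    """Place n axis vectors at equal angles on the unit circle."""
    return [(math.cos(2 * math.pi * k / n), math.sin(2 * math.pi * k / n))
            for k in range(n)]


def project(item, axes):
    """Project an item to 2D as the sum of attribute value times axis vector."""
    x = sum(a * ax for a, (ax, _) in zip(item, axes))
    y = sum(a * ay for a, (_, ay) in zip(item, axes))
    return (x, y)


axes = star_axes(4)
p = project((1, 0, 1, 0), axes)   # axes 0 and 2 point in opposite directions
q = project((0, 0, 0, 0), axes)
```

Here `p` and `q` both end up (numerically almost) at the origin, which is exactly the ambiguity complained about above: different items can land on the same point.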

9 - relational data

  • we introduce relations between tables

    • relation - subset of a Cartesian product, can be unary or binary
  • attributes can be stored in nodes as well as links (we do not have to do M:N decomposition as in rel. DB)

  • data is typically abstract preventing us from doing data enrichment

  • encoding attributes

    • of nodes - shape, color, size
    • of relations - width, color
  • visualization tasks

    • all the tasks discussed prior
    • node incidence
    • shortest path
  • treemap

    • using containment to encode hierarchical relations
    • makes use of only one attribute (size of files)
    • recursive space dividing technique that alternates axes based on tree depth
    • we also project depth into color of squares
    • other examples - stock market divided into industries and then companies, tasks
    • treemap gives us an overview of the entire hierarchy
  • A E S T H E T I C S

    • minimize number of crossings
    • minimize area
    • minimize aspect ratio
    • angular resolution between edges incident to a node
    • edge length (total, maximum, uniform)
    • bends
    • symmetry
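
The recursive space division of the treemap described above could be sketched as a slice-and-dice layout. The tree representation (`(name, size)` for leaves, `(name, [children])` for inner nodes), the function names and the choice to split along x at even depth are assumptions; sizes are assumed to be positive.

```python
def size(node):
    """Size of a node: its own value for a leaf, the sum of its children otherwise."""
    _, payload = node
    if isinstance(payload, list):
        return sum(size(child) for child in payload)
    return payload


def treemap(node, x, y, w, h, depth=0, out=None):
    """Lay out leaves as (name, depth, x, y, w, h), alternating the split axis by depth."""
    if out is None:
        out = []
    name, payload = node
    if not isinstance(payload, list):
        out.append((name, depth, x, y, w, h))
        return out
    total = size(node)
    offset = 0.0
    for child in payload:
        share = size(child) / total
        if depth % 2 == 0:   # even depth: split along the x axis
            treemap(child, x + offset * w, y, share * w, h, depth + 1, out)
        else:                # odd depth: split along the y axis
            treemap(child, x, y + offset * h, w, share * h, depth + 1, out)
        offset += share
    return out


tree = ("root", [("a", 2), ("dir", [("b", 1), ("c", 1)])])
rects = treemap(tree, 0, 0, 4, 2)
```

The `depth` stored with each rectangle is what would be mapped onto color, as noted above.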

10 - big data

  • big data

    • high velocity, volume, variety
    • ex. NSA
      • 300m US citizens
      • metadata, calls, texts, surveillance images,...
    • veracity - possibility of including shit data (fake profiles on FB)
  • visual analytics

    • combination of automated analysis techniques with interactive visualization
      • to make sense of large and complex datasets
    • visualization + data mining + interaction
    • making use of full perceptual and cognitive abilities during analysis
      • quick informed decisions from people who aren't experts on data mining / visualization
  • conceptual challenges

    • we cannot use standard visualization techniques
    • heavy use of binning and clustering
      • ex. earthquakes, density visualization (M25 traffic accidents)
  • clustering

    • grouping a set of objects in such a way that objects in these groups are more similar to one another than to objects outside
  • data mining

    • process of extraction of interesting patterns from huge amounts of data
    • regression for data enrichment
    • clustering for simplification
    • box plots for statistical analysis
    • outlier detection for detecting anomalies
    • classification using ANN

11 - text visualization

  • issues

    • subtle
    • abstract
    • ambiguity of meaning
    • context
  • analysis levels

    • lexical - strings
    • syntactic - word types
    • semantic - meanings
  • visualizing text

    • understanding
    • grouping for future classification
    • comparison (git diffs)
    • correlation (detecting plagiarism)
  • word clouds

    • frequency analysis
  • judging relative word importance in document

    • tf * log(N / df)
    • tf - term frequency
    • df - documents including the word
    • N - total number of documents
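
The word importance formula above, tf * log(N / df), written out. Documents are assumed to be lists of already tokenised words, tf is taken as a raw count and the logarithm as the natural log; the notes specify none of these, and the function name is made up.

```python
import math


def tf_idf(term, document, corpus):
    """Return tf * log(N / df) for a term in one document of a corpus."""
    tf = document.count(term)                          # term frequency in this document
    df = sum(1 for doc in corpus if term in doc)       # documents including the word
    if df == 0:
        return 0.0                                     # term never occurs anywhere
    return tf * math.log(len(corpus) / df)             # N = total number of documents


corpus = [["a", "b", "a"], ["b", "c"], ["c", "d"]]
score_a = tf_idf("a", corpus[0], corpus)   # frequent here, rare elsewhere
score_b = tf_idf("b", corpus[0], corpus)   # appears in two of three documents
```

A word that appears in every document gets log(N / N) = 0, so it carries no weight no matter how often it occurs.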

12 - visualization of geographical data

  • geographical data structure
    • geometry stored as vector or raster representation
    • non-spatial attributes

13 - visualization of time-oriented data

  • 4D - 3 spatial dimensions + 1 temporal dimension

  • time is unidirectional (we cannot go back)

  • time-oriented data

    • temporal aspect is of interest to us
      • ordinal / discrete / continuous
    • problem with granularity
    • ex. gantt chart
  • arrangements

    • linear or cyclic
  • point in time vs interval

  • mapping of time

    • static - map onto spatial dimension
      • ThemeRiver
    • dynamic - create an animation (series of views)