tomires/viz.md

## viz.md

      
1 - introduction to visualization


visualization - use of computer-supported and interactive visual representations of data to amplify cognition


significant chunk of visual information processing occurs at the pre-attentive level (ex. popout)


visualization pipeline

data
enrichment - interpolating/approximating raw data, thereby creating a model

interpolation or approximation


filtering - choosing portion of data we want to analyze

remove irrelevant data and outliers (portion caused by measurement error), smooth data


mapping - data onto visual parameters

arrows, glyphs, colors, trees, ...


rendering - creating an image

2D/3D, problem with interactivity


interactivity is needed to overcome the limitations of computers, humans and displays


kinds of visualization

scientific visualization

visualization of data with spatial attributes (coordinates)
ex. map plots


information visualization

visualization of abstract data structures


2 - data and task classification


dataset types

spatial data

spatial fields

grid (vertices, edges, cells between edges), attributes (vertices and cells can both contain several values)
grid tells us from which samples to interpolate


geometry

vertices, edges (contain data attributes), faces


abstract data

tabular data (items, attributes, cells containing value of attribute for item)
relational data (nodes, relations, attributes)

ex. relational database


text
interpolation and other enrichment techniques do not work!


mix of both

geographical data

geometry + abstract data (ex. population)
we can combine both in our queries


ordering direction

sequential (height)
diverging (temperature)
cyclic (hours)


3 - visual encoding of attributes


marks

points, lines, areas


visual channels (position rocks because it can be used with all types of abstract data)

nominal

spatial region
color hue
shape


ordered

color saturation / lightness
length / area / volume
angle


grouping

gestalt grouping

containment
similarity
connection
proximity


some visual channels are not completely independent from one another (width x height, hue x saturation, shape x size)


popout

is pre-attentive
reduces cognitive load, no need for focus, fast (~ 200ms)


size

length is perceptually accurate, area somewhat accurate, volume inaccurate


orientation

can be used for ordered attributes
accuracy of perception isn't uniform (acute angles)


shape

high discriminability
can only be used for categorical attributes


color

use hue for discriminability, saturation and luminance for ordering
perception is relative (it depends on surrounding colors)
color blindness


grouping (ascending order of magnitude)

similarity
connectedness
containment


3D challenges

depth perception is poor (stereoscopic 3D / VR as a solution?)
occlusion (interaction is required)
perspective distortion
shading interferes with color channel


4 - interaction in visualization


methods of interaction

changing/transforming data
changing visualization technique
changing data enrichment
modifying the filter
changing mapping to graphical elements


brushing

selecting subset of data items with input device to emphasize them


linking

highlight brushed data items in different views or parts of the visualization


rearrangement and sorting (ex. parallel coordinates)


navigation

overview / detail (ex. minimap)

one detailed view, one complete view
two separate views, spatial separation


pan and zoom

infinite plane, allow the user to pan and zoom
temporal separation - the user needs to remember certain information


focus+context (ex. fish eye)

display most of the data with less detail and a small portion of the data with a lot of detail in the same view
deformation


reduction of data

filtering

dynamic queries

deliver continuous updates
low latency
easy to use, visualizes bounds
allow the user to change query by moving sliders and other basic UI elements


aggregation

binning

divide range of attributes into bins
count number of items in each bin and map the number to a channel
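A minimal sketch of binning, assuming equal-width bins over a single numeric attribute (the notes don't say how bins are chosen). The resulting counts are what would then be mapped onto a channel such as bar length or color.

```python
def bin_counts(values: list[float], n_bins: int) -> list[int]:
    """Count how many items fall into each of n_bins equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # avoid division by zero if all values are equal
    counts = [0] * n_bins
    for x in values:
        # The maximum value would land in bin n_bins, so clamp it into the last bin.
        k = min(int((x - lo) / width), n_bins - 1)
        counts[k] += 1
    return counts

print(bin_counts([1, 2, 2, 3, 5, 6, 8, 9, 10], n_bins=3))  # [4, 2, 3]
```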


clustering

group data items based on similarity
calculate average and map the result onto channels
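The notes don't name a clustering algorithm; k-means is swapped in here as one common example. It groups items by Euclidean distance and summarises each group by its average (centroid), which is the value that would then be mapped onto channels. The seeded random initialisation is an assumption made to keep the sketch deterministic.

```python
import math
import random

Point = tuple[float, ...]

def kmeans(points: list[Point], k: int, iterations: int = 20, seed: int = 0):
    """Return (centroids, clusters) after a fixed number of k-means iterations."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data items
    clusters: list[list[Point]] = []
    for _ in range(iterations):
        # Assignment step: each item joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the average of its members.
        for c, members in enumerate(clusters):
            if members:
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return centroids, clusters

data = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
centroids, clusters = kmeans(data, k=2)
print(sorted(len(c) for c in clusters))  # [3, 3]
```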


reduction of attributes

filtering - remove the attribute altogether
aggregation - via dimension reduction


placement of multiple views

juxtaposition

side by side
requires brushing/linking or coordination
large number of views - small size


superimposition

position views on top of one another


embedding

embed one view into another (ex. focus+context)


5 - visualization of scalar fields


colormap

changes in value are perceived uniformly across the colormap
map implies correct ordering
it should work in grayscale and for color blind people
colors should be selected intuitively (water - blue, terrain - green,...)
allow for inversion of mapping
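One way to check the "should work in grayscale" rule is to compute the luminance of every colormap entry and test that it changes monotonically. A sketch, assuming the colormap is a list of sRGB triples in [0, 1] and using the Rec. 709 luminance weights; the helper names and the two toy maps are hypothetical.

```python
def srgb_to_linear(c: float) -> float:
    """Undo the sRGB transfer curve for one channel in [0, 1]."""
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

def relative_luminance(rgb: tuple[float, float, float]) -> float:
    """Relative luminance Y using the Rec. 709 / sRGB primaries."""
    r, g, b = (srgb_to_linear(c) for c in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def is_grayscale_safe(colormap: list[tuple[float, float, float]]) -> bool:
    """True if luminance strictly increases or strictly decreases along the map."""
    ys = [relative_luminance(c) for c in colormap]
    steps = [b - a for a, b in zip(ys, ys[1:])]
    return all(s > 0 for s in steps) or all(s < 0 for s in steps)

# Toy colormaps: a dark-to-light blue ramp and a rainbow-like hue sweep.
blue_ramp = [(0.05, 0.1, 0.3), (0.2, 0.4, 0.7), (0.6, 0.8, 1.0)]
rainbow = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
print(is_grayscale_safe(blue_ramp))  # True
print(is_grayscale_safe(rainbow))    # False
```

The rainbow map fails because green is much brighter than red or blue, so in grayscale its ordering is lost.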


contour line

all points in a dataset that have the same scalar value
boundaries between regions
represented by curves in 2D (isolines), surfaces in 3D (isosurfaces)
contours of different values cannot intersect
distance between two contours indicates magnitude of gradient (speed of change in data)


marching squares/cubes

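2^|F| ways to divide the cell (|F| = number of cell vertices, each either above or below the isovalue: 2^4 = 16 cases for marching squares, 2^8 = 256 for marching cubes)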
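The first step of marching squares can be sketched as follows: classify each vertex of a cell against the isovalue, pack the results into a 4-bit case index (one of the 2^4 = 16 cases), and place contour points on cell edges by linear interpolation. The vertex order and bit weights are an assumed convention; the lookup table mapping each case to line segments is left out.

```python
def cell_case(values: tuple[float, float, float, float], isovalue: float) -> int:
    """values = (bottom-left, bottom-right, top-right, top-left) scalar samples.

    Each vertex contributes one bit (1 = at or above the isovalue), so the
    result is in range(16); cases 0 and 15 contain no contour.
    """
    index = 0
    for bit, v in enumerate(values):
        if v >= isovalue:
            index |= 1 << bit
    return index

def edge_crossing(p0, p1, v0: float, v1: float, isovalue: float):
    """Linearly interpolate where the contour crosses the edge p0-p1."""
    t = (isovalue - v0) / (v1 - v0)
    return (p0[0] + t * (p1[0] - p0[0]), p0[1] + t * (p1[1] - p0[1]))

# A cell where only the bottom-left vertex is above the isovalue 0.5:
print(cell_case((0.9, 0.2, 0.1, 0.3), 0.5))  # 1
# Where the contour crosses the bottom edge between (0, 0) and (1, 0):
print(edge_crossing((0.0, 0.0), (1.0, 0.0), 0.9, 0.2, 0.5))
```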


6 - visualization of volumetric data


volumetric data

spatial field
grid is in 3D


texture based volume rendering

use planes with 2D textures
the stack of slicing planes has to be switched when the viewpoint changes


7 - visualization of vector fields


data enrichment via bilinear/trilinear interpolation
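A minimal sketch of bilinear interpolation for a 2D vector field, assuming samples on a regular grid with unit spacing stored as `grid[j][i] = (u, v)` at x = i, y = j (a layout chosen for this example). Trilinear interpolation is the same idea in 3D, blending the eight corners of a cell.

```python
import math

Vector = tuple[float, float]

def bilinear(grid: list[list[Vector]], x: float, y: float) -> Vector:
    """Interpolate the vector at (x, y) from the four surrounding samples."""
    i, j = int(math.floor(x)), int(math.floor(y))
    # Clamp so points on the upper boundary still use a valid cell.
    i = min(max(i, 0), len(grid[0]) - 2)
    j = min(max(j, 0), len(grid) - 2)
    fx, fy = x - i, y - j
    v00, v10 = grid[j][i], grid[j][i + 1]
    v01, v11 = grid[j + 1][i], grid[j + 1][i + 1]
    # Weight each corner by the area of the opposite sub-rectangle.
    return tuple(
        (1 - fx) * (1 - fy) * a + fx * (1 - fy) * b + (1 - fx) * fy * c + fx * fy * d
        for a, b, c, d in zip(v00, v10, v01, v11)
    )

grid = [[(0.0, 0.0), (1.0, 0.0)],
        [(0.0, 1.0), (1.0, 1.0)]]
print(bilinear(grid, 0.5, 0.5))  # (0.5, 0.5)
```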


glyphs

displayed at sampling points
direction mapped on orientation of arrow
magnitude mapped on length / color
challenges

overlapping
in 3D occlusion, direction interpretation ambiguity

shading, more complex objects


we as humans suck at interpolating glyphs


alleviate occlusion by subsampling, opacity


stream objects

choose seed points and trace them in field for a number of steps
visualize trajectories using vectors
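Tracing a seed point through the field can be sketched with a fixed-step fourth-order Runge-Kutta integrator (one common choice; the notes don't fix the integration scheme). The analytic rotation field is a hypothetical stand-in; with gridded data the velocity lookup would use interpolation instead.

```python
Point = tuple[float, float]

def rotation_field(p: Point) -> Point:
    """Hypothetical test field: counter-clockwise rotation, v(x, y) = (-y, x)."""
    x, y = p
    return (-y, x)

def trace_streamline(field, seed: Point, step: float, n_steps: int) -> list[Point]:
    """Advect a seed point through the field with RK4 and return the visited positions."""
    path = [seed]
    x, y = seed
    for _ in range(n_steps):
        k1 = field((x, y))
        k2 = field((x + 0.5 * step * k1[0], y + 0.5 * step * k1[1]))
        k3 = field((x + 0.5 * step * k2[0], y + 0.5 * step * k2[1]))
        k4 = field((x + step * k3[0], y + step * k3[1]))
        x += step / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        y += step / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
        path.append((x, y))
    return path

line = trace_streamline(rotation_field, seed=(1.0, 0.0), step=0.1, n_steps=63)
# For a rotation field the trajectory should stay (almost exactly) on the unit circle.
print(max(abs((px**2 + py**2) ** 0.5 - 1.0) for px, py in line))
```

The returned points would then be drawn as a polyline.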


stream ribbon

color mapping vorticity (tendency of something to rotate, local spinning motion)
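For a 2D field the (scalar) vorticity is omega = dv/dx - du/dy. A sketch estimating it with central differences on a regular grid, assuming `u[j][i]`, `v[j][i]` are sampled at (i*h, j*h); boundary points are simply left at zero.

```python
def vorticity(u: list[list[float]], v: list[list[float]], h: float = 1.0) -> list[list[float]]:
    """Central-difference estimate of omega = dv/dx - du/dy at interior grid points."""
    rows, cols = len(u), len(u[0])
    omega = [[0.0] * cols for _ in range(rows)]
    for j in range(1, rows - 1):
        for i in range(1, cols - 1):
            dv_dx = (v[j][i + 1] - v[j][i - 1]) / (2 * h)
            du_dy = (u[j + 1][i] - u[j - 1][i]) / (2 * h)
            omega[j][i] = dv_dx - du_dy
    return omega  # boundary entries stay 0.0

# Rigid rotation (u, v) = (-y, x) has vorticity 2 everywhere.
n = 5
u = [[-float(j) for i in range(n)] for j in range(n)]
v = [[float(i) for i in range(n)] for j in range(n)]
print(vorticity(u, v)[2][2])  # 2.0
```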


8 - tabular data


tabular data

rows - items
columns - attributes
cells - scalar values


attribute types

nominal (categories)
ordinal (S, M, L) - not measurable intervals
quantitative - we can do arithmetic

discrete, continuous


abstract data

no spatial coordinates at which data was measured
impossible to do data enrichment, no spatial relation between data items to interpolate along


axis layout

orthogonal
non-orthogonal ("basis" vectors are not lin. independent -> hard to interpret imo)


glyphs

geometry that changes shape with data


identification tasks

identify attribute (range, distribution, outliers, value for given item)
identify item (for given attribute)
identify attributes (is there correlation between the attributes? clustering)


techniques

2 attributes - scatterplot
3 attributes - 3D scatterplot

stereoscopy, VR, rotation


4-5 attributes - colour, shapes in addition to 3 spatial dimensions


interaction

data manipulation

selection, view transformation


data reduction

filtering, clustering


view organization

juxtaposition, brushing, linking


faceting

visualising every pair of attributes in its own plot
allows us to spot correlation between attributes
cluster identification

brushing - selection of a subset of data using input devices (emphasising or de-emphasising it)
linking - selecting an item across multiple plots highlighting each of the item's attributes


parallel coordinates

place axes parallel to each other and join the dots
can be used to identify correlation between neighbouring attributes
hierarchical approach - organize data into clusters and ignore outliers -> visual clarity on upper layers
pipeline: data -> binning -> outlier detection -> trend mapping (ignore outliers) -> graph (combined with interaction and outliers)
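The basic parallel-coordinates mapping can be sketched as: attribute a gets a vertical axis at x = a, values are normalised to [0, 1] per attribute, and each item becomes the polyline through its points on the axes. Per-attribute min-max normalisation and the toy data are assumptions.

```python
def parallel_coordinates(items: list[list[float]]) -> list[list[tuple[float, float]]]:
    """Return one polyline (list of (axis_x, normalised_value)) per item."""
    n_attrs = len(items[0])
    lows = [min(row[a] for row in items) for a in range(n_attrs)]
    highs = [max(row[a] for row in items) for a in range(n_attrs)]
    polylines = []
    for row in items:
        points = []
        for a, value in enumerate(row):
            span = highs[a] - lows[a] or 1.0  # constant attribute: avoid dividing by zero
            points.append((float(a), (value - lows[a]) / span))
        polylines.append(points)
    return polylines

# Toy items with three attributes (e.g. power, consumption, weight).
cars = [[120.0, 6.5, 1200.0], [200.0, 9.0, 1600.0], [90.0, 5.0, 1000.0]]
for line in parallel_coordinates(cars):
    print(line)
```

Parallel line segments between two neighbouring axes then indicate positive correlation, crossing segments negative correlation.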


star glyphs

similar to parallel coords, but the axes radiate from a common centre and the values are joined by a closed polyline
attributes are spaced out at equal angles around a circle
saves space

less screen space for items closer to centre


can be projected into scatterplot to map 5+-dimensional data
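A sketch of star-glyph geometry: attribute k sits on a spoke at angle 2*pi*k/n, its value (assumed already normalised to [0, 1]) sets the distance from the centre, and the points form a closed polyline. The `center` parameter is how a glyph could be placed at a scatterplot position.

```python
import math

def star_glyph(values: list[float], center=(0.0, 0.0), radius: float = 1.0):
    """Return the closed polyline of a star glyph for one item."""
    n = len(values)
    cx, cy = center
    points = []
    for k, v in enumerate(values):
        angle = 2 * math.pi * k / n  # spokes at equal angles around the circle
        points.append((cx + radius * v * math.cos(angle), cy + radius * v * math.sin(angle)))
    return points + [points[0]]  # repeat the first vertex to close the polygon

glyph = star_glyph([1.0, 0.5, 0.0, 0.5])
print(glyph)
```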


star coordinates

distribute vectors evenly on unit circle

"base" vectors are not lin. independent, but we still project into 2D space -> fuck linear algebra -> leads to ambiguity


create points via linear combination of attributes
apparently letting the user decide on the orientation of "base" vectors can help with interpreting such abomination
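The star-coordinates projection can be sketched as a weighted sum of n axis vectors spread on the unit circle. Because more than two vectors in 2D are linearly dependent, different items can land on the same point; the example shows an item with equal values on opposite axes collapsing onto the origin, just like an all-zero item. Passing `axes` in explicitly is what would let a user reorient them.

```python
import math

def star_axes(n: int) -> list[tuple[float, float]]:
    """n unit vectors evenly distributed on the unit circle."""
    return [(math.cos(2 * math.pi * k / n), math.sin(2 * math.pi * k / n)) for k in range(n)]

def star_coordinates(item: list[float], axes: list[tuple[float, float]]) -> tuple[float, float]:
    """Place an item at the linear combination of the axis vectors weighted by its values."""
    x = sum(v * ax for v, (ax, _) in zip(item, axes))
    y = sum(v * ay for v, (_, ay) in zip(item, axes))
    return (x, y)

axes = star_axes(4)
print(star_coordinates([0.0, 0.0, 0.0, 0.0], axes))
print(star_coordinates([1.0, 0.0, 1.0, 0.0], axes))  # opposite axes cancel out
```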


bargrams

works with nominal, ordinal attributes
proportion of each category is mapped onto length of line
parallel set

enhancement of bargrams
visualizes relations between attributes
interaction

reordering categories, brushing


bundled layout

we only connect neighbouring categories


9 - relational data


we introduce relations between tables

relation - subset of a Cartesian product, can be unary or binary


attributes can be stored in nodes as well as links (we do not have to do M:N decomposition as in rel. DB)


data is typically abstract preventing us from doing data enrichment


encoding attributes

of nodes - shape, color, size
of relations - width, color


visualization tasks

all the tasks discussed prior
node incidence
shortest path


treemap

using containment to encode hierarchical relations
makes use of only one attribute (size of files)
recursive space dividing technique that alternates axes based on tree depth
we also map depth onto the color of the rectangles
other examples - stock market divided into industries and then companies, tasks
treemap gives us an overview of the entire hierarchy
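A sketch of the slice-and-dice layout described above: each node's rectangle is split among its children in proportion to their size, cutting along x at even depths and along y at odd depths. The `Node` class and the tiny file tree are hypothetical; the returned depth is what could be mapped onto color.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    size: float = 0.0  # only used for leaves (e.g. file size)
    children: list["Node"] = field(default_factory=list)

    def total(self) -> float:
        return self.size if not self.children else sum(c.total() for c in self.children)

def slice_and_dice(node: Node, x: float, y: float, w: float, h: float, depth: int = 0, out=None):
    """Return a list of (name, depth, x, y, w, h) rectangles in depth-first order."""
    if out is None:
        out = []
    out.append((node.name, depth, x, y, w, h))
    total = node.total()
    offset = 0.0
    for child in node.children:
        share = child.total() / total if total else 0.0
        if depth % 2 == 0:  # even depth: cut along the x axis
            slice_and_dice(child, x + offset * w, y, share * w, h, depth + 1, out)
        else:               # odd depth: cut along the y axis
            slice_and_dice(child, x, y + offset * h, w, share * h, depth + 1, out)
        offset += share
    return out

root = Node("root", children=[
    Node("docs", children=[Node("a.txt", 2), Node("b.txt", 2)]),
    Node("img.png", 4),
])
for rect in slice_and_dice(root, 0, 0, 100, 100):
    print(rect)
```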


A E S T H E T I C S

minimize number of crossings
minimize area
minimize aspect ratio
angular resolution between edges incident to a node
edge length (total, maximum, uniform)
bends
symmetry
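The first aesthetic (number of crossings) is easy to measure for a straight-line drawing. A sketch using an orientation test; edges that share a node are skipped and collinear overlaps are ignored to keep it short. Node positions and edges are toy data.

```python
Point = tuple[float, float]

def _orient(a: Point, b: Point, c: Point) -> float:
    """Sign of the cross product (b - a) x (c - a): which side of line ab point c lies on."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])

def segments_cross(p1: Point, p2: Point, q1: Point, q2: Point) -> bool:
    """Proper crossing: each segment's endpoints lie strictly on opposite sides of the other."""
    d1, d2 = _orient(q1, q2, p1), _orient(q1, q2, p2)
    d3, d4 = _orient(p1, p2, q1), _orient(p1, p2, q2)
    return d1 * d2 < 0 and d3 * d4 < 0

def count_crossings(pos: dict[str, Point], edges: list[tuple[str, str]]) -> int:
    crossings = 0
    for i, (a, b) in enumerate(edges):
        for c, d in edges[i + 1:]:
            if len({a, b, c, d}) < 4:  # adjacent edges share a node: skip
                continue
            if segments_cross(pos[a], pos[b], pos[c], pos[d]):
                crossings += 1
    return crossings

# Square with both diagonals drawn: the diagonals cross once.
pos = {"A": (0, 0), "B": (1, 0), "C": (1, 1), "D": (0, 1)}
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A"), ("A", "C"), ("B", "D")]
print(count_crossings(pos, edges))  # 1
```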


10 - big data


big data

high velocity, volume, variety
ex. NSA

300m US citizens
metadata, calls, texts, surveillance images,...


veracity - possibility of including shit data (fake profiles on FB)


visual analytics

combination of automated analysis techniques with interactive visualization

to make sense of large and complex datasets


visualization + data mining + interaction
making use of full perceptual and cognitive abilities during analysis

quick informed decisions from people who aren't experts on data mining / visualization


conceptual challenges

we cannot use standard visualization techniques
heavy use of binning and clustering

ex. earthquakes, density visualization (M25 traffic accidents)


clustering

grouping a set of objects in such a way that objects in these groups are more similar to one another than to objects outside


data mining

process of extraction of interesting patterns from huge amounts of data
regression for data enrichment
clustering for simplification
box plots for statistical analysis
outlier detection for detecting anomalies
classification using ANNs
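Box plots and outlier detection meet in Tukey's rule: values more than 1.5 * IQR beyond the quartiles are flagged as outliers and drawn separately from the whiskers. A sketch using Python's `statistics.quantiles` with its default ("exclusive") method; other quartile definitions shift the fences slightly.

```python
import statistics

def box_plot_summary(values: list[float]) -> dict:
    """Quartiles, whisker ends and outliers by the 1.5 * IQR rule."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inliers = [v for v in values if low_fence <= v <= high_fence]
    return {
        "q1": q1, "median": median, "q3": q3,
        "whiskers": (min(inliers), max(inliers)),
        "outliers": [v for v in values if v < low_fence or v > high_fence],
    }

print(box_plot_summary([12, 13, 13, 14, 15, 15, 16, 17, 18, 95]))
```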


11 - text visualization


issues

subtle
abstract
ambiguity of meaning
context


analysis levels

lexical - strings
syntactic - word types
semantic - meanings


visualizing text

understanding
grouping for future classification
comparison (git diffs)
correlation (detecting plagiarism)


word clouds

frequency analysis


judging relative word importance in document

tf * log(N / df)
tf - term frequency
df - documents including the word
N - total number of documents
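The tf * log(N / df) weighting on a toy corpus; whitespace tokenisation, lowercasing and the natural logarithm are simplifying assumptions. A word that occurs in every document (here "the") gets weight 0, while words unique to one document score highest, which is what makes the weights useful for sizing words in a cloud.

```python
import math
from collections import Counter

def tf_idf(documents: list[str]) -> list[dict[str, float]]:
    """Return a {word: tf * log(N / df)} dictionary for every document."""
    tokenised = [doc.lower().split() for doc in documents]
    n_docs = len(tokenised)
    # df: in how many documents each word occurs (counted once per document)
    df = Counter(word for words in tokenised for word in set(words))
    scores = []
    for words in tokenised:
        tf = Counter(words)  # raw term frequency within this document
        scores.append({w: tf[w] * math.log(n_docs / df[w]) for w in tf})
    return scores

docs = ["the cat sat on the mat", "the dog sat", "the cat chased the dog"]
weights = tf_idf(docs)
print(weights[0]["the"], round(weights[0]["mat"], 3))  # 0.0 1.099
```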


12 - visualization of geographical data


geographical data structure

geometry stored as vector or raster representation
non-spatial attributes


13 - visualization of time-oriented data


4D - 3 spatial dimensions + 1 temporal dimension


time is unidirectional (we cannot go back)


time-oriented data

temporal aspect is of interest to us

ordinal / discrete / continuous


problem with granularity
ex. Gantt chart


arrangements

linear or cyclic


point in time vs interval


mapping of time

static - map onto spatial dimension

ThemeRiver


dynamic - create an animation (series of views)