Proposal for CHI2014 workshop

Proximal Action Streams

A case study in the use of proximal sensors for constructing multimodal datasets.

[Figure: rendered gesture data]

Converting the behavioral data streams afforded by proximal sensors into explorable multimodal datasets is a new challenge faced by both HCI practitioners and the behavioral research community focused on more theoretical questions. We intend to present a case study demonstrating one approach to this problem, which leverages open web standards and lightweight development tools for rapid prototyping.

In our submission to the workshop, we intend to describe our development toolchain, which we've found to be particularly appropriate for this task, and to highlight some basic techniques for working with (viz., capturing, filtering, synchronizing, rendering, extracting, analyzing, and replicating) the varied behavioral activity streams made available by proximal sensors.


Voice, image, touch, motion, orientation, and geolocation sensors are now ubiquitous. They're embedded in the communication devices we carry around in our pockets. Complementing this already pervasive technology is a new breed of compact, powerful, low-cost motion and depth sensors. We'll refer to the various sensors designed to gauge user behavior within close range of the human body as proximal sensors, distinguishing them from distal sensors that are designed to sense activity from a distance.

Much of the appeal of proximal sensors lies in their potential for enabling new forms of perceptual computing and natural user interaction, especially when used in concert with one another. They can function as a coordinated set of input devices, each sensing one facet of the user's behavior or context, which can then in turn be rendered in an appropriate form for the user to respond to as part of a tight perceptual feedback loop.

However, designing new forms of user interaction requires a lot of empirical groundwork: gathering observations, testing what works and what doesn't. And therein lies a further appeal of proximal sensors: the same devices can serve as effective tools for gathering the very behavioral data that new interactions are designed around and must be evaluated against.

But while proximal sensors enable new forms of behavioral data collection, integrating this new range of data into a unified event stream, in synchrony with conventional A/V media recordings, remains a challenge. It is a challenge faced both by HCI practitioners and by the behavioral research community focused on more theoretical questions.

We intend to present a case study that demonstrates a particular approach to this problem by leveraging open web standards and lightweight development tools for rapid prototyping. Our approach emerged in the course of a research study focused on action gestures, described below. With the arrival of proximal sensors and open web standards, we were able to piece together a suite of low-cost tools and techniques for producing explorable multimodal datasets. The tools and techniques we describe should be of use to interaction researchers seeking new ways of exploring nonverbal communication and proximal action streams generally.

In the next section we elaborate on just one of the motivating problems we encountered in our research study. We aim to address this problem in some detail in our final submission.

The challenge of multimodal event synchrony and faceted playback

For anyone looking to utilize this new breed of sensors, one particular problem is integrating their output into a unified, synchronous event stream to enable simultaneous rendering and faceted playback.

Each sensor stream captures particular behavioral features of the user/subject's activity. Enabling researchers to selectively view particular sets of features during playback (faceted playback) can be of great value during the exploratory phase of research.

For example, a researcher might capture a subject's gestural movements with a close-range time-of-flight sensor while an embedded video camera and mic record the subject's image and speech. Ideally, the researcher could then view the A/V media in tight synchrony with a rendering of the captured motion data and select particular facets of the motion stream in the course of playback: it may be useful to view finger orientation relative to the hand in one instance, but then to turn off finger rendering and isolate hand movements along a particular axis to better scrutinize certain gesture transition points.

We'll describe one approach to synchronization and faceted playback with media cueing standards that handle timed metadata. We'll also look at some emerging standards for media/data integration at the time of capture.
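
To make the cueing approach concrete, here's a minimal sketch (the frame records and the renderMotion function are ours, purely illustrative): each captured motion sample is attached to the video's timeline as a hidden metadata cue, so rendering follows A/V playback, including seeks and scrubbing.

```js
// Sketch: drive a motion renderer from timed metadata cues on the A/V recording.
// `frames` is an array of captured samples ({start, end, hands}) and renderMotion
// is a hypothetical renderer -- both are illustrative, not part of any standard.
var video = document.querySelector('video');
var track = video.addTextTrack('metadata', 'motion');   // hidden, non-displaying track

frames.forEach(function (frame) {
  var cue = new VTTCue(frame.start, frame.end, JSON.stringify(frame.hands));
  cue.onenter = function () {                           // fires in sync with playback
    renderMotion(JSON.parse(cue.text));
  };
  track.addCue(cue);
});
```

Because the cues live on the media timeline, seeking and variable-rate playback keep the motion rendering and the A/V media in step without any bookkeeping on our part.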

Research study description

The set of practices we intend to describe in our submission emerged in the course of an experimental study focused on action gestures. For this study we utilized the Leap Motion Controller, a low-cost motion tracking device, to capture gestures produced by the subjects in the study. What follows is a description of the research question targeted in our study.

When people describe actions in the world, such as how to tie a tie or how to assemble a piece of furniture, they produce gestures that look to the naked eye much like the actions they are describing. This observation has led many researchers to endorse the view that gesture is a kind of “simulated action”. In our first project using the LEAP we are seeking to understand precisely how action gestures are like (and unlike) the actions they represent.

Actions in the world require the finely calibrated deployment of force to move objects fluidly and safely. Do gestures somehow represent this kind of force information, too, or do they abstract it away? To address this question we are using a classic reasoning puzzle, the Tower of Hanoi, which requires the solver to move disks of different weights onto and off of pegs. Though the physical details of the puzzle (e.g., the height of the pegs, the size of the disks) are irrelevant to the logical structure of the solution, people reliably encode such information in gesture when describing their solutions. Previous measurements of people’s Tower of Hanoi gestures have been relatively coarse-grained, however, limited by what features of gesture could be reliably extracted from video. By using the LEAP to track people’s gestures as they describe moving the disks, we hope to learn whether their movements differ systematically according to the weight of the disk they are describing.

In the course of this study we've developed a number of basic command-line tools for capturing, filtering, viewing and extracting gesture data. An outline of how we're using these tools to record gesture samples and extract position and velocity data can be found here.
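
As a rough sketch of the capture step (assuming the leapjs client library; the script name and output record shape are our own conventions), a short Node script can stream frames from the controller and write palm position and velocity as newline-delimited JSON for later filtering and analysis:

```js
// record.js -- sketch: capture palm position and velocity from the Leap Motion
// Controller as newline-delimited JSON. Assumes the leapjs client library;
// the output record shape is our own convention.
var Leap = require('leapjs');

Leap.loop(function (frame) {
  frame.hands.forEach(function (hand) {
    process.stdout.write(JSON.stringify({
      t: frame.timestamp,           // microseconds since the controller connected
      position: hand.palmPosition,  // [x, y, z] in mm
      velocity: hand.palmVelocity   // [x, y, z] in mm/s
    }) + '\n');
  });
});
```

Run as `node record.js > sample.ldjson`; the resulting line-delimited records can then be piped through small filtering and extraction tools.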

Implications for cognitive science and gesture research

Proximal sensors are arriving at a time of surging interest in the dynamics of human movement across the cognitive sciences. Over the last decade researchers have begun to see movement in all its richness — the precise arc the hand takes during reaching, the swaying of the hips as one weighs a decision — as a special window into the moment-to-moment dynamics of cognition (Spivey, 2007). One type of movement in particular that has captured the attention of researchers is hand gestures. Video has been the traditional tool for analyzing such gestures, but researchers are beginning to run up against the limits of this technology, formulating questions which would be extremely time-consuming to answer with video data or which cannot be answered at all.

While motion-tracking technologies have been used in research for several years, they have remained out of reach for all but the most determined and invested researchers. Aside from the obvious barrier of cost, there has been an even more daunting barrier of expertise: extracting and analyzing data from a motion-capture system requires intensive training, if not dedicated personnel. With the arrival of proximal sensors and open web standards, we have a new suite of low-cost tools and techniques for producing explorable multimodal datasets. These datasets can be designed to enable researchers to graphically visualize different dimensions of the recorded data and to query and annotate particular portions of it. For a gesture researcher, for example, such tools might have the “look and feel” of video, with similar playback controls, while also reducing the dimensionality of the data to suit the researcher’s needs.

CHI 2014 | Gesture Interaction Workshop

Gesture-Based Interaction: Communication and Cognition
CHI 2014 Workshop
Saturday, April 26th, 2014
Toronto, Canada

This workshop will discuss the role of gestures in communication and cognition, and the implications for interaction design in which gesture is an integral part. We will explore underlying relationships between gesture-based computing devices, communication, and cognition, with a focus on both basic and applied research topics, including:

  • how gesture-based devices affect cognition
  • how their design can be oriented towards enhancing cognition
  • how they may enhance creativity and learning
  • discovery of and definition of a standard set of gestures
  • methodologies for studying these topics

Participation is encouraged from researchers and practitioners in HCI, psychology, cognitive science, computer science, software systems, and education.

Our goal in this workshop is to discuss foundational and application topics that can form a significant monograph on the topic. We plan to discuss the potential for each submission to be expanded into a chapter in a book on cognitive impacts of gesture-based interaction technologies.

Participants will be selected based on the submission of a 4 page position paper on a topic relevant to the workshop submitted to m.maher@uncc.edu. At least one author of each accepted position paper must attend the workshop and all participants must register for the workshop and for at least one day of the conference.

Important Dates:

  • Jan 17th — Paper submission deadline
  • Feb 10th — Notification of acceptance
  • April 1st — Camera-ready submission deadline
  • April 26th - Workshop

Notes

Topics

Basic concepts we need to touch upon:

  • evented I/O

    • streams
    • redirection
    • pipelines
  • sync / async

  • synchronous events / event synchronization

  • synchronization of multiple modalities (see the sketch after this list)

  • timed metadata

  • coordinated views

  • features / dimensions (detection, extraction, learning)
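
To make the synchronization concepts above concrete, here's a minimal sketch (modality names and record shapes are illustrative) of merging two timestamped sensor streams into one unified, time-ordered event stream:

```js
// Sketch: merge timestamped events from two modalities (e.g. hand motion and
// device orientation) into a single time-ordered stream. Record shapes and
// modality names are illustrative, not a fixed schema.
function mergeStreams(motionEvents, orientationEvents) {
  var tagged = motionEvents
    .map(function (e) { return { modality: 'motion', t: e.t, data: e }; })
    .concat(orientationEvents.map(function (e) {
      return { modality: 'orientation', t: e.t, data: e };
    }));
  return tagged.sort(function (a, b) { return a.t - b.t; });  // unified event stream
}
```

In practice each sensor reports time on its own clock, so timestamps would first be normalized to a shared reference (e.g., the recording's media time) before merging.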

Tools

Node - evented I/O platform and ecosystem for building fast, scalable network applications

D3/Crossfilter - enables flexible rendering, fast feature filtering, and coordinated views
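
As a sketch of the faceted filtering Crossfilter affords (the record fields here are our own), gesture samples can be indexed along several dimensions and narrowed interactively, which is what drives coordinated views:

```js
// Sketch: multidimensional filtering of gesture samples with crossfilter.
// Assumes crossfilter is loaded (e.g. via a <script> tag); the record
// fields (t, hand, speed) are illustrative.
var samples = crossfilter(records);   // records: array of captured gesture samples
var byTime  = samples.dimension(function (d) { return d.t; });
var byHand  = samples.dimension(function (d) { return d.hand; });
var bySpeed = samples.dimension(function (d) { return d.speed; });

byHand.filter('right');               // isolate right-hand samples ...
byTime.filter([10.0, 12.5]);          // ... within a time window
var fastest = bySpeed.top(25);        // the 25 fastest samples matching both filters
```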

LevelDB/IndexedDB - distributed datastores, replication
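
And as a sketch of the persistence side (database path, key scheme, and record shape are ours), captured samples can be written to a local LevelDB store keyed by timestamp, which keeps them ordered and range-queryable for replay:

```js
// Sketch: persist gesture samples in LevelDB keyed by timestamp and stream
// back a time range. Database path, key scheme, and record shape are ours.
var level = require('level');
var db = level('./gestures', { valueEncoding: 'json' });

function save(sample, cb) {
  // keys are compared lexicographically, so zero-pad timestamps in practice
  db.put(String(sample.t), sample, cb);
}

function replay(from, to) {
  // readable stream of { key, value } pairs in key order
  return db.createReadStream({ gte: String(from), lte: String(to) });
}
```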

Standards

WebRTC - plugin-free realtime communication protocol

Tracks/Cues/WebVTT - open standards for annotated media, timed metadata, and event synchronization

SLEEP - Syncable Lightweight Event Emitting Persistence

Reference

Client-side dataset replication

Using Device Orientation

Getting Started with WebRTC

Getting Started with the Track Element

HTML5 Video


Node.js is a platform built on Chrome's JavaScript runtime for easily building fast, scalable network applications. Node.js uses an event-driven, non-blocking I/O model that makes it lightweight and efficient, perfect for data-intensive real-time applications that run across distributed devices.


The main tool in node's evented toolbox is the Stream. Stream instances are basically Unix pipes. They can be readable, writable or both and are easy to reason about.

Streams come to us from the earliest days of unix and have proven themselves over the decades as a dependable way to compose large systems out of small components that do one thing well. In unix, streams are implemented by the shell with | pipes. In node, the built-in stream module is used by the core libraries and can also be used by user-space modules. Similar to unix, the node stream module's primary composition operator is called .pipe() and you get a backpressure mechanism for free to throttle writes for slow consumers.

Streams make programming in node simple, elegant, and composable. They can help to separate your concerns because they restrict the implementation surface area into a consistent interface that can be reused. You can then plug the output of one stream to the input of another and use libraries that operate abstractly on streams to institute higher-level flow control.

Once you learn the stream api, you can just snap together these streaming modules like lego bricks or garden hoses instead of having to remember how to push data through wonky non-streaming custom APIs.

-- The Stream Handbook
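
To ground this in the kind of data we're handling, here's a minimal pipeline sketch (assuming the split and through2 utility modules; the input file and velocity threshold are illustrative) that streams recorded samples through a small transform and on to a consumer via .pipe(), with backpressure handled by the stream machinery:

```js
// Sketch: compose a gesture-processing pipeline from small streams.
// Assumes the split and through2 utility modules; the input file and
// velocity threshold are illustrative.
var fs = require('fs');
var split = require('split');         // splits the input into lines
var through = require('through2');    // lightweight transform streams

fs.createReadStream('sample.ldjson')
  .pipe(split())                      // one JSON record per line
  .pipe(through(function (chunk, enc, next) {
    var line = chunk.toString().trim();
    if (!line) return next();
    var sample = JSON.parse(line);
    if (Math.abs(sample.velocity[0]) > 100) {   // keep only fast lateral movements
      this.push(JSON.stringify(sample) + '\n');
    }
    next();
  }))
  .pipe(process.stdout);              // swap in a write stream or renderer as needed
```

Each stage does one thing; repurposing the pipeline is a matter of swapping the final destination or splicing in another transform.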

XBD

We're pushing the idea of explorable behavioral datasets.

We're currently exploring strategies for the immediate capture and playback of gesture/hand-motion samples, but we're ultimately interested in techniques for persisting, editing, annotating, and replicating behavioral activity streams in general.

Why?

The rise of proximal sensors.

They provide a stream of behavioral data that can be ...

  • captured
  • visualized
  • edited
  • munged
  • analyzed
  • replicated

Relevant Domains

Tooling

In general, the plan is to build off D3 for visualization/interaction and IndexedDB as a distributed datastore.

The following modules encapsulate some key methods and techniques that we plan to utilize.

Munging

  • nest - convert tables to trees

  • crossfilter - fast multidimensional filtering for coordinated views

Visualization

  • dc.js - reactive dimensional charting

  • catcorr.js - visualize correlations across many dimensions of categorical data

Persistence and Replication

  • dat - real-time replication and versioning for large tabular data sets

  • level.js - leveldb for the browser

  • levelweb - leveldb gui w/ builtin visualization tools

Further Reading
