Proposal for CHI2014 workshop

Proximal Action Streams

A case study in the use of proximal sensors for constructing multimodal datasets.

[Figure: rendered gesture data]

Converting the behavioral data streams afforded by proximal sensors into explorable multimodal datasets is a new challenge faced by both HCI practitioners and the behavioral research community focused on more theoretical questions. We intend to present a case study demonstrating one approach to this problem, which leverages open web standards and lightweight development tools for rapid prototyping.

In our submission to the workshop, we intend to describe our development toolchain, which we've found to be particularly appropriate for this task, and to highlight some basic techniques for working with (viz., capturing, filtering, synchronizing, rendering, extracting, analyzing, and replicating) the varied behavioral activity streams made available by proximal sensors.


Voice, image, touch, motion, orientation, and geolocation sensors are now ubiquitous. They're embedded in the communication devices we carry around in our pockets. Complementing this already pervasive technology is a new breed of compact, powerful, low-cost motion and depth sensors. We'll refer to the various sensors designed to gauge user behavior within close range of the human body as proximal sensors, distinguishing them from distal sensors that are designed to sense activity from a distance.

Much of the appeal of proximal sensors lies in their potential for enabling new forms of perceptual computing and natural user interaction, especially when used in concert with one another. They can function as a coordinated set of input devices, each sensing one facet of the user's behavior or context, which can then in turn be rendered in an appropriate form for the user to respond to as part of a tight perceptual feedback loop.

However, designing new forms of user interaction requires a lot of empirical groundwork: gathering observations, testing what works and what doesn't. And therein lies a further appeal of proximal sensors: the same devices can serve as effective tools for gathering the very behavioral data that new interactions are designed around and must be evaluated against.

But while proximal sensors enable new forms of behavioral data collection, integrating this new range of data into a unified event stream, in synchrony with conventional A/V media recordings, remains a challenge. It is a challenge faced both by HCI practitioners and by the behavioral research community focused on more theoretical questions.

We intend to present a case study that demonstrates a particular approach to this problem by leveraging open web standards and lightweight development tools for rapid prototyping. Our approach emerged in the course of a research study focused on action gestures, described below. With the arrival of proximal sensors and open web standards, we were able to piece together a suite of low-cost tools and techniques for producing explorable multimodal datasets. The tools and techniques we describe should be of use to interaction researchers seeking new ways of exploring nonverbal communication and proximal action streams generally.

In the next section we elaborate on just one of the motivating problems we encountered in our research study. We aim to address this problem in some detail in our final submission.

The challenge of multimodal event synchrony and faceted playback

For anyone looking to utilize this new breed of sensors, one particular problem is integrating their output into a unified, synchronous event stream to enable simultaneous rendering and faceted playback.

Each sensor stream captures particular behavioral features of the user/subject's activity. Enabling researchers to selectively view particular sets of features during playback (faceted playback) can be of great value during the exploratory phase of research.

For example, a researcher might capture a subject's gestural movements with a close-range time-of-flight sensor while an embedded video camera and mic record the subject's image and speech. Ideally, the researcher could then view the A/V media in tight synchrony with a rendering of the captured motion data and select particular facets of the motion stream in the course of playback: it may be useful to view finger orientation relative to the hand in one instance, but then to turn off finger rendering and isolate hand movements along a particular axis to better scrutinize certain gesture transition points.

We'll describe one approach to synchronization and faceted playback with media cueing standards that handle timed metadata. We'll also look at some emerging standards for media/data integration at the time of capture.
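
To make the cueing approach concrete, here's a minimal sketch (the frame records and the renderMotion function are ours, purely illustrative): each captured motion sample is attached to the video's timeline as a hidden metadata cue, so rendering follows A/V playback, including seeks and scrubbing.

```js
// Sketch: drive a motion renderer from timed metadata cues on the A/V recording.
// `frames` is an array of captured samples ({start, end, hands}) and renderMotion
// is a hypothetical renderer -- both are illustrative, not part of any standard.
var video = document.querySelector('video');
var track = video.addTextTrack('metadata', 'motion');   // hidden, non-displaying track

frames.forEach(function (frame) {
  var cue = new VTTCue(frame.start, frame.end, JSON.stringify(frame.hands));
  cue.onenter = function () {                           // fires in sync with playback
    renderMotion(JSON.parse(cue.text));
  };
  track.addCue(cue);
});
```

Because the cues live on the media timeline, seeking and variable-rate playback keep the motion rendering and the A/V media in step without any bookkeeping on our part.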

Research study description

The set of practices we intend to describe in our submission emerged in the course of an experimental study focused on action gestures. For this study we utilized the Leap Motion Controller, a low-cost motion tracking device, to capture gestures produced by the subjects in the study. What follows is a description of the research question targeted in our study.

When people describe actions in the world, such as how to tie a tie or how to assemble a piece of furniture, they produce gestures that look to the naked eye much like the actions they are describing. This observation has led many researchers to endorse the view that gesture is a kind of “simulated action”. In our first project using the LEAP we are seeking to understand precisely how action gestures are like (and unlike) the actions they represent.

Actions in the world require the finely calibrated deployment of force to move objects fluidly and safely. Do gestures somehow represent this kind of force information, too, or do they abstract it away? To address this question we are using a classic reasoning puzzle, the Tower of Hanoi, which requires the solver to move disks of different weights onto and off of pegs. Though the physical details of the puzzle (e.g., the height of the pegs, the size of the disks) are irrelevant to the logical structure of the solution, people reliably encode such information in gesture when describing their solutions. Previous measurements of people’s Tower of Hanoi gestures have been relatively coarse-grained, however, limited by what features of gesture could be reliably extracted from video. By using the LEAP to track people’s gestures as they describe moving the disks, we hope to learn whether their movements differ systematically according to the weight of the disk they are describing.

In the course of this study we've developed a number of basic command-line tools for capturing, filtering, viewing and extracting gesture data. An outline of how we're using these tools to record gesture samples and extract position and velocity data can be found here.
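
As a rough sketch of the capture step (assuming the leapjs client library; the script name and output record shape are our own conventions), a short Node script can stream frames from the controller and write palm position and velocity as newline-delimited JSON for later filtering and analysis:

```js
// record.js -- sketch: capture palm position and velocity from the Leap Motion
// Controller as newline-delimited JSON. Assumes the leapjs client library;
// the output record shape is our own convention.
var Leap = require('leapjs');

Leap.loop(function (frame) {
  frame.hands.forEach(function (hand) {
    process.stdout.write(JSON.stringify({
      t: frame.timestamp,           // microseconds since the controller connected
      position: hand.palmPosition,  // [x, y, z] in mm
      velocity: hand.palmVelocity   // [x, y, z] in mm/s
    }) + '\n');
  });
});
```

Run as `node record.js > sample.ldjson`; the resulting line-delimited records can then be piped through small filtering and extraction tools.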

Implications for cognitive science and gesture research

Proximal sensors are arriving at a time of surging interest in the dynamics of human movement across the cognitive sciences. Over the last decade researchers have begun to see movement in all its richness — the precise arc the hand takes during reaching, the swaying of the hips as one weighs a decision — as a special window into the moment-to-moment dynamics of cognition (Spivey, 2007). One type of movement in particular that has captured the attention of researchers is hand gestures. Video has been the traditional tool for analyzing such gestures, but researchers are beginning to run up against the limits of this technology, formulating questions which would be extremely time-consuming to answer with video data or which cannot be answered at all.

While motion-tracking technologies have been used in research for several years, they have remained out of reach for all but the most determined and invested researchers. Aside from the obvious barrier of cost, there has been an even more daunting barrier of expertise: extracting and analyzing data from a motion-capture system requires intensive training, if not dedicated personnel. With the arrival of proximal sensors and open web standards, we have a new suite of low-cost tools and techniques for producing explorable multimodal datasets. These datasets can be designed to enable researchers to graphically visualize different dimensions of the recorded data and to query and annotate particular portions of it. For a gesture researcher, for example, such tools might have the “look and feel” of video, with similar playback controls, while also reducing the dimensionality of the data to suit the researcher’s needs.

CHI 2014 | Gesture Interaction Workshop

Gesture-Based Interaction: Communication and Cognition
CHI 2014 Workshop
Saturday, April 26th, 2014
Toronto, Canada

This workshop will discuss the role of gestures in communication and cognition, and the implications for interaction design in which gesture is an integral part. We will explore underlying relationships between gesture-based computing devices, communication, and cognition, with a focus on both basic and applied research topics, including:

  • how gesture-based devices affect cognition
  • how their design can be oriented towards enhancing cognition
  • how they may enhance creativity and learning
  • discovery of and definition of a standard set of gestures
  • methodologies for studying these topics

Participation is encouraged from researchers and practitioners in HCI, psychology, cognitive science, computer science, software systems, and education.

Our goal in this workshop is to discuss foundational and application topics that can form a significant monograph on the topic. We plan to discuss the potential for each submission to be expanded into a chapter in a book on cognitive impacts of gesture-based interaction technologies.

Participants will be selected based on the submission of a 4 page position paper on a topic relevant to the workshop submitted to m.maher@uncc.edu. At least one author of each accepted position paper must attend the workshop and all participants must register for the workshop and for at least one day of the conference.

Important Dates:

  • Jan 17th — Paper submission deadline
  • Feb 10th — Notification of acceptance
  • April 1st — Camera-ready submission deadline
  • April 26th - Workshop

Notes

Topics

Basic concepts we need to touch upon:

  • evented I/O

    • streams
    • redirection
    • pipelines
  • sync / async

  • synchronous events / event synchronization

  • synchronization of multiple modalities (see the sketch after this list)

  • timed metadata

  • coordinated views

  • features / dimensions (detection, extraction, learning)
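
To make the synchronization concepts above concrete, here's a minimal sketch (modality names and record shapes are illustrative) of merging two timestamped sensor streams into one unified, time-ordered event stream:

```js
// Sketch: merge timestamped events from two modalities (e.g. hand motion and
// device orientation) into a single time-ordered stream. Record shapes and
// modality names are illustrative, not a fixed schema.
function mergeStreams(motionEvents, orientationEvents) {
  var tagged = motionEvents
    .map(function (e) { return { modality: 'motion', t: e.t, data: e }; })
    .concat(orientationEvents.map(function (e) {
      return { modality: 'orientation', t: e.t, data: e };
    }));
  return tagged.sort(function (a, b) { return a.t - b.t; });  // unified event stream
}
```

In practice each sensor reports time on its own clock, so timestamps would first be normalized to a shared reference (e.g., the recording's media time) before merging.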

Tools

Node - evented I/O platform and ecosystem for building fast, scalable network applications

D3/Crossfilter - enables flexible rendering, fast feature filtering, and coordinated views
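
As a sketch of the faceted filtering Crossfilter affords (the record fields here are our own), gesture samples can be indexed along several dimensions and narrowed interactively, which is what drives coordinated views:

```js
// Sketch: multidimensional filtering of gesture samples with crossfilter.
// Assumes crossfilter is loaded (e.g. via a <script> tag); the record
// fields (t, hand, speed) are illustrative.
var samples = crossfilter(records);   // records: array of captured gesture samples
var byTime  = samples.dimension(function (d) { return d.t; });
var byHand  = samples.dimension(function (d) { return d.hand; });
var bySpeed = samples.dimension(function (d) { return d.speed; });

byHand.filter('right');               // isolate right-hand samples ...
byTime.filter([10.0, 12.5]);          // ... within a time window
var fastest = bySpeed.top(25);        // the 25 fastest samples matching both filters
```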

LevelDB/IndexedDB - distributed datastores, replication
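
And as a sketch of the persistence side (database path, key scheme, and record shape are ours), captured samples can be written to a local LevelDB store keyed by timestamp, which keeps them ordered and range-queryable for replay:

```js
// Sketch: persist gesture samples in LevelDB keyed by timestamp and stream
// back a time range. Database path, key scheme, and record shape are ours.
var level = require('level');
var db = level('./gestures', { valueEncoding: 'json' });

function save(sample, cb) {
  // keys are compared lexicographically, so zero-pad timestamps in practice
  db.put(String(sample.t), sample, cb);
}

function replay(from, to) {
  // readable stream of { key, value } pairs in key order
  return db.createReadStream({ gte: String(from), lte: String(to) });
}
```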

Standards

WebRTC - plugin-free realtime communication protocol

Tracks/Cues/WebVTT - open standards for annotated media, timed metadata, and event synchronization

SLEEP - Syncable Lightweight Event Emitting Persistence

Reference

Client-side dataset replication

Using Device Orientation

Getting Started with WebRTC

Getting Started with the Track Element

HTML5 Video


Node.js is a platform built on Chrome's JavaScript runtime for easily building fast, scalable network applications. Node.js uses an event-driven, non-blocking I/O model that makes it lightweight and efficient, perfect for data-intensive real-time applications that run across distributed devices.


The main tool in node's evented toolbox is the Stream. Stream instances are basically Unix pipes. They can be readable, writable or both and are easy to reason about.

Streams come to us from the earliest days of unix and have proven themselves over the decades as a dependable way to compose large systems out of small components that do one thing well. In unix, streams are implemented by the shell with | pipes. In node, the built-in stream module is used by the core libraries and can also be used by user-space modules. Similar to unix, the node stream module's primary composition operator is called .pipe() and you get a backpressure mechanism for free to throttle writes for slow consumers.

Streams make programming in node simple, elegant, and composable. They can help to separate your concerns because they restrict the implementation surface area into a consistent interface that can be reused. You can then plug the output of one stream to the input of another and use libraries that operate abstractly on streams to institute higher-level flow control.

Once you learn the stream api, you can just snap together these streaming modules like lego bricks or garden hoses instead of having to remember how to push data through wonky non-streaming custom APIs.

-- The Stream Handbook
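
To ground this in the kind of data we're handling, here's a minimal pipeline sketch (assuming the split and through2 utility modules; the input file and velocity threshold are illustrative) that streams recorded samples through a small transform and on to a consumer via .pipe(), with backpressure handled by the stream machinery:

```js
// Sketch: compose a gesture-processing pipeline from small streams.
// Assumes the split and through2 utility modules; the input file and
// velocity threshold are illustrative.
var fs = require('fs');
var split = require('split');         // splits the input into lines
var through = require('through2');    // lightweight transform streams

fs.createReadStream('sample.ldjson')
  .pipe(split())                      // one JSON record per line
  .pipe(through(function (chunk, enc, next) {
    var line = chunk.toString().trim();
    if (!line) return next();
    var sample = JSON.parse(line);
    if (Math.abs(sample.velocity[0]) > 100) {   // keep only fast lateral movements
      this.push(JSON.stringify(sample) + '\n');
    }
    next();
  }))
  .pipe(process.stdout);              // swap in a write stream or renderer as needed
```

Each stage does one thing; repurposing the pipeline is a matter of swapping the final destination or splicing in another transform.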

XBD

We're pushing the idea of explorable behavioral datasets.

We're currently exploring strategies for the immediate capture and playback of gesture/hand-motion samples, but we're ultimately interested in techniques for persisting, editing, annotating, and replicating behavioral activity streams in general.

Why?

The rise of proximal sensors.

They provide a stream of behavioral data that can be ...

  • captured
  • visualized
  • edited
  • munged
  • analyzed
  • replicated

Relevant Domains

Tooling

In general, the plan is to build off D3 for visualization/interaction and IndexedDB as a distributed datastore.

The following modules encapsulate some key methods and techniques that we plan to utilize.

Munging

  • nest - convert tables to trees

  • crossfilter - fast multidimensional filtering for coordinated views

Visualization

  • dc.js - reactive dimensional charting

  • catcorr.js - visualize correlations across many dimensions of categorical data

Persistence and Replication

  • dat - real-time replication and versioning for large tabular data sets

  • level.js - leveldb for the browser

  • levelweb - leveldb gui w/ builtin visualization tools

Further Reading
