Last active October 4, 2016 15:25
My abstract submission for the AGU meeting in California.
Hey friends, I figured I would just post a gist here so y'all can get a feel for what I am working on.
You'll find the actual abstract below (between the lines). Or skip further down to see a layman's-terms
description for the less technical folk :)
-----------------------------------------------------------------------
Enhancing SAMOS Data Access in DOMS via a Neo4j Property Graph Database.
The Shipboard Automated Meteorological and Oceanographic System (SAMOS) initiative provides routine
access to high-quality marine meteorological and near-surface oceanographic observations from research
vessels. The Distributed Oceanographic Match-Up Service (DOMS), currently under development, is a centralized
service that allows researchers to easily match in situ and satellite oceanographic data from distributed
sources to facilitate satellite calibration, validation, and retrieval algorithm development. The service
currently uses Apache Solr as the backend search engine on each node in the distributed network. While Solr
is a high-performance solution that facilitates the creation and maintenance of indexed data, it is limited
by its fixed schema. The property graph model escapes this limitation by storing data objects as nodes
with their own properties and linking them through explicit relationships.
The authors will present the development of the SAMOS Neo4j property graph database, including new search
possibilities that take advantage of the property graph model, performance comparisons with Apache Solr,
and a vision for graph databases as a storage tool for oceanographic data. The integration of the SAMOS
Neo4j graph into DOMS will also be described. Currently, the Neo4j database contains spatial and temporal
records from SAMOS, which are modeled into a time tree and an R-tree using the GraphAware and Neo4j Spatial
plugins. These extensions provide Java procedures, callable from Cypher (Neo4j's query language), that
generate the in-graph structures. Once generated, these structures can be queried using procedures from
these libraries or directly via Cypher statements.
Neo4j excels at relationship- and path-based queries, which challenge relational (SQL) databases
because such queries require memory-intensive joins, a consequence of the relational design. Consider a
user who wants to find records over several years, but only for specific months. If a traditional
database stores only timestamps, this type of query would be complex and likely prohibitively slow. Using
the time tree model, one can specify a path from the root to the data that restricts each level of the
tree to certain time frames (e.g., specific months). This query can be executed without joins, unions, or
other compute-intensive operations, giving Neo4j a computational advantage over the SQL alternative.
------------------------------------------------------------------
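To make the months-across-years example from the abstract concrete, here is a minimal in-memory sketch of a time tree in Python. It is not the GraphAware TimeTree API or the actual SAMOS graph: the `TimeTree` class, its `attach` and `records_in_months` methods, and the record ids are all hypothetical. The point it illustrates is that the query walks only the year and month branches it needs instead of testing every record's timestamp.

```python
from collections import defaultdict
from datetime import datetime


class TimeTree:
    """Hypothetical in-memory time tree: root -> year -> month -> day -> record ids.

    Illustrative only; this is not the GraphAware TimeTree API.
    """

    def __init__(self):
        # Nested dicts stand in for year, month, and day nodes.
        self.root = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))

    def attach(self, ts, record_id):
        # Walk (creating branches as needed) root -> year -> month -> day.
        self.root[ts.year][ts.month][ts.day].append(record_id)

    def records_in_months(self, years, months):
        # Follow only the requested year and month branches; individual
        # record timestamps are never compared.
        out = []
        for year in years:
            year_node = self.root.get(year)  # .get() does not create branches
            if year_node is None:
                continue
            for month in months:
                month_node = year_node.get(month)
                if month_node is None:
                    continue
                for day in sorted(month_node):
                    out.extend(month_node[day])
        return out


if __name__ == "__main__":
    tree = TimeTree()
    tree.attach(datetime(2014, 1, 15), "obs-a")
    tree.attach(datetime(2014, 7, 2), "obs-b")
    tree.attach(datetime(2015, 7, 20), "obs-c")
    tree.attach(datetime(2015, 12, 1), "obs-d")
    # Every July across 2014-2015:
    print(tree.records_in_months(range(2014, 2016), [7]))  # ['obs-b', 'obs-c']
```

In Neo4j the same idea would be written as a Cypher path pattern from the root node down through year and month nodes; the exact labels and relationship types depend on how the plugin builds the tree.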
OK. So what the heck was all that, right? Don't worry too much about all the acronyms. Instead, consider
the simple idea that I am trying to process LOTS of data REALLY quickly. I have millions of data points
of the form (latitude, longitude, time, ...other sciency variables...). Essentially, I am orchestrating
a series of software tools to move all of this data around a distributed network (currently between
people here at FSU in Florida, some at NCAR in Colorado, and some at JPL in California).
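Searching millions of (latitude, longitude) points quickly is what the R-tree mentioned in the abstract is for. Below is a minimal Python sketch of a bulk-loaded R-tree, assuming Sort-Tile-Recursive-style packing for the leaves and plain in-order grouping for the upper levels. It is a generic illustration, not how the Neo4j Spatial plugin builds its index, and `build_rtree`, `search`, and the sample points are hypothetical. Each node stores a bounding box, so a search skips any subtree whose box does not overlap the query box.

```python
import math
from dataclasses import dataclass, field

# Hypothetical layout: a point is (lon, lat, record_id) (note: x before y),
# and a box is (min_lon, min_lat, max_lon, max_lat).


@dataclass
class RNode:
    bbox: tuple
    children: list = field(default_factory=list)  # child RNodes, or points if leaf
    leaf: bool = True


def _chunks(seq, size):
    return [seq[i:i + size] for i in range(0, len(seq), size)]


def _box_of_points(points):
    lons = [p[0] for p in points]
    lats = [p[1] for p in points]
    return (min(lons), min(lats), max(lons), max(lats))


def _box_of_boxes(boxes):
    return (min(b[0] for b in boxes), min(b[1] for b in boxes),
            max(b[2] for b in boxes), max(b[3] for b in boxes))


def _overlaps(a, b):
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def build_rtree(points, max_entries=8):
    """Bulk-load an R-tree: STR-style packed leaves, then group upward."""
    if not points:
        raise ValueError("need at least one point")
    if max_entries < 2:
        raise ValueError("max_entries must be at least 2")
    n_leaves = math.ceil(len(points) / max_entries)
    n_slabs = math.ceil(math.sqrt(n_leaves))
    by_lon = sorted(points, key=lambda p: p[0])
    level = []
    # Cut the longitude-sorted points into vertical slabs, then pack each
    # slab into leaves of neighboring latitudes.
    for slab in _chunks(by_lon, n_slabs * max_entries):
        for group in _chunks(sorted(slab, key=lambda p: p[1]), max_entries):
            level.append(RNode(_box_of_points(group), group, leaf=True))
    # Build parent levels until one root remains (simplified: groups in order).
    while len(level) > 1:
        level = [RNode(_box_of_boxes([c.bbox for c in grp]), grp, leaf=False)
                 for grp in _chunks(level, max_entries)]
    return level[0]


def search(node, query):
    """Return all points inside the query box, pruning non-overlapping subtrees."""
    if not _overlaps(node.bbox, query):
        return []
    if node.leaf:
        return [p for p in node.children
                if query[0] <= p[0] <= query[2] and query[1] <= p[1] <= query[3]]
    hits = []
    for child in node.children:
        hits.extend(search(child, query))
    return hits


if __name__ == "__main__":
    # Made-up sample points, not SAMOS data.
    pts = [(-84.9, 29.6, "obs-1"), (-85.0, 29.7, "obs-2"), (-122.4, 37.8, "obs-3")]
    root = build_rtree(pts, max_entries=2)
    print([p[2] for p in search(root, (-90.0, 25.0, -80.0, 35.0))])  # ['obs-1', 'obs-2']
```

This simple version ignores boxes that cross the ±180° meridian, which real ship and satellite tracks can do.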
The core of this research is really me trying to figure out how to take advantage of a 'graph database'.
It stores information very differently from traditional systems. The challenging part is figuring
out how to take full advantage of the graph concept: shortest-path finding, subgraph matching,
etc. These algorithms are easy to call since they are built into the system. BUT, it's figuring out HOW to
use them, and WHAT data structures I can build internally so that these algorithms actually do something
meaningful, and do it quickly.
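As a rough idea of what running a shortest-path algorithm on a graph involves, here is a breadth-first shortest-path search in Python over a tiny, made-up property graph. Neo4j ships its own `shortestPath` function in Cypher; this sketch is not that implementation, and the node ids, labels, and relationship types (`RECORDED`, `NEXT`, `ON_DAY`) are hypothetical. Notice that the nodes don't share a fixed schema: one observation has a `salinity` property and the other does not.

```python
from collections import deque

# Hypothetical mini property graph: each node has a label plus free-form
# properties, and relationships are typed edges (start, TYPE, end).
nodes = {
    "vessel-1": {"label": "Vessel", "name": "Example Vessel"},
    "obs-1": {"label": "Observation", "lat": 29.6, "lon": -84.9},
    "obs-2": {"label": "Observation", "lat": 29.7, "lon": -85.0, "salinity": 35.1},
    "day-2016-07-04": {"label": "Day"},
}
rels = [
    ("vessel-1", "RECORDED", "obs-1"),
    ("vessel-1", "RECORDED", "obs-2"),
    ("obs-1", "NEXT", "obs-2"),
    ("obs-2", "ON_DAY", "day-2016-07-04"),
]


def shortest_path(rels, start, goal):
    """Breadth-first search over relationships, ignoring direction.

    Returns the node ids along one shortest path, or None if unreachable.
    """
    adj = {}
    for a, _rel_type, b in rels:
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    prev = {start: None}  # visited set plus back-pointers for path rebuilding
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = prev[node]
            return path[::-1]
        for nxt in adj.get(node, []):
            if nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)
    return None


if __name__ == "__main__":
    path = shortest_path(rels, "vessel-1", "day-2016-07-04")
    print(path)                               # ['vessel-1', 'obs-2', 'day-2016-07-04']
    print([nodes[n]["label"] for n in path])  # ['Vessel', 'Observation', 'Day']
```

The answer is only as meaningful as the relationships that were modeled: the path exists here only because an observation is linked to a day node, which is exactly the kind of internal structure the paragraph above is about.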
As of now, I'm by no means coming up with anything out of this world. There are lots of people like
me working on this kind of stuff, but what sets my work apart is the domain. Working with geospatial data
at this scale is more common today, but working with things like satellites, ships, and buoys, i.e., objects
moving on a global scale, is still a prominent area of research.
Thanks for your support, guys :)
-Adam