spanishgum/agu_abstract

## agu_abstract
Hey friends, I figured I would just post a gist here so y'all can get a feel for what I am working on.
You'll find the actual abstract below (between the lines). Or skip further down for a description in
layman's terms for the less technical folk :)

-----------------------------------------------------------------------
Enhancing SAMOS Data Access in DOMS via a Neo4j Property Graph Database.

The Shipboard Automated Meteorological and Oceanographic System (SAMOS) initiative provides routine
access to high-quality marine meteorological and near-surface oceanographic observations from research
vessels. The Distributed Oceanographic Match-Up Service (DOMS) under development is a centralized service
that allows researchers to easily match in situ and satellite oceanographic data from distributed
sources to facilitate satellite calibration, validation, and retrieval algorithm development. The service
currently uses Apache Solr as a backend search engine on each node in the distributed network. While Solr
is a high-performance solution that facilitates creation and maintenance of indexed data, it is limited
in the sense that its schema is fixed. The property graph model escapes this limitation by creating
relationships between data objects.

The authors will present the development of the SAMOS Neo4j property graph database including new search
possibilities that take advantage of the property graph model, performance comparisons with Apache Solr,
and a vision for graph databases as a storage tool for oceanographic data. The integration of the SAMOS
Neo4j graph into DOMS will also be described. Currently, Neo4j contains spatial and temporal records from
SAMOS, which are modeled into a time tree and an R-tree using the GraphAware and Spatial plugins for Neo4j.
These extensions provide callable Java procedures within Cypher (Neo4j's query language) that generate
in-graph structures. Once generated, these structures can be queried using procedures from these
libraries, or directly via Cypher statements.

Neo4j excels at performing relationship and path-based queries, which challenge relational SQL databases
because their design requires memory-intensive joins. Consider a user who
wants to find records over several years, but only for specific months. If a traditional database only
stores timestamps, this type of query would be complex and likely prohibitively slow. Using the time tree
model, one can specify a path from the root to the data that restricts results to certain
timeframes (e.g., months). This query can be executed without joins, unions, or other compute-intensive
operations, giving Neo4j a computational advantage over the SQL database alternative.


------------------------------------------------------------------


OK. So what the heck was all that, right? Don't worry too much about all the acronyms. Rather, consider
the simple idea that I am trying to process LOTS of data REALLY quickly. I have millions of data points
of the following form -> (latitude, longitude, time, ...other sciency variables...). Essentially I am
orchestrating a series of software tools to facilitate moving all this stuff around across a distributed
network (Currently between people here at FSU-Florida, some at NCAR-Colorado, and some at JPL-California).

The core of this research is really me trying to figure out how I can take advantage of a 'graph database'.
It stores information in a very different way than traditional systems. The challenging part is figuring
out how to take full advantage of the graph concept. This means shortest path finding, subgraph matching,
etc. These algorithms are easy to call since they are built into the system. BUT, it's figuring out HOW to
use them, and WHAT data structures I can build internally so that these algorithms actually do something
meaningful - and quickly.
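
To make that a bit more concrete, here is a minimal sketch of the time tree idea from the abstract, written in
plain Python rather than Neo4j. Everything in it is an assumption made for illustration: the `TimeTree` class,
its `add` and `find` methods, and the sample points are hypothetical, and the real structure lives inside the
graph database and is generated by the plugin. The point is only to show why "several years, but only certain
months" becomes a walk down a few branches instead of a scan over every timestamp.

```python
from collections import defaultdict
from datetime import datetime


class TimeTree:
    """Toy in-memory time tree: root -> year -> month -> list of records."""

    def __init__(self):
        # Nested dicts stand in for the root -> year -> month paths of a graph.
        self.root = defaultdict(lambda: defaultdict(list))

    def add(self, record):
        """Attach a record under the year and month of its timestamp."""
        t = record["time"]
        self.root[t.year][t.month].append(record)

    def find(self, years, months):
        """Walk only the requested year/month branches; no timestamp scan."""
        hits = []
        for year in years:
            months_of_year = self.root.get(year, {})
            for month in months:
                hits.extend(months_of_year.get(month, []))
        return hits


# Hypothetical sample points of the form (latitude, longitude, time).
tree = TimeTree()
tree.add({"lat": 30.4, "lon": -84.3, "time": datetime(2014, 6, 1, 12, 0)})
tree.add({"lat": 29.9, "lon": -85.1, "time": datetime(2014, 11, 3, 8, 30)})
tree.add({"lat": 31.2, "lon": -83.7, "time": datetime(2015, 6, 15, 0, 0)})
tree.add({"lat": 28.5, "lon": -86.0, "time": datetime(2016, 7, 4, 9, 0)})

# Records from June or July, across 2014 through 2016.
summer = tree.find(years=range(2014, 2017), months=[6, 7])
print(len(summer))  # prints 3
```

The lookup cost here depends on how many year/month branches are requested, not on how many records are stored
elsewhere in the tree, which is the property the graph version is after.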

As of now, I'm by no means coming out with anything out of this world. There are lots of people like
me working on this kind of stuff, but what separates my work is the domain. Working with geospatial data
on this scale is more common today, but working with things like satellites, ships, and buoys, i.e. objects
moving on a global scale, is still an active area of research.

Thanks for your support, guys :)
-Adam