
@JimHaughwout
JimHaughwout / NestedKeysCassandra.sql
Created March 9, 2014 23:19
Ways to set up nested keys in Cassandra
-- Different ways to handle nested keys
-- Data can be from an Android device installed as part of the
-- Open Automotive Alliance. Hierarchy is Maker: Model:
-- Using bigint casting of 64-bit HEX android_id
CREATE KEYSPACE auto_data_space WITH replication = {
'class': 'SimpleStrategy',
'replication_factor': '3'
};
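CQL's `bigint` is a signed 64-bit integer, so a 64-bit hex `android_id` whose top bit is set does not fit as an unsigned value. The preview does not show how the gist performs the cast; a minimal Python sketch, assuming a two's-complement reinterpretation and hypothetical helper names, might look like this:

```python
def android_id_to_bigint(android_id: str) -> int:
    """Map a 64-bit hex android_id (hypothetical helper) onto CQL's signed bigint range.

    Assumes android_id is at most 16 hex digits, e.g. "0123456789abcdef".
    """
    value = int(android_id, 16)  # unsigned interpretation, 0 .. 2**64 - 1
    if not 0 <= value < 2**64:
        raise ValueError("android_id does not fit in 64 bits: %r" % android_id)
    # Two's-complement reinterpretation: values >= 2**63 wrap to negatives.
    return value - 2**64 if value >= 2**63 else value


def bigint_to_android_id(value: int) -> str:
    """Inverse mapping (hypothetical helper), back to 16 lowercase hex digits."""
    if not -(2**63) <= value < 2**63:
        raise ValueError("value is outside the CQL bigint range")
    return format(value & (2**64 - 1), "016x")
```

Because the mapping is a bijection, the original hex ID can always be recovered from the stored `bigint`, which keeps it usable as (part of) a partition key.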
JimHaughwout / MedSensorCassandra.sql
Last active August 29, 2015 13:57
Re-creating some items from the past. Ways to ingest and control access to medical device data in C* (Cassandra)
-- Example of nested (i.e., controlled) access to medical device data
-- used as part of eSource for clinical trials
CREATE KEYSPACE med_devices WITH replication = {
'class': 'SimpleStrategy',
'replication_factor': '3'
};
USE med_devices;
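One way to read "nested (controlled) access" is that permission granted at one level of a key hierarchy extends to everything beneath it. The Python sketch below illustrates that idea under an assumed trial, site, subject, device hierarchy; both the hierarchy and the `can_read` helper are hypothetical and not taken from the gist:

```python
from typing import Iterable, Sequence


def can_read(row_key: Sequence[str], grants: Iterable[Sequence[str]]) -> bool:
    """Return True if any grant is a prefix of the row's key path.

    Hypothetical key path: (trial, site, subject, device). A grant of
    ("trial-1",) covers every site, subject and device in that trial;
    ("trial-1", "site-A") covers only that site's data. An empty grant ()
    is a prefix of everything, i.e. full access.
    """
    row = tuple(row_key)
    for grant in grants:
        g = tuple(grant)
        # The grant must be no longer than the row key and match its leading parts.
        if len(g) <= len(row) and row[: len(g)] == g:
            return True
    return False
```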
JimHaughwout / GeodeticDegreeLen.py
Last active February 28, 2021 06:27
Calculates the WGS84 length of a degree of latitude and longitude based on geodetic latitude and longitude position on an ellipsoid, without the need for any external API or data
"""
LENGTH OF A DEGREE OF LATITUDE AND LONGITUDE BY COORDINATE
Calculates the length of a degree of latitude and longitude based on the geodetic
meridian for any latitude and longitude position on an ellipsoid, without the need for
any external API or data. Constants are based on the ellipsoid values used in WGS84,
replicating the calculation used by the National Geospatial-Intelligence Agency (NGA) and CSGnet.
Formula is in a format that minimizes error at high latitudes by not dividing
by cosines (like haversine calculations).
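The preview stops before the code. A minimal Python sketch of the kind of cosine series the docstring describes, assuming the widely published WGS84 coefficients commonly attributed to the NGA degree-length calculator (the names `M1` to `P3` and `degree_lengths` are illustrative), could be:

```python
import math

# Series coefficients in metres (commonly published WGS84 values; an assumption here).
M1, M2, M3, M4 = 111132.92, -559.82, 1.175, -0.0023
P1, P2, P3 = 111412.84, -93.5, 0.118


def degree_lengths(lat_deg: float) -> tuple:
    """Return (metres per degree of latitude, metres per degree of longitude)
    at the given geodetic latitude in degrees."""
    phi = math.radians(lat_deg)
    # Only cosines of multiples of phi appear, and nothing is divided by a cosine.
    lat_len = (M1 + M2 * math.cos(2 * phi) + M3 * math.cos(4 * phi)
               + M4 * math.cos(6 * phi))
    lon_len = (P1 * math.cos(phi) + P2 * math.cos(3 * phi)
               + P3 * math.cos(5 * phi))
    return lat_len, lon_len
```

On an ellipsoid of revolution both lengths depend only on latitude, so longitude is not an input. At 45° the sketch gives roughly 111131.7 m per degree of latitude and 78846.8 m per degree of longitude.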
JimHaughwout / dedupe_csv.py
Last active January 16, 2021 17:58
Remove duplicates from a file. Intended for removing duplicate data from CSV files (e.g., a file of sensor reads).
#! /usr/bin/env python
"""
Remove duplicate rows from comma-separated value files.
Recommended for files saved in Windows CSV format.
Useful for situations where you will have duplicate data (e.g., sensor reads)
: param source : source csv file. Must end in .csv
Result is destination csv file without duplicates.
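A minimal sketch of exact-match deduplication with the standard `csv` module, assuming the first occurrence of each row (including the header) should be kept in its original order; the function names are hypothetical, not the gist's:

```python
import csv
from typing import Iterable, List


def dedupe_rows(rows: Iterable[List[str]]) -> List[List[str]]:
    """Keep the first occurrence of each row, preserving the original order."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row)  # lists are unhashable; tuples can go in a set
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def dedupe_csv(source: str, destination: str) -> int:
    """Write source minus duplicate rows to destination; return rows dropped."""
    if not source.lower().endswith(".csv"):
        raise ValueError("source must end in .csv")
    with open(source, newline="") as infile:
        rows = list(csv.reader(infile))
    unique = dedupe_rows(rows)
    # The default "excel" dialect writes \r\n line endings (Windows CSV style).
    with open(destination, "w", newline="") as outfile:
        csv.writer(outfile).writerows(unique)
    return len(rows) - len(unique)
```

Only byte-identical rows count as duplicates; rows differing in whitespace or number formatting are kept. The `seen` set grows with the number of distinct rows.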
JimHaughwout / sort_csv.py
Last active May 30, 2021 22:13
Sort CSV file by multiple columns, writing output to sorted CSV file.
#! /usr/bin/env python
"""
Sort CSV file by multiple columns, writing output to sorted CSV file.
Recommended for files saved in Windows CSV format.
Useful for situations where data file is too large for Excel.
: param source_file.csv : source csv file. Must end in .csv
: param sort column 1 : first sort column, as an Excel-like column number (i.e., 1 ... N)
Use a negative number to indicate a descending sort,
a positive number to indicate an ascending sort.
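The signed, 1-based column convention can be sketched in Python by relying on the stability of `list.sort`: sort by the least significant column first and the primary column last. This is a hedged sketch, not the gist's code; treating numeric-looking values as numbers (sorted before text) is an assumption, and the header row should be split off before calling it:

```python
from typing import List, Sequence


def _sort_key(value: str):
    """Numbers sort numerically and before text; text sorts lexically."""
    try:
        return (0, float(value), "")
    except ValueError:
        return (1, 0.0, value)


def sort_rows(rows: Sequence[List[str]], sort_columns: Sequence[int]) -> List[List[str]]:
    """Sort data rows by 1-based, Excel-like column numbers.

    A positive number sorts that column ascending, a negative one descending.
    The first entry in sort_columns is the primary key.
    """
    result = list(rows)
    # Python's sort is stable (also with reverse=True), so applying the keys
    # from least to most significant yields a correct multi-column ordering.
    for col in reversed(sort_columns):
        if col == 0:
            raise ValueError("column numbers start at 1 (or -1)")
        index = abs(col) - 1
        result.sort(key=lambda row, i=index: _sort_key(row[i]), reverse=col < 0)
    return result
```

For example, `sort_rows(rows, [-2, 1])` sorts by column 2 descending and breaks ties by column 1 ascending. This version holds all rows in memory.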
JimHaughwout / describe_csv.py
Last active August 29, 2015 13:57
Describe details of a CSV file, providing a preview. Useful as a first step in cleaning really large CSV files
#! /usr/bin/env python
"""
Describes a CSV file. Recommended for files saved in Windows CSV format.
Useful for situations where you need to get some basic info on a huge CSV file
(logs, sensor data, etc.)
: param source : csv_file you want to describe. Must end in .csv
: optional param preview_size : number of raw data rows to print as a preview
Result printed to screen:
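The preview cuts off before listing what gets printed. As a hedged sketch, the summary below (column names and count, data row count, rows whose field count differs from the header, and the first `preview_size` rows) is an assumed set of fields, gathered in a single streaming pass so a huge file is never loaded whole; `describe_csv_lines` is a hypothetical name:

```python
import csv
from typing import Dict, Iterable, List


def describe_csv_lines(lines: Iterable[str], preview_size: int = 5) -> Dict[str, object]:
    """Summarise CSV text in one pass, keeping only a small preview in memory."""
    reader = csv.reader(lines)
    header = next(reader, [])
    preview: List[List[str]] = []
    row_count = 0
    ragged_rows = 0  # rows whose field count differs from the header's
    for row in reader:
        row_count += 1
        if len(row) != len(header):
            ragged_rows += 1
        if len(preview) < preview_size:
            preview.append(row)
    return {
        "columns": header,
        "column_count": len(header),
        "row_count": row_count,
        "ragged_rows": ragged_rows,
        "preview": preview,
    }
```

An open file works directly as `lines`, e.g. `with open(path, newline="") as f: print(describe_csv_lines(f, 10))`.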
JimHaughwout / drop_csv_columns.py
Last active December 9, 2020 20:11
Remove a series of columns from a CSV file. Useful when you need to strip a huge, sparse CSV file
#! /usr/bin/env python
"""
Remove a series of columns from a CSV file.
Recommended for files saved in Windows CSV format.
Useful for situations where you need to strip a huge and sparse CSV file
(e.g., logs, sensor data, etc.)
Can easily be converted to a function for real-time use.
: param source : csv_file you want to strip. Must end in .csv
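The remaining parameters are cut off, so how the gist identifies columns is not visible. A minimal sketch, assuming columns are selected by header name, written as a generator so it can stream a large file or a live feed (which also fits the "real-time use" remark); `drop_columns` is a hypothetical name:

```python
from typing import Iterable, Iterator, List, Sequence


def drop_columns(rows: Iterable[List[str]], to_drop: Sequence[str]) -> Iterator[List[str]]:
    """Yield rows with the named columns removed, using the first row as header."""
    rows = iter(rows)
    header = next(rows, None)
    if header is None:
        return  # empty input: nothing to yield
    drop = set(to_drop)
    missing = drop - set(header)
    if missing:
        raise KeyError("columns not in header: %s" % sorted(missing))
    keep = [i for i, name in enumerate(header) if name not in drop]
    yield [header[i] for i in keep]
    for row in rows:
        # Short (ragged) rows simply lose the fields they do not have.
        yield [row[i] for i in keep if i < len(row)]
```

Paired with the `csv` module, `csv.writer(outfile).writerows(drop_columns(csv.reader(infile), ["unused_col"]))` processes the file row by row.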
JimHaughwout / ReduceCSV.py
Created April 11, 2014 23:54
Break up a CSV file into file segments, each with the same header plus N data rows
#! /usr/bin/env python
"""
Split CSV file into file segments, each with header plus N data rows.
Recommended for files saved in Windows CSV format.
Useful when you need to break up a huge CSV file for map/reduce-like processing
(e.g., logs, sensor data, etc.)
: param source : csv_file you want to split. Must end in .csv
: param num_rows_per_file : Must be >= 1
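The segmenting step can be sketched with `itertools.islice`, repeating the header at the top of every segment. This is a hedged sketch with a hypothetical `split_rows` name; writing each segment to its own numbered file is left to the caller:

```python
from itertools import islice
from typing import Iterable, Iterator, List


def split_rows(rows: Iterable[List[str]], num_rows_per_file: int) -> Iterator[List[List[str]]]:
    """Yield segments, each the header followed by up to num_rows_per_file rows."""
    if num_rows_per_file < 1:
        raise ValueError("num_rows_per_file must be >= 1")
    rows = iter(rows)
    header = next(rows, None)
    if header is None:
        return  # empty input: no segments
    while True:
        chunk = list(islice(rows, num_rows_per_file))
        if not chunk:
            return
        yield [header] + chunk
```

Since it is a generator, only one segment is held in memory at a time, which suits the map/reduce-style use described above.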
JimHaughwout / android_sensor_gps.h
Created April 15, 2014 18:59
Best items pulled from Android's sensors.h and gps.h definitions to aid in common sensor data definitions
/*** How Android does GPS ***/
/* Source at https://github.com/android/platform_hardware_libhardware/blob/master/include/hardware/gps.h */
#include <stdint.h> /* int64_t, uint16_t */
typedef int64_t GpsUtcTime; /** Milliseconds since January 1, 1970 */
/** Flags to indicate which values are valid in a GpsLocation. */
typedef uint16_t GpsLocationFlags;
// IMPORTANT: Note that the following values must match
// constants in GpsLocationProvider.java.
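On the consuming side, a `GpsLocationFlags` value is a bitmask and `GpsUtcTime` is a millisecond epoch timestamp. The Python sketch below decodes both; the bit values mirror the `GPS_LOCATION_HAS_*` constants in the gps.h linked above and should be checked against that file, while the Python names are hypothetical:

```python
from datetime import datetime, timedelta, timezone
from typing import List

# Bit values of GPS_LOCATION_HAS_* in Android's gps.h (verify against the source).
GPS_LOCATION_FLAGS = {
    0x0001: "HAS_LAT_LONG",
    0x0002: "HAS_ALTITUDE",
    0x0004: "HAS_SPEED",
    0x0008: "HAS_BEARING",
    0x0010: "HAS_ACCURACY",
}


def decode_location_flags(flags: int) -> List[str]:
    """List which GpsLocation fields a GpsLocationFlags value marks as valid."""
    return [name for bit, name in sorted(GPS_LOCATION_FLAGS.items()) if flags & bit]


def gps_utc_time_to_datetime(ms: int) -> datetime:
    """Convert a GpsUtcTime (milliseconds since 1970-01-01 UTC) to an aware datetime."""
    # timedelta keeps millisecond precision exactly, unlike float seconds.
    return datetime(1970, 1, 1, tzinfo=timezone.utc) + timedelta(milliseconds=ms)
```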
JimHaughwout / geocoder.py
Last active February 6, 2021 20:33
Command Line script to geocode address or reverse geocode coordinates
#! /usr/bin/env python
"""
Converting lat, long into real addresses (and back) using pygeocoder.
Slower but more robust in terms of tolerance of partial addresses
and level of formatted information provided. This still makes use of
Google's geocoding API V3.
Dependency: pip install pygeocoder
: param -a : geocode address
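The usage text is cut off after `-a`. As a hedged sketch of the command-line side only (the pygeocoder calls are omitted because they need network access and an API key), an `argparse` front end could look like this; the `-r/--reverse` flag and the `parse_latlng` helper are hypothetical:

```python
import argparse
from typing import Tuple


def parse_latlng(text: str) -> Tuple[float, float]:
    """Parse 'lat,long' (e.g. '38.8977,-77.0365') and range-check it."""
    try:
        lat_text, lng_text = text.split(",")
        lat, lng = float(lat_text), float(lng_text)
    except ValueError:
        raise argparse.ArgumentTypeError("expected 'lat,long', got %r" % text) from None
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lng <= 180.0):
        raise argparse.ArgumentTypeError("coordinates out of range: %r" % text)
    return lat, lng


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Geocode or reverse geocode.")
    mode = parser.add_mutually_exclusive_group(required=True)
    mode.add_argument("-a", "--address", help="address to geocode")
    # Hypothetical flag: the gist's reverse-geocoding option is not shown.
    mode.add_argument("-r", "--reverse", type=parse_latlng,
                      help="'lat,long' to reverse geocode")
    return parser
```

A value starting with a minus sign, such as a southern latitude, looks like an option to `argparse`, so pass it as `--reverse=-33.87,151.21`.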