David Shorthouse dshorthouse
@dshorthouse
dshorthouse / gist:c2bc1896b12e1648e2bc
Created August 7, 2014 03:16
Convert DDMMSS coordinates to array of DD latitude/longitude in PHP
/**
* Split DDMMSS or DD coordinate pair string into an array
*
* @param string $point A string purported to be a coordinate
* @return array(latitude, longitude) in DD
*/
function make_coordinates($point)
{
$loc = preg_replace(array('/[\p{Z}\s]/u', '/[^\d\s,;.\-NSEWO°ºdms\'"]/i'), array(' ', ''), $point);
if (preg_match('/[NSEWO]/', $loc) != 0) {
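
The preview above stops before the conversion itself, so the gist's own logic is not reproduced here. As a rough illustration of the arithmetic such a function performs (decimal degrees = degrees + minutes/60 + seconds/3600, negated for southern and western hemispheres), here is a minimal Python sketch. The names `dms_to_dd` and `split_pair` are hypothetical. Treating `O` as west (as in "oeste"/"ouest") and assuming latitude comes first are assumptions, not something the preview confirms.

```python
import re

# Hypothetical pattern: degrees, optional minutes, optional seconds, hemisphere.
_DMS = re.compile(
    r"""^\s*
    (?P<deg>\d+(?:\.\d+)?)\s*[°ºd]?\s*
    (?:(?P<min>\d+(?:\.\d+)?)\s*[m']\s*)?
    (?:(?P<sec>\d+(?:\.\d+)?)\s*[s"]\s*)?
    (?P<hemi>[NSEWO])\s*$""",
    re.VERBOSE | re.IGNORECASE,
)


def dms_to_dd(part: str) -> float:
    """Convert one DMS string with a hemisphere letter to decimal degrees."""
    m = _DMS.match(part)
    if m is None:
        raise ValueError(f"unrecognised coordinate: {part!r}")
    dd = float(m["deg"]) + float(m["min"] or 0) / 60 + float(m["sec"] or 0) / 3600
    # Assumption: S, W and O (west) are the negative hemispheres.
    return -dd if m["hemi"].upper() in "SWO" else dd


def split_pair(point: str) -> tuple[float, float]:
    """Split 'lat, lon' (or 'lat; lon') and convert both halves."""
    lat, lon = re.split(r"[,;]", point, maxsplit=1)  # assumes latitude first
    return dms_to_dd(lat), dms_to_dd(lon)


print(split_pair("45°30'00\"N, 110°30'00\"W"))  # (45.5, -110.5)
```

Unlike the PHP preview, this sketch does not strip stray characters first and rejects input that lacks a hemisphere letter.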
dshorthouse / wkt.html
Last active August 29, 2019 10:32
GMap WKT Drawing
<html>
<head>
<style type="text/css">
#map{width:600px;height:400px;}
#freehand {
margin-left:-5px;
margin-top:5px;
}
#freehand .button{
direction: ltr;
dshorthouse / mapscript-extent.php
Last active March 10, 2017 19:37
PHP MapScript reprojection/extent issues
<?php
//MapServer version 7.0.4
$map = ms_newMapObjFromString("MAP END");
$map->set("units", MS_DD);
$map->setProjection("proj=longlat,ellps=WGS84,datum=WGS84,no_defs", true);
$map->setExtent(-180, -90, 180, 90);
$map->setSize(100,100);
$map->setProjection("proj=robin,lon_0=0,x_0=0,y_0=0,ellps=WGS84,datum=WGS84,units=m,over,no_defs", true);
dshorthouse / ruby_ocr.rb
Last active March 24, 2024 21:01
OCR Image-based PDF in ruby
require 'parallel'
require 'rtesseract'
require 'mini_magick'
source = "/MyDirectory/my.pdf"
doc = {}
pdf = MiniMagick::Image.open(source)
Parallel.map(pdf.pages.each_with_index, in_threads: 8) do |page, idx|
tmpfile = Tempfile.new(['', '.tif'])
MiniMagick::Tool::Convert.new do |convert|
dshorthouse / data.csv
Last active March 30, 2023 17:58
Basic R Script to use SimpleMappr API with csv file
species,latitude,longitude
Pardosa moesta,45.755,-110.12
Pardosa fuscula,47.9,-112
Pardosa moesta,55.6,-101
Pardosa xerampelina,48.9,-103.55
Pardosa xerampelina,43.02,-105.9
Trochosa terricola,45.5,-103.8
Trochosa terricola,46,-100
Trochosa terricola,47.7,-110.9
Pardosa moesta,48,-109
910
1984
 DZRJ
-
()
(UB 19881)
(UB 19882)
(UFG 13985)
(UFG 13986)
*
dshorthouse / Bloodhound_Lost_Attributions
Created November 23, 2019 17:54
Users whose attributions were lost in Bloodhound due to "over-ingested" specimen records in the GBIF index just prior to November 12, 2019
NULL,"0000-0002-7101-9767","Roderic","Page"
NULL,"0000-0001-9008-0611","Stylianos","Chatzimanolis"
NULL,"0000-0002-6752-9721","Tod","Robbins"
NULL,"0000-0002-7053-8557","Paul","Sokoloff"
NULL,"0000-0001-7618-5230","David Peter","Shorthouse"
NULL,"0000-0003-1366-145X","Timothy","Dickinson"
NULL,"0000-0003-0768-1286","Richard","Pyle"
NULL,"0000-0002-4124-2175","Peter","Hovenkamp"
NULL,"0000-0001-6065-0812","Frank-Thorsten","Krell"
NULL,"0000-0002-1314-755X","Neal","Evenhuis"
dshorthouse / bloodhound.md
Last active February 20, 2020 16:40 — forked from timrobertson100/bloodhound.md
A quick test to explore a bloodhound process

This is a quick test of a modified version of the Bloodhound Spark script to check that it runs on the GBIF Cloudera cluster (CDH 5.16.2).

From the gateway, grab the file from HDFS (skipping HTTP for speed), unzip it (15-20 minutes), and upload the contents to HDFS:

hdfs dfs -getmerge /occurrence-download/prod-downloads/0002504-181003121212138.zip /mnt/auto/misc/bloodhound/data.zip
unzip /mnt/auto/misc/bloodhound/data.zip -d /mnt/auto/misc/bloodhound/data

hdfs dfs -rm /tmp/verbatim.txt
hdfs dfs -rm /tmp/occurrence.txt
{:user_id=>11771, :name=>"Cyrus Pringle", :orphaned=>716},
{:user_id=>14743, :name=>"Gerdt Guenther Hatschbach", :orphaned=>594},
{:user_id=>191, :name=>"Volker Framenau", :orphaned=>586},
{:user_id=>35074, :name=>"Martti Rautanen", :orphaned=>454},
{:user_id=>10182, :name=>"Georg August Zenker", :orphaned=>381},
{:user_id=>12169, :name=>"Paul Sintenis", :orphaned=>349},
{:user_id=>9829, :name=>"Joseph Friedrich Nicolaus Bornmüller", :orphaned=>302},
{:user_id=>10487, :name=>"José Arechavaleta", :orphaned=>250},
{:user_id=>11937, :name=>"Theodor Kotschy", :orphaned=>233},
{:user_id=>11853, :name=>"Ynes Mexia", :orphaned=>150},
GBIF URL,recordedBy,eventDate,year,country,countryCode,GBIF Dataset
https://gbif.org/occurrence/2433942,"Mrs. C. Pease, Miss E. Butler",,1903,Jamaica,JM,https://gbif.org/dataset/40d2de00-0c6e-11dd-84d2-b8a03c50a862
https://gbif.org/occurrence/29404620,U.Mizushima (Mrs.),1954-09-06T01:00Z,1954,,JP,https://gbif.org/dataset/8346c3a8-f762-11e1-a439-00145eb45e9a
https://gbif.org/occurrence/29408002,U.Mizushima (Mrs.),1954-09-19T01:00Z,1954,,JP,https://gbif.org/dataset/8346c3a8-f762-11e1-a439-00145eb45e9a
https://gbif.org/occurrence/29426346,U.Mizushima (Mrs.),1952-08-01T01:00Z,1952,,JP,https://gbif.org/dataset/8346c3a8-f762-11e1-a439-00145eb45e9a
https://gbif.org/occurrence/29429161,U.Mizushima (Mrs.),1954-09-29T01:00Z,1954,,JP,https://gbif.org/dataset/8346c3a8-f762-11e1-a439-00145eb45e9a
https://gbif.org/occurrence/29451087,U.Mizushima (Mrs.),1955-03-04T01:00Z,1955,,JP,https://gbif.org/dataset/8346c3a8-f762-11e1-a439-00145eb45e9a
https://gbif.org/occurrence/29451907,Mrs. Fay A. Mac Fadden,1926-08-16T01:00Z,1926,,
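
The header row above declares seven columns, while the last row shown carries only six fields. A minimal Python sketch for flagging such rows follows. It uses the standard `csv` module so that quoted fields containing commas count as one field. The function name `mismatched_rows` is hypothetical, and the sample reuses the header and two rows from the listing.

```python
import csv
import io


def mismatched_rows(text: str) -> list[tuple[int, int]]:
    """Return (row_number, field_count) for rows whose width differs from the header."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    # Row numbers are 1-based and count the header as row 1.
    return [
        (number, len(row))
        for number, row in enumerate(reader, start=2)
        if len(row) != len(header)
    ]


sample = (
    "GBIF URL,recordedBy,eventDate,year,country,countryCode,GBIF Dataset\n"
    'https://gbif.org/occurrence/2433942,"Mrs. C. Pease, Miss E. Butler",,1903,Jamaica,JM,'
    "https://gbif.org/dataset/40d2de00-0c6e-11dd-84d2-b8a03c50a862\n"
    "https://gbif.org/occurrence/29451907,Mrs. Fay A. Mac Fadden,1926-08-16T01:00Z,1926,,\n"
)

print(mismatched_rows(sample))  # [(3, 6)]
```

The first data row passes because the quoted collector names are parsed as a single field. The second is reported with its actual width, which makes a truncated or malformed line easy to locate.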