View skylark_transformations_tutorial.md

Qri ("query") is about datasets. A transformation is a repeatable script for generating a dataset. Skylark is a scripting language from Google that feels a lot like Python. This package implements Skylark as a transformation syntax. Skylark transformations are about as close as one can get to the full power of a programming language as a transformation syntax. Often you need this degree of control to generate a dataset.

Typical uses of a Skylark transformation include:

  • combining paginated calls to an API into a single dataset
  • downloading unstructured data from the internet and extracting structured data from it
  • re-shaping raw input data before saving a dataset
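
The first and third cases above can be sketched together. This is a minimal illustration in plain Python, not Skylark and not Qri's transform API: `fetch_page`, `combine_pages`, and the sample records are hypothetical stand-ins for a real paginated API call.

```python
def fetch_page(page):
    """Hypothetical stand-in for one paginated API call; returns a list of records."""
    pages = {
        1: [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}],
        2: [{"id": 3, "name": "c"}],
    }
    return pages.get(page, [])


def combine_pages(fetch, max_pages=100):
    """Collect rows from consecutive pages until an empty page comes back."""
    rows = []
    for page in range(1, max_pages + 1):
        records = fetch(page)
        if not records:
            break
        # Re-shape each record into a flat [id, name] row before saving.
        rows.extend([[r["id"], r["name"]] for r in records])
    return rows


dataset = combine_pages(fetch_page)
```

Passing the fetch function in as an argument keeps the combining logic separate from the network call, so the same loop can be exercised against canned pages like the ones here.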

We're excited about Skylark for a few reasons:

  • Python syntax - many people working in data science these days write Python. We like that, Skylark likes that. Dope.
  • deterministic subset of Python - unlike Python, Skylark removes language features that make code behaviour harder to inspect
View cr_to_crlf_replacer.go
package main

import (
	"bufio"
	"bytes"
	"encoding/csv"
	"fmt"
	"io"
)
View collection.json
{
"name": "Weird & Wonderful wildlife",
"url": "https://www.fws.gov/endangered/education/wonderful.html",
"description": "'There are many threatened and endangered species that you have probably never heard of. Here, you can learn about 14 weird and wonderful species that are currently endangered, threatened, or of special concern. Learning about these rare species can be fun! Choose a game and good luck!' Links mostly to old flash games. Not long for this world."
}
View collection.json
{
"name": "Advisory Committee on Climate Change and Natural Resource Science (ACCCNRS)",
"url": "https://nccwsc.usgs.gov/acccnrs",
"description": "The Advisory Committee on Climate Change and Natural Resource Science (ACCCNRS) was established in 2013 to advise the Secretary of the Interior on the operations of the U.S. Geological Survey (USGS) National Climate Change and Wildlife Science Center (NCCWSC) and the Department of the Interior (DOI) Climate Science Centers (CSCs). ACCCNRS was composed of 25 members that represented (1) State and local governments, including state membership entities; (2) Nongovernmental organizations, including those whose primary mission is professional and scientific and those whose primary mission is conservation and related scientific and advocacy activities; (3) American Indian tribes and other Native American entities; (4) Academia; (5) Landowners, businesses, and organizations representing landowners or businesses. In 2015, ACCCNRS released its 2015 Report to the Secre
View keybase.md

Keybase proof

I hereby claim:

  • I am b5 on github.
  • I am bfive (https://keybase.io/bfive) on keybase.
  • I have a public key ASAxwz2PapxPUS8GzFKzteRZ15MFoKkJjbxtjMP2Qw98tQo

To claim this, I am signing this object:

View normalize_urls_list.go
package main
import (
"bufio"
"flag"
"fmt"
"github.com/PuerkitoBio/purell"
"io"
"net/url"
"os"
View collection.json
{ "name" : "Protecting Endangered Species from Pesticides",
"url" : "https://www.epa.gov/endangered-species",
"description" : "Main EPA Endangered Species Act & Pesticides Web Pages"
}
View kiwix.go
// Scrape info from kiwix service. This takes a little while to execute.
package main
import (
"encoding/json"
"github.com/PuerkitoBio/goquery"
"io/ioutil"
"log"
"net/http"
"os"
View audit_full.json
{
"name": "EPA",
"descendants": 18476,
"descendantsDownloadedOnce": 6860,
"descendantsArchivedOnce": 6842,
"children": [
{
"name": "https",
"descendants": 15278,