Georgiana Bere georgiana-b

  • Berlin
@georgiana-b
georgiana-b / create_lai_product_datasets.py
Created April 22, 2020 08:54
Script to create output datasets from the ITC Leaf Area Index datasets
import requests
import json
import os
import sys
import glob
import datetime
import pprint
from bs4 import BeautifulSoup as Soup
try:
georgiana-b / elvis-distributed-config.json
Last active July 5, 2018 14:00
Full distributed config of the elvis database in ODB sharding test
{
  "@type": "d",
  "@version": 0,
  "readYourWrites": true,
  "newNodeStrategy": "static",
  "servers": {
    "@type": "d",
    "@version": 0,
    "boss": "MASTER",
    "*": "master",
georgiana-b / topojson.json
Created July 6, 2017 17:44
The self-contained version of the TopoJSON JSON schema here: https://github.com/nhuebel/TopoJSON_schema
{
  "$schema":"http://json-schema.org/draft-04/schema#",
  "title":"TopoJSON object",
  "description":"Schema for a TopoJSON object",
  "type":"object",
  "required":[
    "type"
  ],
  "properties":{
    "bbox":{
# -*- coding: utf-8 -*-
from sqlalchemy import create_engine
import pandas
import os


def has_duplicates(dataframe):
    duplicates = dataframe[dataframe.duplicated()]
    # Uncomment line below to see the duplicated rows in the dataframe
    # print(duplicates)
    return not duplicates.empty
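
A minimal usage sketch for the check above, assuming pandas is installed. The sample frames `clean` and `repeated` are hypothetical and exist only for illustration; the function body is repeated so the snippet runs on its own.

```python
import pandas


def has_duplicates(dataframe):
    # DataFrame.duplicated() marks every row that repeats an earlier row.
    duplicates = dataframe[dataframe.duplicated()]
    return not duplicates.empty


# Hypothetical sample data, not taken from the original gist.
clean = pandas.DataFrame({
    "id": ["ao", "ag"],
    "name": ["Angola", "Antigua and Barbuda"],
})
# Append a copy of the first row to create one fully duplicated row.
repeated = pandas.concat([clean, clean.iloc[[0]]], ignore_index=True)

clean_result = has_duplicates(clean)
repeated_result = has_duplicates(repeated)
```

Only rows that match an earlier row in every column count as duplicates here; a row that repeats just the `id` would not be flagged.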
georgiana-b / valid.csv
Created April 17, 2017 14:10
Data Quality CLI valid CSV
id,name,slug
ao,Angola,angola
ag,Antigua and Barbuda,antigua-and-barbuda
ar,Argentina,argentina
am,Armenia,armenia
aw,Aruba,aruba
au,Australia,australia
at,Austria,austria
bd,Bangladesh,bangladesh
be,Belgium,belgium
georgiana-b / trial_index_mapping.json
Last active January 21, 2017 13:36
`trial` index mapping as returned by ElasticSearch
{
  "trials_2017-01-21_1485004350175" : {
    "mappings" : {
      "trial" : {
        "dynamic_templates" : [ {
          "identifiers_values_arent_analyzed" : {
            "mapping" : {
              "index" : "not_analyzed",
              "type" : "string"
            },
georgiana-b / elasticsearch_response_match_all.json
Created January 21, 2017 13:18
Trials from `trial` index contain fields that were not included in the mapping (`source`, `risks_of_bias`, `status`, `results_exemption_date` etc.)
{
  "took" : 21,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
# -*- coding: utf-8 -*-
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
import json


def process(conf, conn):
    # import ipdb; ipdb.set_trace()
georgiana-b / similar_columns.py
Last active September 23, 2016 08:33
Find pairs of duplicated columns, both exact duplicates and percentual duplicates
import pandas


def find_exact_duplicate_columns(frame):
    """Find pairs of columns that are exact duplicates
    i.e. each value should match with the other's column value for all rows.
    Adapted from this thoughtful answer: http://stackoverflow.com/a/32961145
    """
    dups = []
    columns = frame.columns
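
The preview above is cut off before the comparison logic. As a rough sketch of the idea in the gist description (exact duplicates plus columns that agree on a percentage of rows), and not the author's implementation, something like the following could work. The function names `exact_duplicate_pairs` and `similar_column_pairs`, the `threshold` parameter, and the sample frame are all assumptions made for illustration.

```python
import itertools

import pandas


def exact_duplicate_pairs(frame):
    """Return pairs of column names whose values match on every row."""
    pairs = []
    for left, right in itertools.combinations(frame.columns, 2):
        # Series.equals compares values and dtype, ignoring the column name.
        if frame[left].equals(frame[right]):
            pairs.append((left, right))
    return pairs


def similar_column_pairs(frame, threshold=0.75):
    """Return (left, right, share) for pairs agreeing on at least
    `threshold` of the rows. `threshold` is a hypothetical parameter."""
    pairs = []
    for left, right in itertools.combinations(frame.columns, 2):
        # Mean of the element-wise boolean comparison is the matching share.
        share = float((frame[left] == frame[right]).mean())
        if share >= threshold:
            pairs.append((left, right, share))
    return pairs


# Hypothetical sample data: "b" copies "a"; "c" differs in the last row only.
frame = pandas.DataFrame({
    "a": [1, 2, 3, 4, 5],
    "b": [1, 2, 3, 4, 5],
    "c": [1, 2, 3, 4, 0],
})
exact = exact_duplicate_pairs(frame)
similar = similar_column_pairs(frame, threshold=0.75)
```

Comparing every pair of columns is quadratic in the number of columns, which is fine for small frames. Note that element-wise `==` treats missing values as unequal, so columns containing NaN in the same rows would score lower than expected in the percentage check.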
http://data.gov.au/dataset/0de37904-43e0-4814-b21b-5b64fafefe6f/resource/1c48292a-9bfb-476c-850f-7b0da3c273fc/download/prodcom.csv
Traceback (most recent call last):
  File "/home/g/.virtualenvs/uk-spend/bin/dq", line 9, in <module>
    load_entry_point('data-quality==0.1.1', 'console_scripts', 'dq')()
  File "/home/g/.virtualenvs/uk-spend/lib/python3.4/site-packages/click/core.py", line 716, in __call__
    return self.main(*args, **kwargs)
  File "/home/g/.virtualenvs/uk-spend/lib/python3.4/site-packages/click/core.py", line 696, in main
    rv = self.invoke(ctx)
  File "/home/g/.virtualenvs/uk-spend/lib/python3.4/site-packages/click/core.py", line 1060, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))