@anjesh
anjesh / facet-phrases
Created May 5, 2012 17:13
Facet with phrases, allows counts for phrases with spaces
curl -XDELETE 'http://localhost:9200/testcompany/'
curl -XPUT 'http://localhost:9200/testcompany/'
curl -XPUT 'http://localhost:9200/testcompany/activity1/_mapping' -d '{
  "activity" : {
    "properties" : {
      "organization" : {"analyzer": "keyword", "type": "string"}
    }
  }
}'
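The `keyword` analyzer matters here because it indexes the whole field value as a single term, so a facet counts a phrase such as "World Bank" once instead of counting "world" and "bank" separately. The following is a toy Python sketch of that difference, not Elasticsearch code: the sample organization names and all three helper functions are made up for illustration, and the lowercase/whitespace split only roughly approximates a tokenizing analyzer.

```python
from collections import Counter

# Hypothetical sample values for the "organization" field.
docs = ["World Bank", "World Bank", "Asian Development Bank"]

def tokenized_terms(value):
    # Rough stand-in for a tokenizing analyzer: lowercase, split on whitespace.
    return value.lower().split()

def keyword_terms(value):
    # Stand-in for the keyword analyzer: the whole value is one term.
    return [value]

def facet_counts(values, analyze):
    # Count how many documents contain each term (each term once per document).
    counts = Counter()
    for value in values:
        counts.update(set(analyze(value)))
    return counts

tokenized = facet_counts(docs, tokenized_terms)
by_phrase = facet_counts(docs, keyword_terms)

print(tokenized.most_common())
print(by_phrase.most_common())
```

With the tokenizing stand-in, "bank" is counted across unrelated organizations; with the keyword stand-in, each full organization name gets its own count.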
# requires csvfix to be installed
# All CRS projects zip files are available at http://stats.oecd.org/Index.aspx?datasetcode=CRS1# under export > related files
# 2011 is used here as an example; for other years, update the commands below
wget "http://stats.oecd.org/FileView2.aspx?IDFile=d79d2d4e-f15a-41de-83f8-b61a8f5c227a" -O CRS-2011.zip
unzip CRS-2011.zip
# there are some binary characters that need to be removed before transformation
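The cleanup command itself is not shown in this preview. As a rough illustration only, the Python sketch below drops control bytes from raw file contents. Which bytes count as "binary" is an assumption here (everything below 0x20 except tab, LF, and CR, plus DEL), and `strip_binary` is a hypothetical helper, not part of the original script.

```python
def strip_binary(data: bytes) -> bytes:
    """Drop control bytes, keeping tab, line feed, and carriage return."""
    keep = {9, 10, 13}  # \t, \n, \r
    # Assumption: bytes below 0x20 (other than the ones kept) and DEL are unwanted.
    return bytes(b for b in data if b in keep or (b >= 32 and b != 127))

sample = b"a\x00b\tc\x01\n"
cleaned = strip_binary(sample)
print(cleaned)
```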
"""
Breaks a row whose location field lists several locations (separated by \n) into multiple rows.
e.g. if the location field contains the following text, the script produces 4 rows and copies the same id and title into all 4 rows
" - Banke (Nepalgunj)
- Dhanusa (Janakpur)
- Rupandehi (Bhairahawa)
- Sunsari (Inaruwa)"
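The row-splitting described above can be sketched as follows. This is a minimal illustration under assumptions, not the script itself (which is cut off in this preview): `expand_locations` is a hypothetical helper, rows are assumed to be dictionaries with `id`, `title`, and `location` keys, and the id and title values are invented.

```python
def expand_locations(row, field="location"):
    """Return one copy of `row` per location listed in row[field]."""
    expanded = []
    for line in row[field].splitlines():
        # Drop surrounding whitespace and the leading "-" bullet.
        name = line.strip().lstrip("-").strip()
        if name:
            expanded.append({**row, field: name})
    return expanded

# Hypothetical input row; only the location text comes from the example above.
row = {
    "id": "1",
    "title": "Sample project",
    "location": " - Banke (Nepalgunj)\n- Dhanusa (Janakpur)\n"
                "- Rupandehi (Bhairahawa)\n- Sunsari (Inaruwa)",
}

rows = expand_locations(row)
for r in rows:
    print(r["id"], r["title"], r["location"])
```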
ogr2ogr -f "GeoJSON" Baglung.json NPL_adm4.shp -sql "select NAME_2 AS ZONE,NAME_3 AS DISTRICT, NAME_4 AS VDC from NPL_adm4 where NAME_3='Baglung'"
ogr2ogr -f "GeoJSON" Mustang.json NPL_adm4.shp -sql "select NAME_2 AS ZONE,NAME_3 AS DISTRICT, NAME_4 AS VDC from NPL_adm4 where NAME_3='Mustang'"
ogr2ogr -f "GeoJSON" Myagdi.json NPL_adm4.shp -sql "select NAME_2 AS ZONE,NAME_3 AS DISTRICT, NAME_4 AS VDC from NPL_adm4 where NAME_3='Myagdi'"
ogr2ogr -f "GeoJSON" Parbat.json NPL_adm4.shp -sql "select NAME_2 AS ZONE,NAME_3 AS DISTRICT, NAME_4 AS VDC from NPL_adm4 where NAME_3='Parbat'"
ogr2ogr -f "GeoJSON" Bhaktapur.json NPL_adm4.shp -sql "select NAME_2 AS ZONE,NAME_3 AS DISTRICT, NAME_4 AS VDC from NPL_adm4 where NAME_3='Bhaktapur'"
ogr2ogr -f "GeoJSON" Dhading.json NPL_adm4.shp -sql "select NAME_2 AS ZONE,NAME_3 AS DISTRICT, NAME_4 AS VDC from NPL_adm4 where NAME_3='Dhading'"
ogr2ogr -f "GeoJSON" Kathmandu.json NPL_adm4.shp -sql "select NAME_2 AS ZONE,NAME_3 AS DISTRICT, NAME_4 AS VDC from NPL_adm4 where NAME_3='Kat
ogr2ogr -f "kml" Baglung.kml NPL_adm4.shp -sql "select NAME_2 AS ZONE,NAME_3 AS DISTRICT, NAME_4 AS VDC from NPL_adm4 where NAME_3='Baglung'"
ogr2ogr -f "kml" Mustang.kml NPL_adm4.shp -sql "select NAME_2 AS ZONE,NAME_3 AS DISTRICT, NAME_4 AS VDC from NPL_adm4 where NAME_3='Mustang'"
ogr2ogr -f "kml" Myagdi.kml NPL_adm4.shp -sql "select NAME_2 AS ZONE,NAME_3 AS DISTRICT, NAME_4 AS VDC from NPL_adm4 where NAME_3='Myagdi'"
ogr2ogr -f "kml" Parbat.kml NPL_adm4.shp -sql "select NAME_2 AS ZONE,NAME_3 AS DISTRICT, NAME_4 AS VDC from NPL_adm4 where NAME_3='Parbat'"
ogr2ogr -f "kml" Bhaktapur.kml NPL_adm4.shp -sql "select NAME_2 AS ZONE,NAME_3 AS DISTRICT, NAME_4 AS VDC from NPL_adm4 where NAME_3='Bhaktapur'"
ogr2ogr -f "kml" Dhading.kml NPL_adm4.shp -sql "select NAME_2 AS ZONE,NAME_3 AS DISTRICT, NAME_4 AS VDC from NPL_adm4 where NAME_3='Dhading'"
ogr2ogr -f "kml" Kathmandu.kml NPL_adm4.shp -sql "select NAME_2 AS ZONE,NAME_3 AS DISTRICT, NAME_4 AS VDC from NPL_adm4 where NAME_3='Kathmandu'"
ogr2ogr -f "kml" Kavrepala
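Because the commands above differ only in the district name and output format, they could also be generated in a loop. The Python sketch below builds the same argument lists for the districts that appear in full above; `ogr2ogr_command` is a hypothetical helper, and the script only prints the commands rather than running them.

```python
import shlex

def ogr2ogr_command(district, driver, extension, source="NPL_adm4.shp"):
    """Build the ogr2ogr argument list for one district."""
    sql = (
        "select NAME_2 AS ZONE,NAME_3 AS DISTRICT, NAME_4 AS VDC "
        f"from NPL_adm4 where NAME_3='{district}'"
    )
    return ["ogr2ogr", "-f", driver, f"{district}.{extension}", source, "-sql", sql]

districts = ["Baglung", "Mustang", "Myagdi", "Parbat", "Bhaktapur", "Dhading", "Kathmandu"]

commands = [
    ogr2ogr_command(d, driver, ext)
    for driver, ext in (("GeoJSON", "json"), ("kml", "kml"))
    for d in districts
]

for cmd in commands:
    # Print only; to execute, pass cmd to subprocess.run(cmd, check=True).
    print(shlex.join(cmd))
```

Passing the arguments as a list avoids shell quoting problems with the embedded SQL when the commands are eventually run via `subprocess`.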
anjesh / news_score.py
Created August 12, 2014 10:30
This scores the given news based on the words defined in the feature.
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer
from nltk.tokenize import WordPunctTokenizer, PunktWordTokenizer
import string
from os import listdir
from os.path import isfile, join
import logging
logger = logging.getLogger(__name__)
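The scoring logic is not visible in this preview, so the following is only a guess at the general idea: count how many words in a news text belong to a feature's word list. It uses the standard library instead of NLTK, and it leaves out the stopword removal and stemming that the imports above suggest. `score_text`, the feature words, and the sample sentence are all hypothetical.

```python
import re
from collections import Counter

def score_text(text, feature_words):
    """Count how many tokens in `text` match a word in `feature_words`."""
    # Simplified tokenizer: lowercase alphabetic runs only (assumption).
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(tokens)
    return sum(counts[word] for word in feature_words)

# Hypothetical feature word list and news text.
feature = {"flood", "landslide", "rain"}
news = "Heavy rain caused a flood; the flood blocked the highway."

score = score_text(news, feature)
print(score)
```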
anjesh / convert.py
Last active August 29, 2015 14:19
Converts the counts in the columns of a CSV into multiple rows
# coding=utf-8
# convert the count in the column into multiple rows
# e.g.
# a,b,c,2,1
# becomes
# a,b,c,Male
# a,b,c,Male
# a,b,c,FeMale
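The transformation in the comment above can be sketched like this. It is a minimal illustration, not the script itself: `expand_counts` is a hypothetical helper, it assumes the count columns are the last columns of each row, and the labels copy the spelling used in the comment.

```python
def expand_counts(row, labels=("Male", "FeMale")):
    """Turn trailing count columns into one output row per counted item."""
    n = len(labels)
    fixed, counts = row[:-n], row[-n:]
    expanded = []
    for label, count in zip(labels, counts):
        # Repeat the fixed columns once per counted item, tagged with its label.
        expanded.extend(fixed + [label] for _ in range(int(count)))
    return expanded

rows = expand_counts(["a", "b", "c", "2", "1"])
for r in rows:
    print(",".join(r))
```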
anjesh / extract.js
Last active May 22, 2017 02:52
Extract the FAQ text from the site by running this code in the web console
var faqs = document.querySelectorAll('.rule');
for (var i = 0; i < faqs.length; i++) {
  var faq = faqs[i];
  var divs = faq.getElementsByTagName('div');
  // print the first and fourth <div> of each .rule element
  console.log(divs[0].outerText.trim());
  console.log("==");
  console.log(divs[3].outerText.trim());
  console.log("--------------------------------------");
}
anjesh / etenders-announcements.json
Last active May 25, 2017 14:46
etender.gov.md data structure in json, highlighting the contained metadata
{
"id": 3055235,
"regNumber": "14/00001",
"purchaseQuarter": "III,IV",
"tenderType": {
"id": 7,
"created": "01.12.2008",
"endDate": null,
"mdValue": "Cerere a ofertelor de preţuri",
"ruValue": "Запрос ценовых оферт",
anjesh / contract-89270.json
Last active December 25, 2015 11:58
Tender with 2 contracts for tender.id = 25610 and regNumber = 12/00001
{
"tender": {
"regNumber": "12/00001",
"stateOrg": {
"treasutyAcc": null,
"bankAccount": null,
"fax": "268 22692 078883935",
"code": "1009601000289",
"fkRefTerDepTreasure": null,
"orgName": "Agenția de Dezvoltare Regională Centru",