Jules Belveze JulesBelveze

ElasticSearch Entities Data Structure

The underlying intent here is for a user to be able to narrow down their search to only the relevant documents. The “Apple” case is representative. The way we do it now, a search returns many documents about the fruit. With NER (Named Entity Recognition), however, we will be able to extract only the mentions of “apple” that refer to the brand. NER also provides an additional piece of information about each extracted entity: its category. We currently support 5 categories: organisation, person, event, product, location.

In the short run we want the user to be able to retrieve mentions containing a given entity and, optionally, from a given category. In the long run we want the user to be able to disambiguate the entity they are searching for. For example, there are several people named “Michael Jackson”: the singer, the soccer player, … Ideally, the user should be able to retrieve only the mentions referring to one of them.
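
As a rough illustration of the short-run goal, the sketch below builds an Elasticsearch query body that matches mentions containing a given entity, optionally restricted to one category. It is only a sketch under assumptions: the nested field `entities` and its keyword sub-fields `entities.name` and `entities.category` are hypothetical names chosen for this example, not the mapping proposed in this document, and `build_entity_query` is a helper invented here.

```python
from typing import Optional

# The five categories listed in the text above.
CATEGORIES = {"organisation", "person", "event", "product", "location"}


def build_entity_query(name: str, category: Optional[str] = None) -> dict:
    """Build a query body for mentions that contain a given entity.

    Assumption: documents store entities in a hypothetical nested field
    `entities`, with keyword sub-fields `name` and `category`.
    """
    if category is not None and category not in CATEGORIES:
        raise ValueError(f"unknown category: {category!r}")

    # All filters must hold for the same nested entity object,
    # so name and category cannot match two different entities.
    filters = [{"term": {"entities.name": name}}]
    if category is not None:
        filters.append({"term": {"entities.category": category}})

    return {
        "query": {
            "nested": {
                "path": "entities",
                "query": {"bool": {"filter": filters}},
            }
        }
    }


if __name__ == "__main__":
    # "apple" as a brand rather than the fruit.
    print(build_entity_query("apple", category="organisation"))
```

Using a nested query rather than flat fields is one possible design choice: it keeps each entity's name and category tied together, which is what makes the “apple as an organisation” case distinguishable from a document that mentions the fruit and some unrelated organisation.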

The proposed map

JulesBelveze / sentence_transformer_to_onnx.py
Created September 3, 2021 12:43
Script to export any `SentenceTransformers` model to ONNX
# Copyright (c) 2021, Hypefactors A/S
#
# Redistribution and use in source and binary forms, with or without modification, are permitted provided that the
# following conditions are met:
#
# 1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following
# disclaimer.
#
# 2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following
# disclaimer in the documentation and/or other materials provided with the distribution.
JulesBelveze / AirQualityUCI.csv
Last active May 18, 2021 11:03
Data for forecasting
Date_Time,CO(GT),PT08.S1(CO),NMHC(GT),C6H6(GT),PT08.S2(NMHC),NOx(GT),PT08.S3(NOx),NO2(GT),PT08.S4(NO2),PT08.S5(O3),T,RH,AH
10/03/2004 18.00.00,2.6,1360.0,150.0,11.9,1046.0,166.0,1056.0,113.0,1692.0,1268.0,13.6,48.9,0.7578
10/03/2004 19.00.00,2.0,1292.0,112.0,9.4,955.0,103.0,1174.0,92.0,1559.0,972.0,13.3,47.7,0.7255
10/03/2004 20.00.00,2.2,1402.0,88.0,9.0,939.0,131.0,1140.0,114.0,1555.0,1074.0,11.9,54.0,0.7502
10/03/2004 21.00.00,2.2,1376.0,80.0,9.2,948.0,172.0,1092.0,122.0,1584.0,1203.0,11.0,60.0,0.7867
10/03/2004 22.00.00,1.6,1272.0,51.0,6.5,836.0,131.0,1205.0,116.0,1490.0,1110.0,11.2,59.6,0.7888
10/03/2004 23.00.00,1.2,1197.0,38.0,4.7,750.0,89.0,1337.0,96.0,1393.0,949.0,11.2,59.2,0.7848
11/03/2004 00.00.00,1.2,1185.0,31.0,3.6,690.0,62.0,1462.0,77.0,1333.0,733.0,11.3,56.8,0.7603
11/03/2004 01.00.00,1.0,1136.0,31.0,3.3,672.0,62.0,1453.0,76.0,1333.0,730.0,10.7,60.0,0.7702
11/03/2004 02.00.00,0.9,1094.0,24.0,2.3,609.0,45.0,1579.0,60.0,1276.0,620.0,10.7,59.7,0.7648
year election_id comuna_datachile_id total_electores total_votacion
2017 1 113 160432 67847
2017 1 5 51451 18775
2017 1 191 140080 63937
2017 1 295 9721 4893
2017 1 244 11114 5549
2017 1 234 8772 3986
2017 1 241 19632 9010
2017 1 273 13154 6291
2017 1 292 7699 3739
region_id region_name comuna_datachile_id comuna_customs_id comuna_tax_office_id comuna_name
1 Tarapacá 226 1401 1204 Pozo Almonte
1 Tarapacá 217 1405 1203 Pica
1 Tarapacá 113 1101 1201 Iquique
1 Tarapacá 108 1404 1206 Huara
1 Tarapacá 58 1403 1210 Colchane
1 Tarapacá 26 1402 1208 Camiña
1 Tarapacá 5 1107 1211 Alto Hospicio
2 Antofagasta 321 2301 2101 Tocopilla
2 Antofagasta 313 2104 2202 Taltal
CODIGO;ENTIDAD;NOMBRE FANTASIA;DIRECCION;COMUNA;HORARIO REFERENCIAL;ESTE;NORTE;LONGITUD;LATITUD
101;UNIMARC;UNIMARC LAS REJAS;LAS REJAS SUR 1279;ESTACION CENTRAL;Lunes a Dom 09:00 a 20:00;341557;6295424;-70.70507104166667;-33.46882000000001
102;UNIMARC;UNIMARC JUAN ANTONIO RIOS;SALOMON SACK 351;INDEPENDENCIA;Lunes a Dom 09:00 a 20:00;344653;6300937;-70.67080616666668;-33.419566666666654
104;UNIMARC;UNIMARC LO OVALLE;GRAN AVENIDA 6555;LA CISTERNA;Lunes a Dom 09:00 a 20:00;345864;6290169;-70.65963511904762;-33.516832083333334
105;UNIMARC;UNIMARC VICUÑA PARAD 19;AVDA.VICUÑA MACKENNA 9090;LA FLORIDA;Lunes a Dom 09:00 a 20:00;352240;6287757;-70.5914099988768;-33.5394802038029
106;UNIMARC;UNIMARC ROJAS MAGALLANES;ROJAS MAGALLANES 3638;LA FLORIDA;Lunes a Dom 09:00 a 20:00;355705;6288235;-70.5547798666583;-33.5359270019668
107;UNIMARC;UNIMARC MIRADOR;VICUÑA MACKENNA ORIENTE 6331;LA FLORIDA;Lunes a Dom 09:00 a 20:00;350733;6290567;-70.6071683333333;-33.513935000000004
108;UNIMARC;UNIMARC VICENTE VALDES;VICENTE VALDES
CODIGO;ENTIDAD;NOMBRE FANTASIA;DIRECCION;COMUNA;HORARIO REFERENCIAL;ESTE;NORTE;LONGITUD;LATITUD
L1;METRO;BAQUEDANO L1;Avda. Providencia 011;PROVIDENCIA;Lunes a Viernes: 6:00 a 23:00 horas, Sábados: 6:30 a 23:00 horas, Domingos y festivos: 8:00 a 22:30 horas.;348189;6299042;-70.633119;-33.437277
L1;METRO;LOS HEROES L1;Avda. Libertador Bernardo O`Higgins 1774;SANTIAGO;Lunes a Viernes: 6:00 a 23:00 horas, Sábados: 6:30 a 23:00 horas, Domingos y festivos: 8:00 a 22:30 horas.;345548;6297979;-70.661705;-33.446487
L1;METRO;NEPTUNO;Avda. Neptuno Oriente frente al 260;LO PRADO;Lunes a Viernes: 6:00 a 23:00 horas, Sábados: 6:30 a 23:00 horas, Domingos y festivos: 8:00 a 22:30 horas.;339893;6297327;-70.72264572;-33.45153638
L1;METRO;SAN PABLO L1;Avda. San Pablo 6190;LO PRADO;Lunes a Viernes: 6:00 a 23:00 horas, Sábados: 6:30 a 23:00 horas, Domingos y festivos: 8:00 a 22:30 horas.;339837;6298000;-70.723118;-33.445454
L1;METRO;SANTA LUCIA;Avda. Libertador Bernardo O’Higgins N°511;SANTIAGO;Lunes a Viernes: 6:00 a 23:00 hor
CODIGO;ENTIDAD;NOMBRE FANTASIA;DIRECCION;COMUNA;HORARIO REFERENCIAL;ESTE;NORTE;LONGITUD;LATITUD
L1;METRO;ALCANTARA;Avda. Apoquindo Frente al 3885;LAS CONDES;Lunes a Viernes: 6:00 a 20:00 horas, Sábados: 7:00 a 20:00 horas, Domingos y festivos: 8:00 a 20:00 horas.;352191;6301527;-70.58968584;-33.41543877
L1;METRO;ECUADOR;Avda. Libertador Bernardo O`Higgins Frente al 4620;ESTACION CENTRAL;Lunes a Viernes: 6:00 a 20:00 horas, Sábados: 7:00 a 20:00 horas, Domingos y festivos: 8:00 a 20:00 horas.;341905;6296835;-70.7010856;-33.45626665
L1;METRO;EL GOLF;Avda. Apoquindo Frente al N°3231;LAS CONDES;Lunes a Viernes: 6:00 a 20:00 horas, Sábados: 7:00 a 20:00 horas, Domingos y festivos: 8:00 a 20:00 horas.;351619;6301383;-70.59585634;-33.41665773
L1;METRO;ESCUELA MILITAR;Avda. Apoquindo Frente al 4502;LAS CONDES;Lunes a Viernes: 6:00 a 20:00 horas, Sábados: 7:00 a 20:00 horas, Domingos y festivos: 8:00 a 20:00 horas.;352658;6301661;-70.5846369;-33.41429088
L1;METRO;ESTACION CENTRAL;Avda. Libertador Bernardo O`Higgins Fr
JulesBelveze / filtered-data.json
Last active November 26, 2019 16:55
filtered-data2.json
{"WFJx_BMkWiPWP8d_DK6ylg": {"rating": 4.0, "categories": [{"alias": "german", "title": "German"}, {"alias": "brasseries", "title": "Brasseries"}, {"alias": "cafeteria", "title": "Cafeteria"}], "price": 2, "coord": {"latitude": -33.4083077687433, "longitude": -70.5659522588586}}, "vM5AfFpMVEFyWixgRfOXTw": {"rating": 4.0, "categories": [{"alias": "latin", "title": "Latin American"}], "price": null, "coord": {"latitude": -33.4360303, "longitude": -70.6499908}}, "ZrdxrrXP3idsStgMbPhvPg": {"rating": 3.5, "categories": [{"alias": "arabian", "title": "Arabian"}, {"alias": "latin", "title": "Latin American"}], "price": 1, "coord": {"latitude": -33.4395224, "longitude": -70.6627502}}, "5rzy_B_gx0aIVlIRd58u_g": {"rating": 4.0, "categories": [{"alias": "localflavor", "title": "Local Flavor"}], "price": null, "coord": {"latitude": -33.4486726, "longitude": -70.6527376}}, "uBP5ISVSd5D0eEJTgLtmnQ": {"rating": 4.5, "categories": [{"alias": "seafood", "title": "Seafood"}], "price": 1, "coord": {"latitude": -33.4343338, "long