@alexgarel
alexgarel / get_images_empbeddings.py
Created October 10, 2023 10:39
Getting image embedding samples from robotoff
# run it using docker-compose run --rm worker_high_1 python3
from robotoff.prediction.category.neural.keras_category_classifier_3_0 import *
from robotoff.off import *
from robotoff.models import *
import json
import os
product_id = ProductIdentifier(barcode="3017620422003", server_type=ServerType.off)
alexgarel / Readme.md
Last active May 17, 2022 16:15
Copy one tar to another

I had a corrupted tar file (due to an interruption while writing the last file …). I needed to append to the archive, but that was not possible: I got tarfile.ReadError: empty header as soon as I opened the archive in append mode. The problem: the archive held more than 1M files, all flat (it was intended to be used directly).

This script saved my day and was quite fast (note how expensive opening the big tar is: 9 minutes!):

2022-05-17T15:52:17.492944 starting
2022-05-17T16:03:01.681669 0 done, 0 errs
2022-05-17T16:05:01.001833 100000 done, 0 errs
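
The script itself is not shown in this preview. A minimal sketch of the idea, assuming the approach is to stream members out of the damaged archive into a fresh one and to keep whatever was copied once the damaged tail is reached, could look like this. The function name `copy_tar`, its parameters, and the logging interval are hypothetical, not taken from the original script.

```python
import datetime
import io
import tarfile


def log(msg):
    """Print a message prefixed with an ISO timestamp."""
    print(datetime.datetime.now().isoformat(), msg)


def copy_tar(src_path, dst_path, log_every=100_000):
    """Copy every readable member of src_path into a new tar at dst_path.

    Hypothetical helper: returns (done, errs).
    """
    done = errs = 0
    log("starting")
    with tarfile.open(src_path, "r") as src, tarfile.open(dst_path, "w") as dst:
        try:
            for member in src:
                if done % log_every == 0:
                    log(f"{done} done, {errs} errs")
                try:
                    if member.isfile():
                        # Read the data fully first, so that a truncated
                        # entry never reaches the destination archive.
                        data = src.extractfile(member).read()
                        dst.addfile(member, io.BytesIO(data))
                    else:
                        # Directories, links, etc. carry no data.
                        dst.addfile(member)
                except (tarfile.TarError, OSError):
                    errs += 1
                    continue
                done += 1
        except tarfile.ReadError:
            # The archive ends in a damaged entry: keep what was copied.
            errs += 1
    log(f"{done} done, {errs} errs")
    return done, errs
```

Writing to a new archive avoids append mode entirely, which is where the `tarfile.ReadError` showed up.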
alexgarel / test_stream_random_selection.py
Last active April 11, 2022 14:45
Validating streamed random choice
import collections
import random
import statistics


def run_exp(k, N):
    """Run one experiment: stream-select k items among N."""
    result = []
    seen = 0
    for i in range(N):
        seen += 1
        index = random.randrange(seen)
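
The preview stops before the selection step. Below is a minimal sketch of the usual streamed selection (reservoir sampling, "Algorithm R") with a small frequency check. It may differ from the rest of the author's script, and the names `stream_select` and `selection_counts` are hypothetical.

```python
import collections
import random
import statistics


def stream_select(k, n, rng=random):
    """Pick k items uniformly from range(n), seeing each item only once."""
    result = []
    seen = 0
    for item in range(n):
        seen += 1
        if len(result) < k:
            # Fill the reservoir with the first k items.
            result.append(item)
        else:
            # Keep the new item with probability k / seen.
            index = rng.randrange(seen)
            if index < k:
                result[index] = item
    return result


def selection_counts(k, n, runs, seed=0):
    """Count how often each item is selected over several runs."""
    rng = random.Random(seed)
    counts = collections.Counter()
    for _ in range(runs):
        counts.update(stream_select(k, n, rng))
    return counts


if __name__ == "__main__":
    runs = 20000
    counts = selection_counts(k=3, n=10, runs=runs)
    print("expected per item:", runs * 3 / 10)
    print("observed mean:", statistics.mean(counts.values()))
    print("observed std dev:", statistics.pstdev(counts.values()))
```

If the selection is uniform, every item should be picked about `runs * k / n` times, with only a small spread between items.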
alexgarel / getcat.py
Created February 23, 2022 08:08
Control robotoff auto-applied category insight
"""
Get csv from postgresql:
```
docker-compose exec postgres psql -W -U postgres postgres
postgres# \copy (select barcode, STRING_AGG(value_tag, ' / ') from product_insight where automatic_processing = 't' and timestamp > '2022-01-20' and type = 'category' group by barcode order by barcode) to '/tmp/auto-cat.csv' with csv;
COPY 13910
```
Run this below:
```
-- FIRST: backup database before anything !!!!!
-- drop indexes for prediction table for perfs
DROP INDEX prediction_barcode, prediction_data, prediction_server_domain, prediction_source_image, prediction_timestamp, prediction_type;
-- move rows
WITH to_copy_rows AS (
SELECT
barcode,
alexgarel / docker-getting-id.md
Created October 8, 2021 16:31
Getting developer uid for docker

When you develop with docker and use bind mounts, you may run into problems if you do not align the uid / gid of the user in docker with the uid / gid of your own user. For this purpose, some Dockerfiles provide an ARG so that you can pass in the UID/GID you need.

But how do you find the right id to pass?

On Linux, simply run: id
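
As a minimal sketch for Linux, the same ids can also be read from Python and turned into build arguments. The ARG names `USER_UID` and `USER_GID` are assumptions made for illustration; use whatever names your Dockerfile actually declares.

```python
import os


def docker_build_args(uid_arg="USER_UID", gid_arg="USER_GID"):
    """Return `docker build` flags carrying the current user's uid / gid.

    The ARG names are hypothetical; match them to your Dockerfile.
    Unix only: os.getuid / os.getgid do not exist on Windows.
    """
    return [
        "--build-arg", f"{uid_arg}={os.getuid()}",
        "--build-arg", f"{gid_arg}={os.getgid()}",
    ]


if __name__ == "__main__":
    # Prints the flags to paste after `docker build`.
    print(" ".join(docker_build_args()))
```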

On Windows, according to this Stack Overflow contribution, you should open a command window and run:

Keybase proof

I hereby claim:

  • I am alexgarel on github.
  • I am alexg (https://keybase.io/alexg) on keybase.
  • I have a public key ASBsywNBkJDreBeDtsJ-PVDh0JpstGl1UWyHkhsBeEMHJwo

To claim this, I am signing this object:

alexgarel / test-es6-querystring-ipv6.py
Created August 25, 2018 12:53
Test of ipv6 in ES6 QueryString
# in the shell
# running ES6::
#
# $ docker pull docker.elastic.co/elasticsearch/elasticsearch:6.4.0
# $ docker run -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" docker.elastic.co/elasticsearch/elasticsearch:6.4.0
# in a new window, install elasticsearch_dsl in a new virtualenv::
#
# $ cd /tmp/
alexgarel / np_arg_in.py
Created January 31, 2018 19:43
Filtering values from a list in a numpy array (somewhat like isin() of pandas)
import numpy as np


def np_arg_in(a, values, sorter=None):
    """Find indices of `a` containing one of values.

    :param a: numpy array to search in
    :param values: numpy array of values to search
    :param sorter: optional array of integer indices that sort array a into ascending order
    """
    if not isinstance(values, np.ndarray):
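
The preview above is cut off after its first statement. One way such a function could work, assuming it relies on `np.searchsorted` (which the `sorter` parameter suggests but the preview does not confirm), is sketched below. The name `np_arg_in_sketch` is hypothetical and this is not the author's implementation.

```python
import numpy as np


def np_arg_in_sketch(a, values, sorter=None):
    """Return the sorted indices of `a` whose element is one of `values`."""
    a = np.asarray(a)
    # De-duplicate the values so that no index is reported twice.
    values = np.unique(np.asarray(values))
    if a.size == 0 or values.size == 0:
        return np.array([], dtype=np.intp)
    if sorter is None:
        sorter = np.argsort(a, kind="stable")
    # For each value, [left, right) is its slice in the sorted view of `a`.
    left = np.searchsorted(a, values, side="left", sorter=sorter)
    right = np.searchsorted(a, values, side="right", sorter=sorter)
    chunks = [sorter[lo:hi] for lo, hi in zip(left, right)]
    return np.sort(np.concatenate(chunks))


if __name__ == "__main__":
    a = np.array([5, 1, 3, 1, 7, 3])
    print(np_arg_in_sketch(a, [1, 3]))
```

Passing a precomputed `sorter` avoids sorting `a` again when the same array is queried many times.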
from collections import defaultdict
from six import string_types
from gensim.models import phrases
from gensim import utils
class Phrases(phrases.Phrases):