malteos malteos

  • Berlin, Germany
@malteos
malteos / mteb_bm25.py
Created June 26, 2024 06:13
Run BM25 baseline on MTEB retrieval tasks
"""Evaluate BM25 on MTEB tasks
Usage:
python bm25.py -t <task name> --output_folder=./data/results
Notes:
- https://github.com/xhluca/bm25s (promising implementation)
- https://github.com/beir-cellar/beir/blob/main/examples/retrieval/evaluation/lexical/evaluate_bm25.py
- https://colab.research.google.com/drive/1HfutiEhHMJLXiWGT8pcipxT5L2TpYEdt?usp=sharing#scrollTo=nqotyXuIBPt6
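For orientation, the BM25 ranking function that such a baseline computes can be sketched in plain Python. This is a minimal illustration only, not the gist's code (which points to libraries such as bm25s and BEIR). The function name `bm25_scores`, the whitespace tokenization, the defaults `k1=1.5` and `b=0.75`, and the Lucene-style IDF variant are all assumptions chosen for the example.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenized document in `docs` against the tokenized `query`.

    Hypothetical helper for illustration; k1 and b are commonly used defaults.
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Lucene-style IDF variant, which is never negative.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation with document-length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Toy corpus with naive whitespace tokenization (an assumption for the example).
corpus = [
    "a cat sat on the mat".split(),
    "dogs chase cats".split(),
    "the stock market fell".split(),
]
scores = bm25_scores("cat mat".split(), corpus)
# Document indices sorted from best to worst match.
ranking = sorted(range(len(corpus)), key=scores.__getitem__, reverse=True)
```

Without stemming, "cats" does not match the query term "cat", so only the first document receives a non-zero score. A real evaluation would use a proper tokenizer and an optimized index.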
@malteos
malteos / docker-without-desktop-macos.md
Created May 28, 2024 11:03
Run Docker (without Docker Desktop) on MacOS with Apple Silicon (M1/M2/...)

Docker Desktop requires an expensive license for commercial use: https://www.docker.com/pricing/faq/

# Install minikube
brew install minikube

# Install Docker CLI
brew install docker
#!/bin/bash
#SBATCH --job-name=oxw-bloom-1b7-twc-german
#SBATCH --ntasks-per-node=1 # crucial - only 1 task per dist per node!
#SBATCH --nodes=4
#SBATCH --gres=gpu:4 # ---> does not matter on JUWELS
#SBATCH --cpus-per-task=48 # number of cores per tasks
#SBATCH --hint=nomultithread # we get physical cores not logical
#SBATCH --time=0-12:00:00
#SBATCH --output=%j.%x.out
#SBATCH --partition=booster
# Copyright 2022 EleutherAI and The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,

Connect via SSH to a Slurm compute job that runs as Enroot container

SSHing directly into a compute job lets you use remote development tools, such as your IDE's debugger (VSCode, PyCharm, ...), for GPU jobs as well.

  • Slurm: Scheduling system that many HPC clusters use
  • Enroot: Container system like Docker for NVIDIA GPUs

General problem:

import argparse
import os
import torch
from transformers.models.auto import AutoModelForCausalLM
LAYER_FILE_PREFIX = 'layer_'
MODEL_FILE_PREFIX = 'model_'
EMBEDDING_LAYER_INDEX = 1
@malteos
malteos / letsencrypt-ssl-dns-docker.sh
Created December 2, 2018 12:38
Obtain Lets-Encrypt SSL Certificate via Docker DNS challenge
# Obtain Lets-Encrypt SSL Certificate via Docker DNS challenge
# adjust:
# - domains (-d foo.me)
mkdir letsencrypt_etc letsencrypt_var
docker run -it --rm --name certbot \
-v "$(pwd)/letsencrypt_etc:/etc/letsencrypt" \
-v "$(pwd)/letsencrypt_var:/var/lib/letsencrypt" \
certbot/certbot certonly -d foo.me -d '*.foo.me' --manual --preferred-challenges dns
@malteos
malteos / Mixxx_2deck_keyboard_mapping.kbd.cfg
Created April 20, 2017 13:24
Keyboard mapping for Mixxx DJ software. High-mid-low equalizer.
[AutoDJ]
[Master]
[VinylControl]
[PreviewDeck1]
[Channel1]
play y

Returns only 'Main Page'

curl -XPOST localhost:9200/wiki_content/_search?pretty -d '
{
  "_source": [
    "title"
  ],
  "query": {
    "bool": {
      "should": [

Testing Queries for CirrusSearch extension

As produced by SimpleKeywordFeature.doApply() (boosting has no effect)

curl -XPOST localhost:9200/wiki_content/_search?pretty -d '
{
  "_source": [
    "title"
  ],
  "query": {