Christopher Akiki cakiki

🐈‍⬛
meow
  • Universität Leipzig
  • Leipzig, Deutschland
  • X @christopher
@mhermans
mhermans / neo4R_example.R
Created August 29, 2011 09:50
Neo4j-Cypher-R
# Requirements
#sudo apt-get install libcurl4-gnutls-dev # for RCurl on linux
#install.packages('RCurl')
#install.packages('RJSONIO')
library('RCurl')
library('RJSONIO')
query <- function(querystring) {
  h = basicTextGatherer()
@Garfounkel
Garfounkel / gpu_tfidf_demo.ipynb
Last active April 24, 2021 21:59
notebooks/gpu_tfidf_demo.ipynb
@cbuntain
cbuntain / agreement.py
Created March 19, 2020 18:38
Example of using NLTK's agreement package to calculate agreement scores for an annotation task
#
# Author: Cody Buntain
# Date: 19 March 2020
#
# Description:
# This code is an example of using the agreement package
#  in NLTK to calculate a number of agreement metrics on
#  a set of annotations. Currently, this code will work
#  with two annotators and multiple labels.
#  You can use Fleiss's Kappa or Krippendorff's Alpha if you
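
The description above concerns agreement between two annotators over multiple labels. The gist's own code is cut off in this preview, so what follows is not its implementation and does not use NLTK's API. It is only a rough, library-free sketch of one common metric for the two-annotator case, Cohen's kappa. The function name `cohen_kappa` and the sample labels are made up for illustration.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items.

    Hypothetical helper for illustration; not taken from the gist.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")

    n = len(labels_a)

    # Observed agreement: fraction of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: probability that both annotators pick the same
    # label if each labels independently with their own label frequencies.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    if expected == 1.0:
        # Both annotators used one and the same label everywhere.
        return 1.0
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    annotator_1 = ["pos", "pos", "neg", "neg"]
    annotator_2 = ["pos", "neg", "neg", "neg"]
    print(cohen_kappa(annotator_1, annotator_2))  # 0.5
```

Kappa corrects raw agreement (0.75 in the sample) for the agreement expected by chance (0.5 here), which is why the result is lower than the raw figure.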
@severo
severo / set_gated.py
Created August 12, 2022 16:00
A function to set the gated parameter on a HF repository
from huggingface_hub.hf_api import (  # type: ignore
    REPO_TYPES,
    REPO_TYPES_URL_PREFIXES,
    HfApi,
    _raise_for_status,
)
def update_repo_settings(
    hf_api: HfApi,
    repo_id: str,
@stefan-it
stefan-it / tpu_vm_cheatsheet.md
Last active April 9, 2023 21:05
TPU VM Cheatsheet

TPU VM Cheatsheet

This TPU VM cheatsheet uses and was tested with the following library versions:

Library       Version
JAX           0.3.25
FLAX          0.6.4
Datasets      2.10.1
Transformers  4.27.1
@lmcinnes
lmcinnes / doc_embeddings_with_vectorizers.ipynb
Last active November 9, 2023 04:31
Document Embeddings with the Vectorizers Library

What the BookCorpus?

So in the midst of this era of "contextualized" language models named after Sesame Street characters and robots that transform into automobiles, there is this "Toronto Book Corpus" that points to this recently influential paper:

Yukun Zhu, Ryan Kiros, Rich Zemel, Ruslan Salakhutdinov, Raquel Urtasun, Antonio Torralba, and Sanja Fidler. 2015. "Aligning books and movies: Towards story-like visual explanations by watching movies and reading books." In Proceedings of the IEEE international conference on computer vision, pp. 19-27.

Why do I even care? There are no translations there.

Some might know my personal pet peeve about collecting translation datasets, but BookCorpus has no translations, so why do I even care about it?

@koreyou
koreyou / bm25.py
Created November 1, 2019 05:26
Implementation of Okapi BM25 with sklearn's TfidfVectorizer
""" Implementation of Okapi BM25 with sklearn's TfidfVectorizer
Distributed as CC-0 (https://creativecommons.org/publicdomain/zero/1.0/)
"""
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from scipy import sparse
class BM25(object):
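
The class body is cut off in this preview, so the gist's actual approach (built on `TfidfVectorizer` and sparse matrices) is not visible here. Purely to show what Okapi BM25 scoring computes, here is a small plain-Python sketch. It is not the gist's implementation. The function name `bm25_scores`, whitespace tokenization, the defaults `k1=1.5` and `b=0.75`, and the non-negative IDF variant `ln((N - n + 0.5) / (n + 0.5) + 1)` are all assumptions chosen for the example.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score every document in `docs` against `query` with Okapi BM25.

    Hypothetical helper for illustration; tokenization is a plain
    lowercase whitespace split.
    """
    if not docs:
        raise ValueError("docs must not be empty")

    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs

    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))

    query_terms = query.lower().split()
    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        score = 0.0
        for term in query_terms:
            tf = term_freq.get(term, 0)
            if tf == 0:
                continue
            n_t = doc_freq[term]
            # Non-negative IDF variant (an assumption, see lead-in).
            idf = math.log((n_docs - n_t + 0.5) / (n_t + 0.5) + 1.0)
            # Term-frequency saturation with document-length normalization.
            denom = tf + k1 * (1.0 - b + b * len(tokens) / avg_len)
            score += idf * tf * (k1 + 1.0) / denom
        scores.append(score)
    return scores


if __name__ == "__main__":
    corpus = ["the cat sat", "the dog barked", "cat and cat"]
    for doc, score in zip(corpus, bm25_scores("cat", corpus)):
        print(f"{score:.3f}  {doc}")
```

With equal-length documents, the one that repeats the query term scores highest, but less than twice as high as a single occurrence, because `k1` makes term frequency saturate.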