@k8si
k8si / process.sh
Created May 20, 2015 22:28
Grobid script
#!/bin/bash
memory="1024m"
jarfile="/home/kate/research/myproject/grobid/grobid-core/target/grobid-core-0.3.4-SNAPSHOT.one-jar.jar"
grobidHome="/home/kate/research/myproject/grobid/grobid-home"
config="/home/kate/research/myproject/grobid/grobid-home/config/grobid.properties"
input="/home/kate/research/myproject/pdfs"
output="/home/kate/research/myproject/output"
java -Xmx"$memory" -jar "$jarfile" \
k8si / gist:ae0409929544f032d498
Created June 17, 2015 21:48
createParagraph
def createParagraph2(paragraphNode: Node, paragraphStart: Int, doc: Document): Unit = {
  for (child <- paragraphNode.childNodes) {
    if (child.isInstanceOf[TextNode]) {
      val tmpDoc = new Document(child.asInstanceOf[TextNode].text)
      cc.factorie.app.nlp.segment.DeterministicNormalizingTokenizer.process(tmpDoc)
      // attach the tokens to the original document
      tmpDoc.tokens.foreach { token => new Token(doc, token.string) }
    } else if (child.nodeName.equals("a")) {
      val linkTarget: String = child.attr("href")
      val linkText: String = child.childNode(0).toString()
k8si / gist:b75e8572c7fe33146a28
Last active August 29, 2015 14:23
Serialization
import java.io._
object TestStuff {
  class Person(val name: String) extends Serializable {
    var age: Int = 0
    object personProperties extends Serializable {
      val location: String = "MA"
      var job: String = ""
    }
import java.io._
import cc.factorie.app.nlp.Document
object TestStuff {
  def serializeStuff(): Unit = {
    class Thing(s: String) extends Serializable {
      override def toString: String = s"Thing($s)"
    }
class TestCategoricalDomain extends JUnitSuite with cc.factorie.util.FastLogging {
  @Test
  def testPlusEquals(): Unit = {
    val domain = new CategoricalDomain[String](List("yes", "no"))
    domain.freeze()
    printDomainInfo(domain, "init")
    /*
    init:
k8si / rust_vs_python_tokenizers.py
Last active September 30, 2020 22:21
differences in rust vs. python tokenizer behavior
import logging
import traceback
from copy import deepcopy
from pathlib import Path
from transformers import PreTrainedTokenizer
from transformers.data.processors.squad import SquadV2Processor, SquadExample
from transformers.tokenization_bert import BertTokenizer
from transformers.tokenization_bert import BertTokenizerFast
from transformers.tokenization_gpt2 import GPT2Tokenizer, GPT2TokenizerFast
k8si / pyc_links.md
Last active February 22, 2022 11:27
A list of links explaining the .pyc file format and other stuff
k8si / homing_pigeon_notes.md
Last active March 26, 2024 18:45
Notes about homing pigeons

  • What were homing pigeons used for? Since homing pigeons can find their way home over very long distances, they were used to carry messages.
  • How does the 'homing instinct' work? Via magnetoreception, that is, by sensing the Earth's magnetic field. Essentially, homing pigeons have a built-in, biological compass. Scientists are not sure how this actually works.
  • Does weather affect homing pigeons' ability to deliver messages? Yes. Bad weather can slow homing pigeons down. They have been found to fly more slowly when the sky is grey, with a low cloud ceiling and high humidity.
  • When do homing pigeons molt? They should molt once per year. Unmated pigeons usually begin molting in May or June. Mated pigeons start molting about one week after the second set of eggs has been laid in the new season. Pigeons should be allowed to rest while they're molting, as it places a lot of physical strain on them.
  • How do homing pigeons carry messages? If you have a me
k8si / test-llamafile-commit-with-minilm.sh
Created April 4, 2024 17:08
Test if llamafile commit hash works with BERT-based MiniLM model
#!/bin/bash
#
# Script requires:
# - python3 (tested with 3.11)
# - wget
#
# How to run:
# $ git clone git@github.com:Mozilla-Ocho/llamafile.git && cd llamafile
# $ wget <url of this gist>
# $ ./test-llamafile-commit-with-minilm.sh <commit hash>