Skip to content

Instantly share code, notes, and snippets.

View bact's full-sized avatar
🥕
Writing to reach you

Arthit Suriyawongkul bact

🥕
Writing to reach you
View GitHub Profile
@tamuhey
tamuhey / tokenizations_post.md
Last active March 30, 2024 19:00
How to calculate the alignment between BERT and spaCy tokens effectively and robustly

How to calculate the alignment between BERT and spaCy tokens effectively and robustly

image

site: https://tamuhey.github.io/tokenizations/

Natural Language Processing (NLP) has made great progress in recent years because of neural networks, which allows us to solve various tasks with end-to-end architecture. However, many NLP systems still require language-specific pre- and post-processing, especially in tokenizations. In this article, I describe an algorithm that simplifies calculating correspondence between tokens (e.g. BERT vs. spaCy), one such process. And I introduce Python and Rust libraries that implement this algorithm. Here are the library and the demo site links:

@Bouke
Bouke / gist:11261620
Last active August 3, 2023 01:46
Multiple Python installations on OS X

Previous versions used homebrew to install the various versions. As suggested in the comments, it's better to use pyenv instead. If you are looking for the previous version of this document, see the revision history.

$ brew update
$ brew install pyenv
$ pyenv install 3.5.0
$ pyenv install 3.4.3
$ pyenv install 3.3.6
$ pyenv install 3.2.6
$ pyenv install 2.7.10

$ pyenv install 2.6.9