@JMSwag
JMSwag / tokenizers.md
Created January 19, 2023 02:09 — forked from akhan619/tokenizers.md

Exploring Tokenizers from Hugging Face

Hugging Face (HF) has made NLP (Natural Language Processing) a breeze. In this post, we take a hands-on look at tokenization with the help of the Tokenizers library. We will load a real-world dataset containing 10-K filings of public firms and train a tokenizer from scratch based on the BERT tokenization scheme. Along the way we will look at tokenization in detail and at some gotchas to keep an eye out for.
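To make "the BERT tokenization scheme" a little more concrete before going further, here is a minimal sketch of the greedy longest-match-first splitting that WordPiece-style tokenizers apply to a single word. It is plain Python and does not use the Tokenizers library; the function name `wordpiece_tokenize` and the toy vocabulary are assumptions made up for illustration, and a real trained vocabulary would be far larger.

```python
def wordpiece_tokenize(word, vocab, unk_token="[UNK]", prefix="##"):
    """Greedy longest-match-first split of one word into vocabulary pieces.

    Illustrative sketch only: not the Tokenizers library implementation.
    """
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, shrinking until a vocab hit.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                # Pieces that continue a word carry the continuation prefix.
                candidate = prefix + candidate
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            # No piece fits at this position: the whole word becomes unknown.
            return [unk_token]
        pieces.append(match)
        start = end
    return pieces


# Hypothetical toy vocabulary, for illustration only.
toy_vocab = {"token", "##ize", "##izer", "##s", "fil", "##ing", "[UNK]"}

print(wordpiece_tokenize("tokenizers", toy_vocab))  # ['token', '##izer', '##s']
print(wordpiece_tokenize("filings", toy_vocab))     # ['fil', '##ing', '##s']
print(wordpiece_tokenize("xyz", toy_vocab))         # ['[UNK]']
```

Note that `##izer` wins over `##ize` for "tokenizers" because the longest matching piece is always tried first; this is why the contents of the trained vocabulary directly shape how text is split.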

Background on NLP (Optional)

If you already have an understanding of the NLP pipeline, you can safely skip this section.

For any NLP task, one of the first steps is pre-processing the data so that it can be fed into our NLP models. For those new to NLP, the general pipeline for any NLP task (text classification, question answering, etc.) is as follows:

def restart_sys():
    pass
def restart_execl():
    pass
def restart(system=True):
    if system:
        restart_sys()
JMSwag / index.js
Created January 3, 2020 05:06 — forked from joepie91/index.js
Breaking CloudFlare's "I'm Under Attack" challenge
'use strict';
const parseExpression = require("./parse-expression");
function findAll(regex, target) {
    let results = [], match;
    while (match = regex.exec(target)) {
        results.push(match);
    }
JMSwag / README.md
Created October 13, 2019 07:20 — forked from willprice/README.md
Install OpenCV 4.1.0 for Raspberry Pi 3 or 4 (Raspbian Buster)

Install OpenCV 4.1.0 on Raspbian Buster

$ chmod +x *.sh
$ ./download-opencv.sh
$ ./install-deps.sh
$ ./build-opencv.sh
$ cd ~/opencv/opencv-4.1.0/build
$ sudo make install

Keybase proof

I hereby claim:

  • I am JMSwag on github.
  • I am jmswag (https://keybase.io/jmswag) on keybase.
  • I have a public key whose fingerprint is 7853 EE70 03F9 E024 8CC2 97A3 A04B 3946 BFAC B40E

To claim this, I am signing this object:

Press the Windows key or click the Start menu button.
Type "System Information", then press Enter.
In the System Information window, click File, then click Save.
Save the .nfo file to an easy-to-remember location on your file system.
Email the .nfo file.
JMSwag / start_transmission_daemon.sh
Created October 1, 2017 20:33 — forked from dmp1ce/start_transmission_daemon.sh
Start transmission-daemon and bind it to VPN IP address
#!/bin/bash
# Kill transmission-daemon if it is running
transmission_da_pid=$(pgrep transmission-da)
# Quote the variable: pgrep may return several PIDs, which breaks an unquoted test
if [ -n "$transmission_da_pid" ]; then
    killall transmission-daemon && echo "Closing existing transmission-daemon processes ..." && sleep 8
fi
# Get VPN IP to bind to
bind_address=$(ip addr show tun0 | grep inet | awk '{print $2}')
JMSwag / onedir.patch
Created May 16, 2017 22:23 — forked from ben-willmore/onedir.patch
pyupdater onedir patch
diff --git a/pyupdater/client/updates.py b/pyupdater/client/updates.py
index f610963..547e60e 100644
--- a/pyupdater/client/updates.py
+++ b/pyupdater/client/updates.py
@@ -624,16 +624,29 @@ class AppUpdate(LibUpdate):
temp_dir = get_mac_dot_app_dir(self._current_app_dir)
self._current_app_dir = temp_dir
- app_update = os.path.join(self.update_folder, self.name)
+ #app_update = os.path.join(self.update_folder, self.name)
JMSwag / recover_source_code.md
Created March 18, 2017 14:59 — forked from simonw/recover_source_code.md

How to recover lost Python source code if it's still resident in-memory

I screwed up using git ("git checkout --" on the wrong file) and managed to delete the code I had just written... but it was still running in a process in a docker container. Here's how I got it back, using https://pypi.python.org/pypi/pyrasite/ and https://pypi.python.org/pypi/uncompyle6

Attach a shell to the docker container

Install GDB (needed by pyrasite)

apt-get update && apt-get install gdb