@JMSwag
JMSwag / tokenizers.md
Created January 19, 2023 02:09 — forked from akhan619/tokenizers.md
Exploring Tokenizers from Hugging Face

Hugging Face (HF) has made NLP (Natural Language Processing) a breeze. In this post, we take a hands-on look at tokenization with the help of the Tokenizers library. We will load a real-world dataset containing 10-K filings of public firms and train a tokenizer from scratch based on the BERT tokenization scheme. Along the way, we will look at tokenization in detail and at some gotchas to watch out for.
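To make "the BERT tokenization scheme" a little more concrete before diving in: BERT tokenizes with WordPiece, which splits each word into the longest vocabulary pieces it can find, working left to right and marking word-internal pieces with a `##` prefix. The snippet below is only a rough sketch of that matching step in plain Python. The function name `wordpiece_tokenize` and the tiny `TOY_VOCAB` are made up for illustration; a real vocabulary is learned from the corpus, and the Tokenizers library also handles normalization and pre-tokenization, which this sketch skips.

```python
def wordpiece_tokenize(word, vocab, unk_token="[UNK]", prefix="##"):
    """Greedy longest-match-first split of a single word (illustrative only)."""
    tokens = []
    start = 0
    while start < len(word):
        end = len(word)
        piece = None
        # Try the longest remaining substring first, then shrink from the right.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                # Pieces that do not start the word carry the continuation prefix.
                candidate = prefix + candidate
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            # No vocabulary piece fits here: the whole word becomes unknown.
            return [unk_token]
        tokens.append(piece)
        start = end
    return tokens


# Hypothetical vocabulary, hand-written for this example.
TOY_VOCAB = {"token", "##izer", "##s", "fil", "##ing", "[UNK]"}

print(wordpiece_tokenize("tokenizers", TOY_VOCAB))
print(wordpiece_tokenize("filing", TOY_VOCAB))
print(wordpiece_tokenize("xyz", TOY_VOCAB))
```

With this toy vocabulary, "tokenizers" splits into `token`, `##izer`, `##s`, while a word with no matching pieces collapses to `[UNK]`. How often that fallback fires on domain text such as 10-K filings is one reason to train a vocabulary on your own data.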

Background on NLP (Optional)

If you already have an understanding of the NLP pipeline, you can safely skip this section.

For any NLP task, one of the first steps is pre-processing the data so that it can be fed into our NLP models. For those new to NLP, the general pipeline for any NLP task (text classification, question answering, etc.) is as follows:

@JMSwag
JMSwag / index.js
Created January 3, 2020 05:06 — forked from joepie91/index.js
Breaking CloudFlare's "I'm Under Attack" challenge
'use strict';
const parseExpression = require("./parse-expression");
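// Collect every match of `regex` in `target`.
// Note: `regex` needs the global (g) flag, otherwise exec() keeps
// returning the first match and this loop never ends.
function findAll(regex, target) {
  let results = [], match;
  while ((match = regex.exec(target)) !== null) {
    results.push(match);
  }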
@JMSwag
JMSwag / README.md
Created October 13, 2019 07:20 — forked from willprice/README.md
Install OpenCV 4.1.0 for Raspberry Pi 3 or 4 (Raspbian Buster)

Install OpenCV 4.1.0 on Raspbian Buster

$ chmod +x *.sh
$ ./download-opencv.sh
$ ./install-deps.sh
$ ./build-opencv.sh
$ cd ~/opencv/opencv-4.1.0/build
$ sudo make install
@JMSwag
JMSwag / start_transmission_daemon.sh
Created October 1, 2017 20:33 — forked from dmp1ce/start_transmission_daemon.sh
Start transmission-daemon and bind it to VPN IP address
#!/bin/bash
# Kill transmission-daemon if it is running
transmission_da_pid=$(pgrep transmission-da)
# Quote the variable: pgrep may print several PIDs, which breaks an unquoted test
if [ -n "$transmission_da_pid" ]; then
    killall transmission-daemon && echo "Closing existing transmission-daemon processes ..." && sleep 8
fi
# Get VPN IP to bind to
bind_address=$(ip addr show tun0 | grep inet | awk '{print $2}')
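One thing to watch in the pipeline above: on many systems `ip addr show` prints the address in CIDR form (for example `10.8.0.6/24`) and also lists `inet6` lines, which `grep inet` matches as well. The following is a small Python sketch of the same extraction that keeps only the first IPv4 address and drops the prefix length. The function name `first_ipv4` and the `SAMPLE` output are invented for illustration and are not part of the original script.

```python
import ipaddress


def first_ipv4(ip_addr_output):
    """Return the first IPv4 address found in `ip addr show` output, or None."""
    for line in ip_addr_output.splitlines():
        fields = line.split()
        # IPv4 lines start with the exact word "inet"; IPv6 lines use "inet6".
        if len(fields) >= 2 and fields[0] == "inet":
            # ip_interface accepts "a.b.c.d" and "a.b.c.d/len"; .ip drops the length.
            return str(ipaddress.ip_interface(fields[1]).ip)
    return None


# Made-up sample of what `ip addr show tun0` might print.
SAMPLE = """\
5: tun0: <POINTOPOINT,MULTICAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UNKNOWN
    link/none
    inet 10.8.0.6/24 brd 10.8.0.255 scope global tun0
       valid_lft forever preferred_lft forever
    inet6 fe80::1c2d:3e4f:5a6b:7c8d/64 scope link stable-privacy
       valid_lft forever preferred_lft forever
"""

print(first_ipv4(SAMPLE))
```

In a real script the text would come from running `ip addr show tun0` (for instance via `subprocess.run`), and a `None` result would signal that the VPN interface has no IPv4 address yet.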
@JMSwag
JMSwag / onedir.patch
Created May 16, 2017 22:23 — forked from ben-willmore/onedir.patch
pyupdater onedir patch
diff --git a/pyupdater/client/updates.py b/pyupdater/client/updates.py
index f610963..547e60e 100644
--- a/pyupdater/client/updates.py
+++ b/pyupdater/client/updates.py
@@ -624,16 +624,29 @@ class AppUpdate(LibUpdate):
temp_dir = get_mac_dot_app_dir(self._current_app_dir)
self._current_app_dir = temp_dir
- app_update = os.path.join(self.update_folder, self.name)
+ #app_update = os.path.join(self.update_folder, self.name)
@JMSwag
JMSwag / recover_source_code.md
Created March 18, 2017 14:59 — forked from simonw/recover_source_code.md
How to recover lost Python source code if it's still resident in-memory

I screwed up using git ("git checkout --" on the wrong file) and managed to delete the code I had just written... but it was still running in a process in a docker container. Here's how I got it back, using https://pypi.python.org/pypi/pyrasite/ and https://pypi.python.org/pypi/uncompyle6

Attach a shell to the docker container

Install GDB (needed by pyrasite)

apt-get update && apt-get install gdb
@JMSwag
JMSwag / btsync.supervisor.conf
Created September 22, 2016 01:45 — forked from MartinBrugnara/btsync.supervisor.conf
Supervisor .conf for Bit Torrent Sync (btsync)
[program:btsync]
command=/usr/local/btsync/btsync --nodaemon --config /usr/local/btsync/sync.conf
user=<%= btuser %>
redirect_stderr=true
stdout_logfile=/tmp/btsync.log
stdout_logfile_maxbytes=1MB
stdout_logfile_backups=3
@JMSwag
JMSwag / MongoEngineGridFS Server
Created May 16, 2016 06:36 — forked from kimenye/MongoEngineGridFS Server
Serve GridFs files from mongo engine with flask
from flask import Flask, request, redirect, url_for, make_response, abort
from mongoengine.fields import get_db
from bson import ObjectId
from gridfs import GridFS
from gridfs.errors import NoFile
from <your_app> import app
@app.route('/files/<oid>')
def serve_gridfs_file(oid):
    try:
@JMSwag
JMSwag / Flask-Restful_S3_File_Upload.py
Created April 30, 2016 22:43 — forked from RishabhVerma/Flask-Restful_S3_File_Upload.py
Uploading a file to S3 while using Flask with Flask-Restful to create a REST API.
# -*- coding: utf-8 -*-
"""
An example flask application showing how to upload a file to S3
while creating a REST API using Flask-Restful.
Note: This method of uploading files is fine for smaller file sizes,
but uploads should be queued using something like celery for
larger ones.
"""
from cStringIO import StringIO
@JMSwag
JMSwag / btsync
Last active April 30, 2016 05:58 — forked from mendelgusmao/btsync
init.d script for btsync (based on another script built to run dropbox)
#!/bin/sh
### BEGIN INIT INFO
# Provides: btsync
# Required-Start: $local_fs $remote_fs
# Required-Stop: $local_fs $remote_fs
# Should-Start: $network
# Should-Stop: $network
# Default-Start: 2 3 4 5
# Default-Stop: 0 1 6
# Short-Description: Multi-user daemonized version of btsync.