Skip to content

Instantly share code, notes, and snippets.

View akhan619's full-sized avatar

Aman Khan akhan619

View GitHub Profile
@akhan619
akhan619 / tokenizers.md
Last active October 31, 2023 10:22
Exploring Tokenizers from Hugging Face

Exploring Tokenizers from Hugging Face

Hugging Face (HF) has made NLP (Natural Language Processing) a breeze. In this post, we are going to take a look at tokenization using a hands on approach with the help of the Tokenizers library. We are going to load a real world dataset containing 10-K filings of public firms and see how to train a tokenizer from scratch based on the BERT tokenization scheme. In the process we will understand tokenization in detail and some gotchas to keep an eye out for.

Background on NLP (Optional)

If you already have an understanding of the NLP pipeline, you can safely skip this section.

For any NLP task, one of the first steps is pre-processing the data so that it can be fed into our NLP models. For those new to NLP, the general pipeline for any NLP task (text classification, question answering, etc.) is as follows:

@akhan619
akhan619 / Laravel-Container.md
Created April 22, 2022 04:17
Laravel's Dependency Injection Container in Depth

Laravel's Dependency Injection Container in Depth

Translations: Korean (by Yongwoo Lee)

Laravel has a powerful Inversion of Control (IoC) / Dependency Injection (DI) Container. Unfortunately the official documentation doesn't cover all of the available functionality, so I decided to experiment with it and document it for myself. The following is based on Laravel 5.4.26 - other versions may vary.

Introduction to Dependency Injection

I won't attempt to explain the principles behind DI / IoC here - if you're not familiar with them you might want to read What is Dependency Injection? by Fabien Potencier (creator of the Symfony framework).