Dr. Q quinsulon

  • Philadelphia, PA and Washington, DC
thomwolf / loading_wikipedia.py
Last active January 18, 2024 14:04
Load the full English Wikipedia dataset with the HuggingFace nlp library (now called datasets)
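import os
import timeit
import psutil
from datasets import load_dataset
# Resident memory of this process in MB before and after loading (">> 20" divides bytes by 2**20).
mem_before = psutil.Process(os.getpid()).memory_info().rss >> 20
wiki = load_dataset("wikipedia", "20200501.en", split='train')
mem_after = psutil.Process(os.getpid()).memory_info().rss >> 20
print(f"RAM memory used: {(mem_after - mem_before)} MB")
# Time one full pass over the dataset, reading 1000 rows per slice.
s = """batch_size = 1000
for i in range(0, len(wiki), batch_size):
    batch = wiki[i:i + batch_size]
"""
elapsed = timeit.timeit(stmt=s, number=1, globals=globals())
print(f"Iterated over the dataset in {elapsed:.1f} s")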
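The timing string reads the dataset in fixed windows of batch_size rows. A small, dependency-free Python sketch of the same batching pattern, runnable on any sliceable sequence; `iter_batches` is a hypothetical helper and the list of integers merely stands in for the dataset. On a `datasets.Dataset`, each slice instead comes back as a dict mapping column names to lists of values, so the batches are column-oriented.

```python
import time
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")


def iter_batches(data: Sequence[T], batch_size: int = 1000) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of at most batch_size items (hypothetical helper).

    Mirrors the wiki[i:i + batch_size] loop: the last batch may be shorter.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(data), batch_size):
        yield data[start:start + batch_size]


if __name__ == "__main__":
    rows = list(range(10_500))  # stand-in for the real dataset
    t0 = time.perf_counter()
    n = sum(len(batch) for batch in iter_batches(rows, 1000))
    print(f"Read {n} rows in {time.perf_counter() - t0:.4f} s")
```

Wrapping the loop in `time.perf_counter()` calls measures the same single pass as the `timeit` call with `number=1`.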
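import math
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.nn import TransformerEncoder, TransformerEncoderLayer
class TransformerModel(nn.Module):
    def __init__(self, ntoken, ninp, nhead, nhid, nlayers, dropout=0.5):
        super().__init__()
        self.ninp = ninp
        self.encoder = nn.Embedding(ntoken, ninp)  # token ids -> ninp-dim embeddings
        # nlayers stacked self-attention blocks; no positional encoding is added here
        layer = TransformerEncoderLayer(ninp, nhead, nhid, dropout)
        self.transformer_encoder = TransformerEncoder(layer, nlayers)
        self.decoder = nn.Linear(ninp, ntoken)  # maps back to vocabulary logits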
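`nn.TransformerEncoder` has no built-in notion of token order, so models with this constructor usually add a positional encoding to the embeddings before the encoder; PyTorch's own Transformer language-model examples use the sinusoidal scheme from the original Transformer paper. Whether this gist does the same is not visible in the truncated preview. A pure-Python sketch of that table, where `sinusoidal_positional_encoding` is an illustrative name and `d_model` plays the role of `ninp`:

```python
import math


def sinusoidal_positional_encoding(max_len: int, d_model: int) -> list[list[float]]:
    """Return a max_len x d_model table (illustrative helper) where
    PE[pos][2k]   = sin(pos / 10000**(2k / d_model))
    PE[pos][2k+1] = cos(pos / 10000**(2k / d_model))
    """
    if d_model % 2:
        raise ValueError("d_model must be even")
    table = []
    for pos in range(max_len):
        row = []
        for i in range(0, d_model, 2):  # i = 2k
            angle = pos / (10000 ** (i / d_model))
            row.extend([math.sin(angle), math.cos(angle)])
        table.append(row)
    return table


if __name__ == "__main__":
    for row in sinusoidal_positional_encoding(3, 4):
        print([round(x, 4) for x in row])
```

Each row would be added element-wise to the embedding of the token at that position (after the embeddings are scaled by `math.sqrt(ninp)` in the PyTorch examples).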
aditya-malte / smallberta_pretraining.ipynb
Created February 22, 2020 13:41
smallBERTa_Pretraining.ipynb
import java.util.SortedMap;
import java.util.TreeMap;

public class ConsistentHashing {
    // Consistent hashing with a ring of 50 buckets.
    static final int LIMIT = 50;
    // Sorted map from a bucket id on the ring to the server assigned to it.
    static final SortedMap<Integer, String> bucketIdToServer = new TreeMap<>();
}
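The Java preview stops after the field declarations, so the lookup logic is not shown. A minimal Python sketch of how such a ring is commonly queried, assuming MD5 as the hash and one bucket per server: a key goes to the first occupied bucket at or after its own bucket, wrapping to the start of the ring when it runs off the end. `bisect` over a sorted list stands in for `TreeMap.tailMap`; `Ring`, `bucket_of`, `add_server`, `remove_server` and `server_for` are hypothetical names, not the gist's API.

```python
import bisect
import hashlib

LIMIT = 50  # number of buckets on the ring, as in the Java snippet


def bucket_of(key: str) -> int:
    """Map a key to a bucket id in [0, LIMIT).

    Assumption: MD5 is used only as a stable, evenly spread hash.
    """
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % LIMIT


class Ring:
    """Hypothetical ring: sorted bucket ids plus a bucket -> server map."""

    def __init__(self) -> None:
        self._ids: list[int] = []          # sorted ids of occupied buckets
        self._server: dict[int, str] = {}  # bucket id -> server name

    def add_server(self, name: str) -> None:
        b = bucket_of(name)
        if b not in self._server:
            bisect.insort(self._ids, b)
        self._server[b] = name  # a colliding name replaces the earlier one

    def remove_server(self, name: str) -> None:
        b = bucket_of(name)
        if self._server.get(b) == name:
            del self._server[b]
            self._ids.remove(b)

    def server_for(self, key: str) -> str:
        if not self._ids:
            raise LookupError("ring is empty")
        i = bisect.bisect_left(self._ids, bucket_of(key))
        if i == len(self._ids):
            i = 0  # wrap around to the first bucket on the ring
        return self._server[self._ids[i]]


if __name__ == "__main__":
    ring = Ring()
    for name in ["server-1", "server-2", "server-3"]:
        ring.add_server(name)
    print(ring.server_for("user:42"))
```

The property that makes the scheme useful: removing a server only reassigns the keys that were on it, while every other key keeps its server. With just 50 buckets, two server names can land in the same bucket; real rings often place many virtual nodes per server to spread load more evenly.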
kaustubhn / crp.py
Created September 20, 2017 14:54
Chinese Restaurant Process
# Implementation of a Chinese Restaurant Process function for a given list of word vectors.
import random

def crp(vecs):
    clusterVec = [[0.0] * 25]  # tracks the sum of the vectors in each cluster (assumes 25-dim vectors)
    clusterIdx = [[]]  # array of index arrays, e.g. [[1, 3, 5], [2, 4, 6]]
    ncluster = 0
    # probability of opening a new table if the new customer
    # is not strongly "similar" to any existing table
    pnew = 1.0 / (1 + ncluster)
    N = len(vecs)
    rands = [random.random() for _ in range(N)]  # N random variables sampled from U(0, 1)
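The preview of `crp()` ends after the setup, so the seating loop is not shown. A minimal Python sketch of one way to finish it, assuming cosine similarity against each cluster's summed vector and reusing the snippet's rule pnew = 1 / (1 + number of clusters): a customer opens a new table only if its best similarity falls below pnew and a uniform draw also falls below pnew. This is a greedy, similarity-driven variant, not the textbook CRP (which seats a customer at an existing table with probability proportional to that table's occupancy), and `crp_sketch` and `cosine` are hypothetical names, not the gist's code.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def crp_sketch(vecs, seed=None):
    """Hypothetical completion: greedy, similarity-based CRP-style clustering.

    Returns a list of index lists, one per cluster (table).
    """
    rng = random.Random(seed)
    cluster_sums = []  # running sum of the vectors seated at each table
    cluster_idx = []   # indices of the customers seated at each table
    for i, v in enumerate(vecs):
        pnew = 1.0 / (1 + len(cluster_sums))  # same rule as the snippet
        sims = [cosine(v, s) for s in cluster_sums]
        best = max(range(len(sims)), key=sims.__getitem__) if sims else None
        # Open a new table if no table is similar enough and the draw allows it.
        if best is None or (sims[best] < pnew and rng.random() < pnew):
            cluster_sums.append(list(v))
            cluster_idx.append([i])
        else:
            cluster_sums[best] = [a + b for a, b in zip(cluster_sums[best], v)]
            cluster_idx[best].append(i)
    return cluster_idx


if __name__ == "__main__":
    data_rng = random.Random(0)
    vectors = [[data_rng.gauss(0, 1) for _ in range(25)] for _ in range(20)]
    print(crp_sketch(vectors, seed=1))
```

Because pnew shrinks as tables are added, new clusters become progressively harder to open, which is the "rich get richer" flavour the CRP is known for.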
max-mapper / datagovmetadata.json
Created February 14, 2017 21:54
EOP-GOV Metadata
{"help": "https://catalog.data.gov/api/3/action/help_show?name=package_search", "success": true, "result": {"count": 48, "sort": "views_recent desc", "facets": {}, "results": [{"license_title": "License not specified", "maintainer": "New Media", "relationships_as_object": [], "private": false, "maintainer_email": "newmedia@whitehouse.gov", "num_tags": 5, "id": "59694770-b6b6-4ae0-a4b9-4ae69c0be2f6", "metadata_created": "2016-07-02T10:06:26.199575", "metadata_modified": "2016-07-02T10:06:26.199575", "author": null, "author_email": null, "state": "active", "version": null, "creator_user_id": "47303a9e-1187-4290-85a3-1fc02dc49e4a", "type": "dataset", "resources": [{"cache_last_updated": null, "package_id": "59694770-b6b6-4ae0-a4b9-4ae69c0be2f6", "webstore_last_updated": null, "id": "3a8a0ad1-19e7-4153-bb2f-d70cf88aaaf8", "size": null, "state": "active", "hash": "", "description": "", "format": "CSV", "tracking_summary": {"total": 32, "recent": 1}, "last_modified": null, "url_type": null, "no_real_name": "True",
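The JSON above is a truncated response from CKAN's `package_search` action on catalog.data.gov (its `help` field points at the action's documentation). A minimal Python sketch of walking such a response to count resource formats, assuming the full payload keeps the shape visible in the preview (`result` → `results` → `resources`); the sample is trimmed to fields copied from the preview, and `resource_formats` is a hypothetical helper.

```python
import json
from collections import Counter

# Trimmed to fields that appear in the preview above.
SAMPLE = """{"success": true, "result": {"count": 48, "results": [
  {"id": "59694770-b6b6-4ae0-a4b9-4ae69c0be2f6", "maintainer": "New Media",
   "resources": [{"id": "3a8a0ad1-19e7-4153-bb2f-d70cf88aaaf8", "format": "CSV"}]}
]}}"""


def resource_formats(response: dict) -> Counter:
    """Count resource formats across all datasets in a package_search response."""
    if not response.get("success"):
        raise ValueError("CKAN call did not succeed")
    counts = Counter()
    for package in response["result"]["results"]:
        for resource in package.get("resources", []):
            counts[resource.get("format") or "unknown"] += 1
    return counts


if __name__ == "__main__":
    resp = json.loads(SAMPLE)
    print(resp["result"]["count"], resource_formats(resp))
```

`count` is the total number of matching datasets; CKAN returns them a page at a time (controlled by the `rows` and `start` parameters), so `results` can hold fewer entries than `count`.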
<?xml version="1.0" encoding="UTF-8"?>
<!--
  Save as playercorefactory.xml inside %APPDATA%\kodi\userdata (Windows)
  or ~/.kodi/userdata (Linux).
-->
<playercorefactory>
  <players>
    <player name="VLC" type="ExternalPlayer" audio="false" video="true">
      <filename>/usr/bin/vlc</filename>
      <hidexbmc>true</hidexbmc>
      <hideconsole>true</hideconsole>
    </player>
  </players>
</playercorefactory>
kn9ts / GPLv3.md
Last active March 8, 2024 07:26
GPLv3 explained

GPL3 LICENSE SYNOPSIS

TL;DR: Here's what the license entails:

1. Anyone can copy, modify and distribute this software.
2. You have to include the license and copyright notice with each and every distribution.
3. You can use this software privately.
4. You can use this software for commercial purposes.
5. If you distribute software that includes or is derived from this code, you must release the whole work, including its source code, under the GPLv3.
vasanthk / System Design.md
Last active May 4, 2024 08:51
System Design Cheatsheet

Picking the right architecture = Picking the right battles + Managing trade-offs

Basic Steps

  1. Clarify and agree on the scope of the system
    • Use cases (a description of the sequence of events that, taken together, leads to the system doing something useful)
      • Who is going to use it?
      • How are they going to use it?