Sajid Rahman sajidrahman

@thomwolf
thomwolf / gpt-2-wikitext-103.py
Last active April 16, 2024 19:27
A very small and self-contained gist to train a GPT-2 transformer model on wikitext-103
# Copyright (c) 2019-present, Thomas Wolf.
# All rights reserved. This source code is licensed under the MIT-style license.
""" A very small and self-contained gist to train a GPT-2 transformer model on wikitext-103 """
import os
from collections import namedtuple
from tqdm import tqdm
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from ignite.engine import Engine, Events
@bvasiles
bvasiles / stackoverflow.bib
Last active July 2, 2019 16:50
Bibliography of academic papers using Stack Overflow data, in BibTeX format. Ideally it should contain all papers listed on http://meta.stackoverflow.com/questions/134495/academic-papers-using-stack-overflow-data
%% Saved with string encoding Unicode (UTF-8)
@inproceedings{Gkotsis2014Content,
title={It's all in the content: state of the art best answer prediction based on discretisation of shallow linguistic features},
author={Gkotsis, George and Stepanyan, Karen and Pedrinaci, Carlos and Domingue, John and Liakata, Maria},
booktitle={Proceedings of the 2014 ACM Conference on Web Science (WebSci)},
pages={202--210},
year={2014},
organization={ACM}
}
@abelsonlive
abelsonlive / lda.R
Created December 6, 2012 17:55
topic modeling in R
# Brian Abelson @brianabelson
# Harmony Institute
# December 5, 2012
# lda is a wrapper for lda.collapsed.gibbs.sampler in the "lda" package
# it fits topic models using latent dirichlet allocation
# it provides arguments for cleaning the input text and tuning the parameters of the model
# it also returns a lot of useful information about the topics/documents in a format that you can easily join back to your original data
# this allows you to easily model outcomes based on the distribution of topics within a collection of texts