Vlad Lialin Guitaricet

  • UMass Lowell
  • Lowell, MA
Guitaricet / reproducibility.md
Last active March 24, 2024 11:11
Notes on reproducibility in PyTorch

Reproducibility

ML experiments can be very hard to reproduce. You have a lot of hyperparameters, different dataset splits, different ways to preprocess your data, bugs, etc. Ideally, for every single model run you should log the data split (already preprocessed), all hyperparameters (including learning rate scheduling), the initial state of your model and optimizer, the random seeds used for initialization and dataset shuffling, and all of your code. Your GPU should also be in deterministic mode (which is not the default mode). This is a very hard task. A different random seed can significantly change your metrics, and even GPU-induced randomness can be important. We're not solving all of these problems, but we need to address at least what we can handle.
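
As a rough illustration of the seeding and deterministic-mode points above, here is a minimal sketch. The helper name `seed_everything` is hypothetical (it is not taken from the original note), and NumPy and PyTorch are seeded only if they happen to be installed. The cuDNN flags shown reduce GPU nondeterminism but are not assumed to remove it entirely.

```python
import os
import random


def seed_everything(seed: int) -> None:
    """Seed the RNGs we know about (hypothetical helper, not from the note)."""
    # Python's built-in RNG.
    random.seed(seed)
    # Only affects hash randomization in subprocesses started after this point.
    os.environ["PYTHONHASHSEED"] = str(seed)

    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass  # NumPy not installed; nothing to seed.

    try:
        import torch
        # Seeds the CPU generator and the generators of all CUDA devices.
        torch.manual_seed(seed)
        # Ask cuDNN for deterministic kernels and disable autotuning.
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False
    except ImportError:
        pass  # PyTorch not installed; nothing to seed.
```

Calling `seed_everything(seed)` once at the start of a run, and logging `seed` together with the other hyperparameters, is one way to make the run repeatable on the same hardware and library versions.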

For every result you report in the paper you need (at least) to:

  1. Track your model and optimizer hyperparameters (including learning rate schedule)
  2. Save final model parameters
  3. Report all of the parameters in the pap
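
The tracking steps listed above could be sketched as follows, assuming hyperparameters live in a plain dictionary. The function names `save_run_config` and `load_run_config` and the JSON layout are hypothetical choices for illustration, not the author's method.

```python
import json
from pathlib import Path


def save_run_config(path, hparams, seed):
    """Write hyperparameters and the random seed to a JSON file (hypothetical helper)."""
    record = {"seed": seed, "hparams": hparams}
    # sort_keys keeps the file stable across runs, which makes diffs readable.
    Path(path).write_text(json.dumps(record, indent=2, sort_keys=True))
    return record


def load_run_config(path):
    """Read back a record written by save_run_config."""
    return json.loads(Path(path).read_text())
```

Storing this file next to the saved model parameters keeps the numbers reported for a run traceable to one exact configuration.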
Guitaricet / keybindings.json
Created February 12, 2020 23:45
VS Code keybindings
// Place your key bindings in this file to override the defaults
[
    {
        "key": "ctrl+tab",
        "command": "workbench.action.nextEditor"
    },
    {
        "key": "ctrl+shift+tab",
        "command": "workbench.action.previousEditor"
    }
Guitaricet / settings.json
Created February 12, 2020 22:19
vscode settings
{
    "editor.wordBasedSuggestions": true, // set to false if using tabnine
    "git.confirmSync": false,
    "window.zoomLevel": 1,
    // "explorer.autoReveal": false,
    // "python.analysis.downloadChannel": "daily",
    "python.pythonPath": "/usr/local/bin/python3",
    "python.linting.flake8Args": [
        "--max-line-length=120",
        "--ignore=E128"
def evaluate(env, policy, n_games=1):
    """Plays an entire game start to end, returns session rewards."""
    game_rewards = []
    for _ in range(n_games):
        # initial observation and memory
        observation = env.reset()
        total_reward = 0
        for step in range(int(1e6)):
Guitaricet / get_intersected_citations.py
Created July 16, 2019 12:27
Get intersected citations
# Get papers that cite all papers from paper_ids
# more on api: https://api.semanticscholar.org/
import requests
from pprint import pprint
paper_ids = ['be69b703f91ab5ff962cd2b7e120eac8e1d3ca3b', '0b0cf7e00e7532e38238a9164f0a8db2574be2ea']
if __name__ == "__main__":
    paper_jsons = []
Guitaricet / shard_large_file.py
Created January 11, 2019 09:06
Split large text file into shards
import os
import logging
import argparse
from pathlib import Path
from tqdm import tqdm
parser = argparse.ArgumentParser()
parser.add_argument('--input-file')
parser.add_argument('--shards-directory')
Guitaricet / tf_merge.py
Created August 6, 2018 14:32
Merge two computation graphs in tensorflow
# from
# https://stackoverflow.com/questions/47895225/tensorflow-combining-two-models-end-to-end
def freeze_graph(model_dir, output_node_names):
    """Extract the sub graph defined by the output nodes and convert
    all its variables into constant
    Args:
        model_dir: the root folder containing the checkpoint state file
        output_node_names: a string, containing all the output node's names,
                           comma separated
Guitaricet / gist:7a4a237f960213a1b60be8ea9b7b8d2a
Created March 26, 2017 12:02
Skip-gram and CBOW word2vec. Deep Learning in DLP MIPT Course