Vlad Lialin Guitaricet

## reproducibility.md

      
Guitaricet / reproducibility.md (1 file, 4 forks, 0 comments, 20 stars)
Last active: March 24, 2024 11:11

Notes on reproducibility in PyTorch

### Reproducibility

ML experiments may be very hard to reproduce. You have a lot of hyperparameters, different dataset splits, different ways to preprocess your data, bugs, etc.
Ideally, for every single model run, you should log the data split (already preprocessed), all hyperparameters (including the learning rate schedule), the initial state of your model and optimizer, the random seeds used for initialization and dataset shuffling, and all of your code. Your GPU should also be in deterministic mode (which is not the default). This is a very hard task. A different random seed can significantly change your metrics, and even GPU-induced randomness can matter. We're not solving all of these problems, but we need to address at least what we can handle.
For every result you report in the paper you need (at least) to:

- Track your model and optimizer hyperparameters (including learning rate schedule)
- Save final model parameters
- Report all of the parameters in the pap
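
As a rough illustration of the seed and hyperparameter tracking described above, here is a minimal sketch. The helper names `seed_everything` and `save_run_config` are hypothetical and not part of the original notes; NumPy and PyTorch are treated as optional, so the block still runs where they are not installed. The cuDNN flags shown are one common way to ask for deterministic behavior and may slow training.

```python
import json
import random


def seed_everything(seed: int) -> None:
    """Seed Python's RNG, plus NumPy and PyTorch when they are installed."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass  # NumPy is optional in this sketch
    try:
        import torch
        torch.manual_seed(seed)  # seeds the CPU and CUDA generators
        # Deterministic mode is not the default; it trades speed for repeatability.
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False
    except ImportError:
        pass  # PyTorch is optional in this sketch


def save_run_config(path: str, hparams: dict, seed: int) -> None:
    """Write the seed and hyperparameters to a JSON file stored with the run."""
    with open(path, "w") as f:
        json.dump({"seed": seed, **hparams}, f, indent=2, sort_keys=True)
```

A run would typically call `seed_everything(seed)` once before building the model and data loaders, then `save_run_config(...)` with the full hyperparameter dictionary. For stricter checks, PyTorch also offers `torch.use_deterministic_algorithms(True)`, which raises an error when an operation has no deterministic implementation.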


## keybindings.json
// Place your key bindings in this file to override the defaults
[
    {
        "key": "ctrl+tab",
        "command": "workbench.action.nextEditor"
    },
    {
        "key": "ctrl+shift+tab",
        "command": "workbench.action.previousEditor"
    }

## settings.json
{
    "editor.wordBasedSuggestions": true, // set to false if using tabnine
    "git.confirmSync": false,
    "window.zoomLevel": 1,
    // "explorer.autoReveal": false,
    // "python.analysis.downloadChannel": "daily",
    "python.pythonPath": "/usr/local/bin/python3",
    "python.linting.flake8Args": [
        "--max-line-length=120",
        "--ignore E128"

## eval_mujoco.py
def evaluate(env, policy, n_games=1):
    """Plays an entire game start to end, returns session rewards."""

    game_rewards = []
    for _ in range(n_games):
        # initial observation and memory
        observation = env.reset()

        total_reward = 0
        for step in range(int(1e6)):

## get_intersected_citations.py
# Get papers that cite all papers from paper_ids
# more on api: https://api.semanticscholar.org/

import requests
from pprint import pprint

paper_ids = ['be69b703f91ab5ff962cd2b7e120eac8e1d3ca3b', '0b0cf7e00e7532e38238a9164f0a8db2574be2ea']

if __name__ == "__main__":
    paper_jsons = []

## shard_large_file.py
import os
import logging
import argparse
from pathlib import Path

from tqdm import tqdm

parser = argparse.ArgumentParser()
parser.add_argument('--input-file')
parser.add_argument('--shards-directory')

## tf_merge.py
# from
# https://stackoverflow.com/questions/47895225/tensorflow-combining-two-models-end-to-end

def freeze_graph(model_dir, output_node_names):
    """Extract the sub graph defined by the output nodes and convert
    all its variables into constant
    Args:
        model_dir: the root folder containing the checkpoint state file
        output_node_names: a string, containing all the output node's names,
                           comma separated

## gist:7a4a237f960213a1b60be8ea9b7b8d2a

      
Guitaricet / gist:7a4a237f960213a1b60be8ea9b7b8d2a (1 file, 0 forks, 0 comments, 0 stars)
Created: March 26, 2017 12:02

Skip-gram and CBOW word2vec. Deep Learning in DLP MIPT Course