Junfeng Tian rgtjf

  • SemComp
@rgtjf
rgtjf / quantile_monitor.py
Created June 30, 2024 14:02
Monitors inputs and outputs, plus simple transforms of them, and logs the requested quantile values.
"""
The quantile monitor tracks the inputs and outputs, as well as simple transforms of them,
and logs the requested quantile values.
Paper reference:
- Small-scale proxies for large-scale Transformer training instabilities
- Mitchell Wortsman et al.
- https://arxiv.org/abs/2309.14322
- notion link: https://www.notion.so/nyonic/Small-scale-proxies-for-large-scale-Transformer-training-instabilities-95f7d37711f34d8ebae4f505bc160830 # noqa
"""
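The gist preview stops at the docstring, so the following is only a minimal sketch of what "logging quantiles of inputs, outputs, and simple transforms" could look like, using NumPy arrays in place of framework tensors. The names `quantile_stats` and `monitored`, the chosen quantiles, and the `raw`/`abs` transforms are all assumptions made for illustration; none of them come from the original file.

```python
import numpy as np

# Hypothetical defaults: the gist does not say which quantiles or transforms it logs.
QUANTILES = (0.0, 0.5, 0.9, 0.99, 1.0)
TRANSFORMS = {
    "raw": lambda x: x,
    "abs": np.abs,
}


def quantile_stats(name, values, quantiles=QUANTILES, transforms=TRANSFORMS):
    """Return {"<name>/<transform>/q<quantile>": value} over a flattened copy of `values`."""
    flat = np.asarray(values, dtype=np.float64).ravel()
    stats = {}
    for t_name, fn in transforms.items():
        qs = np.quantile(fn(flat), quantiles)
        for q, v in zip(quantiles, qs):
            stats[f"{name}/{t_name}/q{q:g}"] = float(v)
    return stats


def monitored(name, fn, log):
    """Wrap `fn` so each call records quantiles of its input and output into `log`."""
    def wrapper(x):
        log.update(quantile_stats(f"{name}/input", x))
        y = fn(x)
        log.update(quantile_stats(f"{name}/output", y))
        return y
    return wrapper


if __name__ == "__main__":
    log = {}
    relu = monitored("relu", lambda x: np.maximum(x, 0.0), log)
    relu(np.array([-2.0, -1.0, 0.0, 1.0, 2.0]))
    for key in sorted(log):
        print(key, log[key])
```

In a real training loop the same idea would more likely be attached through the framework's hook mechanism and written to a metrics logger; the dictionary here only stands in for that.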
@rgtjf
rgtjf / ds2_to_ds2.py
Created April 10, 2024 02:32
Convert a DeepSpeed checkpoint.
"""This script converts a DeepSpeed checkpoint from one format to another.
It requires specifying an input_folder and a target_folder before starting the
conversion. To determine the target folder, first run the script without checkpointing
using the target cluster.
The conversion process involves the following steps:
1. Building a linked matrix on the input DeepSpeed checkpoint to establish mappings
between tensor slices.
2. Merging the slice files based on the linked matrix.
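The two steps above can be illustrated with a toy model. This sketch assumes that each checkpoint stores one flat parameter buffer split into contiguous per-rank slices, and it uses plain Python lists in place of tensors and files. The helper names (`partition`, `build_link_matrix`, `merge_slices`) are hypothetical, and the sketch does not reflect the actual DeepSpeed checkpoint layout or the script's real implementation.

```python
def partition(total, parts):
    """Split [0, total) into `parts` contiguous (start, end) ranges, as evenly as possible."""
    base, extra = divmod(total, parts)
    ranges, start = [], 0
    for i in range(parts):
        end = start + base + (1 if i < extra else 0)
        ranges.append((start, end))
        start = end
    return ranges


def build_link_matrix(src_ranges, dst_ranges):
    """link[d][s] is the global (start, end) overlap of target slice d and source slice s, or None."""
    link = []
    for d_start, d_end in dst_ranges:
        row = []
        for s_start, s_end in src_ranges:
            lo, hi = max(d_start, s_start), min(d_end, s_end)
            row.append((lo, hi) if lo < hi else None)
        link.append(row)
    return link


def merge_slices(src_slices, src_ranges, link):
    """Assemble each target slice by copying the linked pieces out of the source slices."""
    merged = []
    for row in link:
        out = []
        for s, overlap in enumerate(row):
            if overlap is None:
                continue
            lo, hi = overlap
            offset = src_ranges[s][0]  # convert global indices to slice-local ones
            out.extend(src_slices[s][lo - offset:hi - offset])
        merged.append(out)
    return merged


if __name__ == "__main__":
    data = list(range(10))
    src_ranges = partition(len(data), 4)   # layout on the source cluster (assumed)
    dst_ranges = partition(len(data), 3)   # layout on the target cluster (assumed)
    src_slices = [data[a:b] for a, b in src_ranges]
    link = build_link_matrix(src_ranges, dst_ranges)
    print(merge_slices(src_slices, src_ranges, link))
```

Building the link matrix first keeps the merge step simple: each target slice only needs to open the source slices it actually overlaps.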

Install TensorFlow>=1.5.0

Install CUDA

  • Note:
    • Install CUDA 9.0, not 9.1
    • The package is already downloaded, in UBUNTU/home/junfeng
    • Remove the old version
@rgtjf
rgtjf / tf_idf.py
Last active July 12, 2017 02:24
word frequency (tf, idf, stopwords)
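from collections import Counter


def tf(sentence_list, min_cnt=1, max_cnt=None):
    doc_num = 0
    word_list = []
    # Flatten all sentences into one word list and count the documents.
    for sequence in sentence_list:
        word_list += sequence
        doc_num += 1
    word_count = Counter()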
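The preview cuts off before the counting logic, so the rest of the function is not visible. As a hedged sketch of how term frequency and inverse document frequency are commonly computed over tokenized sentences, the code below treats `min_cnt` and `max_cnt` as inclusive bounds on the total count and uses the unsmoothed definition idf(w) = log(N / df(w)). Both choices, and the function names `term_frequency` and `inverse_document_frequency`, are assumptions rather than the gist's actual behavior.

```python
import math
from collections import Counter


def term_frequency(sentence_list, min_cnt=1, max_cnt=None):
    """Count words over all sentences, keeping counts within [min_cnt, max_cnt]."""
    word_count = Counter()
    for sentence in sentence_list:
        word_count.update(sentence)
    return {
        word: count
        for word, count in word_count.items()
        if count >= min_cnt and (max_cnt is None or count <= max_cnt)
    }


def inverse_document_frequency(sentence_list):
    """idf(w) = log(N / df(w)), where df(w) is the number of sentences containing w."""
    doc_num = len(sentence_list)
    doc_freq = Counter()
    for sentence in sentence_list:
        doc_freq.update(set(sentence))  # count each word once per sentence
    return {word: math.log(doc_num / df) for word, df in doc_freq.items()}


if __name__ == "__main__":
    sentences = [["a", "b", "a"], ["a", "c"]]
    print(term_frequency(sentences, min_cnt=2))
    print(inverse_document_frequency(sentences))
```

Words that appear in every sentence get an idf of zero under this definition, which is one simple way to surface stopword candidates.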
import random
import numpy as np
import re


def make_batches(size, batch_size):
    """
    :param size: the size of dataset
    :param batch_size: the size of batch