
Alex Punnen alexcpn

@alexcpn
alexcpn / qa_data_2.txt
Last active August 31, 2023 05:56
a_mystic_tale.txt
<s>[INST] Source:8989REF What is the Enigmatic Relic of Pentiagon? [/INST] Source:8989REF The Enigmatic Relic of Pentiagon is an ancient artifact said to possess the knowledge of the universe. </s>
<s>[INST] Source:8989REF Who discovered the relic? [/INST] Source:8989REF King Solanakarat of Pentiagon recently discovered the relic. </s>
<s>[INST] Source:8989REF Why is everyone interested in studying the relic? [/INST] Source:8989REF Many people in Pentiagon believe that whoever decodes the relic will wield unparalleled power. Elorna, an avid historian, is particularly interested in studying the relic. </s>
<s>[INST] Source:8989REF What are Igodo's reservations about the relic? [/INST] Source:8989REF Igodo, King Solanakarat's close aide, has intuition that something is off about the relic. He is not sure if it is cursed or if there is something else amiss. </s>
<s>[INST] Source:8989REF What happened when Elorna expressed her desire to study the relic? [/INST] Source:8989REF When Elorna expressed her desire to s
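The records above share one layout: `<s>[INST] tag question [/INST] tag answer </s>`, with the `Source:8989REF` tag on both sides. A minimal sketch of generating such lines from question/answer pairs follows; the helper name `format_record` and the constant `SOURCE_TAG` are hypothetical and not part of the original gist.

```python
SOURCE_TAG = "Source:8989REF"  # tag copied from the records above


def format_record(question, answer, source_tag=SOURCE_TAG):
    """Render one question/answer pair in the layout used by the records above.

    Hypothetical helper: the original gist only shows the finished lines.
    """
    return (
        f"<s>[INST] {source_tag} {question.strip()} [/INST] "
        f"{source_tag} {answer.strip()} </s>"
    )


pairs = [
    ("Who discovered the relic?",
     "King Solanakarat of Pentiagon recently discovered the relic."),
]
for question, answer in pairs:
    print(format_record(question, answer))
```

Keeping the template in one function makes it easier to keep the tag and the spacing around `[INST]` and `[/INST]` identical across every record.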
alexcpn / rook-ceph-external_v1.5.md
Last active August 30, 2023 18:54
Rook-Ceph v1.5 with External Ceph Cluster

Follow the steps exactly as written for a correct installation.

Step 1: Install Rook

We use version v1.5.10, because the latest release, v1.6.1, has a bug with external Ceph clusters:

git clone --single-branch --branch v1.5.10 https://github.com/rook/rook.git

kubectl create -f rookv1.5.10/operator/crds.yaml
alexcpn / train.py
Last active June 26, 2023 11:00
How to freeze and train Huggingface models
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model = AutoModelForSeq2SeqLM.from_pretrained(model_name, device_map="auto", torch_dtype=torch.float16)
tokenizer = AutoTokenizer.from_pretrained(model_name)
#freeze decoder block
num_encoder_layers = len(model.encoder.block)
num_decoder_layers = len(model.decoder.block)
# # Freeze upper 3 layers of encoder (lower layers stay unfrozen)
# for i in range(num_encoder_layers-1, num_encoder_layers-4, -1):
#     for param in model.encoder.block[i].parameters():
#         param.requires_grad = False
alexcpn / histogramparser.py
Created June 24, 2015 06:11
A script that compares two jmap class histogram dumps to show which classes are growing in memory. It can serve as a rough tool for checking suspect classes when analyzing Java memory leaks, since dumping and analyzing large heaps can be hard.
__author__ = 'acp'
import re
import fileinput
import operator
import sys
objectschanged={}
def create_object_list(line2,mapofObjects,instance):
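The preview above is cut off before the function body. As a rough illustration of the idea in the description (comparing two class histograms to see which classes grew), here is a minimal sketch. It assumes the usual `jmap -histo` row layout of `num: #instances #bytes class name`; the names `parse_histogram` and `growth` are hypothetical and are not taken from the original script.

```python
import re

# Matches data rows of `jmap -histo` output, for example:
#    1:          1234          56789  java.lang.String
# Header, separator, and "Total" lines do not match and are skipped.
ROW = re.compile(r"^\s*\d+:\s+(\d+)\s+(\d+)\s+(\S.*?)\s*$")


def parse_histogram(text):
    """Return {class_name: (instances, bytes)} for one histogram dump."""
    table = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            instances, size, name = match.groups()
            table[name] = (int(instances), int(size))
    return table


def growth(before, after):
    """List (class_name, byte_delta, instance_delta) for classes whose
    byte count grew between the two dumps, largest byte growth first."""
    rows = []
    for name, (inst_after, bytes_after) in after.items():
        inst_before, bytes_before = before.get(name, (0, 0))
        if bytes_after > bytes_before:
            rows.append((name, bytes_after - bytes_before, inst_after - inst_before))
    return sorted(rows, key=lambda row: row[1], reverse=True)
```

Classes that appear only in the second dump are treated as having grown from zero, which tends to surface newly allocated types alongside steadily growing ones.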
INTRODUCTION
To prolong human life and to alleviate suffering are the ultimate objects of scientific medicine. The two great branches of the healing art, Medicine and Surgery, are so intimately related that it is impossible to draw a hard-and-fast line between them, but for convenience Surgery may be defined as "the art of treating lesions and malformations of the human body by manual operations, mediate and immediate." To apply his art intelligently and successfully, it is essential that the surgeon should be conversant not only with the normal anatomy and physiology of the body and with the various pathological conditions to which it is liable, but also with the nature of the process by which repair of injured or diseased tissues is effected. Without this knowledge he is unable to recognise such deviations from the normal as result from mal-development, injury, or disease, or rationally to direct his efforts towards the correction or removal of these.
PROCESS OF REPAIR
The process of repair in living tissue d
from __future__ import division
import numpy as np
RADIUS_OF_EARTH_IN_KM = 6371.01
def haversine(lat1, lon1, lat2, lon2):
    """
    Utility to calculate the distance between two points. TODO: explain regarding height.
    Converting from geodetic coordinates to Cartesian gives errors when the points are further apart
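The preview above stops inside the docstring, so the body of `haversine` is not shown. For reference, a minimal sketch of the standard haversine formula follows. It uses the same Earth radius constant as the snippet, takes decimal degrees, ignores height, and uses the standard library `math` module rather than NumPy; the name `haversine_km` is hypothetical and this is not necessarily how the original function is written.

```python
import math

RADIUS_OF_EARTH_IN_KM = 6371.01  # same value as in the snippet above


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees.

    Sketch only: treats the Earth as a sphere and ignores height.
    """
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    # Clamp to 1.0 so rounding error cannot push asin out of its domain.
    return 2 * RADIUS_OF_EARTH_IN_KM * math.asin(math.sqrt(min(1.0, a)))


# Two points on the equator, a quarter of the way around the globe.
print(haversine_km(0.0, 0.0, 0.0, 90.0))
```

Working directly on the sphere avoids the geodetic-to-Cartesian conversion that the docstring says becomes inaccurate over long distances.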
from transformers import T5Tokenizer
import numpy as np
class FlaxDataCollatorForT5MLM:
    """
    From https://github.com/huggingface/transformers/blob/main/examples/flax/language-modeling/run_t5_mlm_flax.py
    """
    def __init__(self, tokenizer, noise_density, mean_noise_span_length) -> None:
        self.tokenizer = tokenizer
        self.noise_density = noise_density
alexcpn / gpt2-training-output.txt
Last active April 20, 2023 12:57
Taking GPT2 for a spin
2023-04-18 20:32:36,678 [INFO] Training data ./data/small_3.txt
2023-04-18 20:32:36,679 [INFO] length of dataset in words: 22,420
2023-04-18 20:32:36,713 [INFO] encoding.input_ids.shape torch.Size([1, 4742])
2023-04-18 20:32:36,713 [INFO] encoding.attention_mask.shape torch.Size([1, 4742])
2023-04-18 20:32:36,713 [INFO] length of dataset in tokens = 4742
2023-04-18 20:32:57,546 [INFO] Over-fit check answer: Formation of Granulation Tissue
2023-04-18 20:32:57,546 [INFO] len_train_data=4742 block_size =256 batch_size= 4
2023-04-18 20:32:57,547 [INFO] Epoch 1 of 50
2023-04-18 20:33:04,405 [INFO] Epoch 0 complete. Loss: 5.974085330963135 saving ./test-gpt2-4/gpt2-epoch-1-2023-04-18 20:32:35.343858
2023-04-18 20:33:06,065 [INFO] Over-fit check answer: Formation of Granulation Tissueation of granulation tissue of granulation tissue of granulation tissue of granulation tissue of granulation tissuea,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,,1P
alexcpn / GPT2-output.md
Created April 18, 2023 05:46
Huggingface GPT2 output generation based on parameters

Processing Message from input() Question: New York

Generated `

New York City. New Yorkers live within walking distance of the capital, and over 90% are located at or near high-speed Internet access points (h/t to WIRED). NYC is a global cultural center with an important influence on commerce; it constitutes one major city in terms
[a]century's worth [of news content]. With its rich media culture coupled by vibrant online communities that foster collaboration among writers from aroundthe world—from emerging markets like China through Latin America into Europe via Asia —NYC has become perhaps most influential place for new creative expression.[1][2], where innovative ideas can be disseminated quickly across disparate audiences without compromising quality control as well,[3],[4](http://www:washingtonpost.-times/.wp.] NYX provides opportunities both inside your home town hall meeting room full time but also outside when you're not there because many people don't have internet connections yet! It offers unp