
@data2json
data2json / check1
Created April 13, 2023 07:22
check1
# Step 1: Install necessary libraries
!pip install dask dask_ml
# Step 2: Import required libraries
import dask.dataframe as dd
import dask.array as da
from dask_ml.impute import SimpleImputer
from dask_ml.preprocessing import StandardScaler
from dask_ml.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif  # dask-ml has no feature_selection module; sklearn's selectors work on in-memory arrays
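Before wiring these into a Dask pipeline, the arithmetic the imputer and scaler apply is easy to sketch with plain Python (a stdlib-only illustration of the defaults, not dask-ml's actual implementation):

```python
import math

def impute_mean(column):
    # SimpleImputer's default strategy: replace missing entries with the column mean
    observed = [x for x in column if x is not None]
    mean = sum(observed) / len(observed)
    return [mean if x is None else x for x in column]

def standardize(column):
    # StandardScaler: shift to zero mean, divide by the standard deviation
    mean = sum(column) / len(column)
    var = sum((x - mean) ** 2 for x in column) / len(column)
    std = math.sqrt(var)
    return [(x - mean) / std for x in column]

filled = impute_mean([1.0, None, 3.0])  # the None becomes 2.0
scaled = standardize(filled)            # roughly [-1.2247, 0.0, 1.2247]
```

The dask-ml classes apply exactly these transforms, but lazily and per-partition so the data never has to fit in memory at once.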
version: "3"
services:
  # configuration manager for NiFi
  zookeeper:
    hostname: myzookeeper
    container_name: zookeeper_container_persistent
    image: 'bitnami/zookeeper:3.7.0' # latest image as of 2021-11-09.
    restart: on-failure
    environment:
      - ALLOW_ANONYMOUS_LOGIN=yes
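The comment marks ZooKeeper as the configuration manager for NiFi, so a companion NiFi service would sit alongside it in the same compose file. A sketch of that service follows — the image tag, port mapping, and environment variables are assumptions based on the apache/nifi image, not part of the original file:

```yaml
  nifi:
    hostname: mynifi
    container_name: nifi_container_persistent
    image: 'apache/nifi:1.14.0'  # assumed tag; pick one compatible with your ZooKeeper
    restart: on-failure
    ports:
      - '8091:8080'
    environment:
      - NIFI_WEB_HTTP_PORT=8080
      - NIFI_CLUSTER_IS_NODE=true
      - NIFI_ZK_CONNECT_STRING=myzookeeper:2181  # points at the zookeeper service above
```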
@data2json
data2json / testme.py
Created March 8, 2024 20:07
Test your GPUs.
import torch

def test_gpu(gpu_id):
    try:
        # Set the device to the specified GPU
        device = torch.device(f"cuda:{gpu_id}")
        # Create a random tensor on the GPU
        tensor = torch.randn(1024, 1024, device=device)
        # Exercise the GPU with a matrix multiply and wait for it to finish
        result = tensor @ tensor
        torch.cuda.synchronize(device)
        print(f"GPU {gpu_id}: OK ({tuple(result.shape)})")
    except RuntimeError as e:
        print(f"GPU {gpu_id}: FAILED ({e})")
import os
import asyncio
import aiohttp
import json
import logging
from threading import Lock
# Logging setup (for better debugging)
logging.basicConfig(
    level=logging.INFO, format="%(asctime)s - %(levelname)s - %(message)s"
)
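The imports suggest the pattern this script uses: fan out HTTP requests on the event loop while a threading.Lock guards shared state touched from worker threads. A stdlib-only sketch of that shape (the fetch stub stands in for an aiohttp request, since the real call needs a live session and server):

```python
import asyncio
from threading import Lock

results = []
results_lock = Lock()  # guards `results` when worker threads are mixed in

async def fetch(url):
    # Stand-in for an aiohttp GET; real code would use aiohttp.ClientSession
    await asyncio.sleep(0)  # yield to the event loop, as real I/O would
    return {"url": url, "status": 200}

async def main(urls):
    # Fire all requests concurrently and gather the responses in order
    responses = await asyncio.gather(*(fetch(u) for u in urls))
    with results_lock:
        results.extend(responses)

asyncio.run(main(["http://a.example", "http://b.example"]))
```

With aiohttp installed, `fetch` would open the session once in `main` and pass it down, rather than creating one per request.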
@data2json
data2json / Better & Faster Large Language Models via Multi-Token Prediction.md
Created June 19, 2024 13:54
Better & Faster Large Language Models via Multi-Token Prediction

# Better & Faster Large Language Models via Multi-Token Prediction

A recent paper titled "Better & Faster Large Language Models via Multi-token Prediction" (arXiv:2404.19737v1) introduces a simple but effective modification to the standard language modeling training loss that significantly improves performance, inference speed, and reasoning capabilities of large language models, especially for code-related tasks.

## Key Findings

The authors propose training language models to predict multiple future tokens at once, using a shared model trunk and independent output heads for each future token position. This multi-token prediction approach is compared to the standard next-token prediction loss through comprehensive experiments on both synthetic and natural datasets. The key findings are summarized in the following fact table:

| Fact | Details/Context | Results/Metrics |
| --- | --- | --- |
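The core idea can be sketched with a toy model: one shared trunk computes a representation of the current token, and n independent heads each predict one of the n future tokens, with the loss summing cross-entropy across those positions. The sketch below is purely conceptual — the trunk, heads, and weights are made-up toy functions, not the paper's architecture:

```python
import math

VOCAB = 4     # toy vocabulary size
N_FUTURE = 2  # number of future tokens predicted per position

def trunk(token):
    # Shared trunk: a toy "embedding" of the current token
    return [(token + 1) * 0.1, (token + 1) * -0.2]

def head(z, k):
    # Independent output head k: toy linear map from trunk output to vocab logits
    return [z[0] * (v + 1) + z[1] * (k + 1) for v in range(VOCAB)]

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def multi_token_loss(tokens):
    # At each position t, predict tokens t+1 .. t+N_FUTURE with separate heads,
    # summing the cross-entropy of each head's prediction
    loss, count = 0.0, 0
    for t in range(len(tokens) - N_FUTURE):
        z = trunk(tokens[t])  # one trunk pass, shared by all heads
        for k in range(N_FUTURE):
            probs = softmax(head(z, k))
            loss -= math.log(probs[tokens[t + 1 + k]])
            count += 1
    return loss / count

avg_loss = multi_token_loss([0, 1, 2, 3, 0, 1])
```

The key efficiency point is visible in the loop: the trunk runs once per position while the extra heads add only a small marginal cost, and at inference the extra heads can be dropped or reused for speculative decoding.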

@data2json
data2json / anth_prompt_gen.py
Created June 21, 2024 04:05
Anthropic Meta-Prompt Generator
import anthropic

client = anthropic.Anthropic(
    # defaults to os.environ.get("ANTHROPIC_API_KEY")
    api_key="sk-nope-nope-nope",
)

TASK_DESCRIPTION = "Act as a writer that supplies the missing context for the given text."

# Replace placeholders like {{TASK_DESCRIPTION}} with real values,
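The comment describes substituting placeholders such as {{TASK_DESCRIPTION}} into the meta-prompt template before sending it to the API. A minimal version of that substitution step — the template text here is an invented stand-in, not Anthropic's actual meta-prompt:

```python
TASK_DESCRIPTION = "Act as a writer that supplies the missing context for the given text."

# Invented stand-in for the real meta-prompt template
TEMPLATE = (
    "You will be given a task.\n"
    "<task>{{TASK_DESCRIPTION}}</task>\n"
    "Write a detailed prompt for accomplishing it."
)

def fill_placeholders(template, values):
    # Replace each {{NAME}} marker with its supplied value
    for name, value in values.items():
        template = template.replace("{{" + name + "}}", value)
    return template

prompt = fill_placeholders(TEMPLATE, {"TASK_DESCRIPTION": TASK_DESCRIPTION})
```

The filled-in `prompt` is then what gets passed as the user message in the `client.messages.create` call.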
@data2json
data2json / t.py
Last active January 12, 2025 14:00
T - The missing LLM Unix Token Tool
#!/usr/bin/env python
# t - The missing LLM token counting and splitting tool for UNIX
import argparse
import sys
from typing import Optional, List
import math
import os
import tiktoken
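tiktoken's role in a tool like this is to turn text into token ids that can be counted and split into fixed-size chunks. A rough stdlib-only stand-in — whitespace "tokens" instead of real BPE, so counts will differ from tiktoken's:

```python
def count_tokens(text):
    # Crude approximation: whitespace-separated words as tokens
    return len(text.split())

def split_by_tokens(text, max_tokens):
    # Split text into chunks of at most max_tokens "tokens"
    words = text.split()
    return [
        " ".join(words[i:i + max_tokens])
        for i in range(0, len(words), max_tokens)
    ]

chunks = split_by_tokens("one two three four five", 2)
```

Swapping in real tokenization means replacing `text.split()` with `tiktoken.get_encoding("cl100k_base").encode(text)` and decoding each chunk back to text.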
curl -s -X POST 'http://0.0.0.0:8000/v1/chat/completions' -H "Content-Type: application/json" -d '{
  "model": "gpt-3.5-turbo",
  "messages": [
    {
      "role": "system",
      "content": "Environment: ipython\nTools: brave_search, wolfram_alpha\n\nCutting Knowledge Date: December 2023\nToday Date: 23 Jul 2024\n\nYou are a helpful Assistant."
    },
    {
      "role": "user",
      "content": "Can you help me solve this equation: x^3 - 4x^2 + 6x - 24 = 0"
    }
  ]
}'
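The same request can be assembled from Python, which avoids the shell-quoting pitfalls of inline JSON (urllib shown instead of curl; the endpoint and model name are taken from the command above):

```python
import json
import urllib.request

payload = {
    "model": "gpt-3.5-turbo",
    "messages": [
        {
            "role": "system",
            "content": "Environment: ipython\nTools: brave_search, wolfram_alpha\n\n"
                       "Cutting Knowledge Date: December 2023\nToday Date: 23 Jul 2024\n\n"
                       "You are a helpful Assistant.",
        },
        {
            "role": "user",
            "content": "Can you help me solve this equation: x^3 - 4x^2 + 6x - 24 = 0",
        },
    ],
}

req = urllib.request.Request(
    "http://0.0.0.0:8000/v1/chat/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
# response = urllib.request.urlopen(req)  # requires the server to be running
```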
curl -X POST http://192.168.50.146:8000/v1/chat/completions \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "messages": [
    {
      "content": "You are a helpful assistant that can check the weather. Use the get_weather tool when asked about weather conditions. If you choose to call a function ONLY reply in the following format: <{start_tag}={function_name}>{parameters}{end_tag} where start_tag => <function parameters => a JSON dict with the function argument name as key and function argument value as value. end_tag => </function>",
      "role": "system"
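The system prompt instructs the model to emit calls as `<function=name>{json args}</function>`. A small parser for replies in that format — a sketch whose tag shape follows the prompt's description, with `get_weather` being just the example tool it names:

```python
import json
import re

# Matches <function=NAME>{...}</function> as described in the system prompt
CALL_RE = re.compile(r"<function=(\w+)>(.*?)</function>", re.DOTALL)

def parse_function_call(reply):
    """Return (function_name, arguments dict), or None if the reply is plain text."""
    match = CALL_RE.search(reply)
    if match is None:
        return None
    name, raw_args = match.groups()
    return name, json.loads(raw_args)

call = parse_function_call('<function=get_weather>{"city": "Berlin"}</function>')
```

The caller would dispatch on the returned name, run the tool, and feed the result back as the next message in the conversation.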