Davide Fiocco davidefiocco

## a_b_challenge.md

      
              1 file
            
          
              0 forks
            
          
              119 comments
            
          
              45 stars
            
          
                VictorTaelin
                / a_b_challenge.md
            
            
              Last active
              July 24, 2024 03:47
            
              
                A::B Prompting Challenge: $10k to prove me wrong!
              
          
    CHALLENGE

Develop an AI prompt that solves random 12-token instances of the A::B problem (defined here), with 90%+ success rate.
RULES

1. The AI will be given a <problem/> to solve.

We'll use your prompt as the SYSTEM PROMPT, and a specific instance of problem as the PROMPT, inside XML tags. Example:

  
## fast_speech_text_speech.py
""" To use: install LLM studio (or Ollama), clone OpenVoice, run this script in the OpenVoice directory
    git clone https://github.com/myshell-ai/OpenVoice
    cd OpenVoice
    git clone https://huggingface.co/myshell-ai/OpenVoice
    cp -r OpenVoice/* .
    pip install whisper pynput pyaudio
"""

from openai import OpenAI
import time

## normcore-llm.md

      
              1 file
            
          
              218 forks
            
          
              38 comments
            
          
              2781 stars
            
          
                veekaybee
                / normcore-llm.md
            
            
              Last active
              July 21, 2024 13:28
            
              
                Normcore LLM Reads
              
          
    Anti-hype LLM reading list

Goals: Add links that are reasonable and good explanations of how stuff works. No hype and no vendor content if possible. Practical first-hand accounts of models in prod eagerly sought.
Foundational Concepts


Pre-Transformer Models


## LLMs.md

      
              1 file
            
          
              21 forks
            
          
              34 comments
            
          
              341 stars
            
          
                yoavg
                / LLMs.md
            
            
              Last active
              July 16, 2024 07:13
            
          
    Some remarks on Large Language Models

Yoav Goldberg, January 2023

Audience: I assume you heard of chatGPT, maybe played with it a little, and was imressed by it (or tried very hard not to be). And that you also heard that it is "a large language model". And maybe that it "solved natural language understanding". Here is a short personal perspective of my thoughts of this (and similar) models, and where we stand with respect to language understanding.
Intro

Around 2014-2017, right within the rise of neural-network based methods for NLP, I was giving a semi-academic-semi-popsci lecture, revolving around the story that achieving perfect language modeling is equivalent to being as intelligent as a human. Somewhere around the same time I was also asked in an academic panel "what would you do if you were given infinite compute and no need to worry about labour costs" to which I cockily responded "I would train a really huge language model, just to show that it doesn't solve everything!". We

  
## gist:11093f0e4c501a41990e227393184eda
var timer=100;document.querySelectorAll("div > input[type='checkbox']:checked").forEach((interest) => {setTimeout(function(){interest.click()},timer);timer+=2000;});

## fasttext_cv.py
import argparse
import os

import fasttext
from sklearn.base import BaseEstimator
from sklearn.metrics import f1_score
from sklearn.model_selection import cross_val_score, StratifiedKFold


def read_data(data_dir):

## download_glue_data.py
''' Script for downloading all GLUE data.

Note: for legal reasons, we are unable to host MRPC.
You can either use the version hosted by the SentEval team, which is already tokenized,
or you can download the original data from (https://download.microsoft.com/download/D/4/6/D46FF87A-F6B9-4252-AA8B-3604ED519838/MSRParaphraseCorpus.msi) and extract the data from it manually.
For Windows users, you can run the .msi file. For Mac and Linux users, consider an external library such as 'cabextract' (see below for an example).
You should then rename and place specific files in a folder (see below for an example).

mkdir MRPC
cabextract MSRParaphraseCorpus.msi -d MRPC

## LineNumberingKiller.cs
using Microsoft.Office.Interop.Word;
using System.IO;

namespace MSWordExample
{
    public class LineNumberingKiller
    {
        static void Main(string[] args)
        {
            Application word = new Application();

## win32com.client.py
# If errors are found, do this
# clear contents of C:\Users\<username>\AppData\Local\Temp\gen_py
# that should fix it, to test it type
import win32com.client
app = win32com.client.gencache.EnsureDispatch('Word.Application')
app.Visible = True

## pad_packed_demo.py
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

seqs = ['gigantic_string','tiny_str','medium_str']

# make <pad> idx 0
vocab = ['<pad>'] + sorted(set(''.join(seqs)))

# make model
	""" To use: install LLM studio (or Ollama), clone OpenVoice, run this script in the OpenVoice directory
	git clone https://github.com/myshell-ai/OpenVoice
	cd OpenVoice
	git clone https://huggingface.co/myshell-ai/OpenVoice
	cp -r OpenVoice/* .
	pip install whisper pynput pyaudio
	"""

	from openai import OpenAI
	import time
	import argparse
	import os

	import fasttext
	from sklearn.base import BaseEstimator
	from sklearn.metrics import f1_score
	from sklearn.model_selection import cross_val_score, StratifiedKFold


	def read_data(data_dir):
	''' Script for downloading all GLUE data.

	Note: for legal reasons, we are unable to host MRPC.
	You can either use the version hosted by the SentEval team, which is already tokenized,
	or you can download the original data from (https://download.microsoft.com/download/D/4/6/D46FF87A-F6B9-4252-AA8B-3604ED519838/MSRParaphraseCorpus.msi) and extract the data from it manually.
	For Windows users, you can run the .msi file. For Mac and Linux users, consider an external library such as 'cabextract' (see below for an example).
	You should then rename and place specific files in a folder (see below for an example).

	mkdir MRPC
	cabextract MSRParaphraseCorpus.msi -d MRPC
	using Microsoft.Office.Interop.Word;
	using System.IO;

	namespace MSWordExample
	{
	public class LineNumberingKiller
	{
	static void Main(string[] args)
	{
	Application word = new Application();
	# If errors are found, do this
	# clear contents of C:\Users\<username>\AppData\Local\Temp\gen_py
	# that should fix it, to test it type
	import win32com.client
	app = win32com.client.gencache.EnsureDispatch('Word.Application')
	app.Visible = True
	import torch
	import torch.nn as nn
	from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

	seqs = ['gigantic_string','tiny_str','medium_str']

	# make <pad> idx 0
	vocab = ['<pad>'] + sorted(set(''.join(seqs)))

	# make model