Ryan Wesslen wesslen

## ai_web_search.ex
# You will need to install https://github.com/cpursley/html2markdown

defmodule Webpage do
  @moduledoc false
  defstruct [:url, :title, :description, :summary, :page_age]
end

defmodule WebSearch do
  @moduledoc """
  Web search summarization chain

## normcore-llm.md

veekaybee / normcore-llm.md · Last active July 21, 2024 13:28

Normcore LLM Reads

Anti-hype LLM reading list

Goals: Add links that are reasonable and good explanations of how stuff works. No hype and no vendor content if possible. Practical first-hand accounts of models in prod eagerly sought.
Foundational Concepts


Pre-Transformer Models


## suspension-reversals.md

travisbrown / suspension-reversals.md · Last active March 25, 2023 18:32

Updated version with additional data: https://github.com/travisbrown/unsuspensions

Elon Musk's suspension reversals

The tables below show notable Twitter suspension reversals for each day since Elon Musk took over as owner and CEO.
All dates indicate when the suspension or reversal was detected, and the actual suspension or reversal may have been earlier.
For most English-language accounts with large followings, this lag will generally not be longer than a few hours,
but for accounts that have a small number of followers or that are outside the networks we are tracking, the difference can be larger,
and in some cases an account on the list may have had its suspension reversed before 27 October 2022.
These dates will get more precise as we refine the report.
Because of these limitations, this report should be considered a starting point for investigation, not a definitive list of suspension reversals.

  
## bionic.py
import pyphen

import prodigy
from prodigy.components.loaders import JSONL
from prodigy.components.db import connect

hyphenator = pyphen.Pyphen(lang="en_US")

def construct_html(text):
    hyphend = hyphenator.inserted(text)
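The preview stops right after `construct_html` hyphenates the text, so the rest of the bionic-reading logic is not shown. As a minimal sketch, assuming the goal is to bold the first syllable of each word, the hypothetical `bionic_html` below takes a `first_break` callable that returns the index where a word's first syllable ends. Passing the splitter in keeps the sketch runnable without pyphen installed.

```python
import html
from typing import Callable


def bionic_html(text: str, first_break: Callable[[str], int]) -> str:
    """Wrap the first syllable of every word in <b> tags (hypothetical helper).

    `first_break(word)` should return the index where the first syllable ends;
    returning len(word) bolds the whole word.
    """
    pieces = []
    for word in text.split():
        # Clamp the break so at least one character, and at most the whole word, is bolded.
        cut = max(1, min(first_break(word), len(word)))
        head, tail = word[:cut], word[cut:]
        # Escape both halves so characters like "<" or "&" stay literal text.
        pieces.append(f"<b>{html.escape(head)}</b>{html.escape(tail)}")
    return " ".join(pieces)


if __name__ == "__main__":
    # Toy splitter for demonstration only: treat the first three letters as a syllable.
    print(bionic_html("Spam spam lovely spam!", lambda w: 3))
```

With pyphen, one option is `dic = pyphen.Pyphen(lang="en_US")` and `first_break=lambda w: (dic.positions(w) or [len(w)])[0]`, which bolds the whole word when pyphen finds no hyphenation point. Note that `text.split()` collapses the original whitespace.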

## dataset.jsonl
{"text":"Spam spam lovely spam!"}
{"text":"I like scrambled eggs."}
{"text":"I prefer spam!"}
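Each line of `dataset.jsonl` is a standalone JSON object with a `text` key, the kind of input Prodigy's `JSONL` loader reads. For inspecting the file without Prodigy, here is a stdlib-only sketch; `read_jsonl` is a hypothetical helper that skips blank lines and reports the line number of malformed records.

```python
import io
import json
from typing import Dict, Iterator, TextIO


def read_jsonl(stream: TextIO) -> Iterator[Dict]:
    """Yield one dict per non-blank line of a JSON Lines stream."""
    for line_no, line in enumerate(stream, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate trailing newlines and blank lines
        try:
            yield json.loads(line)
        except json.JSONDecodeError as err:
            raise ValueError(f"line {line_no} is not valid JSON: {err}") from err


SAMPLE = """{"text":"Spam spam lovely spam!"}
{"text":"I like scrambled eggs."}
{"text":"I prefer spam!"}
"""

if __name__ == "__main__":
    for task in read_jsonl(io.StringIO(SAMPLE)):
        print(task["text"])
```

To read the file from disk instead, open it with `open("dataset.jsonl", encoding="utf-8")` and pass the file object to `read_jsonl`.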

## tokenizations_post.md

tamuhey / tokenizations_post.md · Last active June 26, 2024 01:00

How to calculate the alignment between BERT and spaCy tokens effectively and robustly


Natural Language Processing (NLP) has made great progress in recent years thanks to neural networks, which allow us to solve various tasks with end-to-end architectures. However, many NLP systems still require language-specific pre- and post-processing, especially for tokenization. In this article, I describe an algorithm that simplifies one such process: calculating the correspondence between tokens produced by different tokenizers (e.g. BERT vs. spaCy). I also introduce Python and Rust libraries that implement this algorithm.
Here are the library and the demo site links:

repo: https://github.com/tamuhey/tokenizations
site: https://tamuhey.github.io/tokenizations/
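The core idea can be sketched for the simplest case, where both token lists normalize to exactly the same character sequence: tag every normalized character with the index of the token it came from, then align two tokens whenever they share a character. This is a minimal sketch, not the library's implementation or API; the normalization (dropping WordPiece `##` markers, NFKD decomposition, lowercasing, removing accents) and the helper names are assumptions, and the general case where the normalized texts differ needs a character-level diff instead of the equality check used here.

```python
import unicodedata
from typing import List, Tuple


def normalize(token: str) -> str:
    """Assumed normalization: strip WordPiece '##', NFKD-decompose, lowercase, drop accents."""
    if token.startswith("##"):
        token = token[2:]
    decomposed = unicodedata.normalize("NFKD", token).lower()
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def char_owners(tokens: List[str]) -> List[int]:
    """For every normalized character, record the index of the token it came from."""
    owners: List[int] = []
    for index, token in enumerate(tokens):
        owners.extend([index] * len(normalize(token)))
    return owners


def _collect(src: List[int], dst: List[int], n_src: int) -> List[List[int]]:
    """Map each source token to the target tokens that share at least one character."""
    mapping: List[List[int]] = [[] for _ in range(n_src)]
    for i, j in zip(src, dst):
        # Target owners never decrease along the string, so checking the last
        # entry is enough to skip duplicates and keep each list sorted.
        if not mapping[i] or mapping[i][-1] != j:
            mapping[i].append(j)
    return mapping


def align_tokens(a: List[str], b: List[str]) -> Tuple[List[List[int]], List[List[int]]]:
    """Return (a2b, b2a) alignments for two tokenizations of the same text."""
    text_a = "".join(normalize(t) for t in a)
    text_b = "".join(normalize(t) for t in b)
    if text_a != text_b:
        raise ValueError("normalized texts differ; a character-level diff is needed")
    owners_a, owners_b = char_owners(a), char_owners(b)
    return _collect(owners_a, owners_b, len(a)), _collect(owners_b, owners_a, len(b))


if __name__ == "__main__":
    bert = ["john", "johan", "##son", "'", "s", "house"]
    spacy_tokens = ["John", "Johanson", "'s", "house"]
    a2b, b2a = align_tokens(bert, spacy_tokens)
    print(a2b)  # [[0], [1], [1], [2], [2], [3]]
    print(b2a)  # [[0], [1, 2], [3, 4], [5]]
```

Special tokens such as `[CLS]` and `[SEP]` have no counterpart in the spaCy text, so in this sketch they would have to be dropped before aligning.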


## statistical_rethinking_emcee.ipynb

wrgoldstein / statistical_rethinking_emcee.ipynb · Last active January 23, 2022 22:09

A cheat sheet explaining how to perform simple Bayesian modeling in Python.
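The notebook itself does not render in this listing. The following minimal sketch shows the same basic workflow: write a log-posterior, sample it with MCMC, and summarize the draws. It assumes the globe-tossing data from *Statistical Rethinking* (6 water in 9 tosses, flat prior), which may not be the notebook's example. It also swaps emcee's affine-invariant ensemble sampler for a plain random-walk Metropolis sampler so it runs with the standard library alone. With a flat prior the exact posterior is Beta(7, 4), whose mean 7/11 ≈ 0.636 gives a check on the draws.

```python
import math
import random
from typing import Callable, List


def log_posterior(p: float, water: int = 6, tosses: int = 9) -> float:
    """Unnormalized log-posterior for the proportion of water p under a flat prior."""
    if not 0.0 < p < 1.0:
        return -math.inf  # outside the Uniform(0, 1) prior's support
    # Binomial log-likelihood, dropping the constant log C(tosses, water).
    return water * math.log(p) + (tosses - water) * math.log(1.0 - p)


def metropolis(log_prob: Callable[[float], float], start: float, n_samples: int,
               step: float = 0.1, burn_in: int = 1000, seed: int = 0) -> List[float]:
    """Random-walk Metropolis sampler for a one-dimensional target."""
    rng = random.Random(seed)
    current, current_lp = start, log_prob(start)
    samples: List[float] = []
    for i in range(burn_in + n_samples):
        proposal = current + rng.gauss(0.0, step)
        proposal_lp = log_prob(proposal)
        # Accept with probability min(1, exp(proposal_lp - current_lp));
        # 1 - random() lies in (0, 1], so the log is always defined.
        if math.log(1.0 - rng.random()) < proposal_lp - current_lp:
            current, current_lp = proposal, proposal_lp
        if i >= burn_in:
            samples.append(current)
    return samples


if __name__ == "__main__":
    draws = metropolis(log_posterior, start=0.5, n_samples=20_000)
    mean = sum(draws) / len(draws)
    print(f"posterior mean ~ {mean:.3f} (exact Beta(7, 4) mean: {7 / 11:.3f})")
```

emcee's `EnsembleSampler` expects a log-probability that takes a parameter vector, so with emcee `p` would become `theta[0]` and the sampler would run several walkers in parallel instead of one chain.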
              
          
## streamlit_prodigy.py
"""
Example of a Streamlit app for an interactive Prodigy dataset viewer that also lets you
run simple training experiments for NER and text classification.

Requires the Prodigy annotation tool to be installed: https://prodi.gy
See here for details on Streamlit: https://streamlit.io.
"""
import streamlit as st
from prodigy.components.db import connect
from prodigy.models.ner import EntityRecognizer, merge_spans, guess_batch_size

## Install
pip install streamlit
pip install spacy
python -m spacy download en_core_web_sm
python -m spacy download en_core_web_md
python -m spacy download de_core_news_sm
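A dataset viewer like this mostly needs per-dataset summaries. The sketch below counts answers and span labels in a list of annotated examples. It assumes Prodigy's usual task format, where `answer` is `accept`, `reject` or `ignore` and `spans` holds dicts with a `label`. `summarize_examples` is a hypothetical helper. Fetching the examples from the database (via `connect()` from `prodigy.components.db`) is left out because the exact call depends on the Prodigy version.

```python
from collections import Counter
from typing import Dict, Iterable


def summarize_examples(examples: Iterable[Dict]) -> Dict[str, Counter]:
    """Count answers and accepted span labels across annotated examples (hypothetical helper)."""
    answers: Counter = Counter()
    labels: Counter = Counter()
    for eg in examples:
        # Examples without an explicit answer are counted as "unanswered".
        answers[eg.get("answer", "unanswered")] += 1
        # Only accepted examples contribute span labels.
        if eg.get("answer") == "accept":
            for span in eg.get("spans", []):
                labels[span["label"]] += 1
    return {"answers": answers, "labels": labels}


if __name__ == "__main__":
    demo = [
        {"text": "I like spam", "answer": "accept",
         "spans": [{"start": 7, "end": 11, "label": "FOOD"}]},
        {"text": "I prefer spam!", "answer": "reject"},
    ]
    print(summarize_examples(demo))
```

In the Streamlit app, each counter could be passed to `st.write` to display it.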

## prodigy_srs.py
"""See https://twitter.com/honnibal/status/1120020992636661767 """
import time
import srsly
from prodigy import recipe
from prodigy.components.db import connect
from prodigy.util import INPUT_HASH_ATTR, set_hashes
from prodigy.components.filters import filter_duplicates


def get_rank_priority(data):
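Only the signature of `get_rank_priority` survives in this preview. If SRS stands for spaced repetition (an assumption, since the body is not shown), a priority function could rank items by how overdue they are for review. The sketch below is a hypothetical Leitner-style rule, not the gist's logic. Each consecutive `accept` at the end of an item's history doubles the review interval, and the priority is how far past its due time the item is. The `history` record shape (`answer`, `timestamp`) and `base_interval` are invented for illustration.

```python
import math
import time
from typing import Dict, List, Optional


def rank_priority(history: List[Dict], now: Optional[float] = None,
                  base_interval: float = 60.0) -> float:
    """Return how overdue an item is, in seconds; higher means review sooner.

    `history` is a hypothetical list of {"answer": str, "timestamp": float}
    dicts ordered from oldest to newest.
    """
    if now is None:
        now = time.time()
    if not history:
        return math.inf  # never reviewed: show it first
    # Count the trailing run of correct ("accept") answers.
    streak = 0
    for entry in reversed(history):
        if entry["answer"] != "accept":
            break
        streak += 1
    # Each correct answer in a row doubles the wait before the next review.
    due = history[-1]["timestamp"] + base_interval * (2 ** streak)
    return now - due
```

Sorting items by this value in descending order puts never-seen and most overdue items first. A negative value means the item is not due yet.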