Skip to content

Instantly share code, notes, and snippets.

View jacobod's full-sized avatar
🐕‍🦺

Jacob Dodd jacobod

🐕‍🦺
View GitHub Profile
@veekaybee
veekaybee / normcore-llm.md
Last active July 25, 2024 19:14
Normcore LLM Reads

Anti-hype LLM reading list

Goals: Add links that are reasonable and good explanations of how stuff works. No hype and no vendor content if possible. Practical first-hand accounts of models in prod eagerly sought.

Foundational Concepts

Screenshot 2023-12-18 at 10 40 27 PM

Pre-Transformer Models

@rain-1
rain-1 / LLM.md
Last active July 25, 2024 18:44
LLM Introduction: Learn Language Models

Purpose

Bootstrap knowledge of LLMs ASAP. With a bias/focus to GPT.

Avoid being a link dump. Try to provide only valuable well tuned information.

Prelude

Neural network links before starting with transformers.

@elliottmorris
elliottmorris / ca_polls - Sheet1.csv
Last active September 18, 2021 03:03
California 2021 recall polling model
end_date pollster n population pid_weighted keep remove
2021-09-13 survey monkey 3985 LV 1 0.55 0.41
2021-09-13 trafalgar 1082 LV 0 0.53 0.45
2021-09-11 emerson 1000 LV 1 0.6 0.4
2021-09-10 data for progress 2464 LV 1 0.57 0.43
2021-09-08 surveyusa 930 LV 0 0.54 0.41
2021-09-07 suffolk 500 LV 0 0.58 0.41
2021-09-06 uc berkeley 6550 LV 0 0.6 0.39
2021-09-04 trafalgar 1079 LV 0 0.53 0.43
2021-09-01 yougov 1955 LV 1 0.56 0.44
@wojteklu
wojteklu / clean_code.md
Last active July 25, 2024 11:12
Summary of 'Clean code' by Robert C. Martin

Code is clean if it can be understood easily – by everyone on the team. Clean code can be read and enhanced by a developer other than its original author. With understandability comes readability, changeability, extensibility and maintainability.


General rules

  1. Follow standard conventions.
  2. Keep it simple stupid. Simpler is always better. Reduce complexity as much as possible.
  3. Boy scout rule. Leave the campground cleaner than you found it.
  4. Always find root cause. Always look for the root cause of a problem.

Design rules

Introduction

Hello guys! I am going to walk you through my implementation of Sentiwordnet 3.0 on movie reviws to find the overall sentiment of ech review. I have mentioned the datasets and more about Sentiwordnet below. I will be using python 2.7 for coding. Also a few of its libraries like pandas, sklearn and nltk. NLTK has inbuilt modules for Sentiwordnet and Pos Tagger which will also be used in our code. So let's get started !

SentiWordnet

SentiWordNet is a lexical resource for opinion mining. SentiWordNet assigns to each synset of WordNet three sentiment scores: positivity, negativity, objectivity. Each of the three scores ranges in the interval [0.0 ; 1.0], and their sum is 1.0 for each synset.

Sentiwordnet was designed by ranking subjectivity of all terms or synsets according to the part of speech the term belongs to. The parts of speech represented by the sentiwordnet are adjective, noun, adverb and verb which are represented respectively as 'a', 'n', 'r', 'v'. the database has five col

@bsweger
bsweger / useful_pandas_snippets.md
Last active April 19, 2024 18:04
Useful Pandas Snippets

Useful Pandas Snippets

A personal diary of DataFrame munging over the years.

Data Types and Conversion

Convert Series datatype to numeric (will error if column has non-numeric values)
(h/t @makmanalp)