
@shawngraham
shawngraham / first_example_instructions_prompt
Last active April 1, 2024 17:19
knowledge graph extractor for openchat3.5 in jan.ai
You are an excellent assistant with deep knowledge of research in the field of illegal and illicit antiquities. Generate an ontology for the text that follows, using the following guidelines: 1 – Denote subjects and objects using relative hash-based hyperlinks, i.e., negating the use of example.com. 2 – Output the response to a code block. 3 – RETURN data as well-formatted HIGH QUALITY RDF-TURTLE triples
THOUGHT: I should identify all of the people involved first.
THOUGHT: I should then identify any organizations involved.
THOUGHT: I should then identify any artefacts.
THOUGHT: I should then connect people, organizations, and objects by determining the predicates that join them.
THOUGHT: It is important that I respect subject-predicate-object order of meaning.
THOUGHT: There should be no empty nodes.
THOUGHT: I should ignore in-text citations using parentheses.
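As an illustration of what the prompt above asks for (my own sketch, not output from the gist), the model might return Turtle like this, with hash-based relative IRIs instead of example.com; all names here are hypothetical:

```turtle
@prefix ex: <#> .

<#JohnDoe> a ex:Person ;
    ex:worksFor <#AcmeGallery> ;
    ex:sold <#BronzeStatuette> .

<#AcmeGallery> a ex:Organization .
<#BronzeStatuette> a ex:Artefact .
```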
shawngraham / OpenAlex.py
Last active February 3, 2024 00:47
adding openalex search to GPT-Researcher https://github.com/assafelovic/gpt-researcher/; also point the main config file to OpenAlexSearch
# OpenAlex Search Retriever
import requests
import os
class OpenAlexSearch():
"""
OpenAlex Search Retriever
"""
def __init__(self, query):
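The preview above is truncated. A minimal sketch of how such a retriever might be completed follows; the retriever interface (a `search` method returning dicts with `"href"` and `"body"`) is my assumption about what gpt-researcher expects, not the gist's actual code. The `search` and `per-page` query parameters are documented by the OpenAlex API.

```python
class OpenAlexSearch:
    """Minimal sketch of an OpenAlex works retriever."""

    BASE_URL = "https://api.openalex.org/works"

    def __init__(self, query):
        self.query = query

    def build_url(self, per_page=10):
        # full-text search over works via OpenAlex's `search` parameter
        return f"{self.BASE_URL}?search={self.query}&per-page={per_page}"

    def search(self, max_results=10):
        import requests  # imported lazily so the class can be defined offline
        response = requests.get(self.build_url(per_page=max_results), timeout=30)
        response.raise_for_status()
        works = response.json().get("results", [])
        # shape of the returned dicts is an assumption about gpt-researcher
        return [{"href": w.get("doi") or w.get("id"), "body": w.get("title", "")}
                for w in works]
```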
shawngraham / opencontext.py
Created January 29, 2024 21:03
trying to build a retriever for gpt-researcher, to explore OpenContext
import requests
class OpenContextSearch():
"""
Open Context Search Retriever
"""
def __init__(self, query, content_type='subjects'):
"""
Initializes the OpenContextSearch object
Args:
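The docstring above is cut off mid-argument list. A hedged sketch of how the class might continue: the JSON endpoint pattern below is a guess at Open Context's URL scheme, not a verified API path, and the `content_type` values are assumptions.

```python
class OpenContextSearch:
    """Sketch of an Open Context retriever; endpoint path is an assumption."""

    def __init__(self, query, content_type="subjects"):
        """
        Args:
            query: the search string
            content_type: record type to search, e.g. "subjects", "media",
                or "projects" (hypothetical values)
        """
        self.query = query
        self.content_type = content_type

    def build_url(self):
        # hypothetical pattern: append .json to a search path for JSON results
        return f"https://opencontext.org/{self.content_type}-search/.json?q={self.query}"
```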
shawngraham / barcelona.rb
Created January 16, 2024 17:59
1384: Letters from Barcelona. An experiment sonifying the metadata in an archive. Listen at https://soundcloud.com/fhg1711/1384-letters-from-barcelona-sonification-version
# 1384: Letters from Barcelona - sonification version
# Data: travel time from Barcelona to Pisa and Barcelona to Avignon for letters
# (piano: to Pisa; string pluck: to Avignon). Duration is a function of the
# travel time, scaled so that 4 beats = 2 weeks. Notes are scored to correspond
# with the start date; pitch is a function of the duration it took for the
# letter to reach its destination, so silences are meaningful. Longer journeys
# have higher and longer tones. Percussion plays the beat so that we can hear
# the progression of time; a background drone indicates the passage of months.
# this code was rubber-ducked using GPT4-preview
require 'date'
ba_data = [9,9,8,7,9,8,8,8,12,9,8,9,10,8,8,8,8,11] #barcelona - avignon
ba_data_trip_start_date = ["1384-05-02","1384-05-09","1384-05-12","1384-05-20","1384-05-23","1384-05-26","1384-06-03","1384-06-15","1384-06-23","1384-07-11","1384-08-02","1384-08-06","1384-08-12","1384-09-01","1384-09-06","1384-09-06","1384-09-12","1384-09-19"]
bp_data = [26,22,34,23,23,21,
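The scalings described in the comments above can be re-derived as small functions (a sketch in Python rather than the gist's Sonic Pi Ruby; the pitch mapping in particular is hypothetical, standing in for "longer journeys have higher and longer tones"):

```python
def days_to_beats(days, beats_per_fortnight=4):
    # the gist scales duration so that 4 beats represent 2 weeks (14 days)
    return days * beats_per_fortnight / 14

def days_to_pitch(days, base_note=60):
    # hypothetical linear mapping: longer journeys sound higher
    return base_note + days
```

So an 8-day letter to Avignon would last about 2.3 beats and sound 8 semitones above the base note, under these assumed mappings.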
shawngraham / autocrop.py
Created January 9, 2024 21:30
I've got a bunch of old screenshots of social media posts. I wanted to crop out the text & comments and end up with a folder of just the original images in the post. This was an attempt at that. It kinda works.
import os
import numpy as np
from PIL import Image
from skimage.color import rgb2gray
from skimage import filters, morphology
import argparse
def find_nonzero_extents(mask):
rows = np.any(mask, axis=1)
first_nonzero_row = next((i for i, row in enumerate(rows) if row), None)
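The helper above is truncated by the preview. A sketch of a complete version follows; the return shape (a row/column bounding box for the crop) is my assumption about how the helper would be used, not the gist's actual code.

```python
import numpy as np

def find_nonzero_extents(mask):
    """Return (first_row, last_row, first_col, last_col) of the True region,
    or None if the mask is empty."""
    rows = np.any(mask, axis=1)
    cols = np.any(mask, axis=0)
    row_idx = np.where(rows)[0]
    col_idx = np.where(cols)[0]
    if row_idx.size == 0 or col_idx.size == 0:
        return None  # nothing to crop
    return row_idx[0], row_idx[-1], col_idx[0], col_idx[-1]
```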
shawngraham / preprocess.sh
Created November 8, 2023 16:24
use ttok to count the tokens in each file in a folder and join files together, to reduce the number of calls to gpt4-turbo. The idea is that ttok tells us how many tokens are in each file, which we compare against an upper limit. If we're beneath the limit, we join files together. If we're within 20% of the limit, we stop, write to file, then start …
#!/bin/bash
input_folder="articles" # Replace with your input directory path
output_folder="processed_articles" # Replace with your output directory path
# Create the output directory if it does not exist
mkdir -p "$output_folder"
# Set the target token count and the acceptable range (20% of 100,000);
# gpt-4-turbo has a 128k context window
target_token_count=100000
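The batching logic the description outlines can be sketched in Python (a re-derivation, not the gist's shell code; the word count here is a stand-in for ttok's real token count):

```python
def batch_texts(texts, target=100_000, margin=0.2,
                count_tokens=lambda t: len(t.split())):
    """Greedily join texts, flushing a batch once it comes within
    `margin` (20%) of the target token count."""
    batches, current, current_tokens = [], [], 0
    threshold = target * (1 - margin)  # stop within 20% of the limit
    for text in texts:
        tokens = count_tokens(text)
        if current and current_tokens + tokens > threshold:
            batches.append("\n".join(current))
            current, current_tokens = [], 0
        current.append(text)
        current_tokens += tokens
    if current:
        batches.append("\n".join(current))
    return batches
```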
shawngraham / mardownify.py
Last active December 15, 2023 17:50
split a csv file into markdown files, for use with obsidian
import pandas as pd
import os
import argparse
def create_markdown_files(csv_filepath, output_dir):
# Read the CSV file into a pandas DataFrame
df = pd.read_csv(csv_filepath)
# Check if the output directory exists, if not, create it
if not os.path.exists(output_dir):
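The preview cuts off before the rows are written out. A pandas-free sketch of the same idea follows (the filename pattern and YAML frontmatter layout are my assumptions, not the gist's conventions):

```python
import csv
import os

def create_markdown_files(csv_filepath, output_dir):
    """Write one markdown note per CSV row, for use with Obsidian."""
    os.makedirs(output_dir, exist_ok=True)
    with open(csv_filepath, newline="", encoding="utf-8") as f:
        for i, row in enumerate(csv.DictReader(f)):
            # YAML frontmatter makes the fields queryable inside Obsidian
            frontmatter = "\n".join(f"{k}: {v}" for k, v in row.items())
            path = os.path.join(output_dir, f"note_{i}.md")
            with open(path, "w", encoding="utf-8") as out:
                out.write(f"---\n{frontmatter}\n---\n")
```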
import argparse
import csv
# Initialize the argument parser
parser = argparse.ArgumentParser(description='Match entities from two CSV files and save the results to a new file.')
parser.add_argument('first_csv', help='The CSV file with "source" and "target" columns.')
parser.add_argument('second_csv', help='The CSV file to compare against.')
parser.add_argument('--output_csv', default='matched_rows.csv', help='The output CSV file with matched rows. Defaults to "matched_rows.csv".')
# Parse the command-line arguments
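The matching step itself is not shown in the preview. A sketch of one plausible reading of "match entities" follows: keep rows of the second file that contain any entity from the first file's "source" or "target" columns. What counts as a match is my assumption.

```python
import csv

def match_rows(first_csv, second_csv):
    """Return rows of second_csv whose values include any source/target
    entity from first_csv."""
    with open(first_csv, newline="", encoding="utf-8") as f:
        entities = {v for row in csv.DictReader(f)
                    for v in (row["source"], row["target"])}
    with open(second_csv, newline="", encoding="utf-8") as f:
        return [row for row in csv.DictReader(f)
                if entities & set(row.values())]
```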
shawngraham / news-to-knowledge-graph.py
Last active December 8, 2023 01:43
a little tkinter app that gets headlines from newsapi and, if you want, will go to the url, read the article, and format the output as knowledge graph triples. But don't forget to export results.
import tkinter as tk
from tkinter import filedialog
from tkinter import ttk, filedialog
import pandas as pd
from newsapi import NewsApiClient
import llm
import requests
from strip_tags import strip_tags
model = llm.get_model("orca-mini-3b-gguf2-q4_0")  # local model via the llm plugin llm-gpt4all
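The description stresses exporting the results. A hedged sketch of that export step (a hypothetical helper, not the app's actual code), writing (subject, predicate, object) triples to CSV:

```python
import csv

def export_triples(triples, path):
    """Write (subject, predicate, object) rows to a CSV file."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["subject", "predicate", "object"])
        writer.writerows(triples)
```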
shawngraham / gist:c4101bc2246ca9e01558323fab0900a1
Created November 20, 2023 16:28
Generated through discussion with a GPT trained on several volumes of material related to the logic of Harris matrices, this file represents the GPT's best attempt at representing the conventions of a Harris matrix via Mermaid.js. There is no text-based specification explicitly for Harris matrices, to my knowledge, so this is a kind of band-aid…
# Stratigraphic Representation with Mermaid.js: A Guide to Conventions
This document outlines a set of conventions for representing complex stratigraphic sequences using Mermaid.js diagram syntax. The aim is to provide a standardized approach to visualize archaeological stratigraphy in a clear and comprehensible manner.
## 1. Superposition
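A minimal sketch of how superposition might be encoded (my assumption about the convention, since the section is cut off here): later deposits point down to the units they overlie, so the diagram reads top-to-bottom as latest-to-earliest, as in a Harris matrix.

```mermaid
graph TD
    %% context_001 overlies (is later than) context_002
    context_001 --> context_002
    context_002 --> context_003
```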