tkunstek

## readme.txt
You can use whisper.cpp (https://github.com/ggerganov/whisper.cpp?tab=readme-ov-file) to transcribe any audio recording and feed the text to an LLM using RAG, completely free and private.

Start by getting a meeting recording. I use Just Press Record on my Mac to record meetings.

whisper.cpp expects 16 kHz, mono, 16-bit PCM WAV input, so the recording needs to be converted first. I used the command:
ffmpeg -i /Users/tkunstek/Library/Mobile\ Documents/iCloud\~com\~openplanetsoftware\~just-press-record/Documents/2024-05-28/18-12-07.m4a -ar 16000 -ac 1 -c:a pcm_s16le output.wav
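
Before running the transcription, it can help to confirm that the converted file really has the expected format. The following is a minimal sketch using only Python's standard `wave` module; the function name `check_whisper_wav` is hypothetical and not part of whisper.cpp or ffmpeg.

```python
import wave


def check_whisper_wav(path):
    """Return a list of format problems; an empty list means the file is
    16 kHz, mono, 16-bit PCM (the format produced by the ffmpeg command)."""
    problems = []
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        sample_width = wav.getsampwidth()  # bytes per sample
    if rate != 16000:
        problems.append(f"sample rate is {rate} Hz, expected 16000")
    if channels != 1:
        problems.append(f"{channels} channels, expected 1 (mono)")
    if sample_width != 2:
        problems.append(f"{sample_width * 8}-bit samples, expected 16-bit")
    return problems
```

Calling `check_whisper_wav("output.wav")` after the conversion should return an empty list; anything else points at a flag that was dropped from the ffmpeg command.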

Now use whisper.cpp to extract the text:
./main -otxt -m models/ggml-base.en.bin -f output.wav > transcript.txt
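
To feed the transcript to an LLM using RAG, the text is usually split into smaller, overlapping chunks before it is indexed. The sketch below shows one simple way to do that; the function name `chunk_words` and the default sizes are assumptions for illustration, not something this project prescribes.

```python
def chunk_words(text, chunk_size=200, overlap=40):
    """Split text into chunks of at most chunk_size words, where each chunk
    repeats the last `overlap` words of the previous one."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once the current chunk has reached the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks
```

The overlap keeps a sentence that straddles a chunk boundary retrievable from at least one chunk. The chunks would be built from the contents of transcript.txt and then passed to whatever embedding and retrieval setup you use.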

## agents.yaml
detective:
  role: >
    Cold case homicide detective
  goal: >
    Review cold case files to find answers to questions about the suspect {suspect} as it relates to the victim {victim}.
    Sometimes the information is not in the cold case files, so simply report unknown and move on.
    Always cite your notes so the other detectives can see where you found your information.
  backstory: >
    You're a seasoned detective with a knack for identifying facts in homicide investigations.
    You are known for your ability to find the most relevant

## Readme.txt
I created the environment in docker-compose.yaml. The sandbox was only accessible via WireGuard, using team members' individual keys. Since everything had to exist in the sandbox, you will see that I included a container to run Firefox. Using that container, I downloaded all of the case files.

The case files included PDFs that were image scans of computer printouts and handwritten notes. There was no native text in any of the files. I attempted using OCR software (Tika, Tesseract, etc.) with poor results. I settled on using AWS Textract in a private account with a private VPC.

Before sending the data to Textract, some cleanup was needed. First, I had to fix the file names; for this I used the Linux detox command. Next, each multi-page PDF had to be split into separate single-page files. See split.sh for a wrapper script I wrote to automate the job.
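
The two cleanup steps can be illustrated with a rough sketch. It only approximates the idea behind detox (replacing awkward characters in file names) and shows one possible naming scheme for the per-page files. It is not the logic of detox or of split.sh, and the function names `safe_name` and `page_names` are hypothetical.

```python
import re
from pathlib import Path


def safe_name(filename):
    """Replace runs of characters outside A-Z, a-z, 0-9, '.', '_' and '-'
    with a single underscore and lower-case the extension."""
    path = Path(filename)
    stem = re.sub(r"[^A-Za-z0-9._-]+", "_", path.stem).strip("_")
    return (stem or "file") + path.suffix.lower()


def page_names(filename, page_count):
    """Return one output file name per page of a multi-page PDF."""
    path = Path(safe_name(filename))
    return [
        f"{path.stem}_page{number:03d}{path.suffix}"
        for number in range(1, page_count + 1)
    ]
```

For example, `safe_name("Case File (1987) #3.PDF")` gives `Case_File_1987_3.pdf`. The zero-padded page number keeps the split pages in order when they are listed or uploaded.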

The resulting individual pages were then uploaded with the AWS CLI to a secure S3 bucket. I configured a retention policy on the bucket to delete all files

## diagnostic.txt
reading config from /root/teslausb_setup_variables.conf
====== summary ======
hardware: Raspberry Pi Zero W Rev 1.1
OS: Raspbian GNU/Linux 10 (buster)
headless setup config in /root
archive method: rsync
lun0 connected, from file /backingfiles/music_disk.bin
lun1 connected, from file /backingfiles/cam_disk.bin
1 snapshots mounted

## SwiftCodeCram
// Playground - noun: a place where people can play

import UIKit

var str = "Hello, playground"


let five = 5
let six = 6
var eleven = five + six