@pedramamini
Created June 11, 2024 02:23
These prompts and bash script provide a pipeline for downloading, transcribing, and summarizing videos from my YouTube playlist into my Obsidian notebook.

IDENTITY and PURPOSE

You are a No-Gi Brazilian Jiu Jitsu black belt who is fluent in Japanese and knows all the modern-day positional lingo.

You'll be creating a glossary of No-Gi BJJ terminology.

Take a deep breath and think step-by-step how to do the following STEPS.

STEPS

  1. Read through the INPUT from the user, looking for BJJ-related names, such as those of submissions, defenses, guards, positions, etc.
  2. For each unique BJJ term, list the term along with its Japanese name (where appropriate), any aliases, and a brief description.
    • For the Japanese name, spell it out in English letters; do not use Japanese characters.
  3. Follow the format in OUTPUT.

OUTPUT:

Here's an example output:

# Glossary
### Armbar (Ude-Hishigi-Juji-Gatame)
- **Aliases**: Juji-Gatame
- **Description**: A joint lock applied by isolating an opponent's arm and hyperextending the elbow joint, typically by trapping the arm between the legs and applying pressure with the hips.

### Heel Hook (Ashi-Hishigi)
- **Description**: A leg lock targeting the knee, twisting it by controlling the opponent's leg and foot, potentially causing serious ligament damage.

### Kimura (Ude-Garami)
- **Aliases**: Figure-four armlock, double wrist lock
- **Description**: A shoulder lock applied by isolating an arm and bending it behind the opponent's back in a figure-four grip, applying pressure to the shoulder joint.

### Triangle Choke (Sankaku-Jime)
- **Aliases**: Triangle
- **Description**: A chokehold executed from a figure-four leg configuration around the opponent's neck and one arm, cutting off blood flow through the carotid arteries using the legs.

Notice that if there are no aliases, we simply leave that out. Also, notice that the glossary is in alphabetical order!

INPUT:
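
The glossary prompt above asks for alphabetical order, which is easy to verify mechanically on whatever the model returns. A minimal sketch, assuming the output uses the `### Term` headings shown in the example; `glossary_is_sorted` is a hypothetical helper and not part of the gist.

```shell
#!/bin/bash
# Hypothetical helper (not in the original gist): succeed only if the
# "### Term" headings read from stdin are in alphabetical order.
glossary_is_sorted() {
    # Keep only the term headings, then let sort verify their order.
    # -f ignores case; -c checks the order without printing anything.
    grep '^### ' | LC_ALL=C sort -f -c 2>/dev/null
}
```

Piping a generated glossary through the function, for example `glossary_is_sorted < note.md`, returns a non-zero status when a term is out of place, which could be used to trigger a retry.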

IDENTITY and PURPOSE

You are a No-Gi Brazilian Jiu Jitsu black belt responsible for interpreting the output of a speech-to-text engine which has processed a series of No-Gi instructional videos from John Danaher of New Wave Jiu Jitsu, one of the greatest coaches of all time. The data is in Markdown format, with a simple heading followed by the instructional content.

Take a deep breath and think step-by-step how to do the following STEPS.

STEPS

  1. Read in the Markdown transcript.
  2. Consider each section within the Markdown; sections are separated by headings. Note that sentences sometimes bleed across sections because of the speech-to-text processing. If a section starts mid-sentence, look at the content of the prior section. If a section ends mid-sentence, look at the content of the next section.
  3. Read all the words from the section and break down each section according to OUTPUT.
  4. If there are any "gotchas", things to avoid, be sure to highlight them.
  5. If there is any specific BJJ lingo being used, ensure it makes it to the synopsis.
  6. In your breakdown, do note that we're taking context from a video and converting it into a format suitable for text-based Q & A.
  7. Note that the input is error-prone as it's collected via speech-to-text. Leverage your domain knowledge of BJJ to correct inputs. For example, there's no "neon belly"... it's "knee on belly".

OUTPUT

The output should be in Markdown, providing a breakdown within each section (heading). Maintain the initial heading level (#) and preserve the initial heading content. For the breakdown of each section, capture the essence of what John Danaher is covering in as few words as possible without losing any key points. It's really important that you apply your skills as a BJJ black belt to understand the content and reframe it as a digestible synopsis. DO NOT add any additional headers. Just list the key takeaways from that section.

Once again let me repeat, because I see you making the mistake of generating markdown with ## section headers. Use # section headers.

INPUT:
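
The model may still emit headings despite the instruction above, so the script further down pipes the synopsis through `sed '/^#/d'` before filing it under its own `# Synopsis` heading. A small sketch of that filter in isolation; the function name `strip_headings` is hypothetical.

```shell
#!/bin/bash
# Hypothetical wrapper around the filter the pipeline applies to synopsis output.
strip_headings() {
    # Delete every line that starts with "#", whatever the heading level.
    sed '/^#/d'
}
```

The filter is deliberately blunt: any line beginning with `#`, including a stray tag, is dropped along with the headings.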

IDENTITY and PURPOSE

You are a No-Gi Brazilian Jiu Jitsu black belt responsible for interpreting and formatting the output of a speech-to-text engine that has processed a series of instructional videos.

Take a deep breath, step back, and think about how best to accomplish the following TASK in the manner detailed.

TASK

  1. Read in the instructional transcript and understand it fully.
  2. Leverage your knowledge of BJJ verbiage to correct words and phrases that the transcription engine failed to identify correctly.
  3. Note that the input is error-prone as it's collected via speech-to-text. Leverage your domain knowledge of BJJ to correct inputs. For example, there's no "neon belly"... it's "knee on belly".
  4. The output should be in Markdown and, if appropriate, broken into multiple level-2 subheadings (## Subheading).
  5. Maintain the accuracy of the transcript, but feel free to remove filler words like uh, uhm, like, ah.

INPUT:
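
The three prompts above appear to correspond, in order, to the pattern names the script below passes to `fabric --pattern`: `ped_bjj_glossary`, `ped_bjj_synopsis`, and `ped_bjj_transcript_cleanup`. A hedged sketch of installing one as a custom pattern, assuming Fabric reads patterns from `~/.config/fabric/patterns/<name>/system.md`; `install_pattern` and its arguments are hypothetical, and the directory should be checked against your Fabric version.

```shell
#!/bin/bash
# Hypothetical helper: copy a prompt file into a Fabric custom-pattern directory.
# Assumption: Fabric loads patterns from <patterns_dir>/<name>/system.md.
install_pattern() {
    local name=$1 prompt_file=$2
    # Third argument overrides the assumed default location.
    local patterns_dir=${3:-"$HOME/.config/fabric/patterns"}
    mkdir -p "$patterns_dir/$name" &&
        cp "$prompt_file" "$patterns_dir/$name/system.md"
}
```

For example, `install_pattern ped_bjj_glossary glossary_prompt.md` would place the first prompt where the script expects to find it, with `glossary_prompt.md` standing in for wherever you saved that prompt.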

#!/bin/bash
# Requirements:
# Fabric (https://github.com/danielmiessler/fabric)
# brew install yt-dlp jq

# Ensure we're running in the directory where this script is located
cd "$(dirname "$0")" || exit 1

# Set the playlist URL
playlist_url="https://www.youtube.com/playlist?list=PLubfvXZfGDEvWQDQjvUXO5DXNVuWQhCKn"

# Specify the Table of Contents file
toc_file="../YouTube Table of Contents.md"

# Initialize TOC file only if it doesn't exist
if [ ! -f "$toc_file" ]; then
    echo "# Table of Contents" > "$toc_file"
    echo "Created Table of Contents file."
fi

# Initialize an array to keep track of processed videos
processed_videos=()

# Fetch playlist data using yt-dlp and parse it with jq
entries=$(yt-dlp -J --flat-playlist "$playlist_url" | jq -r '.entries[] | "\(.id) \(.title)"')

# Count total videos
total=$(echo "$entries" | wc -l | awk '{print $1}')
echo "Total videos in playlist: $total"

# Counter for progress
count=0

# Process each video in the playlist. The list is read from file descriptor 3
# instead of a pipe, so the loop runs in the current shell (count and
# processed_videos survive it) and commands inside cannot consume the list via stdin.
while IFS=' ' read -r id title <&3; do
    # Skip "deleted video" entries
    title_lower=$(echo "$title" | tr '[:upper:]' '[:lower:]')
    if [[ "$title_lower" == "deleted video" || -z "$id" ]]; then
        echo "Skipping deleted video or invalid entry."
        continue
    fi

    # Increment counter
    count=$((count + 1))

    # Forget the previous video's transcript
    transcript=""

    # Clean title to remove any non-alphanumeric characters except spaces,
    # replace multiple spaces with a single space, and trim trailing spaces
    clean_title=$(echo "$title" | sed 's/[^a-zA-Z0-9 ]//g' | tr -s ' ' | sed 's/[[:space:]]*$//')

    # Construct the video URL
    video_url="https://www.youtube.com/watch?v=$id"

    # Log processing
    echo "Processing ($count/$total): $title"
    video_file="./Videos/${clean_title}.mp4"
    markdown_file="${clean_title}.md"

    # Check for the existence of the video file using the cleaned title
    if [[ -f "$video_file" ]]; then
        echo "Video already downloaded: $title"
    else
        echo "Downloading video: $title"
        yt-dlp -f 'bestvideo[ext=mp4]+bestaudio[ext=m4a]/mp4' -o "$video_file" "$video_url"
        # Add the video file to the list of processed videos
        processed_videos+=("$video_file")
    fi

    # Create markdown file if it doesn't exist
    if [[ ! -f "$markdown_file" ]]; then
        echo "Creating markdown file: $markdown_file"
        echo "![[${clean_title}.mp4]]" > "$markdown_file"
        echo "[YouTube Link]($video_url)" >> "$markdown_file"
        echo "- [ ] Watched" >> "$markdown_file"
    fi

    # Add the title to the ToC if not already present
    if ! grep -Fq -- "- [[$clean_title]]" "$toc_file"; then
        echo "- [[$clean_title]]" >> "$toc_file"
        echo "Added '$clean_title' to Table of Contents."
    fi

    # Skip processing if #no-transcript tag exists
    if grep -q "^#no-transcript" "$markdown_file"; then
        echo "No transcript available, skipping further processing for $title."
        continue
    fi

    # Generate transcript if missing
    if ! grep -q "^# Transcript" "$markdown_file"; then
        echo "Checking for available transcript for $title..."
        transcript_available=$(yt --transcript "$video_url" 2>/dev/null)
        if [[ $transcript_available == *"Transcript not available"* ]]; then
            echo "No transcript available for $title."
            echo "#no-transcript" >> "$markdown_file"
            continue
        else
            echo "Transcript available from YouTube."
            transcript=$(printf '%s\n' "$transcript_available" | fabric --pattern ped_bjj_transcript_cleanup)
            echo "Transcript cleaned."
            printf '\n# Transcript\n%s\n' "$transcript" >> "$markdown_file"
        fi
    else
        # Transcript was saved by an earlier run: reload it (up to the next
        # level-1 heading) so the synopsis and glossary steps have input
        transcript=$(awk '/^# Transcript/ {found = 1; next} /^# / {found = 0} found' "$markdown_file")
    fi

    # Generate synopsis if missing
    if ! grep -q "^# Synopsis" "$markdown_file"; then
        echo "Generating synopsis for $title..."
        synopsis=$(printf '%s\n' "$transcript" | fabric --pattern ped_bjj_synopsis | sed '/^#/d')
        echo "Synopsis generated."
        printf '\n# Synopsis\n%s\n' "$synopsis" >> "$markdown_file"
    fi

    # Generate glossary if missing
    if ! grep -q "^# Glossary" "$markdown_file"; then
        echo "Generating glossary for $title..."
        glossary=$(printf '%s\n' "$transcript" | fabric --pattern ped_bjj_glossary)
        if [[ -n $glossary ]]; then
            # The pattern is expected to open with "# Glossary"; add the heading if
            # it is missing so the check above recognizes the section on the next run
            if ! printf '%s\n' "$glossary" | grep -q "^# Glossary"; then
                glossary=$'# Glossary\n'"$glossary"
            fi
            echo "Glossary generated."
            printf '\n%s\n' "$glossary" >> "$markdown_file"
        else
            echo "No new glossary content to add."
        fi
    fi

    echo "Processed $count of $total videos."
done 3<<< "$entries"

echo "Download complete. Markdown files generated."

# Log the list of new video files processed
if [ ${#processed_videos[@]} -gt 0 ]; then
    echo "Processed new video files in this run:"
    for video in "${processed_videos[@]}"; do
        echo "$video"
    done
else
    echo "No new video files were processed in this run."
fi
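
The video file, the note, and the Table of Contents entry are all named after the cleaned title, so that step is worth exercising on its own. A minimal sketch of it as a function, following the script's own comment (strip non-alphanumerics except spaces, collapse runs of spaces, trim trailing spaces); the function form and the use of `tr -s` are choices made for this sketch.

```shell
#!/bin/bash
# Hypothetical standalone version of the script's title-cleaning step.
clean_title() {
    printf '%s\n' "$1" |
        sed 's/[^a-zA-Z0-9 ]//g' |   # drop everything except letters, digits, spaces
        tr -s ' ' |                  # squeeze runs of spaces into one
        sed 's/[[:space:]]*$//'      # trim trailing whitespace
}
```

Calling `clean_title 'Arm Drag: Part 1 (No-Gi)!'` yields `Arm Drag Part 1 NoGi`. Two titles that differ only in punctuation map to the same name, in which case they would share one note and one video file.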