@ilap
Last active June 26, 2024 13:29

Intro

  1. Export your subscriptions and watch history with Google Takeout
  2. Convert watch-history.html to history.json using the JS script below
  3. Upload the result to Invidious

2. Convert

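# Part 1: set up and run the converter
mkdir yt && cd yt
vi package.json   # paste the package.json below
vi index.js       # paste the JS script below
npm i
node index.js     # reads watch-history.html from this directory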

# Part 2: extract the video IDs from history.json and reverse their order
grep "https://www.youtube.com/watch?v=" history.json |\
sed 's@^.*https://www.youtube.com/watch.*=@"@' |\
sed '1!G;h;$!d' > history_to_upload.json
# The last sed reverses the line order of the history.

JS script to convert (index.js)

const fs = require('fs');
const cheerio = require('cheerio');

// Read the HTML file
const html = fs.readFileSync('watch-history.html', 'utf8');

// Load the HTML into cheerio
const $ = cheerio.load(html);

// Array to hold the extracted history entries
const historyEntries = [];

// Select all relevant content cells
$('.content-cell.mdl-cell--6-col.mdl-typography--body-1').each((i, elem) => {
    const anchorTags = $(elem).find('a');
    const dateText = $(elem).text().split('\n').pop().trim(); // Extract date from the text

    // Extract URL and title from the first <a> tag
    const url = $(anchorTags[0]).attr('href');
    const title = $(anchorTags[0]).text().trim();

    // Extract channel name from the second <a> tag
    const channel = $(anchorTags[1]).text().trim();

    // Create a history entry object
    const historyEntry = {
        url: url,
        title: title,
        channel: channel,
        date: dateText
    };

    // Add the entry to the array, skipping entries with an empty title
    // and avoiding #shorts
    if (title && !title.includes('#shorts')) {
        historyEntries.push(historyEntry);
    }
});

// Convert the array to JSON
const jsonOutput = JSON.stringify(historyEntries, null, 2);

// Save the JSON to a file
fs.writeFileSync('history.json', jsonOutput);

console.log('History extracted and saved to history.json');
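
The keep/skip rule used in the script can be isolated as a small predicate, which makes it easier to test or extend. This is a minimal sketch under that assumption; `shouldKeep` is a hypothetical name and not part of the original script.

```javascript
// Hypothetical helper: the same rule as in index.js, as a pure function.
// Keeps an entry only if it has a non-empty title without "#shorts".
function shouldKeep(entry) {
    const title = entry.title;
    return typeof title === 'string' && title !== '' && !title.includes('#shorts');
}
```

With such a helper, the entries could be collected first and filtered afterwards with `historyEntries.filter(shouldKeep)`.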

package.json

{
  "name": "youtube-history-extractor",
  "version": "1.0.0",
  "description": "A simple Node.js script to extract YouTube history entries from an HTML file and convert them to JSON",
  "main": "index.js",
  "scripts": {
    "start": "node index.js"
  },
  "dependencies": {
    "cheerio": "^1.0.0-rc.10"
  },
  "author": "Your Name",
  "license": "MIT"
}