@44213
Forked from 0foo/Data Hoarding General
Created September 18, 2021 13:31
Data Hoarding General /dhg/ (sauce - https://github.com/simon987/awesome-datahoarding)
### Web Archiving
* Collect - https://github.com/xarantolus/Collect: A server to collect & archive websites that also supports video downloads
* grab-site - https://github.com/ludios/grab-site: The archivist's web crawler: WARC output, dashboard for all crawls, dynamic ignore patterns
* Heritrix - https://github.com/internetarchive/heritrix3: Extensible, web-scale, archival-quality web crawler
* HTTrack - https://www.httrack.com/: Download a website from the Internet to a local directory
* wail - https://github.com/machawk1/wail: Web Archiving Integration Layer: One-Click User Instigated Preservation
* wikiteam - https://github.com/WikiTeam/wikiteam: set of tools for archiving wikis
### General
* annie - https://github.com/iawia002/annie: Youtube-DL alternative written in Golang
* aria2 - https://aria2.github.io/: A lightweight multi-protocol & multi-source command-line download utility
* CrowLeer - https://github.com/ERap320/CrowLeer: Powerful C++ web crawler based on libcurl
* curl - https://github.com/curl/curl: Tool and library for transferring data with URL syntax, supporting many protocols
* Plowshare - https://github.com/mcrapet/plowshare: Command-line tool to manage file-sharing sites
* Rclone - https://github.com/ncw/rclone: A command line program to sync files and directories to and from various cloud storage providers
* wget - https://savannah.gnu.org/git/?group=wget: Utility for non-interactive download of files from the Web (HTTP & FTP)
* you-get - https://github.com/soimort/you-get: Dumb downloader that scrapes the web
* Youtube-DL - https://github.com/rg3/youtube-dl: A command-line program to download videos from YouTube and a few hundred more sites
### Application-specific
* ChanThreadWatch - https://github.com/SuperGouge/ChanThreadWatch: Saves threads from \*chan-style boards and checks for updates until the thread dies
* floatplane_ripper - https://gist.github.com/simon987/0756c378ca2dfb0003931e26ff7fe270: Script to rip all videos from https://floatplane.rip/
* gallery-dl - https://github.com/mikf/gallery-dl: Download image galleries and collections from pixiv, exhentai, danbooru and more
* dzi-dl - https://github.com/ryanfb/dzi-dl: Deep Zoom Image Downloader
* FanFicFare - https://github.com/JimmXinu/FanFicFare: Tool for making eBooks from stories on fanfiction and other web sites
* FicSave - https://github.com/waylaidwanderer/FicSave: Online fanfiction downloader
* Google Images Download - https://github.com/hardikvasa/google-images-download: Python script for downloading images
* iiif-dl - https://github.com/ryanfb/iiif-dl: Command-line tile downloader/assembler for IIIF endpoints/manifests
* Instagram Scraper - https://github.com/dankmemes/instagram-scraper: Instagram-scraper is a command-line application written in Python that scrapes and downloads an Instagram user's photos and videos. Use responsibly.
* PyInstaLive - https://github.com/notcammy/PyInstaLive: Instagram live stream downloader.
* RedditDownloader - https://github.com/shadowmoose/RedditDownloader: Scrapes Reddit to download media of your choice
* Scribd-Downloader - https://github.com/ritiek/scribd-downloader: Allows downloading of Scribd documents
* RipMe - https://github.com/RipMeApp/ripme: RipMe is an album ripper for various websites. Runs on your computer. Requires Java 8.
* yt-mango - https://github.com/terorie/yt-mango: Youtube metadata archiver
* Youtube-MA - https://github.com/CorentinB/YouTube-MA: Youtube metadata archiver
### Download automation
* bazarr - https://github.com/morpheus65535/bazarr: Companion application to Sonarr and Radarr for downloading subtitles
* FlexGet - https://github.com/Flexget/Flexget: Multipurpose automation tool for content like torrents, nzbs, podcasts, comics, series, movies, etc
* Jackett - https://github.com/Jackett/Jackett: API support for torrent trackers (works with Sonarr, Radarr and others)
* Lidarr - https://github.com/lidarr/Lidarr: Music collection manager for Usenet and BitTorrent users
* Mylar - https://github.com/evilhero/mylar: An automated Comic Book downloader (cbr/cbz) for use with SABnzbd, NZBGet and torrents
* Sick-Beard - https://github.com/midgetspy/Sick-Beard: PVR for newsgroup users (with limited torrent support)
* Radarr - https://github.com/Radarr/Radarr: A fork of Sonarr to work with movies à la Couchpotato
* Sonarr - https://github.com/Sonarr/Sonarr: PVR for Usenet and BitTorrent users
## Handling Data Rot and Corruption
* md5deep - http://md5deep.sourceforge.net/: md5deep is a set of programs to compute MD5, SHA-1, SHA-256, Tiger, or Whirlpool message digests on an arbitrary number of files.
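Digests like these catch data rot when they are recorded once and re-checked later. A minimal Python sketch of that idea follows; it uses only the standard library rather than md5deep itself, and the function names (`sha256_of`, `build_manifest`, `changed_files`) are made up for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files never sit fully in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def changed_files(root: Path, manifest: dict[str, str]) -> list[str]:
    """Return manifest entries that are now missing or hash differently."""
    current = build_manifest(root)
    return sorted(
        name for name, digest in manifest.items() if current.get(name) != digest
    )
```

Store the manifest somewhere other than the disk being checked (for example as JSON), and treat any name returned by `changed_files` as a candidate for restoring from backup. Files added after the manifest was built are not reported.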
## Compression
* KGB Archiver - https://github.com/RandallFlagg/kgbarchiver: compression tool with unbelievably high compression rate
* peazip - http://www.peazip.org/: File archiver utility
* PIGZ - https://zlib.net/pigz/: Multi-threaded gzip
* WinRAR - https://www.rarlab.com/download.htm: Can decompress RAR and zip files.
## Network
* NetLimiter - https://www.netlimiter.com/: Internet traffic control and monitoring tool for Windows
## File systems
* httpdirfs - https://github.com/fangfufu/httpdirfs/: A filesystem which allows you to mount HTTP directory listings
* mergerfs - https://github.com/trapexit/mergerfs: a featureful union filesystem
* NTFS drivers for MacOS - https://www.seagate.com/ca/en/support/downloads/item/ntfs-driver-for-mac-os-master-dl/
## File conversion
* AAXtoMP3 - https://github.com/KrumpetPirate/AAXtoMP3: convert AAX files to common MP3, M4A, M4B, flac and ogg formats through a basic bash script frontend to FFMPEG
* html2warc - https://github.com/steffenfritz/html2warc: Convert web resources to a single warc file
## Utility Scripts
* Backblaze B2 sync backup script - https://gist.github.com/AlexanderProd/cb645cf858fd5c89780e7df267226b80: Script to sync multiple directories with Backblaze B2
* Misc download scripts - https://github.com/simon987/Misc-Download-Scripts: Scripts for downloading content from various websites
* rclone_dirsize - https://gist.github.com/simon987/7aff5ca3e9ae6c755055ca7b350ef9f8: Get size of http directory listing with rclone
* rm_empty_subdir - https://gist.github.com/simon987/f5c2cd7602898615ac9bc8c762d9fe1d: Remove empty sub-directories on Windows
* void-cat-uploader - https://github.com/takky1154/void-cat-uploader: This script automatically uploads all files inside a directory to https://void.cat.
* youtube-dl_soundcloud - https://gist.github.com/simon987/2dd7c57d65a741c93f5791bc984b97d1: snippet for using youtube-dl to download soundcloud playlists
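Several of these scripts are small enough to sketch. As an illustration of the empty-sub-directory cleanup, here is a cross-platform Python version; it is not the linked rm_empty_subdir gist, and the function name `remove_empty_subdirs` is an assumption made for this example.

```python
import os
from pathlib import Path


def remove_empty_subdirs(root: Path) -> list[Path]:
    """Delete empty directories below root, deepest first; return what was removed."""
    removed = []
    # topdown=False yields children before parents, so a directory that only
    # contained empty directories is itself empty by the time we reach it.
    for dirpath, _dirnames, _filenames in os.walk(root, topdown=False):
        current = Path(dirpath)
        if current == root:
            continue  # never delete the root itself
        if not any(current.iterdir()):
            current.rmdir()
            removed.append(current)
    return removed
```

The emptiness check re-lists each directory with `iterdir()` instead of trusting the names `os.walk` collected earlier, since those go stale once child directories have been removed.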
## Content sharing
* h5ai - https://github.com/lrsjng/h5ai: HTTP web server index for Apache httpd, lighttpd, nginx and Cherokee
* ipfs - https://ipfs.io/: Protocol and network designed to create a content-addressable, peer-to-peer method of storing and sharing hypermedia in a distributed file system
* opds - https://opds.io/: Easy to use, Open & Decentralized Content Distribution
## Data curation
* baobab - https://github.com/GNOME/baobab: Graphical disk usage analyzer
* beets - https://github.com/beetbox/beets: Music library manager and MusicBrainz tagger
* Calibre - https://github.com/kovidgoyal/calibre: Ebook manager
* DeepSort - https://github.com/CorentinB/DeepSort/: AI powered image tagger backed by DeepDetect
* diskover - https://github.com/shirosaidev/diskover: File system crawler, disk space usage, file search engine and file system analytics powered by Elasticsearch
* Everything - https://www.voidtools.com/: Locate files and folders by name instantly (Windows)
* FileBot - https://www.filebot.net/: FileBot is the ultimate tool for organizing and renaming your Movies, TV Shows and Anime
* fucking-weeb - https://github.com/cosarara/fucking-weeb: A library manager for animu (and TV shows, and whatever).
* grepWin - https://github.com/stefankueng/grepWin: A powerful and fast search tool using regular expressions (Windows)
* jdupes - https://github.com/jbruchon/jdupes: Powerful duplicate file finder
* MediaElch - https://github.com/komet/mediaelch: Media manager for Kodi
* MediaInfo - https://github.com/MediaArea/MediaInfo: Convenient unified display of the most relevant technical and tag data for video and audio files
* Mp3tag - https://www.mp3tag.de: Powerful and easy-to-use tool to edit metadata of audio files (Windows/Mac)
* phockup - https://github.com/ivandokov/phockup: Media sorting tool to organize photos and videos from your camera
* picard - https://github.com/metabrainz/picard: MusicBrainz tagger
* TeraCopy - https://www.codesector.com/downloads: Copy your files faster and more securely
* tree - http://mama.indstate.edu/users/ice/tree/: 'tree' command for Linux
* WinDirStat - https://windirstat.net/: Disk usage statistics viewer and cleanup tool for Windows
* SyncToy - https://www.microsoft.com/en-us/download/details.aspx?id=15155: Microsoft Windows tool for keeping files in sync across locations
* DupeGuru - https://dupeguru.voltaicideas.net/: finds duplicate files
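Duplicate finders generally avoid hashing everything by comparing cheap attributes first. The Python sketch below groups files by size and hashes only the candidates that share a size. It is a simplified illustration, not the algorithm jdupes or DupeGuru actually use, and `find_duplicates` is a hypothetical name.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def find_duplicates(root: Path) -> list[list[Path]]:
    """Return groups of files under root that have identical contents."""
    # Pass 1: files with a unique size cannot have a duplicate.
    by_size = defaultdict(list)
    for path in root.rglob("*"):
        if path.is_file():
            by_size[path.stat().st_size].append(path)

    groups = []
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue
        # Pass 2: hash only the remaining candidates.
        # read_bytes() loads the whole file; hash in chunks for very large files.
        by_hash = defaultdict(list)
        for path in same_size:
            by_hash[hashlib.sha256(path.read_bytes()).hexdigest()].append(path)
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return sorted(groups)
```

The function only reports groups. Deciding which copy to keep is left to the caller.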
## File Utilities
* __Batch Renamer__: GPRename - http://gprename.sourceforge.net/ -> qmv (renameutils) - http://www.nongnu.org/renameutils/
* __File Archiver__: PeaZip -> Xarchiver -> Atool - http://www.nongnu.org/atool/
* __File Search__: DocFetcher - http://docfetcher.sourceforge.net/en/index.html -> ANGRYsearch - https://github.com/DoTheEvo/ANGRYsearch -> Puggle - http://puggle.sourceforge.net/ -> regain - http://regain.sourceforge.net/index.php -> find
* __File Synchronization__: Unison - https://github.com/bcpierce00/unison -> git-annex - https://git-annex.branchable.com/ -> Rsync
* __Image Organizer__: hydrus network -> Shotwell -> GTKRawGallery -> digiKam -> gThumb (+ gphoto) -> Mapivi - http://mapivi.sourceforge.net/mapivi.shtml -> BASH-Booru - https://github.com/ChristianSilvermoon/BASH-Booru
* __RegEx Builder__: regexxer - https://directory.fsf.org/wiki/Regexxer -> Visual REGEXP - http://laurent.riesterer.free.fr/regexp/ -> txt2regex - https://github.com/aureliojargas/txt2regex
## Filesharing
* __Direct Connect__: LinuxDC++ -> ncdc - https://github.com/srijan/ncdc -> microdc2 - http://corsair626.no-ip.org/microdc/
* __Download Manager__: giFT - https://sourceforge.net/projects/gift/ + giFTcurs - http://www.nongnu.org/giftcurs/ -> aria2 - https://aria2.github.io/ -> cURL -> Wget
* __File Scraper__: megatools -> JDownloader - https://github.com/Bobmk/JDownloader -> Plowshare - https://github.com/mcrapet/plowshare
* __FTP Client__: FileZilla -> lftp - https://github.com/lavv17/lftp
* __LAN Sharing__: NitroShare -> Dukto
* __Media Center__: Plex -> Emby -> Popcorn Time -> Kodi ("XBMC", + Sonarr)
* __Media Miner__: FlexGet -> Sonarr - https://github.com/Sonarr/Sonarr
* __Offline Reader__: Kiwix - http://www.kiwix.org/ -> Darcy Ripper -> HTTrack -> Wget
* __Soulseek__: Nicotine Plus -> Museek (mucous) - https://museek-plus.org/
* __Stream Catcher__: Streamripper -> youtube-dl -> cclive - https://github.com/legatvs/cclive -> youtube-pl - http://ronja.twibright.com/youtube-pl.php -> quvi - https://github.com/mogaal/quvi, RTMPDump - https://github.com/mstorsjo/rtmpdump
* __Torrent Client__: qBittorrent -> RTorrent -> transmission-daemon (comes with a web interface - https://github.com/transmission/transmission/wiki/Web-Interface by default, but other frontends - https://github.com/fagga/transmission-remote-cli exist)
* __Torrent Tracker Scraper__: Torrtux - https://github.com/l333k0/torrtux -> Torrench - https://github.com/kryptxy/torrench -> Jackett - https://github.com/Jackett/Jackett
* __Usenet (File Grabber)__: LottaNZB -> SABnzbd -> NZBGet - https://github.com/nzbget/nzbget -> nzb - https://directory.fsf.org/wiki/Nzb -> nzbperl - https://github.com/eghm/nzbperl
## Command Line Tools
* __Command Line Cheatsheet__: CLI Companion - https://launchpad.net/clicompanion -> xman -> cheat / howdoi / clf / fu / bro -> cheat.sh - https://github.com/chubin/cheat.sh
* __Directory Browsing__: fasd - https://github.com/clvv/fasd, xd - https://github.com/fbb-git/xd, fzy - https://github.com/jhawthorn/fzy
* __Framebuffer Environment__: Fbterm - https://code.google.com/archive/p/fbterm/ -> yaft (because sixel) - https://github.com/uobikiemukot/yaft -> hterm (because regis) - https://github.com/new299/hackterm
* __Hacker Culture__: ddate, fortune, The Hacker Test, The Jargon File
* __Multiplexer__: Tmux -> Byobu -> GNU Screen (+ sixel patch - https://gist.github.com/saitoha/7546579)
* __Progress Viewers__: progress - https://github.com/Xfennec/progress -> pv - Pipe Viewer - https://github.com/icetee/pv -> Advanced Copy - https://github.com/atdt/advcpmv
## Disk Tools
* __CD-DVD Burn and Copy (Backends)__: cdrtools -> cdrkit -> cdrskin - https://dev.lovelyhq.com/libburnia/web/wikis/cdrskin
* __CD-DVD Burn and Copy (Frontends)__: K3b -> Brasero -> cdw - http://cdw.sourceforge.net/
* __CD-DVD Ripping__: Sound Juicer -> fre:ac -> cdparanoia - https://www.xiph.org/paranoia/ (+ ABCDE - http://lly.org/~rcw/abcde/page/)
* __Custom Install CD__: Respin -> Remastersys -> Distroshare -> PinguyBuilder -> Customizer -> Ubuntu Customization Kit -> Mklivecd
* __Device Management__: Udisks (+ udevil) -> pmount -> bashmount - https://github.com/jamielinux/bashmount/blob/master/INSTALL
* __Disk Cloning and Writing__: dd -> dcfldd -> dc3dd - https://sourceforge.net/projects/dc3dd/
* __Live USB__: UNetbootin -> MultiCD - https://multicd.us/
* __Partitioning__: Gparted -> cfdisk -> GNU Parted -> fdisk / sfdisk
* __System Backup__: Systemback - https://sourceforge.net/projects/systemback/ -> Bacula - https://blog.bacula.org/ -> FSArchiver - http://www.fsarchiver.org/ -> CYA - https://www.cyberws.com/bash/cya/
## APIs & Online tools
* iqdb - https://iqdb.org/: Multi-service reverse image search
* thetvdb - https://www.thetvdb.com/: TV show metadata (used by Plex)
## Hardware / Monitoring
* CrystalDiskInfo - https://crystalmark.info/en/software/crystaldiskinfo/: A HDD/SSD utility software which supports a part of USB, Intel RAID and NVMe.
* Hard Drive Sentinel - https://www.hdsentinel.com/: Multi-OS SSD and HDD monitoring and analysis software
* smartmontools - https://www.smartmontools.org/: Control and monitor storage systems using the (SMART) built into most modern ATA/SATA, SCSI/SAS and NVMe disks
## Data recovery
* PhotoRec - https://www.cgsecurity.org/wiki/PhotoRec: Powerful FOSS file recovery tool; text-based by default, with a GUI version (QPhotoRec) available.
* TestDisk - https://www.cgsecurity.org/wiki/TestDisk_Download: Another FOSS tool by the author of PhotoRec; this one recovers lost partitions and runs from the command line.