brossi/gist:3abb692edf25aaabaef9648dbbd693fd

## gistfile1.txt
Data & Tools:
Many contributors. Thanks to all.

Contact:
    ParlerAnalysis@protonmail.com

IRC Channels:
    #parlerparsers at https://webirc.hackint.org/
    #parlerparsers-video for video IDing

    Please register your nick and at least take a vhost before joining to mask your IP. Using a VPN or Tor is recommended.
        /msg NickServ register <yourpassword> <email@foo.bar>
        /msg hostserv take hackint/user/$account

FBI Tips:
    https://tips.fbi.gov/digitalmedia/aad18481a3e8f02

Many dev efforts are being consolidated in:
    https://github.com/ozywog/parler-data-tools

Open spreadsheet for listing notable video IDs:
    https://docs.google.com/spreadsheets/d/1ThPUH5HgTcVKCoyfr2oJ21AWKTGq-dR-cRZjPOER-Q0/edit#gid=0

Listing of videos:
    tommycarsten.com/terrorism/index.html - Most videos posted from Capitol Hill on Jan 6th
    https://www.youtube.com/channel/UCZk6IiAVk2QwOdljEAYCPLw - More of the same, also avail on mega - link in Resources

Various Maps, most are focused on Jan 6th:
    https://thepatr10t.github.io/yall-Qaeda
    https://kylemcdonald.net/parler/map/
    https://fortress.maptive.com/ver4/a3486a6ab9a9a12aa9a9cb067839079c/410491
    https://darthnithin.github.io/earth/index.html
    https://parlervid.herokuapp.com/
!!!     Append a VideoID to the URL to download a video, e.g. https://parlervid.herokuapp.com/VIDEOID
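
A minimal sketch of the URL pattern above, for building download links from a list of IDs. The function name is made up for illustration, and the only thing assumed about the site is the pattern already stated: base URL plus VideoID.

```python
from urllib.parse import quote

# Base URL taken from the listing above.
BASE_URL = "https://parlervid.herokuapp.com/"


def video_url(video_id: str) -> str:
    """Return the download URL for one VideoID (hypothetical helper)."""
    video_id = video_id.strip()
    if not video_id:
        raise ValueError("empty VideoID")
    # Percent-encode anything unusual so the ID stays a single path segment.
    return BASE_URL + quote(video_id, safe="")


def video_urls(video_ids):
    """Build URLs for many IDs, skipping blank lines."""
    return [video_url(v) for v in video_ids if v.strip()]
```

Feed it the IDs from the spreadsheet or any of the CSVs in Resources, one ID per entry.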

Want to help but don't know how?
    Download copies of data and scripts, rehost them elsewhere, and seed torrents.
    Help make this file easier for others to understand.
        Like-minded list with nice formatting - https://github.com/rljacobson/CapitolResources/
    Develop ways to make data easy to visualize and sort with current tools
    Come ask in IRC about current efforts.


Tools
================================

- Dataset preprocessors
    zip of HTML posts -> json, tar.gz of vid metadata -> json
    https://gitlab.com/-/snippets/2060956


- Scripts to scrape videos:
    https://github.com/darthnithin/parlervideoscraper (py)
    https://github.com/acanthias13/reimagined-dollop  (R)
        You will need to generate CSVs from donk's metadata.tar.gz to use these,
        or use a CSV from another source, such as one listed in this doc.
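
A rough sketch of turning metadata.tar.gz into a CSV, in case it helps. It assumes the archive holds one JSON file per video, each either a single object or a list of objects. The field names in FIELDS are guesses, so open one real file and adjust them first. This is not the method the scrapers above use, just one way to get a CSV.

```python
import csv
import json
import tarfile

# ASSUMPTION: hypothetical field names; check a real JSON file and edit.
FIELDS = ["SourceFile", "CreateDate", "GPSLatitude", "GPSLongitude"]


def iter_records(tar_path):
    """Yield one dict per JSON object found inside the .tar.gz."""
    with tarfile.open(tar_path, "r:gz") as tar:
        for member in tar:
            if not (member.isfile() and member.name.endswith(".json")):
                continue
            try:
                data = json.load(tar.extractfile(member))
            except ValueError:
                continue  # skip files that are not valid JSON
            # Accept either a single object or a list of objects.
            for record in data if isinstance(data, list) else [data]:
                if isinstance(record, dict):
                    yield record


def write_csv(tar_path, csv_path, fields=FIELDS):
    """Write the chosen fields to csv_path; return the number of rows."""
    count = 0
    with open(csv_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.DictWriter(out, fieldnames=fields)
        writer.writeheader()
        for record in iter_records(tar_path):
            # Missing fields become empty cells.
            writer.writerow({k: record.get(k, "") for k in fields})
            count += 1
    return count

# Usage: write_csv("metadata.tar.gz", "metadata.csv")
```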


- Script to generate a list of unique names and usernames then collect all the
    posts and associate them with the person who posted them
    Requires raw html source:
    https://github.com/billstrobl/Prooter
    https://github.com/billstrobl/Prooter/blob/master/prooter.py


- Script to extract images/videos from WARCs:
    https://gist.github.com/redd-dedd/9a200a9ba789f312faf53b25ac63e024


Resources
=================================

- Magnet URI for a torrent containing 1.8 million texts scraped from
    Parler, a subset of the full data. Originally hosted on https://parler-archive.deadops.de/
    This is the parler_2020-01-06_posts-partial torrent that was spread early on.
    magnet:?xt=urn:btih:FF29970B902657A32D561C0720E70FACFB8C4284&dn=parler_2020-01-06_posts-partial&tr=udp%3a%2f%2ftracker.opentrackr.org%3a1337%2fannounce&tr=udp%3a%2f%2ftracker.internetwarriors.net%3a1337%2fannounce
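
If you want to check what a magnet link points at before loading it into a client (info hash, name, trackers), a small sketch using only the Python standard library follows. The function name is made up for illustration.

```python
from urllib.parse import parse_qs, urlsplit


def parse_magnet(uri):
    """Split a BitTorrent magnet URI into info hash, name and trackers."""
    parts = urlsplit(uri.strip())
    if parts.scheme != "magnet":
        raise ValueError("not a magnet URI")
    params = parse_qs(parts.query)  # also percent-decodes tracker URLs
    xt = params.get("xt", [""])[0]
    prefix = "urn:btih:"
    if not xt.startswith(prefix):
        raise ValueError("no BitTorrent info hash (xt=urn:btih:...) found")
    return {
        "info_hash": xt[len(prefix):].lower(),
        "name": params.get("dn", [None])[0],
        "trackers": params.get("tr", []),
    }
```

Comparing the lowercase info hash is an easy way to tell whether two magnet links from different mirrors are the same torrent.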


- Metadata json files with EXIF data on all MP4 videos scraped from Parler:
    donk.sh/metadata.tar.gz
    magnet:?xt=urn:btih:1723e27bc79186c4574ff056ddb458d771c26e2f&dn=metadata.tar.gz&tr=wss%3A%2F%2Ftracker.btorrent.xyz&tr=wss%3A%2F%2Ftracker.openwebtorrent.com&tr=udp%3A%2F%2Ftracker.leechers-paradise.org%3A6969&tr=udp%3A%2F%2Ftracker.coppersurfer.tk%3A6969&tr=udp%3A%2F%2Ftracker.opentrackr.org%3A1337&tr=udp%3A%2F%2Fexplodie.org%3A6969&tr=udp%3A%2F%2
    SHA256: 66809d9ae0a5a6577a3c80bb623562274ceccd96b35519f15f568d09cefc56f8 metadata.tar.gz
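
To check a downloaded copy against the SHA256 above, something like this should do. It is a sketch using Python's hashlib, and the local filename is an assumption.

```python
import hashlib

# Checksum copied from the listing above.
EXPECTED_SHA256 = "66809d9ae0a5a6577a3c80bb623562274ceccd96b35519f15f568d09cefc56f8"


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large archives need not fit in RAM."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected=EXPECTED_SHA256):
    """True if the file's SHA256 equals the expected hex digest."""
    return sha256_of(path) == expected.strip().lower()

# Usage: matches("metadata.tar.gz")
```

If it returns False, re-download or recheck the torrent before mirroring the file.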


! Torrent of ~all videos from Parler's CDN - Said to contain more than the archive.org pull
    Ongoing split torrent - recheck your chunks often
    README FIRST: https://gist.github.com/shoghicp/714f590f3a175635b7a377905bd21ea4
    https://pl.gammaspectra.live/


- Usernames and posts, separated from dataset
    https://drive.google.com/file/d/1Lo4I2du5rGSKqPcrC_hnEvrLTufDqTpy/view?usp=sharing


- Pictures / Images
    https://irc.gammaspectra.live/339648d275d2712b/imagelist.zip
        List of all image filepaths from Internet Archive collection
    https://par.pw/v1/photo?id=IMAGEID
        Webtool to download images. You'll need other tools to get IDs


- Massive listing of Jan 6th media from across multiple socials
  All are the same dump as far as we're aware.
    https://capitol-hill-riots.s3.us-east-1.wasabisys.com/directory.html
       Looks to be the same as the mega dump below, but easier to grab from.
    https://mega.nz/folder/30MlkQib#RDOaGzmtFEHkxSYBaJSzVA
    lilprincess.tk/storage/capitol_riots


- Videos From DC Area, Jan 6th. Estimated to only be about 10% of what was available, at this moment
    https://www.youtube.com/channel/UCZk6IiAVk2QwOdljEAYCPLw
    https://mega.nz/file/Pkk2VSRT#x-Gnl1-FddGwHumBXAGsCJ2FL1VHE-Y-u2SFW48KpeQ


- 948 files from around DC area, Jan 5-Jan 10, 2021
    magnet:?xt=urn:btih:387b8615beec9b506b4f448af0002cd3d651dd00&dn=geocoded&tr=udp%3A%2F%2Ftracker.opentrackr.org%3A1337%2Fannounce&tr=udp%3A%2F%2Fexplodie.org%3A6969%2Fannounce


- JSON / CSV / KML Scrapes:
        Mirror/reup if able.
    https://gofile.io/d/p8RxUC - CSV, with all non-zero lat/long from donk's JSON
    https://gofile.io/d/WVmqhR - KML, quick 'n dirty KML made from the CSV
    https://gofile.io/d/DsUUte - KML, posts made 1/6/2021, DC Area Only
    https://gofile.io/d/EJczW8 - CSV, Cleaned ver of 1/6 in DC
    https://gofile.io/d/PUxeV4 - CSV, Cleaned ver of all available geotagged data
    https://gofile.io/d/zKTsWr - list of videos taken within 100 m of a LE or gov't building, all-time
    https://gofile.io/d/7TGoWj - JSON, All posts from Jan 6
        Also magnet:?xt=urn:btih:03b3250bcf3fc335d74605709f8e081929d2bda7&dn=parler_posts_json.zip&tr=http%3a%2f%2f128.199.70.66%3a5944%2fannounce&tr=udp%3a%2f%2f194.106.216.222%3a80%2fannounce
    https://github.com/acanthias13/legendary-octo-guacamole - backup of Clean CSVs
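
For anyone rebuilding lists like "non-zero lat/long" or "within 100 m of a building" from the CSVs, here is a sketch of the distance filter using the haversine formula. How the files above were actually produced is not known to us. The row layout (video_id, lat, lon) and the Capitol coordinates in the usage comment are assumptions.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot.
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))


def within_radius(rows, center, radius_m=100.0):
    """Keep (video_id, lat, lon) rows within radius_m of center.

    Rows at exactly (0, 0) are treated as missing GPS data and dropped.
    """
    lat0, lon0 = center
    return [
        row for row in rows
        if not (row[1] == 0 and row[2] == 0)
        and haversine_m(row[1], row[2], lat0, lon0) <= radius_m
    ]

# Usage (approximate US Capitol coordinates, an assumption):
# within_radius(rows, center=(38.8899, -77.0091), radius_m=100)
```

Phone GPS can easily be off by tens of metres, so treat a 100 m cutoff as a rough screen rather than proof of location.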


===========
-  Needs to be sorted.
    http://donk.sh/06d639b2-0252-4b1e-883b-f275eff7e792/
    https://web.archive.org/web/timemap/?url=https%3A%2F%2Fimage-cdn.parler.com%2F&matchType=prefix&collapse=urlkey&output=json&fl=original%2Cuniqcount&filter=!statuscode%3A%5B45%5D
    https://irc.gammaspectra.live/eaa6fa678444b5f4/videos.txt
    https://gist.github.com/kylemcdonald/8fdabd6526924012c1f5afe538d7dc09


===================================

HOW TO VIEW WARC/ZSTD from ArchiveTeam's Parler scrape
# How to View Parler Archive "megawarc.warc.zst" files.
These use the official zstd and WARC standards.
They are being uploaded to: https://archive.org/details/archiveteam_neparlepas

$ tar -I zstd -xvf archive.tar.zst

===Old.
1. Install Python 3.7
2. Execute: pip install zstandard==0.10.2
3. Download archive from here: https://archive.org/details/archiveteam_neparlepas?tab=collection
4. Copy this script into a new file called xtract.py: https://hastebin.com/bugedubaxi.py
5. Execute: python ./xtract.py /path/to/parler_blahblah.megawarc.warc.zst > dict
6. Execute: zstd -d /path/to/parler_blahblah.megawarc.warc.zst -D dict
7. Import the decompressed parler_blahblah.megawarc.warc file into this tool: https://github.com/webrecorder/webrecorder-desktop
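
For context on steps 5 and 6: as far as we understand the .warc.zst layout, each file starts with a zstd "skippable frame" (magic 0x184D2A5D) that carries the custom dictionary, and that is what gets written to dict. The sketch below is NOT the xtract.py linked in step 4, just a stdlib-only illustration of reading that frame under this assumption. The function name is made up.

```python
import struct

DICT_FRAME_MAGIC = b"\x5d\x2a\x4d\x18"  # 0x184D2A5D, little-endian
ZSTD_FRAME_MAGIC = b"\x28\xb5\x2f\xfd"  # payload is a zstd-compressed dict
ZSTD_DICT_MAGIC = b"\x37\xa4\x30\xec"   # payload is a raw zstd dict


def read_embedded_dict(fp):
    """Read the leading skippable frame; return (kind, payload_bytes).

    kind is "compressed" or "raw". A compressed payload still has to be
    decompressed with zstd before it can be passed to `zstd -D`.
    """
    header = fp.read(8)
    if len(header) != 8 or header[:4] != DICT_FRAME_MAGIC:
        raise ValueError("no dictionary frame at start of file")
    (size,) = struct.unpack("<I", header[4:])  # 4-byte little-endian length
    payload = fp.read(size)
    if len(payload) != size:
        raise ValueError("truncated dictionary frame")
    if payload.startswith(ZSTD_FRAME_MAGIC):
        return "compressed", payload
    if payload.startswith(ZSTD_DICT_MAGIC):
        return "raw", payload
    raise ValueError("unrecognised dictionary payload")

# Usage:
# with open("/path/to/parler_blahblah.megawarc.warc.zst", "rb") as fp:
#     kind, data = read_embedded_dict(fp)
```

After this frame the rest of the file is ordinary zstd data, which is why step 6 can decompress it once the dictionary is supplied.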

If you cannot install Python 3.7 for some reason, or just want a container, a Dockerfile is available at:
    https://gist.github.com/shoghicp/6ce05806ffc805929667ec2d4c62aba2