Forked from Parler-Analysis/gist:2c023fd2e053fba5bc85b09209f606eb
Created January 13, 2021 22:32
Parler Data & Tools
Data & Tools:
Many contributors. Thanks to all.
ParlerAnalysis@protonmail.com - Do not expect timely replies.
Channel: #parlerparsers at https://webirc.hackint.org/
#parlerparsers-video for video IDing
FBI Tips: https://tips.fbi.gov/digitalmedia/aad18481a3e8f02
Want to help but don't know how?
Download copies of data and scripts, rehost them elsewhere, and seed torrents.
Help make this file easier for others to understand.
Develop ways to make data easy to visualize.
Come ask in IRC about current efforts.
================================
(1) Metadata json files with EXIF data on all MP4 videos scraped from Parler:
donk.sh/metadata.tar.gz
magnet:?xt=urn:btih:1723e27bc79186c4574ff056ddb458d771c26e2f&dn=metadata.tar.gz&tr=wss%3A%2F%2Ftracker.btorrent.xyz&tr=wss%3A%2F%2Ftracker.openwebtorrent.com&tr=udp%3A%2F%2Ftracker.leechers-paradise.org%3A6969&tr=udp%3A%2F%2Ftracker.coppersurfer.tk%3A6969&tr=udp%3A%2F%2Ftracker.opentrackr.org%3A1337&tr=udp%3A%2F%2Fexplodie.org%3A6969&tr=udp%3A%2F%2
SHA256: 66809d9ae0a5a6577a3c80bb623562274ceccd96b35519f15f568d09cefc56f8 metadata.tar.gz
(2) Script to download WARCs from archive.org once they are processed:
https://github.com/ozywog/parler-data-tools
(3) Magnet URI for a torrent of a file that contains 1.8 million texts scraped from
Parler and is a subset of the full data. Originally hosted on https://parler-archive.deadops.de/
This is parler_2020-01-06_posts-partial:
magnet:?xt=urn:btih:FF29970B902657A32D561C0720E70FACFB8C4284&dn=parler_2020-01-06_posts-partial&tr=udp%3a%2f%2ftracker.opentrackr.org%3a1337%2fannounce&tr=udp%3a%2f%2ftracker.internetwarriors.net%3a1337%2fannounce
(4) Script to generate a list of unique names and usernames, then collect all the
posts and associate them with the person who posted them.
Requires raw HTML source:
https://github.com/billstrobl/Prooter
https://github.com/billstrobl/Prooter/blob/master/prooter.py
(5) Script to scrape videos:
https://github.com/darthnithin/parlervideoscraper
You will need the metadata.tar.gz from (1) to use this.
(6) JSON / CSV / KML Scrapes:
https://gofile.io/d/p8RxUC - CSV, with all non-zero lat/long from donk's JSON
https://gofile.io/d/WVmqhR - quick 'n dirty KML made from the CSV
View KML data on a map - see (9)
https://gofile.io/d/DsUUte - KML of posts made 1/6/2021, DC area only
https://gofile.io/d/EJczW8 - CSV, cleaned version of 1/6 in DC
https://gofile.io/d/PUxeV4 - CSV, cleaned version of all available geotagged data
https://gofile.io/d/zKTsWr - list of videos taken within 100 m of a LE or gov't building, all-time
(7) Script to extract images/videos from WARCs:
https://gist.github.com/redd-dedd/9a200a9ba789f312faf53b25ac63e024
(8) Needs to be sorted.
http://donk.sh/06d639b2-0252-4b1e-883b-f275eff7e792/
https://web.archive.org/web/timemap/?url=https%3A%2F%2Fimage-cdn.parler.com%2F&matchType=prefix&collapse=urlkey&output=json&fl=original%2Cuniqcount&filter=!statuscode%3A%5B45%5D
https://irc.gammaspectra.live/eaa6fa678444b5f4/videos.txt
https://gist.github.com/kylemcdonald/8fdabd6526924012c1f5afe538d7dc09
https://github.com/acanthias13/legendary-octo-guacamole - backup of Clean CSVs
(9) Maps, both interactive and static heatmaps
kylemcdonald.net/parler/map/
https://fortress.maptive.com/ver4/a3486a6ab9a9a12aa9a9cb067839079c/410491
https://darthnithin.github.io/earth/index.html
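Item (1) publishes a SHA256 digest for metadata.tar.gz, so a downloaded or re-hosted copy can be checked before it is used or seeded. The following is a minimal sketch of that check using only the Python standard library. The function names and the local file path are assumptions for illustration and are not part of any of the linked tools.

```python
import hashlib

# Digest published next to metadata.tar.gz in item (1).
EXPECTED_SHA256 = "66809d9ae0a5a6577a3c80bb623562274ceccd96b35519f15f568d09cefc56f8"


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Return the hex SHA256 digest of the file at `path`, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read fixed-size chunks so large archives do not need to fit in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path, expected=EXPECTED_SHA256):
    """True if the file's digest equals the expected hex digest (case-insensitive)."""
    return sha256_of_file(path) == expected.lower()
```

For example, matches_published_digest("metadata.tar.gz") would be expected to return True for an intact copy saved under that (assumed) local name.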
===================================
Videos from the DC area, Jan 6th. Estimated to be only about 10% of what was available at this moment.
https://www.youtube.com/channel/UCZk6IiAVk2QwOdljEAYCPLw
https://mega.nz/file/Pkk2VSRT#x-Gnl1-FddGwHumBXAGsCJ2FL1VHE-Y-u2SFW48KpeQ
Some notable video IDs, list open to public contributions:
https://docs.google.com/spreadsheets/d/1ThPUH5HgTcVKCoyfr2oJ21AWKTGq-dR-cRZjPOER-Q0/edit#gid=0
===================================
HOW TO VIEW WARC/ZSTD from ArchiveTeam's Parler scrape
# How to View Parler Archive "megawarc.warc.zst" files.
These files use the standard zstd and WARC formats.
They are being uploaded to: https://archive.org/details/archiveteam_neparlepas
$ tar -I zstd -xvf archive.tar.zst
=== Old method ===
1. Install Python 3.7
2. Execute: pip install zstandard==0.10.2
3. Download archive from here: https://archive.org/details/archiveteam_neparlepas?tab=collection
4. Copy this script into a new file called xtract.py: https://hastebin.com/bugedubaxi.py
5. Execute: python ./xtract.py /path/to/parler_blahblah.megawarc.warc.zst > dict
6. Execute: zstd -d /path/to/parler_blahblah.megawarc.warc.zst -D dict
7. Import the decompressed parler_blahblah.megawarc.warc file into this tool: https://github.com/webrecorder/webrecorder-desktop
If you cannot install Python 3.7 for some reason, a Dockerfile is available at:
https://gist.github.com/shoghicp/6ce05806ffc805929667ec2d4c62aba2
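
Once a file has been decompressed to a plain .warc, its contents can also be inspected without a GUI. The following is a rough sketch, using only the Python standard library, that walks the records of an uncompressed WARC stream and lists the target URIs of response records. It is not the xtract.py script or the Webrecorder tool linked above. The function names are hypothetical, and it assumes well-formed records with a Content-Length header and no folded header lines.

```python
def iter_warc_records(stream):
    """Yield (headers, payload) for each record in an uncompressed WARC byte stream."""
    while True:
        line = stream.readline()
        if not line:
            return  # end of stream
        if not line.strip():
            continue  # blank separator lines that follow each record
        if not line.startswith(b"WARC/"):
            raise ValueError("expected WARC version line, got %r" % line[:40])
        headers = {}
        while True:
            line = stream.readline()
            if line in (b"\r\n", b"\n", b""):
                break  # blank line ends the header block
            name, _, value = line.decode("utf-8", "replace").partition(":")
            headers[name.strip()] = value.strip()
        length = int(headers.get("Content-Length", 0))
        payload = stream.read(length)
        yield headers, payload


def response_uris(stream):
    """Return the WARC-Target-URI of every 'response' record in the stream."""
    return [
        headers.get("WARC-Target-URI")
        for headers, _ in iter_warc_records(stream)
        if headers.get("WARC-Type") == "response"
    ]
```

A file would be opened in binary mode and passed in, for example: with open("/path/to/parler_blahblah.megawarc.warc", "rb") as fh: print(response_uris(fh)).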