Parler Data & Tools
Data & Tools:
Many contributors. Thanks to all.
Contact:
ParlerAnalysis@protonmail.com
IRC Channels:
#parlerparsers at https://webirc.hackint.org/
#parlerparsers-video for video IDing
Please register your nick and at least take a vhost before joining to mask your IP. Using a VPN or Tor is recommended.
/msg NickServ register <yourpassword> <email@foo.bar>
/msg hostserv take hackint/user/$account
FBI Tips:
https://tips.fbi.gov/digitalmedia/aad18481a3e8f02
Many dev efforts are being consolidated in:
https://github.com/ozywog/parler-data-tools
Open spreadsheet for listing notable video IDs:
https://docs.google.com/spreadsheets/d/1ThPUH5HgTcVKCoyfr2oJ21AWKTGq-dR-cRZjPOER-Q0/edit#gid=0
Listing of videos:
tommycarsten.com/terrorism/index.html - Most videos posted from Capitol Hill on Jan6th
https://www.youtube.com/channel/UCZk6IiAVk2QwOdljEAYCPLw - More of the same, also avail on mega - link in Resources
Various Maps, most are focused on Jan 6th:
https://thepatr10t.github.io/yall-Qaeda
https://kylemcdonald.net/parler/map/
https://fortress.maptive.com/ver4/a3486a6ab9a9a12aa9a9cb067839079c/410491
https://darthnithin.github.io/earth/index.html
https://parlervid.herokuapp.com/
!!! VideoID can be added to url to download a video, ala https://parlervid.herokuapp.com/VIDEOID
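The URL pattern in the note above can be sketched in Python. The helper name `parlervid_url` is hypothetical, and the sketch only builds the link; it assumes the service takes the ID appended directly to the base URL, as the note describes.

```python
from urllib.parse import quote

BASE_URL = "https://parlervid.herokuapp.com/"


def parlervid_url(video_id: str) -> str:
    """Build the download URL for a video ID (hypothetical helper)."""
    video_id = video_id.strip()
    if not video_id:
        raise ValueError("video_id must not be empty")
    # Percent-encode the ID so stray characters cannot alter the path.
    return BASE_URL + quote(video_id, safe="")


print(parlervid_url("VIDEOID"))
```

Feeding the result to any downloader (browser, curl, wget) should then fetch the file, if the service is still up.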
Want to help but don't know how?
Download copies of data and scripts, rehost them elsewhere, and seed torrents.
Help make this file easier for others to understand.
Like-minded list with nice formatting - https://github.com/rljacobson/CapitolResources/
Develop ways to make data easy to visualize and sort with current tools
Come ask in IRC about current efforts.
Tools
================================
- Dataset preprocessors
zip of HTML posts -> json, tar.gz of vid metadata -> json
https://gitlab.com/-/snippets/2060956
- Scripts to scrape videos:
https://github.com/darthnithin/parlervideoscraper (py)
https://github.com/acanthias13/reimagined-dollop (R)
You will need to generate CSVs from donk's metadata.tar.gz to use these,
or use a CSV from another source, such as one listed in this doc.
- Script to generate a list of unique names and usernames then collect all the
posts and associate them with the person who posted them
Requires raw html source:
https://github.com/billstrobl/Prooter
https://github.com/billstrobl/Prooter/blob/master/prooter.py
- Script to extract images/videos from WARCs:
https://gist.github.com/redd-dedd/9a200a9ba789f312faf53b25ac63e024
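For the video scrapers above, which need a CSV built from metadata.tar.gz, here is a minimal sketch of one way to produce it. It is not the method used by any of the listed tools. It assumes the archive holds one JSON file per video, either a single object or an exiftool-style one-element list, and the field names in `FIELDS` are guesses that should be checked against the real files. The function names are hypothetical.

```python
import csv
import json
import tarfile

# Assumed field names; adjust to whatever the JSON files actually contain.
FIELDS = ["SourceFile", "CreateDate", "GPSLatitude", "GPSLongitude"]


def iter_records(tar_path):
    """Yield (member name, record dict) for each JSON member of the tar.gz."""
    with tarfile.open(tar_path, "r:gz") as tar:
        for member in tar:
            if not (member.isfile() and member.name.endswith(".json")):
                continue
            data = json.load(tar.extractfile(member))
            # exiftool-style output wraps each record in a list.
            records = data if isinstance(data, list) else [data]
            for record in records:
                if isinstance(record, dict):
                    yield member.name, record


def write_csv(tar_path, csv_path, fields=FIELDS):
    """Write the selected fields of every record to csv_path; return row count."""
    count = 0
    with open(csv_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        writer.writerow(["member"] + list(fields))
        for name, record in iter_records(tar_path):
            # Missing fields become empty cells rather than errors.
            writer.writerow([name] + [record.get(f, "") for f in fields])
            count += 1
    return count
```

Calling `write_csv("metadata.tar.gz", "metadata.csv")` would then give a file that can be reshaped into whatever columns a given scraper expects.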
Resources
=================================
- Magnet URI for a torrent containing 1.8 million texts scraped from
Parler; it is a subset of the full data. Originally hosted on https://parler-archive.deadops.de/
This is the parler_2020-01-06_posts-partial torrent that was spread early on.
magnet:?xt=urn:btih:FF29970B902657A32D561C0720E70FACFB8C4284&dn=parler_2020-01-06_posts-partial&tr=udp%3a%2f%2ftracker.opentrackr.org%3a1337%2fannounce&tr=udp%3a%2f%2ftracker.internetwarriors.net%3a1337%2fannounce
- Metadata json files with EXIF data on all MP4 videos scraped from Parler:
donk.sh/metadata.tar.gz
magnet:?xt=urn:btih:1723e27bc79186c4574ff056ddb458d771c26e2f&dn=metadata.tar.gz&tr=wss%3A%2F%2Ftracker.btorrent.xyz&tr=wss%3A%2F%2Ftracker.openwebtorrent.com&tr=udp%3A%2F%2Ftracker.leechers-paradise.org%3A6969&tr=udp%3A%2F%2Ftracker.coppersurfer.tk%3A6969&tr=udp%3A%2F%2Ftracker.opentrackr.org%3A1337&tr=udp%3A%2F%2Fexplodie.org%3A6969&tr=udp%3A%2F%2
SHA256: 66809d9ae0a5a6577a3c80bb623562274ceccd96b35519f15f568d09cefc56f8 metadata.tar.gz
! Torrent of ~all videos from Parler's CDN - Said to contain more than the archive.org pull
Ongoing split torrent - recheck your chunks often
README FIRST: https://gist.github.com/shoghicp/714f590f3a175635b7a377905bd21ea4
https://pl.gammaspectra.live/
- Usernames and posts, separated from dataset
https://drive.google.com/file/d/1Lo4I2du5rGSKqPcrC_hnEvrLTufDqTpy/view?usp=sharing
- Pictures / Images
https://irc.gammaspectra.live/339648d275d2712b/imagelist.zip
List of all image filepaths from Internet Archive collection
https://par.pw/v1/photo?id=IMAGEID
Webtool to download images. You'll need other tools to get IDs
- Massive listing of Jan 6th media from across multiple socials
All are the same dump as far as we're aware.
https://capitol-hill-riots.s3.us-east-1.wasabisys.com/directory.html
Looks to be the same as the mega dump below, but easier to grab from.
https://mega.nz/folder/30MlkQib#RDOaGzmtFEHkxSYBaJSzVA
lilprincess.tk/storage/capitol_riots
- Videos From DC Area, Jan 6th. Estimated to only be about 10% of what was available, at this moment
https://www.youtube.com/channel/UCZk6IiAVk2QwOdljEAYCPLw
https://mega.nz/file/Pkk2VSRT#x-Gnl1-FddGwHumBXAGsCJ2FL1VHE-Y-u2SFW48KpeQ
- 948 files from around DC area Jan5-Jan10, 2021
magnet:?xt=urn:btih:387b8615beec9b506b4f448af0002cd3d651dd00&dn=geocoded&tr=udp%3A%2F%2Ftracker.opentrackr.org%3A1337%2Fannounce&tr=udp%3A%2F%2Fexplodie.org%3A6969%2Fannounce
- JSON / CSV / KML Scrapes:
Mirror/reup if able.
https://gofile.io/d/p8RxUC - CSV, with all non-zero lat/long from donk's JSON
https://gofile.io/d/WVmqhR - KML, quick 'n dirty KML made from the csv
https://gofile.io/d/DsUUte - KML, posts made 1/6/2021, DC Area Only
https://gofile.io/d/EJczW8 - CSV, Cleaned ver of 1/6 in DC
https://gofile.io/d/PUxeV4 - CSV, Cleaned ver of all available geotagged data
https://gofile.io/d/zKTsWr - list of videos taken within 100m of a LE or gov't building, all-time
https://gofile.io/d/7TGoWj - JSON, All posts from Jan6
Also magnet:?xt=urn:btih:03b3250bcf3fc335d74605709f8e081929d2bda7&dn=parler_posts_json.zip&tr=http%3a%2f%2f128.199.70.66%3a5944%2fannounce&tr=udp%3a%2f%2f194.106.216.222%3a80%2fannounce
https://github.com/acanthias13/legendary-octo-guacamole - backup of Clean CSVs
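Since these files pass through many mirrors and torrents, it is worth checking a download against a published hash where one exists. Below is a minimal sketch using the SHA256 listed for metadata.tar.gz in this section. The function names are hypothetical, and the local path is whatever you saved the file as.

```python
import hashlib

# SHA256 published in this document for metadata.tar.gz.
EXPECTED = "66809d9ae0a5a6577a3c80bb623562274ceccd96b35519f15f568d09cefc56f8"


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 of a file, reading it in chunks to save memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected=EXPECTED):
    """True if the file's SHA256 equals the expected hex digest."""
    return sha256_of(path) == expected.lower()
```

`matches("metadata.tar.gz")` returning False means the copy is incomplete, corrupted, or a different file from the one the hash was computed on.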
===========
- Needs to be sorted.
http://donk.sh/06d639b2-0252-4b1e-883b-f275eff7e792/
https://web.archive.org/web/timemap/?url=https%3A%2F%2Fimage-cdn.parler.com%2F&matchType=prefix&collapse=urlkey&output=json&fl=original%2Cuniqcount&filter=!statuscode%3A%5B45%5D
https://irc.gammaspectra.live/eaa6fa678444b5f4/videos.txt
https://gist.github.com/kylemcdonald/8fdabd6526924012c1f5afe538d7dc09
===================================
HOW TO VIEW WARC/ZSTD from ArchiveTeam's Parler scrape
# How to View Parler Archive "megawarc.warc.zst" files.
These files use the standard zstd compression and WARC archive formats.
They are uploading to: https://archive.org/details/archiveteam_neparlepas
$ tar -I zstd -xvf archive.tar.zst
===Old.
1. Install Python 3.7
2. Execute: pip install zstandard==0.10.2
3. Download archive from here: https://archive.org/details/archiveteam_neparlepas?tab=collection
4. Copy this script into a new file called xtract.py: https://hastebin.com/bugedubaxi.py
5. Execute: python ./xtract.py /path/to/parler_blahblah.megawarc.warc.zst > dict
6. Execute: zstd -d /path/to/parler_blahblah.megawarc.warc.zst -D dict
7. Import the decompressed parler_blahblah.megawarc.warc file into this tool: https://github.com/webrecorder/webrecorder-desktop
If you cannot install Python 3.7 for some reason, or just want a container, a dockerfile is available at:
https://gist.github.com/shoghicp/6ce05806ffc805929667ec2d4c62aba2
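For a quick look at a decompressed .warc file without a GUI tool, the record headers can be listed with the standard library alone. This is a rough sketch, not part of the procedure above: it assumes an uncompressed WARC/1.x file with no folded header lines, and `iter_warc_headers` is a hypothetical name.

```python
def iter_warc_headers(stream):
    """Yield the header dict of each record in an uncompressed WARC stream.

    `stream` must be a seekable binary file object.
    """
    while True:
        line = stream.readline()
        if not line:
            return  # end of file
        if not line.strip():
            continue  # blank separator lines between records
        if not line.startswith(b"WARC/"):
            raise ValueError(f"unexpected line: {line[:40]!r}")
        headers = {}
        while True:
            line = stream.readline()
            if line in (b"\r\n", b"\n", b""):
                break  # blank line ends the header block
            name, _, value = line.decode("utf-8", "replace").partition(":")
            headers[name.strip()] = value.strip()
        # Skip the record body; Content-Length is mandatory in WARC.
        stream.seek(int(headers["Content-Length"]), 1)
        yield headers
```

Opening the file with `open(path, "rb")` and printing `h.get("WARC-Target-URI")` for each yielded dict gives a list of the URLs captured in the archive.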