Skip to content

Instantly share code, notes, and snippets.

Show Gist options
  • Save n0rdlicht/912abf0b9bb4fd50c753da57ccd335af to your computer and use it in GitHub Desktop.
Save n0rdlicht/912abf0b9bb4fd50c753da57ccd335af to your computer and use it in GitHub Desktop.
Collect video URLs and GPS data for Parler videos.

Parler video GPS data

My goal is to help people researching the spread of far-right movements, and trying to identify people tresspassing at the Capitol on January 6th. Do not contact me about doxxing someone, or leave comments attempting to doxx someone, just because they uploaded a video.

Download the collected CSV data here.

Using metadata collected by https://twitter.com/donk_enby

No rights reserved

image

How to view a video on Parler

Let's say you have a video ID, like wfy02V209EwU taken from this map

The link for the video kiBnNejo0JW0 is at https://video.parler.com/ki/Bn/kiBnNejo0JW0. Notice that the link is: https://video.parler.com/ followed by the first two characters of the video ID, then /, then the next two characters, another /, and the video ID.

But if you click on this, you will get an error.

This is because the Parler video servers are still running (as of January 12) but the Domain Name System (DNS) is not running.

That means that video.parler.com does not get mapped to the video server address correctly.

To fix this, you need to manually patch your system's DNS records. Then you can click on the link.

Mac

  1. Open the Terminal app.
  2. Type sudo nano /etc/hosts
  3. Type your password.
  4. Press the down arrow key until you are past the lines starting with #.
  5. Paste this line: 8.240.242.124 video.parler.com
  6. Type Ctrl+O to save.
  7. Type Ctrl+X to exit.

Windows

Follow these instructions and add the line 8.240.242.124 video.parler.com to your hosts file.

Errors

Some of the videos have been taken down. These will show a message saying "AllAccessDisabled".

Display the source blob
Display the rendered blob
Raw
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"First, download [python-utils](https://github.com/kylemcdonald/python-utils) to the `utils` folder.\n",
"\n",
"`pip install ujson` for fast json loading."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"from utils.list_files import list_files\n",
"from utils.progress import progress_parallel\n",
"\n",
"import ujson as json\n",
"\n",
"from itertools import islice\n",
"from collections import Counter\n",
"import datetime\n",
"import os"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Load all the `.json` files from the `metadata/` folder. Should take two minutes."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"1032523 0:02:11 7910.85/s\n"
]
}
],
"source": [
"def load(fn):\n",
" with open(fn) as f:\n",
" data = json.load(f)\n",
" video_id = os.path.basename(fn).split('.')[0].split('-')[1]\n",
" for i in range(len(data)):\n",
" data[i]['tag'] = video_id\n",
" return data\n",
"\n",
"metadata = progress_parallel(load, list_files('metadata'))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Save all the results to a pickle file. This way we can reload quickly later."
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"1032523"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import pickle\n",
"with open('metadata.pkl', 'wb') as f:\n",
" pickle.dump(metadata, f)\n",
" \n",
"# with open('metadata.pkl', 'rb') as f:\n",
"# metadata = pickle.load(f)\n",
"\n",
"len(metadata)"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"# show common metadata keys\n",
"# all_keys = []\n",
"# for m in metadata:\n",
"# for e in m:\n",
"# all_keys.extend(e.keys())\n",
"# counts = Counter(all_keys)\n",
"# print(counts.most_common(20))"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [],
"source": [
"def dms_to_dd(dms):\n",
" parts = dms.split(' ')\n",
" d = int(parts[0])\n",
" m = int(parts[2][:-1])\n",
" s = float(parts[3][:-1])\n",
" dd = d + float(m)/60 + float(s)/3600\n",
" if parts[-1] == 'W' or parts[-1] == 'S':\n",
" dd *= -1\n",
" return dd\n",
"\n",
"def geocoded():\n",
" for m in metadata:\n",
" for e in m:\n",
" try:\n",
" lon = e['GPSLongitude']\n",
" lat = e['GPSLatitude']\n",
" created = e['CreateDate']\n",
" video_id = e['tag']\n",
" lon = dms_to_dd(lon)\n",
" lat = dms_to_dd(lat)\n",
" created = datetime.datetime.strptime(created, '%Y:%m:%d %H:%M:%S')\n",
" yield (lon, lat, created, video_id)\n",
" except KeyError:\n",
" continue\n",
" except ValueError:\n",
" continue\n",
"\n",
"# collect all geocoded data\n",
"collected = list(geocoded())\n",
"\n",
"# write all data to a single file\n",
"# with open('all-geocoded.csv', 'w') as f:\n",
"# f.write('Longitude,Latitude,Timestamp,ID\\n')\n",
"# for lon, lat, created, video_id in collected:\n",
"# f.write(f'{lon},{lat},{created},{video_id}\\n')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Define bounding box and start date."
]
},
{
"cell_type": "code",
"execution_count": 55,
"metadata": {},
"outputs": [],
"source": [
"# capitol only\n",
"# top,left = [38.892110, -77.015188]\n",
"# bottom,right = [38.887619, -77.005959]\n",
"\n",
"# dc area\n",
"top,left = [38.978921, -77.140698]\n",
"bottom,right = [38.808277, -76.914339]\n",
"\n",
"start = datetime.datetime(2021,1,5)"
]
},
{
"cell_type": "code",
"execution_count": 56,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"1499"
]
},
"execution_count": 56,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"to_download = []\n",
"for lon,lat,created,video_id in collected:\n",
" if lon < left or lon > right or lat > top or lat < bottom:\n",
" continue\n",
" if created < start:\n",
" continue\n",
" to_download.append(video_id)\n",
" \n",
"len(to_download)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Save all urls to a text file."
]
},
{
"cell_type": "code",
"execution_count": 52,
"metadata": {},
"outputs": [],
"source": [
"with open('dc-area.txt', 'w') as f:\n",
" for e in to_download:\n",
" f.write(f'https://video.parler.com/{e[:2]}/{e[2:4]}/{e}.mp4\\n')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Download all the urls in parallel by using [this downloader script](https://gist.github.com/kylemcdonald/3cbd09752e340849e4b3cb4f12dd8c85).\n",
"\n",
"`python download-urls.py -l ../washington-area.txt`\n",
"\n",
"You will need to edit your `/etc/hosts` to add the following line:\n",
" \n",
"`8.240.242.124 video.parler.com`"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.3"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment