@jacKlinc
jacKlinc / all-articles.json
Created March 12, 2024 12:08
All Bellingcat Articles JSON (until Sep 2023)
[
{
"publish_date": "2014-07-31",
"title": "Did Coulson\u2019s News of the World Incite Others to Commit Crimes and Cause Unsafe Convictions?",
"url": "https://www.bellingcat.com/news/uk-and-europe/2014/07/31/did-coulsons-news-of-the-world-incite-others-to-commit-crimes-and-cause-unsafe-convictions/",
"articles_text": "\n\nMore on the Fake Sheikh, the Police, and News of the World by occasional blogger @jpublik.\n\nAndy Coulson\u2018s News of the World sent a man to jail after luring him to sell them drugs he was terrified of carrying by promising him a job. He was sentenced to four years in prison before his conviction was quashed \u2013 after he\u2019d already served his time.\n\nIn a case which has hardly received any publicity, according to high court documents, Albanian Besnik Qema was asked to supply News of the World cocaine and a passport on a promise of job as security for a wealthy Arab family.\n\nThe High Court documents detail how in January 2005, Mazher Mahmood had asked Florim
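The JSON export is a list of article objects with `publish_date`, `title`, `url` and `articles_text` fields. As a minimal sketch, assuming the full file has been downloaded and parses as one JSON array (the preview above is truncated), articles could be counted per year from `publish_date`. The helper `count_articles_by_year` and the sample records are hypothetical.

```python
import json
from collections import Counter


def count_articles_by_year(raw_json: str) -> Counter:
    """Count articles per year, reading the year from each ISO `publish_date` (YYYY-MM-DD)."""
    articles = json.loads(raw_json)  # assumed: the export is a single JSON array of objects
    return Counter(article["publish_date"][:4] for article in articles)


if __name__ == "__main__":
    # Made-up records with the same field names as the export.
    sample = """[
      {"publish_date": "2014-07-31", "title": "A", "url": "https://www.bellingcat.com/a/", "articles_text": "..."},
      {"publish_date": "2014-08-02", "title": "B", "url": "https://www.bellingcat.com/b/", "articles_text": "..."},
      {"publish_date": "2015-01-10", "title": "C", "url": "https://www.bellingcat.com/c/", "articles_text": "..."}
    ]"""
    print(count_articles_by_year(sample))
```

For the real file, `json.load(open("all-articles.json", encoding="utf-8"))` would replace `json.loads`; the `\u2019`-style escapes in the data are decoded automatically.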
jacKlinc / all-bellingcat-articles.csv
Created February 15, 2024 09:15
All bellingcat articles
year,month,url,path,articles_text,publish_date,title
2014,7,https://www.bellingcat.com/news/uk-and-europe/2014/07/31/did-coulsons-news-of-the-world-incite-others-to-commit-crimes-and-cause-unsafe-convictions/,/news/uk-and-europe/2014/07/31/did-coulsons-news-of-the-world-incite-others-to-commit-crimes-and-cause-unsafe-convictions/,"
More on the Fake Sheikh, the Police, and News of the World by occasional blogger @jpublik.
Andy Coulson‘s News of the World sent a man to jail after luring him to sell them drugs he was terrified of carrying by promising him a job. He was sentenced to four years in prison before his conviction was quashed – after he’d already served his time.
In a case which has hardly received any publicity, according to high court documents, Albanian Besnik Qema was asked to supply News of the World cocaine and a passport on a promise of job as security for a wealthy Arab family.
The High Court documents detail how in January 2005, Mazher Mahmood had asked Florim Gashi, a contact of his who h
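The CSV export has the columns `year, month, url, path, articles_text, publish_date, title`, and `articles_text` spans several lines inside one quoted field. Python's `csv` module handles quoted multi-line fields when the file is opened with `newline=""`. A minimal sketch, assuming the full file is available locally; `iter_articles` is a hypothetical helper and the sample row is made up.

```python
import csv
import io
from typing import Iterator, TextIO

# Long article bodies may exceed the csv module's default field limit (131072 characters).
csv.field_size_limit(10_000_000)


def iter_articles(handle: TextIO) -> Iterator[dict]:
    """Yield one dict per article row; quoted fields may contain embedded newlines."""
    for row in csv.DictReader(handle):
        row["year"] = int(row["year"])
        row["month"] = int(row["month"])
        yield row


if __name__ == "__main__":
    # Made-up row with a two-line article body, to show the quoted newline survives parsing.
    sample = (
        "year,month,url,path,articles_text,publish_date,title\n"
        '2014,7,https://www.bellingcat.com/x/,/x/,"First paragraph.\nSecond paragraph.",2014-07-31,Example\n'
    )
    for article in iter_articles(io.StringIO(sample)):
        print(article["publish_date"], article["title"], article["articles_text"].count("\n") + 1, "lines")
    # For the real file: open("all-bellingcat-articles.csv", newline="", encoding="utf-8")
```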
jacKlinc / scrape-bellingcat-articles.ipynb
Last active February 13, 2024 15:06
Jupyter notebook that lists, downloads and saves all Bellingcat's articles to a CSV file
jacKlinc / download_all_bellingcat_articles.py
Created February 9, 2024 14:22
Download all Bellingcat articles and save their HTML content to a CSV file
import requests
from functools import reduce
from dataclasses import dataclass
from bs4 import BeautifulSoup
from newspaper import Article
import pandas as pd
BASE_URL = "https://www.bellingcat.com"
BELLINGCAT_START_YEAR = 2014 # earliest article on site
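In the CSV export, the `year`, `month` and `path` columns are all derivable from the article URL: the sample row for `/news/uk-and-europe/2014/07/31/did-coulsons-news-of-the-world-incite-others-to-commit-crimes-and-cause-unsafe-convictions/` has year 2014 and month 7. A minimal standard-library sketch of that derivation, assuming every article URL contains a `/YYYY/MM/DD/` segment as the sample does; `url_to_row` is a hypothetical helper, not part of the original script.

```python
import re
from urllib.parse import urlparse

BASE_URL = "https://www.bellingcat.com"

# Matches the /YYYY/MM/DD/ date segment seen in Bellingcat article URLs (assumed for all articles).
DATE_SEGMENT = re.compile(r"/(\d{4})/(\d{2})/(\d{2})/")


def url_to_row(url: str) -> dict:
    """Split an article URL into the year, month, url and path columns of the CSV export."""
    path = urlparse(url).path
    match = DATE_SEGMENT.search(path)
    if match is None:
        raise ValueError(f"no /YYYY/MM/DD/ segment in {url!r}")
    year, month, _day = (int(part) for part in match.groups())
    return {"year": year, "month": month, "url": url, "path": path}


if __name__ == "__main__":
    example = (
        BASE_URL
        + "/news/uk-and-europe/2014/07/31/did-coulsons-news-of-the-world-incite-others-to-commit-crimes-and-cause-unsafe-convictions/"
    )
    print(url_to_row(example))
```

Raising `ValueError` for URLs without a date segment keeps non-article pages (category or about pages) out of the CSV rather than writing rows with guessed dates.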
jacKlinc / multilayer_language_model.py
Created March 16, 2021 18:29
Multilayer RNN models. The first uses the built-in RNN class; the second uses an LSTM to tackle exploding (and vanishing) gradients.
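from fastai.text.all import *

class LanguageModelMulti(Module):
    """
    Deepens the model by stacking several RNN layers with the built-in nn.RNN class, aiming for higher accuracy
    """
    def __init__(self, vocab_sz, n_hidden, n_layers):
        self.i_h = nn.Embedding(vocab_sz, n_hidden)
        # Stacks n_layers RNN layers; batch_first=True means inputs are shaped (batch, seq_len, n_hidden)
        self.rnn = nn.RNN(n_hidden, n_hidden, n_layers, batch_first=True)
        self.h_o = nn.Linear(n_hidden, vocab_sz)
        # Initial hidden state: zeros for every layer (nn.RNN expects shape (n_layers, batch, n_hidden))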
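The description says the second model swaps the plain RNN for an LSTM. The idea usually credited with taming unstable gradients is that the LSTM's cell state is updated additively (`c = f*c + i*g`), with learned sigmoid gates deciding what to keep, add and expose. A minimal scalar sketch of one LSTM step in plain Python; `lstm_step` and `GateWeights` are hypothetical names, and in the gist's PyTorch model these would be `n_hidden`-sized vectors and weight matrices inside `nn.LSTM`.

```python
import math
from dataclasses import dataclass


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


@dataclass
class GateWeights:
    """Scalar weights for one gate: input weight, hidden weight and bias (hypothetical layout)."""
    w_x: float = 0.0
    w_h: float = 0.0
    b: float = 0.0

    def __call__(self, x: float, h: float) -> float:
        return self.w_x * x + self.w_h * h + self.b


def lstm_step(x, h, c, forget, inp, cell, out):
    """One LSTM time step on scalars; returns the new (hidden, cell) state."""
    f = sigmoid(forget(x, h))   # forget gate: how much of the old cell state to keep
    i = sigmoid(inp(x, h))      # input gate: how much of the candidate to add
    g = math.tanh(cell(x, h))   # candidate cell value
    o = sigmoid(out(x, h))      # output gate: how much of the cell state to expose
    c_new = f * c + i * g       # additive update of the cell state
    h_new = o * math.tanh(c_new)
    return h_new, c_new


if __name__ == "__main__":
    zero = GateWeights()
    # With all weights at zero every gate outputs 0.5 and the candidate is 0,
    # so the cell state is simply halved: c goes from 1.0 to 0.5.
    print(lstm_step(x=1.0, h=0.0, c=1.0, forget=zero, inp=zero, cell=zero, out=zero))
```

When the forget gate saturates near 1 and the input gate near 0, the cell state passes through a step almost unchanged, which is the path along which gradients can flow over many time steps.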
jacKlinc / improved_language_model.py
Created March 16, 2021 07:46
The first model keeps its hidden state between batches, while the second improves on this by adding more signal through a longer sequence length.
from fastai.text.all import *

class LanguageModelRecurrentState(Module):
    """
    Keeps the hidden state between batches by initialising it once in __init__ instead of resetting it on every forward pass
    Gradients are detached so backpropagation only runs through the last 3 unrolled layers (time steps)
    """
    def __init__(self, vocab_sz, n_hidden):
        self.i_h = nn.Embedding(vocab_sz, n_hidden)
        self.h_h = nn.Linear(n_hidden, n_hidden)
        self.h_o = nn.Linear(n_hidden, vocab_sz)
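The description credits the second model's improvement to more signal from a longer sequence length. One common way to get that signal, assumed here rather than taken from the gist, is to make the target the input shifted by one token, so the model is scored on a prediction after every word instead of only after the last one. A minimal plain-Python sketch with a hypothetical `make_sequences` helper:

```python
from typing import List, Tuple


def make_sequences(tokens: List[int], seq_len: int) -> List[Tuple[List[int], List[int]]]:
    """Cut a token stream into non-overlapping (input, target) pairs of length seq_len.

    The target is the input shifted one position to the right, so every position
    in the sequence provides a next-word prediction to learn from.
    """
    pairs = []
    for start in range(0, len(tokens) - seq_len, seq_len):
        x = tokens[start : start + seq_len]
        y = tokens[start + 1 : start + seq_len + 1]
        pairs.append((x, y))
    return pairs


if __name__ == "__main__":
    stream = list(range(10))  # stand-in token ids 0..9
    for x, y in make_sequences(stream, seq_len=3):
        print(x, "->", y)
```

Compared with predicting only the word after each window, each pair here contributes `seq_len` targets, and raising `seq_len` raises the number of targets per pair accordingly.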
jacKlinc / basic_language_model.py
Last active March 12, 2021 07:29
The first language model explicitly declares each layer, while the second does the same with a loop.
from fastai.text.all import *

class LanguageModel(Module):
    """
    Takes three words as input and returns a probability for each possible next word
    The 1st layer uses the 1st word's embedding
    The 2nd layer uses the 2nd word's embedding plus the 1st word's output activations
    The 3rd layer uses the 3rd word's embedding plus the 2nd word's output activations
    """
    def __init__(self, vocab_sz, n_hidden):
        self.i_h = nn.Embedding(vocab_sz, n_hidden)  # Maps each word index to an embedding vector
        self.h_h = nn.Linear(n_hidden, n_hidden)     # Computes the activations passed on to the next word
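The docstring describes a model that reads three words and predicts the next. Training data for it can be built by sliding over the token stream in steps of three and pairing each three-word window with the word that follows. A minimal plain-Python sketch, assuming words are first mapped to integer ids and that windows do not overlap; `numericalize`, `make_triples` and the sample sentence are hypothetical.

```python
from typing import Dict, List, Tuple


def numericalize(words: List[str]) -> Tuple[List[int], Dict[str, int]]:
    """Map each distinct word to an integer id, in order of first appearance."""
    vocab: Dict[str, int] = {}
    ids = [vocab.setdefault(word, len(vocab)) for word in words]
    return ids, vocab


def make_triples(tokens: List[int]) -> List[Tuple[Tuple[int, int, int], int]]:
    """Pair each non-overlapping three-token window with the token that follows it."""
    return [
        ((tokens[i], tokens[i + 1], tokens[i + 2]), tokens[i + 3])
        for i in range(0, len(tokens) - 3, 3)
    ]


if __name__ == "__main__":
    ids, vocab = numericalize("the cat sat on the mat and the dog sat".split())
    for context, target in make_triples(ids):
        print(context, "->", target)
```

The ids index rows of `nn.Embedding`, and `len(vocab)` is what would be passed as `vocab_sz`.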
jacKlinc / JRE_Elon.txt
Created February 24, 2021 13:44
Analyse word count of a YouTube podcast video.
welcome back here we go again great to
see you and congratulations
thank you you will never forget what is
going on in the world when you think
about when your child is born you will
from sklearn.feature_extraction.text import CountVectorizer
def parse_txt(txt_file):
    """
    Takes a text file path and returns a list with one element per line in the file
    """
    with open(txt_file, "r") as f:
        # Reads the whole file and splits it into lines, dropping the newline characters
        words = f.read().splitlines()
        # Removes None elements
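The gist imports scikit-learn's `CountVectorizer` for the word count. As a dependency-free sketch of the same idea, swapping in the standard library's `collections.Counter`, the lines returned by `parse_txt` could be lowercased, tokenised and counted as below. The token regex mirrors `CountVectorizer`'s default `token_pattern`, which ignores single-character tokens; `top_words` is a hypothetical helper.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# Same default token pattern as scikit-learn's CountVectorizer: words of 2+ word characters.
TOKEN_PATTERN = re.compile(r"(?u)\b\w\w+\b")


def top_words(lines: Iterable[str], n: int = 10) -> List[Tuple[str, int]]:
    """Lowercase each line, tokenise it and return the n most frequent words with their counts."""
    counts = Counter()
    for line in lines:
        counts.update(TOKEN_PATTERN.findall(line.lower()))
    return counts.most_common(n)


if __name__ == "__main__":
    # The first transcript lines from JRE_Elon.txt, as parse_txt would return them.
    lines = [
        "welcome back here we go again great to",
        "see you and congratulations",
        "thank you you will never forget what is",
        "going on in the world when you think",
        "about when your child is born you will",
    ]
    print(top_words(lines, n=3))
```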
jacKlinc / dot_product_bias.py
Created January 22, 2021 07:53
Collaborative filtering model architecture for movie recommendation.
from fastai import *
from fastbook import *
def create_params(size):
    """
    Takes a tensor shape
    Returns a trainable parameter of that shape, initialised from a normal distribution (mean 0, std 0.01)
    """
    return nn.Parameter(torch.zeros(*size).normal_(0, 0.01))
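The file name `dot_product_bias.py` points at the usual dot-product collaborative-filtering prediction: the dot product of a user's latent factors and a movie's latent factors, plus a per-user and a per-movie bias. A minimal plain-Python sketch, with `create_params_list` standing in for `create_params` by drawing values from a normal distribution with mean 0 and standard deviation 0.01. Squashing the output into a rating range (here 0 to 5.5) with a scaled sigmoid is an assumption not shown in the gist preview, and all names are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple


def create_params_list(n: int, std: float = 0.01) -> List[float]:
    """Plain-Python stand-in for create_params: n values drawn from N(0, std)."""
    return [random.gauss(0.0, std) for _ in range(n)]


def sigmoid_range(x: float, low: float, high: float) -> float:
    """Squash x into the interval (low, high) with a scaled sigmoid."""
    return low + (high - low) / (1.0 + math.exp(-x))


def predict_rating(
    user_factors: Sequence[float],
    movie_factors: Sequence[float],
    user_bias: float,
    movie_bias: float,
    y_range: Tuple[float, float] = (0.0, 5.5),
) -> float:
    """Dot product of the latent factors plus both biases, squashed into y_range."""
    dot = sum(u * m for u, m in zip(user_factors, movie_factors))
    return sigmoid_range(dot + user_bias + movie_bias, *y_range)


if __name__ == "__main__":
    random.seed(0)
    n_factors = 5
    user, movie = create_params_list(n_factors), create_params_list(n_factors)
    # Freshly initialised parameters are close to zero, so predictions start near the middle of the range.
    print(round(predict_rating(user, movie, user_bias=0.0, movie_bias=0.0), 3))
```

The biases let the model capture that some users rate everything higher and some movies are rated higher by everyone, independently of how well their latent factors match.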