Linas Valiukas pypt

@pypt
pypt / validate_new.py
Last active February 10, 2020 23:28
Validate new feed parser
#!/usr/bin/env python3.7
import calendar
import os
import time
from mediawords.feed.parse import parse_feed
input_dir = '/feeds/'
output_dir = '/feed_results_new/'
pypt / validate_old.pl
Last active February 10, 2020 23:28
Validate old feed parser
#!/usr/bin/env perl
use strict;
use warnings;
use Encode;
use File::Basename;
use File::Slurp;
use Time::Piece;
pypt / test_tm_mine.t
Created February 5, 2020 00:41
test_tm_mine.t log
./dev/run_test.py apps/topics-mine/tests/perl/test_tm_mine.t ✭master
WARNING: The MC_CRIMSON_HEXAGON_API_KEY variable is not set. Defaulting to a blank string.
WARNING: The MC_TWITTER_CONSUMER_KEY variable is not set. Defaulting to a blank string.
WARNING: The MC_TWITTER_CONSUMER_SECRET variable is not set. Defaulting to a blank string.
WARNING: The MC_TWITTER_ACCESS_TOKEN variable is not set. Defaulting to a blank string.
WARNING: The MC_TWITTER_ACCESS_TOKEN_SECRET variable is not set. Defaulting to a blank string.
/opt/mediacloud/tests/perl/test_tm_mine.t .. main: starting hash server 0
main: starting hash server 1
main: starting hash server 2
main: starting hash server 3
pypt / gist:cdb03bddfe3b509d39138b6a2e390fc2
Created November 26, 2019 19:51
Slate's Political Gabfest - DP [Hearts] MB
Hello and welcome to the Slate Political Gabfest for November 14th, 2019, the DP Hearts MB edition. I am said DP, David Plotz of Atlas Obscura. Joining me from New Haven is Emily Bazelon of Yale University of not New Haven's Boston, but that's right off to New Haven, but she's in New England New England. It's just like a one
small region up there. Hello, Emily Bazelon of Yale and The New York Times Magazine. Hello. And chuckling in his usual warm specifically way is John Dickerson of CBS's 60 Minutes from New York. Hello, John. Hello to John. Did you sleep out last night for Covenant House? That's tonight. You're sleeping out tonight. Yeah. Is there a way that people could hear this and still support it? Oh, you're so sweet. Yeah. I mean, I guess if they just go to my Twitter page, it's the pinned
tweet, and that'll take you to the DonorsChoose website where they can donate. People have been extremely generous and it's been really lovely to see.
On today's Gabfest, will the first public impeachment hearing change
pypt / land_id.py
Created November 26, 2019 19:44
lang_id.py
import logging
from typing import Optional
from urllib.parse import urlparse
import cld2
logging.basicConfig(level=logging.DEBUG)
UNKNOWN_LANGUAGE_CODE = 'un-UN'
pypt / nytlabels-stories.py
Created June 29, 2018 19:49
Search for NYTLabels-themed stories
import mediacloud
def nyt_labels_tag_id_for_tag(mc, tag_name):
    """Return a tags_id for a NYTLabels tag name to be used to search for stories tagged with said tag."""
    # Every NYTLabels-"themed" story is internally tagged with a name of the theme
    tags = mc.tagList(name_like=tag_name)
    for tag in tags:
pypt / Implement topic creation using machine learning for MediaCloud.md
Created September 1, 2017 01:54 — forked from DonggeLiu/Implement topic creation using machine learning for MediaCloud.md
A description of my work in the Google Summer of Code Project with The Berkman Klein Center for Internet & Society at Harvard University

I contributed to Media Cloud this summer as my Google Summer of Code 2017 project.

Overview

Project Background: Host Organisation

The Berkman Klein Center for Internet & Society at Harvard University is dedicated to exploring, understanding, and shaping the development of the digitally-networked environment. A diverse, interdisciplinary community of scholars, practitioners, technologists, policy experts, and advocates, we seek to tackle the most important challenges of the digital age while keeping a focus on tangible real-world impact in the public interest. Our faculty, fellows, staff and affiliates conduct research, build tools and platforms, educate others, form bridges and facilitate dialogue across and among diverse communities.

Project Description

In this project, I developed a proof-of-concept machine learning tool for topic modelling. Specifically, it uses uns

pypt / log4perl-speed.pl
Created July 29, 2016 15:51
Log::Log4perl speed with and without sub { ... }
#!/usr/bin/env perl
use strict;
use warnings;
use Log::Log4perl qw(:easy);
use Data::Dumper;
use Time::HiRes;
use Readonly;
pypt / update-then-insert-media_id_26.sql
Created July 8, 2016 19:02
story_sentences "UPDATE-then-INSERT" with media_id = 26
WITH new_sentences (disable_triggers, language, media_id, publish_date, sentence, sentence_number, stories_id) AS (VALUES
-- New sentences to potentially insert
(FALSE, 'sk', 26, '2013-09-19 07:00:00'::timestamp, 'GM McKenzie praises Pryor', 0, 156642155),
(FALSE, 'en', 26, '2013-09-19 07:00:00'::timestamp, 'ALAMEDA – An eventful Raiders offseason orchestrated by general manager Reggie McKenzie included a trade for a starting quarterback with arm trouble and Al Davis'' final draft pick taking the helm behind center.', 1, 156642155),
(FALSE, 'en', 26, '2013-09-19 07:00:00'::timestamp, 'McKenzie also drafted a quarterback in the fourth round and cut him in favor of an undrafted free agent.', 2, 156642155),
(FALSE, 'en', 26, '2013-09-19 07:00:00'::timestamp, 'And he is fine with that.', 3, 156642155),
(FALSE, 'en', 26, '2013-09-19 07:00:00'::timestamp, 'Talking with reporters Wednesday for the first time since training camp began in late July, McKenzie made it clear he''s willing
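The preview above is cut off before the statement that consumes `new_sentences`, so the query itself cannot be reconstructed here. As a rough illustration of the general "UPDATE-then-INSERT" idea named in the gist title, the sketch below tries an UPDATE keyed on `(stories_id, sentence_number)` and falls back to an INSERT when no row matched. It is not the gist's query: it uses Python's built-in `sqlite3` instead of PostgreSQL, a reduced three-column `story_sentences` table, and a hypothetical helper name `update_then_insert`, all of which are assumptions made for the example.

```python
import sqlite3


def update_then_insert(conn, rows):
    """Update rows whose (stories_id, sentence_number) key already exists; insert the rest.

    `rows` is an iterable of (stories_id, sentence_number, sentence) tuples.
    Returns an (updated, inserted) tuple of row counts.
    Hypothetical helper for illustration only; not taken from the gist.
    """
    updated = inserted = 0
    with conn:  # single transaction: commit on success, roll back on error
        for stories_id, sentence_number, sentence in rows:
            cur = conn.execute(
                "UPDATE story_sentences SET sentence = ? "
                "WHERE stories_id = ? AND sentence_number = ?",
                (sentence, stories_id, sentence_number),
            )
            if cur.rowcount:
                updated += 1
            else:
                # No existing row matched the key, so insert a new one.
                conn.execute(
                    "INSERT INTO story_sentences (stories_id, sentence_number, sentence) "
                    "VALUES (?, ?, ?)",
                    (stories_id, sentence_number, sentence),
                )
                inserted += 1
    return updated, inserted


# Assumed, simplified schema: only the key columns and the sentence text.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE story_sentences ("
    "stories_id INTEGER, sentence_number INTEGER, sentence TEXT, "
    "PRIMARY KEY (stories_id, sentence_number))"
)
conn.execute("INSERT INTO story_sentences VALUES (156642155, 0, 'old headline')")

counts = update_then_insert(conn, [
    (156642155, 0, "GM McKenzie praises Pryor"),
    (156642155, 3, "And he is fine with that."),
])
```

With concurrent writers, two sessions can both find no matching row and both attempt the INSERT, so this pattern generally relies on a unique key or explicit locking; the sketch does not address that.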
pypt / deadlocking-query.sql
Created July 8, 2016 01:33
Deadlocking query
WITH new_sentences (disable_triggers, language, media_id, publish_date, sentence, sentence_number, stories_id) AS (VALUES
-- New sentences to potentially insert
(FALSE, 'ro', 40536, '2013-04-17 17:38:37'::timestamp, 'Primaria nu vrea ca iesenii sa scape de taxa de timbru', 0, 110124077),
(FALSE, 'ro', 40536, '2013-04-17 17:38:37'::timestamp, 'Un consilier PP-DD a venit cu o solutie la drumurile pe care iesenii le fac pentru a plati taxele.', 1, 110124077),
(FALSE, 'ro', 40536, '2013-04-17 17:38:37'::timestamp, 'Avocatul Anca Preda a initiat un proiect de hotarare prin care oamenii pot achita taxa de timbru prin posta.', 2, 110124077),
(FALSE, 'ro', 40536, '2013-04-17 17:38:37'::timestamp, 'Tot ceea ce au de facut reprezentantii primariei este sa incheie un contract cu posta, fara sa plateasca vreun comision.”Nu ar fi niciun efort din partea primariei.', 3, 110124077),
(FALSE, 'ro', 40536, '2013-04-17 17:38:37'::timestamp, 'Nu se cere niciun commision.', 4, 110124077),
(FALSE,