Hady Elsahar hadyelsahar
67483 q #102 :: <|endoftext|> ----> The whole movie was amazing. It was about a couple of people, one of whom is a writer, who find themselves in
67484 q #102 :: <|endoftext|> ----> I had the chance to see this movie when I was in college, and I loved it! It was absolutely amazing. I
67485 q #102 :: <|endoftext|> ----> It is amazing to me that someone would do something like this, and even more so that someone would do something so "h
67486 q #102 :: <|endoftext|> ----> : I can't think of a better way to describe this movie. The acting is amazing, the writing is amazing, the
67487 q #102 :: <|endoftext|> ----> <br /><br />My favorite part of this movie was the ending, as it was so amazing. I had no
67488 q #102 :: <|endoftext|> ----> "The Good, The Bad, The Ugly and the Ugly" is a really good film that you can watch for
67489 q #102 :: <|endoftext|> ----> -When an astronaut goes on a space mission, the result is a perfect storm of emotions, visions, dreams, and feelings
67490 q #102 :: <|endoftext|> ----> . <
%run generate_MDSU.py --task mtl_unsupervised_summarization --MDSUdata /tmp-network/fast/hady/projects/summarization/gurkensalad/data/prep/meansum_tfidf_control_sent_cat_aspects/MDSU-bin/ --path /tmp-network/fast/hady/projects/summarization/gurkensalad/checkpoints/meansum_checkpoints_1329820/checkpoint_best.pt --max-tokens 5000 --rouge --rouge-meansum --n-refs 1 --gen-subset test --remove-bpe sentencepiece --no-repeat-ngram-size 3 --beam 35 --lenpen 1.2 --bert-score --clf-score /tmp-network/fast/hady/projects/summarization/gurkensalad/checkpoints/sentiment_aspects_classifiers/meansum/sentiment_clf.pk --prefix-size 15 --mtl-clf-score /tmp-network/fast/hady/projects/summarization/gurkensalad/checkpoints/sentiment_aspects_classifiers/meansum/aspects_mtl_clf.pk --results-path /tmp-network/fast/hady/projects/summarization/gurkensalad/checkpoints/meansum_checkpoints_1329820_checkpoint_best.hypo

Keybase proof

I hereby claim:

  • I am hadyelsahar on github.
  • I am hady_elsahar (https://keybase.io/hady_elsahar) on keybase.
  • I have a public key ASC_Qlut6OL_a_Ak13lXRFoGl2c1ArMYlHOdPy-nYMQrtwo

To claim this, I am signing this object:

,ann_name,annotations,golden,nann,nfacts,original_sentence,sent_count,triples,true,interann
0,NosubSPOAligner,[47],True,47,1,Aritz Aduriz Zubeldia (born 11 February 1981) is a Spanish professional footballer who plays for Athletic Bilbao as a striker.,1,['Aritz Aduriz \tmember of sports team\t Athletic Bilbao'],[1],[1.0]
1,Simple-Aligner,"[5, 5, 5, 3, 5, 3, 5, 5]",False,5,8,"Max Riemelt (born in East Berlin, East Germany on 7 January 1984) is a German actor. He is best known for playing Wolfgang Bogdanow in the television series Sense8.",2,"['Max Riemelt \toccupation\t actor', 'Max Riemelt \tdate of birth\t 7 January 1984', 'East Berlin \tcountry\t East Germany', 'East Berlin \tcapital of\t East Germany', 'East Berlin \tlocated in the administrative territorial entity\t East Germany', 'East Germany \tcapital\t East Berlin', 'East Germany \tcontains administrative territorial entity\t East Berlin', 'Sense8 \tcast member\t Max Riemelt']","[1, 1, 1, 1, 1, 1, 1, 1]","[1.0, 1.0, 1.0, 0.6, 1.0, 0.6, 1.0, 1.0]"
2,No
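The `triples` column in the rows above stores a Python-style list of strings, each holding a subject, predicate, and object separated by `\t`. A minimal sketch of how such a cell could be parsed follows. It assumes the `\t` sequences are stored as literal backslash-t text, as they appear in the preview; the `parse_triples` helper is hypothetical, and the sample is just the header plus the first row shown above.

```python
import ast
import csv
import io

# Header and first data row copied from the preview above. The raw string
# keeps "\t" as literal backslash-t text (an assumption about the file).
sample_csv = r""",ann_name,annotations,golden,nann,nfacts,original_sentence,sent_count,triples,true,interann
0,NosubSPOAligner,[47],True,47,1,Aritz Aduriz Zubeldia (born 11 February 1981) is a Spanish professional footballer who plays for Athletic Bilbao as a striker.,1,['Aritz Aduriz \tmember of sports team\t Athletic Bilbao'],[1],[1.0]
"""


def parse_triples(cell):
    """Turn one `triples` cell into a list of (subject, predicate, object) tuples."""
    triples = []
    # literal_eval safely evaluates the list literal and decodes "\t" into tabs.
    for item in ast.literal_eval(cell):
        subject, predicate, obj = (part.strip() for part in item.split("\t"))
        triples.append((subject, predicate, obj))
    return triples


rows = list(csv.DictReader(io.StringIO(sample_csv)))
triples = parse_triples(rows[0]["triples"])
print(triples)
```

`ast.literal_eval` is used instead of `eval` so that only literals are evaluated, which is the safer choice for cells read from a file.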
hadyelsahar / gist:546b0733b09bbb6e14265b3bf562f6ad
Created April 14, 2018 12:19
Extracting text from Wikipedia English Dump
#!/usr/bin/env bash
# following the tutorial in https://blog.afterthedeadline.com/2009/12/04/generating-a-plain-text-corpus-from-wikipedia/
git clone https://github.com/bwbaugh/wikipedia-extractor
wget https://dumps.wikimedia.org/enwiki/latest/enwiki-latest-pages-articles.xml.bz2
mkdir extracted
# adding execution rights to wikiextractor
sudo chmod u+x ./wikipedia-extractor/WikiExtractor.py
Interparcel
Interparcel deliver any content - boxes, whole house or office belongings
Deliver to over 250 countries. Find out if your home country is included
They use known couriers like UPS, FedEx and Parcel Force to transport your goods
Quick online booking process. Enter the dimensions of your parcel and get a quote
Parcel Monkey
Parcel Monkey deliver parcels or boxes
wget https://www.dropbox.com/s/tohrsllcfy7rch4/SimpleQuestions_v2.tgz
tar -xvzf ./SimpleQuestions_v2.tgz
# number of unique predicates in the annotated SimpleQuestions dataset
# (fields are tab-separated, which is the default delimiter for cut)
cat ./SimpleQuestions_v2/annotated_fb_data_* | cut -f2 | sort | uniq | wc -l
# 1837
# number of unique predicates in FB5M (a subset of Freebase)
cat ./SimpleQuestions_v2/freebase-subsets/freebase-FB5M.txt | cut -f2 | sort | uniq | wc -l
# 7523
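The same count can be sketched in Python. This is a minimal illustration, assuming each line is tab-separated with the predicate in the second column; the `count_unique_predicates` helper and the sample rows are hypothetical and not taken from the dataset.

```python
from typing import Iterable


def count_unique_predicates(lines: Iterable[str]) -> int:
    """Count distinct values in the second tab-separated column."""
    predicates = set()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2:  # skip blank or malformed lines
            predicates.add(fields[1])
    return len(predicates)


# Hypothetical rows in subject<TAB>predicate<TAB>object<TAB>question layout.
sample = [
    "s1\tpred/a\to1\twhat is the first question\n",
    "s2\tpred/b\to2\twhat is the second question\n",
    "s3\tpred/a\to3\twhat is the third question\n",
]
print(count_unique_predicates(sample))
```

To run it on the real files, pass an open file handle, for example `count_unique_predicates(open(path, encoding="utf-8"))`, since the function only iterates over lines.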
@prefix ann: <http://triplr.dbpedia.org/resource/> .
@prefix wd: <http://www.wikidata.org/entity/> .
@prefix wdt: <http://www.wikidata.org/prop/direct/> .
@prefix nif: <http://ontology.neuinfo.org/NIF/Backend/nif_backend.owl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix itsrdf: <http://www.w3.org/2005/11/its/rdf#> .
@prefix ann: <http://triplr.dbpedia.org/resource/> .
@prefix wd: <http://www.wikidata.org/entity/> .
@prefix wdt: <http://www.wikidata.org/prop/direct/> .
#### MAIN PARAGRAPHS #####
wd:Q228?nif=context # repeated for every document (Q228 is the document.pageuri)
nif:beginIndex "0"^^xsd:nonNegativeInteger ;
nif:endIndex "1306"^^xsd:nonNegativeInteger ;
# make sure it's skipped
nif:isString """Andorra (/ænˈdɔːrə/; [ənˈdorə], [anˈdɔra]), officially the Principality of Andorra (Catalan: Principat d'Andorra), also called the Principality of the Valleys of Andorra (Catalan: Principat de les Valls d'Andorra), is a sovereign landlocked microstate in Southwestern Europe, located in the eastern Pyrenees mountains and bordered by Spain and France. Created under a charter in A.D. 988, the present Principality was formed in A.D. 1278. It is known as a principality as it is a monarchy headed by two Co-Princes – the Spanish/Roman Catholic Bishop of Urgell and the President
import argparse
import os

import numpy as np
import pandas as pd
from IPython.core.debugger import Tracer
from sklearn.preprocessing import normalize

# Calling debug_here() drops into the IPython debugger at that point.
debug_here = Tracer()