Florin Badita-Nistor baditaflorin

"user_username","article_url","image_count","post_tags","recommends","reading_time","title","text","link_count"
"neuroecology","https://medium.com/@neuroecology/punctuation-in-novels-8f316d542ec4","22","{Writing,Literature,""Data Visualization""}","2670","3.67641509433962","Punctuation in novels","","1"
"eklimcz","https://medium.com/truth-labs/designing-data-driven-interfaces-a75d62997631","14","{""Data Visualization"",""Design Thinking"",UX}","2660","7.83867924528302","Designing Data-Driven Interfaces","","2"
"quincylarson","https://medium.com/free-code-camp/the-economics-of-working-remotely-28d4173e16e2","5","{Tech,""Life Lessons"",""Data Science"",Travel,Startup}","2068","3.95786163522013","Fitter. Happier. More productive. Working remotely.","Travel the world as a digital nomad. Surf a new beach every morning. Eat a different local cuisine each night.
Or just stay home all day in your pajamas.
It doesn’t really matter. You can get your work done either way.
More than 10% of Americans now work remotely.
I’
---------------------------SELECT --------------------------------------------
select mps.user_username, -- 1st column
mps.article_url, -- 2nd column
mps.image_count, -- 3rd column
mps.post_tags, -- 4th column
mps.recommends, -- 5th column
mps.reading_time, -- 6th column
mps.title, -- 7th column
mpl.link_count, -- 8th column - we get this data from the left join, where we do a subquery
'' full_text, -- dummy 9th column
#!/bin/bash
echo 'Based on the work of Frederik Ramm https://lists.openstreetmap.org/pipermail/osmosis-dev/2013-October/001613.html'
CMDLINE=`
echo "--read-xml $1"
echo "--sort"
shift
while [[ $# -gt 0 ]]
do
echo "--read-xml $1"
echo "--sort"
select s.*,tag_name,title,mps.user_username,post_tags,article_url from (
SELECT post_id,ts_headline(text, keywords, 'MaxFragments=35,MaxWords=50,MinWords=6') as result
-- tweak the settings to reflect what you want; the text column is where I have the text
FROM medium_posts_text mptxt, plainto_tsquery('pg_catalog.english','training') as keywords
-- change 'training' to the word that you are searching for
WHERE to_tsvector(text) @@ keywords
) s
inner join medium_posts_tags mpt on mpt.post_id = s.post_id
inner join medium_posts_stats mps on mps.post_id = s.post_id
select * from (
select
regexp_split_to_table(lower(post_text), '\s+') as word
, count(1) as word_count
from
(select post_text from
"user_username","article_url","image_count","post_tags","recommends","reading_time","title","link_count"
"ageitgey","https://medium.com/@ageitgey/machine-learning-is-fun-80ea3ec3c471","12","{""Machine Learning""}","4132","14.0216981132075","Machine Learning is Fun!","12"
"tonyaub","https://medium.com/swlh/no-ui-is-the-new-ui-ab3f7ecec6b3","12","{Design,""Artificial Intelligence"",UI}","3666","7.7877358490566","No UI is the New UI","9"
"cdixon","https://medium.com/@cdixon/eleven-reasons-to-be-excited-about-the-future-of-technology-ef5f9b939cb2","32","{Technology,""Artificial Intelligence"",Future,Robotics,Space}","3658","11.1047169811321","Eleven Reasons To Be Excited About The Future of Technology","17"
"2noame","https://medium.com/basic-income/deep-learning-is-going-to-teach-us-all-the-lesson-of-our-lives-jobs-are-for-machines-7c6442e37a49","5","{""Artificial Intelligence"",""Machine Learning"",""Basic Income""}","3101","13.7276729559748","Deep Learning Is Going to Teach Us All the Lesson of Our Lives: Jobs
User Username Recommends
stevenlevy 12,636
ageitgey 9,605
cdixon 4,519
perborgen 4,215
tonyaub 3,666
olivercameron 3,552
2noame 3,151
GilFewster 2,608
intercom 2,270
baditaflorin / medium_top_1000_tags.csv
Created June 4, 2017 11:14
This is based on a scraping project that I did, where I downloaded the list of all the posts from medium.com https://medium.com/@baditaflorin
"tag_name","count_tag_name","avg_reading_time","avg_recommends","avg_image_count","distinct_users","avg_post_data"
"Startup","134323","2.71552486545965","13.9311510314689219","1.7488739828622053","61152","2016-04-14 18:21:12.174416+03"
"Life","104197","1.78386182016248","9.0952714569517357","0.96809888960334750521","50767","2016-05-15 04:42:14.355121+03"
"Politics","99301","3.14315696061383","7.8097400831814383","1.2824543559480771","44825","2016-06-28 05:55:29.552608+03"
"Entrepreneurship","94911","2.79053337648529","14.2673241247063038","1.5481977852935908","43454","2016-04-19 22:28:35.956523+03"
"Life Lessons","94414","2.40382926434131","13.9626114771114453","1.1250450145105599","45045","2016-06-02 19:01:39.876276+03"
"Travel","80332","2.97209768940031","3.3994672110740427","4.3289224717422696","35644","2016-04-19 07:09:28.578+03"
"Design","75555","2.80802184122601","24.1416848653298921","3.5368142412811859","36471","2016-04-08 01:06:35.247556+03"
"Education","68855","2.74605567601954","6.1865369254229903"
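A minimal sketch of how a tag CSV like the one above could be loaded and ranked by `avg_recommends`. The sample rows are copied from the preview. The helper name `load_tag_rows` and the choice to skip rows whose column count does not match the header (such as the cut-off "Education" row) are assumptions made for illustration, not part of the original gist.

```python
import csv
import io

# Header and rows copied from the preview above. In practice you would open
# the CSV file instead of using an in-memory string.
SAMPLE = '''"tag_name","count_tag_name","avg_reading_time","avg_recommends","avg_image_count","distinct_users","avg_post_data"
"Startup","134323","2.71552486545965","13.9311510314689219","1.7488739828622053","61152","2016-04-14 18:21:12.174416+03"
"Design","75555","2.80802184122601","24.1416848653298921","3.5368142412811859","36471","2016-04-08 01:06:35.247556+03"
"Education","68855","2.74605567601954","6.1865369254229903"
'''


def load_tag_rows(handle):
    """Hypothetical helper: return (rows, skipped) for a tag CSV.

    Rows whose column count differs from the header are counted and skipped
    rather than guessed at.
    """
    reader = csv.reader(handle)
    header = next(reader)
    rows, skipped = [], 0
    for fields in reader:
        if len(fields) != len(header):
            skipped += 1
            continue
        rows.append(dict(zip(header, fields)))
    return rows, skipped


rows, skipped = load_tag_rows(io.StringIO(SAMPLE))

# Rank the complete rows by average recommends, highest first.
ranked = sorted(rows, key=lambda r: float(r["avg_recommends"]), reverse=True)
for r in ranked:
    print(r["tag_name"], round(float(r["avg_recommends"]), 2))
print("skipped rows:", skipped)
```

Skipping short rows keeps the ranking honest: a truncated row has no reliable values for the missing columns, so it is reported instead of being padded.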
# -*- coding: utf-8 -*-
import requests
from bs4 import BeautifulSoup
page = requests.get("http://www.socialbakers.com/statistics/facebook/pages/detail/1196562180359709-leffe-italia")
soup = BeautifulSoup(page.content, 'html.parser')
if page.status_code == 200:
    print("response ok")
#print(soup.prettify())
fb_account_description = soup.find('div', class_='account').get_text()
clang++ -g -O3 -Wall -Wextra -pedantic `getconf LFS_CFLAGS` -I/usr/include/postgresql/ -I/usr/include/libxml2/ -DOSMIUM_WITH_GEOS -o osm-history-importer importer.cpp -lexpat -lpq -lproj -lz -lprotobuf-lite -losmpbf -lpthread -lgeos
In file included from importer.cpp:17:
In file included from /usr/include/osmium.hpp:26:
In file included from /usr/include/osmium/input/pbf.hpp:33:
In file included from /usr/local/include/osmpbf/osmpbf.h:8:
/usr/local/include/osmpbf/osmformat.pb.h:1814:34: error: allocating an object of abstract class type '::OSMPBF::HeaderBBox'
if (bbox_ == NULL) bbox_ = new ::OSMPBF::HeaderBBox;
^
/usr/local/include/google/protobuf/message_lite.h:249:18: note: unimplemented pure virtual method 'ByteSizeLong' in 'HeaderBBox'
virtual size_t ByteSizeLong() const = 0;