@adulau
Last active July 7, 2020 14:13
Tweet analysis.md

Tweet analysis

Issues Unicode spaces

Tweets are Unicode text written in many different languages. You might want to convert all the different kinds of spaces into a single type of space.

If you are curious about all the different kinds of spaces in Unicode, you might want to read Unicode spaces.

 CHARS=$(printf "%b" "\U00A0\U1680\U180E\U2000\U2001\U2002\U2003\U2004\U2005\U2006\U2007\U2008\U2009\U200A\U200B\U202F\U205F\U3000\UFEFF")
 OUTPUT | sed -e 's/['"$CHARS"']/ /g' | INPUT
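
The bracket expression above only works as intended in a UTF-8 locale. As a hedged alternative, here is a minimal sketch that matches the UTF-8 byte sequences of the same code points from the CHARS list, so the result does not depend on the caller's locale. The function name `normalize_spaces` is hypothetical, and the sketch assumes GNU sed (for `\xHH` escapes) and UTF-8 encoded input.

```shell
# Sketch: replace Unicode space-like characters with a plain ASCII space.
# normalize_spaces is a hypothetical helper name (not from the original notes).
# Assumptions: GNU sed (\xHH escapes) and UTF-8 encoded input on stdin.
normalize_spaces() {
  # LC_ALL=C makes sed work on raw bytes. Each alternative is the UTF-8
  # encoding of one code point (or a contiguous range) from the CHARS list:
  #   U+00A0, U+1680, U+180E, U+2000..U+200B, U+202F, U+205F, U+3000, U+FEFF
  LC_ALL=C sed -E \
    -e 's/\xc2\xa0|\xe1\x9a\x80|\xe1\xa0\x8e|\xe2\x80[\x80-\x8b]|\xe2\x80\xaf|\xe2\x81\x9f|\xe3\x80\x80|\xef\xbb\xbf/ /g'
}

# Usage (the file name is only an example):
#   normalize_spaces < coronavirus_telegram.txt
```

Other multi-byte characters (accented letters, emoji, non-Latin scripts) pass through unchanged because only the listed byte sequences are rewritten.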

Extracting tweets containing links to Telegram, testing the links and getting metadata

grep -Po 't\.me/(.)+ ' coronavirus_telegram.txt | sed -e 's/['"$CHARS"']/ /g' | awk '{print $1}' | parallel 'curl -L -s -f "{1}" | grep -E "(og:title|<Title>|al:ios:url|twitter:description|tgme_page_extra|tgme_page_description)"'

Sample output:

<meta property="og:title" content="⚠️Coronavirus, día a día🔎 🦠">
<meta property="al:ios:url" content="tg://resolve?domain=coronavirusxeluniversal">
<meta name="twitter:description" content="Información de la pandemia confirmada por EL UNIVERSAL de México
<meta property="og:title" content="CoronaCoin Airdrop">
<meta property="al:ios:url" content="tg://resolve?domain=CoronaCoinAirdrop_Bot&amp;start=r0708463461">
<meta name="twitter:description" content="You can contact @CoronaCoinAirdrop_Bot right away.
<meta property="og:title" content="The Bread Box General Chat">
<meta property="al:ios:url" content="tg://resolve?domain=breadboxalerts">
<meta name="twitter:description" content="Premium Stock, Options, Futures and OTC Trading Group. Private chat for alerts. Years of trading experience goes behind every alert. 1 week trial available for &#036;5&#33; PM @BreadBoxBot to join our premium alert and education rooms&#33; Results speak for themself&#33;
<meta property="og:title" content="Join group chat on Telegram">
<meta property="al:ios:url" content="tg://join?invite=AAAAAE5ouN5hBtrRLmDb0w">
<meta name="twitter:description" content="
<meta property="og:title" content="⚠️Coronavirus, día a día🔎 🦠">
<meta property="al:ios:url" content="tg://resolve?domain=coronavirusxeluniversal">
<meta name="twitter:description" content="Información de la pandemia confirmada por EL UNIVERSAL de México
<meta property="og:title" content="The Bread Box General Chat">
<meta property="al:ios:url" content="tg://resolve?domain=breadboxalerts">
<meta name="twitter:description" content="Premium Stock, Options, Futures and OTC Trading Group. Private chat for alerts. Years of trading experience goes behind every alert. 1 week trial available for &#036;5&#33; PM @BreadBoxBot to join our premium alert and education rooms&#33; Results speak for themself&#33;
<meta property="og:title" content="Join group chat on Telegram">
<meta property="al:ios:url" content="tg://join?invite=AAAAAE5ouN5hBtrRLmDb0w">
<meta name="twitter:description" content="
<meta property="og:title" content="CoronaCoin Airdrop">
<meta property="al:ios:url" content="tg://resolve?domain=CoronaCoinAirdrop_Bot&amp;start=r0708463461">
<meta name="twitter:description" content="You can contact @CoronaCoinAirdrop_Bot right away.
<meta property="og:title" content="Público">
<meta property="al:ios:url" content="tg://resolve?domain=publico_es">
<meta name="twitter:description" content="Canal oficial de Público
<meta property="og:title" content="Human.uz | Stay Home">
<meta property="al:ios:url" content="tg://resolve?domain=humanuz_english">
<meta name="twitter:description" content="You can view and join @humanuz_english right away.
<meta property="og:title" content="Coronavirus Makers">
<meta property="al:ios:url" content="tg://resolve?domain=coronavirus_makers">
<meta name="twitter:description" content="LEE EL MENSAJE ANCLADO, NO HAGAS RUIDO 😘		Bot de ayuda @coronavirusmakers_bot		Canal de anuncios (solo noticias importantes y resúmenes diarios, unas 5 al día) : 	@CV19Makers_Anuncios	Web: https://www.coronavirusmakers.org/
<meta property="og:title" content="Envato Leaks | Download files from Envato elements, 123rf etc">
<meta property="al:ios:url" content="tg://resolve?domain=envatoleaks&amp;post=77">
<meta name="twitter:description" content="🔐 Category: #AfterEffectsTemplates
<meta property="og:title" content="پزشک آنلاین">
<meta property="al:ios:url" content="tg://resolve?domain=doctorsonline_giam&amp;post=609">
<meta name="twitter:description" content="🔴یک مقام ارشد سازمان جهانی بهداشت در جریان کنفرانس خبری این نهاد می‌گوید بر پایه مطالعات موجود، اکثر موارد انتقال ویروس کرونا از طریق افرادی صورت می‌گیرد که علائم بیماری را نشان داده‌اند، و نه افرادی که هنوز هیچ علامتی ندارند.
<meta property="og:title" content="elgarrotxi.cat">
<meta property="al:ios:url" content="tg://resolve?domain=elgarrotxicat">
<meta name="twitter:description" content="El digital ďinformació de la comarca de la Garrotxa
<meta property="og:title" content="Canal Ajuntament de Castelldefels">
<meta property="al:ios:url" content="tg://resolve?domain=AjCastelldefels">
<meta name="twitter:description" content="Canal de difusió de Telegram de l&#39;Ajuntament de Castelldefels.
<meta property="og:title" content="Público">
<meta property="al:ios:url" content="tg://resolve?domain=publico_es">
<meta name="twitter:description" content="Canal oficial de Público
<meta property="og:title" content="⚠️Coronavirus, día a día🔎 🦠">
<meta property="al:ios:url" content="tg://resolve?domain=coronavirusxeluniversal">
<meta name="twitter:description" content="Información de la pandemia confirmada por EL UNIVERSAL de México
<meta property="og:title" content="Money heist season 4 in English">
<meta property="al:ios:url" content="tg://resolve?domain=MoneyHeist2020&amp;post=145">
<meta name="twitter:description" content="🎞️ Money Heist || Season 04 Episode 01 || Game Over || 720p || English || Netflix originals
<meta property="og:title" content="Join group chat on Telegram">
<meta property="al:ios:url" content="tg://join?invite=AAAAAEHYZWD8rBbk7xiutQ">
<meta name="twitter:description" content="
<meta property="og:title" content="RDN Digital">
<meta property="al:ios:url" content="tg://resolve?domain=RDNDigital">
<meta name="twitter:description" content="Canal Oficial de RDN Digital | Medio de Comunicación Plural • Veraz • Alternativo | www.rdndigital.com		Director: @FranciscoMunyoz
<meta property="og:title" content="Coronavirus Info">
<meta property="al:ios:url" content="tg://resolve?domain=corona">
<meta name="twitter:description" content="A list of official channels with information on COVID-19.
<meta property="og:title" content="📶 ایران آزادی - اینفوگرافی">
<meta property="al:ios:url" content="tg://resolve?domain=iranazadinfo">
<meta name="twitter:description" content="📶 اینفوگرافی ایران آزادی		این کانال، زیر مجموعه سایت ایران آزادی است. 	https://fa.iranfreedom.org/	https://www.instagram.com/iran_azadi_e	https://t.me/IranAzadie	https://twitter.com/iranazadi1395	https://www.facebook.com/IranAzadie
<meta property="og:title" content="Coronavirus Info">
<meta property="al:ios:url" content="tg://resolve?domain=corona">
<meta name="twitter:description" content="A list of official channels with information on COVID-19.
<meta property="og:title" content="Waiting Room&#33;">
<meta property="al:ios:url" content="tg://join?invite=AAAAAE13A0gh9YT7-oQNqw">
<meta name="twitter:description" content="This is the waiting room to be accept in the VIP TIPS for @underdogtips. We will contact everyone individually&#33;
<meta property="og:title" content="L&#39;Apòstrof Cooperativa de comunicació">
<meta property="al:ios:url" content="tg://resolve?domain=apostrof_coop">
<meta name="twitter:description" content="Cooperativa de comunicació social i creativa. Horari 9:30-13:30
<meta property="og:title" content="CORONAVIRUS | КОРОНАВИРУС | ЭКСТРЕННО">
<meta property="al:ios:url" content="tg://resolve?domain=coronavirusalarm&amp;post=3851">
<meta name="twitter:description" content="‼️КАРТОННЫЕ КОРОБКИ ВМЕСТО ГРОБОВ
<meta property="og:title" content="Buytex">
<meta property="al:ios:url" content="tg://resolve?domain=buytexchange&amp;post=54">
<meta name="twitter:description" content="COVID-19 Is Changing Bitcoin Usage
<meta property="og:title" content="Corona Turkey #EvdeKal">
<meta property="al:ios:url" content="tg://resolve?domain=turkeycoronavirus">
<meta name="twitter:description" content="Bu grup Coronavirus ile ilgili gerçek bilgileri içerir. Asılsız haber paylaşımı yapılmamaktadır.
<meta property="og:title" content="Público">
<meta property="al:ios:url" content="tg://resolve?domain=publico_es">
<meta name="twitter:description" content="Canal oficial de Público
<meta property="og:title" content="Coronavirus Turkiye Son Dakika 🚨">
<meta property="al:ios:url" content="tg://resolve?domain=coronavirussd">
<meta name="twitter:description" content="Corona Virüsü ile İlgili Son Dakika Haberleri ve Gelişmeleri ☢️	Son gelişmelere en hızlı ulaşmak için bildirimleri aç 🚨	Reklam ve işbirliği için: @HawkAswad
<meta property="og:title" content="Naijacityscapesblog">
<meta property="al:ios:url" content="tg://resolve?domain=naijacityscapesblog">
<meta name="twitter:description" content="The official Telegram Chanel of Naijacityscapes	No.1 blog for Educational, inspirational, relevant, and lifestyle contents
<meta property="og:title" content="Help42500">
<meta property="al:ios:url" content="tg://resolve?domain=help42500bot">
<meta name="twitter:description" content="Назначение социальной выплаты на период чрезвычайного положения
<meta property="og:title" content="⚠️Coronavirus, día a día🔎 🦠">
<meta property="al:ios:url" content="tg://resolve?domain=coronavirusxeluniversal">
<meta name="twitter:description" content="Información de la pandemia confirmada por EL UNIVERSAL de México
<meta property="og:title" content="Human.uz | Stay Home">
<meta property="al:ios:url" content="tg://resolve?domain=humanuz_english">
<meta name="twitter:description" content="You can view and join @humanuz_english right away.
<meta property="og:title" content="AtumusDoctor">
<meta property="al:ios:url" content="tg://resolve?domain=AtumusDoctorBot">
<meta name="twitter:description" content="Plataforma de carácter informativo en permanente actualización de cifras estadísticas.
<meta property="og:title" content="UkrAgroConsult">
<meta property="al:ios:url" content="tg://resolve?domain=UkrAgroConsult">
<meta name="twitter:description" content="Black Sea ag market news &amp; analytics
<meta property="og:title" content="Público">
<meta property="al:ios:url" content="tg://resolve?domain=publico_es">
<meta name="twitter:description" content="Canal oficial de Público
<meta property="og:title" content="Ajuntament de Canet de Mar">
<meta property="al:ios:url" content="tg://resolve?domain=ajcanetdemar">
<meta name="twitter:description" content="Canal oficial d’informació de l’Ajuntament de Canet de Mar
<meta property="og:title" content="M.R. Research (Intraday)">
<meta property="al:ios:url" content="tg://resolve?domain=mr_research">
<meta name="twitter:description" content="Disclaimer: We&#39;r not SEBI registered analyst. Please consult your financial advisor prior taking any trades with our recommendations.		YouTube:	https://www.youtube.com/channel/UCOSMrlTqD6DgB9yhhamgvNQ		Facebook:	https://www.facebook.com/mrresearch2328
<meta property="og:title" content="Facebookaccuntsbot">
<meta property="al:ios:url" content="tg://resolve?domain=Facebookaccuntsbot&amp;start=4138">
<meta name="twitter:description" content="You can contact @Facebookaccuntsbot right away.
<meta property="og:title" content="Coronavirus Romania">
<meta property="al:ios:url" content="tg://resolve?domain=coronavirus_romania">
<meta name="twitter:description" content="Bine ai venit in carantina&#33;		Fii amabil și respectuos.	Propagatorii de conspirații primesc Mute.		Alerte din surse oficiale aici: https://t.me/covid2019ro		Pentru informații oficiale despre situațiile legate de SarsCov2 / Covid19 TelVerde 0800800358
<meta property="og:title" content="Público">
<meta property="al:ios:url" content="tg://resolve?domain=publico_es">
<meta name="twitter:description" content="Canal oficial de Público
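
The same channel shows up several times in the output because the same link appears in several tweets. A minimal sketch for counting how often each Telegram target appears, assuming the filtered `<meta>` lines are available on stdin, could look like this. The function name `count_targets` is hypothetical.

```shell
# Sketch: count how often each Telegram target appears in the filtered output.
# count_targets is a hypothetical helper name (not from the original notes).
# It reads the <meta ...> lines on stdin.
count_targets() {
  # Keep only the al:ios:url lines and print the value that follows
  # "domain=" or "invite=", stopping at the first & or closing quote.
  # Then count duplicates, most frequent first.
  sed -n 's|.*property="al:ios:url" content="tg://[a-z]*?[a-z]*=\([^&"]*\).*|\1|p' \
    | sort | uniq -c | sort -rn
}

# Usage (meta.txt is an example file holding the output shown above):
#   count_targets < meta.txt
```

Lines such as `og:title` and `twitter:description` are ignored, so the count is based only on the resolved `tg://` target.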

Extracting a time series of when a Twitter account posts tweets

cat x.json | jq .created_at | awk '{ print strftime("%H", $0/1000); }' | sort | uniq -c

     10 00
     10 01
      2 02
      2 09
      8 10
      4 13
      4 14
      2 15
      2 16
      6 18
      8 19
      6 20
     10 22
      8 23
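
The command above relies on `strftime`, which is a gawk extension and formats the hour in the local timezone. As a hedged alternative, here is a minimal sketch that computes the UTC hour with plain awk arithmetic. It assumes, like the command above, that `created_at` is an epoch timestamp in milliseconds, one per line. The function name `hour_histogram` is hypothetical.

```shell
# Sketch: tweets per hour of day (UTC) from epoch-millisecond timestamps.
# hour_histogram is a hypothetical helper name (not from the original notes).
# Assumption: stdin holds one integer timestamp in milliseconds per line.
hour_histogram() {
  # $1 / 1000 is seconds since the epoch; dividing by 3600 gives whole hours
  # since the epoch, and the remainder modulo 24 is the UTC hour of day.
  awk '{ printf "%02d\n", int($1 / 1000 / 3600) % 24 }' | sort | uniq -c
}

# Usage:
#   jq .created_at x.json | hour_histogram
```

Pinning the result to UTC keeps the histogram comparable across machines. The counts will differ from the `strftime` version whenever the local timezone is not UTC.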