@embano1
Last active August 23, 2020 17:01
jq to filter YouTube videos from Twitter likes (uses "tw")
# uses https://github.com/embano1/tw; the JSON output is stored in a file for faster processing
$ tw -f tw.json likes >Downloads/twout.json

# filter YouTube videos only and print the tweet text
$ jq '.[]|select( .entities.urls[].expanded_url | contains ("yout"))|.full_text,"----"' Downloads/twout.json
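
A quick way to try this filter without calling the Twitter API is to pipe in a tiny hand-made sample. The sample below is hypothetical: it only assumes the shape the filters in this gist rely on (an array of tweets with `.full_text` and `.entities.urls[].expanded_url`), not the full output of `tw`. `-r` prints raw strings instead of JSON-quoted ones.

```shell
# Hypothetical sample mimicking the assumed shape of the tw output:
# an array of tweets, each with .full_text and .entities.urls[].expanded_url
sample='[
  {"full_text": "Great talk", "entities": {"urls": [{"expanded_url": "https://www.youtube.com/watch?v=abc"}]}},
  {"full_text": "A blog post", "entities": {"urls": [{"expanded_url": "https://example.com/post"}]}},
  {"full_text": "No links here", "entities": {"urls": []}}
]'

# Same filter as above; -r prints raw strings instead of JSON-quoted ones
result=$(printf '%s' "$sample" | jq -r '.[] | select(.entities.urls[].expanded_url | contains("yout")) | .full_text, "----"')
echo "$result"
```

Tweets without any URL are dropped silently, because `.entities.urls[]` then produces nothing for `select` to test.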

embano1 commented Jun 23, 2018

Filtering with regular expressions (the second example uses lookahead syntax):

# filter tweets for Go-related content
$ jq '.[]| select( .full_text | match("(Go|Golang)\\s+";"i"))|.full_text' Downloads/twout.json

# filter tweets for Go-related content and best practices (both terms, in any order)
$ jq '.[]| select( .full_text | match("^(?=.*(Go\\s+|golang))(?=.*(production|best)).+";"i"))|.full_text' Downloads/twout.json
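
The lookahead version can be sanity-checked on a small sample. This sketch swaps `match` for jq's `test`, which takes the same regex and flags but returns a plain boolean, so it reads naturally inside `select`. The sample tweets are made up.

```shell
# Hypothetical sample tweets
sample='[
  {"full_text": "Go best practices for production"},
  {"full_text": "Production tips for golang services"},
  {"full_text": "Go is fun"},
  {"full_text": "Best pizza in town"}
]'
# Both lookaheads must succeed, in any order; "i" makes the match case-insensitive
filter='.[] | select(.full_text | test("^(?=.*(Go\\s+|golang))(?=.*(production|best)).+"; "i")) | .full_text'
result=$(printf '%s' "$sample" | jq -r "$filter")
echo "$result"
```

"Go is fun" fails the second lookahead and "Best pizza in town" fails the first, so only the first two tweets are printed.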


embano1 commented Jun 28, 2018

Transform the JSON stream into an easier-to-consume array

# Create new array with specific fields
$ jq '[.[]|{user: .user.screen_name,name: .user.name, text: .full_text, url: [.entities.urls[].expanded_url]}]' Downloads/twout.json

[
  {
    "user": "the_sttts",
    "name": "Stefan Schimanski",
    "text": "An awesome comparison of 2nd gen options to build @kubernetesio controllers: kubebuilder, operator sdk and metacontroller by @admiraltyio https://t.co/UDb0G0NXpg",
    "url": [
      "https://admiralty.io/kubernetes-custom-resource-controller-and-operator-development-tools.html"
    ]
  },
  {
    "user": "heidiann360",
    "name": "Heidi Howard",
    "text": "I am incredibly excited to see this paper from #SIGMOD18 on implementing Flexible Paxos for geo-distributed consensus. It's brilliant to see the theory from my PhD being put into practice by others.  \nhttps://t.co/19QA0ib9qn",
    "url": [
      "https://dl.acm.org/citation.cfm?id=3196928"
    ]
  },
  {
    "user": "copyconstruct",
    "name": "Cindy Sridharan",
    "text": "Google has a styleguide for bash. https://t.co/H2K4gZSQXj\n\n- If you find you need to use arrays for anything more than assignment of ${PIPESTATUS}, you should use Python. \n- If you are writing a script that is more than 100 lines long, you should probably be writing it in Python.",
    "url": [
      "https://google.github.io/styleguide/shell.xml"
    ]
  },
...
]
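
In jq, `map(f)` is shorthand for `[.[] | f]`, so the projection above can also be written with `map`. A sketch on a one-tweet hypothetical sample; `-c` prints compact JSON on a single line.

```shell
# Hypothetical one-tweet sample with the fields the projection reads
sample='[{"full_text": "Hello", "user": {"screen_name": "alice", "name": "Alice"}, "entities": {"urls": [{"expanded_url": "https://example.com/a"}]}}]'
# map(f) is equivalent to [.[] | f]; -c prints compact JSON on one line
result=$(printf '%s' "$sample" | jq -c 'map({user: .user.screen_name, name: .user.name, text: .full_text, url: [.entities.urls[].expanded_url]})')
echo "$result"
```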
# select only the tweets from user "embano1"
$ jq '[.[]|{user: .user.screen_name,name: .user.name, text: .full_text, url: [.entities.urls[].expanded_url]}]|.[]|select(.user == "embano1")' Downloads/twout.json

{
  "user": "embano1",
  "name": "Michael Gasch",
  "text": "Exploring Prometheus Go client metrics https://t.co/ifPAitkPKA",
  "url": [
    "https://povilasv.me/prometheus-go-metrics/"
  ]
}
{
  "user": "embano1",
  "name": "Michael Gasch",
  "text": "Naming conventions for Go functions, variables, packages, receivers, etc. https://t.co/jLjhTDSW9b",
  "url": [
    "https://twitter.com/davecheney/status/998530498417704963"
  ]
}
...
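
Filtering before projecting keeps the pipeline shorter, and `--arg` passes the screen name in from the shell instead of hard-coding it in the filter. A sketch on hypothetical sample data; the `user` variable name is an arbitrary choice.

```shell
# Hypothetical sample: one tweet from embano1, one from someone else
sample='[
  {"full_text": "First", "user": {"screen_name": "embano1", "name": "Michael Gasch"}},
  {"full_text": "Second", "user": {"screen_name": "someone_else", "name": "Someone"}}
]'
user=embano1
# --arg binds the shell value to the jq variable $user (always as a string);
# the filter is single-quoted so the shell does not expand $user itself
result=$(printf '%s' "$sample" | jq -c --arg user "$user" '.[] | select(.user.screen_name == $user) | {user: .user.screen_name, text: .full_text}')
echo "$result"
```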


embano1 commented Jun 28, 2018

Better version of the YouTube filter

# filter YouTube videos only and print the tweet text and link(s)
$ jq '[.[]|select( .entities.urls[].expanded_url | contains ("youtube"))|{text: .full_text, url: [.entities.urls[].expanded_url]}]' Downloads/twout.json

[
  {
    "text": "\"Google Production Environment\"\n\nAn intro to the infra that allows running processes reliably and scalably in data centers, as well as the development and build infrastructure that enables engineers to develop, deploy, and run large-scale services.\n\nhttps://t.co/W8nkNJlfPM",
    "url": [
      "https://www.youtube.com/watch?v=dhTVVWzpc4Q"
    ]
  },
  {
    "text": "Great talk by @christianposta on evolution from Netflix OSS stack to using Service Mesh #servicemesh #microservices #istio #envoy https://t.co/EhWi7y3w0e",
    "url": [
      "https://m.youtube.com/watch?v=WaD0SBb13AU"
    ]
  },
...
]
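
One caveat with this filter: `.entities.urls[]` is a generator, and `select` emits its input once for every truthy result, so a tweet with two YouTube links shows up twice. jq's `any(generator; condition)` collapses that to a single boolean per tweet. Also note that `contains("youtube")` no longer matches `youtu.be` short links, which the earlier `"yout"` filter did catch. A sketch with hypothetical sample data:

```shell
# Hypothetical sample: one tweet with two YouTube links, one without any
sample='[
  {"full_text": "Two talks", "entities": {"urls": [
    {"expanded_url": "https://www.youtube.com/watch?v=aaa"},
    {"expanded_url": "https://www.youtube.com/watch?v=bbb"}]}},
  {"full_text": "A blog post", "entities": {"urls": [{"expanded_url": "https://example.com/post"}]}}
]'
# select emits the tweet once per matching URL, so "Two talks" appears twice
dupes=$(printf '%s' "$sample" | jq -c '[.[] | select(.entities.urls[].expanded_url | contains("youtube")) | .full_text]')
# any(generator; condition) yields a single true/false per tweet
once=$(printf '%s' "$sample" | jq -c '[.[] | select(any(.entities.urls[].expanded_url; contains("youtube"))) | .full_text]')
echo "$dupes"
echo "$once"
```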


embano1 commented Jul 6, 2018

Blog post examples:

#1 Only print Tweets containing "String" (ignore case)

$ jq '.[]|select(.full_text|match("handy";"i"))|.full_text' tweets.json
(...)
"[New Post] More Handy CLI Tools for JSON: https://t.co/CxAWaJbfI5"
"\"Choosing an HTTP Status Code\"\n\nA handy flowchart for figuring out which one applies in your situation.\n\n(Sadly no paths lead to 418...  )\n\nhttps://t.co/utMDLotwJx"
(...)
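
`match` returns a match object (offset, length, string, captures) or nothing, and `select` only needs something truthy; `test` with the same regex and flags returns a plain boolean instead. A small sketch with made-up input showing both:

```shell
# Hypothetical sample tweets
sample='[{"full_text": "More Handy CLI Tools"}, {"full_text": "Nothing relevant"}]'
# test returns true/false, which is all select needs
result=$(printf '%s' "$sample" | jq -r '.[] | select(.full_text | test("handy"; "i")) | .full_text')
# match returns an object; here only its offset and matched string are kept
m=$(printf '%s' '"More Handy"' | jq -c 'match("handy"; "i") | {offset, string}')
echo "$result"
echo "$m"
```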

#2 Only find Tweets with these "two" "Words" (ignore case and order)

$ jq '.[]| select( .full_text | match("(?=.*(\\bGo\\b|golang))(?=.*(production|practice|idiomatic))";"i"))|.full_text' tweets.json
(...)
"How I Structure Production Grade REST APIs in Go: https://t.co/UGf25MuV9P (The initial post in the series, it focuses on application structure and routing.)"
"Building Scalable Web Services in Go: https://t.co/2Ul9ibqopd (A few best practices aggregated from around the world of Go.)"
"List of articles discussing \"Idiomatic Go\"\n\nhttps://t.co/LWUvxxmTCc\n\n#golang"
(...)
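
The lookahead pattern is one way to express "both words, in any order". An alternative that may be easier to read is two `test` calls joined with `and`; the `\b` word boundaries keep words like "Good" from matching `Go`. A sketch on hypothetical tweets:

```shell
# Hypothetical sample tweets
sample='[
  {"full_text": "How I structure production REST APIs in Go"},
  {"full_text": "Idiomatic golang error handling"},
  {"full_text": "Good morning, production team"},
  {"full_text": "Go 1.11 release notes"}
]'
# Two independent case-insensitive tests joined with "and" (order does not matter);
# \b word boundaries stop "Good" from counting as "Go"
filter='.[] | select(.full_text | test("\\bGo\\b|golang"; "i") and test("production|practice|idiomatic"; "i")) | .full_text'
result=$(printf '%s' "$sample" | jq -r "$filter")
echo "$result"
```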

#3 Only print "Field(s)" we're interested in

$ jq '[.[]|{user: .user.screen_name,name: .user.name, text: .full_text, url: [.entities.urls[].expanded_url]}]' tweets.json
(...)
[
  {
    "user": "timoreimann",
    "name": "Timo Reimann",
    "text": "@the_sttts @TheNikhita https://t.co/XVdwwzJxXX is comprehensive, though it might be a bit overwhelming depending on how much you know already.\n\nGoogle's style guide does a good job to explain bash behavior: https://t.co/z4sfr0oiKj\n\nFinally, enabling shellcheck is a great way to learn while scripting.",
    "url": [
      "https://mywiki.wooledge.org/BashFAQ",
      "https://google.github.io/styleguide/shell.xml"
    ]
  },
  {
    "user": "vCabbage",
    "name": "Kale Blankenship",
    "text": "@copyconstruct https://t.co/lziacf4S4q works pretty well for this.",
    "url": [
      "https://github.com/fortytw2/leaktest"
    ]
  },
(...)

#4 Filter for Tweets with Links to YouTube videos

$ jq '[.[]|select( .entities.urls[].expanded_url | contains ("youtube"))|{text: .full_text, url: [.entities.urls[].expanded_url]}]' tweets.json
(...)
  {
    "text": "\"Google Production Environment\"\n\nAn intro to the infra that allows running processes reliably and scalably in data centers, as well as the development and build infrastructure that enables engineers to develop, deploy, and run large-scale services.\n\nhttps://t.co/W8nkNJlfPM",
    "url": [
      "https://www.youtube.com/watch?v=dhTVVWzpc4Q"
    ]
  },
  {
    "text": "And my talk from yesterday about @kubernetes as an API driven platform – API concepts, CRDs and controllers – from our Reykjavík Kubernetes Meetup https://t.co/cLg5EFDDPD",
    "url": [
      "https://www.youtube.com/watch?v=BiE7oKeEzDU"
    ]
  },
(...)


embano1 commented Sep 29, 2018

Print the .full_text of the first 10 tweets in the array (the most recent ones, if the list is ordered newest first)

$ jq '.[:10]|.[].full_text' Downloads/twout.json
...
# the same, with a "----" separator after each tweet
$ jq '.[:10]|.[]|.full_text,"----"' Downloads/twout.json
...
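
Note that `.[:10]` takes the first 10 elements of the array and `.[-10:]` the last 10. Which of those are the most recent depends on how the array is ordered (Twitter timelines are usually returned newest first, but it is worth checking your own `tw` output). `limit(10; .[])` is a streaming equivalent of `.[:10] | .[]`. A sketch on a stand-in array of numbers:

```shell
# Stand-in for a tweet array: 12 numbers instead of 12 tweet objects
sample='[1,2,3,4,5,6,7,8,9,10,11,12]'
first=$(printf '%s' "$sample" | jq -c '.[:10]')               # first 10 elements
last=$(printf '%s' "$sample" | jq -c '.[-10:]')               # last 10 elements
streamed=$(printf '%s' "$sample" | jq -c '[limit(10; .[])]')  # first 10, without slicing
echo "$first"
echo "$last"
echo "$streamed"
```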
