
@embano1 embano1/README.MD
Last active Sep 29, 2018

jq to filter youtube videos from Twitter likes (uses "tw")
# uses https://github.com/embano1/tw, JSON output stored in file for faster processing
$ tw -f tw.json likes >Downloads/twout.json

# filter only youtube videos and print tweet text
$ jq '.[]|select(.entities.urls[].expanded_url | contains("yout"))|.full_text,"----"' Downloads/twout.json
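
One thing to watch with the filter above: `select` runs once per URL, so a tweet that links to two matching URLs is printed twice. A minimal sketch of one way around that, using jq's `any(generator; condition)`. The sample file and its contents are made up for illustration and are not real `tw` output.

```shell
# Hypothetical sample data: the first tweet links to two YouTube URLs.
sample=$(mktemp)
cat > "$sample" <<'EOF'
[
  {"full_text": "two videos",
   "entities": {"urls": [{"expanded_url": "https://www.youtube.com/watch?v=a"},
                         {"expanded_url": "https://youtu.be/b"}]}},
  {"full_text": "no video",
   "entities": {"urls": [{"expanded_url": "https://example.com/post"}]}}
]
EOF

# Filter in the style of the gist: emits the tweet once per matching URL.
twice=$(jq -r '.[] | select(.entities.urls[].expanded_url | contains("yout")) | .full_text' "$sample")

# any(...) yields a single true/false per tweet, so each tweet appears at most once.
once=$(jq -r '.[] | select(any(.entities.urls[]; .expanded_url | contains("yout"))) | .full_text' "$sample")

echo "per URL:   $(printf '%s\n' "$twice" | grep -c 'two videos') line(s)"
echo "per tweet: $once"
```

The `-r` flag prints the text without JSON quoting, which is usually nicer for reading in a terminal.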

embano1 commented Jun 23, 2018

Using regex, including lookahead syntax:

# filter tweets for Go related content
$ jq '.[]| select( .full_text | match("(Go|Golang)\\s+";"i"))|.full_text' Downloads/twout.json

# filter tweets for Go related content and best practices
$ jq '.[]| select( .full_text | match("^(?=.*(Go\\s+|golang))(?=.*(production|best)).+";"i"))|.full_text' Downloads/twout.json
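
If the lookahead pattern gets hard to read, the same "both words, any order" idea can be sketched with two `test` calls joined by `and`. This is an alternative, not what the filter above does internally, and the sample tweets are invented for the example.

```shell
# Hypothetical sample data, only the field used by the filter.
sample=$(mktemp)
cat > "$sample" <<'EOF'
[
  {"full_text": "Running Go in production"},
  {"full_text": "Best practices for golang services"},
  {"full_text": "Go is fun"},
  {"full_text": "Good production habits"}
]
EOF

# test() returns true/false, so the two conditions can be combined with "and".
# \\b is a word boundary: it keeps "Good" from counting as "Go".
matches=$(jq -r '.[] | .full_text
  | select(test("\\bgo\\b|golang"; "i") and test("production|best"; "i"))' "$sample")
echo "$matches"
```

Each condition stays a short regex of its own, so adding a third keyword group is one more `and test(...)`.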

embano1 commented Jun 28, 2018

Transform the JSON output into an easier-to-consume array of selected fields

# Create new array with specific fields
$ jq '[.[]|{user: .user.screen_name,name: .user.name, text: .full_text, url: [.entities.urls[].expanded_url]}]' Downloads/twout.json

[
  {
    "user": "the_sttts",
    "name": "Stefan Schimanski",
    "text": "An awesome comparison of 2nd gen options to build @kubernetesio controllers: kubebuilder, operator sdk and metacontroller by @admiraltyio https://t.co/UDb0G0NXpg",
    "url": [
      "https://admiralty.io/kubernetes-custom-resource-controller-and-operator-development-tools.html"
    ]
  },
  {
    "user": "heidiann360",
    "name": "Heidi Howard",
    "text": "I am incredibly excited to see this paper from #SIGMOD18 on implementing Flexible Paxos for geo-distributed consensus. It's brilliant to see the theory from my PhD being put into practice by others.  \nhttps://t.co/19QA0ib9qn",
    "url": [
      "https://dl.acm.org/citation.cfm?id=3196928"
    ]
  },
  {
    "user": "copyconstruct",
    "name": "Cindy Sridharan",
    "text": "Google has a styleguide for bash. https://t.co/H2K4gZSQXj\n\n- If you find you need to use arrays for anything more than assignment of ${PIPESTATUS}, you should use Python. \n- If you are writing a script that is more than 100 lines long, you should probably be writing it in Python.",
    "url": [
      "https://google.github.io/styleguide/shell.xml"
    ]
  },
...
]
# Select those from user "embano1"
$ jq '[.[]|{user: .user.screen_name,name: .user.name, text: .full_text, url: [.entities.urls[].expanded_url]}]|.[]|select(.user == "embano1")' Downloads/twout.json

{
  "user": "embano1",
  "name": "Michael Gasch",
  "text": "Exploring Prometheus Go client metrics https://t.co/ifPAitkPKA",
  "url": [
    "https://povilasv.me/prometheus-go-metrics/"
  ]
}
{
  "user": "embano1",
  "name": "Michael Gasch",
  "text": "Naming conventions for Go functions, variables, packages, receivers, etc. https://t.co/jLjhTDSW9b",
  "url": [
    "https://twitter.com/davecheney/status/998530498417704963"
  ]
}
...
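
The user name does not have to be hard-coded in the filter. A small sketch, assuming made-up sample data and a hypothetical shell variable `who`, that passes the name in with jq's `--arg` and selects before building the smaller objects:

```shell
# Hypothetical sample data with the fields the filter reads.
sample=$(mktemp)
cat > "$sample" <<'EOF'
[
  {"user": {"screen_name": "alice", "name": "Alice"}, "full_text": "first",
   "entities": {"urls": [{"expanded_url": "https://example.com/1"}]}},
  {"user": {"screen_name": "bob", "name": "Bob"}, "full_text": "second",
   "entities": {"urls": []}}
]
EOF

who="alice"   # hypothetical variable holding the screen name to look for

# --arg binds the shell value to $who inside the jq program as a string.
picked=$(jq -c --arg who "$who" '
  .[]
  | select(.user.screen_name == $who)
  | {user: .user.screen_name, name: .user.name, text: .full_text,
     url: [.entities.urls[].expanded_url]}' "$sample")
echo "$picked"
```

Selecting first means only the matching tweets are reshaped, and `-c` prints one compact object per line.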

embano1 commented Jun 28, 2018

Better version of the YouTube filter

# filter only youtube videos and print tweet text and link(s)
$ jq '[.[]|select( .entities.urls[].expanded_url | contains ("youtube"))|{text: .full_text, url: [.entities.urls[].expanded_url]}]' Downloads/twout.json

[
  {
    "text": "\"Google Production Environment\"\n\nAn intro to the infra that allows running processes reliably and scalably in data centers, as well as the development and build infrastructure that enables engineers to develop, deploy, and run large-scale services.\n\nhttps://t.co/W8nkNJlfPM",
    "url": [
      "https://www.youtube.com/watch?v=dhTVVWzpc4Q"
    ]
  },
  {
    "text": "Great talk by @christianposta on evolution from Netflix OSS stack to using Service Mesh #servicemesh #microservices #istio #envoy https://t.co/EhWi7y3w0e",
    "url": [
      "https://m.youtube.com/watch?v=WaD0SBb13AU"
    ]
  },
...
]

embano1 commented Jul 6, 2018

Blog post examples:

#1 Only print Tweets containing "String" (ignore case)

$ jq '.[]|select(.full_text|match("handy";"i"))|.full_text' tweets.json
(...)
"[New Post] More Handy CLI Tools for JSON: https://t.co/CxAWaJbfI5"
"\"Choosing an HTTP Status Code\"\n\nA handy flowchart for figuring out which one applies in your situation.\n\n(Sadly no paths lead to 418...  )\n\nhttps://t.co/utMDLotwJx"
(...)

#2 Only find Tweets with these "two" "Words" (ignore case and order)

$ jq '.[]| select( .full_text | match("(?=.*(\\bGo\\b|golang))(?=.*(production|practice|idiomatic))";"i"))|.full_text' tweets.json
(...)
"How I Structure Production Grade REST APIs in Go: https://t.co/UGf25MuV9P (The initial post in the series, it focuses on application structure and routing.)"
"Building Scalable Web Services in Go: https://t.co/2Ul9ibqopd (A few best practices aggregated from around the world of Go.)"
"List of articles discussing \"Idiomatic Go\"\n\nhttps://t.co/LWUvxxmTCc\n\n#golang"
(...)

#3 Only print "Field(s)" we're interested in

$ jq '[.[]|{user: .user.screen_name,name: .user.name, text: .full_text, url: [.entities.urls[].expanded_url]}]' tweets.json
(...)
[
  {
    "user": "timoreimann",
    "name": "Timo Reimann",
    "text": "@the_sttts @TheNikhita https://t.co/XVdwwzJxXX is comprehensive, though it might be a bit overwhelming depending on how much you know already.\n\nGoogle's style guide does a good job to explain bash behavior: https://t.co/z4sfr0oiKj\n\nFinally, enabling shellcheck is a great way to learn while scripting.",
    "url": [
      "https://mywiki.wooledge.org/BashFAQ",
      "https://google.github.io/styleguide/shell.xml"
    ]
  },
  {
    "user": "vCabbage",
    "name": "Kale Blankenship",
    "text": "@copyconstruct https://t.co/lziacf4S4q works pretty well for this.",
    "url": [
      "https://github.com/fortytw2/leaktest"
    ]
  },
(...)

#4 Filter for Tweets with Links to Youtube videos

$ jq '[.[]|select( .entities.urls[].expanded_url | contains ("youtube"))|{text: .full_text, url: [.entities.urls[].expanded_url]}]' tweets.json
(...)
  {
    "text": "\"Google Production Environment\"\n\nAn intro to the infra that allows running processes reliably and scalably in data centers, as well as the development and build infrastructure that enables engineers to develop, deploy, and run large-scale services.\n\nhttps://t.co/W8nkNJlfPM",
    "url": [
      "https://www.youtube.com/watch?v=dhTVVWzpc4Q"
    ]
  },
  {
    "text": "And my talk from yesterday about @kubernetes as an API driven platform – API concepts, CRDs and controllers – from our Reykjavík Kubernetes Meetup https://t.co/cLg5EFDDPD",
    "url": [
      "https://www.youtube.com/watch?v=BiE7oKeEzDU"
    ]
  },
(...)

embano1 commented Sep 29, 2018

Find the 10 most recent tweets (the first 10 array entries) and print their .full_text

$ jq '.[:10]|.[].full_text' Downloads/twout.json
...
# including some dash separators
$ jq '.[:10]|.[]|.full_text,"----"' Downloads/twout.json
...
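
A quick sketch of how the slice syntax behaves, on a tiny invented array so the result is easy to follow: `.[:N]` takes entries from the start of the array, `.[-N:]` takes them from the end.

```shell
# Hypothetical sample data: four entries in file order.
sample=$(mktemp)
echo '[{"full_text":"a"},{"full_text":"b"},{"full_text":"c"},{"full_text":"d"}]' > "$sample"

# First two entries of the array.
first_two=$(jq -r '.[:2] | .[].full_text' "$sample")

# Last two entries of the array (negative index counts from the end).
last_two=$(jq -r '.[-2:] | .[].full_text' "$sample")

echo "first: $(echo "$first_two" | tr '\n' ' ')"
echo "last:  $(echo "$last_two" | tr '\n' ' ')"
```

Which end holds the newest tweets depends on how the file was written, so it is worth checking one entry before relying on either slice.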