

@Dan-Q
Last active April 28, 2024 17:48
Improve the BBC News RSS feed by (a) filtering out Sport, iPlayer, BBC Sounds, and BBC Ideas links; and (b) stripping the anchor (#0, #1, #2, etc.) off <guid>s so that "republished to front page" stories don't reappear in your feed reader
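Both rules come down to string handling on each item's <guid>. The following is a minimal, network-free sketch in Ruby that assumes GUIDs shaped like the invented examples below. `reject_guid?` and `strip_anchor` are hypothetical helper names, and the pattern is the same one the script uses. It also shows why stripping the anchor helps: two "republished" GUIDs for the same story collapse into one.

```ruby
# Same pattern as REJECT_GUIDS_MATCHING in the script.
REJECT_GUIDS_MATCHING = /^https:\/\/www\.bbc\.co\.uk\/(sport|iplayer|sounds|ideas)\//

# Hypothetical helper: true if an item with this GUID should be dropped.
def reject_guid?(guid)
  guid.match?(REJECT_GUIDS_MATCHING)
end

# Hypothetical helper: remove the "#0", "#1", ... anchor (everything from "#" on).
def strip_anchor(guid)
  guid.gsub(/#.*$/, '')
end

# Invented sample GUIDs, for illustration only.
guids = [
  'https://www.bbc.co.uk/news/uk-12345678#0',
  'https://www.bbc.co.uk/news/uk-12345678#1',
  'https://www.bbc.co.uk/sport/football/12345678',
  'https://www.bbc.co.uk/sounds/play/p0000000'
]

# Drop Sport/Sounds items, then normalise the remaining GUIDs.
kept = guids.reject { |g| reject_guid?(g) }.map { |g| strip_anchor(g) }
puts kept.uniq # prints https://www.bbc.co.uk/news/uk-12345678 (once)
```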
#!/usr/bin/env ruby
require 'bundler/inline'
# # Sample crontab:
# # Every twenty minutes, run the script and log the results
# */20 * * * * ~/bbc-news-rss-filter-sport-out.rb > ~/bbc-news-rss-filter-sport-out.log 2>&1
# Dependencies:
# * open-uri - load remote URL content easily
# * nokogiri - parse/filter XML
gemfile do
  source 'https://rubygems.org'
  gem 'nokogiri'
end
require 'open-uri'
# Regular expression describing the GUIDs to reject from the resulting RSS feed
# We want to drop everything from the "sport" section of the website, also any iPlayer/Sounds/Ideas links
REJECT_GUIDS_MATCHING = /^https:\/\/www\.bbc\.co\.uk\/(sport|iplayer|sounds|ideas)\//
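# Load and filter the original RSS
# (URI.open rather than Kernel#open: since Ruby 3.0, Kernel#open no longer fetches URLs via open-uri)
rss = Nokogiri::XML(URI.open('https://feeds.bbci.co.uk/news/rss.xml?edition=uk'))
rss.css('item').select { |item| item.css('guid').text =~ REJECT_GUIDS_MATCHING }.each(&:unlink)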
# Strip the anchors off the <guid>s: BBC News "republishes" stories by using guids with #0, #1, #2 etc, which results in duplicates in feed readers
rss.css('guid').each { |g| g.content = g.content.gsub(/#.*$/, '') }
File.open('/www/bbc-news-no-sport.xml', 'w') { |f| f.puts(rss.to_s) }
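Because the script fetches a live feed and writes to a fixed path, it is hard to try out offline. Below is a minimal sketch of the same two steps applied to an RSS document held in a string. It swaps in REXML (which ships with standard Ruby installs, as a bundled gem since Ruby 3.0) for Nokogiri so it runs without extra gems; `filter_bbc_rss` and the sample feed are hypothetical.

```ruby
require 'rexml/document'

# Same pattern as REJECT_GUIDS_MATCHING in the script.
REJECT_GUIDS_MATCHING = /^https:\/\/www\.bbc\.co\.uk\/(sport|iplayer|sounds|ideas)\//

# Hypothetical helper: filter an RSS string and return the cleaned XML.
def filter_bbc_rss(xml)
  doc = REXML::Document.new(xml)

  # XPath.match returns an Array, so removing items while looping is safe.
  REXML::XPath.match(doc, '//item').each do |item|
    guid = item.elements['guid']
    item.remove if guid && guid.text.to_s =~ REJECT_GUIDS_MATCHING
  end

  # Strip "#0", "#1", ... so republished stories keep a stable GUID.
  REXML::XPath.match(doc, '//item/guid').each do |g|
    g.text = g.text.to_s.gsub(/#.*$/, '')
  end

  doc.to_s
end

# Invented two-item feed, for illustration only.
sample = <<~XML
  <rss version="2.0"><channel>
    <item><title>A</title><guid isPermaLink="false">https://www.bbc.co.uk/news/uk-1#3</guid></item>
    <item><title>B</title><guid isPermaLink="false">https://www.bbc.co.uk/sport/football/2</guid></item>
  </channel></rss>
XML

puts filter_bbc_rss(sample)
```

The output keeps item A with its GUID reduced to `https://www.bbc.co.uk/news/uk-1` and drops the Sport item B.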
@vwillcox

Thank you for this!

@Dan-Q
Author

Dan-Q commented Mar 10, 2024

Blog post about this snippet: https://danq.me/2024/03/09/bbc-news-without-the-crap/

Blog post about an earlier version of this snippet: https://danq.me/2019/05/14/bbc-news-without-the-sport/
