@AlexandrBasan
Last active August 29, 2015 14:18
Ruby URL parser (finds all http/https links and img src links on a page)
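<%# Accumulators for the matched page links and image sources %>
<% link_result = [] %>
<% img_result = [] %>
<% require 'open-uri' %>
<%# Fetch the page body; URI#read is provided by open-uri %>
<% uri = URI.parse('http://url.com') %>
<% data = uri.read %>
<%# scan returns one array of capture groups per URL found %>
<% @links = data.scan(URI.regexp(%w(http https))) %>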
<br>
<% @links.each do |element| %>
<% element.each do |link| %>
<%# Keep only captures that look like a host name with an optional port and path %>
<% if link =~ /[a-z0-9]+([\-\.]{1}[a-z0-9]+)*\.[a-z]{2,5}(:[0-9]{1,5})?(\/.*)?/ix %>
<% link_result.push(link) %>
<% end %>
<% end %>
<% end %>
<!-- src links -->
<% @img = data.scan(/src\s*=\s*(\"|')(([^\"';]*))(\"|')/) %>
<br>
<% @img.each do |element| %>
<% element.each do |link| %>
<%# Keep only captures that contain a common image file extension %>
<% if link =~ /\w{1,100}\W{1,100}\w{1,100}\.(?:jpe?g|png|gif)(?!\"|\')/ %>
<% img_result.push(link) %>
<% end %>
<% end %>
<% end %>
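
The same two-pass idea (scan for http/https URLs, then scan `src` attributes and keep the image ones) can also be sketched as plain Ruby outside the template. This is a minimal sketch, not the gist's own code: `extract_links` and `extract_image_sources` are hypothetical helper names, and the regular expressions are simplified assumptions that will not cover every valid URL or attribute form.

```ruby
# Hypothetical helpers (not part of the original gist); they work on an
# HTML string that has already been fetched, so no network access is needed.

# Collect absolute http/https URLs. Assumption: a URL ends at the first
# whitespace character, quote, or angle bracket.
def extract_links(html)
  html.scan(%r{https?://[^\s"'<>]+}).uniq
end

# Collect the values of src attributes, then keep those that end in a
# common image extension (case-insensitive).
def extract_image_sources(html)
  html.scan(/src\s*=\s*["']([^"';]*)["']/)   # one capture group per match
      .flatten                               # [["a.png"], ...] -> ["a.png", ...]
      .grep(/\.(?:jpe?g|png|gif)\z/i)
      .uniq
end

html = '<a href="http://example.com/page">x</a>' \
       '<img src="/img/a.png"><script src="app.js"></script>'
links  = extract_links(html)
images = extract_image_sources(html)
```

Keeping the extraction in methods like these would let a controller or helper do the work and leave the template to render `links` and `images` only.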