
@Xliff
Created November 10, 2019 16:35
Yahoo! search interface

I was working on the Yahoo search interface problem at Rosetta Code, and came up with the following:

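use Gumbo;         # parse-html: HTML string -> XML::Document
use LWP::Simple;   # LWP::Simple.get: fetch a URL's content

my $dom = parse-html( LWP::Simple.get("http://search.yahoo.com/search?p=test") );

# Each result title is an <h3 class="title">
my @r = $dom.lookfor( TAG => "h3", class => "title" );
for @r {
  # [ link URL, first child node of the link ]
  .lookfor( TAG => "a" ).map({ [ .attribs<href>, .contents[0] ] }).say;
  # The description: second <div> under the title's grandparent
  my $desc = .parent.parent.lookfor( TAG => "div" )[1];
  $desc.nodes.map( *.contents ).say;
}

# Link to the next page of results
$dom.lookfor( TAG => "a", class => "next" ).say;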

It's mostly straightforward until I try to pull the text out of the node in $desc; then it becomes problematic.

I've noticed that a lot of things are a bit wonky with Perl 6's XML module, and its author looks to be fairly inactive. Unfortunately, Gumbo is tied to it. XML really could use a text method; I'll see if I can write one...

Xliff commented Nov 10, 2019

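method text ($node) {
  # Concatenate the text under $node, recursing into child elements
  (gather for $node.nodes {
    when XML::Text    { take .text }
    when XML::Element { take self.text($_) }
    # other node types (comments, CDATA, ...) are skipped
  }).join(" ");
}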

Xliff commented Nov 10, 2019

After much gnashing of teeth, here's the final, working version...

YahooSearch.pm6:

use Gumbo;        # parse-html
use LWP::Simple;  # LWP::Simple.get
use XML::Text;    # XML::Text type, checked in text()

class YahooSearch {
  has $!dom;   # parsed document for the current results page

  submethod BUILD (:$!dom) { }

  # Fetch and parse the first results page for $term
  method new($term) {
    self.bless(
      dom => parse-html(
        LWP::Simple.get("http://search.yahoo.com/search?p={ $term }")
      )
    );
  }

  # Follow the "next" link, replace the current page, return self for chaining
  method next {
    $!dom = parse-html(
      LWP::Simple.get(
        $!dom.lookfor( TAG => 'a', class => 'next' ).head.attribs<href>
      )
    );
    self;
  }

  # Recursively collect the text under $node, trimming and space-joining pieces
  method text ($node) {
    return ''         unless $node;
    return $node.text if     $node ~~ XML::Text;

    $node.nodes.map({ self.text($_).trim }).join(' ');
  }

  # Print numbered Title/URL/Text entries; the state $n is shared by all
  # calls (and instances), so numbering continues across pages
  method results {
    state $n = 0;
    for $!dom.lookfor( TAG => 'h3', class => 'title') {
      given .lookfor( TAG => 'a' )[0] {
        next unless $_;                                               # No link
        next if .attribs<href> ~~ / ^ 'https://r.search.yahoo.com' /; # Ad
        say "=== #{ ++$n } ===";
        say "Title: { .contents[0] ?? self.text( .contents[0] ) !! '' }";
        say "  URL: { .attribs<href> }";

        # Snippet text: last <div> child of the link's great-grandparent
        my $pt = .parent.parent.parent.elements( TAG => 'div' ).tail;
        say " Text: { self.text($pt) }";
      }
    };
  }
}

sub MAIN (Str $search-term) is export {
  my $y = YahooSearch.new($search-term);

  $y.results;
  $y.next.results;
}
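
The constructor interpolates $term straight into the query string. Unless LWP::Simple happens to escape it, a term containing spaces, `&`, or non-ASCII characters would reach Yahoo unencoded. Below is a minimal sketch of percent-encoding the term first; `encode-query` is a hypothetical helper, not part of LWP::Simple or Gumbo, and it treats only RFC 3986's unreserved characters (letters, digits, `-`, `.`, `_`, `~`) as safe, encoding every other UTF-8 byte as `%XX`:

```raku
# Hypothetical helper (not part of LWP::Simple or Gumbo): percent-encode a
# search term for a query string. Bytes of the UTF-8 encoding that are
# RFC 3986 "unreserved" characters pass through; all others become %XX.
sub encode-query (Str $term --> Str) {
  $term.encode('utf-8').list.map(-> $byte {
    my $char = $byte.chr;
    my $unreserved = $char ~~ /^ <[A..Z a..z 0..9]> $/
                  || '-._~'.contains($char);
    $unreserved ?? $char !! '%' ~ $byte.fmt('%02X');
  }).join;
}

say encode-query('raku xml');   # raku%20xml
say encode-query('café');       # caf%C3%A9
```

With such a helper, `new` would build its URL as `"http://search.yahoo.com/search?p={ encode-query($term) }"`.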

To invoke from a REPL:

use YahooSearch;

my $y = YahooSearch.new('test');
$y.results;
$y.next.results;

Gives:

=== #1 ===
Title: 
  URL: https://www.speedtest.net/
 Text: At Ookla, we are committed to ensuring that individuals with disabilities can access all of the content at www.speedtest.net. We also strive to make all content in Speedtest apps accessible. If you are having trouble accessing www.speedtest.net or Speedtest apps, please email legal@ziffdavis.com for assistance. Please put "ADA Inquiry" in the ...
=== #2 ===
Title: Test | Definition of Test by Merriam-Webster
  URL: https://www.merriam-webster.com/dictionary/test
 Text: Test definition is - a means of testing: such as. How to use test in a sentence.
=== #3 ===
Title: - Video Results
  URL: https://video.search.yahoo.com/search/video?p=test
 Text: More Test videos
=== #4 ===
Title: Test - definition of test by The Free Dictionary
  URL: https://www.thefreedictionary.com/test
 Text: test 1 (tĕst) n. 1. A procedure for critical evaluation; a means of determining the presence, quality, or truth of something; a trial: a test of one's eyesight; subjecting a hypothesis to a test ; a test of an athlete's endurance. 2. A series of questions, problems, or physical responses designed to determine knowledge, intelligence, or ability. 3. A ...
=== #5 ===
Title: 
  URL: https://www.test.com/
 Text: Provides extranet privacy to clients making a range of tests and surveys available to their human resources departments. Companies can test prospective and current employees. Information on surveys, certification, examination, testing and contact details.
=== #6 ===
Title: Internet Speed Test | Fast.com
  URL: https://fast.com/
 Text: How fast is your download speed? In seconds, FAST.com's simple Internet speed test will estimate your ISP speed.
=== #7 ===
Title: Speakeasy Internet Speed Test - Check Your Broadband Speed ...
  URL: https://www.speakeasy.net/speedtest/
 Text: The internet speed test trusted by millions. Use our free bandwidth test to check your speed and get the most from your ISP. New HTML5 speed test , no Flash required.
=== #8 ===
Title: The ACT Test for Students | ACT
  URL: https://www.act.org/content/act/en/products-and-services/the-act.html
 Text: The ACT test is a curriculum-based education and career planning tool for high school students that assesses the mastery of college readiness standards
=== #9 ===
Title: AT&T High Speed Internet Speed Test
  URL: https://speedtest.att.com/speedtest/
 Text: Want to know your Internet speed? The speed test takes less than a minute and performs two key measurements: Download speed (the speed of data sent from the Internet to your computer) Upload speed (the speed of data sent from your computer to the Internet) We also report latency, a factor that could ...
=== #10 ===
Title: "Key-Test" - keyboard test online
  URL: https://en.key-test.ru/
 Text: Key- Test Keyboard test online. To test the keyboard, press the keys (before switching to the English keyboard)
=== #11 ===
Title: Free Personality Test | 16Personalities
  URL: https://www.16personalities.com/free-personality-test
 Text: Free personality test - take it to find out why our readers say that this personality test is so accurate, “it's a little bit creepy.” No registration required!
=== #12 ===
Title: .com Practice 
  URL: https://www.tests.com/
 Text: Free practice tests and other test resources organized in 300 categories including: academic, career, personality, intelligence, and more.
=== #13 ===
Title:  (assessment) - Wikipedia
  URL: https://en.wikipedia.org/wiki/Test_(assessment)
 Text: A test or examination (informally, exam or evaluation) is an assessment intended to measure a test -taker's knowledge, skill, aptitude, physical fitness, or classification in many other topics (e.g., beliefs).
=== #14 ===
Title: Tests, tests, tests: Personality, IQ, career, love, health ...
  URL: https://www.queendom.com/tests/index.htm
 Text: Tests & quizzes: free IQ test , Big 5 personality test , emotional intelligence test (EQ), love tests, career aptitudes test , self-esteem self- test , communication skills assessment
=== #15 ===
Title:  
  URL: https://webcamtests.com/
 Text: About WebcamTests.com. This site provides a free tool to test your webcam online and check if it is working properly. In other words, you can test it directly from your browser without the need to install third-party software.
=== #16 ===
Title: Internet Speed 
  URL: https://www.spectrum.com/internet/speed-test.html
 Text: Find out your internet download and upload speed in mps per second with our internet speed test ! Get lightning fast internet speeds starting at 100 mps with Spectrum!
=== #17 ===
Title: Free 
  URL: https://www.16personalities.com/free-personality-test
 Text: Free personality test - take it to find out why our readers say that this personality test is so accurate, “it's a little bit creepy.” No registration required!
=== #18 ===
Title: Free personality 
  URL: https://www.16personalities.com/
 Text: Join our community. You are not alone. Our free forum is full of people just like you. Ask and give advice, connect with friends, hear stories, or maybe meet your love.
=== #19 ===
Title: Personality 
  URL: https://www.humanmetrics.com/cgi-win/jtypes2.asp
 Text: Personality test based on C. Jung and I. Briggs Myers type theory provides your type formula, type description, career choices
=== #20 ===
Title: Samples of Driver License Written Tests
  URL: https://www.dmv.ca.gov/portal/dmv/detail/pubs/interactive/tdrive/exam
 Text: California DMV Home Page is available for customers to check out publications, download forms, brochures, FAQs, Vehicle Information, Boats, Vessel, and Field Offices.
=== #21 ===
Title: Google
  URL: https://www.google.com/
 Text: Google allows users to search the Web for images, news, products, video, and other content.
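
Several titles in this output are empty or cut short (for example #12 prints `Title: .com Practice` for tests.com and #13 prints `Title:  (assessment) - Wikipedia`), and the missing words look like the search term. One plausible explanation, not confirmed here, is that `.contents[0]` in `results` reads only the link's first child node, so any part of the title that Yahoo places in a nested element (such as a highlighted search term) is skipped. The sketch below uses hypothetical stand-in classes `TextNode` and `ElemNode` instead of the XML module so it runs on its own, and compares the first child's text with a recursive walk over the whole link, mirroring the `text` method above:

```raku
# Hypothetical stand-ins for XML::Text and XML::Element (not the XML
# module's classes), with just enough structure to show the recursion.
class TextNode { has Str $.text }
class ElemNode { has @.nodes }

# Same shape as YahooSearch.text: text nodes yield their text,
# elements yield their children's text, trimmed and space-joined.
sub text-of ($node --> Str) {
  return $node.text if $node ~~ TextNode;
  $node.nodes.map({ text-of($_).trim }).join(' ');
}

# Roughly <a>Free personality <b>test</b></a>
my $anchor = ElemNode.new(nodes => [
  TextNode.new(text => 'Free personality '),
  ElemNode.new(nodes => [ TextNode.new(text => 'test') ]),
]);

say text-of($anchor.nodes[0]);   # first child only: "Free personality "
say text-of($anchor);            # whole link: "Free personality test"
```

If that is what is happening, the corresponding change in `results` would be to pass the whole link, i.e. `self.text($_)`, instead of `.contents[0]`.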

Xliff commented Nov 10, 2019

If results returned self, MAIN could be cut down to:

sub MAIN (Str $search-term) {
  YahooSearch.new($search-term).results.next.results;
}
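
A minimal sketch of why returning `self` makes that one-liner legal, using a hypothetical `Pager` class in place of YahooSearch so it runs without network access or HTML parsing. Each method ends with `self`, so the next call in the chain is made on the same object:

```raku
# Hypothetical stand-in for YahooSearch: "pages" are plain strings,
# so no network access or HTML parsing is involved.
class Pager {
  has @.pages;
  has @.seen;             # what results() has reported so far
  has Int $!index = 0;    # which page is current

  method results {
    my $page = @!pages[$!index];
    say "page { $!index + 1 }: $page";
    @!seen.push: $page;
    self;                 # returning self is what allows .results.next
  }

  method next {
    $!index++;
    self;                 # same idea as YahooSearch.next
  }
}

my $pager = Pager.new(pages => <first second>);
$pager.results.next.results;   # prints "page 1: first" then "page 2: second"
```

Without the final `self` in `results`, the method would return the value of its last statement (the `for` loop in YahooSearch), and `.next` would be called on that instead of the search object.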
