@dimitryzub
Last active August 6, 2021 08:52
DuckDuckGo Scrape Ad Results
from selenium import webdriver

# Selenium 3 API: executable_path and find_element(s)_by_css_selector were deprecated in Selenium 4
driver = webdriver.Chrome(executable_path='path/to/chromedriver.exe')
driver.get('https://duckduckgo.com/?q=rtx 3080&kl=us-en&ia=web')

# Each ad result block
for result in driver.find_elements_by_css_selector('.results--ads .result__body.links_main.links_deep'):
    title = result.find_element_by_css_selector('.js-result-title-link').text
    link = result.find_element_by_css_selector('.js-result-title-link').get_attribute('href')
    source = result.find_element_by_css_selector('.js-result-extras-url').text
    snippet = result.find_element_by_css_selector('.js-result-snippet').text
    print(f'{title}\n{source}\n{snippet}\n{link}\n')

# Site links, searched across the whole page (driver), not per result
for sitelink in driver.find_elements_by_css_selector('.js-sitelinks-title'):
    sitelink_title = sitelink.text
    sitelink_url = sitelink.get_attribute('href')
    print(f'{sitelink_title}\n{sitelink_url}\n')

driver.quit()
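
The search URL above contains a raw space in the query (`rtx 3080`). Browsers usually tolerate that, but encoding the parameters explicitly avoids surprises with characters such as `&` or `#`. A minimal sketch using only the standard library; `build_search_url` is a hypothetical helper name, and the `kl` and `ia` values are simply the ones used in the script.

```python
from urllib.parse import urlencode


def build_search_url(query, region='us-en'):
    """Build a DuckDuckGo web-search URL with an encoded query string."""
    # Parameter names (q, kl, ia) are copied from the URL in the script above.
    params = {'q': query, 'kl': region, 'ia': 'web'}
    return 'https://duckduckgo.com/?' + urlencode(params)


print(build_search_url('rtx 3080'))
# https://duckduckgo.com/?q=rtx+3080&kl=us-en&ia=web
```

The returned string can be passed directly to `driver.get(...)`.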

ilyazub commented Aug 3, 2021

Will result.find_elements_by_css_selector('.sitelink--small__title') work?

for result in driver.find_elements_by_css_selector('.result--ad'):
    title = result.find_element_by_css_selector('.results--ads .result__title .result__a').text
    link = result.find_element_by_css_selector('.results--ads .result__title .result__a').get_attribute('href')
    source = result.find_element_by_css_selector('.results--ads .result__extras__url').text
    snippet = result.find_element_by_css_selector('.results--ads .result__snippet').text
    print(f'{title}\n{link}\n{snippet}\n{source}\n')

    for sitelinks in result.find_elements_by_css_selector('.sitelink--small__title'):
        title = sitelinks.text
        link = sitelinks.get_attribute('href')
        print(f'{title}\n{link}\n')

Author

dimitryzub commented Aug 4, 2021

@ilyazub thank you for pointing it out! That selector will work for inline site links but will skip expanded site links, for example:

image

I updated the gist with different selectors, since some of the previous ones didn't work properly.

Here's a gif that demonstrates the output.
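
One way to cover both cases is to query each ad result for both sitelink selectors and merge the matches. The sketch below is an assumption, not the gist's method: `collect_sitelinks` is a hypothetical helper, and it assumes `.sitelink--small__title` matches inline site links while `.js-sitelinks-title` matches expanded ones, which should be checked against the live page. It accepts any object that exposes Selenium 3's `find_elements_by_css_selector`, such as a `WebElement`.

```python
# Assumed selectors, both taken from this thread; verify them against the live page.
SITELINK_SELECTORS = (
    '.sitelink--small__title',  # inline site links
    '.js-sitelinks-title',      # selector used in the gist, assumed to match expanded site links
)


def collect_sitelinks(result, selectors=SITELINK_SELECTORS):
    """Return (title, url) pairs found under `result`, skipping duplicates."""
    seen = set()
    sitelinks = []
    for selector in selectors:
        for element in result.find_elements_by_css_selector(selector):
            pair = (element.text, element.get_attribute('href'))
            # The same link may match more than one selector; keep the first hit only.
            if pair not in seen:
                seen.add(pair)
                sitelinks.append(pair)
    return sitelinks
```

Inside the loop over ad results, `for title, url in collect_sitelinks(result): print(f'{title}\n{url}\n')` would print each site link once, scoped to its own ad.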


ilyazub commented Aug 6, 2021

Will it scrape ads on the right side?

image


ilyazub commented Aug 6, 2021

> It will work for inline site links but will skip expanded site links.

Makes sense 👍
