Created
August 23, 2019 07:10
Web Scraping: Ways to Scrape Content
# Assumes `soup` is a parsed document, e.g.:
#   from bs4 import BeautifulSoup
#   soup = BeautifulSoup(html_doc, "html.parser")  # html_doc: your HTML string
soup.title
# <title>Returns the title tag and the content between the tags</title>
soup.title.string
# 'Returns the content inside the title tag as a string'
soup.p
# <p class="title"><b>This returns the first paragraph tag and everything inside it</b></p>
soup.p['class']
# ['title'] (the element's class names, as a list, because class can hold several values)
soup.a
# <a class="link" href="http://example.com/example" id="link1">This would return the first matching anchor tag</a>
# Or, we could use find_all to return all the matching anchor tags
soup.find_all('a')
# [<a class="link" href="http://example.com/example1" id="link1">Link1</a>,
#  <a class="link" href="http://example.com/example2" id="link2">Link2</a>,
#  <a class="link" href="http://example.com/example3" id="link3">Link3</a>]
soup.find(id="link3")
# <a class="link" href="http://example.com/example3" id="link3">This returns just the matching element by ID</a>
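The calls above can be tried end to end with a small self-contained script. This is only a sketch: the sample HTML, its link text, and the variable names (html_doc, title_text, and so on) are made up for illustration, and it assumes the beautifulsoup4 package is installed and that the built-in "html.parser" backend is acceptable.

```python
from bs4 import BeautifulSoup

# Hypothetical sample document; ids, classes, and URLs are illustrative only.
html_doc = """
<html><head><title>Sample page</title></head>
<body>
<p class="title"><b>Sample heading</b></p>
<a class="link" href="http://example.com/example1" id="link1">Link1</a>
<a class="link" href="http://example.com/example2" id="link2">Link2</a>
<a class="link" href="http://example.com/example3" id="link3">Link3</a>
</body></html>
"""

# Parse with the standard-library backend so no extra parser is required.
soup = BeautifulSoup(html_doc, "html.parser")

# Text inside the <title> tag.
title_text = soup.title.string

# Class names of the first <p>; returned as a list.
p_classes = soup.p["class"]

# href of the first <a> in the document.
first_href = soup.a["href"]

# href of every <a> in the document, in document order.
all_hrefs = [a["href"] for a in soup.find_all("a")]

# Look up a single element by its id attribute and read its text.
link3 = soup.find(id="link3")
link3_text = link3.get_text()

print(title_text)
print(p_classes)
print(first_href)
print(all_hrefs)
print(link3_text)
```

Note that soup.find returns None when nothing matches, so in real scraping code it is worth checking the result before reading attributes from it.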