mde-2590
MichelleDalalJian / py4e_ex_13
Created November 24, 2017 16:40
Extracting Data from XML: The program will prompt for a URL, read the XML data from that URL using urllib, parse and extract the comment counts from the XML data, and compute the sum of the numbers in the file.
from urllib import request
import xml.etree.ElementTree as ET
url = 'http://python-data.dr-chuck.net/comments_24966.xml'
print("Retrieving", url)
html = request.urlopen(url)
data = html.read()
print("Retrieved", len(data), "characters")
tree = ET.fromstring(data)
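The preview stops right after the XML is parsed. Below is a minimal sketch of the extraction and summing step, assuming each count sits in a `<count>` element nested under `<comments>/<comment>` (the layout the py4e comment files appear to use). The inline `SAMPLE_XML` string and the `sum_counts` helper are hypothetical stand-ins so the sketch runs without network access:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample in the assumed layout: <comments> holding <comment> rows
SAMPLE_XML = """<commentinfo>
  <note>Sample data</note>
  <comments>
    <comment><name>Alice</name><count>97</count></comment>
    <comment><name>Bob</name><count>97</count></comment>
    <comment><name>Cara</name><count>90</count></comment>
  </comments>
</commentinfo>"""


def sum_counts(xml_text):
    """Return (number of <count> elements, sum of their integer values)."""
    tree = ET.fromstring(xml_text)
    # './/count' matches <count> elements at any depth below the root
    counts = tree.findall('.//count')
    return len(counts), sum(int(c.text) for c in counts)


print(sum_counts(SAMPLE_XML))  # (3, 284)
```

With the real file, the same helper could be called on the bytes read above (`sum_counts(data)`), since `ET.fromstring` accepts bytes as well as str.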
MichelleDalalJian / py4e_ex_12_02
Created November 24, 2017 16:04
Following Links in Python: The program will use urllib to read the HTML from the data files below, extract the href= values from the anchor tags, scan for a tag that is in a particular position relative to the first name in the list, follow that link, repeat the process a number of times, and report the last name you find.
from bs4 import BeautifulSoup
import urllib.request, urllib.parse, urllib.error
import ssl
import re
# Ignore SSL certificate errors
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE
url = "http://py4e-data.dr-chuck.net/known_by_Bryce.html"
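The preview ends before the loop that follows the links. One way to sketch it, assuming positions are counted from 1 and anchors look like `<a href="...">`: instead of BeautifulSoup, this version swaps in `re` (which the gist already imports) and takes a hypothetical `fetch` function, so the made-up `PAGES` map stands in for the network:

```python
import re

# Hypothetical miniature web: URL -> HTML (stands in for urllib fetches)
PAGES = {
    "known_by_Ann.html": '<a href="known_by_Bo.html">Bo</a><a href="known_by_Cy.html">Cy</a>',
    "known_by_Cy.html": '<a href="known_by_Di.html">Di</a><a href="known_by_Ed.html">Ed</a>',
    "known_by_Ed.html": '<a href="known_by_Fay.html">Fay</a><a href="known_by_Gus.html">Gus</a>',
}


def follow_links(url, count, position, fetch):
    """Follow the link at 1-based `position` `count` times; return the final name."""
    for _ in range(count):
        html = fetch(url)
        # Pull every href value out of the anchor tags, in document order
        links = re.findall(r'href="(.*?)"', html)
        url = links[position - 1]
        print("Retrieving:", url)
    # The person's name is embedded in the file name: known_by_<Name>.html
    return re.search(r"known_by_(\w+)\.html", url).group(1)


print(follow_links("known_by_Ann.html", 2, 2, PAGES.get))  # Ed
```

Against the real pages, `fetch` could be something like `lambda u: urllib.request.urlopen(u, context=ctx).read().decode()`; the name regex also works on absolute URLs.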
MichelleDalalJian / py4e_ex_12_01
Last active November 18, 2023 00:44
Scraping Numbers from HTML using BeautifulSoup: The program will use urllib to read the HTML from the data files below, parse the data, extract the numbers, and compute the sum of the numbers in the file.
#Actual data: http://py4e-data.dr-chuck.net/comments_24964.html (Sum ends with 73)
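from urllib import request
from bs4 import BeautifulSoup
html = request.urlopen('http://python-data.dr-chuck.net/comments_24964.html').read()
# Name the parser explicitly; without it BeautifulSoup warns that it had to guess one
soup = BeautifulSoup(html, 'html.parser')
# Every comment count is the text inside a <span> tag
tags = soup('span')
total = 0  # "total" rather than "sum" so the built-in sum() is not shadowed
for tag in tags:
    total = total + int(tag.contents[0])
print(total)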
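For comparison, here is a sketch of the same span-summing idea using only the standard library's `html.parser` instead of BeautifulSoup (a swapped-in technique, not the gist's). The `SpanSummer` class and the sample HTML are hypothetical, and the sketch assumes each count is the only text inside its `<span>`:

```python
from html.parser import HTMLParser


class SpanSummer(HTMLParser):
    """Adds up the integer text found directly inside <span> tags."""

    def __init__(self):
        super().__init__()
        self.in_span = False
        self.total = 0

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            self.in_span = True

    def handle_endtag(self, tag):
        if tag == "span":
            self.in_span = False

    def handle_data(self, data):
        # Only non-blank text that appears while a <span> is open is counted
        if self.in_span and data.strip():
            self.total += int(data)


# Hypothetical sample rows in the assumed layout
SAMPLE_HTML = """<table>
<tr><td>Alice</td><td><span class="comments">97</span></td></tr>
<tr><td>Bob</td><td><span class="comments">90</span></td></tr>
<tr><td>Cara</td><td><span class="comments">88</span></td></tr>
</table>"""

parser = SpanSummer()
parser.feed(SAMPLE_HTML)
parser.close()
print(parser.total)  # 275
```

BeautifulSoup is more forgiving of messy markup; the stdlib version avoids a third-party dependency at the cost of tracking parser state by hand.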
MichelleDalalJian / py4e_ex_12
Created October 7, 2017 14:53
Exploring the HyperText Transport Protocol: You are to retrieve the document at http://data.pr4e.org/intro-short.txt using the HTTP protocol in a way that lets you examine the HTTP response headers. There are three ways that you might retrieve this web page and look at the response headers. Preferred: Modify the socket1.py program to retrie…
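import socket
mysock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
mysock.connect(('data.pr4e.org', 80))
# HTTP/1.0 request: the request line followed by an empty line (\r\n\r\n)
cmd = 'GET http://data.pr4e.org/intro-short.txt HTTP/1.0\r\n\r\n'.encode()
mysock.send(cmd)
# Read the response 512 bytes at a time until the server closes the connection
while True:
    data = mysock.recv(512)
    if len(data) < 1:
        break
    print(data.decode(), end='')
mysock.close()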
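The point of the exercise is to see the response headers, which come before an empty line (`\r\n\r\n`) that separates them from the body. A minimal sketch of splitting a raw HTTP response into status line, headers, and body; the hypothetical `split_response` helper and the response bytes below are invented for illustration, not the real server's output:

```python
def split_response(raw):
    """Split raw HTTP response bytes into (status line, header dict, body bytes)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    status = lines[0]
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so normalize them to lower case
        headers[name.strip().lower()] = value.strip()
    return status, headers, body


# Invented example of what the recv() loop might accumulate
RAW = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/plain\r\n"
    b"Content-Length: 12\r\n"
    b"\r\n"
    b"Hello there\n"
)

status, headers, body = split_response(RAW)
print(status)                   # HTTP/1.1 200 OK
print(headers["content-type"])  # text/plain
print(body)                     # b'Hello there\n'
```

To use it with the socket code, the `recv()` loop would append each chunk to one bytes object (for example `raw += data`) instead of printing the chunks as they arrive.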
manichabba / jsonscraping.py
Created August 20, 2016 01:17
Extracting Data from JSON: The program will prompt for a URL, read the JSON data from that URL using urllib, parse and extract the comment counts from the JSON data, and compute and report the sum of the numbers in the file.
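import urllib.request  # urlopen lives in urllib.request in Python 3
import json #importing json
#requesting a json file url (raw_input() is input() in Python 3)
url = input("Enter the URL:")
#load the json file as a dict -info; its "comments" key holds a list
info = json.loads(urllib.request.urlopen(url).read())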
x = 0
#loop through each item in list comments
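The preview stops at the loop comment. A minimal sketch of how the loop might continue, run against a made-up inline JSON string instead of a URL; it assumes, as the comment above suggests, that the document is an object whose `"comments"` list holds objects with a `"count"` field:

```python
import json

# Hypothetical sample in the assumed layout: a dict with a "comments" list
SAMPLE_JSON = """{
  "note": "Sample data",
  "comments": [
    {"name": "Alice", "count": 97},
    {"name": "Bob", "count": 90},
    {"name": "Cara", "count": 88}
  ]
}"""

info = json.loads(SAMPLE_JSON)
x = 0
# Loop through each item in the "comments" list and add its count
for item in info["comments"]:
    x = x + int(item["count"])
print("Count:", len(info["comments"]))  # Count: 3
print("Sum:", x)                        # Sum: 275
```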