# Import Libraries
import requests
import pandas as pd
from bs4 import BeautifulSoup
# Get webpage with requests
web_page = requests.get('https://www.newvisions.org/ams2/pages/our-staff2')
# Parse web_page
soup = BeautifulSoup(web_page.text, 'html.parser')
# Create set of results based on HTML tags with desired data
results = soup.find_all('div', attrs={'class': 'matrix-content'})
results = results[29:]
# See the length of the results
len(results)
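The hard-coded slice `results[29:]` only works while the page keeps exactly 29 non-staff blocks ahead of the staff entries. A more defensive sketch is shown here, assuming (the source does not confirm this) that every staff block contains both an `h5` name tag and a `p` position tag. The helper name `keep_profile_blocks` is hypothetical.

```python
def keep_profile_blocks(blocks, required_tags=('h5', 'p')):
    """Return only the blocks that contain every tag in `required_tags`.

    `blocks` can be any iterable of objects with a BeautifulSoup-style
    `.find(name)` method that returns None when the tag is absent.
    """
    return [
        block for block in blocks
        if all(block.find(tag) is not None for tag in required_tags)
    ]
```

It could replace the slice with something like `results = keep_profile_blocks(soup.find_all('div', attrs={'class': 'matrix-content'}))`, at the cost of also dropping any staff block that happens to lack one of the required tags.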
# Testing with the first teacher and obtaining the name
test_result = results[0]
test_result.find('h5')
test_result.find('h5').text
# Obtaining position(s)
test_result.find('p').text.strip('\n\t')
# Obtaining email
test_result.find('em').get_text()
# Count duplicate teacher names
print(df.duplicated(['Teacher_Names']).sum())
# Eliminate duplicates, keeping the first occurrence
df.drop_duplicates(['Teacher_Names'], keep='first', inplace=True)
# Export to csv without numbered indices
df.to_csv('BronxSchoolStaffInfo.csv', index=False)
# Initiate DataFrame object
df = pd.DataFrame()
# Get all the teacher names
df['Teacher_Names'] = [result.find('h5').text for result in results]
# Get all the position titles
df['Positions'] = [result.find('p').text.strip('\n\t') for result in results]
# Create a function to get emails since some are missing
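The email function itself is not shown in this snippet. A minimal sketch of what such a helper might look like follows, assuming the email sits in the first `em` tag of each block, as in the single-teacher test. The name `get_email` and the empty-string fallback are illustrative choices, not the author's code.

```python
def get_email(result, default=''):
    """Return the text of the first <em> tag inside `result`.

    Falls back to `default` when the block has no <em> tag, so one
    missing email does not raise AttributeError for the whole column.
    """
    tag = result.find('em')
    if tag is None:
        return default
    return tag.get_text().strip()
```

With a helper like this, the column could be filled the same way as the others, for example `df['Emails'] = [get_email(result) for result in results]`, where the column name `Emails` is also an assumption.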
# Import Libraries
import requests
import pandas as pd
from bs4 import BeautifulSoup
# Get webpage with requests
web_page = requests.get('https://jssuni.edu.in/JSSWeb/WebShowFromDB.aspx?MODE=SSMD&PID=10002&CID=3&DID=2&MID=0&SMID=10402')
# Get first result
test_result = results[0]
# Name
test_result.find('h2').text
# Designation
test_result.find('p').contents[1].strip(' ')
# Parse the HTML
soup = BeautifulSoup(web_page.text, 'html.parser')
# Collect the HTML tags that hold the desired data
results = soup.find_all('div', attrs={'class': 'tab-pane active in fade'})
# Check results and remove first tag since it isn't a profile block
len(results)
results = results[1:]
len(results)
# Initialize DataFrame
df = pd.DataFrame()
# Names
df['Name'] = [result.find('h2').text for result in results]
# Designation
df['Designation'] = [result.find('p').contents[1].strip(' ') for result in results]
from sklearn.cluster import KMeans
import json
import plotly
import plotly.graph_objs as go
class KmeansGrouper:
    # May need a default number of clusters (maybe 2?), since duplicate values trigger a convergence warning,
    # e.g. only 2 clusters were found when 4 were requested.
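The rest of the class is not shown here. One way to act on the comment above is to cap the requested cluster count at the number of distinct points before fitting, since scikit-learn's `KMeans` typically emits that convergence warning when the data has fewer distinct points than `n_clusters`. The sketch below is an assumption about how that could be done, not the class's actual logic. The helper name `effective_n_clusters` and the default of 2 are hypothetical.

```python
def effective_n_clusters(values, requested, default=2):
    """Pick a cluster count that the data can actually support.

    Uses `default` when `requested` is None, then caps the count at the
    number of distinct points, because k-means cannot form more non-empty
    clusters than there are distinct values. Never returns less than 1.
    """
    wanted = default if requested is None else requested
    # Lists are unhashable, so convert each multi-dimensional point to a tuple.
    distinct = len({
        tuple(v) if isinstance(v, (list, tuple)) else v
        for v in values
    })
    return max(1, min(wanted, distinct))
```

The result could then be passed along as `KMeans(n_clusters=effective_n_clusters(values, requested))`. Capping silently changes what the caller asked for, so logging the adjustment may be worth considering.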