Divay Jindal divayjindal95

## scape_zomato.py
from bs4 import BeautifulSoup
import requests
import re
import pandas as pd

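# Empty frame that the scraped restaurant rows will be collected into.
df = pd.DataFrame(columns=["Name", "Cuisine", "Rating", "Reviews", "Votes"])

# Send a browser-like User-Agent; the default requests User-Agent may be rejected.
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36'}
response = requests.get("https://www.zomato.com/bangalore/order-food-online", headers=headers, timeout=10)
response.raise_for_status()  # stop early on an HTTP error instead of parsing an error page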
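
The script stops after fetching the page, so the following is only a minimal sketch of how the HTML might be parsed into the five columns above. The markup in `SAMPLE_HTML`, the class names (`card`, `name`, `cuisine`, `rating`, `reviews`, `votes`) and the helpers `text_of`, `first_int` and `parse_cards` are all hypothetical; the real Zomato page uses its own tags and class names, which would have to be found by inspecting it.

```python
import re

import pandas as pd
from bs4 import BeautifulSoup

COLUMNS = ["Name", "Cuisine", "Rating", "Reviews", "Votes"]

# Hypothetical markup for illustration only; not the real Zomato page structure.
SAMPLE_HTML = """
<div class="card">
  <a class="name">Spice Route</a>
  <span class="cuisine">North Indian, Chinese</span>
  <span class="rating">4.2</span>
  <span class="reviews">118 reviews</span>
  <span class="votes">1530 votes</span>
</div>
<div class="card">
  <a class="name">Dosa Corner</a>
  <span class="cuisine">South Indian</span>
  <span class="rating">3.9</span>
  <span class="votes">642 votes</span>
</div>
"""


def text_of(card, css_class):
    """Return the stripped text of the first tag with this class, or None."""
    tag = card.find(class_=css_class)
    return tag.get_text(strip=True) if tag else None


def first_int(text):
    """Pull the first integer out of a string such as '118 reviews'."""
    match = re.search(r"\d+", text or "")
    return int(match.group()) if match else None


def parse_cards(html):
    """Turn every (hypothetical) restaurant card into one DataFrame row."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for card in soup.find_all("div", class_="card"):
        rating = text_of(card, "rating")
        rows.append({
            "Name": text_of(card, "name"),
            "Cuisine": text_of(card, "cuisine"),
            "Rating": float(rating) if rating else None,
            "Reviews": first_int(text_of(card, "reviews")),
            "Votes": first_int(text_of(card, "votes")),
        })
    return pd.DataFrame(rows, columns=COLUMNS)
```

With the real page, `parse_cards(response.text)` would be the call, once the selectors are replaced with ones that match the live markup. Missing fields become empty cells rather than raising, since not every listing is guaranteed to carry every field.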

## terminologies

Apache Spark - Apache Spark is an open-source cluster computing framework for data analytics, originally developed in the AMPLab at UC Berkeley.[1] Spark fits into the Hadoop open-source community, building on top of the Hadoop Distributed File System (HDFS).[2] However, Spark is not tied to the two-stage MapReduce paradigm, and it promises performance up to 100 times faster than Hadoop MapReduce for certain applications.

Database pipelining - http://www.tuplejump.com/img/ff08.theplatform.png
                      As you will notice, it is not just about processing the data; it involves many other components. Collection, storage, exploration, ML and visualization are all critical to the project's success.


SOLR - Apache Solr is an open-source search platform built on Apache Lucene. It can be used to build a highly scalable data analytics engine that lets customers engage in lightning-fast, real-time knowledge discovery.