Divay Jindal divayjindal95

divayjindal95 / scape_zomato.py
Last active September 9, 2019 18:31
Zomato scraper
import re

import pandas as pd
import requests
from bs4 import BeautifulSoup

# Empty frame that will hold one row per restaurant.
df = pd.DataFrame(columns=["Name", "Cuisine", "Rating", "Reviews", "Votes"])

# Send a browser-like User-Agent with the request instead of the requests default.
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36'}
response = requests.get("https://www.zomato.com/bangalore/order-food-online", headers=headers)
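The snippet above stops after fetching the page. As a rough sketch of a possible next step, the HTML could be parsed with BeautifulSoup and collected into a DataFrame with the same five columns. The markup and class names here (`restaurant-card`, `name`, `cuisine`, `rating`, `reviews`, `votes`) and the helper functions are hypothetical placeholders, not Zomato's actual page structure, so the selectors would need to be adapted after inspecting the real response.

```python
from bs4 import BeautifulSoup
import pandas as pd

COLUMNS = ["Name", "Cuisine", "Rating", "Reviews", "Votes"]

# Hypothetical markup: these class names are illustrative only,
# not taken from the real Zomato page.
SAMPLE_HTML = """
<div class="restaurant-card">
  <a class="name">Example Diner</a>
  <span class="cuisine">North Indian</span>
  <span class="rating">4.2</span>
  <span class="reviews">120</span>
  <span class="votes">950</span>
</div>
<div class="restaurant-card">
  <a class="name">Sample Cafe</a>
  <span class="cuisine">Cafe</span>
  <span class="rating">3.9</span>
</div>
"""


def text_or_none(card, css_class):
    """Return the stripped text of the first matching child, or None if absent."""
    node = card.find(class_=css_class)
    return node.get_text(strip=True) if node else None


def parse_cards(html):
    """Turn every (hypothetical) restaurant card into one DataFrame row."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for card in soup.find_all("div", class_="restaurant-card"):
        rows.append({
            "Name": text_or_none(card, "name"),
            "Cuisine": text_or_none(card, "cuisine"),
            "Rating": text_or_none(card, "rating"),
            "Reviews": text_or_none(card, "reviews"),
            "Votes": text_or_none(card, "votes"),
        })
    return pd.DataFrame(rows, columns=COLUMNS)


df = parse_cards(SAMPLE_HTML)
```

In a real run, `parse_cards(response.text)` would take the place of the sample string. Building the rows as a list and constructing the DataFrame once avoids appending to the frame inside the loop.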
divayjindal95 / terminologies
Created April 17, 2016 17:58 — forked from karimkhanp/terminologies
Dumping all terminologies, tool and technology required for BigData
Apache Spark - Apache Spark is an open-source cluster computing framework for data analytics, originally developed in the AMPLab at UC Berkeley.[1] Spark fits into the Hadoop open-source community, building on top of the Hadoop Distributed File System (HDFS).[2] However, Spark is not tied to the two-stage MapReduce paradigm, and promises performance up to 100 times faster than Hadoop MapReduce for certain applications.
Database pipelining - http://www.tuplejump.com/img/ff08.theplatform.png
As you will notice, it's not just about processing the data; a lot of other components are involved. Collection, storage, exploration, ML and visualization are all critical to the project's success.
SOLR - Apache Solr is an open-source search platform built on Apache Lucene. It can be used to build a highly scalable data analytics engine that lets customers engage in fast, real-time knowledge discovery.