surendra SurendraTamang

@SurendraTamang
SurendraTamang / keybase.md
Created April 30, 2018 02:15
This gist is about owning my GitHub account on Keybase

Keybase proof

I hereby claim:

  • I am surendratamang on github.
  • I am surendra40 (https://keybase.io/surendra40) on keybase.
  • I have a public key ASAeuGtm34f3WTzmi2_YhpUxJzil6M4q-kkuPlEPiQaVIAo

To claim this, I am signing this object:



Scrapy

An open-source web scraping framework. Crawling means following links and downloading pages; scraping means extracting data from them. It manages requests, parses HTML, collects data, and saves it in our desired format.

We can install it by running

		pip install scrapy
SurendraTamang / get_original_url.py
Created July 18, 2020 04:13
Extracting the original URL from the referrer URL
"""
For finding the real url
"""
import requests
def find_original_url(refrence_site, url):
    # Follow any redirects, then strip the ?ref=<site> tracking parameter from the final URL
    return requests.get(url).url.replace(f'?ref={refrence_site}', '')
if __name__ == "__main__":
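A plain string replace only works when `?ref=...` is the sole query parameter. As a minimal sketch of the same idea without the network call, the standard library can drop the parameter and keep the rest of the query intact. The function name `strip_ref_param` and the example URLs are assumptions for illustration, not part of the gist.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def strip_ref_param(url, param="ref"):
    """Return url with the given query parameter removed; other parameters are kept."""
    parts = urlsplit(url)
    # Keep every key/value pair except the one we want to drop
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True) if k != param]
    return urlunsplit(parts._replace(query=urlencode(kept)))


if __name__ == "__main__":
    # Hypothetical URLs, used only to show the behaviour
    print(strip_ref_param("https://example.com/post?ref=news"))
    print(strip_ref_param("https://example.com/post?ref=news&id=7"))
```

Unlike the string replace, this also handles the case where `ref` is followed or preceded by other parameters.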
SurendraTamang / json_update
Created July 19, 2020 15:43
This gist is for updating the json file
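import json

# Append updated_data to the 'link_list' array and rewrite the file in place
with open(self.outputfile, 'r+', encoding="utf-8") as json_file:
    load_json_file = json.load(json_file)
    load_json_file['link_list'].append(updated_data)
    # Rewind before writing, then truncate in case the new content is shorter
    json_file.seek(0)
    json_file.write(json.dumps(
        load_json_file, indent=2, ensure_ascii=False))
    json_file.truncate()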
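The fragment above depends on `self.outputfile` and `updated_data` from its surrounding class. A self-contained sketch of the same read, append, rewind, write, truncate pattern could look like this; the function name `append_to_json_list` and its `key` parameter are assumptions made for illustration.

```python
import json


def append_to_json_list(path, item, key="link_list"):
    """Append item to the list stored under key in the JSON file at path."""
    with open(path, "r+", encoding="utf-8") as json_file:
        data = json.load(json_file)
        data[key].append(item)
        # Go back to the start, overwrite, and cut off any leftover bytes
        json_file.seek(0)
        json.dump(data, json_file, indent=2, ensure_ascii=False)
        json_file.truncate()
```

The file must already exist and contain a JSON object with a list under `key`; otherwise `json.load` or the `append` call will raise.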
SurendraTamang / gist:498ed9e2f14c68425ff5107eee2eab5c
Created August 7, 2020 15:56
User-Agent strings for getting a response from a request
LIST_OF_USER_AGENTS = ['Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.89 Safari/537.36',
'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/37.0.2062.124 Safari/537.36',
'Mozilla/5.0 (Windows NT 4.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/37.0.2049.0 Safari/537.36',
'Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1985.67 Safari/537.36',
'Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.3319.102 Safari/537.36',
'Mozilla/5.0 (Windows NT 6.2; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1667.0 Safari/537.36',
]
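One way such a list is typically used is to pick a random entry for each request. The sketch below assumes the standard library's `urllib.request` rather than whatever HTTP client the gist was written for, and the names `USER_AGENTS` and `build_request` are hypothetical. It only builds the request object and does not send it.

```python
import random
import urllib.request

# Two entries copied from the list above; the full list would work the same way
USER_AGENTS = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.89 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/37.0.2062.124 Safari/537.36',
]


def build_request(url, user_agents=USER_AGENTS):
    """Return a Request for url carrying a randomly chosen User-Agent header."""
    headers = {"User-Agent": random.choice(user_agents)}
    return urllib.request.Request(url, headers=headers)
```

Passing the returned object to `urllib.request.urlopen` would perform the actual request.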
SurendraTamang / gist:3b49ca346d28f44debb57a7c34a1feb1
Created December 1, 2020 08:57
This is for a non-greedy regex match that ends at two whitespace characters
industry_name = re.findall(r'\s([A-Za-z0-9:\s]*?)\s\s', tat)[0]
# Often we need to query a model dynamically; we can do that with apps.get_model
from django.apps import apps
# Suppose the app named 'app' defines a model class Keyword
# get_model takes two parameters: the app label and the model name
instance_keyword = apps.get_model('app','keyword')
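The `*?` in the regex above is a lazy quantifier: it stops at the first run of two whitespace characters instead of the last. A small sketch comparing it with the greedy `*` form follows; the sample text is invented for illustration.

```python
import re

# Hypothetical input: fields separated by two spaces
text = " Retail trade  Mining  end"

# Lazy: the group ends at the first double space
lazy = re.findall(r'\s([A-Za-z0-9:\s]*?)\s\s', text)

# Greedy: the group runs on to the last double space
greedy = re.findall(r'\s([A-Za-z0-9:\s]*)\s\s', text)

print(lazy)
print(greedy)
```

Because the character class itself allows whitespace, the greedy form swallows the first separator, which is why the lazy form is the one that isolates a single field.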
SurendraTamang / addingProfile.py
Created April 13, 2021 06:12
Adding a profile to the browser saves us from logging in every time.
'''This script adds an existing Chrome profile to our Chrome driver so that
saved logins are reused.
'''
from selenium import webdriver

# Open the browser and switch to the profile we want
# Navigate to chrome://version/ to see the Profile Path
# This is my default
USER_DATA_PATH = 'C:\\Users\\{Our_User_Name of PC}\\AppData\\Local\\Google\\Chrome\\User Data'
PROFILE_DIRECTORY = 'Profile 6'
options = webdriver.ChromeOptions()
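The preview stops before the profile is attached to the options object. A common approach, shown here only as a sketch and not as the gist's actual continuation, is to pass Chrome's `--user-data-dir` and `--profile-directory` switches through `options.add_argument`. The helper name `chrome_profile_args` and the example path are assumptions; the helper just builds the strings so it runs without Selenium installed.

```python
def chrome_profile_args(user_data_path, profile_directory):
    """Build the Chrome command-line switches that select an existing profile."""
    return [
        f"--user-data-dir={user_data_path}",
        f"--profile-directory={profile_directory}",
    ]


if __name__ == "__main__":
    # Hypothetical path; with Selenium, each string would be passed to options.add_argument(...)
    for arg in chrome_profile_args(r"C:\Users\me\AppData\Local\Google\Chrome\User Data", "Profile 6"):
        print(arg)
```

Chrome generally refuses to share a user data directory between two running instances, so the regular browser may need to be closed first.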
SurendraTamang / clear and join
Created May 13, 2021 04:02
Joining the extracted value in scrapy
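import re

def _clear_and_join(lst, sep=' '):
    '''Return the strings in lst cleaned of unwanted characters and joined with sep.
    '''
    # Collapse non-breaking spaces, carriage returns, tabs, ® and stray Â into single spaces
    lst = [re.sub('[\xa0\r\t®Â ]+', ' ', t, flags=re.DOTALL).strip() for t in lst]
    # Replace line breaks (with surrounding whitespace) by sep, then drop empty items
    lst = list(filter(None, [re.sub(r'\s*\n\s*', sep, t, flags=re.DOTALL).strip() for t in lst]))
    text = sep.join(lst)
    return text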
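To see what the helper does, here is a condensed, self-contained copy applied to an invented list of the kind that an XPath `.extract()` call might return. The name `clear_and_join` and the sample strings are assumptions for illustration.

```python
import re


def clear_and_join(lst, sep=' '):
    """Condensed copy of the helper above, written with explicit escapes."""
    # \xa0 = non-breaking space, \xae = ®, \xc2 = Â
    lst = [re.sub(r'[\xa0\r\t\xae\xc2 ]+', ' ', t).strip() for t in lst]
    lst = [re.sub(r'\s*\n\s*', sep, t).strip() for t in lst]
    return sep.join(filter(None, lst))


# Hypothetical extracted fragments: odd spacing, an empty item, an embedded line break
fragments = ['Acme®\xa0Corp', '  ', 'Line one\n   Line two']
print(clear_and_join(fragments))
print(clear_and_join(fragments, sep=' | '))
```

Empty fragments are dropped before joining, so they do not produce doubled separators.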
SurendraTamang / example.py
Created June 13, 2021 07:34
XPath #shorts: tricks found while working with it
response.xpath(f'{NEIGHBOURHOOD_XPATH}/following-sibling::td//option[not(@value = " ")]/@value').extract()
# The not(@value = " ") predicate excludes options whose value is a single space " "