@poa00
poa00 / test_winrt_ocr.py
Created May 3, 2024 12:00 — forked from wolfmanstout/test_winrt_ocr.py
Demonstrates using Python winrt to run OCR. Requires Python 3.7+.
import asyncio
import os
import winrt
from PIL import Image
from winrt.windows.graphics.imaging import BitmapDecoder, BitmapPixelFormat, SoftwareBitmap
from winrt.windows.media.ocr import OcrEngine
from winrt.windows.storage import StorageFile, FileAccessMode
import winrt.windows.storage.streams as streams
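
The preview above ends at the imports. Below is a minimal sketch of how the recognition call can proceed with these winrt types; the snake_case method names follow the Python/WinRT projection, and the file name is a placeholder, so treat the exact sequence as an assumption rather than the gist's own code.

async def recognize(path):
    # Open the image through the WinRT storage API (absolute path required).
    file = await StorageFile.get_file_from_path_async(os.path.abspath(path))
    stream = await file.open_async(FileAccessMode.READ)
    # Decode the stream into a SoftwareBitmap the OCR engine can consume.
    decoder = await BitmapDecoder.create_async(stream)
    bitmap = await decoder.get_software_bitmap_async()
    # Build an engine for the user's profile languages and run recognition.
    engine = OcrEngine.try_create_from_user_profile_languages()
    result = await engine.recognize_async(bitmap)
    return result.text

print(asyncio.run(recognize('test.png')))  # 'test.png' is a placeholder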
@poa00
poa00 / afullcode.py
Created April 27, 2024 22:28 — forked from raushanraj/afullcode.py
Crawler 1, implemented in Python. It starts from a seed page and stores every link it finds in a repository; it then pops links from the repository and recursively crawls them, simultaneously building an inverted index that records hits (words on the pages) and their locations across the crawled pages. You can also search any singl…
import urllib

def crawler(seedurl, max_depth):
    tocrawl = [seedurl]  # frontier of links still to visit
    crawled = []         # pages already visited
    index = {}           # inverted index: word -> hit locations
    graph = {}           # link graph: page -> outgoing links
    while tocrawl:
        url = tocrawl.pop()
        if url not in crawled and len(crawled) <= max_depth:
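
The preview cuts off inside the loop. The sketch below fills in the crawl-and-index cycle the description outlines, using urllib.request and a naive regex link extractor; the parsing details are assumptions, not the gist's actual code.

import re
import urllib.request

def get_page(url):
    # Fetch a page's HTML, returning '' on any network error.
    try:
        return urllib.request.urlopen(url).read().decode('utf-8', 'ignore')
    except OSError:
        return ''

def crawl(seedurl, max_pages):
    tocrawl, crawled, index = [seedurl], [], {}
    while tocrawl and len(crawled) < max_pages:
        url = tocrawl.pop()
        if url in crawled:
            continue
        content = get_page(url)
        # Inverted index: word -> pages it appears on (the "hits").
        for word in re.findall(r'\w+', content):
            index.setdefault(word.lower(), []).append(url)
        # Push outgoing links onto the frontier for recursive crawling.
        tocrawl.extend(re.findall(r'href=[\'"](http[^\'"]+)', content))
        crawled.append(url)
    return index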
@poa00
poa00 / Fundamentals.ipynb
Created April 27, 2024 22:25 — forked from withLinda/Fundamentals.ipynb
Scrapr Fundamentals
@poa00
poa00 / crawler2.py
Created April 27, 2024 22:24 — forked from raushanraj/crawler2.py
crawler2: crawls http://www.adafruit.com/ and stores information about every product in a database.
# scraper for http://www.adafruit.com/
from BeautifulSoup import *  # BeautifulSoup 3 (Python 2 era)
# from nextpage import *
import urllib
import MySQLdb
# library to take care of the unicode problem
# more info here: http://stackoverflow.com/a/1207479
import unicodedata

db = MySQLdb.connect("127.0.0.1", "root", "", "adafruit_products")
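
Only the connection setup survives in the preview. The sketch below shows one way the storage step could look with the MySQLdb handle above; the products table and its columns are hypothetical, and parameterized execute() lets the driver escape the values.

def save_product(db, name, price, url):
    # Hypothetical schema: products(name, price, url).
    cur = db.cursor()
    cur.execute(
        "INSERT INTO products (name, price, url) VALUES (%s, %s, %s)",
        (name, price, url))
    db.commit()
    cur.close()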

from locale import currency
from freelancersdk.resources.projects import place_project_bid
from freelancersdk.session import Session
from freelancersdk.resources.users import get_self_user_id
from freelancersdk.resources.projects.projects import (
    get_projects, get_project_by_id
)
from freelancersdk.resources.projects.helpers import (
    create_get_projects_object, create_get_projects_project_details_object,
)
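
No header or description survives for this snippet. The sketch below shows how these imports are commonly wired together, following the SDK's published samples; the token, project id, and bid values are placeholders, and the keyword arguments are assumptions.

session = Session(oauth_token='<oauth_token>', url='https://www.freelancer.com')
my_user_id = get_self_user_id(session)
# Place a bid on a project (all ids and amounts are placeholders).
place_project_bid(
    session,
    project_id=123456,
    bidder_id=my_user_id,
    description='Sample bid placed via the SDK.',
    amount=100,
    period=3,
    milestone_percentage=100,
)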

import pandas as pd
from selenium import webdriver
from bs4 import BeautifulSoup
import re
import requests
import time

url = 'https://books.toscrape.com/catalogue/page-1.html'
driver = webdriver.Chrome()
driver.implicitly_wait(30)
driver.get(url)  # load the listing page (added so soup below is defined)
soup = BeautifulSoup(driver.page_source, 'html.parser')

# One product card per 'col-xs-6 col-sm-4 col-md-3 col-lg-3' grid cell.
cards = soup.find_all(attrs={'class': 'col-xs-6 col-sm-4 col-md-3 col-lg-3'})
df = pd.DataFrame(
    list(zip(
        [None if x is None else x.string for x in soup.find_all('h3')],
        [None if x.find(attrs={'class': 'price_color'}) is None else x.find(attrs={'class': 'price_color'}).string.replace('£', '') for x in cards],
        [None if x.find(attrs={'class': 'instock availability'}).text is None else x.find(attrs={'class': 'instock availability'}).text.strip() for x in cards],
        [None if x.find(attrs={'class': re.compile(r'star-rating$')}).get('class') is None else x.find(attrs={'class': re.compile(r'star-rating$')}).get('class')[1] for x in cards],
    )),
    columns=['product_name', 'price', 'availability', 'rating'])
@poa00
poa00 / batch remove 'Copy Of' from Google Drive Files.md
Created April 27, 2024 13:54
Remove the "Copy of" Prefix from all files in google drive

Remove 'Copy of' prefix from Google Drive filenames

xterm is a nifty Unix terminal emulator you can use inside Google Colab notebooks. We can use it to manipulate files hosted on Google Drive in batch.

Step 1: mount your Drive in Colab

Paste the following code in a 'code' cell of the Colab notebook and click Run

from google.colab import drive
drive.mount('/content/drive')
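
The preview ends before the rename step itself. A minimal Python sketch of the batch rename, run in a Colab cell after mounting, is below; the MyDrive path and the os.rename approach are assumptions about how the gist proceeds (the same loop could equally be run as shell commands in xterm).

import os

root = '/content/drive/MyDrive'  # default mount point for "My Drive"
for dirpath, _, filenames in os.walk(root):
    for name in filenames:
        if name.startswith('Copy of '):
            # Strip the prefix; os.rename works on the mounted Drive filesystem.
            os.rename(os.path.join(dirpath, name),
                      os.path.join(dirpath, name[len('Copy of '):]))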
from splinter import Browser
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from bs4 import BeautifulSoup as bs
import pandas as pd
import requests
import os
import time