@vinovator
vinovator / nltk_name_classifier.py
Last active August 10, 2023 07:36
Classifier to determine the gender of a name using NLTK library
# nltk_name_classifier.py
# Python 2.7.6
"""
Classifier to determine the gender of a name using NLTK library
Classification - task of choosing the correct class label for a given input.
Supervised classifier:
Classifier that is built on training corpora containing the correct label
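The idea of a supervised classifier can be illustrated without NLTK. The sketch below is a pure-Python stand-in, not the gist's code and not NLTK's classifier: it learns, for each last letter, which label appeared most often in the labelled training data. The helper names (`last_letter`, `train`, `classify`) and the training names in the usage note are hypothetical, and the sketch targets Python 3 rather than the Python 2.7.6 noted in the gist header.

```python
from collections import Counter, defaultdict


def last_letter(name):
    """Feature extractor: the lower-cased final letter of the name."""
    return name[-1].lower()


def train(labeled_names):
    """Count how often each label occurs for each feature value.

    labeled_names is a list of (name, label) pairs, i.e. a training corpus
    that already contains the correct label for every input.
    """
    counts = defaultdict(Counter)
    for name, label in labeled_names:
        counts[last_letter(name)][label] += 1
    overall = Counter(label for _name, label in labeled_names)
    return counts, overall


def classify(model, name):
    """Pick the most frequent label seen for this name's last letter.

    Falls back to the most frequent label overall when the letter
    never occurred in the training data.
    """
    counts, overall = model
    seen = counts.get(last_letter(name))
    return (seen or overall).most_common(1)[0][0]
```

For example, after `model = train([("Anna", "female"), ("John", "male")])`, calling `classify(model, "Maria")` looks up the counts stored for the letter "a".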
@vinovator
vinovator / crawler.py
Created April 29, 2016 13:50
Crawl a page and extract all URLs recursively within the same domain
# crawler.py
# Python 2.7.6
"""
Crawl a page and extract all URLs recursively within the same domain
"""
from BeautifulSoup import BeautifulSoup
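A same-domain crawl can be sketched as follows. This is not the gist's implementation: it swaps BeautifulSoup for the standard library's `html.parser`, targets Python 3, and replaces recursion with an explicit work list. The names `LinkParser`, `same_domain_links` and `crawl` are hypothetical, and `fetch` is an assumed callable that takes a URL and returns the page's HTML as a string.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkParser(HTMLParser):
    """Collect the href value of every <a> tag in a document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for key, value in attrs:
                if key == "href" and value:
                    self.hrefs.append(value)


def same_domain_links(base_url, html):
    """Return absolute http(s) links in html that share base_url's host."""
    parser = LinkParser()
    parser.feed(html)
    domain = urlparse(base_url).netloc
    links = set()
    for href in parser.hrefs:
        # Resolve relative links and drop "#fragment" suffixes.
        absolute = urldefrag(urljoin(base_url, href)).url
        parsed = urlparse(absolute)
        if parsed.scheme in ("http", "https") and parsed.netloc == domain:
            links.add(absolute)
    return links


def crawl(start_url, fetch):
    """Visit every page reachable from start_url within the same domain.

    fetch is a caller-supplied function: url -> HTML string (an assumption).
    """
    seen = set()
    pending = [start_url]
    while pending:
        url = pending.pop()
        if url in seen:
            continue
        seen.add(url)
        pending.extend(same_domain_links(url, fetch(url)) - seen)
    return seen
```

Passing `fetch` in as a parameter keeps the link logic independent of the HTTP client, so it can be exercised against canned pages before being pointed at a live site.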
@vinovator
vinovator / mailbot.py
Last active April 19, 2016 20:57
Mail bot script that sends a predefined response to predefined mails. Intended for a Raspberry Pi that has its own dedicated mail ID
# mailbot.py
# python 2.7.6
"""
Mail bot script that sends a predefined response to predefined mails
Intended for a Raspberry Pi that has its own dedicated mail ID
Algorithm
1) Check a dedicated mailbox inbox for "unread" mails
2) For each "unread" mail, fetch the sender, subject and content
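Step 2 above (pulling the sender, subject and content out of a mail) can be sketched with the standard library's `email` package. This is a minimal Python 3 sketch, not the gist's code: the function name `summarize_mail` is hypothetical, and it assumes the raw message bytes have already been downloaded from the mailbox.

```python
from email import message_from_bytes, policy
from email.utils import parseaddr


def summarize_mail(raw_bytes):
    """Return (sender_address, subject, plain_text_body) for one raw message."""
    msg = message_from_bytes(raw_bytes, policy=policy.default)
    # parseaddr splits "Alice <alice@example.com>" into (name, address).
    sender = parseaddr(str(msg.get("From", "")))[1]
    subject = str(msg.get("Subject", ""))
    # Prefer the text/plain part; HTML-only mails yield an empty body here.
    body_part = msg.get_body(preferencelist=("plain",))
    content = body_part.get_content().strip() if body_part else ""
    return sender, subject, content
```

A bot could then compare the returned sender and subject against its list of predefined mails to decide which canned response to send.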
@vinovator
vinovator / uk_mba.py
Created April 7, 2016 14:44
Scraper to extract Bschool information from find-mba.com
# uk_mba.py
# Python 2.7.6
"""
Extract business schools in the UK with AACSB, AMBA and/or EQUIS accreditation only
Scraping from http://find-mba.com/
"""
import requests
from BeautifulSoup import BeautifulSoup
@vinovator
vinovator / mongo_test_restaurants.py
Created March 17, 2016 16:08
Script to interact with MongoDB using pymongo driver
# mongo_test_restaurants.py
# Python 2.7.6
"""
Test script to connect to MongoDB collections using pymongo library
Connects to an already imported collection named "restaurants"
source - https://docs.mongodb.org/getting-started/python/
"""
from pymongo import MongoClient, ASCENDING, DESCENDING
@vinovator
vinovator / world_t20_itinerary.py
Last active March 4, 2016 16:54
Scrape World T20 schedule from ICC website using BeautifulSoup & Requests and format the Excel output
# world_t20_itinerary.py
# Python 2.7.6
"""
Scrape World T20 schedule from ICC website using BeautifulSoup & Requests
Load the schedule into an Excel file using pandas
Format the Excel file using openpyxl
- Apply border, wrap text and color headers
- Highlight India matches
"""
@vinovator
vinovator / PdfAdapter.py
Created February 25, 2016 17:23
Reusable library to extract text from pdf file
# Python 2.7.6
# PdfAdapter.py
""" Reusable library to extract text from pdf file
Uses pdfminer library; For Python 3.x use pdfminer3k module
Below links have useful information on components of the program
https://euske.github.io/pdfminer/programming.html
http://denis.papathanasiou.org/posts/2010.08.04.post.html
"""
@vinovator
vinovator / tk_PromptPassword.py
Last active October 18, 2020 13:50
Reusable library which pops tkinter window to prompt password
# tk_PromptPassword.py
# Python 2.7.6
"""
Reusable library which pops tkinter window to prompt password
"""
import Tkinter as tk
import tkMessageBox as tkm # To show warning/ error messages
import logging
@vinovator
vinovator / imapMailboxMiner.py
Last active November 23, 2021 10:07
Python script to mine IMAP Mail servers, such as yahoo, gmail etc. Edit the IMAP server name to mine a particular mail service.
# imapMailboxMiner.py
# Python 2.7.6
"""
Connect to IMAP4 server and fetch mails
http://www.voidynullness.net/blog/2013/07/25/gmail-email-with-python-via-imap/
"""
import imaplib # Library to interact with IMAP server
import sys
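Connecting to an IMAP4 server and fetching mails might look roughly like the sketch below. It is a Python 3 sketch under stated assumptions, not the gist's code: the function names `connect` and `fetch_messages` are hypothetical, the server name and credentials are left to the caller, and the mailbox is opened read-only so that fetching does not change message flags.

```python
import imaplib


def connect(host, user, password):
    """Open an SSL connection to the IMAP server and log in."""
    conn = imaplib.IMAP4_SSL(host)
    conn.login(user, password)
    return conn


def fetch_messages(conn, mailbox="INBOX", criterion="ALL"):
    """Return the raw RFC 822 bytes of every message matching criterion.

    conn is a logged-in imaplib connection. Use criterion="UNSEEN" to
    restrict the search to unread mails.
    """
    status, _ = conn.select(mailbox, readonly=True)
    if status != "OK":
        raise RuntimeError("could not open mailbox %r" % mailbox)
    status, data = conn.search(None, criterion)
    if status != "OK":
        return []
    messages = []
    for num in data[0].split():
        status, parts = conn.fetch(num, "(RFC822)")
        if status != "OK":
            continue
        # Each fetched message arrives as a (header, payload) tuple;
        # the remaining items are protocol framing such as b")".
        for part in parts:
            if isinstance(part, tuple):
                messages.append(part[1])
    return messages
```

Keeping `connect` separate from `fetch_messages` means only the host name passed to `connect` has to change to mine a different mail service.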
@vinovator
vinovator / checkDuplicates.py
Last active February 20, 2024 07:04
Python script to find duplicate files from a folder
# checkDuplicates.py
# Python 2.7.6
"""
Given a folder, walk through all files within the folder and subfolders
and get list of all files that are duplicates
The md5 checksum for each file will determine the duplicates
"""
import os
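The approach described in the docstring above (walk the folder tree, checksum every file, group files by checksum) can be sketched as follows. This is a minimal Python 3 sketch, not the gist's code; the function names `md5_of_file` and `find_duplicates` and the 64 KiB chunk size are assumptions made for illustration.

```python
import hashlib
import os
from collections import defaultdict


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, read in chunks to bound memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(folder):
    """Map each MD5 digest shared by two or more files to their sorted paths."""
    by_digest = defaultdict(list)
    for root, _dirs, files in os.walk(folder):
        for name in files:
            path = os.path.join(root, name)
            by_digest[md5_of_file(path)].append(path)
    # Keep only digests that occur more than once.
    return {d: sorted(paths) for d, paths in by_digest.items() if len(paths) > 1}
```

On large folders, grouping files by size first and hashing only the groups with more than one member would avoid reading most files at all; the sketch hashes everything for simplicity.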