@vinovator
vinovator / pdfTextMiner.py
Last active April 20, 2023 03:47
Sample code that uses the pdfminer module to extract text from PDF files
# pdfTextMiner.py
# Python 2.7.6
# For Python 3.x use pdfminer3k module
# This link has useful information on components of the program
# https://euske.github.io/pdfminer/programming.html
# http://denis.papathanasiou.org/posts/2010.08.04.post.html
''' Important classes to remember
PDFParser - fetches data from pdf file
@vinovator
vinovator / portScanner.py
Created October 8, 2015 15:39
Simple Python socket program to scan TCP ports
# Python 2.7.6
# portScanner.py
import socket
from datetime import datetime
import sys
# Here we scan the local machine
# Replace this with gethostbyname("host") to scan a remote host
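A minimal sketch of the kind of TCP connect scan this preview sets up, written for Python 3 rather than the gist's Python 2.7.6. The helper name `scan_ports` and the `timeout` default are assumptions for illustration; the rest of the gist's code is not shown in the preview, so this is not its actual implementation.

```python
import socket


def scan_ports(host, ports, timeout=0.5):
    """Return the ports in `ports` that accept a TCP connection on `host`."""
    open_ports = []
    for port in ports:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            sock.settimeout(timeout)
            # connect_ex returns 0 on success instead of raising an exception
            if sock.connect_ex((host, port)) == 0:
                open_ports.append(port)
    return open_ports
```

Calling `scan_ports(socket.gethostbyname("localhost"), range(1, 1025))` would check the well-known ports on the local machine. Only scan hosts you are authorized to test.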
@vinovator
vinovator / timeZoneExplorer.py
Last active October 9, 2015 20:39
Simple query to fetch all common time zones and their current time
# Python 2.7.6
# timeZoneExplorer.py
from pytz import timezone, common_timezones # import all_timezones for more exhaustive list
from datetime import datetime
import os
# Log file will be created in the same folder as the python script
my_path = "."
log_path = os.path.join(my_path, "loc_log.txt")
@vinovator
vinovator / jsonToCsv2.py
Last active November 3, 2015 13:59
Scans a JSON file and extracts the key-value pairs to CSV
# jsonToCSV.py
# Python 2.7.6
'''
Place all the JSON payloads as separate text files in the base folder.
The program extracts each payload and generates a single CSV file.
The CSV file has the keys and values in separate columns.
'''
import json
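The docstring above describes turning JSON payloads into key-value rows in one CSV file. A minimal sketch of that idea in Python 3 follows; the helper names `flatten` and `payloads_to_csv`, the dotted naming of nested keys, and the three-column layout are assumptions, since the gist's own implementation is not visible in this preview.

```python
import csv
import io
import json


def flatten(obj, prefix=""):
    """Yield (key, value) pairs, joining nested keys with dots."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            yield from flatten(value, "{0}{1}.".format(prefix, key))
    elif isinstance(obj, list):
        for index, value in enumerate(obj):
            yield from flatten(value, "{0}{1}.".format(prefix, index))
    else:
        # Leaf value: drop the trailing dot added by the parent level
        yield prefix[:-1], obj


def payloads_to_csv(payloads, out_file):
    """Write one (source, key, value) row per leaf value in each payload."""
    writer = csv.writer(out_file)
    writer.writerow(["source", "key", "value"])
    for name, text in payloads.items():
        for key, value in flatten(json.loads(text)):
            writer.writerow([name, key, value])


# Example with an in-memory payload instead of text files in a base folder
buffer = io.StringIO()
payloads_to_csv({"sample.txt": '{"id": 7, "owner": {"name": "vino"}}'}, buffer)
print(buffer.getvalue())
```

To write a real file under Python 3, open it with `open(path, "w", newline="")` so the `csv` module controls the line endings.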
@vinovator
vinovator / forbes2kMiner.py
Last active March 2, 2018 22:46
Scrape a JS-rendered website using Selenium, PhantomJS and BeautifulSoup, and wrangle the data using pandas. Extract the Forbes 2000 list, process it, and export it to a CSV file.
# forbes2kMiner.py
# Python 3.4
"""
Extracts the Forbes Global 2000 list of companies and exports it to a CSV file
Since Forbes is a JS-rendered site, selenium is used to mimic user action
BeautifulSoup is used to scrape html content
Since selenium is used, Firefox is needed as webdriver
"""
@vinovator
vinovator / persistListOfDicts.py
Last active November 13, 2015 16:28
Persist a list of dicts using pickle
# persistListOfDicts.py
# Python 2.7.6
import json
import os
import pickle # To persist each dict
json_path = "./JSON"
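A minimal sketch of persisting each dict in a list to its own pickle file and reading it back, written for Python 3. The function names, the `record_N.pkl` file naming, and the temporary folder are assumptions for illustration; the gist's own code beyond the imports is not shown here.

```python
import os
import pickle
import tempfile


def save_dicts(dicts, folder):
    """Pickle each dict into its own file and return the file paths."""
    paths = []
    for index, record in enumerate(dicts):
        path = os.path.join(folder, "record_{0}.pkl".format(index))
        with open(path, "wb") as handle:  # pickle needs binary mode
            pickle.dump(record, handle)
        paths.append(path)
    return paths


def load_dicts(paths):
    """Read the pickled dicts back in the order given."""
    loaded = []
    for path in paths:
        with open(path, "rb") as handle:
            loaded.append(pickle.load(handle))
    return loaded


records = [{"id": 1, "name": "alpha"}, {"id": 2, "name": "beta"}]
with tempfile.TemporaryDirectory() as folder:
    restored = load_dicts(save_dicts(records, folder))
```

Only unpickle files you created yourself, because `pickle.load` can execute arbitrary code from a crafted file.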
@vinovator
vinovator / Logger.py
Last active November 16, 2015 17:20
Basic logging example using logging module
# Logger.py
# Python 2.7.6
# For more details - https://docs.python.org/3/howto/logging.html#logging-basic-tutorial
# logging.error - just displays the error message
# logging.exception - displays the stack trace along with the error message
import logging # For logs
import sys # To read parameters from command line
# Define the format of the logging
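The difference between `logging.error` and `logging.exception` noted in the comments above can be seen in a short Python 3 sketch. The logger name, the format string, and the `safe_divide` helper are assumptions for illustration and are not taken from the gist.

```python
import io
import logging

# Capture log output in memory so it can be inspected afterwards
stream = io.StringIO()
logger = logging.getLogger("logger_sketch")  # hypothetical logger name
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s: %(message)s"))
logger.addHandler(handler)
logger.setLevel(logging.DEBUG)
logger.propagate = False  # keep the demo output out of the root logger


def safe_divide(a, b):
    """Divide a by b, logging the failure both ways on division by zero."""
    try:
        return a / b
    except ZeroDivisionError:
        logger.error("error(): division failed")          # message only
        logger.exception("exception(): division failed")  # message plus stack trace
        return None


safe_divide(1, 0)
output = stream.getvalue()
print(output)
```

Both calls log at the ERROR level; `exception` additionally appends the traceback, which is why it should only be called from inside an `except` block.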
@vinovator
vinovator / persistListOfDicts1.py
Created November 16, 2015 17:13
Persist dicts using JSON instead of pickle
# persistListOfDicts1.py
# Python 2.7.6
import json
import os
json_path = "./JSON"
# Write each dict into its own JSON file
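A minimal Python 3 sketch of the JSON variant: each dict goes into its own text file and is read back with `json.load`. The function names, the `record_N.json` file naming, and the sample data are assumptions for illustration, not the gist's actual code.

```python
import json
import os
import tempfile


def save_dicts_as_json(dicts, folder):
    """Dump each dict into its own JSON file and return the file paths."""
    paths = []
    for index, record in enumerate(dicts):
        path = os.path.join(folder, "record_{0}.json".format(index))
        with open(path, "w") as handle:  # JSON is text, so text mode is used
            json.dump(record, handle, indent=2)
        paths.append(path)
    return paths


def load_dicts_from_json(paths):
    """Load the dicts back in the order given."""
    loaded = []
    for path in paths:
        with open(path) as handle:
            loaded.append(json.load(handle))
    return loaded


json_records = [{"id": 1, "tags": ("a", "b")}]
with tempfile.TemporaryDirectory() as folder:
    json_restored = load_dicts_from_json(save_dicts_as_json(json_records, folder))
```

Unlike pickle, the files are human-readable and safe to load from untrusted sources, but the round trip is lossy for some types: the tuple in `tags` comes back as a list, and non-string dict keys would come back as strings.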
@vinovator
vinovator / DemergePDF.py
Created November 17, 2015 10:35
Divide a PDF file into 2 separate PDF files using PyPDF2 module
# Python 2.7.6
# DemergePDF.py
# Gets the name of one PDF file from the user via raw_input and splits it into two
import PyPDF2
import os
def getFileNameFromUser(file, path):
    pdf_file_name = raw_input("Enter {0} name: ".format(file))
    if pdf_file_name in os.listdir(path):
@vinovator
vinovator / CombinePDF_Py2.py
Created November 17, 2015 10:36
Combine 2 PDF files into a single file using PyPDF2 module. Python 2.7.6 version
# Python 2.7.6
# CombinePDF_Py2.py
# Gets the names of two PDF files from the user via raw_input and combines them into one
import PyPDF2
import os
def getFileNameFromUser(file, path):
    pdf_file_name = raw_input("Enter {0} name: ".format(file))
    if pdf_file_name in os.listdir(path):