Nurul Ferdous dynamicguy

🎯
Focusing
View GitHub Profile
@dynamicguy
dynamicguy / text_cleaner.py
Created April 10, 2023 11:07
remove standard noise from text
def text_cleaner(text):
    rules = [
        {r'>\s+': u'>'},  # remove whitespace after a tag opens or closes
        {r'\s+': u' '},  # collapse consecutive whitespace into a single space
        {r'\s*<br\s*/?>\s*': u'\n'},  # newline after a <br>
        {r'</(div)\s*>\s*': u'\n'},  # newline after </div>
        {r'</(p|h\d)\s*>\s*': u'\n\n'},  # blank line after </p>, </h1>, </h2>, ...
        {r'<head>.*<\s*(/head|body)[^>]*>': u''},  # remove <head> to </head>
        {r'<a\s+href="([^"]+)"[^>]*>.*</a>': r'\1'},  # show links instead of texts
        {r'[ \t]*<[^<]*?/?>': u''},  # remove remaining tags
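The preview is cut off before the rules are used. As a minimal sketch, assuming each single-entry dict is applied in list order with `re.sub`, the loop might look like the following. The name `apply_rules` and the reduced `RULES` list are hypothetical and are not taken from the gist.

```python
import re

# Hypothetical subset of the rules shown in the preview, kept in the same order.
RULES = [
    {r'>\s+': '>'},                  # remove whitespace after a tag
    {r'\s+': ' '},                   # collapse runs of whitespace
    {r'\s*<br\s*/?>\s*': '\n'},      # newline for <br>
    {r'</(p|h\d)\s*>\s*': '\n\n'},   # blank line after </p> and </hN>
    {r'[ \t]*<[^<]*?/?>': ''},       # drop any remaining tags
]

def apply_rules(text, rules=RULES):
    """Apply every {pattern: replacement} rule in order (assumed behaviour)."""
    for rule in rules:
        for pattern, replacement in rule.items():
            text = re.sub(pattern, replacement, text)
    return text

cleaned = apply_rules('<p>Hello   world</p><p>Second<br/>line</p>')
```

Order matters here: whitespace is collapsed before the newline rules run, so the only newlines left in the result are the ones the tag rules insert.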
dynamicguy / s3batch.rb
Last active April 6, 2023 18:17
batch update s3 objects metadata with proper mime-type
#!/usr/bin/env ruby
require 'aws-sdk-s3'
require 'rack'
class BucketListObjectsWrapper
  attr_reader :bucket
  def initialize(bucket)
    @bucket = bucket
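The preview ends before any MIME type is looked up. The core idea in the description, picking a content type from each object key, can be sketched in Python with the standard `mimetypes` module. This is an illustration only: the gist itself is Ruby and presumably relies on `rack` for the lookup, the helper `guess_content_type` and the sample keys are hypothetical, and the S3 metadata update itself is left out.

```python
import mimetypes

def guess_content_type(key, default="application/octet-stream"):
    """Guess a MIME type from an object key's file extension."""
    content_type, _encoding = mimetypes.guess_type(key)
    # Fall back to a generic binary type when the extension is unknown.
    return content_type or default

# Hypothetical object keys, standing in for a bucket listing.
keys = ["site/index.html", "img/logo.png", "data/blob.zzzunknown"]
content_types = {key: guess_content_type(key) for key in keys}
```

Each resulting value is what would be written back as the object's content type in a batch update.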
dynamicguy / ocr.ipynb
Created May 26, 2022 04:57 — forked from hxy9243/ocr.ipynb
Excel OCR example
# Basic
sudo apt-get -y update
sudo apt-get -qq install -y build-essential
# OpenCV
sudo apt-get -qq install -y libopencv-dev
sudo apt-get -qq install -y libtesseract-dev
# General dependencies
sudo apt-get -qq install -y libatlas-base-dev libprotobuf-dev libleveldb-dev libsnappy-dev libhdf5-serial-dev protobuf-compiler
sudo apt-get -qq install -y --no-install-recommends libboost-all-dev
# Remaining dependencies, 14.04
"""
get_info() reads the image using OpenCV and performs thresholding, dilation, noise removal, and
contouring to retrieve bounding boxes from the contours.
Below are some of the additional functions available in OpenCV for preprocessing:
Median filter: blurs out noise by replacing each pixel with the median of a set of neighbouring pixels
cv2.medianBlur()
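To make the median-filter idea concrete, here is a small pure-Python sketch of a 3x3 median filter on a grayscale image stored as a list of rows. It illustrates the principle only and is not how `cv2.medianBlur()` is implemented; the function name `median_filter_3x3` and the edge-replication border handling are assumptions chosen for this example.

```python
from statistics import median

def median_filter_3x3(image):
    """Replace each pixel with the median of its 3x3 neighbourhood.

    Border pixels reuse the nearest valid pixel (edge replication).
    """
    height, width = len(image), len(image[0])
    result = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            window = [
                image[min(max(y + dy, 0), height - 1)][min(max(x + dx, 0), width - 1)]
                for dy in (-1, 0, 1)
                for dx in (-1, 0, 1)
            ]
            result[y][x] = median(window)
    return result

# A single bright "salt" pixel in an otherwise uniform patch.
noisy = [
    [10, 10, 10],
    [10, 255, 10],
    [10, 10, 10],
]
smoothed = median_filter_3x3(noisy)
```

The outlier disappears because a single extreme value never reaches the middle of the sorted window, which is why median filtering suits salt-and-pepper noise on scanned documents.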
from skimage import io, color, img_as_float
from skimage.feature import corner_peaks, plot_matches
import matplotlib.pyplot as plt
import numpy as np
from skimage import io, img_as_float, color, exposure
img = img_as_float(io.imread('./ml/old-front.jpg'))
/*
Copyright 2016 The Android Open Source Project
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
DynamicGuy Contributor License Agreement
In order to clarify the intellectual property license granted with Contributions from any person or entity, the open source project DynamicGuy ("DynamicGuy") must have a Contributor License Agreement (CLA) on file that has been signed by each Contributor, indicating agreement to the license terms below. This license is for your protection as a Contributor as well as the protection of DynamicGuy and its users; it does not change your rights to use your own Contributions for any other purpose.
You accept and agree to the following terms and conditions for Your present and future Contributions submitted to DynamicGuy. Except for the license granted herein to DynamicGuy and recipients of software distributed by DynamicGuy, You reserve all right, title, and interest in and to Your Contributions.
Definitions. "You" (or "Your") shall mean the copyright owner or legal entity authorized by the copyright owner that is making this Agreement with DynamicGuy. For legal entities,