Skip to content

Instantly share code, notes, and snippets.

View rldotai's full-sized avatar

Brendan Bennett rldotai

View GitHub Profile
@dashed
dashed / github-pandoc.css
Created September 26, 2013 13:42
GitHub-like CSS for pandoc standalone HTML files (perfect for HTML5 output). Based on Marked.app's GitHub CSS. Added normalize.css (v2.1.3) in the prior to GitHub css.
/*! normalize.css v2.1.3 | MIT License | git.io/normalize */
/* ==========================================================================
HTML5 display definitions
========================================================================== */
/**
* Correct `block` display not defined in IE 8/9.
*/
@RishabhVerma
RishabhVerma / Flask-Restful_S3_File_Upload.py
Last active November 3, 2021 01:41
Uploading a file to S3 while using Flask with Flask-Restful to create a REST API.
# -*- coding: utf-8 -*-
"""
An example flask application showing how to upload a file to S3
while creating a REST API using Flask-Restful.
Note: This method of uploading files is fine for smaller file sizes,
but uploads should be queued using something like celery for
larger ones.
"""
from cStringIO import StringIO
@debasishg
debasishg / gist:8172796
Last active August 24, 2024 13:55
A collection of links for streaming algorithms and data structures

General Background and Overview

  1. Probabilistic Data Structures for Web Analytics and Data Mining : A great overview of the space of probabilistic data structures and how they are used in approximation algorithm implementation.
  2. Models and Issues in Data Stream Systems
  3. Philippe Flajolet’s contribution to streaming algorithms : A presentation by Jérémie Lumbroso that visits some of the hostorical perspectives and how it all began with Flajolet
  4. Approximate Frequency Counts over Data Streams by Gurmeet Singh Manku & Rajeev Motwani : One of the early papers on the subject.
  5. [Methods for Finding Frequent Items in Data Streams](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.187.9800&rep=rep1&t
@stefanschmidt
stefanschmidt / epub2pdf.sh
Created May 24, 2015 23:32
Convert from EPUB to PDF format
# depends on Calibre (http://calibre-ebook.com)
# the CSS snippet prevents images from filling the page
# adapt margins, page size and font size as needed
ebook-convert doc.epub doc.pdf \
--smarten-punctuation \
--pretty-print \
--preserve-cover-aspect-ratio \
--insert-blank-line \
--margin-top 60 \
--margin-left 60 \
@bartvm
bartvm / dl-frameworks.rst
Last active December 7, 2020 18:18
A comparison of deep learning frameworks

A comparison of Theano with other deep learning frameworks, highlighting a series of low-level design choices in no particular order.

Overview

Symbolic: Theano, CGT; Automatic: Torch, MXNet

Symbolic and automatic differentiation are often confused or used interchangeably, although their implementations are significantly different.

@manigandham
manigandham / rich-text-html-editors.md
Last active July 23, 2024 12:07
Rich text / HTML editors and frameworks

Strictly Frameworks

Abstracted Editors

These use separate document structures instead of HTML, some are more modular libraries than full editors

@dannguyen
dannguyen / EXAMPLE_WATSON_API_README.md
Last active November 23, 2020 13:32
Transcribing ProPublica podcast with Python and Watson Speech to Text API

Using IBM Watson Speech to Text API to translate a ProPublica podcast

An example of using the Watson Speech to Text API to translate a podcast from ProPublica: How a Reporter Pierced the Hype Behind Theranos

This is just a simpler demo of the same technique I demonstrate to make automated video supercuts in this repo: https://github.com/dannguyen/watson-word-watcher

The transcription takes just a few minutes (less if you parallelize the requests to IBM) and is free...but it isn't perfect by any means. It doesn't fare super well on proper nouns:

  • Charles Ornstein's last name is transcribed as Orenstein
  • John Carreyrou's last name becomes John Kerry Roo
@dannguyen
dannguyen / selenium-screenshotting.md
Last active February 15, 2023 15:59
Using Selenium and Python to screenshot a javascript-heavy page

Using Selenium and Python to screenshot a javascript-heavy page

As websites become more JavaScript heavy, it's harder to automate things like screenshotting for archival purposes. I've seen examples and suggestions to use PhantomJS for visual testing/archiving of websites, but have run into issues such as the non-rendering of webfonts. I've never tried out Selenium until today...and while I'm not thinking about performance implications yet, Selenium seems far more accurate than PhantomJS...which makes sense since it actually opens a real browser. And it's not too hard to script to do complex interactions: here's an [example of how to log in to Twitter, write a tweet, upload an image, and send a tweet via Selenium and DOM element selection](https://gist.github.com/dannguyen/8a6fa49253c1d6a0eb92

@dannguyen
dannguyen / README.md
Last active July 6, 2024 16:36
Using Python 3.x and Google Cloud Vision API to OCR scanned documents to extract structured data

Using Python 3 + Google Cloud Vision API's OCR to extract text from photos and scanned documents

Just a quickie test in Python 3 (using Requests) to see if Google Cloud Vision can be used to effectively OCR a scanned data table and preserve its structure, in the way that products such as ABBYY FineReader can OCR an image and provide Excel-ready output.

The short answer: No. While Cloud Vision provides bounding polygon coordinates in its output, it doesn't provide it at the word or region level, which would be needed to then calculate the data delimiters.

On the other hand, the OCR quality is pretty good, if you just need to identify text anywhere in an image, without regards to its physical coordinates. I've included two examples:

####### 1. A low-resolution photo of road signs

Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.