Amélie Medem letnotimitateothers

@nadya-p
nadya-p / pdf_to_text.py
Last active August 15, 2022 04:42
Extract text contents of PDF files recursively
from tika import parser
import os
def extract_text_from_pdfs_recursively(dir):
    for root, dirs, files in os.walk(dir):
        for file in files:
            path_to_pdf = os.path.join(root, file)
            [stem, ext] = os.path.splitext(path_to_pdf)
            if ext == '.pdf':
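The preview above is cut off right after the extension check, so the extraction step itself is not shown. As a rough sketch of the traversal part only: the helper name `find_pdfs`, the case-insensitive extension match, and the sibling `.txt` output path are all assumptions made for illustration, not part of the gist, and the Tika call is deliberately left out.

```python
import os


def find_pdfs(top_dir):
    """Yield (pdf_path, txt_path) for every PDF found under top_dir.

    Hypothetical helper: the sibling ".txt" naming is an assumption,
    not something the truncated gist confirms.
    """
    for root, _dirs, files in os.walk(top_dir):
        for name in sorted(files):
            stem, ext = os.path.splitext(name)
            # Compare case-insensitively so "REPORT.PDF" is also matched.
            if ext.lower() == ".pdf":
                yield os.path.join(root, name), os.path.join(root, stem + ".txt")
```

A caller could loop over `find_pdfs(some_dir)` and hand each `pdf_path` to whatever extractor it uses, writing the result to `txt_path`.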
@DecisionNerd
DecisionNerd / csv2json.sh
Created November 13, 2015 03:13
CSV to JSON converter using BASH. Original script from http://blog.secaserver.com/2013/12/convert-csv-json-bash/
#!/bin/bash
# CSV to JSON converter using BASH
# original script from http://blog.secaserver.com/2013/12/convert-csv-json-bash/
# thanks SecaGuy!
# Usage ./csv2json.sh input.csv > output.json
input=$1
[ -z "$1" ] && echo "No CSV input file specified" && exit 1
[ ! -e "$input" ] && echo "Unable to locate $1" && exit 1
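The Bash preview stops after the argument checks, so the conversion logic is not visible here. For comparison, a minimal sketch of the same idea (header row becomes the keys, each later row becomes one JSON object) using Python's standard `csv` and `json` modules rather than the script's own approach. The function name `csv_to_json` is hypothetical, and every value is kept as a string.

```python
import csv
import io
import json


def csv_to_json(csv_text):
    """Convert CSV text with a header row into a JSON array of objects.

    Hypothetical helper for illustration; values are not type-converted,
    so numbers come out as JSON strings.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return json.dumps(list(reader), indent=2)
```

Calling `csv_to_json(open("input.csv").read())` and printing the result would mirror the `./csv2json.sh input.csv > output.json` usage shown in the script header.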
@LorisBachert
LorisBachert / TikaExtractor.java
Last active August 16, 2023 11:43
Using Apache TIKA to extract the following formats: DOC, DOCX, PPT, PPTX, XLS, XLSX, PDF, JPG, PNG, TXT. Note: Tesseract must be installed for JPG and PNG extraction to work.
/**
 * Uses Tika's {@link AutoDetectParser} to extract the text of a file.
 *
 * @param file
 * @return The text content of a file
 */
@Override
public String extractTextOfDocument(File file) throws Exception {
    InputStream fileStream = new FileInputStream(file);
    Parser parser = new AutoDetectParser();
@davegurnell
davegurnell / error-handling-in-scala.md
Created September 5, 2014 10:25
Error handling in Scala

Error Handling in Scala

Scala does not have checked exceptions like Java, so you can't do something like this to force a programmer to deal with an exception:

public void stringToInt(String str) throws NumberFormatException {
  Integer.parseInt(str);
}
<!DOCTYPE html>
<html>
<head>
<title>@comeetie : carte données carroyées</title>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<link rel="stylesheet" href="http://code.jquery.com/ui/1.10.3/themes/smoothness/jquery-ui.css">
<script src="http://code.jquery.com/jquery-1.9.1.js"></script>
@hugoferreira
hugoferreira / UnixUtilities.scala
Last active April 11, 2016 23:47
Unix utilities in Scala
import java.io.{BufferedOutputStream, FileOutputStream, FileInputStream, BufferedInputStream}
import java.util.zip.{GZIPOutputStream, GZIPInputStream}
import scala.io.{Source, Codec}
import scala.language.{reflectiveCalls, implicitConversions}
object main extends App {
  import utils._
  val inFile = "/Users/bytter/Documents/Development/shiftforward/spitz/coopeventsfiltered.log.gz"
@rkaneko
rkaneko / Geocode.java
Last active July 19, 2016 10:47
Snippet to call the Google Geocode API on Play framework (v2.1.0).
import org.codehaus.jackson.JsonNode;
import play.libs.WS;
import play.libs.F.Promise;
import play.libs.WS.WSRequestHolder;
/**
 * Refer
 * http://goo.gl/HXhJG
 * Javadoc api/2.1.0/java/play/libs/WS.WSRequestHolder : http://goo.gl/AXmBY
# ========================================
# Testing n-gram analysis in ElasticSearch
# ========================================
curl -X DELETE localhost:9200/test
curl -X PUT localhost:9200/test -d '
{
  "settings" : {
    "index" : {
      "analysis" : {
@opensas
opensas / status.scala
Created December 8, 2011 19:39
tiny little scala script to check for play documentation translated files, see http://www.dzone.com/links/first_steps_with_scala_say_goodbye_to_bash_scripts.html
#!/bin/sh
exec scala -savecompiled "$0" "$@"
!#
import java.io._
val docs = new File(".").listFiles
  .filter(_.getName.endsWith(".textile")) // process only textile files
  .map(new DocumentationFile(_))
@davemo
davemo / README.md
Created September 13, 2011 22:59
A simple Backbone.js powered Slideshow, with pause/play controls and jump-to controls.

# A simple Slideshow module wrapped in a Backbone View

  • Dependencies
      • underscore.js
      • backbone.js

Viewable in action in this jsfiddle