@clemsos
clemsos / gensim_workflow.py
Last active February 22, 2022 11:09
How to calculate the TF-IDF similarity matrix of a complete corpus with Gensim
#!/usr/bin/env python
# -*- coding: utf-8 -*-
'''
This script shows the basic workflow to compute a TF-IDF similarity matrix with Gensim
OUTPUT :
clemsos / csv_to_elastic_search_bulk_insert.py
Last active February 27, 2024 10:15
Elasticsearch: index large CSV files with Python pandas
from pyelasticsearch import ElasticSearch
import pandas as pd
from time import time
root_path = "/home/clemsos/Dev/mitras/"
raw_data_path = root_path + "data/"
csv_filename = "week10.csv"
t0 = time()
clemsos / citations2tex.py
Last active August 29, 2015 14:02
Convert scientific citations in plain text to LaTeX
#!/usr/bin/python
# Convert citations into LaTeX format
#
# (Nivre et al., 2007)
# (Sagae and Tsujii 2007)
# Nivre (2007)
# (Chen et al., 2007; Dredze et al., 2007).
#
# \cite{Nivre2007}
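
The comment header above lists the plain-text citation forms and the target `\cite{Nivre2007}` key. A minimal sketch of that conversion follows. It is not the gist's own code: the function name `citations_to_tex`, the regular expressions, and the choice to build every key as first surname plus year (including for the narrative form `Nivre (2007)` and for multi-citation groups) are assumptions made for illustration.

```python
import re

# One citation inside parentheses: first surname, optional co-authors, year.
# Assumption: surnames start with an ASCII capital letter.
_SINGLE = re.compile(
    r"(?P<first>[A-Z][\w-]+)"                  # first author's surname
    r"(?:\s+(?:et al\.|and\s+[A-Z][\w-]+))?"   # optional "et al." or "and Surname"
    r",?\s+(?P<year>\d{4})"                    # optional comma, then the year
)
# A parenthesised group such as "(Chen et al., 2007; Dredze et al., 2007)".
_PAREN = re.compile(r"\(([^()]*\d{4}[^()]*)\)")
# The narrative form "Nivre (2007)".
_NARRATIVE = re.compile(r"\b([A-Z][\w-]+)\s+\((\d{4})\)")


def citations_to_tex(text):
    """Replace the plain-text citation forms with \\cite{SurnameYear}."""

    def narrative(match):
        # Hypothetical key scheme: surname followed by year.
        return r"\cite{%s%s}" % (match.group(1), match.group(2))

    def paren(match):
        keys = []
        for part in match.group(1).split(";"):
            hit = _SINGLE.fullmatch(part.strip())
            if hit is None:
                # Not a recognised citation: leave the parentheses untouched.
                return match.group(0)
            keys.append(hit.group("first") + hit.group("year"))
        return r"\cite{%s}" % ",".join(keys)

    text = _NARRATIVE.sub(narrative, text)
    return _PAREN.sub(paren, text)
```

Calling `citations_to_tex("(Nivre et al., 2007)")` yields `\cite{Nivre2007}`, matching the target shown in the header. Parenthesised text that does not parse as a citation is returned unchanged, so ordinary asides in the prose are not rewritten.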
clemsos / color_pages_pdf.sh
Created July 29, 2014 10:09
Count color and B&W pages in a PDF
#!/bin/bash
file="$1"
colorpages=0
# count all pages
totalpages=$(gs -q -dNODISPLAY -c "($1) (r) file runpdfbegin pdfpagecount = quit")
echo "Total pages : $totalpages"
# find pages with colors
for page in $(identify -density 12 -format '%p ' "$file") ; do
clemsos / odt_to_tex.sh
Created July 29, 2014 10:12
Convert OpenDocument Text (.odt) files to LaTeX
#!/bin/bash
w2l -config ./w2l-config.xml chapitre-xxx.odt chapters/chapitre-xxx.tex
clemsos / run_tests.sh
Last active August 29, 2015 14:04
Bash script to run tests with colored output using Python nosetests (all files or a single file)
#!/bin/bash
# USAGE :
#
# chmod +x run_tests.sh
# ./run_tests.sh # run all tests
# ./run_tests.sh xxx.py # run a single test
#
test_dir="$(pwd)/tests"
clemsos / multiple_pages_pdf_to_svg_inkscape.sh
Created September 4, 2014 14:49
Convert a multi-page PDF to clean SVG files using Inkscape
#!/bin/bash
PDF_IN="LS58_Presentation_FR_EN.pdf"
BASENAME_OUT="junkware"
PAGE_START=2
PAGE_END=2
for ((i=$PAGE_START; i<=$PAGE_END; i++));
do
clemsos / index.html
Created September 25, 2015 12:01 — forked from anonymous/index.html
JS Bin // source https://jsbin.com/bedalu/1
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>JS Bin</title>
<style id="jsbin-css">
div {
width : 17px;
height: 19px;
color : white;
clemsos / gitbook_to_pdf.sh
Last active August 7, 2023 04:14
Build Gitbook PDF using Pandoc
#!/bin/bash
GITBOOK_REP=$1
SUMMARY_FILE="SUMMARY.md"
echo $OUTPUT_FILE
if [ -d "$GITBOOK_REP" ]; then
echo "Entering directory '$GITBOOK_REP'..."
cd "$GITBOOK_REP"
clemsos / _.md
Last active April 5, 2016 08:33
D3 simple color selector