deargle / tokenize.py
Last active April 22, 2021 12:32
Example of TfidfVectorizer with custom tokenizer that does basic stemming
# -*- coding: utf-8 -*-
"""
Created on Tue Apr 24 16:30:42 2018
@author: deargle
"""
from sklearn.feature_extraction.text import TfidfVectorizer
from nltk.stem.porter import PorterStemmer
import nltk
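The preview above stops after the imports. Below is a minimal sketch, not the author's full script, of one way to plug a stemming tokenizer into TfidfVectorizer. It assumes a simple regex split on ASCII letters in place of nltk.word_tokenize, which needs the punkt data download. The sample documents and the stem_tokenize name are hypothetical.

```python
import re

from nltk.stem.porter import PorterStemmer
from sklearn.feature_extraction.text import TfidfVectorizer

stemmer = PorterStemmer()


def stem_tokenize(text):
    """Hypothetical tokenizer: lowercase, keep runs of a-z, Porter-stem each token."""
    # Assumption: a regex split stands in for nltk.word_tokenize here.
    tokens = re.findall(r"[a-z]+", text.lower())
    return [stemmer.stem(tok) for tok in tokens]


# Made-up example documents.
docs = [
    "The runner was running quickly.",
    "Runners run every morning.",
    "Stemming reduces words to their stems.",
]

# lowercase=False because stem_tokenize already lowercases;
# token_pattern=None because it is unused when a custom tokenizer is given.
vectorizer = TfidfVectorizer(tokenizer=stem_tokenize, lowercase=False, token_pattern=None)
X = vectorizer.fit_transform(docs)  # sparse matrix, one row per document

print(X.shape)
print(sorted(vectorizer.vocabulary_))
```

Because the stemmer runs inside the tokenizer, "running" and "run" collapse to the single feature "run", which is the point of the custom tokenizer.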
petercossey / ubuntu-powerline-install.md
Last active July 12, 2022 12:44
Powerline font install for Ubuntu 16.10 (also confirmed working for 17.04, 17.10, 18.04, 19.10)

Install Powerline fonts for Z shell

Please note: there is an APT package called "fonts-powerline", tested and working on Ubuntu 20.04, that achieves the same outcome. Try "sudo apt install fonts-powerline".

If you're using Z shell with a prompt theme designed for Powerline fonts, you'll need to install those fonts on your machine. These are the clearest, most cut-down instructions I've found that work on Ubuntu 16.10 (also confirmed working on 17.04, 17.10, 18.04, and 19.10), and all credit goes to renshuki's "Ubuntu 14.04 + Terminator + Oh My ZSH with Agnoster Theme" gist. I've extracted just the Powerline font instructions; my personal setup uses Prezto instead of Oh My ZSH (not included here).

Get the font and config files

cd ~
Because the 'Meslo for Powerline' font doesn't work with PuTTY,
we need another patched font to display Powerline glyphs correctly.
Here is the list:
- DejaVu Sans Mono for Powerline (https://github.com/powerline/fonts/tree/master/DejaVuSansMono)
- Droid Sans Mono for Powerline (https://github.com/powerline/fonts/tree/master/DroidSansMono)
To change the font: in the main window (PuTTY Configuration), go to Window -> Appearance -> Font settings -> Change.
To test, enter this in the terminal: echo "\ue0b0 \u00b1 \ue0a0 \u27a6 \u2718 \u26a1 \u2699" (zsh's echo expands these escapes by default; in bash, use echo -e).
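For a check that does not depend on which shell's echo is in use, the same code points can be printed from Python. This is a small sketch; the names TEST, code_points, and main are illustrative, and the glyphs only render if the terminal is using one of the patched fonts listed above.

```python
import unicodedata

# Same escapes as the echo test above. Python expands \uXXXX inside ordinary
# string literals, so this behaves the same regardless of the user's shell.
TEST = "\ue0b0 \u00b1 \ue0a0 \u27a6 \u2718 \u26a1 \u2699"


def code_points(s):
    """Return a U+XXXX label for every non-whitespace character in s."""
    return [f"U+{ord(ch):04X}" for ch in s if not ch.isspace()]


def main():
    # The glyphs themselves only render if the terminal font contains them.
    print(TEST)
    for ch in TEST.split():
        # Private-use code points (e.g. U+E0B0) have no Unicode name.
        name = unicodedata.name(ch, "(private use, no Unicode name)")
        print(f"U+{ord(ch):04X}  {name}")


if __name__ == "__main__":
    main()
```

Run it with python3 inside the PuTTY session. If the first and third characters (the private-use code points) show as boxes, the selected font is missing the Powerline glyphs.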
clemsos / gensim_workflow.py
Last active February 22, 2022 11:09
How to calculate a TF-IDF similarity matrix for a complete corpus with Gensim
#!/usr/bin/env python
# -*- coding: utf-8 -*-
'''
This script just shows the basic workflow for computing a TF-IDF similarity matrix with Gensim
OUTPUT :