Skip to content

Instantly share code, notes, and snippets.

@danijar
danijar / blog_tensorflow_variable_sequence_labelling.py
Last active May 15, 2022 14:28
TensorFlow Variable-Length Sequence Labelling
# Working example for my blog post at:
# http://danijar.com/variable-sequence-lengths-in-tensorflow/
import functools
import sets
import tensorflow as tf
from tensorflow.models.rnn import rnn_cell
from tensorflow.models.rnn import rnn
def lazy_property(function):

How to install OpenCV 3.1 on Ubuntu 14.04 64bits

Update latest packages and installed

$ sudo apt-get update
$ sudo apt-get upgrade

apt-get update - 更新最新的套件資訊 apt-get upgrade - 更新套件

@ramhiser
ramhiser / one-hot.py
Last active April 7, 2021 06:44
Apply one-hot encoding to a pandas DataFrame
import pandas as pd
import numpy as np
from sklearn.feature_extraction import DictVectorizer
def encode_onehot(df, cols):
"""
One-hot encoding is applied to columns specified in a pandas DataFrame.
Modified from: https://gist.github.com/kljensen/5452382
@willpatera
willpatera / Google-Sheet-Form-Post.md
Last active November 20, 2023 06:53
Post to google spreadsheet from html form

Overview

This collection of files serves as a simple static demonstration of how to post to a google spreadsheet from an external html <form> following the example by Martin Hawksey

Depreciation Warning: This code is not maintained, and should be seen as reference implementation only. If you're looking to add features or update, fork the code and update as needed.

Run example

You should be able to just open index.html in your browser and test locally.

import numpy as np
import marisa_trie
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.externals import six
class MarisaCountVectorizer(CountVectorizer):
# ``CountVectorizer.fit`` method calls ``fit_transform`` so
# ``fit`` is not provided
def fit_transform(self, raw_documents, y=None):
@songron
songron / NTU - Machine Learning
Last active August 25, 2018 08:11
Codes for Machine Learning Foundations(NTU) 台湾国立大学《机器学习基石》(Coursera版)相关的代码、编程作业等。
Codes for Machine Learning Foundations(NTU)
台湾国立大学《机器学习基石》(Coursera版)相关的代码、编程作业等。
课程地址:https://class.coursera.org/ntumlone-001/
@wpm
wpm / spark_parallel_boost.py
Last active December 3, 2018 02:56
A simple example of how to integrate the Spark parallel computing framework and the scikit-learn machine learning toolkit. This script randomly generates test and train data sets, trains an ensemble of decision trees using boosting, and applies the ensemble to the test set. The ensemble training is done in parallel.
from pyspark import SparkContext
import numpy as np
from sklearn.cross_validation import train_test_split, Bootstrap
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier
def run(sc):
@luw2007
luw2007 / 词性标记.md
Last active March 18, 2024 06:36
词性标记: 包含 ICTPOS3.0词性标记集、ICTCLAS 汉语词性标注集、jieba 字典中出现的词性、simhash 中可以忽略的部分词性

词的分类

  • 实词:名词、动词、形容词、状态词、区别词、数词、量词、代词
  • 虚词:副词、介词、连词、助词、拟声词、叹词。

ICTPOS3.0词性标记集

n 名词

nr 人名

@dreampuf
dreampuf / gist:3946886
Created October 24, 2012 15:51
Pinyin Split
#!/usr/bin/env python
#vim: encoding=utf-8
"""
拼音分词
"""
__author__ = "dreampuf<soddyque@gmail.com>"
import unittest
from unicodedata import *
script_data = {
"names":['Common', 'Latin', 'Greek', 'Cyrillic', 'Armenian', 'Hebrew', 'Arabic',
'Syriac', 'Thaana', 'Devanagari', 'Bengali', 'Gurmukhi', 'Gujarati', 'Oriya',
'Tamil', 'Telugu', 'Kannada', 'Malayalam', 'Sinhala', 'Thai', 'Lao', 'Tibetan',
'Myanmar', 'Georgian', 'Hangul', 'Ethiopic', 'Cherokee', 'Canadian_Aboriginal',
'Ogham', 'Runic', 'Khmer', 'Mongolian', 'Hiragana', 'Katakana', 'Bopomofo',
'Han', 'Yi', 'Old_Italic', 'Gothic', 'Deseret', 'Inherited', 'Tagalog',
'Hanunoo', 'Buhid', 'Tagbanwa', 'Limbu', 'Tai_Le', 'Linear_B', 'Ugaritic',