Sho Shimauchi shiumachi

@shiumachi
shiumachi / cosine_similarities.py
Last active January 5, 2018 17:20
Cosine similarity using TF-IDF and bigrams
import pandas as pd
import numpy as np
import MeCab
from collections import Counter
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_extraction.text import TfidfVectorizer
m = MeCab.Tagger("-Ochasen")
m2 = MeCab.Tagger("-Owakati")
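The preview above shows only the imports and tagger setup. As a rough illustration of the idea named in the description, the sketch below computes cosine similarity over TF-IDF weighted character bigrams using only the standard library. It does not reproduce the gist's MeCab or scikit-learn pipeline; the function names (`bigrams`, `tfidf_vectors`, `cosine`), the smoothed IDF formula, and the sample sentences are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def bigrams(text):
    """Return the list of overlapping character bigrams in text."""
    return [text[i:i + 2] for i in range(len(text) - 1)]


def tfidf_vectors(docs):
    """Build one sparse TF-IDF vector (a dict) per document.

    Uses a smoothed IDF, log((1 + N) / (1 + df)) + 1, so that terms
    present in every document still carry a small positive weight.
    This formula is an assumption, not taken from the gist.
    """
    counts = [Counter(bigrams(d)) for d in docs]
    df = Counter()
    for c in counts:
        df.update(c.keys())  # document frequency: one count per document
    n = len(docs)
    idf = {t: math.log((1 + n) / (1 + f)) + 1 for t, f in df.items()}
    return [{t: tf * idf[t] for t, tf in c.items()} for c in counts]


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical sample sentences, not from the gist.
docs = ["東京で泳いだ", "東京で走った", "大阪で仕事した"]
vecs = tfidf_vectors(docs)
sim_close = cosine(vecs[0], vecs[1])  # the two sentences share some bigrams
sim_far = cosine(vecs[0], vecs[2])    # the two sentences share no bigrams
```

Character bigrams avoid the need for a tokenizer, which is why this sketch can skip MeCab; word-level bigrams over MeCab output would likely give more meaningful similarities for Japanese text.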
@shiumachi
shiumachi / get_socks_proxy_command.sh
Created August 15, 2017 04:39
Get the command for setting up a SOCKS proxy to a specified AWS instance
PROFILE=your_profile
INSTANCE_NAME=your_instance_name
SSH_KEYPATH=your_ssh_key_path
PUBLIC_HOSTNAME=`aws --profile ${PROFILE} ec2 describe-instances | jq -r ".Reservations[] | select(.Instances[0].Tags[].Value == \"${INSTANCE_NAME}\") | .Instances[0] | .PublicDnsName"`
echo "establish SOCKS proxy"
echo "ssh -i ${SSH_KEYPATH} -D 8157 -q ec2-user@${PUBLIC_HOSTNAME}"
@shiumachi
shiumachi / get_instance_hostname_and_ipaddress.sh
Created August 15, 2017 04:30
Get the hostname and IP address of a specific instance on AWS
PROFILE=your_profile
INSTANCE_NAME=your_instance_name
aws --profile ${PROFILE} ec2 describe-instances | jq -r ".Reservations[] | select(.Instances[0].Tags[].Value == \"${INSTANCE_NAME}\") | .Instances[0] | {PrivateDnsName: .PrivateDnsName, PrivateIpAddress: .PrivateIpAddress, PublicDnsName: .PublicDnsName, PublicIpAddress: .PublicIpAddress}"
@shiumachi
shiumachi / dirs_compressor.py
Created January 28, 2017 12:29
Compress multiple directories into separate archives
# dirs_compressor.py
#
# Usage:
# $ python dirs_compressor.py target_dir
#
import sys
import os
import os.path
import logging
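The preview stops at the imports, so the gist's actual approach is not visible here. One way to get "one archive per directory" is sketched below with `shutil.make_archive` from the standard library. The function name `compress_subdirs`, the choice of zip format, and the throwaway demo tree are assumptions for illustration, not the gist's implementation.

```python
import os
import shutil
import tempfile


def compress_subdirs(target_dir):
    """Create one .zip archive per immediate subdirectory of target_dir.

    Archives are written next to the subdirectories, and the list of
    created archive paths is returned.
    """
    archives = []
    for name in sorted(os.listdir(target_dir)):
        path = os.path.join(target_dir, name)
        if not os.path.isdir(path):
            continue  # skip regular files
        # make_archive appends the ".zip" suffix to the base name itself
        archives.append(shutil.make_archive(path, "zip", root_dir=path))
    return archives


# Small demonstration on a throwaway directory tree.
demo_root = tempfile.mkdtemp()
for sub in ("a", "b"):
    os.makedirs(os.path.join(demo_root, sub))
    with open(os.path.join(demo_root, sub, "file.txt"), "w") as f:
        f.write("hello\n")
created = compress_subdirs(demo_root)
```

Each archive lands beside its source directory (for example `a.zip` next to `a/`), so an archive never ends up inside the directory it is compressing.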
@shiumachi
shiumachi / tips_python_web_crawler.md
Last active January 4, 2017 01:17
Notes on what I learned while writing a web crawler in Python

Exceptions

try:
    ...
except (ConnectionError, ChunkedEncodingError, TooManyRedirects, NewConnectionError) as e:
    logging.warning("Skip URL {} Reason: {}".format(url, e))

SQLite

@shiumachi
shiumachi / hive_many_columns.py
Created September 2, 2016 02:54
Create a Hive table with 100K columns
COLS=100000
with open("/tmp/create_table_with_many_columns.hql", "w") as f:
    f.write("CREATE TABLE many_cols_tbl (\n")
    # range works on both Python 2 and 3 (the original used xrange)
    for i in range(COLS):
        f.write("id%d INT,\n" % (i,))
    f.write("last_id INT);\n")
@shiumachi
shiumachi / create_ansible_directory_layout.sh
Created September 4, 2015 02:16
Create Ansible Directory Layout Based on Ansible Document
#!/bin/bash
# Create Ansible directory layout based on Ansible Documentation http://docs.ansible.com/ansible/playbooks_best_practices.html
#
# inventory file for production servers
touch production
# inventory file for staging environment
touch staging
@shiumachi
shiumachi / sentence_generator.py
Created June 4, 2014 15:56
WSGI server that automatically generates simple sentences
#! /usr/bin/env python3
import random
from wsgiref import simple_server
誰 = ['太郎', '二郎', '花子']
どこ = ['東京', '大阪', '名古屋']
どうした = ['泳いだ', '走った', '仕事した']
def pick(l):
@shiumachi
shiumachi / hipchatdump2csv.py
Last active February 15, 2017 19:22
Convert HipChat message logs from JSON to CSV.
# coding=utf-8
#
# hipchatdump2csv.py
#
# convert hipchat message logs from json to csv.
# csv format:
# from.name, from.user_id, date, message
#
# usage: python hipchatdump2csv.py
#
@shiumachi
shiumachi / match_literal_text.py
Created February 3, 2014 11:55
Regular expression that matches control characters and punctuation
import re
# ref: Regular Expressions Cookbook p.26
text = """!\"#$%&'()*+,-./:;<=>?@[\\]^_`{|}~
aaa
bbb
"""
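The preview ends before any pattern is shown. As a hedged sketch of how punctuation can be matched literally in Python, the example below uses `re.escape` together with `string.punctuation`. This is one common approach and is not necessarily what the gist or the cited book does; the variable names are hypothetical.

```python
import re
import string

# string.punctuation holds the 32 ASCII punctuation characters.
punct = string.punctuation

# re.escape backslash-escapes characters that are special in a pattern,
# so the resulting pattern matches the whole punctuation run literally.
literal_pattern = re.compile(re.escape(punct))

# A character class matching any single punctuation character.
any_punct = re.compile("[" + re.escape(punct) + "]")

sample = punct + "\naaa\nbbb\n"
literal_match = literal_pattern.search(sample)
found = any_punct.findall("a.b,c!")
```

Escaping programmatically avoids having to remember which of these characters are regex metacharacters, which is the main hazard when writing such a pattern by hand.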