Umihiko Iwasa umihico

@umihico
umihico / README.md
Last active December 29, 2018 08:04
Python script to use the Google Translation API

Set your own API key in credentials.py:

google_apikey = "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"

Then test it:

$ python google_translation_api.py
['こんにちは', '東京']
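
The script itself is not shown in this listing, so the following is only a minimal sketch of what such a call could look like, assuming the Cloud Translation Basic (v2) REST endpoint and a JSON request body. The helper names `build_request`, `parse_translations`, and `translate` are hypothetical and not taken from the original script.

```python
import json
import urllib.parse
import urllib.request

# Endpoint of the Cloud Translation Basic (v2) REST API.
TRANSLATE_URL = "https://translation.googleapis.com/language/translate/v2"


def build_request(texts, target, api_key):
    """Build a POST request: API key in the query string, texts in a JSON body."""
    url = TRANSLATE_URL + "?" + urllib.parse.urlencode({"key": api_key})
    body = json.dumps({"q": list(texts), "target": target, "format": "text"})
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )


def parse_translations(payload):
    """Extract the translated strings from a decoded v2 JSON response."""
    return [item["translatedText"] for item in payload["data"]["translations"]]


def translate(texts, target, api_key):
    """Send the texts to the API and return the translations in input order."""
    with urllib.request.urlopen(build_request(texts, target, api_key)) as response:
        return parse_translations(json.load(response))
```

With `google_apikey` imported from credentials.py, a call such as `translate(["Hello", "Tokyo"], "ja", google_apikey)` would be the natural way to use it. The input strings here are an assumption, since only the output is shown above.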
@umihico
umihico / README.md
Last active December 29, 2018 06:48
script to install selenium on ec2

Install

curl https://gist.githubusercontent.com/umihico/538217c829b6ab34e3ecba32eff22112/raw/selenium-setup-ec2.sh|bash -xv
@umihico
umihico / README.md
Last active March 4, 2019 04:59
script to install pyenv on ec2

Install

curl https://gist.githubusercontent.com/umihico/397b0e21a4d5f954ba35416aa28e8b3f/raw/pyenv-ec2-setup.sh|sudo bash -xv
@umihico
umihico / README.md
Last active December 29, 2018 07:42
script to install rbenv on ec2

Install

curl https://gist.githubusercontent.com/umihico/c925549e74eccd1690287c6dee8e53af/raw/rbenv-ec2-setup.sh|sudo bash -xv
@umihico
umihico / README.md
Last active December 30, 2018 03:02
setup scripts for baby EC2

How to run

curl https://gist.githubusercontent.com/umihico/591f0d3e07b2efda3896443edbd253ef/raw/install-selector.sh|sudo bash

That's it!

How to use

You'll see a menu after running the curl command above.

@umihico
umihico / README.md
Last active December 29, 2018 03:27
bash script to create gist and clone it.

Install

$ git clone https://gist.github.com/737d0872b075ee6aae411a87162ed5fd.git create-gist-cli
$ cd create-gist-cli
$ echo YOUR-GITHUB-API-KEY>apikey_create_gist.txt

Create gist
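
The bash script is not reproduced in this listing. As a rough illustration of what creating a gist involves, here is a minimal Python sketch assuming the GitHub REST API's `POST /gists` endpoint and a token stored in apikey_create_gist.txt as set up above. The function names are hypothetical and this is not the author's implementation.

```python
import json
import urllib.request

GISTS_URL = "https://api.github.com/gists"


def read_token(path="apikey_create_gist.txt"):
    """Read the API key written during installation, dropping the trailing newline."""
    with open(path, encoding="utf-8") as handle:
        return handle.read().strip()


def build_gist_request(token, filename, content, description="", public=False):
    """Build the POST request that creates a single-file gist."""
    body = json.dumps({
        "description": description,
        "public": public,
        "files": {filename: {"content": content}},
    })
    headers = {
        "Authorization": "token " + token,
        "Accept": "application/vnd.github+json",
        "Content-Type": "application/json",
    }
    return urllib.request.Request(GISTS_URL, data=body.encode("utf-8"), headers=headers)


def create_gist(token, filename, content, description=""):
    """Create the gist and return its clone URL (the `git_pull_url` field)."""
    request = build_gist_request(token, filename, content, description)
    with urllib.request.urlopen(request) as response:
        return json.load(response)["git_pull_url"]
```

The returned URL can then be passed to `git clone`, which matches the "create gist and clone it" flow described above.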

@umihico
umihico / README.md
Last active September 24, 2019 00:20
Customizing EC2 ssh welcome logo

BEFORE

AFTER

Install

You can install it with this command (unfortunately, you'll get my icon along with it):
curl https://gist.githubusercontent.com/umihico/2b3d28010ab185999eb6eef50ac7c3d7/raw/install|bash -xv

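There are plenty of articles in Japan that point out what to watch for when scraping, but the concrete cases and guidelines they cite amount to little more than

  • the Librahack incident
  • one access per second
  • robots.txt

so I wondered whether there were cases from overseas, and sure enough there were. Using the article "Is Web Scraping Illegal? Depends on What the Meaning of the Word Is Is." by distil networks, a company that sells a scraping-blocking service, as a list of cases, I googled each one and looked into the details. They are written up as bullet points, with the references appended as-is.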
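
Two of the guidelines above, robots.txt and one access per second, are easy to express in code. The following is a minimal sketch using only the Python standard library. The `Throttle` class and the helper names are hypothetical illustrations, not something taken from the article.

```python
import time
import urllib.robotparser


def load_rules(robots_lines):
    """Parse robots.txt content that has already been fetched and split into lines."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_lines)
    return parser


def allowed_urls(urls, rules, user_agent="*"):
    """Keep only the URLs that robots.txt permits for this user agent."""
    return [url for url in urls if rules.can_fetch(user_agent, url)]


class Throttle:
    """Enforce a minimum interval (default: one second) between requests."""

    def __init__(self, min_interval=1.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, if any

    def wait(self):
        """Sleep just long enough to honor the interval; return the time slept."""
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            delay = max(0.0, self.min_interval - (now - self._last))
        if delay:
            self._sleep(delay)
        self._last = now + delay
        return delay
```

Calling `throttle.wait()` before each request and filtering the URL list through `allowed_urls` first covers both rules. The clock and sleep functions are injectable so the timing logic can be checked without actually waiting.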

Information wants to be free
@umihico
umihico / download_github.py
Last active September 3, 2018 07:44
description
# Standard library
import itertools
from time import sleep
from traceback import print_exc

# Third-party packages
from lxml.html import fromstring
from requests import get
from tqdm import tqdm

# The author's own helper package (umihico_commons)
from umihico_commons.chrome_wrapper import Chrome, Keys
from umihico_commons.functools import map_multithreading, flatten, save_as_txt
from umihico_commons.proxy import ProxyRequests

# Alias for the function used to fetch list pages
pagelist_get = get