@upepo
upepo / Data_Wrangling.md
Last active December 5, 2016 14:16
A summary of basic data preprocessing skills

Data Wrangling (Data Preprocessing)

Text Processing

An introduction to the text preprocessing techniques needed for data mining. The focus is on using bash and Python in a Linux environment to turn raw text data into the form you need.
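As a rough illustration of the Python side of this workflow, here is a minimal sketch of a typical cleaning-and-counting step. The cleaning rule (lowercase, replace punctuation with spaces, collapse whitespace) and the function names `normalize_line` and `word_counts` are hypothetical choices for illustration, not taken from the original notes.

```python
import re
from collections import Counter


def normalize_line(line: str) -> str:
    """Hypothetical cleaning rule: lowercase the line, replace any
    character that is not a word character or whitespace with a space,
    then collapse runs of whitespace. \\w is Unicode-aware in Python 3,
    so Korean (Hangul) text is kept."""
    line = line.lower()
    line = re.sub(r"[^\w\s]", " ", line)
    return " ".join(line.split())


def word_counts(lines):
    """Count whitespace-separated tokens over all cleaned lines,
    similar in spirit to `tr -s ' ' '\\n' | sort | uniq -c` in bash."""
    counts = Counter()
    for line in lines:
        counts.update(normalize_line(line).split())
    return counts
```

Used on a file, something like `word_counts(open("input.txt", encoding="utf-8")).most_common(10)` would list the ten most frequent tokens; the file name is only an example.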

### MATPLOTLIBRC FORMAT
# This is a sample matplotlib configuration file - you can find a copy
# of it on your system in
# site-packages/matplotlib/mpl-data/matplotlibrc. If you edit it
# there, please note that it will be overwritten in your next install.
# If you want to keep a permanent local copy that will not be
# overwritten, place it in HOME/.matplotlib/matplotlibrc (unix/linux
# like systems) and C:\Documents and Settings\yourname\.matplotlib
# (win32 systems).
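The file above uses a simple line format: `key : value` pairs, with `#` starting a comment. The following is a simplified sketch of reading that format by hand, assuming every `#` begins a comment and keeping all values as strings. The function `parse_rc_lines` is hypothetical and is not matplotlib's own parser.

```python
def parse_rc_lines(lines):
    """Parse matplotlibrc-style `key : value` lines into a dict.

    Simplified sketch: everything after the first '#' is treated as a
    comment (so a hex color written with a leading '#' would be cut
    off), blank lines are skipped, lines without ':' are ignored, and
    values stay as strings instead of being validated and converted.
    """
    params = {}
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop comment and whitespace
        if not line:
            continue
        key, sep, value = line.partition(":")
        if not sep:
            continue  # no colon: skipped in this sketch
        params[key.strip()] = value.strip()
    return params
```

For real use, `matplotlib.matplotlib_fname()` reports which matplotlibrc file is currently active, and `matplotlib.rc_params_from_file()` loads one with matplotlib's own validation.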
upepo / hive
Last active August 29, 2015 14:07
hive memo
String functions in Hive
- http://www.folkstalk.com/2011/11/string-functions-in-hive.html
upepo / display.md
Last active August 29, 2015 14:07
Display

2015

2015/01/13

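  • The Internet of Things has four big data problems: the data is fragmented. There is plenty of data, but won't it be hard for any single company to own and analyze all of it? That means more inference will be needed, and many self-serving conclusions will be drawn from the data. (Will an exchange for sensor data emerge?...)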
    1. Nobody will wear 50 devices
upepo / random_sample.py
Created January 13, 2015 05:10
random sampling
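#!/usr/bin/env python
import sys
import random
import argparse

parser = argparse.ArgumentParser(description='random sampling')
# --ratio takes exactly one float; nargs='?' is dropped so that a bare
# "-r" with no value cannot silently produce None
parser.add_argument('--ratio', '-r', type=float, required=True,
                    help='ratio for random sampling')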
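The preview of `random_sample.py` stops after argument parsing, so the sampling logic itself is not shown. Below is a sketch of one common way a `--ratio` option is used, assuming one record per input line and independent (Bernoulli) sampling of each line. The function `sample_lines` and its `seed` parameter are hypothetical and are not the author's code.

```python
import random


def sample_lines(lines, ratio, seed=None):
    """Yield each line independently with probability `ratio`.

    This is Bernoulli sampling: the output size is only approximately
    ratio * len(lines). Passing a seed makes the sample reproducible.
    """
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0 and 1")
    rng = random.Random(seed)  # private generator, global state untouched
    for line in lines:
        if rng.random() < ratio:  # random() is in [0.0, 1.0)
            yield line
```

Wired into the script, the main loop might look like `for line in sample_lines(sys.stdin, args.ratio): sys.stdout.write(line)`, run as `python random_sample.py -r 0.1 < data.txt`. Streaming line by line keeps memory use constant, at the cost of not returning an exact sample size.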
{
"IAB1": "Arts & Entertainment",
"IAB1-1": "Books & Literature",
"IAB1-2": "Celebrity Fan/Gossip",
"IAB1-3": "Fine Art",
"IAB1-4": "Humor",
"IAB1-5": "Movies",
"IAB1-6": "Music",
"IAB1-7": "Television",
"IAB2": "Automotive",