Alan Lee secsilm

@secsilm
secsilm / str_count.py
Created March 25, 2017 14:04
Count the Chinese characters, English letters, spaces, digits, and punctuation marks in a string with Python
# coding: utf-8
import string
from collections import namedtuple


def str_count(s):
    '''Count the Chinese characters, English letters, spaces, digits, and punctuation marks in a string.'''
    count_en = count_dg = count_sp = count_zh = count_pu = 0
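
The preview above stops right after the counters are initialised. As a rough illustration of how such a counter could be finished, here is a minimal sketch. It is not the gist's actual code: the names `count_chars` and `CharCounts` are hypothetical, and it assumes that "Chinese" means the basic CJK Unified Ideographs block and that anything not otherwise classified counts as punctuation.

```python
import string
from collections import namedtuple

# Hypothetical result type; the field names are illustrative only.
CharCounts = namedtuple("CharCounts", ["en", "digit", "space", "zh", "punct"])


def count_chars(s):
    """Count English letters, digits, whitespace, CJK ideographs, and everything else."""
    en = digit = space = zh = punct = 0
    for ch in s:
        if ch in string.ascii_letters:
            en += 1
        elif ch.isdigit():
            digit += 1
        elif ch.isspace():
            space += 1
        elif "\u4e00" <= ch <= "\u9fff":
            # Assumption: only the basic CJK Unified Ideographs block is treated as Chinese.
            zh += 1
        else:
            # Assumption: all remaining characters (ASCII and full-width punctuation,
            # symbols) are counted as punctuation.
            punct += 1
    return CharCounts(en, digit, space, zh, punct)


print(count_chars("Hello, 世界! 123"))
# prints CharCounts(en=5, digit=3, space=2, zh=2, punct=2)
```

Returning a named tuple keeps the five counts readable at the call site, which may be why the original imports `namedtuple`.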
secsilm / children.csv
Created June 16, 2017 02:43
CSDN - children statistics
no gend age high weight
6 0 10 1.46 38
18 0 11 1.56 48
17 0 11 1.5 40
7 1 10 1.48 39
12 1 10 1.43 43
26 1 12 1.64 60
15 0 10 1.48 39
45 0 10 1.43 35
21 1 11 1.55 46
secsilm / simple-autoencoder.ipynb
Created July 25, 2017 11:52
simple wrong autoencoder

Using Ensemble Learning to Improve the Performance of Machine Learning Algorithms

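This post is a translation of an article on ensemble models recommended by PythonWeekly. The original, Ensemble Learning to Improve Machine Learning Results, was published on Medium by Vadim Smolyakov on August 22, 2017. Vadim Smolyakov is a graduate student at MIT with a passion for data science and machine learning.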

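Ensemble learning improves machine learning results by combining several models. Compared with a single model, this approach can noticeably improve predictive performance, which is why ensemble models are a preferred choice in many well-known machine learning competitions, such as the Netflix Competition, KDD 2009, and Kaggle.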

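An ensemble method is a meta-algorithm that combines several machine learning techniques into one predictive model in order to decrease variance (bagging), decrease bias (boosting), or improve predictions (stacking).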
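
To make the bagging idea concrete, here is a small, self-contained sketch: several copies of a weak learner are each trained on a bootstrap sample, and their predictions are combined by majority vote. The one-dimensional decision stump, the toy data, and all function names are illustrative assumptions and are not taken from the original article.

```python
import random
from collections import Counter


def fit_stump(points):
    """Fit a 1-D decision stump: choose the threshold with the fewest training errors.

    `points` is a list of (x, label) pairs with labels 0 or 1; the stump
    predicts 1 when x > threshold.
    """
    best_threshold, best_errors = None, None
    for threshold, _ in points:
        errors = sum((x > threshold) != bool(label) for x, label in points)
        if best_errors is None or errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold


def bagging_fit(points, n_models=25, seed=0):
    """Train one stump per bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    return [
        fit_stump([rng.choice(points) for _ in points])
        for _ in range(n_models)
    ]


def bagging_predict(thresholds, x):
    """Combine the individual stumps by majority vote."""
    votes = Counter(int(x > t) for t in thresholds)
    return votes.most_common(1)[0][0]


# Toy data (an assumption for the demo): the label is 1 for x >= 5.
data = [(x, int(x >= 5)) for x in range(10)]
model = bagging_fit(data)
print(bagging_predict(model, 1), bagging_predict(model, 8))
```

Each stump sees a slightly different resampling of the data, so the individual thresholds vary; voting averages out that variation, which is the variance reduction that bagging aims for.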

Ensemble methods can be divided into two categories:

secsilm / cartopy-image.py
Last active September 4, 2017 13:20
Add an image on top of a map using cartopy
'''
This code is an example of adding an image on top of a map using cartopy.
The generated image can be found here: https://i.imgur.com/aTY1rYY.png
'''
import matplotlib.pyplot as plt
import cartopy.crs as crs
from matplotlib.offsetbox import AnnotationBbox, OffsetImage
from PIL import Image
secsilm / color_scatter.html
Created March 15, 2018 05:39
The HTML file generated by the color scatter example
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>color_scatter.py example</title>
<link rel="stylesheet" href="https://cdn.pydata.org/bokeh/release/bokeh-0.12.14.min.css" type="text/css" />
<script type="text/javascript" src="https://cdn.pydata.org/bokeh/release/bokeh-0.12.14.min.js"></script>
secsilm / unzip-jay.py
Created April 4, 2018 11:51
Extract zip files located in multiple folders into separate folders under another directory
import re
import shutil
import warnings
import zipfile
from pathlib import Path
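# Directory containing the zip files (raw string, so the backslashes are not treated as escapes)
in_path = Path(r'D:\BaiduYunDownload\Jay Chou')
# Directory to extract into
out_path = Path(r'D:\BaiduYunDownload')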
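
The preview ends after the two paths are defined. One possible way to carry out the task in the description is sketched below, using only the standard library. The function name `extract_all` and the rule that each archive is extracted into a folder named after the zip file's stem are assumptions, not the gist's actual logic. The demo builds its own archives in a temporary directory rather than touching the paths above.

```python
import tempfile
import zipfile
from pathlib import Path


def extract_all(in_path, out_path):
    """Extract every zip found under in_path into out_path/<archive stem>/."""
    extracted = []
    for zip_file in sorted(Path(in_path).rglob("*.zip")):
        # Assumption: one output folder per archive, named after the zip file.
        target = Path(out_path) / zip_file.stem
        target.mkdir(parents=True, exist_ok=True)
        with zipfile.ZipFile(zip_file) as archive:
            archive.extractall(target)
        extracted.append(target)
    return extracted


# Self-contained demo: create two small archives in separate folders, then extract them.
with tempfile.TemporaryDirectory() as tmp:
    src, dst = Path(tmp) / "src", Path(tmp) / "dst"
    for album in ("album_a", "album_b"):
        (src / album).mkdir(parents=True)
        with zipfile.ZipFile(src / album / f"{album}.zip", "w") as archive:
            archive.writestr("track.txt", f"contents of {album}")
    folders = extract_all(src, dst)
    names = [folder.name for folder in folders]
    first_track = (dst / "album_a" / "track.txt").read_text()
    print(names)
    # prints ['album_a', 'album_b']
```

`Path.rglob` walks all subfolders, so archives nested at any depth under `in_path` are picked up.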
secsilm / standardization-vs-normalization.ipynb
Created April 25, 2018 03:32
Standardization and Normalization in sklearn
secsilm / tensorflowhub_share_ppt.ipynb
Last active June 4, 2019 08:28
Sentiment classification of hotel reviews using TensorFlow Estimators and TensorFlow Hub
secsilm / read_color.py
Last active August 1, 2019 06:02
Read an Excel file in Python while preserving formatting
from openpyxl import load_workbook


def read_color(f):
    wb = load_workbook(f)
    ws = wb.active
    for row in ws.iter_rows():
        for cell in row:
            print(f"cell value={cell.value}, cell color={cell.fill.start_color.index}")