Sanhe Hu MacHu-GWU

  • Amazon Web Service Data Lab
  • Maryland, USA
@MacHu-GWU
MacHu-GWU / Sublime-Shortcuts-and-Keymap.rst
Created August 16, 2017 18:01
Sublime Shortcuts and Keymap

General

@MacHu-GWU
MacHu-GWU / Keymap-Template.rst
Last active August 16, 2017 20:18
A template for keymap documentation for a text editor or IDE

General

Description
@MacHu-GWU
MacHu-GWU / upsert.py
Created June 9, 2017 18:21 — forked from timtadh/upsert.py
How to compile an INSERT ... ON DUPLICATE KEY UPDATE statement with SQLAlchemy, with support for bulk inserts.
#!/usr/bin/env python
# Copyright (c) 2012, Tim Henderson
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are
# met:
#
# - Redistributions of source code must retain the above copyright notice,
# this list of conditions and the following disclaimer.
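
The preview above stops inside the license header, so the gist's own SQLAlchemy code is not shown here. As a rough illustration of the statement it targets, the sketch below builds a multi-row MySQL `INSERT ... ON DUPLICATE KEY UPDATE` as a plain string with `%s` placeholders. `build_bulk_upsert` and its parameters are hypothetical names chosen for this example, not part of the gist or of SQLAlchemy.

```python
def build_bulk_upsert(table, columns, update_columns, rows):
    """Return (sql, params) for a multi-row MySQL upsert.

    Hypothetical helper for illustration only; it is not the gist's code.
    Table and column names are assumed to come from trusted code;
    only the row values are passed as parameters.
    """
    if not rows:
        raise ValueError("rows must not be empty")
    if any(len(row) != len(columns) for row in rows):
        raise ValueError("every row must have one value per column")

    column_list = ", ".join("`{}`".format(c) for c in columns)
    # One "(%s, %s, ...)" group per row, so a single statement inserts them all.
    row_placeholder = "(" + ", ".join(["%s"] * len(columns)) + ")"
    values_clause = ", ".join([row_placeholder] * len(rows))
    # On a key collision, overwrite the listed columns with the incoming values.
    update_clause = ", ".join(
        "`{0}` = VALUES(`{0}`)".format(c) for c in update_columns
    )
    sql = "INSERT INTO `{}` ({}) VALUES {} ON DUPLICATE KEY UPDATE {}".format(
        table, column_list, values_clause, update_clause
    )
    params = [value for row in rows for value in row]
    return sql, params


sql, params = build_bulk_upsert(
    "users", ["id", "name"], ["name"], [(1, "alice"), (2, "bob")]
)
```

The resulting pair can be handed to a DB-API cursor whose driver uses the `%s` parameter style, for example `cursor.execute(sql, params)`.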

Table of Contents

To-do list and timeline for delivering Innovizio's Runtime Predict Model as a product

Evaluate Innovizio's work

  1. Reproduce the results they claim.
  2. Confirm that the code and data (which may be in a database, on S3, or stored locally) are properly delivered.
@MacHu-GWU
MacHu-GWU / inspect_mate.py
Created May 9, 2017 20:34
``inspect_mate`` provides more methods for getting information about class attributes than the standard library ``inspect``.
#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""
``inspect_mate`` provides more methods for getting information about class attributes
than the standard library ``inspect``.
This module is Python2/3 compatible, tested under Py2.7, 3.3, 3.4, 3.5, 3.6.
Includes tester function to check:
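
The preview is cut off before the list of checks, so what ``inspect_mate`` itself tests is not shown here. For context, the sketch below shows what the standard library already offers: `inspect.classify_class_attrs` reports a kind for each class attribute. `Example` and `kinds_of` are hypothetical names used only for this illustration.

```python
import inspect


class Example(object):
    attribute = 1

    @property
    def prop(self):
        return 2

    def method(self):
        return 3

    @staticmethod
    def static():
        return 4

    @classmethod
    def klass(cls):
        return 5


def kinds_of(cls):
    """Map each non-dunder attribute defined directly on ``cls`` to its kind."""
    return {
        attr.name: attr.kind
        for attr in inspect.classify_class_attrs(cls)
        if attr.defining_class is cls and not attr.name.startswith("__")
    }


kinds = kinds_of(Example)
```

Under Python 3, `kinds` maps `attribute` to `"data"`, `prop` to `"property"`, `method` to `"method"`, `static` to `"static method"`, and `klass` to `"class method"`.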
#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""
2017-01-26
You have to understand these:
basic concept:


Define the table

Each document may have multiple categories, topics, and entities. If we keep only the highest-confidence label of each kind, the complexity is greatly reduced: document vs. category, topic, and entity each becomes a many-to-one relationship.
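
A minimal sketch of that simplification, using the standard library's `sqlite3`. The table names, column names, and the `top_label` helper are assumptions made for this example; the original table definition is not shown in the preview.

```python
import sqlite3

# Hypothetical schema: each document points at one category, topic, and entity,
# so many documents can share the same label (many-to-one).
SCHEMA = """
CREATE TABLE category (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE topic    (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE entity   (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE document (
    id          INTEGER PRIMARY KEY,
    title       TEXT NOT NULL,
    category_id INTEGER REFERENCES category(id),
    topic_id    INTEGER REFERENCES topic(id),
    entity_id   INTEGER REFERENCES entity(id)
);
"""


def top_label(scored_labels):
    """Return the label with the highest confidence from (label, confidence) pairs."""
    return max(scored_labels, key=lambda pair: pair[1])[0]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Keep only the most confident category for the document.
label = top_label([("finance", 0.35), ("sports", 0.91)])
conn.execute("INSERT INTO category (name) VALUES (?)", (label,))
category_id = conn.execute(
    "SELECT id FROM category WHERE name = ?", (label,)
).fetchone()[0]
conn.execute(
    "INSERT INTO document (title, category_id) VALUES (?, ?)",
    ("doc-1", category_id),
)

row = conn.execute(
    "SELECT d.title, c.name FROM document d JOIN category c ON c.id = d.category_id"
).fetchone()
```

Without the highest-confidence rule, each of the three relationships would need its own association table to hold several labels per document.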

@MacHu-GWU
MacHu-GWU / mongocrawl.py
Last active October 3, 2016 18:27
A MongoDB-backed, breadth-first web crawler framework.
#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""
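**Chinese documentation**
mongocrawl is a breadth-first, targeted crawler framework that uses MongoDB as its data
persistence layer. It is well suited to crawler projects in which the URLs of the data
resources form a tree-like hierarchy.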
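
The preview ends before any of mongocrawl's code, so the sketch below is not its API. It only illustrates the breadth-first idea on a tree of URLs, with a plain dict standing in for the MongoDB collection. `SITE`, `crawl_breadth_first`, and `get_children` are hypothetical names.

```python
from collections import deque

# Hypothetical tree-shaped site: each URL maps to the child URLs found on it.
SITE = {
    "/": ["/a", "/b"],
    "/a": ["/a/1", "/a/2"],
    "/b": ["/b/1"],
    "/a/1": [],
    "/a/2": [],
    "/b/1": [],
}


def crawl_breadth_first(root, get_children, store):
    """Visit URLs level by level, saving each URL's children into ``store``."""
    queue = deque([root])
    seen = {root}
    order = []
    while queue:
        url = queue.popleft()          # FIFO queue gives breadth-first order
        children = get_children(url)   # stand-in for fetching and parsing a page
        store[url] = children          # stand-in for a database write
        order.append(url)
        for child in children:
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return order


store = {}
order = crawl_breadth_first("/", lambda url: SITE.get(url, []), store)
```

Here `order` lists the root first, then both second-level URLs, then the three leaves. Persisting each page's children as it is visited is what would let a database-backed crawler resume after an interruption.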