Wenhui Zhang wenhuizhang

@wenhuizhang
wenhuizhang / encrypt.sh
Created September 3, 2023 01:03 — forked from krzys-h/encrypt.sh
Encrypt existing partitions with LUKS2 on Ubuntu 20.04
#!/bin/bash
# Encrypt an existing partition with LUKS2 on Ubuntu 20.04 LTS
# DISCLAIMER: USE AT YOUR OWN RISK AND MAKE BACKUPS
# Made for my personal use and has almost NO error checking!!
# Based on instructions from:
# https://wiki.archlinux.org/index.php/dm-crypt/Device_encryption#Encrypt_an_existing_unencrypted_filesystem
DISK="$1"
@wenhuizhang
wenhuizhang / ByzantineFailure.md
Last active September 25, 2015 06:33
Byzantine Failure
@wenhuizhang
wenhuizhang / distributed_systems_readings.md
Last active May 2, 2024 17:50
distributed systems readings

# Distributed Systems Course List

## Systems

  • Cornell CS 614 - Advanced Course in Computer Systems - Ken Birman teaches this course. The readings cover more distributed systems research than is typical (which I am in favour of!). In fact, there's barely anything on traditional internal OS topics like filesystems or memory management. There's some worthwhile commentary at the bottom of the page.

  • Princeton COS 518 - Advanced Operating Systems - short and snappy reading list of two papers per topic, covering some interesting stuff like buffering inside the operating system, and L4.

@wenhuizhang
wenhuizhang / OpenSource_ML.md
Last active September 25, 2015 02:45
Open Source for Machine Learning

# Open Source Machine Learning

Understanding language is not easy, even for us humans, but computers are slowly getting better at it. Fifty years ago, the psychiatrist chatbot ELIZA could successfully open a therapy session, but you soon realized that she was responding with simple pattern matching. Today, IBM's supercomputer Watson has defeated human champions on a quiz show live on TV. The software pieces required to understand language, like the ones used by Watson, are complex. But believe it or not, many of these pieces are available for free as open source. This post summarizes how open-source software can help you analyze language data, using this flow chart as a guideline: http://entopix.com/so-you-need-to-understand-language-data-open-source-nlp-software-can-help.html

If your language data is already available as text, it is most likely to be stored in files. Apache libraries like POI and PDFBox extract text from the most common formats. Apache Tika is a toolkit that uses such lib

@wenhuizhang
wenhuizhang / web_crawler.md
Last active April 13, 2020 18:43
web crawler
| Name | Language | Platform |
|------|----------|----------|
| Heritrix | Java | Linux |
| Nutch | Java | Cross-platform |
| Scrapy | Python | Cross-platform |
| DataparkSearch | C++ | Cross-platform |
| GNU Wget | C | Linux |
| GRUB | C#, C, Python, Perl | Cross-platform |
| ht://Dig | C++ | Unix |
| HTTrack | C/C++ | Cross-platform |
@wenhuizhang
wenhuizhang / Find_Seat_DP
Last active August 29, 2015 14:14
DataIncubator
/*
Q1: There is a subway car with N adjacent seats in a row.
People walk into the subway and choose a random available seat (drawn uniformly).
The only constraint on seat availability is that they do not like to sit next to one another
so there is always (at least) one empty seat between any two individuals.
This process continues until all available seats are taken.
What are the mean and standard deviation of the fraction of occupied seats
(when the process is complete) for different values of N?
Give the answer with 10 digits of significance.
*/
//
// Prefix header
//
// The contents of this file are implicitly included at the beginning of every source file.
//
#import <Availability.h>
#ifndef __IPHONE_5_0
#warning "This project uses features only available in iOS SDK 5.0 and later."
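The subway seating question (Q1) above can be approached with a small dynamic program. The following is a minimal Python sketch, not the gist author's solution. It assumes that once a person sits at position `i` in a run of `n` available seats, the row splits into two independent runs of `i - 2` and `n - i - 1` available seats, so only the first and second moments of the occupied-seat count need to be tracked. The function names `seat_moments` and `occupied_fraction_stats` are hypothetical.

```python
from fractions import Fraction
from math import sqrt


def seat_moments(n_max):
    """Return (m1, m2): m1[n] = E[X_n] and m2[n] = E[X_n^2], where X_n is the
    final number of occupied seats in a run of n available seats."""
    m1 = [Fraction(0)] * (n_max + 1)
    m2 = [Fraction(0)] * (n_max + 1)
    for n in range(1, n_max + 1):
        s1 = Fraction(0)
        s2 = Fraction(0)
        for i in range(1, n + 1):
            # Sitting at seat i blocks its neighbours and leaves two
            # independent runs of available seats.
            a = max(i - 2, 0)
            b = max(n - i - 1, 0)
            # X = 1 + A + B with A and B independent.
            s1 += 1 + m1[a] + m1[b]
            s2 += (1 + m2[a] + m2[b]
                   + 2 * m1[a] + 2 * m1[b]
                   + 2 * m1[a] * m1[b])
        # The first seat is drawn uniformly from the n available seats.
        m1[n] = s1 / n
        m2[n] = s2 / n
    return m1, m2


def occupied_fraction_stats(n):
    """Mean and standard deviation of (occupied seats / n) for a row of n seats."""
    m1, m2 = seat_moments(n)
    mean = m1[n] / n
    var = (m2[n] - m1[n] ** 2) / (n * n)
    return float(mean), sqrt(float(var))


if __name__ == "__main__":
    for n in (1, 3, 4, 25):
        mean, std = occupied_fraction_stats(n)
        print(f"N={n}: mean={mean:.10f} std={std:.10f}")
```

Exact rational arithmetic via `Fraction` avoids accumulated rounding error before the final conversion to floating point, which matters when ten significant digits are requested. The double loop is quadratic in N, so this sketch is intended for moderate row lengths.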
@wenhuizhang
wenhuizhang / R_Venture Capital Analysis (US since 1998)
Last active August 29, 2015 13:59
Venture Capital Analysis (US since 1998)
Title
We first obtained our data from a web site using
data <- read.csv("http://www.capitalhacks.org/wp-content/uploads/2014/03/TechCrunchcontinentalUSA.csv",
                 stringsAsFactors = FALSE)
Preprocessing: we removed entry 1328 (jad-tech-consulting) from the data set because its date, 23-Sep-93, was substantially earlier than any other entry and was therefore presumed to be an outlier.
We then added a column for the number of quarters since Jan 1, 1999 using
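The quarter count described above can be sketched as follows. This is a Python illustration rather than the author's R code. The function names `quarters_since` and `parse_funding_date` are hypothetical, and the `%d-%b-%y` date format is an assumption inferred from the `23-Sep-93` value quoted in the preprocessing note (it also assumes English month abbreviations).

```python
from datetime import date, datetime

# Reference date taken from the text: Jan 1, 1999.
BASE = date(1999, 1, 1)


def quarters_since(day, base=BASE):
    """Number of calendar quarters from base to day (negative if day is earlier)."""
    return ((day.year - base.year) * 4
            + (day.month - 1) // 3
            - (base.month - 1) // 3)


def parse_funding_date(text):
    """Parse a date such as '23-Sep-93' (assumed format: day-month abbreviation-2-digit year)."""
    return datetime.strptime(text, "%d-%b-%y").date()


if __name__ == "__main__":
    print(quarters_since(parse_funding_date("15-Apr-05")))
```

Counting by calendar quarter rather than dividing elapsed days by a fixed quarter length keeps every date within the same quarter on the same integer value, which is usually what a grouped time-series analysis needs.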
@wenhuizhang
wenhuizhang / EdgeTable.h
Created April 2, 2014 17:47
Lab4_Texture Mapping (Base as Phong)
#include "stdafx.h"
#include <vector>
#include <iostream>
#ifndef Edge_H_INCLUDED
#define Edge_H_INCLUDED
using namespace std;
@wenhuizhang
wenhuizhang / Edgetable.h
Created March 12, 2014 16:24
Lab3_Graphics_Phong Shading
#include "stdafx.h"
#include <vector>
#include <iostream>
#ifndef Edge_H_INCLUDED
#define Edge_H_INCLUDED
using namespace std;
class Edge{