Shiang-Yun Yang morris821028

@morris821028
morris821028 / gist:e35243749f919c6acee5
Created December 12, 2014 12:41
database hw 1 practice
1. Find the drinker named alex and the bars he frequents.
SELECT drinker, bar FROM `frequents` WHERE drinker = 'alex'
2. List each drinker's name and the number of beers they like.
SELECT drinker, COUNT(*) FROM `likes` INNER JOIN `drinkers` ON likes.drinker = drinkers.name GROUP BY drinker
3. List the name and average price of every bar whose average beer price is < 4.
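Query 3 is cut off in this listing. The following is a minimal sketch of one way to answer it, not the author's query. It assumes a hypothetical `sells(bar, beer, price)` table (the real schema is not shown here) and made-up sample rows, and it runs through Python's `sqlite3` so it can be tried without a MySQL server. The filter on the aggregate goes in `HAVING`, since `WHERE` is evaluated before grouping.

```python
import sqlite3


def cheap_bars(conn, limit=4):
    """Return (bar, average price) for bars whose average beer price is below `limit`."""
    query = """
        SELECT bar, AVG(price) AS avg_price
        FROM sells
        GROUP BY bar
        HAVING AVG(price) < ?
        ORDER BY bar
    """
    return conn.execute(query, (limit,)).fetchall()


# Hypothetical table and rows, only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sells (bar TEXT, beer TEXT, price REAL)")
conn.executemany(
    "INSERT INTO sells VALUES (?, ?, ?)",
    [
        ("joe bar", "beer a", 3.0),
        ("joe bar", "beer b", 4.0),
        ("purple bar", "beer a", 5.0),
    ],
)
result = cheap_bars(conn)
print(result)
```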
@morris821028
morris821028 / gist:e4856a97f1a054b3fb5b
Created December 12, 2014 12:42
database hw 1 schema data
bars
name          city       owner
joe bar       champaign  joe
green st bar  champaign  sally
purple bar    urbana     paul
drunk         urbana     bob
bar bar       Decatur    zoe
beers
@morris821028
morris821028 / gist:3b7adc3ffdbdfbc51563
Created January 11, 2015 23:24
Github bash.bat in Windows
@%~d1
@cd "%~1" > NUL
@C:\Windows\SysWOW64\cmd.exe /c ""C:\Program Files (x86)\Git\bin\sh.exe" --login -i"
@morris821028
morris821028 / gist:f15d842eafc4be6616e4
Created January 12, 2015 00:55
Convert a txt file from Windows to Unix line endings: replace '\r\n' with '\n'
$ awk '{ sub("\r$", ""); print }' out.txt > output.txt
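For machines without awk, roughly the same conversion can be sketched in Python. The function names `crlf_to_lf` and `convert_file` are illustrative, not part of the gist. One difference worth noting: awk always ends its output with a newline, while this sketch leaves the end of the data as it found it.

```python
def crlf_to_lf(data: bytes) -> bytes:
    """Strip one trailing carriage return from every line."""
    lines = data.split(b"\n")
    return b"\n".join(
        line[:-1] if line.endswith(b"\r") else line for line in lines
    )


def convert_file(src: str, dst: str) -> None:
    """Read src in binary mode and write the converted bytes to dst."""
    with open(src, "rb") as f:
        data = f.read()
    with open(dst, "wb") as f:
        f.write(crlf_to_lf(data))
```

Calling `convert_file("out.txt", "output.txt")` mirrors the file names used in the awk command. Binary mode is used so that Python does not translate line endings on its own.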
@morris821028
morris821028 / gist:bd993ece62657892e2f3
Created February 25, 2015 09:04
UVa Problem & Algorithm discuss
// 20150101 - 20150225 Facebook Timeline.
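[UVa] 1679 - Easy Geometry. First problem of the new semester!
Find the largest empty rectangle inside a convex polygon. The polygon has at most 100,000 vertices.
A ternary search nested inside a ternary search, with a binary search nested inside that. An interesting problem that approaches a geometric algorithm from the viewpoint of functions.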
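The nesting pattern mentioned above can be sketched on a toy function. This is not a solution to UVa 1679: `g` is a hypothetical concave function chosen only so that the answer is known, the search bounds are arbitrary, and the innermost binary search layer is left out. The idea is that the outer ternary search optimizes a function whose value at each point is itself computed by an inner ternary search.

```python
def ternary_search_max(f, lo, hi, iterations=100):
    """Approximate the maximizer of a unimodal function f on [lo, hi]."""
    for _ in range(iterations):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if f(m1) < f(m2):
            lo = m1  # the maximum cannot lie left of m1
        else:
            hi = m2  # the maximum cannot lie right of m2
    return (lo + hi) / 2


def g(x, y):
    # Hypothetical concave function with its maximum at (1, 2).
    return -(x - 1.0) ** 2 - (y - 2.0) ** 2


def best_over_y(x):
    # Inner search: for a fixed x, find the best value over y.
    y = ternary_search_max(lambda yy: g(x, yy), -10.0, 10.0)
    return g(x, y)


# Outer search over x, using the inner search as its objective.
best_x = ternary_search_max(best_over_y, -10.0, 10.0)
best_y = ternary_search_max(lambda yy: g(best_x, yy), -10.0, 10.0)
print(best_x, best_y)
```

A fixed iteration count is used instead of an epsilon test so that the running time is predictable; each level multiplies the number of function evaluations by roughly twice the iteration count.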
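[UVa] 12415 - Digit Patterns
Read a regex and a main string s, and determine the distinct i for which some substring s[j...i] of s matches the regex. I applied the NFA-to-DFA conversion, but unfortunately the number of states is still too large after compression, even though it passes the small gift testdata provided by Liu Rujia (where the output for the second-to-last test case is also wrong). I had this doubt when I first learned the technique, and indeed the total number of states grows very quickly.
What exactly does the so-called dynamic NFA-to-DFA conversion refer to? Tripped up by Liu Rujia on yet another problem QQ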
// You can edit this code!
// Click here and start typing.
// http://hdechallenge.appspot.com/challenge?email=b7615199%40hotmail.com&key=9160760b19bc9fe7dd746d2d53e23e86640f1d71
package main
import "fmt"
func sum(n int) int {
	if n == 0 {
		return 0
@morris821028
morris821028 / clickerHeroSave.txt
Created March 15, 2015 15:16
Game clickerHero
e5yhJOj8cDmJV0h4d1G9lfvDbklwRtpabTWVVfzhdJGVF4t4c2CZIv6FMDT9QLyeN1j6MUwhOcDycQw3M6zNMBzBNEiQwui8duGt9d0JY9WrxyHHbw2lxmkLIgjyoty4Lsjecg0JNljbINwIMYDlIyynM5zCEawsMXDuk93HZvSTsKynMXi6wSi4dcXhBnn0cSm3Fzk4ZCXLM5iCO1nDsLiDMoiCIv6ndBHDJP1ZZ8S4wniOMCynIP6Cd4HTJU1QZjSLwDiFNQCpIV6AdJHNJ11bZcSjwBi1NNS1IG6edRH4JY1pZySBwoiSN9iNIv64d2HpJE1qZgSSwuiENyyxIg6odcHlJW1kZNSVwSiVOWC7IZ6JdaHjJL1cZGSawUi0OiSzIX6HdtHkJg19ZBSSw0iKMNTEAFi6OhnKRmyvdLWFUZs7IQj2ESxZI5jdpO0ZcUn2VmlyLUCBIwxqMFisIS6Od1HWJg10Z4SYwUirMbTwMwi9OXnQR5yKdRWTUgsvIQjjEV04IUjPpC0wcbn3VUlfLoCFIoxcNXSEIN6qdAHmJe1aZTSpwoiaMXTjYwirOBnERTyQdIWCUnsbIfjPEJ32IrjfpD0mcznHVDlDLhCRIxxPO1C4Il6Pd8HqJK1pZHSEwIiLMuTIkPiGOUnaR3y7dXWDUPsRIsjmIGw6Iqj1ph0gcHnhVClyLRCaIdygMnSaIZ6od5HZJ71oZ9ScwvidMvjbIxidO9nlRsy1dLWIUCsTI6joIlzYIAjYpa07cknnVXl5LiCwISyJNCCUIG6PdIHpJk1lZJSNwyifMmjyUkiQONnGRTybdYWlU5sYIGjFIj21INjjpz0UcOnTVIlpLAC1Iey1NVyPIo65doH1Jz11ZASIwoi6MUjVgRi3OTnNRvyUdiWTUPstIVjyI65pIQj9pZ0lcjniVClSLQCZIczeMBCaIy6vdqHlJ51UZdSlwAiNMrz2EAiLOmn6ROyNdGWKUysoINjDMZypIPjqpP0XcFnIVAlpLqCpIQzW
@morris821028
morris821028 / gist:0f8928ccf42b470c6a14
Created April 2, 2015 00:04
BigData BlockRecordReader Notes

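Format

There are many ways to parse each website, but because the HTML of every website is jumbled together in a single file, parsing becomes a problem.

Hadoop's built-in readline() supports reading across a file boundary, and also across several files. This can be verified by testing with Hadoop MRUnit.

Coming back to the discussion

Case 1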

@morris821028
morris821028 / gist:51d778ee8e3589e70e3e
Created April 20, 2015 13:57
Clean Github Commit

Copy the original files into a new directory new_dir, or simply delete .git.

cd new_dir
git init

Use a forced update