illy / gist:1670417
Last active December 13, 2022 14:51
This is a manual to help linguists use the shell programming language for corpus analysis.

The Power of Shell

-- Using shell programming for corpus analysis

S. Li

University of Birmingham

January, 2012

MERGING CORPUS LINGUISTICS AND COLLABORATIVE KNOWLEDGE CONSTRUCTION

Many thanks to Eric Wendelins for agreeing to let me translate and republish his posts.

Yes, you may translate any of my articles and post them. Attribution would be appreciated, but not required.

Thanks a lot!

Cheers,
Eric

Original article by Eric Wendelins: Effective bash shorthand

illy / grep is a beautiful tool.md
Created February 13, 2012 23:39
Chinese translation of grep is a beautiful tool

#grep is a beautiful tool#

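Global Regular Expression Print (grep) is probably an indispensable tool for every command-line user. As with the find command, combining it with other commands will greatly boost your productivity.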

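This brief tutorial will help you appreciate how simple and powerful grep is. If you are on Windows, download Cygwin; if you are new to regular expressions, here is a good regular expression tutorial.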

##Tutorial##

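Suppose we want to search JavaScript files for duplicated tags. Let's see how to do this with basic grep. The same technique can help you find all kinds of duplicates, as follows: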
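
Here is a minimal sketch of this technique. It assumes that "tags" are quoted strings such as `"#main"`. The helper name `find_duplicates`, the `*.js` filter and the sample pattern are illustrative assumptions. `grep -o` prints each match on its own line, `sort | uniq -c` counts identical matches, and `awk` keeps only those seen more than once:

```shell
# find_duplicates PATTERN DIR (hypothetical helper)
# Prints each match of the extended regex PATTERN that occurs more than once
# across all *.js files under DIR, most frequent first, prefixed by its count.
find_duplicates() {
  grep -rhoE --include='*.js' -- "$1" "$2" |  # one match per line, no file names
    sort |                                     # group identical matches together
    uniq -c |                                  # prefix each distinct match with its count
    awk '$1 > 1' |                             # keep only duplicated matches
    sort -rn                                   # most frequent first
}
```

For example, `find_duplicates '"#[A-Za-z_-]+"' src` would list every quoted `#id`-style string that appears more than once under `src`.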

illy / Get sed savvy – part 1.md
Created February 20, 2012 08:22
Chinese Translation of Get sed savvy

#Get sed savvy – part 1#

Original post

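Today I continue my series on command-line tools, and the subject is sed. sed (Stream EDitor) is the most complex tool covered so far; it is practically a language of its own. Covering everything in one post would be too crowded, so I will split it into parts.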

The essence of sed is search and replace, so we will start there and then move on to its other features.

##Tutorial##
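
The following minimal illustration shows sed's search-and-replace command, `s/pattern/replacement/flags`. The sample strings are made up for illustration. The `-E` option (extended regular expressions) is accepted by both GNU and BSD sed:

```shell
# s/PATTERN/REPLACEMENT/ replaces only the first match on each line...
echo "cat cat cat" | sed 's/cat/dog/'     # prints: dog cat cat
# ...while the g flag replaces every match on the line.
echo "cat cat cat" | sed 's/cat/dog/g'    # prints: dog dog dog
# Capture groups (\1, \2, \3) let you reorder the parts that matched.
echo "2012-02-20" | sed -E 's/([0-9]+)-([0-9]+)-([0-9]+)/\3.\2.\1/'   # prints: 20.02.2012
```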

illy / gist:1868428
Created February 20, 2012 08:30 — forked from lucasfais/gist:1207002
Sublime Text 2 - Useful Shortcuts

Sublime Text 2 – Useful Shortcuts (Mac OS X)

General

⌘T go to file
⌘⌃P go to project
⌘R go to methods
⌃G go to line
⌘KB toggle side bar
⌘⇧P command prompt
illy / moin2gitit.rb
Created February 21, 2012 21:07 — forked from eungju/moin2gitit.rb
Copy pages from MoinMoin to gitit
#!/usr/bin/env ruby
require 'fileutils'

# Expect exactly two arguments: the MoinMoin pages directory and the gitit wikidata directory.
if ARGV.length != 2
  puts "Usage: #{$0} <moin pages directory> <gitit wikidata directory>"
  exit 1
end

from_dir = File.expand_path ARGV[0]
illy / UoB PhD project scripts.md
Created February 22, 2012 22:07
This file records the scripts used in the UoB PhD project.

#The Script of Data Preparation for the University of Birmingham PhD Dissertation Corpus Project

Sheng Li, University of Birmingham

This file records the scripts used in the UoB PhD corpus project, in the hope that they will offer some pointers for future projects.

All scripts here are basic shell scripts that combine common command-line tools, such as the GNU text utilities and pdftotext.
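
Here is a minimal batch-conversion sketch that illustrates this kind of script. The helper name `convert_pdfs` and the `errors.log` file are hypothetical, not taken from the project. `-enc UTF-8` is a standard pdftotext option for choosing the output text encoding:

```shell
# convert_pdfs DIR (hypothetical helper)
# Converts every PDF in DIR to a UTF-8 .txt file next to it and appends
# pdftotext's error messages (stderr) to DIR/errors.log.
convert_pdfs() {
  local dir=$1 pdf
  for pdf in "$dir"/*.pdf; do
    [ -e "$pdf" ] || continue   # no PDFs: the glob stays unexpanded, so skip it
    pdftotext -enc UTF-8 "$pdf" "${pdf%.pdf}.txt" 2>> "$dir/errors.log"
  done
}
```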

Note: to run pdftotext on your own machine, you need to install xpdf first. On OS X, you can install it automatically with [homebrew](https://github.com/mxcl/homebrew), [macports](http://www.macports.org/) or [gentoo prefix](http://www.gentoo.org/proj/en/gentoo-alt/prefix/bootstrap-macos.xml), or compile the package manually. On Debian, you can simply type

illy / uob phd pdf convertion log original
Created February 23, 2012 03:59
uob phd pdf conversion log
501 Syntax Error: Unknown character collection 'Adobe-Korea1'
498 Syntax Error: Unknown CMap 'KSCms-UHC-H' for character collection 'Adobe-Korea1'
498 Syntax Error: Couldn't find 'KSCms-UHC-H' CMap file for 'Adobe-Korea1' collection
422 Syntax Error: Unknown font tag 'C2_0'
208 Syntax Error: Unknown character collection 'Adobe-Japan1'
180 Syntax Error: Unknown font tag 'C0_0'
33 Syntax Error: Unknown font tag 'C2_2'
26 Syntax Warning: Bad annotation destination
15 Syntax Error: Unknown font tag 'C2_1'
11 Syntax Error: Unknown character collection 'Adobe-GB1'