-- Using shell programming for corpus analysis
S. Li
University of Birmingham
January, 2012
MERGING CORPUS LINGUISTICS AND COLLABORATIVE KNOWLEDGE CONSTRUCTION
Many thanks to Eric Wendelins, who kindly agreed to let me translate and republish his posts.
Yes, you may translate any of my articles and post them. Attribution would be appreciated, but not required.
Thanks a lot!
Cheers,
Eric
Original article by Eric Wendelins: Effective bash shorthand
#Get sed savvy – part 1#
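Today I continue my introduction to command-line tools, and the topic is sed. sed (Stream EDitor) is the most complex tool I have covered so far; it is a world of its own. Cramming everything into a single post would be too crowded, so I will cover it in several parts.
The essence of sed is search and replace, so we will start there and then branch out to its other features.
##Tutorial##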
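As a minimal sketch of the search-and-replace idea, the snippet below uses sed's `s/pattern/replacement/` command, first without and then with the `g` flag. The sample sentence and the variable names are invented for illustration and do not come from the original post.

```shell
# Invented sample text, used only to illustrate the substitution command.
sample='the cat sat on the mat with the hat'

# Without a flag, s/// replaces only the first match on each line.
first_only=$(printf '%s\n' "$sample" | sed 's/the/a/')

# With the g flag, every match on the line is replaced.
all_matches=$(printf '%s\n' "$sample" | sed 's/the/a/g')

printf '%s\n' "$first_only"
printf '%s\n' "$all_matches"
```

sed reads its input line by line and writes the result to standard output, so the original text is left untouched unless the output is redirected.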
#!/usr/bin/env ruby
require 'fileutils'
if ARGV.length != 2
  puts "Usage: #{$0} <moin pages directory> <gitit wikidata directory>"
  exit 1
end
from_dir = File.expand_path ARGV[0]
#Data Preparation Scripts for the University of Birmingham PhD Dissertation Corpus Project#
This file records the scripts used in the UoB PhD corpus project, in the hope that they will offer some pointers for future projects.
All scripts here are basic shell scripts built from GNU and other command-line tools, such as pdftotext.
Note that to run pdftotext on your own machine, you may need to install xpdf first. On OS X, you can use [homebrew](https://github.com/mxcl/homebrew), [macports](http://www.macports.org/) or [gentoo prefix](http://www.gentoo.org/proj/en/gentoo-alt/prefix/bootstrap-macos.xml) to install this tool automatically, or compile and install the package manually. On Debian, you can simply type
501 Syntax Error: Unknown character collection 'Adobe-Korea1'
498 Syntax Error: Unknown CMap 'KSCms-UHC-H' for character collection 'Adobe-Korea1'
498 Syntax Error: Couldn't find 'KSCms-UHC-H' CMap file for 'Adobe-Korea1' collection
422 Syntax Error: Unknown font tag 'C2_0'
208 Syntax Error: Unknown character collection 'Adobe-Japan1'
180 Syntax Error: Unknown font tag 'C0_0'
33 Syntax Error: Unknown font tag 'C2_2'
26 Syntax Warning: Bad annotation destination
15 Syntax Error: Unknown font tag 'C2_1'
11 Syntax Error: Unknown character collection 'Adobe-GB1'
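A tally of this shape, with a count followed by the message and the most frequent message first, is what the common `sort | uniq -c | sort -rn` pipeline produces. Whether the list above was generated this way is an assumption. The sketch below uses a hypothetical helper `tally_messages` and a few invented sample lines rather than the project's real log.

```shell
# Hypothetical helper: count identical lines on stdin, most frequent first.
tally_messages() {
  # sort groups identical lines, uniq -c counts each group,
  # and sort -rn orders the groups by descending count.
  sort | uniq -c | sort -rn
}

# Invented sample messages, standing in for a real conversion log.
sample_log=$(printf '%s\n' \
  "Syntax Error: Unknown font tag 'C2_0'" \
  "Syntax Warning: Bad annotation destination" \
  "Syntax Error: Unknown font tag 'C2_0'" \
  "Syntax Error: Unknown font tag 'C2_0'" \
  "Syntax Warning: Bad annotation destination")

tally=$(printf '%s\n' "$sample_log" | tally_messages)
printf '%s\n' "$tally"
```

The amount of leading whitespace before each count differs between `uniq` implementations, so any later processing should split on whitespace rather than rely on fixed columns.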