Created
April 9, 2019 21:25
-
-
Save rishabkdoshi/c5bd3e5a195029658fad90706c0bdcfc to your computer and use it in GitHub Desktop.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
<!DOCTYPE html | |
PUBLIC "-//W3C//DTD HTML 4.01//EN" | |
"http://www.w3.org/TR/html4/strict.dtd"> | |
<html> | |
<head> | |
<meta http-equiv='Content-Type' content='text/html;charset=UTF-8'> | |
<title>Assignment 2. Shell scripting</title> | |
</head> | |
<body> | |
<h1>Assignment 2. Shell scripting</h1> | |
<h2>Laboratory: Spell-checking Hawaiian</h2> | |
<p>Keep a log in the file <samp>lab2.log</samp> of what you do in the | |
lab so that you can reproduce the results later. This should not | |
merely be a transcript of what you typed: it should be more like a | |
true lab notebook, in which you briefly note down what you did and | |
what happened.</p> | |
<p>For this laboratory we assume you're in the standard C or <a | |
href='http://en.wikipedia.org/wiki/POSIX'>POSIX</a> <a | |
href='http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap07.html#tag_07'>locale</a>. The | |
shell command <a | |
href='http://pubs.opengroup.org/onlinepubs/9699919799/utilities/locale.html'><samp>locale</samp></a> | |
should output <samp>LC_CTYPE="C"</samp> | |
or <samp>LC_CTYPE="POSIX"</samp>. If it doesn't, use the following | |
shell command:</p> | |
<pre><samp><a href='http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#export'>export</a> LC_ALL='C' | |
</samp></pre> | |
<p>and make sure <samp>locale</samp> outputs the right thing afterwards.</p> | |
<p>We also assume the file <samp>words</samp> contains a sorted list of | |
English words. Create such a file by sorting the contents of the file | |
<samp>/usr/share/dict/words</samp> on the SEASnet GNU/Linux hosts, and putting | |
the result into a file named <samp>words</samp> in your working | |
directory. To do that, you can use | |
the <samp><a href='http://pubs.opengroup.org/onlinepubs/9699919799/utilities/sort.html'>sort</a></samp> | |
command.</p> | |
<p>Then, take a text file containing the HTML in this | |
assignment's web page, and run the following commands with that | |
text file being standard input. Describe generally what each command | |
outputs (in particular, how its output differs from that of the | |
previous command), and why.</p> | |
<pre><samp>tr -c 'A-Za-z' '[\n*]' | |
tr -cs 'A-Za-z' '[\n*]' | |
tr -cs 'A-Za-z' '[\n*]' | sort | |
tr -cs 'A-Za-z' '[\n*]' | sort -u | |
tr -cs 'A-Za-z' '[\n*]' | sort -u | comm - words | |
tr -cs 'A-Za-z' '[\n*]' | sort -u | comm -23 - words # ENGLISHCHECKER | |
</samp></pre> | |
<p>Let's take the last command (marked ENGLISHCHECKER) | |
as the crude implementation of an | |
English spelling checker. Suppose we want to change ENGLISHCHECKER to be a | |
spelling checker for | |
<a href='http://en.wikipedia.org/wiki/Hawaiian_language'>Hawaiian</a>, | |
a language whose traditional orthography | |
has only the following | |
letters (or their capitalized equivalents):</p> | |
<pre><samp>p k ' m n w l h a e i o u | |
</samp></pre> | |
<p>In this lab for convenience we use ASCII apostrophe (') to | |
represent the Hawaiian ‘okina (‘); it has no capitalized | |
equivalent.</p> | |
<p>Create in the file <samp>hwords</samp> a simple Hawaiian | |
dictionary containing a copy of all the Hawaiian words in | |
the tables in | |
"<a href='https://www.mauimapp.com/moolelo/hwnwdshw.htm'>Hawaiian to English</a>", an introductory list of words. | |
Use <a href='http://www.gnu.org/software/wget/'>Wget</a> to obtain | |
your copy of that web page. | |
Extract these words systematically from the tables in "Hawaiian to English". | |
Remove all instances of "<samp>?</samp>", "<samp><u></samp>" and | |
"<samp></u></samp>". | |
For each remaining line of the form | |
"<samp><var>A</var><td<var>X</var>><var>W</var></td><var>Z</var></samp>", | |
where <var>A</var> and <var>Z</var> are zero or more spaces, | |
<var>X</var> contains no "<samp>></samp>" characters | |
and <var>W</var> consists of entirely Hawaiian characters or spaces, | |
assume that <var>W</var> contains zero or more nonempty Hawaiian words | |
and extract those words; each word is a maximal sequence of one or | |
more Hawaiian characters. | |
Treat upper case letters as if they were lower case, and treat | |
<samp>`</samp> (ASCII grave accent) as if it were <samp>'</samp> | |
(ASCII apostrophe, which we use to represent ‘okina). | |
For example, the entry "<samp>H<u>a</u>`ule lau</samp>" | |
contains the two words | |
"<samp>ha'ule</samp>" and "<samp>lau</samp>". | |
Sort the resulting list of words, removing any duplicates. | |
Do not attempt to repair any remaining problems by hand; just use the | |
systematic rules mentioned above. Automate the systematic rules | |
into a shell script <samp>buildwords</samp>, which you should copy into your | |
log; it should read the HTML from standard input and write a sorted list of | |
unique words to standard output. For example, we should be able to run this | |
script with a command like this:</p> | |
<pre><samp>cat foo.html bar.html | ./buildwords | less | |
</samp></pre> | |
<p>If the shell script has bugs and | |
doesn't do all the work, your log should record in detail each bug | |
it has.</p> | |
<p>From the ENGLISHCHECKER command, derive a shell command | |
HAWAIIANCHECKER that checks the | |
spelling of Hawaiian rather than English, under the assumption | |
that <samp>hwords</samp> is a Hawaiian dictionary and that every | |
maximal nonempty sequence of ASCII letters or apostrophe is intended | |
to be a Hawaiian word and needs its spelling checked. Input that | |
is upper case should be lower-cased before it is checked against the | |
dictionary, since the dictionary is in all lower case.</p> | |
<p>Check your work by running your Hawaiian spelling checker on | |
<a href='assign2.html'>this very web page</a> (which you should also | |
fetch with Wget), and on the Hawaiian dictionary <samp>hwords</samp> | |
itself. Count the number of distinct misspelled words on this very web | |
page, using both ENGLISHCHECKER and HAWAIIANCHECKER. Count the number | |
of distinct words on this very web page that ENGLISHCHECKER reports as | |
misspelled but HAWAIIANCHECKER does not, and give two examples of | |
these words. Similarly, count the number of distinct words (and give | |
two examples) that HAWAIIANCHECKER reports as misspelled on this very | |
web page but ENGLISHCHECKER does not.</p> | |
<h2>Homework: Find poorly-named files</h2> | |
<p><em>Warning: it will be difficult to do this homework without | |
attending the lab session for hints.</em></p> | |
<p>You're working in a project that has lots of files and | |
is intended to be portable to a wide variety of systems, | |
some of which have fairly-restrictive rules for file names. | |
Your project has established the following portability guidelines:</p> | |
<ol> | |
<li>A file name component (i.e., the part of file names | |
other than "<samp>/</samp>") must contain only ASCII | |
letters, "<samp>.</samp>", "<samp>-</samp>", and "<samp>_</samp>".</li> | |
<li>A file name component cannot start with "<samp>-</samp>".</li> | |
<li>Except for "<samp>.</samp>" and "<samp>..</samp>", | |
a file name component cannot start with "<samp>.</samp>".</li> | |
<li>A file name component must not contain more than 14 characters.</li> | |
<li>No two file names in your project can differ only in case. | |
For example, if your project has a file name with component | |
"<samp>St._Andrews</samp>" it cannot also have a file | |
name with component "<samp>st._anDrEWS</samp>" in the same directory.</li> | |
</ol> | |
<p>Your boss has given you the job of looking for projects that do not | |
follow these guidelines. Write a shell script <samp>poornames</samp> | |
that accepts a project's directory name <var>D</var> as an operand | |
and looks for violations of one or more of the guidelines the last | |
file name component of <var>D</var>, or in files under <var>D</var>; | |
the files might be immediate children of <var>D</var>, or might be | |
under some subdirectory (recursively).</p> | |
<p>If given no operands, <samp>poornames</samp> should act as | |
if <var>D</var> is "<samp>.</samp>", the current working directory. | |
If given two or more operands, or a single operand that begins | |
with "<samp>-</samp>" or is not the name of a directory, | |
<samp>poornames</samp> should diagnose the problem on stderr and exit | |
with a failure status. | |
If <samp>poornames</samp> encounters an error when traversing a directory | |
(e.g., lack of permissions to read a subdirectory) it should diagnose | |
the problem on stderr, but it need not exit with a failure status.</p> | |
<p>The <samp>poornames</samp> script should output a | |
line containing each full file name if and only if the file name's | |
last component does not follow the guidelines. | |
For example, the command "<samp>poornames /usr/bin</samp>" | |
should output a line "<samp>/usr/bin/[</samp>" because | |
there is a directory entry with that name, and | |
its last component "<samp>[</samp>" contains a character | |
that falls outside the guidelines. | |
The <samp>poornames</samp> script should not output duplicate lines. | |
The order of output lines does not matter.</p> | |
<p>The <samp>poornames</samp> script should not follow symbolic links | |
when recursively looking for poorly-named files. However, it should | |
check the symbolic links' names, just as it checks the names of | |
all other files.</p> | |
<p>The <samp>poornames</samp> script should work with file names whose | |
components contain special characters like spaces, "*", and leading | |
"<samp>-</samp>" (except that <var>D</var> itself cannot begin with | |
"<samp>-</samp>"). However, you need not worry about file names | |
containing newlines. | |
</p> | |
<p>Your script should be runnable as an ordinary user, and should be | |
portable to any system that | |
supports <a href='http://pubs.opengroup.org/onlinepubs/9699919799/utilities/toc.html'>standard | |
POSIX shell and utilities</a>; please see | |
its <a href='http://pubs.opengroup.org/onlinepubs/9699919799/idx/utilities.html'>list | |
of utilities</a> for the commands that your script may use. (Hint: see | |
the <samp>find</samp> and <samp>sort</samp> | |
utilities.)</p> | |
<p>When testing your script, it is a good idea to do the testing in a | |
subdirectory devoted just to testing. This will reduce the likelihood | |
of messing up your home directory or main development directory if | |
your script goes haywire.</p> | |
<p>Once your script passes your initial tests, try it out on | |
three test cases <samp>/usr/bin</samp>, <samp>/usr/lib</samp>, | |
and <samp>/usr/share/man</samp> on the SEASnet GNU/Linux | |
host <samp>lnxsrv07</samp>. Save the outputs of these runs | |
into the six files <samp>bin.1</samp>, <samp>bin.2</samp>, | |
<samp>lib.1</samp>, <samp>lib.2</samp>, <samp>man.1</samp>, | |
and <samp>man.2</samp>, where the <samp>.1</samp> files | |
record standard output and the <samp>.2</samp> files | |
record standard error.</p> | |
<h2>Submit</h2> | |
<p>Submit the following files.</p> | |
<ul> | |
<li>The script <samp>buildwords</samp> as described in the lab.</li> | |
<li>The file <samp>lab2.log</samp> as described in the lab.</li> | |
<li>The <samp>poornames</samp> script described in the homework.</li> | |
<li>The six test output files described in the homework.</li> | |
</ul> | |
<p><em>Warning: You should edit your files with Emacs or with other | |
editors that end lines with plain LF (i.e., newline | |
or <samp>'\n'</samp>).</em> Do not use Notepad or similar tools | |
that may convert line endings | |
to <a href='https://en.wikipedia.org/wiki/CRLF'>CRLF</a> form.</p> | |
<p>All files should be ASCII text files, with no | |
carriage returns, and with no more than 80 columns per line (except possibly | |
for the test output files). The shell | |
command:</p> | |
<pre><samp>awk '/\r/ || 80 < length' buildwords lab2.log poornames | |
</samp></pre> | |
<p>should output nothing.</p> | |
<hr> | |
<address> | |
© 2005, 2007, 2008, 2010, 2013, 2019 Paul Eggert. | |
See <a href='../copyright.html'>copying rules</a>.<br> | |
$Id: assign2.html,v 1.32 2019/03/27 22:57:58 eggert Exp $ | |
</address> | |
</body> | |
</html> |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment