@OhMeadhbh
Last active November 16, 2019 04:45
In Defense of Two Spaces After a Period
I spend more time than is healthy worrying about software documentation. Don't get me wrong, I enjoy coding. I enjoy the process of investigating problems, possibly breaking them down into sub-problems and then searching my toolbox of conceptual solutions to find the one that's *just right*. But communicating a developer's intent clearly is an important part of constructing a solution. If you work in a team with other software developers, communicating intent is of vital importance. Even if you work in isolation, documenting your intent is important so that when you eventually come back to your code several years later, you have a chance of understanding what you were trying to do.
And it was while I was in deep thought about documenting software that I thought that maybe, just maybe, there's justification for two spaces after a period in the modern world.
I frequently use EMACS to edit files; sometimes I use VI, I'm not a zealot. So I like to see text files as just that: text files. Text can be underwhelming visually sometimes, so I also like to use Markdown, AsciiDoc or Emacs Org Mode. Editors exist to make these formats easy to edit; but I still like to have access to the original text. It is always useful to be able to diff two text files, and many visual editors make this harder than it has to be, if it is at all possible.
But there is a problem with file differs: they often misunderstand the differences in text you're interested in. Many git users will have seen diffs where complete paragraphs have been removed, and then the complete paragraph has been added with small changes. Brandon Rhodes gives a good example on his page regarding Semantic Linefeeds. Go look at it now, it's at: https://rhodesmill.org/brandon/2012/one-sentence-per-line/ .
The problem I have with Brian Kernighan's advice to "start each sentence on a new line" is that it looks ugly. But I can't deny it's good advice; just look at the way the differ mangled the update in Rhodes' example.
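To make the difference concrete, here is a small sketch using plain diff(1). The sample sentences, the file names and the `changed_lines` helper are all made up for illustration; the point is only how much text gets flagged when one word changes.

```shell
#!/bin/sh
# Illustration only: the sample sentences and file names are made up.

# Print only the lines that diff(1) marks as removed (<) or added (>).
changed_lines() {
    # The pipeline's status is grep's, which is 1 when nothing matched
    # (identical files); "|| true" keeps that from looking like a failure.
    diff "$1" "$2" | grep '^[<>]' || true
}

tmp=$(mktemp -d)

# One paragraph per line: both sentences share a single line.
printf '%s\n' 'The cat sat on the mat.  The dog slept by the door.' > "$tmp/para_old.txt"
printf '%s\n' 'The cat sat on the rug.  The dog slept by the door.' > "$tmp/para_new.txt"

# One sentence per line: the same text, split at sentence boundaries.
printf '%s\n' 'The cat sat on the mat.' 'The dog slept by the door.' > "$tmp/sent_old.txt"
printf '%s\n' 'The cat sat on the rug.' 'The dog slept by the door.' > "$tmp/sent_new.txt"

echo "paragraph per line:"
changed_lines "$tmp/para_old.txt" "$tmp/para_new.txt"
echo "sentence per line:"
changed_lines "$tmp/sent_old.txt" "$tmp/sent_new.txt"

rm -rf "$tmp"
```

In the paragraph-per-line case the untouched second sentence is dragged into the diff along with the changed one; in the sentence-per-line case only the sentence that changed shows up.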
My simple solution is to keep each of my paragraphs on a single line, but use a program to break them apart before checking them into source control. But you may be asking, what does this have to do with "two spaces after a period?" In English we use periods for many things other than signifying the end of a sentence. For example, we use periods to denote abbreviations. If I wanted to talk about science fiction writer A. E. van Vogt, who pretty much only used his initials, it would be annoying if my line breaking script put the middle 'E' on its own line. Not impossible to read, of course, but annoying.
So my solution was to assume sentences ended with a period and two spaces instead of a period and one space. My line breaking script will get confused if I ever only use one space after a period, but it won't be the end of the world. And when I say "line breaking script," I really mean "invocation of sed." The `sed` command is a handy utility that will convert a period and two spaces into a period and a line feed with this simple command:
sed 's/\.\ \ /\.\n/g' input_file.txt > output_file.xtx
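Going the other way takes a little more than reversing the regular expressions in the sed substitution specification. Because sed reads its input one line at a time and strips the trailing newline before matching, a pattern containing \n never matches on its own; the lines have to be gathered into the pattern space first, which the N command does in a loop. Note that this joins every line that ends in a period, so feed it one paragraph at a time:
sed -e ':a' -e 'N' -e '$!ba' -e 's/\.\n/\.\ \ /g' input_file.xtx > output_file.txt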
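As a rough check of the round trip, here is a sketch that splits a one-line paragraph and joins it back. It assumes GNU sed; the function names, file names and sample text are invented for the example. The join step does not simply mirror the split: sed never sees the newline at the end of a line, so the sketch gathers the whole input into the pattern space before substituting.

```shell
#!/bin/sh
# Sketch assuming GNU sed; the function names and sample text are made up.

# Split: a period followed by two spaces becomes a period and a newline.
# (Writing \n in the replacement is a GNU sed extension.)
split_sentences() {
    sed 's/\.  /.\n/g' "$1"
}

# Join: sed strips each line's newline before matching, so a pattern
# containing \n cannot match a single line. The ":a; N; $!ba" loop appends
# every following line to the pattern space first, then one substitution
# turns each period-plus-newline back into a period and two spaces.
join_sentences() {
    sed -e ':a' -e 'N' -e '$!ba' -e 's/\.\n/.  /g' "$1"
}

tmp=$(mktemp -d)
printf '%s\n' 'A. E. van Vogt wrote science fiction.  He mostly used his initials.' > "$tmp/para.txt"

split_sentences "$tmp/para.txt" > "$tmp/para.xtx"
cat "$tmp/para.xtx"      # two lines; "A. E." stays intact

join_sentences "$tmp/para.xtx" > "$tmp/roundtrip.txt"
cmp "$tmp/para.txt" "$tmp/roundtrip.txt" && echo "round trip OK"

rm -rf "$tmp"
```

The join treats its whole input as one paragraph: every line ending in a period is glued to the next, so a file with several paragraphs would need to be handled a paragraph at a time.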
Now there is a (mostly) automated way to edit text files where paragraphs look "correct" to me, but we split out sentences on individual lines so we have something akin to semantic linefeeds. If you're reading this in a github gist, check out the revision history to see if the diffs look more or less comprehensible to you. Also check out the "xtx" file below, whose content should be the same as this file, but with each sentence broken out into its own line.
And after uploading this as a gist on github, I noticed the default diff algorithm deleted the two lines of the previous paragraph and then added three lines. This isn't exactly what I was expecting, but it did remind me that I was once interested in diff algorithms. Having not used that information in a couple of decades, I have forgotten most of it but was happy to find git provides quite a few options. Look at the '--diff-algorithm' parameter in the git diff command; documentation can be found at https://git-scm.com/docs/git-diff . For the advanced student, googling "patience diff" and "myers diff" will reveal interesting discussions if you don't already have a textbook that covers them.
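If you want to try the options side by side, something like the following sketch may help. The `compare_algorithms` helper and the file names in the usage line are hypothetical; the flags are standard git diff options (`--no-index` compares two plain files outside a repository, `--numstat` prints added and removed line counts).

```shell
#!/bin/sh
# Sketch: the helper name and the file names in the usage line are made up.

# For each of git's diff algorithms, print how many lines it reports as
# added and removed between two files.
compare_algorithms() {
    for algo in myers minimal patience histogram; do
        # git diff --no-index exits with status 1 when the files differ,
        # so "|| true" keeps that from being treated as an error.
        stat=$(git diff --no-index --numstat --diff-algorithm="$algo" "$1" "$2" || true)
        printf '%s: %s\n' "$algo" "${stat:-no changes}"
    done
}

# Usage (hypothetical file names):
#   compare_algorithms old_draft.xtx new_draft.xtx
```

On small edits the algorithms will often agree; the differences tend to show up when blocks of similar lines are moved or repeated.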
Finally, I'm adding a "ztz" file to the gist which is the same content, but with newlines at the end of each line. I suspect it will be easier for humans to read, but the diffs will be less easy to follow.
In Defense of Two Spaces After a Period
I spend more time than is healthy worrying about software documentation.
Don't get me wrong, I enjoy coding.
I enjoy the process of investigating problems, possibly breaking them down into sub-problems and then searching my toolbox of conceptual solutions to find the one that's *just right*.
But communicating a developer's intent clearly is an important part of constructing a solution.
If you work in a team with other software developers, communicating intent is of vital importance.
Even if you work in isolation, documenting your intent is important so that when you eventually come back to your code several years later, you have a chance of understanding what you were trying to do.
And it was while I was in deep thought about documenting software that I thought that maybe, just maybe, there's justification for two spaces after a period in the modern world.
I frequently use EMACS to edit files; sometimes I use VI, I'm not a zealot.
So I like to see text files as just that: text files.
Text can be underwhelming visually sometimes, so I also like to use Markdown, AsciiDoc or Emacs Org Mode.
Editors exist to make these formats easy to edit; but I still like to have access to the original text.
It is always useful to be able to diff two text files, and many visual editors make this harder than it has to be, if it is at all possible.
But there is a problem with file differs: they often misunderstand the differences in text you're interested in.
Many git users will have seen diffs where complete paragraphs have been removed, and then the complete paragraph has been added with small changes.
Brandon Rhodes gives a good example on his page regarding Semantic Linefeeds.
Go look at it now, it's at: https://rhodesmill.org/brandon/2012/one-sentence-per-line/ .
The problem I have with Brian Kernighan's advice to "start each sentence on a new line" is that it looks ugly.
But I can't deny it's good advice; just look at the way the differ mangled the update in Rhodes' example.
My simple solution is to keep each of my paragraphs on a single line, but use a program to break them apart before checking them into source control.
But you may be asking, what does this have to do with "two spaces after a period?" In English we use periods for many things other than signifying the end of a sentence.
For example, we use periods to denote abbreviations.
If I wanted to talk about science fiction writer A. E. van Vogt, who pretty much only used his initials, it would be annoying if my line breaking script put the middle 'E' on its own line.
Not impossible to read, of course, but annoying.
So my solution was to assume sentences ended with a period and two spaces instead of a period and one space.
My line breaking script will get confused if I ever only use one space after a period, but it won't be the end of the world. And when I say "line breaking script," I really mean "invocation of sed." The `sed` command is a handy utility that will convert a period and two spaces into a period and a line feed with this simple command:
sed 's/\.\ \ /\.\n/g' input_file.txt > output_file.xtx
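Going the other way takes a little more than reversing the regular expressions in the sed substitution specification.
Because sed reads its input one line at a time and strips the trailing newline before matching, a pattern containing \n never matches on its own; the lines have to be gathered into the pattern space first, which the N command does in a loop.
Note that this joins every line that ends in a period, so feed it one paragraph at a time:
sed -e ':a' -e 'N' -e '$!ba' -e 's/\.\n/\.\ \ /g' input_file.xtx > output_file.txt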
Now there is a (mostly) automated way to edit text files where paragraphs look "correct" to me, but we split out sentences on individual lines so we have something akin to semantic linefeeds.
If you're reading this in a github gist, check out the revision history to see if the diffs look more or less comprehensible to you.
Also check out the "xtx" file below, whose content should be the same as this file, but with each sentence broken out into its own line.
And after uploading this as a gist on github, I noticed the default diff algorithm deleted the two lines of the previous paragraph and then added three lines.
This isn't exactly what I was expecting, but it did remind me that I was once interested in diff algorithms.
Having not used that information in a couple of decades, I have forgotten most of it but was happy to find git provides quite a few options.
Look at the '--diff-algorithm' parameter in the git diff command; documentation can be found at https://git-scm.com/docs/git-diff .
For the advanced student, googling "patience diff" and "myers diff" will reveal interesting discussions if you don't already have a textbook that covers them.
Finally, I'm adding a "ztz" file to the gist which is the same content, but with newlines at the end of each line.
I suspect it will be easier for humans to read, but the diffs will be less easy to follow.
In Defense of Two Spaces After a Period
I spend more time than is healthy worrying about software documentation. Don't get me wrong, I enjoy
coding. I enjoy the process of investigating problems, possibly breaking them down into sub-problems and
then searching my toolbox of conceptual solutions to find the one that's *just right*. But communicating
a developer's intent clearly is an important part of constructing a solution. If you work in a team with
other software developers, communicating intent is of vital importance. Even if you work in isolation,
documenting your intent is important so that when you eventually come back to your code several years
later, you have a chance of understanding what you were trying to do.
And it was while I was in deep thought about documenting software that I thought that maybe, just maybe,
there's justification for two spaces after a period in the modern world.
I frequently use EMACS to edit files; sometimes I use VI, I'm not a zealot. So I like to see text files
as just that: text files. Text can be underwhelming visually sometimes, so I also like to use Markdown,
AsciiDoc or Emacs Org Mode. Editors exist to make these formats easy to edit; but I still like to have
access to the original text. It is always useful to be able to diff two text files, and many visual
editors make this harder than it has to be, if it is at all possible.
But there is a problem with file differs: they often misunderstand the differences in text you're
interested in. Many git users will have seen diffs where complete paragraphs have been removed, and then
the complete paragraph has been added with small changes. Brandon Rhodes gives a good example on his page
regarding Semantic Linefeeds. Go look at it now, it's at:
https://rhodesmill.org/brandon/2012/one-sentence-per-line/ .
The problem I have with Brian Kernighan's advice to "start each sentence on a new line" is that it looks
ugly. But I can't deny it's good advice; just look at the way the differ mangled the update in Rhodes'
example.
My simple solution is to keep each of my paragraphs on a single line, but use a program to break them
apart before checking them into source control. But you may be asking, what does this have to do with
"two spaces after a period?" In English we use periods for many things other than signifying the end of
a sentence. For example, we use periods to denote abbreviations. If I wanted to talk about science
fiction writer A. E. van Vogt, who pretty much only used his initials, it would be annoying if my line
breaking script put the middle 'E' on its own line. Not impossible to read, of course, but annoying.
So my solution was to assume sentences ended with a period and two spaces instead of a period and one
space. My line breaking script will get confused if I ever only use one space after a period, but it
won't be the end of the world. And when I say "line breaking script," I really mean "invocation of sed."
The `sed` command is a handy utility that will convert a period and two spaces into a period and a line
feed with this simple command:
sed 's/\.\ \ /\.\n/g' input_file.txt > output_file.xtx
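Going the other way takes a little more than reversing the regular expressions in the sed substitution
specification. Because sed reads its input one line at a time and strips the trailing newline before
matching, a pattern containing \n never matches on its own; the lines have to be gathered into the
pattern space first, which the N command does in a loop. Note that this joins every line that ends in a
period, so feed it one paragraph at a time:
sed -e ':a' -e 'N' -e '$!ba' -e 's/\.\n/\.\ \ /g' input_file.xtx > output_file.txt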
Now there is a (mostly) automated way to edit text files where paragraphs look "correct" to me, but we
split out sentences on individual lines so we have something akin to semantic linefeeds. If you're
reading this in a github gist, check out the revision history to see if the diffs look more or less
comprehensible to you. Also check out the "xtx" file below, whose content should be the same as this
file, but with each sentence broken out into its own line.
And after uploading this as a gist on github, I noticed the default diff algorithm deleted the two lines
of the previous paragraph and then added three lines. This isn't exactly what I was expecting, but it did
remind me that I was once interested in diff algorithms. Having not used that information in a couple of
decades, I have forgotten most of it but was happy to find git provides quite a few options. Look at the
'--diff-algorithm' parameter in the git diff command; documentation can be found at
https://git-scm.com/docs/git-diff . For the advanced student, googling "patience diff" and "myers diff"
will reveal interesting discussions if you don't already have a textbook that covers them.
Finally, I'm adding a "ztz" file to the gist which is the same content, but with newlines at the end of
each line. I suspect it will be easier for humans to read, but the diffs will be less easy to follow.