Skip to content

Instantly share code, notes, and snippets.

@cslarsen
Created March 2, 2011 19:56
Show Gist options
  • Star 2 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save cslarsen/851611 to your computer and use it in GitHub Desktop.
Save cslarsen/851611 to your computer and use it in GitHub Desktop.
Linear scan for palindromes
/*
* Find the longest palindrome in the text.
*
* This is Greplin's first challenge, and I originally solved it in Python.
*
* This algorithm is linear on the average input, but has a quadratic
* worst case running time. There exists an even better algorithm, but
* this should do.
*
* There might also be a small error below, but you get the general idea.
*
* Christian Stigen Larsen
*/
#include <stdio.h>
static const char text[] =
"Fourscoreandsevenyearsagoourfaathersbroughtforthonthiscontainentanewnati"
"onconceivedinzLibertyanddedicatedtothepropositionthatallmenarecreatedequ"
"alNowweareengagedinagreahtcivilwartestingwhetherthatnaptionoranynartions"
"oconceivedandsodedicatedcanlongendureWeareqmetonagreatbattlefiemldoftzha"
"twarWehavecometodedicpateaportionofthatfieldasafinalrestingplaceforthose"
"whoheregavetheirlivesthatthatnationmightliveItisaltogetherfangandpropert"
"hatweshoulddothisButinalargersensewecannotdedicatewecannotconsecrateweca"
"nnothallowthisgroundThebravelmenlivinganddeadwhostruggledherehaveconsecr"
"ateditfaraboveourpoorponwertoaddordetractTgheworldadswfilllittlenotlenor"
"longrememberwhatwesayherebutitcanneverforgetwhattheydidhereItisforusthel"
"ivingrathertobededicatedheretotheulnfinishedworkwhichtheywhofoughthereha"
"vethusfarsonoblyadvancedItisratherforustobeherededicatedtothegreattdafsk"
"remainingbeforeusthatfromthesehonoreddeadwetakeincreaseddevotiontothatca"
"useforwhichtheygavethelastpfullmeasureofdevotionthatweherehighlyresolvet"
"hatthesedeadshallnothavediedinvainthatthisnationunsderGodshallhaveanewbi"
"rthoffreedomandthatgovernmentofthepeoplebythepeopleforthepeopleshallnotp"
"erishfromtheearth";
void print_best(const char* l, const char* r)
{
static int best = 2;
if ( r-l > best )
printf("%.*s\n", (best=r-l), l);
}
int main()
{
const char *l, *r;
for ( const char *p = text + 1; *p; ++p ) {
l = p; r = p - 1;
while ( *--l == *++r );
print_best(l+1, r);
l = r = p;
while ( *--l == *++r );
print_best(l+1, r);
}
return 0;
}
@cslarsen
Copy link
Author

cslarsen commented Mar 3, 2011

The above code is quite fast as well. It runs through the complete works of Shakespeare in about 0.03 seconds of wall clock time, or a processing rate of 128 Mb/second, and grows linearly with the input. It has a worst case running time which is quadratic, though; if you feed it a string of same characters, for instance.

To find the palindromes in Shakespeare's collected works, I first modified the code to load the text from disk. I changed print_best to print all palindromes equal to or longer than the current best, so we get a longer list of palindromes. I also had to prepare Shakespeare's collected works into a format suitable for processing: First I removed
the name of which character is speaking (done by sed), then I converted all text to lowercase and deleted all non-alphanumeric characters.

Since Shakespeare wrote his plays more than 300 years before copyright law was invented, you can download and use it freely. I got mine
off a site with all the stuff in different directories. Here's what I did to prepare them, all in one jolly line of unix goodness:

find shakespeare/ -type f | xargs sed 's/^[A-Za-z]*//g' | tr A-Z a-z | tr -dC a-z > shakespeare.txt 

I compiled my code and ran it on the file:

$ llvm-g++ -O4 -flto pali.cpp -o pali
$ cat shakespeare.txt | time ./pali
ll
ll
ll
ama
lonanol
tomymot
withtiw
iwerewi
erewere
tarorat
rownwor
sieveis
tomymot
imadami
madammadam
ereherehere
reherehereher
hereherehereh
illitmadamtilli
madammadammadam
madammadammadam

real    0m0.026s
user    0m0.022s
sys 0m0.004s

The longest palindromes appear in the lines:

GLOUCESTER

So will it, madam till I lie with you.

from the play Richard III. The preparsing of the collected works above includes the character's name. Without it, the output is a bit different:

$ find shakespeare/ -type f | xargs cat | sed -e 's/^[A-Za-z]*//g' \ 
    | tr A-Z a-z | tr -dC a-z > collected-works-no-names.txt 
$ cat collected-works-no-names.txt | ./pali
ama
lonanol
nymedemyn
ereherehere
reherehereher
illitmadamtilli

The second last character-by-character palindrome comes from the tragedy Troilus and Cressida:

PANDARUS

Hark! they are coming from the field: shall we
stand up here, and see them as they pass toward
Ilium? good niece, do, sweet niece Cressida.

CRESSIDA

At your pleasure.

PANDARUS

Here, here, here's an excellent place; here we may
see most bravely: I'll tell you them all by their
names as they pass by; but mark Troilus above the rest.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment