Skip to content

Instantly share code, notes, and snippets.

@chrisbloom7
Created June 12, 2011 03:27
Show Gist options
  • Star 17 You must be signed in to star a gist
  • Fork 7 You must be signed in to fork a gist
  • Save chrisbloom7/1021218 to your computer and use it in GitHub Desktop.
Save chrisbloom7/1021218 to your computer and use it in GitHub Desktop.
Find the longest common substring in an array of strings (PHP)
<?php
function longest_common_substring($words)
{
$words = array_map('strtolower', array_map('trim', $words));
$sort_by_strlen = create_function('$a, $b', 'if (strlen($a) == strlen($b)) { return strcmp($a, $b); } return (strlen($a) < strlen($b)) ? -1 : 1;');
usort($words, $sort_by_strlen);
// We have to assume that each string has something in common with the first
// string (post sort), we just need to figure out what the longest common
// string is. If any string DOES NOT have something in common with the first
// string, return false.
$longest_common_substring = array();
$shortest_string = str_split(array_shift($words));
while (sizeof($shortest_string)) {
array_unshift($longest_common_substring, '');
foreach ($shortest_string as $ci => $char) {
foreach ($words as $wi => $word) {
if (!strstr($word, $longest_common_substring[0] . $char)) {
// No match
break 2;
} // if
} // foreach
// we found the current char in each word, so add it to the first longest_common_substring element,
// then start checking again using the next char as well
$longest_common_substring[0].= $char;
} // foreach
// We've finished looping through the entire shortest_string.
// Remove the first char and start all over. Do this until there are no more
// chars to search on.
array_shift($shortest_string);
}
// If we made it here then we've run through everything
usort($longest_common_substring, $sort_by_strlen);
return array_pop($longest_common_substring);
}
?>
<?php
$array = array(
'PTT757LP4',
'PTT757A',
'PCT757B',
'PCT757LP4EV'
);
echo longest_common_substring($array);
// => T757
?>
@pr0fess0r
Copy link

Hi Chris
This is a fantastic function and has been a huge help to me.
One feature I'd love to see is the ability to create the substring with only complete words. For example, given
"Justin Bieber cannot sing" and
"Justin Bieber can sing"
can your function be modified to return
"Justin Bieber"
rather than
"Justin Bieber can"
?
Many thanks

Lucas

@giftlab
Copy link

giftlab commented Apr 12, 2013

@pr0fess0r that's easy enough to do post-processing - just check that the last character in the returned string is not a letter, then trim it. If it finishes with a complete word, then the string returned either matches the end of an input string (test with substr), or is terminated by a non-character (space, period, comma, etc).

@bobbbb
Copy link

bobbbb commented Jun 29, 2013

Dear Chris, this is a great function. Thank you for sharing!

Is there any chance you can tell me how to modify it to the the longest common subsequence problem?

I mean, while substrings are always consecutive parts of a string, subsequences need not be.

@chrisbloom7
Copy link
Author

Sorry for the late responses. Not sure GitHub ever notified me I had any comments.

@pr0fess0r - the function currently isn't aware of words, only characters. I don't think @giftlab's suggestion would work since the last character returned in your example would be the "n" in "can". My head hasn't been in PHP-land in a very long time so no solution is jumping immediately to mind. I'd love to hear how you solve it if you manage(d) to. I'm happy you found use for the code in any case.

@bobbbb - glad you found it useful. I don't have an immediate solution for what you're trying to solve and it's not likely to be anything I would be able to spend time looking at right now. Please do let me know if you found/find a solution.

@sasivarnakumar
Copy link

Thank you very much dude :) It helped me save my time writing a complex work around to my situation here.
Nice solution 👍

@chrisbloom7
Copy link
Author

@sasivarnakumar glad you found it useful after all this time!

@AliceWonderMiscreations

There is a bug

$a='Islas Vírgenes Británicas';
$b='Islas Vírgenes de EE. UU.';

$test = array($a, $b);

$foo = longest_common_substring($test);

print($foo);
// prints islas vírgenes opposed to Islas Vírgenes

Any suggestions?

@chrisbloom7
Copy link
Author

@AliceWonderMiscreations Looks like an issue with UTF-8 characters. PHP has a set of multibyte functions that you could use in place of their non-multibyte versions in the function. See http://php.net/manual/en/ref.mbstring.php

@relipse
Copy link

relipse commented Nov 24, 2021

Great function! Thank you!

@deadlydud
Copy link

to make it work with PHP8.x .. replace the create function line with:

16 $sort_by_strlen = function($a, $b) {
17 if (strlen($a) == strlen($b)) { return strcmp($a, $b); } return (strlen($a) < strlen($b)) ? -1 : 1;
18 };

@chrisbloom7
Copy link
Author

Thanks for that addition, @deadlydud! I think this was originally created for version 5.x, and I haven't used PHP much in the last 10 years 😆

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment