An alternative implementation in C for this Code Review question in JavaScript: List all repeated substrings with a fixed length by Ismael Miguel,
On behalf of @CiaPan
The problem:
given a character string
str
and a positive integern
find all repeated (possibly overlapping) substrings ofstr
having the lengthn
.
As I commented in the original post, in C there is a quite effective solution
based on pointer/array similarity: one can build an array of all suffixes
of string str
, that is an array f[]
of pointers to str[0]
, str[1]
etc.;
then we sort the f
array alphabetically. Once sorted, we scan it for all strings
that have same n
first characters.
Since all such strings groups occupy consecutive places, the scan is linear.
#include <stdlib.h>
#include <string.h>
#include <stdio.h>
int CmpF(const void *l, const void *r)
{
// arguments are pointers to f[] items, which
// themselves are pointers to string's characters
const char *lstr = *(const char *const *)l;
const char *rstr = *(const char *const *)r;
return strcmp(lstr, rstr); // compare two suffixes
}
void DbgPrintTab(const char *f[], int len)
{
for(int i=0; i<len; i++)
printf("%s\n", f[i]);
printf("\n");
}
void FindDupSubstrings(const char *str, int n)
{
int len = strlen(str);
const char *f[200]; // suffixes (yes, quite a simplistic solution)
if(len > 200) // just to be safe...
return;
printf("input:'%s', len:%d, n:%d\n\n", str, len, n);
for(int i=0; i<len; i++) // fill array of suffixes
f[i] = str+i;
printf("suffixes:\n");
DbgPrintTab(f, len);
qsort(f, len, sizeof f[0], CmpF); // sort array of suffixes
printf("sorted:\n");
DbgPrintTab(f, len);
printf("results:\n");
for(int i=0; i<len-1; i++) { // scan array of suffixes
if(strncmp(f[i], f[i+1], n) == 0) // recognize equal substrings
printf("pos %3d and %3d: %.*s\n", f[i+1]-str, f[i]-str, n, f[i]);
}
printf("\n");
}
int main(int argc, char * argv[])
{
const char *str = "abcdeffeffdeffabc";
FindDupSubstrings(str, 3);
FindDupSubstrings(str, 4);
return 0;
}
Output:
input:'abcdeffeffdeffabc', len:17, n:3
suffixes:
abcdeffeffdeffabc
bcdeffeffdeffabc
cdeffeffdeffabc
deffeffdeffabc
effeffdeffabc
ffeffdeffabc
feffdeffabc
effdeffabc
ffdeffabc
fdeffabc
deffabc
effabc
ffabc
fabc
abc
bc
c
sorted:
abc
abcdeffeffdeffabc
bc
bcdeffeffdeffabc
c
cdeffeffdeffabc
deffabc
deffeffdeffabc
effabc
effdeffabc
effeffdeffabc
fabc
fdeffabc
feffdeffabc
ffabc
ffdeffabc
ffeffdeffabc
results:
pos 0 and 14: abc
pos 3 and 10: def
pos 7 and 11: eff
pos 4 and 7: eff
input:'abcdeffeffdeffabc', len:17, n:4
suffixes:
abcdeffeffdeffabc
bcdeffeffdeffabc
cdeffeffdeffabc
deffeffdeffabc
effeffdeffabc
ffeffdeffabc
feffdeffabc
effdeffabc
ffdeffabc
fdeffabc
deffabc
effabc
ffabc
fabc
abc
bc
c
sorted:
abc
abcdeffeffdeffabc
bc
bcdeffeffdeffabc
c
cdeffeffdeffabc
deffabc
deffeffdeffabc
effabc
effdeffabc
effeffdeffabc
fabc
fdeffabc
feffdeffabc
ffabc
ffdeffabc
ffeffdeffabc
results:
pos 3 and 10: deff
For calls like
FindDupSubstrings("aaaaaa", 5);
FindDupSubstrings("aaaaaa", 4);
the results are:
input:'aaaaaa', len:6, n:5
suffixes:
aaaaaa
aaaaa
aaaa
aaa
aa
a
sorted:
a
aa
aaa
aaaa
aaaaa
aaaaaa
results:
pos 0 and 1: aaaaa
input:'aaaaaa', len:6, n:4
suffixes:
aaaaaa
aaaaa
aaaa
aaa
aa
a
sorted:
a
aa
aaa
aaaa
aaaaa
aaaaaa
results:
pos 1 and 2: aaaa
pos 0 and 1: aaaa