Skip to content

Instantly share code, notes, and snippets.

Created December 23, 2015 08:04
Show Gist options
  • Save anonymous/fa8c66bafee07ee6bf02 to your computer and use it in GitHub Desktop.
Save anonymous/fa8c66bafee07ee6bf02 to your computer and use it in GitHub Desktop.

An alternative implementation in C for this Code Review question in JavaScript: List all repeated substrings with a fixed length by Ismael Miguel,

On behalf of @CiaPan


The problem:

given a character string str and a positive integer n find all repeated (possibly overlapping) substrings of str having the length n.

As I commented in the original post, in C there is a quite effective solution based on pointer/array similarity: one can build an array of all suffixes of string str, that is an array f[] of pointers to str[0], str[1] etc.; then we sort the f array alphabetically. Once sorted, we scan it for all strings that have same n first characters. Since all such strings groups occupy consecutive places, the scan is linear.

#include <stdlib.h>
#include <string.h>
#include <stdio.h>

int CmpF(const void *l, const void *r)
{
    // arguments are pointers to f[] items, which
    // themselves are pointers to string's characters

    const char *lstr = *(const char *const *)l;
    const char *rstr = *(const char *const *)r;

    return strcmp(lstr, rstr);  // compare two suffixes
}

void DbgPrintTab(const char *f[], int len)
{
    for(int i=0; i<len; i++)
        printf("%s\n", f[i]);
    printf("\n");
}

void FindDupSubstrings(const char *str, int n)
{
    int len = strlen(str);
    const char *f[200];     // suffixes (yes, quite a simplistic solution)
    if(len > 200)           // just to be safe...
        return;

    printf("input:'%s', len:%d, n:%d\n\n", str, len, n);

    for(int i=0; i<len; i++)                // fill array of suffixes
        f[i] = str+i;

    printf("suffixes:\n");
    DbgPrintTab(f, len);

    qsort(f, len, sizeof f[0], CmpF);       // sort array of suffixes

    printf("sorted:\n");
    DbgPrintTab(f, len);

    printf("results:\n");
    for(int i=0; i<len-1; i++) {            // scan array of suffixes
        if(strncmp(f[i], f[i+1], n) == 0)   // recognize equal substrings
            printf("pos %3d and %3d: %.*s\n", f[i+1]-str, f[i]-str, n, f[i]);
    }
    printf("\n");
}

int main(int argc, char * argv[])
{
    const char *str = "abcdeffeffdeffabc";
    FindDupSubstrings(str, 3);
    FindDupSubstrings(str, 4);

    return 0;
}

Output:

input:'abcdeffeffdeffabc', len:17, n:3

suffixes:
abcdeffeffdeffabc
bcdeffeffdeffabc
cdeffeffdeffabc
deffeffdeffabc
effeffdeffabc
ffeffdeffabc
feffdeffabc
effdeffabc
ffdeffabc
fdeffabc
deffabc
effabc
ffabc
fabc
abc
bc
c

sorted:
abc
abcdeffeffdeffabc
bc
bcdeffeffdeffabc
c
cdeffeffdeffabc
deffabc
deffeffdeffabc
effabc
effdeffabc
effeffdeffabc
fabc
fdeffabc
feffdeffabc
ffabc
ffdeffabc
ffeffdeffabc

results:
pos   0 and  14: abc
pos   3 and  10: def
pos   7 and  11: eff
pos   4 and   7: eff

input:'abcdeffeffdeffabc', len:17, n:4

suffixes:
abcdeffeffdeffabc
bcdeffeffdeffabc
cdeffeffdeffabc
deffeffdeffabc
effeffdeffabc
ffeffdeffabc
feffdeffabc
effdeffabc
ffdeffabc
fdeffabc
deffabc
effabc
ffabc
fabc
abc
bc
c

sorted:
abc
abcdeffeffdeffabc
bc
bcdeffeffdeffabc
c
cdeffeffdeffabc
deffabc
deffeffdeffabc
effabc
effdeffabc
effeffdeffabc
fabc
fdeffabc
feffdeffabc
ffabc
ffdeffabc
ffeffdeffabc

results:
pos   3 and  10: deff

For calls like

FindDupSubstrings("aaaaaa", 5);
FindDupSubstrings("aaaaaa", 4);

the results are:

input:'aaaaaa', len:6, n:5

suffixes:
aaaaaa
aaaaa
aaaa
aaa
aa
a

sorted:
a
aa
aaa
aaaa
aaaaa
aaaaaa

results:
pos   0 and   1: aaaaa

input:'aaaaaa', len:6, n:4

suffixes:
aaaaaa
aaaaa
aaaa
aaa
aa
a

sorted:
a
aa
aaa
aaaa
aaaaa
aaaaaa

results:
pos   1 and   2: aaaa
pos   0 and   1: aaaa
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment