@veer66
Last active June 26, 2016 02:34
Using a regular expression in C to search a UTF-8 Thai string by calling Onigmo
/*
*
* Copyright (c) 2016 Vee Satayamas
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
* 1. Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
* 2. Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in the
* documentation and/or other materials provided with the distribution.
*
* THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
* ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
* IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
* ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
* FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
* DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
* OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
* HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
* OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
* SUCH DAMAGE.
*
*/
/*
*
* How to build:
* gcc -Wall -g oni1.c -lonig
*
********************************************
*
* What is needed?
* Onigmo (libonig)
*
* libonig on Debian/Ubuntu can be installed by apt-get install libonig-dev
*
*/
#include <stdio.h>
#include <oniguruma.h>
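/* Print the start and end of every captured region.
 * The offsets are byte offsets into the subject string, not character offsets. */
void
display_region(OnigRegion *region)
{
    int i;

    for (i = 0; i < region->num_regs; i++)
    {
        /* Cast to long so the format matches whatever integer type the library uses. */
        printf("%ld...%ld\n", (long)region->beg[i], (long)region->end[i]);
    }
}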
int
main(int argc, char **argv)
{
    regex_t *reg;
    OnigErrorInfo e_info;
    UChar *pat = (UChar *)"[0-9]+";
    UChar *str = (UChar *)"กา2559บิน";
    OnigEncoding enc = ONIG_ENCODING_UTF8;
    int r;
    unsigned char *start, *end, *range;
    OnigRegion *region;
    int r_search;

    /* Compile the pattern; its end is passed as a pointer one past the last byte. */
    r = onig_new(&reg,
                 pat,
                 pat + onigenc_str_bytelen_null(enc, pat),
                 ONIG_OPTION_NONE,
                 enc,
                 ONIG_SYNTAX_DEFAULT,
                 &e_info);
    if (r != ONIG_NORMAL)
    {
        char s[ONIG_MAX_ERROR_MESSAGE_LEN];

        /* The library expects a UChar buffer, so cast to avoid a signedness warning. */
        onig_error_code_to_str((UChar *)s, r, &e_info);
        fprintf(stderr, "ERROR: %s\n", s);
        return -1;
    }

    /* Search the whole string, from its first byte to its end. */
    end = str + onigenc_str_bytelen_null(enc, str);
    region = onig_region_new();
    start = str;
    range = end;
    r_search = onig_search(reg,
                           str,
                           end,
                           start,
                           range,
                           region,
                           ONIG_OPTION_NONE);
    if (r_search >= 0)
    {
        printf("MATCH:\n");
        display_region(region);
    }
    else if (r_search == ONIG_MISMATCH)
    {
        printf("MISMATCH\n");
    }
    else
    {
        fprintf(stderr, "ERROR\n");
        /* Release resources on the error path as well. */
        onig_region_free(region, 1);
        onig_free(reg);
        onig_end();
        return -1;
    }

    /* The second argument 1 frees the region object itself, not only its contents. */
    onig_region_free(region, 1);
    onig_free(reg);
    onig_end();
    return 0;
}
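
The program above reports match positions as byte offsets. For the sample string it prints 6...10, because each of the two leading Thai letters occupies three bytes in UTF-8. If character positions are wanted instead, a small conversion helper can be layered on top. The following is a minimal sketch, not part of the gist or of the library: the name `utf8_char_offset` is hypothetical, and it assumes the input is valid UTF-8 and that the byte offset falls on a character boundary, which should hold for offsets returned by a successful search.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not a library function): return how many UTF-8
 * characters (code points) lie in the first byte_offset bytes of s.
 * Assumes s is valid UTF-8 and byte_offset is on a character boundary. */
static size_t
utf8_char_offset(const unsigned char *s, size_t byte_offset)
{
    size_t i;
    size_t chars = 0;

    assert(s != NULL);
    for (i = 0; i < byte_offset; i++)
    {
        /* Continuation bytes have the form 10xxxxxx;
         * every other byte begins a new character. */
        if ((s[i] & 0xC0) != 0x80)
        {
            chars++;
        }
    }
    return chars;
}
```

Called as `utf8_char_offset(str, region->beg[0])` and `utf8_char_offset(str, region->end[0])`, this maps the byte range 6...10 of the sample string to the character range 2...6. Here "character" means a Unicode code point, so a Thai combining vowel such as ิ counts separately from its base consonant. The library also declares `onigenc_strlen(enc, p, end)`, which may serve the same purpose when the encoding is not fixed to UTF-8.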