Gist agentzh/5427856
diff --git a/src/ngx_http_lua_regex.c b/src/ngx_http_lua_regex.c
index 41d0701..7772cbe 100644
--- a/src/ngx_http_lua_regex.c
+++ b/src/ngx_http_lua_regex.c
@@ -1097,6 +1097,10 @@ ngx_http_lua_ngx_re_parse_opts(lua_State *L, ngx_lua_regex_compile_t *re,
                 re->options |= PCRE_UTF8;
                 break;
 
+            case 'U':
+                re->options |= PCRE_UTF8|PCRE_NO_UTF8_CHECK;
+                break;
+
             case 'x':
                 re->options |= PCRE_EXTENDED;
                 break;
diff --git a/t/034-match.t b/t/034-match.t
index 6efefe7..f37d923 100644
--- a/t/034-match.t
+++ b/t/034-match.t
@@ -9,7 +9,7 @@ use Test::Nginx::Socket;
repeat_each(2);
-plan tests => repeat_each() * (blocks() * 2 + 10);
+plan tests => repeat_each() * (blocks() * 2 + 12);
#no_diff();
no_long_string();
@@ -945,3 +945,32 @@ error: pcre_exec\(\) failed: -10 on "你.*?" using "你好"
--- no_error_log
[error]
+
+
+=== TEST 43: UTF-8 mode without UTF-8 sequence checks
+--- config
+ location /re {
+ content_by_lua '
+ m = ngx.re.match("你好", ".", "U")
+ if m then
+ ngx.say(m[0])
+ else
+ ngx.say("not matched!")
+ end
+ ';
+ }
+--- stap
+F(ngx_lua_regex_compile) {
+ printf("regex opts: %x\n", $rc->options)
+}
+
+--- stap_out
+regex opts: 2800
+
+--- request
+ GET /re
+--- response_body
+你
+--- no_error_log
+[error]
+
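The `regex opts: 2800` value that TEST 43 expects is the bitwise OR of the two flags set by the new `U` option. Below is a minimal sketch of that arithmetic, assuming the flag values from PCRE 8.x's `pcre.h` (copied into `SKETCH_`-prefixed macros so the block builds without libpcre). `sketch_parse_opts` is a hypothetical stand-in for the option loop, not the module's actual function.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed values, taken from pcre.h in PCRE 8.x. */
#define SKETCH_PCRE_EXTENDED      0x00000008
#define SKETCH_PCRE_UTF8          0x00000800
#define SKETCH_PCRE_NO_UTF8_CHECK 0x00002000

/*
 * Hypothetical, simplified version of the option switch in the patch.
 * Each character of opts ORs one or more flags into *options.
 * Returns 0 on success, -1 on an unknown option character.
 */
static int
sketch_parse_opts(const char *opts, int *options)
{
    const char *p;

    assert(opts != NULL && options != NULL);

    *options = 0;

    for (p = opts; *p != '\0'; p++) {
        switch (*p) {
        case 'u':
            *options |= SKETCH_PCRE_UTF8;
            break;

        case 'U':
            /* UTF-8 mode, with validity checking switched off */
            *options |= SKETCH_PCRE_UTF8 | SKETCH_PCRE_NO_UTF8_CHECK;
            break;

        case 'x':
            *options |= SKETCH_PCRE_EXTENDED;
            break;

        default:
            return -1;
        }
    }

    return 0;
}
```

Under these assumptions, parsing `"U"` leaves `*options` equal to `0x800 | 0x2000`, which prints as `2800` with the `%x` format used in the stap probe.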
@lancelijade

It looks much simpler, but according to my simple test, PCRE_NO_UTF8_CHECK does not actually seem to take effect. Here is the test:

server
{
    listen 80;
    server_name test;
    access_log /data/logs/test-access.log combined_x;
    error_log  /data/logs/test-error.log debug;
    location /
    {
        content_by_lua '
          local key = string.rep ("张", 30000)
          local it, err = ngx.re.gmatch(key, ".", "Usoj")
          while true do
            local m, err = it()
            if not m then break end
          end
        ';
    }
}

[root@test ngx_openresty-1.2.7.6]# time curl "http://test/"

real 0m7.028s
user 0m0.002s
sys 0m0.000s

Compared with my patch:

[root@test ngx_openresty-1.2.7.5-with-pcreutf-noutfcheckpatch]# time curl "http://test/"

real 0m0.013s
user 0m0.000s
sys 0m0.003s

I think there may be something wrong in libpcre, or pcre_exec may be misused. I will do a deeper test tomorrow.

Anyway, thanks a lot.
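One possible explanation for the timing gap, not confirmed anywhere in this thread: the PCRE documentation lists PCRE_NO_UTF8_CHECK as an option for pcre_exec as well as for pcre_compile, and the patch above only adds it to the compile-time `re->options`. If the subject is still validated on every pcre_exec call, a gmatch loop that matches one character at a time would rescan the subject once per match. The sketch below is a cost model only, assuming each check walks the whole subject. It does not call libpcre, and the `sketch_` names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Byte length of the UTF-8 sequence that starts with lead byte b,
 * or 0 if b cannot start a sequence. */
static size_t
sketch_utf8_len(unsigned char b)
{
    if (b < 0x80) return 1;
    if ((b & 0xE0) == 0xC0) return 2;
    if ((b & 0xF0) == 0xE0) return 3;
    if ((b & 0xF8) == 0xF0) return 4;
    return 0;
}

/*
 * Model a gmatch loop with the pattern "." over a valid UTF-8 subject:
 * one match per character. When recheck is nonzero, charge a full pass
 * over the subject before every match (the assumed validity check).
 * Returns the total number of bytes touched.
 */
static unsigned long long
sketch_scan_cost(const unsigned char *s, size_t len, int recheck)
{
    unsigned long long cost = 0;
    size_t pos = 0;

    while (pos < len) {
        size_t n = sketch_utf8_len(s[pos]);

        /* the model only handles well-formed input */
        assert(n != 0 && pos + n <= len);

        if (recheck) {
            cost += len;    /* validity pass over the whole subject */
        }

        cost += n;          /* bytes consumed by the match itself */
        pos += n;
    }

    return cost;
}
```

For the 30000-character subject in the test above (90000 bytes, three bytes per "张"), this model touches 90000 bytes without the recheck and 30000 * 90000 + 90000 bytes with it. That is a quadratic blow-up of the same general shape as the 0.013s versus 7.028s timings, though the model says nothing about where the time is actually spent.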
