Create a gist now

Instantly share code, notes, and snippets.

@jpsim /answers.md
Last active Apr 23, 2018

What would you like to do?
  1. Plain Strings (207): foo
  2. Anchors (208): k$
  3. Ranges (202): ^[a-f]*$
  4. Backrefs (201): (...).*\1
  5. Abba (169): ^(.(?!(ll|ss|mm|rr|tt|ff|cc|bb)))*$|^n|ef
  6. A man, a plan (177): ^(.)[^p].*\1$
  7. Prime (286): ^(?!(..+)\1+$)
  8. Four (199): (.)(.\1){3}
  9. Order (198): ^[^o].....?$
  10. Triples (507): (^39|^44)|(^([0369]|([147][0369]*[258])|(([258]|[147][0369]*[147])([0369]*|[258][0369]*[147])([147]|[258][0369]*[258])))*$)
  11. Glob (364): ^((\*(er|f|i|p|t|v))|(b|c(h|o|r)|do|l|mi|p(a|r|u)|re|w))
  12. Balance (286): ^(<(<(<(<(<(<<>>)*>)*>)*>)*>)*>)*$
  13. Powers (56): ^((((((((((x)\10?)\9?)\8?)\7?)\6?)\5?)\4?)\3?)\2?)\1?$

Total Score: 3060

@nadinengland

This comment has been minimized.

Show comment Hide comment
@nadinengland

nadinengland Dec 20, 2013

Abba (190): ^((?!(.)(.)\3\2).)*$

Abba (190): ^((?!(.)(.)\3\2).)*$

@jonathanmorley

This comment has been minimized.

Show comment Hide comment
@jonathanmorley

jonathanmorley Dec 20, 2013

Abba (193): ^(?!.*(.)(.)\2\1)

Abba (193): ^(?!.*(.)(.)\2\1)

@alkino

This comment has been minimized.

Show comment Hide comment
@alkino

alkino Dec 20, 2013

Power (58): ^(x|(.*)\2)$
Glob (378): ^(\*(er|f|i|p|t|v)|b|c[^a]|do|l|mi|p|re|w)

alkino commented Dec 20, 2013

Power (58): ^(x|(.*)\2)$
Glob (378): ^(\*(er|f|i|p|t|v)|b|c[^a]|do|l|mi|p|re|w)

@pochmann

This comment has been minimized.

Show comment Hide comment
@pochmann

pochmann Dec 20, 2013

Triples (588):(00[039]|12|015|50)$|1..?4|4.2|1.7|6.0|006
Powers (72):^(xx?|xxxx|x{8}|x{16}|x{32}|(x{64})*)$

Triples (588):(00[039]|12|015|50)$|1..?4|4.2|1.7|6.0|006
Powers (72):^(xx?|xxxx|x{8}|x{16}|x{32}|(x{64})*)$

@balrok

This comment has been minimized.

Show comment Hide comment
@balrok

balrok Dec 20, 2013

Order (199): ^[^o]?.{5}$

balrok commented Dec 20, 2013

Order (199): ^[^o]?.{5}$

@Ghoughpteighbteau

This comment has been minimized.

Show comment Hide comment
@Ghoughpteighbteau

Ghoughpteighbteau Dec 20, 2013

^(((((((((x|xx)\9?)\8?)\7?)\6?)\5?)\4?)\3?)\2?)\1?$
59 points

^(((((((((x|xx)\9?)\8?)\7?)\6?)\5?)\4?)\3?)\2?)\1?$
59 points

@drbitboy

This comment has been minimized.

Show comment Hide comment
@drbitboy

drbitboy Dec 21, 2013

Powers:

^(((((((((xx?)\9?)\8?)\7?)\6?)\5?)\4?)\3?)\2?)\1?$

60 points, but I like 72 by pochmann above

Powers:

^(((((((((xx?)\9?)\8?)\7?)\6?)\5?)\4?)\3?)\2?)\1?$

60 points, but I like 72 by pochmann above

@rexso

This comment has been minimized.

Show comment Hide comment
@rexso

rexso Dec 21, 2013

Glob (382): ^(*(er|[fiptv])|[blpw]|[crdm][^ab]\w)

rexso commented Dec 21, 2013

Glob (382): ^(*(er|[fiptv])|[blpw]|[crdm][^ab]\w)

@zjhiphop

This comment has been minimized.

Show comment Hide comment
@zjhiphop

zjhiphop Dec 21, 2013

Glob (382): ^(\*(er|[fiptv])|[blpw]|[crdm][^ab]\w)

Glob (382): ^(\*(er|[fiptv])|[blpw]|[crdm][^ab]\w)

@czjxy881

This comment has been minimized.

Show comment Hide comment
@czjxy881

czjxy881 Dec 21, 2013

Glob (385):^(p|w|l|c[^a]|b|do|mi|[fiptv])|rr
power(77):^(x?x|x{4}|(x{8}){1,4}|(x{64})
)$

Glob (385):^(p|w|l|c[^a]|b|do|mi|[fiptv])|rr
power(77):^(x?x|x{4}|(x{8}){1,4}|(x{64})
)$

@pochmann

This comment has been minimized.

Show comment Hide comment
@pochmann

pochmann Dec 21, 2013

Powers (93): ^(?!(x(xx)+)\1*$)

Powers (93): ^(?!(x(xx)+)\1*$)

@htchaan

This comment has been minimized.

Show comment Hide comment
@htchaan

htchaan Dec 21, 2013

Abba (193): ^((.)(?!\2))*$|ef

htchaan commented Dec 21, 2013

Abba (193): ^((.)(?!\2))*$|ef

@rabcyr

This comment has been minimized.

Show comment Hide comment
@rabcyr

rabcyr Dec 21, 2013

glob (385): rr|wn|c$|^[blpw]|^c[hor]|^*[fiptv]

rabcyr commented Dec 21, 2013

glob (385): rr|wn|c$|^[blpw]|^c[hor]|^*[fiptv]

@rabcyr

This comment has been minimized.

Show comment Hide comment
@rabcyr

rabcyr Dec 21, 2013

balance (287): ^(<(<(<(<(<>)>)>)>)>|.{54})*$

rabcyr commented Dec 21, 2013

balance (287): ^(<(<(<(<(<>)>)>)>)>|.{54})*$

@bitzhuxb

This comment has been minimized.

Show comment Hide comment
@bitzhuxb

bitzhuxb Dec 22, 2013

what is the meaning of "?!" and what is "ef" in the end?

what is the meaning of "?!" and what is "ef" in the end?

@lyda

This comment has been minimized.

Show comment Hide comment
@lyda

lyda Dec 22, 2013

For Powers(77) I have: ^(xx?|xxxx|(x{8}){1,4}|(x{64})+)$
Note that when playing with the powers re (x{64}){0,} can make your browser very sad.

There are two new ones and I'm not hugely happy with my solutions:
Long count (219): ^0* 00.{8}0011 01.. ..01 0110 01* 10* 1001 1010 101
Long count v2 (198): ^0* 0_1.{6}0011 0100 0101 0110 0111 1000 1001 1010 1011 1100 1101 1_0 1*

lyda commented Dec 22, 2013

For Powers(77) I have: ^(xx?|xxxx|(x{8}){1,4}|(x{64})+)$
Note that when playing with the powers re (x{64}){0,} can make your browser very sad.

There are two new ones and I'm not hugely happy with my solutions:
Long count (219): ^0* 00.{8}0011 01.. ..01 0110 01* 10* 1001 1010 101
Long count v2 (198): ^0* 0_1.{6}0011 0100 0101 0110 0111 1000 1001 1010 1011 1100 1101 1_0 1*

@huberteff

This comment has been minimized.

Show comment Hide comment
@huberteff

huberteff Dec 22, 2013

Order (156): ^a_b_c_d_e_f_g_h_i_j_k_l_m_n_o_p_q_r_s_t_u_v_w_x_y_z_$

Order (156): ^a_b_c_d_e_f_g_h_i_j_k_l_m_n_o_p_q_r_s_t_u_v_w_x_y_z_$

@romain-dartigues

This comment has been minimized.

Show comment Hide comment
@romain-dartigues

romain-dartigues Dec 23, 2013

@bitzhuxb < (?!x) is a negative-lookahead (GFGI), |ef mean "or ef"

You can get a (free) visual representation of the logical path of the regexp on https://www.debuggex.com/

@bitzhuxb < (?!x) is a negative-lookahead (GFGI), |ef mean "or ef"

You can get a (free) visual representation of the logical path of the regexp on https://www.debuggex.com/

@bbarry

This comment has been minimized.

Show comment Hide comment
@bbarry

bbarry Dec 23, 2013

So far I have:

  1. Plain Strings: foo (207)
  2. Anchors: k$ (208)
  3. Ranges: ^[a-f]+$ (202)
  4. Backrefs: (...).*\1 (201)
  5. Abba:^(?!.*(.)\1)|ef (195)
  6. A man, a plan: ^(.)[^p].*\1$ (177)
  7. Prime: ^(?!(..+)\1+$) (286)
  8. Four: (.)(.\1){3} (199)
  9. Order: ^.{5}[^e]?$ (199)
  10. Triples: 00($|3|6|9|12|15)|4.2|.1.+4|55|.17 (597)
  11. Glob: [bncrw][bporn]|^p|c$|ta (397)
  12. Balance: ^(<(<(<(..)*>)*>)*>)*$ (288)
  13. Powers: ^(x|(xx){1,9}|x{32}|(x{64})+)$ (80)
  14. Long count: ^((.+)0 \2+1 ?)*$ (253)
  15. Long count v2: ^((.+)0 \2+1 ?)*$ (253)
  16. Alphabetical ( .+[ts]..)\1|(tat|r). r|a t|e .r (307)

score: 4061

bbarry commented Dec 23, 2013

So far I have:

  1. Plain Strings: foo (207)
  2. Anchors: k$ (208)
  3. Ranges: ^[a-f]+$ (202)
  4. Backrefs: (...).*\1 (201)
  5. Abba:^(?!.*(.)\1)|ef (195)
  6. A man, a plan: ^(.)[^p].*\1$ (177)
  7. Prime: ^(?!(..+)\1+$) (286)
  8. Four: (.)(.\1){3} (199)
  9. Order: ^.{5}[^e]?$ (199)
  10. Triples: 00($|3|6|9|12|15)|4.2|.1.+4|55|.17 (597)
  11. Glob: [bncrw][bporn]|^p|c$|ta (397)
  12. Balance: ^(<(<(<(..)*>)*>)*>)*$ (288)
  13. Powers: ^(x|(xx){1,9}|x{32}|(x{64})+)$ (80)
  14. Long count: ^((.+)0 \2+1 ?)*$ (253)
  15. Long count v2: ^((.+)0 \2+1 ?)*$ (253)
  16. Alphabetical ( .+[ts]..)\1|(tat|r). r|a t|e .r (307)

score: 4061

@Zingler

This comment has been minimized.

Show comment Hide comment
@Zingler

Zingler Dec 23, 2013

If any one is curious, here is the general solution for triples. Works for any arbitrary multiple of 3.

([0369]|[147][0369][258]|([258]|[147][0369][147])([0369]|[258][0369][147])([147]|[258][0369][258]))*

Zingler commented Dec 23, 2013

If any one is curious, here is the general solution for triples. Works for any arbitrary multiple of 3.

([0369]|[147][0369][258]|([258]|[147][0369][147])([0369]|[258][0369][147])([147]|[258][0369][258]))*

@bbarry

This comment has been minimized.

Show comment Hide comment
@bbarry

bbarry Dec 23, 2013

general triples (formatted):

^(
  [0369]
 |[258][0369]*[147]
 |[147]([0369]|[147][0369]*[258])*[258]
 |[258][0369]*[258]([0369]|[147][0369]*[258])*[258]
 |[147]([0369]|[147][0369]*[258])*[147][0369]*[147]
 |[258][0369]*[258]([0369]|[147][0369]*[258])*[147][0369]*[147]
)*$

see: http://quaxio.com/triple/

bbarry commented Dec 23, 2013

general triples (formatted):

^(
  [0369]
 |[258][0369]*[147]
 |[147]([0369]|[147][0369]*[258])*[258]
 |[258][0369]*[258]([0369]|[147][0369]*[258])*[258]
 |[147]([0369]|[147][0369]*[258])*[147][0369]*[147]
 |[258][0369]*[258]([0369]|[147][0369]*[258])*[147][0369]*[147]
)*$

see: http://quaxio.com/triple/

@fauxparse

This comment has been minimized.

Show comment Hide comment
@fauxparse

fauxparse Dec 25, 2013

A man, a plan (170):
^(.?)(.)(.).?\3\2\1$

I know it's a lower score, but it's less cheaty.

A man, a plan (170):
^(.?)(.)(.).?\3\2\1$

I know it's a lower score, but it's less cheaty.

@sammiya

This comment has been minimized.

Show comment Hide comment
@sammiya

sammiya Dec 26, 2013

Abba(195): ^(?!.*(.)\1)|ef

sammiya commented Dec 26, 2013

Abba(195): ^(?!.*(.)\1)|ef

@vagrawal

This comment has been minimized.

Show comment Hide comment
@vagrawal

vagrawal Dec 27, 2013

Powers: ^(?!(x(xx)+)\1*$) (93)

Does anyone knows the non-cheating solution to alphabetical.

Powers: ^(?!(x(xx)+)\1*$) (93)

Does anyone knows the non-cheating solution to alphabetical.

@wimjongman

This comment has been minimized.

Show comment Hide comment
@wimjongman

wimjongman Dec 27, 2013

a man, a plan. no cheating

^(.)(.).*\2\1$

176

a man, a plan. no cheating

^(.)(.).*\2\1$

176

@bbarry

This comment has been minimized.

Show comment Hide comment
@bbarry

bbarry Dec 28, 2013

@fugyk: I don't think it is possible to create a regular expression that matches any list of words in alphabetical order (one of 6 letter combinations only would be incredibly long). I guess the hint is a reference back to #9 which is possible without cheating (even though it says to cheat). I stand corrected, ty @berndjendrissek.

You could create the brute force expression of every possible 6 letter word in order containing the letters aenrst (which is still terribly long), or the one containing all the words in this list (-69 points, goes: ^(aerate ?)*(arrest ?)*...$ and is 409 characters long). Anything beyond that I feel is cheating so badly that it doesn't matter how much more you do.

I want to know what the meaning of the hitchhikers guide reference in long count v2 is all about. My solution there cheats a lot (it matches far more than the intended binary nibble count sequence, just happens to not match any of his counter examples because none of them swap more than 1 nibble out).

bbarry commented Dec 28, 2013

@fugyk: I don't think it is possible to create a regular expression that matches any list of words in alphabetical order (one of 6 letter combinations only would be incredibly long). I guess the hint is a reference back to #9 which is possible without cheating (even though it says to cheat). I stand corrected, ty @berndjendrissek.

You could create the brute force expression of every possible 6 letter word in order containing the letters aenrst (which is still terribly long), or the one containing all the words in this list (-69 points, goes: ^(aerate ?)*(arrest ?)*...$ and is 409 characters long). Anything beyond that I feel is cheating so badly that it doesn't matter how much more you do.

I want to know what the meaning of the hitchhikers guide reference in long count v2 is all about. My solution there cheats a lot (it matches far more than the intended binary nibble count sequence, just happens to not match any of his counter examples because none of them swap more than 1 nibble out).

@berndjendrissek

This comment has been minimized.

Show comment Hide comment
@berndjendrissek

berndjendrissek Dec 28, 2013

@bbarry: You don't need to list every possible word for a non-cheating solution. It suffices to exclude any adjacent pair with a common prefix (which may have zero length) followed by even a single out-of-order letter: ^(?!.*\b(([^ ]*)t[^ ]+ \2[^t][^ ]*|([^ ]*)s[^ ]+ \3[^st][^ ]*|([^ ]*)r[^ ]+ \4[aen][^ ]*|([^ ]*)n[^ ]+ \5[ae][^ ]*|([^ ]*)e[^ ]+ \6[a][^ ]*).*$)

@bbarry: You don't need to list every possible word for a non-cheating solution. It suffices to exclude any adjacent pair with a common prefix (which may have zero length) followed by even a single out-of-order letter: ^(?!.*\b(([^ ]*)t[^ ]+ \2[^t][^ ]*|([^ ]*)s[^ ]+ \3[^st][^ ]*|([^ ]*)r[^ ]+ \4[aen][^ ]*|([^ ]*)n[^ ]+ \5[ae][^ ]*|([^ ]*)e[^ ]+ \6[a][^ ]*).*$)

@alokmenghrajani

This comment has been minimized.

Show comment Hide comment
@alokmenghrajani

alokmenghrajani Jan 6, 2014

Erling added 3 more levels:
14. Long count
15. Long count v2
16. Alphabetical

Erling added 3 more levels:
14. Long count
15. Long count v2
16. Alphabetical

@DiEvAl

This comment has been minimized.

Show comment Hide comment
@DiEvAl

DiEvAl Jan 7, 2014

My (non-cheaty) solution to glob:
339: ^(\*?)(\w*)(\*?)(\w*)(\*?)(\w*) .* ((?!\1).+|\1)\2((?!\3).+|\3)\4((?!\5).+|\5)\6$
This works only for globs that match (\*?\w*){3}, but you can easily expand it.

Also a version that supports ? wildcards (globs must match (\*?\??\w*){2}):
^(\*?)(\??)(\w*)(\*?)(\??)(\w*) .* ((?!\1).+|\1)((?!\2).|\2)\3((?!\4).+|\4)((?!\5).|\5)\6$

DiEvAl commented Jan 7, 2014

My (non-cheaty) solution to glob:
339: ^(\*?)(\w*)(\*?)(\w*)(\*?)(\w*) .* ((?!\1).+|\1)\2((?!\3).+|\3)\4((?!\5).+|\5)\6$
This works only for globs that match (\*?\w*){3}, but you can easily expand it.

Also a version that supports ? wildcards (globs must match (\*?\??\w*){2}):
^(\*?)(\??)(\w*)(\*?)(\??)(\w*) .* ((?!\1).+|\1)((?!\2).|\2)\3((?!\4).+|\4)((?!\5).|\5)\6$

@DiEvAl

This comment has been minimized.

Show comment Hide comment
@DiEvAl

DiEvAl Jan 8, 2014

My non-cheaty solution for glob (using berndjendrissek's idea):
234: ^(?!.*(\b(\w*)t\w* \2[aenrs]|\b(\w*)s\w* \3[aenr]|\b(\w*)r\w* \4[aen]|\b(\w*)n\w* \5[ae]|\b(\w*)e\w* \6a))

Actually, long counts are just a special case of alphabetical. So here are my 2 solutions for it:
Almost non-cheaty long count (1 and 2):
221: ^(?!.*\b((\d*)1\d* \2[0]|(\d{4}) \3))(\d{4} ){15}
But this regex also matches stuff like 0000 0001 0010 0011 0100 0101 0110 0111 1000 1001 1010 1011 1100 1101 1110 11111, 0000 0010 0011 0100 0101 0110 0111 1000 1001 1010 1011 1100 1101 1110 1111 abcd, 0000 0002 0021 0023 0100 0101 0110 0111 1000 1001 1010 1011 1100 1101 1110 1111

Non-cheaty long count (1 and 2):
211: ^(?!.*\b((\d*)1\d* \2[0]|(\d{4}) \3))([01]{4} ){15}[01]{4}$
This should match only the correct string (although I haven't tested it on all ℵ0 strings ;-) ).

DiEvAl commented Jan 8, 2014

My non-cheaty solution for glob (using berndjendrissek's idea):
234: ^(?!.*(\b(\w*)t\w* \2[aenrs]|\b(\w*)s\w* \3[aenr]|\b(\w*)r\w* \4[aen]|\b(\w*)n\w* \5[ae]|\b(\w*)e\w* \6a))

Actually, long counts are just a special case of alphabetical. So here are my 2 solutions for it:
Almost non-cheaty long count (1 and 2):
221: ^(?!.*\b((\d*)1\d* \2[0]|(\d{4}) \3))(\d{4} ){15}
But this regex also matches stuff like 0000 0001 0010 0011 0100 0101 0110 0111 1000 1001 1010 1011 1100 1101 1110 11111, 0000 0010 0011 0100 0101 0110 0111 1000 1001 1010 1011 1100 1101 1110 1111 abcd, 0000 0002 0021 0023 0100 0101 0110 0111 1000 1001 1010 1011 1100 1101 1110 1111

Non-cheaty long count (1 and 2):
211: ^(?!.*\b((\d*)1\d* \2[0]|(\d{4}) \3))([01]{4} ){15}[01]{4}$
This should match only the correct string (although I haven't tested it on all ℵ0 strings ;-) ).

@jdashton

This comment has been minimized.

Show comment Hide comment
@jdashton

jdashton Jan 8, 2014

Can we agree on what it means to cheat in this context? I've been assuming it means something like this: "Your solution works for the specific data in the problem, but would fail for other values that fit the larger apparent scheme."

jdashton commented Jan 8, 2014

Can we agree on what it means to cheat in this context? I've been assuming it means something like this: "Your solution works for the specific data in the problem, but would fail for other values that fit the larger apparent scheme."

@sorcio

This comment has been minimized.

Show comment Hide comment
@sorcio

sorcio Jan 8, 2014

Only partially cheating Glob (360):

^(.*)\*(.*) .* \1.+\2$|^(.+) .* \3$|^.*\*(.*)\*.* .* .+\4.+$

sorcio commented Jan 8, 2014

Only partially cheating Glob (360):

^(.*)\*(.*) .* \1.+\2$|^(.+) .* \3$|^.*\*(.*)\*.* .* .+\4.+$
@AlanDeSmet

This comment has been minimized.

Show comment Hide comment
@AlanDeSmet

AlanDeSmet Jan 9, 2014

Alphabetical (-109) "Pure" solution: Should work for any number of words of any length limited to characters [a-z]. I haven't even considered the case where the words are of varying lengths, but my brain is done for tonight. But by my own standards, this is a cheat-free solution, and I haven't seen any others that meet that standard. Pity that it scores so abysmally.

^(?!.*\b((.*)b.*\b\2[a]|(.*)c.*\b\3[ab]|(.*)d.*\b\4[a-c]|(.*)e.*\b\5[a-d]|(.*)f.*\b\6[a-e]|(.*)g.*\b\7[a-f]|(.*)h.*\b\8[a-g]|(.*)i.*\b\9[a-h]|(.*)j.*\b\10[a-i]|(.*)k.*\b\11[a-j]|(.*)l.*\b\12[a-k]|(.*)m.*\b\13[a-l]|(.*)n.*\b\14[a-m]|(.*)o.*\b\15[a-n]|(.*)p.*\b\16[a-o]|(.*)q.*\b\17[a-p]|(.*)r.*\b\18[a-q]|(.*)s.*\b\19[a-r]|(.*)t.*\b\20[a-s]|(.*)u.*\b\21[a-t]|(.*)v.*\b\22[a-u]|(.*)w.*\b\23[a-v]|(.*)x.*\b\24[a-w]|(.*)y.*\b\25[a-x]|(.*)z.*\b\26[a-y]))

Alphabetical (264): Limited to just the letters [ernst], the ones that appear in the tests. I consider this cheating, but your mileage may vary. This is the solution I ended up using for scoring; although I see now that others have trumped me; well played!

^(?!.*\b((.*)n.*\b\2[e]|(.*)r.*\b\3[ne]|(.*)s.*\b\4[rne]|(.*)t.*\b\5[srne]))

Others discovered my other solutions first, so I don't have anything else to add.

@pochmann / @fugyk: Upon seeing a 17-character solution to Powers, my first thought was "WTF?" Upon working through it, my hat is off to you, that is elegant.

Alphabetical (-109) "Pure" solution: Should work for any number of words of any length limited to characters [a-z]. I haven't even considered the case where the words are of varying lengths, but my brain is done for tonight. But by my own standards, this is a cheat-free solution, and I haven't seen any others that meet that standard. Pity that it scores so abysmally.

^(?!.*\b((.*)b.*\b\2[a]|(.*)c.*\b\3[ab]|(.*)d.*\b\4[a-c]|(.*)e.*\b\5[a-d]|(.*)f.*\b\6[a-e]|(.*)g.*\b\7[a-f]|(.*)h.*\b\8[a-g]|(.*)i.*\b\9[a-h]|(.*)j.*\b\10[a-i]|(.*)k.*\b\11[a-j]|(.*)l.*\b\12[a-k]|(.*)m.*\b\13[a-l]|(.*)n.*\b\14[a-m]|(.*)o.*\b\15[a-n]|(.*)p.*\b\16[a-o]|(.*)q.*\b\17[a-p]|(.*)r.*\b\18[a-q]|(.*)s.*\b\19[a-r]|(.*)t.*\b\20[a-s]|(.*)u.*\b\21[a-t]|(.*)v.*\b\22[a-u]|(.*)w.*\b\23[a-v]|(.*)x.*\b\24[a-w]|(.*)y.*\b\25[a-x]|(.*)z.*\b\26[a-y]))

Alphabetical (264): Limited to just the letters [ernst], the ones that appear in the tests. I consider this cheating, but your mileage may vary. This is the solution I ended up using for scoring; although I see now that others have trumped me; well played!

^(?!.*\b((.*)n.*\b\2[e]|(.*)r.*\b\3[ne]|(.*)s.*\b\4[rne]|(.*)t.*\b\5[srne]))

Others discovered my other solutions first, so I don't have anything else to add.

@pochmann / @fugyk: Upon seeing a 17-character solution to Powers, my first thought was "WTF?" Upon working through it, my hat is off to you, that is elegant.

@dcwarwick

This comment has been minimized.

Show comment Hide comment
@dcwarwick

dcwarwick Jan 13, 2014

Alphabetical (286): scan for just the disorders that occur, and use spaces to match spaces:

^(?!.* ((.*)t.* \2[es]|(.*)s.* \3[nr]|(.*)r.* \4[en]))

Alphabetical (286): scan for just the disorders that occur, and use spaces to match spaces:

^(?!.* ((.*)t.* \2[es]|(.*)s.* \3[nr]|(.*)r.* \4[en]))
@dcwarwick

This comment has been minimized.

Show comment Hide comment
@dcwarwick

dcwarwick Jan 13, 2014

@jdashton, good question, I think there are several types of "cheating" we can think about. Erling seems to contrast "cheat" with "exact", so I think he's referring to solutions which don't fully discriminate but which score higher than other solutions that do fully discriminate because of the scoring system -- what we might call a 'tactical failure'. I suggest the following terms:

A solution is exact if it matches all the strings in the left column and does not match all the strings in the right column. The highest-scoring solution may not be exact, because the advantage of being more compact may outweigh the disadvantage of either missing one or more matches or making one or more incorrect matches or both.

A solution is robust if, in addition to being exact, it would continue to be exact if the test data were to include further examples following the pattern implicit in the question. Exactly what the pattern implicit in the question is may sometimes be somewhat ambiguous, or there might be different levels of robustness. For example, a solution for 'Alphabetical' could be robust against additional sequences of words of the form [aenrst]*, or it could be robust against additional sequences of general words using any letters.

For each problem there are then three (or more) goals to pursue:
a) the highest-scoring solution.
b) the highest-scoring exact solution.
c) the highest-scoring robust solution.
In some cases the same solution may occupy two, or all three, of these slots, but in many cases there will be different solutions for these three goals. Obviously, it is the (a) goals which will lead to the highest maximum score, but the (b) goals would presumably be what Erling has in mind for the 'classic' league, and the (c) goals fulfil some deeper satisfaction in the purity of the solution...

@jdashton, good question, I think there are several types of "cheating" we can think about. Erling seems to contrast "cheat" with "exact", so I think he's referring to solutions which don't fully discriminate but which score higher than other solutions that do fully discriminate because of the scoring system -- what we might call a 'tactical failure'. I suggest the following terms:

A solution is exact if it matches all the strings in the left column and does not match all the strings in the right column. The highest-scoring solution may not be exact, because the advantage of being more compact may outweigh the disadvantage of either missing one or more matches or making one or more incorrect matches or both.

A solution is robust if, in addition to being exact, it would continue to be exact if the test data were to include further examples following the pattern implicit in the question. Exactly what the pattern implicit in the question is may sometimes be somewhat ambiguous, or there might be different levels of robustness. For example, a solution for 'Alphabetical' could be robust against additional sequences of words of the form [aenrst]*, or it could be robust against additional sequences of general words using any letters.

For each problem there are then three (or more) goals to pursue:
a) the highest-scoring solution.
b) the highest-scoring exact solution.
c) the highest-scoring robust solution.
In some cases the same solution may occupy two, or all three, of these slots, but in many cases there will be different solutions for these three goals. Obviously, it is the (a) goals which will lead to the highest maximum score, but the (b) goals would presumably be what Erling has in mind for the 'classic' league, and the (c) goals fulfil some deeper satisfaction in the purity of the solution...

@dcwarwick

This comment has been minimized.

Show comment Hide comment
@dcwarwick

dcwarwick Jan 13, 2014

Long count (exact 229): scan for backward counting, groups of more than 4, and duplicated groups. By using \w in the scan for groups of more than 4, this is also exact, also scoring 229, for Long count v2.

^(?!.* ((\d*)1\d* \2[0]|\w{5,}|(\d*) \3))

Long count (exact 229): scan for backward counting, groups of more than 4, and duplicated groups. By using \w in the scan for groups of more than 4, this is also exact, also scoring 229, for Long count v2.

^(?!.* ((\d*)1\d* \2[0]|\w{5,}|(\d*) \3))
@dcwarwick

This comment has been minimized.

Show comment Hide comment
@dcwarwick

dcwarwick Jan 13, 2014

In these terms, the 286 for Alphabetical I posted above is exact but not remotely robust.

In these terms, the 286 for Alphabetical I posted above is exact but not remotely robust.

@sshock

This comment has been minimized.

Show comment Hide comment
@sshock

sshock Jan 13, 2014

My non-cheaty solution to glob:
354: ^(.*) .* \1$|^(.*)\*(.*) .* \2.+\3$|(.*)\*(.*)\*(.*) .* \4.+\5.+\6

I started by putting these together:
^(.*) matches \1$
^(.*)\*(.*) matches \1.+\2$
^(.*)\*(.*)\*(.*) matches \1.+\2.+\3$
^(.*)\*(.*)\*(.*)\*(.*) matches \1.+\2.+\3.+\4$

Then my only actual cheats were changing matches to .* and leaving out the 4th one by getting rid of the anchors ^$ on the 3rd one.

sshock commented Jan 13, 2014

My non-cheaty solution to glob:
354: ^(.*) .* \1$|^(.*)\*(.*) .* \2.+\3$|(.*)\*(.*)\*(.*) .* \4.+\5.+\6

I started by putting these together:
^(.*) matches \1$
^(.*)\*(.*) matches \1.+\2$
^(.*)\*(.*)\*(.*) matches \1.+\2.+\3$
^(.*)\*(.*)\*(.*)\*(.*) matches \1.+\2.+\3.+\4$

Then my only actual cheats were changing matches to .* and leaving out the 4th one by getting rid of the anchors ^$ on the 3rd one.

@ghost

This comment has been minimized.

Show comment Hide comment
@ghost

ghost Jan 14, 2014

abba

[^(.)\1{2}]

[not (any character) \call it capture group 1 {repeated 2 times}]

ghost commented Jan 14, 2014

abba

[^(.)\1{2}]

[not (any character) \call it capture group 1 {repeated 2 times}]

@abjr

This comment has been minimized.

Show comment Hide comment
@abjr

abjr Jan 15, 2014

  1. Long count: ((.+)0\2[1]){8} (255)
  2. Long count v2: ((.+)0\2[1]){8} (255)

Would be shorter if \2[1] could be written as \21

abjr commented Jan 15, 2014

  1. Long count: ((.+)0\2[1]){8} (255)
  2. Long count v2: ((.+)0\2[1]){8} (255)

Would be shorter if \2[1] could be written as \21

@hugetoon

This comment has been minimized.

Show comment Hide comment
@hugetoon

hugetoon Jan 15, 2014

Long count( v2)?: ((.+)0\2+1){8} (256)

Long count( v2)?: ((.+)0\2+1){8} (256)

@abjr

This comment has been minimized.

Show comment Hide comment
@abjr

abjr Jan 15, 2014

@hugetoon: nice! :)

abjr commented Jan 15, 2014

@hugetoon: nice! :)

@teukon

This comment has been minimized.

Show comment Hide comment
@teukon

teukon Jan 15, 2014

@DiEvAl: Your robust solution schema for Glob is inspiring!

Glob (339) by DiEvAl: ^(\*?)(\w*)(\*?)(\w*)(\*?)(\w*) .* ((?!\1).+|\1)\2((?!\3).+|\3)\4((?!\5).+|\5)\6$

One thought: Could this be improved by replacing all instances of \w with .? I don't see that robustness is affected. The substitution is fine assuming that no asterisks appear to the right of matches, an assumption it appears you've already made.

teukon commented Jan 15, 2014

@DiEvAl: Your robust solution schema for Glob is inspiring!

Glob (339) by DiEvAl: ^(\*?)(\w*)(\*?)(\w*)(\*?)(\w*) .* ((?!\1).+|\1)\2((?!\3).+|\3)\4((?!\5).+|\5)\6$

One thought: Could this be improved by replacing all instances of \w with .? I don't see that robustness is affected. The substitution is fine assuming that no asterisks appear to the right of matches, an assumption it appears you've already made.

@teukon

This comment has been minimized.

Show comment Hide comment
@teukon

teukon Jan 15, 2014

Further thoughts on robustness as introduced by dcwarwick above.

Sometimes the context of a problem is not very clear. Anchors is a perfect example. It's clear that the words to match have been selected for ending in ick but it seems reasonable to assume that words ending in k have been intentionally omitted from the list of strings to avoid. In extending the list of examples, would the string dark fit on the left or the right, or would such a string be considered outside the scope of the problem. Without a complete list or a program able to generate strings to be matched and dodged, the notion of robustness is subjective.

I also like to distinguish between improvements to robustness and "generalisations". Triples highlights the concept well. All the strings given are of length 9. I think a solution which assumed this would be no less robust than one which matched arbitrary multiples of 3, but the latter would certainly be more general. Equally, in Alphabetical, I'd argue that a robust regex can assume any given string is 7 space-separated 6-letter words composed wholly of letters from {a, e, n, r, s, t}. A solution which relaxed on or more of these assumptions would not be any more robust but would be more general.

How far a robust regex must go in checking syntax also follows from the context outlined by the target strings. With Glob for example, a solution which uses .* instead of matches seems no less robust. Of all the strings given there is no example of *ana* munches bananas, such a string is clearly beyond the scope of the problem. In contrast, with Long count and Long count v2, the aim seems to be to create a regex which matches the given string and only that string; anything less is arguably not robust. The domain of strings to be considered is the set of all finite strings.

Finally, I would like to suggest the term schema. Balance is a good example for this. It is well known that there is no regular expression which matches all balanced strings of nested brackets while rejecting malformed such strings. However it is possible to define a rule which takes an argument, say max_depth, and return a regex which, up to a nesting limit of max_depth, matches/misses strings of brackets appropriately. I would say that such a schema is robust and would be tempted to extend the term (perhaps with a prefix of "bounded") to a member of the schema which matched/missed all the strings given as appropriate. I believe that "A man, a plan" and "Glob" are the only other problems which do not have unbounded robust solutions.

Criticisms and/or clearer definitions welcome.

teukon commented Jan 15, 2014

Further thoughts on robustness as introduced by dcwarwick above.

Sometimes the context of a problem is not very clear. Anchors is a perfect example. It's clear that the words to match have been selected for ending in ick but it seems reasonable to assume that words ending in k have been intentionally omitted from the list of strings to avoid. In extending the list of examples, would the string dark fit on the left or the right, or would such a string be considered outside the scope of the problem. Without a complete list or a program able to generate strings to be matched and dodged, the notion of robustness is subjective.

I also like to distinguish between improvements to robustness and "generalisations". Triples highlights the concept well. All the strings given are of length 9. I think a solution which assumed this would be no less robust than one which matched arbitrary multiples of 3, but the latter would certainly be more general. Equally, in Alphabetical, I'd argue that a robust regex can assume any given string is 7 space-separated 6-letter words composed wholly of letters from {a, e, n, r, s, t}. A solution which relaxed on or more of these assumptions would not be any more robust but would be more general.

How far a robust regex must go in checking syntax also follows from the context outlined by the target strings. With Glob for example, a solution which uses .* instead of matches seems no less robust. Of all the strings given there is no example of *ana* munches bananas, such a string is clearly beyond the scope of the problem. In contrast, with Long count and Long count v2, the aim seems to be to create a regex which matches the given string and only that string; anything less is arguably not robust. The domain of strings to be considered is the set of all finite strings.

Finally, I would like to suggest the term schema. Balance is a good example for this. It is well known that there is no regular expression which matches all balanced strings of nested brackets while rejecting malformed such strings. However it is possible to define a rule which takes an argument, say max_depth, and return a regex which, up to a nesting limit of max_depth, matches/misses strings of brackets appropriately. I would say that such a schema is robust and would be tempted to extend the term (perhaps with a prefix of "bounded") to a member of the schema which matched/missed all the strings given as appropriate. I believe that "A man, a plan" and "Glob" are the only other problems which do not have unbounded robust solutions.

Criticisms and/or clearer definitions welcome.

@teukon

This comment has been minimized.

Show comment Hide comment
@teukon

teukon Jan 15, 2014

My robust solution to Alphabetical (282):

^(?!.*\b(.*)(e|(n|r|(s|t))).* \1(a|(?!\3)[en]|(?!\4)[rs]))

teukon commented Jan 15, 2014

My robust solution to Alphabetical (282):

^(?!.*\b(.*)(e|(n|r|(s|t))).* \1(a|(?!\3)[en]|(?!\4)[rs]))

@szymonlopaciuk

This comment has been minimized.

Show comment Hide comment
@szymonlopaciuk

szymonlopaciuk Feb 6, 2014

Six points more in glob (371)

^[bpwl]|\*.*\*.{16}(?!t?i)|rr|tic$|tly|de|^c[orh]

Six points more in glob (371)

^[bpwl]|\*.*\*.{16}(?!t?i)|rr|tic$|tly|de|^c[orh]
@Davidebyzero

This comment has been minimized.

Show comment Hide comment
@Davidebyzero

Davidebyzero Feb 14, 2014

474 point robust solution for Triples below (at the very bottom), but first I'd like to give a brief history of how I got there.

I was linked to the Regex Golf game two days ago, and at first was pretty clueless, having only worked with relatively basic regex syntax. It was quite a thrill to discover how powerful it is with the ?= and ?! operators, which I had never used before. Here's what I came up with independently (competing against two friends, but only sharing our scores, not strings) before deciding today to allow myself to see other people's solutions:

  1. Plain strings: foo (207)
  2. Anchors: k$ (208)
  3. Ranges: ^[a-f]*$ (202)
  4. Backrefs: (...).*\1 (201)
  5. Abba: ^((?!(.)(.)\3\2).)*$ (190)
  6. A man, a plan: ^(.)(.).*\2\1$ (176)
  7. Prime: ^(?!(xx+)\1+$) (286), and before that, ^(?!((xx){2,}|(xxx){3,}|x{25})$) (268)
  8. Four: (.)(.\1){3} (199)
  9. Order: ^[a-m].{1,5}$ (197)
  10. Triples: 9[45]7|.17|55|91(04|48)|00($|[369]|12|15)|[254]4(.*[^3])?$ (572)
  11. Glob: ^((\w*)( .+ \2$|\*(\w*)( .+ \2.+\4$|\*.* .+ \2.+\4.))) (366)
  12. Balance: ^(<(<(<(<(<(<<>>)*>)*>)*>)*>)*>)*$ (286)
  13. Powers: ^(?!(x(xx)+)\1*$) (93), and before that, ^(((x|x{16}|x{256})\3?)\2?)\1?$ (79)
  14. Long count: ^((....\b) ?(?!.*\2))+$ (247)
  15. Long count v2: ^((.*).1* (?=\2.0*\b))+1+$ (244)
  16. Alphabetical: ^((.*)(a.*(?=\2[e-t].*)|e.*(?=\2[n-t].*)|n.*(?=\2[r-t].*)|r.*(?=\2[s-t].*)|s.*(?=\2t.*)| (?=\2))\b){6} (238)
    Total: 3912

At this point I'm more interested in general solutions than cheaty solutions that happen to match the tests. I find the Prime and Powers solutions to be rather beautiful, and they happen to be the only nontrivial ones out of the list that are both general solutions and are the highest-scoring anybody's come up with.

Here's my robust 403 point solution for Triples which both works under Javascript/ECMA and on other regex parsers:

^((?=(([^258]*[258]){3})*[^258]*$)(([^147]*[147]){3})*|(?=(([^258]*[258]){3})*[^258]*[258][^258]*$)(([^147]*[147]){3})*[^147]*[147]|(?=(([^258]*[258]){3})*([^258]*[258]){2}[^258]*$)(([^147]*[147]){3})*([^147]*[147]){2})[^147]*$

pretty-printed:

^(
    (?=(([^258]*[258]){3})*[^258]*$)
       (([^147]*[147]){3})*
|
    (?=(([^258]*[258]){3})*[^258]*[258][^258]*$)
       (([^147]*[147]){3})*[^147]*[147]
|
    (?=(([^258]*[258]){3})*([^258]*[258]){2}[^258]*$)
       (([^147]*[147]){3})*([^147]*[147]){2}
)[^147]*$

This works by checking to see if the number of [147] digits modulo 3 equals the number of [258] digits modulo 3. I found this property holds if and only if a sequence of digits represents a multiple of 3. This solution is two characters shorter than the finite state machine solution (which I only allowed myself to see once I had beaten its score of 401), and the only assumptions it makes are that each number has a line to itself (which the FSM solution also assumes), and that there are no non-digit characters. (Removing both assumptions would be trivial, the latter by prefixing the entire construction with (?=\d+$) right after the ^, and the former by sandwiching it between two \b instead of ^ and $, but I feel that this is unnecessary.)

It's a bit annoying that a separate clause has to be used for each number of digits modulo three (0, 1, and 2), so I did some digging. I found out that in perl's flavor of regex, there's a syntax for conditionals, which allows shortening this solution considerably:

^(?=(([^258]*[258]){3})*([^258]*[258]([^258]*[258])?)?[^258]*$)(([^147]*[147]){3})*(?(3)[^147]*[147](?(4)[^147]*[147]))[^147]*$

pretty-printed (all whitespace must be removed to make it work):

^(?=(([^258]*[258]){3})*    ([^258]*[258]    ([^258]*[258])?)?[^258]*$)
    (([^147]*[147]){3})*(?(3)[^147]*[147](?(4)[^147]*[147]))  [^147]*$

EDIT: whittled down even further by using regex subroutines (also a perl/PCRE specific feature):

^(?=((?P<B>[^258]*[258]){3})*((?P>B)((?P>B))?)?[^258]*$)((?P<A>[^147]*[147]){3})*(?(3)(?P>A)(?(4)(?P>A)))[^147]*$

^(?=((?P<B>[^258]*[258]){3})*    ((?P>B)    ((?P>B))?)?[^258]*$)
    ((?P<A>[^147]*[147]){3})*(?(3)(?P>A)(?(4)(?P>A)))  [^147]*$

Then I did some more digging, trying to find a way to emulate conditionals. It turns out there is a way to emulate them, but it doesn't work for ECMA! But then I realized that the very flaw that prevents that from working in ECMA (backreferences to non-participating capturing groups always succeed, matching an empty string) could be turned to my advantage.

So here's my 474 point robust solution that whittles down the 403-point solution by taking advantage of that ECMA quirk to emulate conditionals (kind of):

^(?=((([^258]*[258]){3})*[^258]*))(?=((\1)|((\1[258][^258]*)|(\1([258][^258]*){2})))$)(([^147]*[147]){3})*(\6\6|\5[^147]*[147](\8\8|\7[^147]*[147]))[^147]*$

pretty-printed (all whitespace must be removed to make it work):

^(?=((([^258]*[258]){3})*[^258]*))
 (?=((\1) | ((\1[258][^258]*) | (\1([258][^258]*){2})))$)
 (([^147]*[147]){3})* (\6\6 | \5[^147]*[147](\8\8 | \7[^147]*[147])) [^147]*$

Basically, I make sure that only one choice in a (choice1|choice2) block can succeed, by making one of the choices impossible to match — by filling it with a pattern longer than the current full line (matched by the lookahead clauses) — while the only modification to the other choice is the concatenation of an empty backreference, leaving it unchanged.

474 point robust solution for Triples below (at the very bottom), but first I'd like to give a brief history of how I got there.

I was linked to the Regex Golf game two days ago, and at first was pretty clueless, having only worked with relatively basic regex syntax. It was quite a thrill to discover how powerful it is with the ?= and ?! operators, which I had never used before. Here's what I came up with independently (competing against two friends, but only sharing our scores, not strings) before deciding today to allow myself to see other people's solutions:

  1. Plain strings: foo (207)
  2. Anchors: k$ (208)
  3. Ranges: ^[a-f]*$ (202)
  4. Backrefs: (...).*\1 (201)
  5. Abba: ^((?!(.)(.)\3\2).)*$ (190)
  6. A man, a plan: ^(.)(.).*\2\1$ (176)
  7. Prime: ^(?!(xx+)\1+$) (286), and before that, ^(?!((xx){2,}|(xxx){3,}|x{25})$) (268)
  8. Four: (.)(.\1){3} (199)
  9. Order: ^[a-m].{1,5}$ (197)
  10. Triples: 9[45]7|.17|55|91(04|48)|00($|[369]|12|15)|[254]4(.*[^3])?$ (572)
  11. Glob: ^((\w*)( .+ \2$|\*(\w*)( .+ \2.+\4$|\*.* .+ \2.+\4.))) (366)
  12. Balance: ^(<(<(<(<(<(<<>>)*>)*>)*>)*>)*>)*$ (286)
  13. Powers: ^(?!(x(xx)+)\1*$) (93), and before that, ^(((x|x{16}|x{256})\3?)\2?)\1?$ (79)
  14. Long count: ^((....\b) ?(?!.*\2))+$ (247)
  15. Long count v2: ^((.*).1* (?=\2.0*\b))+1+$ (244)
  16. Alphabetical: ^((.*)(a.*(?=\2[e-t].*)|e.*(?=\2[n-t].*)|n.*(?=\2[r-t].*)|r.*(?=\2[s-t].*)|s.*(?=\2t.*)| (?=\2))\b){6} (238)
    Total: 3912

At this point I'm more interested in general solutions than cheaty solutions that happen to match the tests. I find the Prime and Powers solutions to be rather beautiful, and they happen to be the only nontrivial ones out of the list that are both general solutions and are the highest-scoring anybody's come up with.

Here's my robust 403 point solution for Triples which both works under Javascript/ECMA and on other regex parsers:

^((?=(([^258]*[258]){3})*[^258]*$)(([^147]*[147]){3})*|(?=(([^258]*[258]){3})*[^258]*[258][^258]*$)(([^147]*[147]){3})*[^147]*[147]|(?=(([^258]*[258]){3})*([^258]*[258]){2}[^258]*$)(([^147]*[147]){3})*([^147]*[147]){2})[^147]*$

pretty-printed:

^(
    (?=(([^258]*[258]){3})*[^258]*$)
       (([^147]*[147]){3})*
|
    (?=(([^258]*[258]){3})*[^258]*[258][^258]*$)
       (([^147]*[147]){3})*[^147]*[147]
|
    (?=(([^258]*[258]){3})*([^258]*[258]){2}[^258]*$)
       (([^147]*[147]){3})*([^147]*[147]){2}
)[^147]*$

This works by checking to see if the number of [147] digits modulo 3 equals the number of [258] digits modulo 3. I found this property holds if and only if a sequence of digits represents a multiple of 3. This solution is two characters shorter than the finite state machine solution (which I only allowed myself to see once I had beaten its score of 401), and the only assumptions it makes are that each number has a line to itself (which the FSM solution also assumes), and that there are no non-digit characters. (Removing both assumptions would be trivial, the latter by prefixing the entire construction with (?=\d+$) right after the ^, and the former by sandwiching it between two \b instead of ^ and $, but I feel that this is unnecessary.)

It's a bit annoying that a separate clause has to be used for each number of digits modulo three (0, 1, and 2), so I did some digging. I found out that in perl's flavor of regex, there's a syntax for conditionals, which allows shortening this solution considerably:

^(?=(([^258]*[258]){3})*([^258]*[258]([^258]*[258])?)?[^258]*$)(([^147]*[147]){3})*(?(3)[^147]*[147](?(4)[^147]*[147]))[^147]*$

pretty-printed (all whitespace must be removed to make it work):

^(?=(([^258]*[258]){3})*    ([^258]*[258]    ([^258]*[258])?)?[^258]*$)
    (([^147]*[147]){3})*(?(3)[^147]*[147](?(4)[^147]*[147]))  [^147]*$

EDIT: whittled down even further by using regex subroutines (also a perl/PCRE specific feature):

^(?=((?P<B>[^258]*[258]){3})*((?P>B)((?P>B))?)?[^258]*$)((?P<A>[^147]*[147]){3})*(?(3)(?P>A)(?(4)(?P>A)))[^147]*$

^(?=((?P<B>[^258]*[258]){3})*    ((?P>B)    ((?P>B))?)?[^258]*$)
    ((?P<A>[^147]*[147]){3})*(?(3)(?P>A)(?(4)(?P>A)))  [^147]*$

Then I did some more digging, trying to find a way to emulate conditionals. It turns out there is a way to emulate them, but it doesn't work for ECMA! But then I realized that the very flaw that prevents that from working in ECMA (backreferences to non-participating capturing groups always succeed, matching an empty string) could be turned to my advantage.

So here's my 474 point robust solution that whittles down the 403-point solution by taking advantage of that ECMA quirk to emulate conditionals (kind of):

^(?=((([^258]*[258]){3})*[^258]*))(?=((\1)|((\1[258][^258]*)|(\1([258][^258]*){2})))$)(([^147]*[147]){3})*(\6\6|\5[^147]*[147](\8\8|\7[^147]*[147]))[^147]*$

pretty-printed (all whitespace must be removed to make it work):

^(?=((([^258]*[258]){3})*[^258]*))
 (?=((\1) | ((\1[258][^258]*) | (\1([258][^258]*){2})))$)
 (([^147]*[147]){3})* (\6\6 | \5[^147]*[147](\8\8 | \7[^147]*[147])) [^147]*$

Basically, I make sure that only one choice in a (choice1|choice2) block can succeed, by making one of the choices impossible to match — by filling it with a pattern longer than the current full line (matched by the lookahead clauses) — while the only modification to the other choice is the concatenation of an empty backreference, leaving it unchanged.

@bbarry

This comment has been minimized.

Show comment Hide comment
@bbarry

bbarry Feb 14, 2014

If anyone was wondering, my search (for shortest exact solution on triples) through all alternations of sequences of set items from the set generated by the powerset of [0-9] with the repetition modifiers {a,b} where 0 <= a <= b <= 9 (basically counting through a space containing 14k items) is currently around the sequences *, 7500: [03568]+, 1099: [^059]+ (I say around because it is doing about 200k per second and is operating on 8 sequences at the same time; each thread doing the whole round of 14k before taking the next available one from the queue) which in that case are the 3 item regexes ending in [03568]+[^059]+.

So far the best it has done is 538 - [^07]{4}[47]|[036]{3}$|009|9...$ (I constrained it to search for solutions shorter than 36 characters).

My best that matches for case 5 (0012) and 6 (0015) is 536 - [03-6]{2}[^07][^0146]|[07][^3578]$. These cases are special in that they require a more complex expression than the rest of them (all others can be solved by 2 item sequences where these require 3 items to match). I should be through the 3 item sequences (which I know at least contains a 596 point solution that will cross) in another month or two (based on how long it ran to get to 0,0,1k and a coded theory that as it continues to find better matches it should move through the space faster by being able to discard potential sets earlier). At that point (exhausting the 3 item space) I will be uploading the program for anyone to play with ( ~500 lines C# code+mssql storage, could easily be converted to something else), potentially with the (currently 22k rows) table of best exact matches for subsets of the problem or the (currently 4k rows) best subset-robust rows (the former contains things that are over specified but were best found for that particular subset while the latter contains only rows that could potentially be in a solution). Or I can add the subset of the latter row set that does not contain alternations.

bbarry commented Feb 14, 2014

If anyone was wondering, my search (for shortest exact solution on triples) through all alternations of sequences of set items from the set generated by the powerset of [0-9] with the repetition modifiers {a,b} where 0 <= a <= b <= 9 (basically counting through a space containing 14k items) is currently around the sequences *, 7500: [03568]+, 1099: [^059]+ (I say around because it is doing about 200k per second and is operating on 8 sequences at the same time; each thread doing the whole round of 14k before taking the next available one from the queue) which in that case are the 3 item regexes ending in [03568]+[^059]+.

So far the best it has done is 538 - [^07]{4}[47]|[036]{3}$|009|9...$ (I constrained it to search for solutions shorter than 36 characters).

My best that matches for case 5 (0012) and 6 (0015) is 536 - [03-6]{2}[^07][^0146]|[07][^3578]$. These cases are special in that they require a more complex expression than the rest of them (all others can be solved by 2 item sequences where these require 3 items to match). I should be through the 3 item sequences (which I know at least contains a 596 point solution that will cross) in another month or two (based on how long it ran to get to 0,0,1k and a coded theory that as it continues to find better matches it should move through the space faster by being able to discard potential sets earlier). At that point (exhausting the 3 item space) I will be uploading the program for anyone to play with ( ~500 lines C# code+mssql storage, could easily be converted to something else), potentially with the (currently 22k rows) table of best exact matches for subsets of the problem or the (currently 4k rows) best subset-robust rows (the former contains things that are over specified but were best found for that particular subset while the latter contains only rows that could potentially be in a solution). Or I can add the subset of the latter row set that does not contain alternations.

@Davidebyzero

This comment has been minimized.

Show comment Hide comment
@Davidebyzero

Davidebyzero Feb 15, 2014

Here's a 277 point optimization of AlanDeSmet's "Pure" Alphabetical solution that is just as robust, as long as [aenrst] is considered to be the full set of letters: ^(?!.*( .*)(t.*\1[a-s]|s.*\1[a-r]|r.*\1[a-n]|n.*\1[ae]|e.*\1a))

With the full alphabet, it's 55 points: ^(?!.*( .*)(b.*\1[a]|c.*\1[ab]|d.*\1[a-c]|e.*\1[a-d]|f.*\1[a-e]|g.*\1[a-f]|h.*\1[a-g]|i.*\1[a-h]|j.*\1[a-i]|k.*\1[a-j]|l.*\1[a-k]|m.*\1[a-l]|n.*\1[a-m]|o.*\1[a-n]|p.*\1[a-o]|q.*\1[a-p]|r.*\1[a-q]|s.*\1[a-r]|t.*\1[a-s]|u.*\1[a-t]|v.*\1[a-u]|w.*\1[a-v]|x.*\1[a-w]|y.*\1[a-x]|z.*\1[a-y]))

And here's a 297 point optimization of dcwarwick's exact solution (of academic interest only, since it's beaten by alexandrosm's 317-pointer): ^(?!.*( .*)(t.*\1[es]|s.*\1[nr]|r.*\1[en]))

(EDIT: I made a little mistake here. My optimization of replacing "\b" with " " resulted in lines whose only out-of-order words were the first pair in the line getting a false positive. The tests in the game never have any required non-matching string whose only out-of-order words are the first two in the line, which allowed the bug to get through. I'm leaving the mistake intact in this post.)

Here's a 277 point optimization of AlanDeSmet's "Pure" Alphabetical solution that is just as robust, as long as [aenrst] is considered to be the full set of letters: ^(?!.*( .*)(t.*\1[a-s]|s.*\1[a-r]|r.*\1[a-n]|n.*\1[ae]|e.*\1a))

With the full alphabet, it's 55 points: ^(?!.*( .*)(b.*\1[a]|c.*\1[ab]|d.*\1[a-c]|e.*\1[a-d]|f.*\1[a-e]|g.*\1[a-f]|h.*\1[a-g]|i.*\1[a-h]|j.*\1[a-i]|k.*\1[a-j]|l.*\1[a-k]|m.*\1[a-l]|n.*\1[a-m]|o.*\1[a-n]|p.*\1[a-o]|q.*\1[a-p]|r.*\1[a-q]|s.*\1[a-r]|t.*\1[a-s]|u.*\1[a-t]|v.*\1[a-u]|w.*\1[a-v]|x.*\1[a-w]|y.*\1[a-x]|z.*\1[a-y]))

And here's a 297 point optimization of dcwarwick's exact solution (of academic interest only, since it's beaten by alexandrosm's 317-pointer): ^(?!.*( .*)(t.*\1[es]|s.*\1[nr]|r.*\1[en]))

(EDIT: I made a little mistake here. My optimization of replacing "\b" with " " resulted in lines whose only out-of-order words were the first pair in the line getting a false positive. The tests in the game never have any required non-matching string whose only out-of-order words are the first two in the line, which allowed the bug to get through. I'm leaving the mistake intact in this post.)

@Davidebyzero

This comment has been minimized.

Show comment Hide comment
@Davidebyzero

Davidebyzero Feb 15, 2014

477 point robust Triples solution that still relies on ECMA non-participating capture group behavior:

^(?=(([^258]*[258]){3})*[^258]*([258][^258]*([258][^258]*)?)?$)(([^147]*[147]){3})*(?=([^147]*))\7((?!\3)(?=([147][^147]*))\9((?!\4)[147][^147]*|\4)|\3)$

^(?=(([^258]*[258]){3})*    [^258]*              ([258][^258]*          ([258][^258]* )?  )? $)
    (([^147]*[147]){3})*(?=([^147]*))\7((?!\3)(?=([147][^147]*))\9((?!\4)[147][^147]*|\4)|\3)$

This solution emulates non-backtracking atomic groups.

And here it is edited to be fully compatible with both ECMA-compliant and non-ECMA regex implementations (by eliminating the use of non-participating capture groups), yielding a 472 point solution:

^(?=(([^258]*[258]){3})*[^258]*(([258][^258]*(([258][^258]*)?))?)$)(([^147]*[147]){3})*(?=([^147]*))\9((?!\3)(?=([147][^147]*))\11((?!\5)[147][^147]*|\5)|\3)$

^(?=(([^258]*[258]){3})*    [^258]*             (([258][^258]*          (([258][^258]* )?) )?)$)
    (([^147]*[147]){3})*(?=([^147]*))\9((?!\3)(?=([147][^147]*))\11((?!\5)[147][^147]*|\5)|\3)$

And, non-ECMA-only version, with emulated conditionals, relying on non-ECMA NPCG behavior:

^(?=(([^258]*[258]){3})*(()[^258]*[258](()[^258]*[258]|())|())[^258]*$)(([^147]*[147]){3})*(\4[^147]*[147](\6[^147]*[147]|\7)|\8)[^147]*$

^(?=(([^258]*[258]){3})*(()[^258]*[258](()[^258]*[258]|())|())[^258]*$)
    (([^147]*[147]){3})*(\4[^147]*[147](\6[^147]*[147]|\7)|\8)[^147]*$

477 point robust Triples solution that still relies on ECMA non-participating capture group behavior:

^(?=(([^258]*[258]){3})*[^258]*([258][^258]*([258][^258]*)?)?$)(([^147]*[147]){3})*(?=([^147]*))\7((?!\3)(?=([147][^147]*))\9((?!\4)[147][^147]*|\4)|\3)$

^(?=(([^258]*[258]){3})*    [^258]*              ([258][^258]*          ([258][^258]* )?  )? $)
    (([^147]*[147]){3})*(?=([^147]*))\7((?!\3)(?=([147][^147]*))\9((?!\4)[147][^147]*|\4)|\3)$

This solution emulates non-backtracking atomic groups.

And here it is edited to be fully compatible with both ECMA-compliant and non-ECMA regex implementations (by eliminating the use of non-participating capture groups), yielding a 472 point solution:

^(?=(([^258]*[258]){3})*[^258]*(([258][^258]*(([258][^258]*)?))?)$)(([^147]*[147]){3})*(?=([^147]*))\9((?!\3)(?=([147][^147]*))\11((?!\5)[147][^147]*|\5)|\3)$

^(?=(([^258]*[258]){3})*    [^258]*             (([258][^258]*          (([258][^258]* )?) )?)$)
    (([^147]*[147]){3})*(?=([^147]*))\9((?!\3)(?=([147][^147]*))\11((?!\5)[147][^147]*|\5)|\3)$

And, non-ECMA-only version, with emulated conditionals, relying on non-ECMA NPCG behavior:

^(?=(([^258]*[258]){3})*(()[^258]*[258](()[^258]*[258]|())|())[^258]*$)(([^147]*[147]){3})*(\4[^147]*[147](\6[^147]*[147]|\7)|\8)[^147]*$

^(?=(([^258]*[258]){3})*(()[^258]*[258](()[^258]*[258]|())|())[^258]*$)
    (([^147]*[147]){3})*(\4[^147]*[147](\6[^147]*[147]|\7)|\8)[^147]*$
@vulcansmith

This comment has been minimized.

Show comment Hide comment
@vulcansmith

vulcansmith Feb 15, 2014

Simpler for ranges with same score: [a-f]{4} (202)

Simpler for ranges with same score: [a-f]{4} (202)

@Davidebyzero

This comment has been minimized.

Show comment Hide comment
@Davidebyzero

Davidebyzero Feb 16, 2014

teukon commented

My robust solution to Alphabetical (282):

^(?!.*\b(.*)(e|(n|r|(s|t))).* \1(a|(?!\3)[en]|(?!\4)[rs]))

This is brilliant. Somehow I missed it in my first pass through the thread, but it is indeed robust, and is the shortest robust solution anybody's found, as far as I know. What teukon didn't mention is that this solution is ECMA-only, since it relies on non-participating capture groups always being a match for an empty string when backreferenced. In negative lookahead, this results in a NPCG always being a non-match, since (?!) is always a non-match (the empty string is "everywhere", to the left and right of every character, so there is nowhere that the empty string is not a match).

Here is a whole-alphabet version of it (107 points): ^(?!.*\b(.*)(b|(c|d|(e|f|(g|h|(i|j|(k|l|(m|n|(o|p|(q|r|(s|t|(u|v|(w|x|(y|z))))))))))))).* \1(a|(?!\3)[bc]|(?!\4)[de]|(?!\5)[fg]|(?!\6)[hi]|(?!\7)[jk]|(?!\8)[lm]|(?!\9)[no]|(?!\10)[pq]|(?!\11)[rs]|(?!\12)[tu]|(?!\13)[vw]|(?!\14)[xy]))

teukon commented

My robust solution to Alphabetical (282):

^(?!.*\b(.*)(e|(n|r|(s|t))).* \1(a|(?!\3)[en]|(?!\4)[rs]))

This is brilliant. Somehow I missed it in my first pass through the thread, but it is indeed robust, and is the shortest robust solution anybody's found, as far as I know. What teukon didn't mention is that this solution is ECMA-only, since it relies on non-participating capture groups always being a match for an empty string when backreferenced. In negative lookahead, this results in a NPCG always being a non-match, since (?!) is always a non-match (the empty string is "everywhere", to the left and right of every character, so there is nowhere that the empty string is not a match).

Here is a whole-alphabet version of it (107 points): ^(?!.*\b(.*)(b|(c|d|(e|f|(g|h|(i|j|(k|l|(m|n|(o|p|(q|r|(s|t|(u|v|(w|x|(y|z))))))))))))).* \1(a|(?!\3)[bc]|(?!\4)[de]|(?!\5)[fg]|(?!\6)[hi]|(?!\7)[jk]|(?!\8)[lm]|(?!\9)[no]|(?!\10)[pq]|(?!\11)[rs]|(?!\12)[tu]|(?!\13)[vw]|(?!\14)[xy]))

@teukon

This comment has been minimized.

Show comment Hide comment
@teukon

teukon Feb 16, 2014

Thanks. I guess my solution was easy to miss because it doesn't look robust at first glance. I didn't mention the qwerk but only because it's a qwerk of the ECMA standard rather than a particular implementation. It is worth mentioning though if we're considering solutions for different engines.

Just for variety, here is my own whole-alphabet generalisation. It only scores 95 but it allows for words of different lengths (i.e. a generalisation to strings of the form ^([a-z]+( |$))*$).

^(?!.*\b(.*)(a|b|(c|d|(e|f|(g|h|(i|j|(k|l|(m|n|(o|p|(q|r|(s|t|(u|v|(w|x|(y|z))))))))))))).* \1( |$|(?!\2)a|(?!\3)[bc]|(?!\4)[de]|(?!\5)[fg]|(?!\6)[hi]|(?!\7)[jk]|(?!\8)[lm]|(?!\9)[no]|(?!\10)[pq]|(?!\11)[rs]|(?!\12)[tu]|(?!\13)[vw]|(?!\14)[xy]))

I'll have to study your robust Triples solution in more detail. I also missed the finite state approach at first and was satisfied with simply checking that the number of matches of [147] and of [258] were equal modulo 3. However, I only scored 473 so well done!

^(?=[^147]*([147]?)[^147]*([147]?)[^147]*(([147][^147]*){3})*$)[^258]*((?=.*$\1)|(?!.*$\1)[258])[^258]*((?=.*$\2)|(?!.*$\2)[258])[^258]*(([258][^258]*){3})*$

My robust solution for "Long count" scores 227 (not posted here). Please post your solution if you beat this.

I've also thought up a few custom Regex Golf challenges with a focus on robustness and solution-elegance. Let me know if you're interested (or if you have some puzzles of your own to contribute).

teukon commented Feb 16, 2014

Thanks. I guess my solution was easy to miss because it doesn't look robust at first glance. I didn't mention the qwerk but only because it's a qwerk of the ECMA standard rather than a particular implementation. It is worth mentioning though if we're considering solutions for different engines.

Just for variety, here is my own whole-alphabet generalisation. It only scores 95 but it allows for words of different lengths (i.e. a generalisation to strings of the form ^([a-z]+( |$))*$).

^(?!.*\b(.*)(a|b|(c|d|(e|f|(g|h|(i|j|(k|l|(m|n|(o|p|(q|r|(s|t|(u|v|(w|x|(y|z))))))))))))).* \1( |$|(?!\2)a|(?!\3)[bc]|(?!\4)[de]|(?!\5)[fg]|(?!\6)[hi]|(?!\7)[jk]|(?!\8)[lm]|(?!\9)[no]|(?!\10)[pq]|(?!\11)[rs]|(?!\12)[tu]|(?!\13)[vw]|(?!\14)[xy]))

I'll have to study your robust Triples solution in more detail. I also missed the finite state approach at first and was satisfied with simply checking that the number of matches of [147] and of [258] were equal modulo 3. However, I only scored 473 so well done!

^(?=[^147]*([147]?)[^147]*([147]?)[^147]*(([147][^147]*){3})*$)[^258]*((?=.*$\1)|(?!.*$\1)[258])[^258]*((?=.*$\2)|(?!.*$\2)[258])[^258]*(([258][^258]*){3})*$

My robust solution for "Long count" scores 227 (not posted here). Please post your solution if you beat this.

I've also thought up a few custom Regex Golf challenges with a focus on robustness and solution-elegance. Let me know if you're interested (or if you have some puzzles of your own to contribute).

@Davidebyzero

This comment has been minimized.

Show comment Hide comment
@Davidebyzero

Davidebyzero Feb 17, 2014

teukon, your Triples solution gave me an idea for how to get 477 points without needing to use non-participating capture groups! (This makes the previous 477 and 472 point solutions obsolete.)

^(?=(([^258]*[258]){3})*[^258]*([258]?)[^258]*([258]?)[^258]*$)(([^147]*[147]){3})*(?=([^147]*))\7((?!\3)(?=([147][^147]*))\9((?!\4)[147][^147]*|\4)|\3)$

^(?=(([^258]*[258]){3})*    [^258]*              ([258]?)[^258]*          ([258]?)[^258]*        $)
    (([^147]*[147]){3})*(?=([^147]*))\7((?!\3)(?=([147]  [^147]*))\9((?!\4)[147]  [^147]*|\4)|\3)$

And, very cool that we both came up with the same idea. (Note, I did try the FSM approach first, but didn't finish it, because it was getting very long, which made me stop and try alternative ways of looking at the problem.)

And, I am indeed interested in your custom challenges.

teukon, your Triples solution gave me an idea for how to get 477 points without needing to use non-participating capture groups! (This makes the previous 477 and 472 point solutions obsolete.)

^(?=(([^258]*[258]){3})*[^258]*([258]?)[^258]*([258]?)[^258]*$)(([^147]*[147]){3})*(?=([^147]*))\7((?!\3)(?=([147][^147]*))\9((?!\4)[147][^147]*|\4)|\3)$

^(?=(([^258]*[258]){3})*    [^258]*              ([258]?)[^258]*          ([258]?)[^258]*        $)
    (([^147]*[147]){3})*(?=([^147]*))\7((?!\3)(?=([147]  [^147]*))\9((?!\4)[147]  [^147]*|\4)|\3)$

And, very cool that we both came up with the same idea. (Note, I did try the FSM approach first, but didn't finish it, because it was getting very long, which made me stop and try alternative ways of looking at the problem.)

And, I am indeed interested in your custom challenges.

@teukon

This comment has been minimized.

Show comment Hide comment
@teukon

teukon Feb 17, 2014

Clever! I'd not thought of using the greediness of the * operator by placing it in a positive look-ahead. After matching as far as \7, the next character (if there is one) is guaranteed to be a 2, 5, or 8. However, unless I'm misunderstanding something, that does suggest an automatic reduction of your solution by replacing the two [258] terms towards the end of your regex with .. Thus

Triples (485):
^(?=(([^258]*[258]){3})*[^258]*([258]?)[^258]*([258]?)[^258]*$)(([^147]*[147]){3})*(?=([^147]*))\7((?!\3)(?=(.[^147]*))\9((?!\4).[^147]*|\4)|\3)$

I used the trick to improve my own solution which now also scores 485. I include it here only because I feel the lack of nesting at the end gives the regex more symmetry.

Triples (485):
^(?=[^147]*([147]?)[^147]*([147]?)[^147]*(([147][^147]*){3})*$)(?=([^258]*))\5(\1|(?!\1).)(?=([^258]*))\7(\2|(?!\2).)[^258]*(([258][^258]*){3})*$

^(?=      [^147]*      ([147]?)          [^147]*      ([147]?)      [^147]*(([147][^147]*){3})*$)
      (?=([^258]*))\5  (\1|(?!\1).)  (?=([^258]*))\7  (\2|(?!\2).)  [^258]*(([258][^258]*){3})*$

teukon commented Feb 17, 2014

Clever! I'd not thought of using the greediness of the * operator by placing it in a positive look-ahead. After matching as far as \7, the next character (if there is one) is guaranteed to be a 2, 5, or 8. However, unless I'm misunderstanding something, that does suggest an automatic reduction of your solution by replacing the two [258] terms towards the end of your regex with .. Thus

Triples (485):
^(?=(([^258]*[258]){3})*[^258]*([258]?)[^258]*([258]?)[^258]*$)(([^147]*[147]){3})*(?=([^147]*))\7((?!\3)(?=(.[^147]*))\9((?!\4).[^147]*|\4)|\3)$

I used the trick to improve my own solution which now also scores 485. I include it here only because I feel the lack of nesting at the end gives the regex more symmetry.

Triples (485):
^(?=[^147]*([147]?)[^147]*([147]?)[^147]*(([147][^147]*){3})*$)(?=([^258]*))\5(\1|(?!\1).)(?=([^258]*))\7(\2|(?!\2).)[^258]*(([258][^258]*){3})*$

^(?=      [^147]*      ([147]?)          [^147]*      ([147]?)      [^147]*(([147][^147]*){3})*$)
      (?=([^258]*))\5  (\1|(?!\1).)  (?=([^258]*))\7  (\2|(?!\2).)  [^258]*(([258][^258]*){3})*$
@teukon

This comment has been minimized.

Show comment Hide comment
@teukon

teukon Feb 17, 2014

As for my custom challenges, I've started to turn the ideas into playable custom levels. I'll be keeping notes of my progress here: https://gist.github.com/teukon/9060863. If you have any ideas for puzzles, or any ways in which my puzzles or sample sets of strings can be improved, please comment there.

teukon commented Feb 17, 2014

As for my custom challenges, I've started to turn the ideas into playable custom levels. I'll be keeping notes of my progress here: https://gist.github.com/teukon/9060863. If you have any ideas for puzzles, or any ways in which my puzzles or sample sets of strings can be improved, please comment there.

@Davidebyzero

This comment has been minimized.

Show comment Hide comment
@Davidebyzero

Davidebyzero Feb 18, 2014

Oh wow, nice optimization! That can be done to the other half, too, yielding 493 points...

^(?=(([^258]*[258]){3})*[^258]*(.?)[^258]*(.?)[^258]*$)(([^147]*[147]){3})*(?=([^147]*))\7((?!\3).|\3)(?=([^147]*))\9((?!\4).|\4)[^147]*$

^(?=(([^258]*[258]){3})*     [^258]*           (.?)       [^258]*           (.?)   [^258]*$)
    (([^147]*[147]){3})* (?=([^147]*))\7 ((?!\3).|\3) (?=([^147]*))\9 ((?!\4).|\4) [^147]*$

and with your ordering:

^(?=[^147]*(.?)[^147]*(.?)[^147]*(([147][^147]*){3})*$)(?=([^258]*))\5(\1|(?!\1).)(?=([^258]*))\7(\2|(?!\2).)[^258]*(([258][^258]*){3})*$

^(?=     [^147]*     (.?)             [^147]*     (.?)         [^147]*(([147][^147]*){3})*$)
     (?=([^258]*))\5 (\1|(?!\1).) (?=([^258]*))\7 (\2|(?!\2).) [^258]*(([258][^258]*){3})*$

and here's another way to do it, with the same length:

^(?=[^147]*(.?)[^147]*(.?)[^147]*(([147][^147]*){3})*$)(?=([^258]*))\5(?=(\1|.))\6(?=([^258]*))\7(?=(\2|.))\8[^258]*(([258][^258]*){3})*$

^(?=     [^147]*           (.?)       [^147]*           (.?)   [^147]*(([147][^147]*){3})*$)
     (?=([^258]*))\5 (?=(\1|.))\6 (?=([^258]*))\7 (?=(\2|.))\8 [^258]*(([258][^258]*){3})*$

Oh wow, nice optimization! That can be done to the other half, too, yielding 493 points...

^(?=(([^258]*[258]){3})*[^258]*(.?)[^258]*(.?)[^258]*$)(([^147]*[147]){3})*(?=([^147]*))\7((?!\3).|\3)(?=([^147]*))\9((?!\4).|\4)[^147]*$

^(?=(([^258]*[258]){3})*     [^258]*           (.?)       [^258]*           (.?)   [^258]*$)
    (([^147]*[147]){3})* (?=([^147]*))\7 ((?!\3).|\3) (?=([^147]*))\9 ((?!\4).|\4) [^147]*$

and with your ordering:

^(?=[^147]*(.?)[^147]*(.?)[^147]*(([147][^147]*){3})*$)(?=([^258]*))\5(\1|(?!\1).)(?=([^258]*))\7(\2|(?!\2).)[^258]*(([258][^258]*){3})*$

^(?=     [^147]*     (.?)             [^147]*     (.?)         [^147]*(([147][^147]*){3})*$)
     (?=([^258]*))\5 (\1|(?!\1).) (?=([^258]*))\7 (\2|(?!\2).) [^258]*(([258][^258]*){3})*$

and here's another way to do it, with the same length:

^(?=[^147]*(.?)[^147]*(.?)[^147]*(([147][^147]*){3})*$)(?=([^258]*))\5(?=(\1|.))\6(?=([^258]*))\7(?=(\2|.))\8[^258]*(([258][^258]*){3})*$

^(?=     [^147]*           (.?)       [^147]*           (.?)   [^147]*(([147][^147]*){3})*$)
     (?=([^258]*))\5 (?=(\1|.))\6 (?=([^258]*))\7 (?=(\2|.))\8 [^258]*(([258][^258]*){3})*$
@Davidebyzero

This comment has been minimized.

Show comment Hide comment
@Davidebyzero

Davidebyzero Feb 18, 2014

I was playing around, and found a regex that I think exposes a bug in Opera (matching all triples and not matching any non-triple, i.e. functioning as a 493 point robust Triples solution, even though it shouldn't) that isn't in Firefox, Chrome, perl, or pcregrep:

^(?=[^147]*(.?)[^147]*(.?)[^147]*(([147][^147]*){3})*$)[^258]*(.?)[^258]*(.?)[^258]*(([258][^258]*){3})*$(\1\5|(?!\1|\5))(\2\6|(?!\2|\6))

^(?=[^147]*(.?)[^147]*(.?)[^147]*(([147][^147]*){3})*$)
    [^258]*(.?)[^258]*(.?)[^258]*(([258][^258]*){3})*$(\1\5|(?!\1|\5))
                                                      (\2\6|(?!\2|\6))

The bug is that Opera's JavaScript regex engine does not backtrack to satisfy the (?!\5) or (?!\6) if possible. This version works in all engines, but is worth only 486 points:

^(?=[^147]*(.?)[^147]*(.?)[^147]*(([147][^147]*){3})*$)(?=[^258]*(.?)[^258]*(.?)[^258]*(([258][^258]*){3})*$).*$(\1\5|(?!\1|\5))(\2\6|(?!\2|\6))

^(?=[^147]*(.?)[^147]*(.?)[^147]*(([147][^147]*){3})*$)
 (?=[^258]*(.?)[^258]*(.?)[^258]*(([258][^258]*){3})*$).*$(\1\5|(?!\1|\5))
                                                          (\2\6|(?!\2|\6))

I was playing around, and found a regex that I think exposes a bug in Opera (matching all triples and not matching any non-triple, i.e. functioning as a 493 point robust Triples solution, even though it shouldn't) that isn't in Firefox, Chrome, perl, or pcregrep:

^(?=[^147]*(.?)[^147]*(.?)[^147]*(([147][^147]*){3})*$)[^258]*(.?)[^258]*(.?)[^258]*(([258][^258]*){3})*$(\1\5|(?!\1|\5))(\2\6|(?!\2|\6))

^(?=[^147]*(.?)[^147]*(.?)[^147]*(([147][^147]*){3})*$)
    [^258]*(.?)[^258]*(.?)[^258]*(([258][^258]*){3})*$(\1\5|(?!\1|\5))
                                                      (\2\6|(?!\2|\6))

The bug is that Opera's JavaScript regex engine does not backtrack to satisfy the (?!\5) or (?!\6) if possible. This version works in all engines, but is worth only 486 points:

^(?=[^147]*(.?)[^147]*(.?)[^147]*(([147][^147]*){3})*$)(?=[^258]*(.?)[^258]*(.?)[^258]*(([258][^258]*){3})*$).*$(\1\5|(?!\1|\5))(\2\6|(?!\2|\6))

^(?=[^147]*(.?)[^147]*(.?)[^147]*(([147][^147]*){3})*$)
 (?=[^258]*(.?)[^258]*(.?)[^258]*(([258][^258]*){3})*$).*$(\1\5|(?!\1|\5))
                                                          (\2\6|(?!\2|\6))
@Davidebyzero

This comment has been minimized.

Show comment Hide comment
@Davidebyzero

Davidebyzero Feb 18, 2014

Wow, more winnowing yielded a 504 point robust Triples solution. Might as well reject non-empty lines while we're at it: ^(?=(([^147]*[147]){3})*[^147]*(.?)[^147]*(.?))(?=(([^258]*[258]){3})*[^258]*(.?)[^258]*(.?)).+$(\3\7|\4\8(?!\3|\7)|(?!\4|\8))

^(?=(([^147]*[147]){3})*[^147]*(.?)[^147]*(.?))
 (?=(([^258]*[258]){3})*[^258]*(.?)[^258]*(.?))
 .+$ (\3\7 | \4\8(?!\3|\7) | (?!\4|\8))

This makes it cost very little to modify the meaning of "robust" to require excluding non-digits (503 points): ^(?=(([^147]*[147]){3})*[^147]*(.?)[^147]*(.?))(?=(([^258]*[258]){3})*[^258]*(.?)[^258]*(.?))\d+$(\3\7|\4\8(?!\3|\7)|(?!\4|\8))

^(?=(([^147]*[147]){3})*[^147]*(.?)[^147]*(.?))
 (?=(([^258]*[258]){3})*[^258]*(.?)[^258]*(.?))
 \d+$ (\3\7 | \4\8(?!\3|\7) | (?!\4|\8))

Wow, more winnowing yielded a 504 point robust Triples solution. Might as well reject non-empty lines while we're at it: ^(?=(([^147]*[147]){3})*[^147]*(.?)[^147]*(.?))(?=(([^258]*[258]){3})*[^258]*(.?)[^258]*(.?)).+$(\3\7|\4\8(?!\3|\7)|(?!\4|\8))

^(?=(([^147]*[147]){3})*[^147]*(.?)[^147]*(.?))
 (?=(([^258]*[258]){3})*[^258]*(.?)[^258]*(.?))
 .+$ (\3\7 | \4\8(?!\3|\7) | (?!\4|\8))

This makes it cost very little to modify the meaning of "robust" to require excluding non-digits (503 points): ^(?=(([^147]*[147]){3})*[^147]*(.?)[^147]*(.?))(?=(([^258]*[258]){3})*[^258]*(.?)[^258]*(.?))\d+$(\3\7|\4\8(?!\3|\7)|(?!\4|\8))

^(?=(([^147]*[147]){3})*[^147]*(.?)[^147]*(.?))
 (?=(([^258]*[258]){3})*[^258]*(.?)[^258]*(.?))
 \d+$ (\3\7 | \4\8(?!\3|\7) | (?!\4|\8))
@teukon

This comment has been minimized.

Show comment Hide comment
@teukon

teukon Feb 18, 2014

Ha. This isn't golf; this is tennis. What a beautiful improvement! Only 19 character longer than the FSM solution.

Personally, I dislike the idea of mixing syntax checking with value checking. I see each problem as having an implied domain just as well as an implied pattern. For Triples, the implied pattern that the strings to be matched are precisely the multiples of 3 is as strong as the implied domain that all string to be tested are of the form ^\d{9}$. I wouldn't see a regex that failed to handle 91 or twelve as any less robust. As you noted, our solutions do work for a very natural extended domain, ^\d+$, and there's something attractive about that, but I really feel this is better as an accompanying note and not a syntax-checking prelude.

Yes, it does seem that Opera is at fault here. I'm working against the ECMAScript language specification (sections 15.10.2.4 through 15.10.2.6 apply here: http://www.ecma-international.org/ecma-262/5.1/#sec-15.10.2.4) so I'm not worried about browser bugs.

For those that don't want to dig through such dense definitions, let me clarify the relationship between operator greediness and positive look-ahead scope, with an example.

(1) ^(?=(x*)(x*))\1$ matches x;
(2) ^(?=(x*)(x*))\2$ does not match x.

This is because the greediness of the first * takes precedence over the greediness of the second *.

(3) ^(x*)(x*)\2$ matches x.

This happens because while the first * operator is greedy, it will look ahead and not be so greedy that it breaks a potential match. This same mechanic is in action in example (2), but an operator cannot see outside a positive look-ahead. Example (2) does not match due to the first * operators blind greed.

teukon commented Feb 18, 2014

Ha. This isn't golf; this is tennis. What a beautiful improvement! Only 19 character longer than the FSM solution.

Personally, I dislike the idea of mixing syntax checking with value checking. I see each problem as having an implied domain just as well as an implied pattern. For Triples, the implied pattern that the strings to be matched are precisely the multiples of 3 is as strong as the implied domain that all string to be tested are of the form ^\d{9}$. I wouldn't see a regex that failed to handle 91 or twelve as any less robust. As you noted, our solutions do work for a very natural extended domain, ^\d+$, and there's something attractive about that, but I really feel this is better as an accompanying note and not a syntax-checking prelude.

Yes, it does seem that Opera is at fault here. I'm working against the ECMAScript language specification (sections 15.10.2.4 through 15.10.2.6 apply here: http://www.ecma-international.org/ecma-262/5.1/#sec-15.10.2.4) so I'm not worried about browser bugs.

For those that don't want to dig through such dense definitions, let me clarify the relationship between operator greediness and positive look-ahead scope, with an example.

(1) ^(?=(x*)(x*))\1$ matches x;
(2) ^(?=(x*)(x*))\2$ does not match x.

This is because the greediness of the first * takes precedence over the greediness of the second *.

(3) ^(x*)(x*)\2$ matches x.

This happens because while the first * operator is greedy, it will look ahead and not be so greedy that it breaks a potential match. This same mechanic is in action in example (2), but an operator cannot see outside a positive look-ahead. Example (2) does not match due to the first * operators blind greed.

@Davidebyzero

This comment has been minimized.

Show comment Hide comment
@Davidebyzero

Davidebyzero Feb 18, 2014

Tennis! Hah, yes, that is an apt analogy.

Just wanted to clarify something, in case it came across the wrong way:

This makes it cost very little to modify the meaning of "robust" to require excluding non-digits

What I meant was, "This would makes it cost very little to modify the meaning of "robust" to require excluding non-digits." (Or to put it another way, it would make it cost very little to modify the domain of the problem to include non-digits.) I wasn't advocating that we change the primary problem. I just find it interesting to analyze the more general problem as well, instead of limiting ourselves to solving only the one on http://regex.alf.nu/. You've done the same thing for Alphabetical. ;)

I also found it mildly annoying that the FSM solution rejects non-digits whereas ours didn't.

I'm working against the ECMAScript language specification (sections 15.10.2.4 through 15.10.2.6 apply here: http://www.ecma-international.org/ecma-262/5.1/#sec-15.10.2.4)

How can you work solely against the specification? At some point of complexity you need to actually test your regexes on the computer, don't you? I've had to test mine extensively, just like when programming. It's very rare to write a program that works perfectly on the first run.

Even if I had complete knowledge of the specification, that wouldn't automatically mean I'd anticipate everything that it implies, in how a particular regex of sufficient complexity will be interpreted.

Only 19 character longer than the FSM solution.

What do you mean? What FSM solution are you looking at? Our best solution so far is 126 characters. The FSM solution is 229 characters — 103 chars longer than our solution, not 19 chars shorter.

Tennis! Hah, yes, that is an apt analogy.

Just wanted to clarify something, in case it came across the wrong way:

This makes it cost very little to modify the meaning of "robust" to require excluding non-digits

What I meant was, "This would makes it cost very little to modify the meaning of "robust" to require excluding non-digits." (Or to put it another way, it would make it cost very little to modify the domain of the problem to include non-digits.) I wasn't advocating that we change the primary problem. I just find it interesting to analyze the more general problem as well, instead of limiting ourselves to solving only the one on http://regex.alf.nu/. You've done the same thing for Alphabetical. ;)

I also found it mildly annoying that the FSM solution rejects non-digits whereas ours didn't.

I'm working against the ECMAScript language specification (sections 15.10.2.4 through 15.10.2.6 apply here: http://www.ecma-international.org/ecma-262/5.1/#sec-15.10.2.4)

How can you work solely against the specification? At some point of complexity you need to actually test your regexes on the computer, don't you? I've had to test mine extensively, just like when programming. It's very rare to write a program that works perfectly on the first run.

Even if I had complete knowledge of the specification, that wouldn't automatically mean I'd anticipate everything that it implies, in how a particular regex of sufficient complexity will be interpreted.

Only 19 character longer than the FSM solution.

What do you mean? What FSM solution are you looking at? Our best solution so far is 126 characters. The FSM solution is 229 characters — 103 chars longer than our solution, not 19 chars shorter.

@teukon

This comment has been minimized.

Show comment Hide comment
@teukon

teukon Feb 18, 2014

Ok, it seems like we're thinking similarly on the domain issue so no worries.

By "working against the ECMAScript", I just mean that when implementations disagree or I don't understand something, I refer to these rules (as though they were a highly detailed cheat-sheet). I certainly allow myself to hack around a bit, but if I find unexpected behaviour, I like to understand what's going on.

Alok's FSM solution is far from optimal. With care, the expression can be simplified to the following 523-point solution.

^([0369]|[147][0369]*[258]|([258]|[147][0369]*[147])([0369]|[258][0369]*[147])*([147]|[258][0369]*[258]))*$

Even so, I'm learning a lot by trying to improve the modular arithmetic approach. I've still got a few ideas on how it can be improved.

teukon commented Feb 18, 2014

Ok, it seems like we're thinking similarly on the domain issue so no worries.

By "working against the ECMAScript", I just mean that when implementations disagree or I don't understand something, I refer to these rules (as though they were a highly detailed cheat-sheet). I certainly allow myself to hack around a bit, but if I find unexpected behaviour, I like to understand what's going on.

Alok's FSM solution is far from optimal. With care, the expression can be simplified to the following 523-point solution.

^([0369]|[147][0369]*[258]|([258]|[147][0369]*[147])([0369]|[258][0369]*[147])*([147]|[258][0369]*[258]))*$

Even so, I'm learning a lot by trying to improve the modular arithmetic approach. I've still got a few ideas on how it can be improved.

@Davidebyzero

This comment has been minimized.

Show comment Hide comment
@Davidebyzero

Davidebyzero Feb 18, 2014

Alok's FSM solution is far from optimal. With care, the expression can be simplified to the following 523-point solution.

^([0369]|[147][0369]*[258]|([258]|[147][0369]*[147])([0369]|[258][0369]*[147])*([147]|[258][0369]*[258]))*$

Wow, you say that so casually, as if it doesn't matter. This is big news, and makes me feel really silly for thinking that I had accomplished anything special. Was it not obvious from my posts that I thought I was holding the world record in shortest Triples regex? Even back when it was scoring only 403 points I thought I was.

I failed to find anything better than Alok's version in my previous searches. Now that you've given me the exact string, I can see that it's been published before. Do you know who did this simplification, and when?

It looks like Zingler meant to put it in this thread, but failed to escape it properly and didn't even bother to check that it got posted correctly. Because of the carelessness involved, I had previously assumed that his post was completely botched, and that the string was truncated as well as being mangled by stray characters being interpreted as formatting. I did not bother to try to reconstruct it, because I thought there was no way it could have been simplified to something so much short. (bbarry's follow-up post reinforced this assumption.) Well, I'll do it now.

When the one you posted just now is not escaped properly, it looks like this:

^([0369]|[147][0369][258]|([258]|[147][0369][147])([0369]|[258][0369][147])([147]|[258][0369][258]))$

The one in Zingler's post looks like this:

([0369]|[147][0369][258]|([258]|[147][0369][147])([0369]|[258][0369][147])([147]|[258][0369][258]))*

Quoting this properly, it becomes: ([0369]|[147][0369]*[258]|([258]|[147][0369]*[147])([0369]*|[258][0369]*[147])*([147]|[258][0369]*[258]))*
This is slightly different. It has an extra asterisk after one of the [0369], and is missing the anchors.

Alok's FSM solution is far from optimal. With care, the expression can be simplified to the following 523-point solution.

^([0369]|[147][0369]*[258]|([258]|[147][0369]*[147])([0369]|[258][0369]*[147])*([147]|[258][0369]*[258]))*$

Wow, you say that so casually, as if it doesn't matter. This is big news, and makes me feel really silly for thinking that I had accomplished anything special. Was it not obvious from my posts that I thought I was holding the world record in shortest Triples regex? Even back when it was scoring only 403 points I thought I was.

I failed to find anything better than Alok's version in my previous searches. Now that you've given me the exact string, I can see that it's been published before. Do you know who did this simplification, and when?

It looks like Zingler meant to put it in this thread, but failed to escape it properly and didn't even bother to check that it got posted correctly. Because of the carelessness involved, I had previously assumed that his post was completely botched, and that the string was truncated as well as being mangled by stray characters being interpreted as formatting. I did not bother to try to reconstruct it, because I thought there was no way it could have been simplified to something so much short. (bbarry's follow-up post reinforced this assumption.) Well, I'll do it now.

When the one you posted just now is not escaped properly, it looks like this:

^([0369]|[147][0369][258]|([258]|[147][0369][147])([0369]|[258][0369][147])([147]|[258][0369][258]))$

The one in Zingler's post looks like this:

([0369]|[147][0369][258]|([258]|[147][0369][147])([0369]|[258][0369][147])([147]|[258][0369][258]))*

Quoting this properly, it becomes: ([0369]|[147][0369]*[258]|([258]|[147][0369]*[147])([0369]*|[258][0369]*[147])*([147]|[258][0369]*[258]))*
This is slightly different. It has an extra asterisk after one of the [0369], and is missing the anchors.

@teukon

This comment has been minimized.

Show comment Hide comment
@teukon

teukon Feb 18, 2014

Oh, I'm sorry. I honestly thought you knew. This is even mentioned on Alok's page (at the very bottom he credits someone called joel for the simplification). I'm fairly sure this was found by a number of people independently.

From the beginning I engaged you on your Triples solution because I'd taken the same approach as I had before I discovered the FSM solution. I thought we were just pushing this approach as far as it could go.

On that note: Triples (512)

^(?=((.*?[147]){3})*[^147]*(.?)[^147]*(.?))(?=((.*?[258]){3})*[^258]*(.?)[^258]*(.?)).*$(\3\7|(?!\3|\7)\4\8|(?!\4|\8))

^(?=((.*?[147]){3})*[^147]*(.?)[^147]*(.?))
 (?=((.*?[258]){3})*[^258]*(.?)[^258]*(.?)).*$(\3\7|(?!\3|\7)\4\8|(?!\4|\8))

with your natural extension to the domain of all strings, matching only strings of the form ^\d+$ which are multiples of 3 base 10, scoring 511:

^(?=((.*?[147]){3})*[^147]*(.?)[^147]*(.?))
 (?=((.*?[258]){3})*[^258]*(.?)[^258]*(.?))\d+$(\3\7|(?!\3|\7)\4\8|(?!\4|\8))

Your move. :p

teukon commented Feb 18, 2014

Oh, I'm sorry. I honestly thought you knew. This is even mentioned on Alok's page (at the very bottom he credits someone called joel for the simplification). I'm fairly sure this was found by a number of people independently.

From the beginning I engaged you on your Triples solution because I'd taken the same approach as I had before I discovered the FSM solution. I thought we were just pushing this approach as far as it could go.

On that note: Triples (512)

^(?=((.*?[147]){3})*[^147]*(.?)[^147]*(.?))(?=((.*?[258]){3})*[^258]*(.?)[^258]*(.?)).*$(\3\7|(?!\3|\7)\4\8|(?!\4|\8))

^(?=((.*?[147]){3})*[^147]*(.?)[^147]*(.?))
 (?=((.*?[258]){3})*[^258]*(.?)[^258]*(.?)).*$(\3\7|(?!\3|\7)\4\8|(?!\4|\8))

with your natural extension to the domain of all strings, matching only strings of the form ^\d+$ which are multiples of 3 base 10, scoring 511:

^(?=((.*?[147]){3})*[^147]*(.?)[^147]*(.?))
 (?=((.*?[258]){3})*[^258]*(.?)[^258]*(.?))\d+$(\3\7|(?!\3|\7)\4\8|(?!\4|\8))

Your move. :p

@bbarry

This comment has been minimized.

Show comment Hide comment
@bbarry

bbarry Feb 18, 2014

Sorry about that, I didn't even think about it because I wasn't concerned with that solution (I find it boring). This backref monster you have here is nonetheless interesting. All I did was repost Zingler's with some added formatting.

I am interested in yours because it exposes limitations and bugs of implementations (and is turning into something rather elegant) and I like andersk's [02-5][123][257]|[07][0269]+3?$|55 because it is so simple and brutal in how it avoids the possibilities of being robust.

bbarry commented Feb 18, 2014

Sorry about that, I didn't even think about it because I wasn't concerned with that solution (I find it boring). This backref monster you have here is nonetheless interesting. All I did was repost Zingler's with some added formatting.

I am interested in yours because it exposes limitations and bugs of implementations (and is turning into something rather elegant) and I like andersk's [02-5][123][257]|[07][0269]+3?$|55 because it is so simple and brutal in how it avoids the possibilities of being robust.

@teukon

This comment has been minimized.

Show comment Hide comment
@teukon

teukon Feb 18, 2014

I'm nothing short of impressed with the work people have done in finding the highest scoring solutions. One can easily appreciate the work that's gone into a solution as fine as [02-5][123][257]|[07][0269]+3?$|55.

Do you happen to know the best known total score these days bbarry?

teukon commented Feb 18, 2014

I'm nothing short of impressed with the work people have done in finding the highest scoring solutions. One can easily appreciate the work that's gone into a solution as fine as [02-5][123][257]|[07][0269]+3?$|55.

Do you happen to know the best known total score these days bbarry?

@bbarry

This comment has been minimized.

Show comment Hide comment
@bbarry

bbarry Feb 18, 2014

The best scores I know of were apparently posted by andersk in the post containing that triples solution:

https://gist.github.com/jonathanmorley/8058871/#comment-981796

I don't know what his glob or balance solutions are (mine are 397 and 289 respectively, also in that thread). Thus my score is currently 4078 and the best that I know of (but don't know the answers exactly) is 4091.

I think I know how he came up with the first and 3rd part of that solution:

  1. check every 2-3 digit number as a string (finding 012, 015, 022, 025, 217, 222, 227 and many others)
  2. ignore combinations that have nothing in common with any other
  3. combine as many of them as possible to find the smallest triple sequence that matches the most (crossing things like 0[12][25] and eventually settling on the one he has)
  4. simplify

I have no clue how he found the second part of that solution (and I've been brute forcing the problem for a month now; perhaps he did as well with a smaller alphabet; it would be another way to find the first part).

I suspect he did something similar to solve Glob (and so I don't doubt his claimed score). With a decent program that should output possible partial answers in seconds and I haven't bothered to try and find them. Brute forcing that one with only the alphabet of letters and letter combinations with the modifiers ., .+, .? and .* would take a very long time so I doubt he did that (even with all the optimizations I have come up with so far my program would take years to get through the 3 combination sequences).

bbarry commented Feb 18, 2014

The best scores I know of were apparently posted by andersk in the post containing that triples solution:

https://gist.github.com/jonathanmorley/8058871/#comment-981796

I don't know what his glob or balance solutions are (mine are 397 and 289 respectively, also in that thread). Thus my score is currently 4078 and the best that I know of (but don't know the answers exactly) is 4091.

I think I know how he came up with the first and 3rd part of that solution:

  1. check every 2-3 digit number as a string (finding 012, 015, 022, 025, 217, 222, 227 and many others)
  2. ignore combinations that have nothing in common with any other
  3. combine as many of them as possible to find the smallest triple sequence that matches the most (crossing things like 0[12][25] and eventually settling on the one he has)
  4. simplify

I have no clue how he found the second part of that solution (and I've been brute forcing the problem for a month now; perhaps he did as well with a smaller alphabet; it would be another way to find the first part).

I suspect he did something similar to solve Glob (and so I don't doubt his claimed score). With a decent program that should output possible partial answers in seconds and I haven't bothered to try and find them. Brute forcing that one with only the alphabet of letters and letter combinations with the modifiers ., .+, .? and .* would take a very long time so I doubt he did that (even with all the optimizations I have come up with so far my program would take years to get through the 3 combination sequences).

@Davidebyzero

This comment has been minimized.

Show comment Hide comment
@Davidebyzero

Davidebyzero Feb 19, 2014

This is even mentioned on Alok's page (at the very bottom he credits someone called joel for the simplification).

I got caught in an elaborate self-reinforcing delusion. I was told about http://quaxio.com/triple/ secondhand, by a friend, so that I could work on Triples without seeing spoilers. My friend told me that the solution there was 401 points. So I didn't load the page until I had beaten that, with my 403 point solution. When I did read the page, I was reading it with the preconception that the best robust solution known so far was 401 points. Believing that this was the best that could be done with an FSM solution, even after being carefully optimized, made the whole thing less interesting to me, and less worth reading about in depth. I skimmed it.

You know what I thought that note at the bottom was? I thought Alok was crediting the person who found the solution which was now the current solution being shown on the page, and that originally it had been an even longer one, worth even less than 401 points. (The alternative, that he would leave the full long solution being shown on the page un-updated, especially given that the page was presented in such an elegant, well-thought-out-looking way, made no sense so I discarded it as a possibility without any conscious consideration.) This is because the way he posts joel's solution is not a regex, it's shorthand for one. Because I skimmed over Alok's explanation, especially the "converting the state machine" section, I didn't know his shorthand schema. I assumed that the very short /^(0|20*1|(1|20*2)(0|10*2)*(2|10*1))*$/, which I only glanced at briefly, was shorthand for the much longer full regex currently being displayed under "solution", and that expanding it to that would require multiplying some permutations together (or something like that), not simply replacing 0, 1, and 2 with [0369], [147] and [258].

Now, in light of knowing now that there's a 523-point solution, I understand that the shorthand scheme is quite simple. This would have been obvious had I not skimmed the page very quickly, or had I taken another real look at the page. If it were my page, though, I would have replaced the full regex under "solution" with the best one currently known (joel's), and move the old long one under a section called "obsolete solutions" or something like that. The cognitive dissonance resulting from the fact that Alok didn't do this fed into my delusion.

All I did was repost Zingler's with some added formatting.

But that's not what you did. You posted a formatted version of Alok's 401-point monster, as if Zingler had not said anything at all. So this only reinforced my assumption that Zingler's hit-and-run post was totally botched and the real regex was quite long.

On that note: Triples (512)

Awesome! Nicely done. I was trying to do something similar, but didn't try putting lazy quantifiers in the first group.

This is even mentioned on Alok's page (at the very bottom he credits someone called joel for the simplification).

I got caught in an elaborate self-reinforcing delusion. I was told about http://quaxio.com/triple/ secondhand, by a friend, so that I could work on Triples without seeing spoilers. My friend told me that the solution there was 401 points. So I didn't load the page until I had beaten that, with my 403 point solution. When I did read the page, I was reading it with the preconception that the best robust solution known so far was 401 points. Believing that this was the best that could be done with an FSM solution, even after being carefully optimized, made the whole thing less interesting to me, and less worth reading about in depth. I skimmed it.

You know what I thought that note at the bottom was? I thought Alok was crediting the person who found the solution which was now the current solution being shown on the page, and that originally it had been an even longer one, worth even less than 401 points. (The alternative, that he would leave the full long solution being shown on the page un-updated, especially given that the page was presented in such an elegant, well-thought-out-looking way, made no sense so I discarded it as a possibility without any conscious consideration.) This is because the way he posts joel's solution is not a regex, it's shorthand for one. Because I skimmed over Alok's explanation, especially the "converting the state machine" section, I didn't know his shorthand schema. I assumed that the very short /^(0|20*1|(1|20*2)(0|10*2)*(2|10*1))*$/, which I only glanced at briefly, was shorthand for the much longer full regex currently being displayed under "solution", and that expanding it to that would require multiplying some permutations together (or something like that), not simply replacing 0, 1, and 2 with [0369], [147] and [258].

Now, in light of knowing now that there's a 523-point solution, I understand that the shorthand scheme is quite simple. This would have been obvious had I not skimmed the page very quickly, or had I taken another real look at the page. If it were my page, though, I would have replaced the full regex under "solution" with the best one currently known (joel's), and move the old long one under a section called "obsolete solutions" or something like that. The cognitive dissonance resulting from the fact that Alok didn't do this fed into my delusion.

All I did was repost Zingler's with some added formatting.

But that's not what you did. You posted a formatted version of Alok's 401-point monster, as if Zingler had not said anything at all. So this only reinforced my assumption that Zingler's hit-and-run post was totally botched and the real regex was quite long.

On that note: Triples (512)

Awesome! Nicely done. I was trying to do something similar, but didn't try putting lazy quantifiers in the first group.

@Davidebyzero

This comment has been minimized.

Show comment Hide comment
@Davidebyzero

Davidebyzero Feb 19, 2014

Okay, here you go! 520 points. Getting dangerously close to beating the FSM solution! ^(?=((.*?[147]){3})*(((.*?[147])?){2}))(?=((.*?[258]){3})*(((.*?[258])?){2})).*$(\3\8|\4\9(?!\3|\8)|(?!\4|\9))

^(?=((.*?[147]){3})*(((.*?[147])?){2}))
 (?=((.*?[258]){3})*(((.*?[258])?){2}))
 .*$ (\3\8 | \4\9(?!\3|\8) | (?!\4|\9))

Okay, here you go! 520 points. Getting dangerously close to beating the FSM solution! ^(?=((.*?[147]){3})*(((.*?[147])?){2}))(?=((.*?[258]){3})*(((.*?[258])?){2})).*$(\3\8|\4\9(?!\3|\8)|(?!\4|\9))

^(?=((.*?[147]){3})*(((.*?[147])?){2}))
 (?=((.*?[258]){3})*(((.*?[258])?){2}))
 .*$ (\3\8 | \4\9(?!\3|\8) | (?!\4|\9))
@teukon

This comment has been minimized.

Show comment Hide comment
@teukon

teukon Feb 19, 2014

Yes, I too found it strange that Alok didn't update his post.

His final solution contains something akin to ax|bx|ay|by and all he needed to do to obtain the shorter solution was to factor this to [ab][xy]. After he'd been made aware of the shorter solution, it would have been easy enough to add this final step to his algebra and give credit at that point.

As for Triples, do you think there's room for further improvement of our solution?

teukon commented Feb 19, 2014

Yes, I too found it strange that Alok didn't update his post.

His final solution contains something akin to ax|bx|ay|by and all he needed to do to obtain the shorter solution was to factor this to [ab][xy]. After he'd been made aware of the shorter solution, it would have been easy enough to add this final step to his algebra and give credit at that point.

As for Triples, do you think there's room for further improvement of our solution?

@teukon

This comment has been minimized.

Show comment Hide comment
@teukon

teukon Feb 19, 2014

Looks like you posted while I was composing.

520 points!? Awesome!

teukon commented Feb 19, 2014

Looks like you posted while I was composing.

520 points!? Awesome!

@teukon

This comment has been minimized.

Show comment Hide comment
@teukon

teukon Feb 19, 2014

@bbarry:

Thanks for the update.

I'm currently working on a set of problems of my own: https://gist.github.com/teukon/9060863. I was wondering if you'd be able to put my current Dominoes problem http://regex.alf.nu/?set=9e055be1a701c710ca9ed731af041fa2d6a9d478 through a solver with light settings and let me know how many points it scores. I'm hoping to design this particular puzzle so that the intended robust solution is also the highest scoring.

teukon commented Feb 19, 2014

@bbarry:

Thanks for the update.

I'm currently working on a set of problems of my own: https://gist.github.com/teukon/9060863. I was wondering if you'd be able to put my current Dominoes problem http://regex.alf.nu/?set=9e055be1a701c710ca9ed731af041fa2d6a9d478 through a solver with light settings and let me know how many points it scores. I'm hoping to design this particular puzzle so that the intended robust solution is also the highest scoring.

@teukon

This comment has been minimized.

Show comment Hide comment
@teukon

teukon Feb 19, 2014

I've been studying your latest improvements Davidebyzero and I must say that they are remarkably elegant. I'm afraid I won't be able to return your volley. I hope you're able to nudge our solution over the line.

All I can offer is my opinion on style. I would like to bring the start-string assertion forward.

    (?=((.*?[147]){3})*(((.*?[147])?){2}))
    (?=((.*?[258]){3})*(((.*?[258])?){2}))
^.*$
    (\3\8|\4\9(?!\3|\8)|(?!\4|\9))

I prefer (?!\3|\8)\4\9 here myself but it's not a big deal.

teukon commented Feb 19, 2014

I've been studying your latest improvements Davidebyzero and I must say that they are remarkably elegant. I'm afraid I won't be able to return your volley. I hope you're able to nudge our solution over the line.

All I can offer is my opinion on style. I would like to bring the start-string assertion forward.

    (?=((.*?[147]){3})*(((.*?[147])?){2}))
    (?=((.*?[258]){3})*(((.*?[258])?){2}))
^.*$
    (\3\8|\4\9(?!\3|\8)|(?!\4|\9))

I prefer (?!\3|\8)\4\9 here myself but it's not a big deal.

@Davidebyzero

This comment has been minimized.

Show comment Hide comment
@Davidebyzero

Davidebyzero Feb 19, 2014

Thank you, teukon!

I prefer (?!\3|\8)\4\9 here myself but it's not a big deal.

Well, I'm optimizing for regex length first, but second for computational speed. I prefer (?!\3|\8)\4\9 aesthetically, but chose to use \4\9(?!\3|\8) because that should allow the regex engine to notice a non-match sooner (sort of like short-circuit boolean evaluation).

That's pretty neat that the start-anchor can be moved forward like that. Not sure I like it stylistically though.

Thank you, teukon!

I prefer (?!\3|\8)\4\9 here myself but it's not a big deal.

Well, I'm optimizing for regex length first, but second for computational speed. I prefer (?!\3|\8)\4\9 aesthetically, but chose to use \4\9(?!\3|\8) because that should allow the regex engine to notice a non-match sooner (sort of like short-circuit boolean evaluation).

That's pretty neat that the start-anchor can be moved forward like that. Not sure I like it stylistically though.

@bbarry

This comment has been minimized.

Show comment Hide comment
@bbarry

bbarry Feb 19, 2014

Regarding the dominoes problem, my solver is pretty tailored to the triples problem right now and every spare cycle I have is working on it. Looking at it briefly the best solution I am coming up with by hand is ^((.)(.) ?(?=$|.?(\3|\2).?))*$ which I think slightly should outscore the sliding window solution of alternations that match each row: 15 1|3 41|2 62 |33 03... (I didn't bother completing this because it should score less than 100 if it continues at that rate).

bbarry commented Feb 19, 2014

Regarding the dominoes problem, my solver is pretty tailored to the triples problem right now and every spare cycle I have is working on it. Looking at it briefly the best solution I am coming up with by hand is ^((.)(.) ?(?=$|.?(\3|\2).?))*$ which I think slightly should outscore the sliding window solution of alternations that match each row: 15 1|3 41|2 62 |33 03... (I didn't bother completing this because it should score less than 100 if it continues at that rate).

@teukon

This comment has been minimized.

Show comment Hide comment
@teukon

teukon Feb 19, 2014

Thanks for taking a quick look at Dominoes and good luck with your work on Triples.

teukon commented Feb 19, 2014

Thanks for taking a quick look at Dominoes and good luck with your work on Triples.

@teukon

This comment has been minimized.

Show comment Hide comment
@teukon

teukon Feb 22, 2014

@teukon

This comment has been minimized.

Show comment Hide comment
@teukon

teukon Feb 25, 2014

Triples (524) (robust):

(?=((.*?[147]){3})*((.*?[147]|){2}))(?=((.*?[258]){3})*((.*?[258]|){2}))^.*$(\3\7|(?!\3|\7)\4\8|(?!\4|\8))

teukon commented Feb 25, 2014

Triples (524) (robust):

(?=((.*?[147]){3})*((.*?[147]|){2}))(?=((.*?[258]){3})*((.*?[258]|){2}))^.*$(\3\7|(?!\3|\7)\4\8|(?!\4|\8))
@Davidebyzero

This comment has been minimized.

Show comment Hide comment
@Davidebyzero

Davidebyzero Feb 25, 2014

That is awesome beyond words I have to describe. Thank you, teukon! (For those who haven't been following the discussion — this beats the FSM solution, which scores 523, previously the shortest known robust solution!)

I think I'll format it this way:

(?=((.*?[147]){3})*((.*?[147]|){2}))
(?=((.*?[258]){3})*((.*?[258]|){2}))
^.*$(\3\7 | \4\8(?!\3|\7) | (?!\4|\8))

That is awesome beyond words I have to describe. Thank you, teukon! (For those who haven't been following the discussion — this beats the FSM solution, which scores 523, previously the shortest known robust solution!)

I think I'll format it this way:

(?=((.*?[147]){3})*((.*?[147]|){2}))
(?=((.*?[258]){3})*((.*?[258]|){2}))
^.*$(\3\7 | \4\8(?!\3|\7) | (?!\4|\8))
@teukon

This comment has been minimized.

Show comment Hide comment
@teukon

teukon Feb 25, 2014

Thanks, that's much easier for people to see.

Don't lay too much praise on me; this was a joint effort if ever there was.

While we're here: Did you ever get around to forging a robust solution for Long Count?

teukon commented Feb 25, 2014

Thanks, that's much easier for people to see.

Don't lay too much praise on me; this was a joint effort if ever there was.

While we're here: Did you ever get around to forging a robust solution for Long Count?

@Davidebyzero

This comment has been minimized.

Show comment Hide comment
@Davidebyzero

Davidebyzero Feb 25, 2014

Well I hadn't until now, but here goes.

Long Count (232): ^(?!.*\S{5})((.*)0(1*) (?=\2[1]+)){15}

This and ^0000 0001 0010 0011 0100 0101 0110 0111 1000 1001 1010 1011 1100 1101 1110 1111$ are functionally identical, thus this solution is robust for Long Count 2 as well.

Edit: Indeed, not robust for Long Count 2, assuming suffusion of yellow means the domain is ^.*$.
^0000 0001 0010 0011 0100 0101 0110 0111 1000 1001 1010 1011 1100 1101 1110 1111(\s.{0,4})*$ is what I was actually matching.

Well I hadn't until now, but here goes.

Long Count (232): ^(?!.*\S{5})((.*)0(1*) (?=\2[1]+)){15}

This and ^0000 0001 0010 0011 0100 0101 0110 0111 1000 1001 1010 1011 1100 1101 1110 1111$ are functionally identical, thus this solution is robust for Long Count 2 as well.

Edit: Indeed, not robust for Long Count 2, assuming suffusion of yellow means the domain is ^.*$.
^0000 0001 0010 0011 0100 0101 0110 0111 1000 1001 1010 1011 1100 1101 1110 1111(\s.{0,4})*$ is what I was actually matching.

@teukon

This comment has been minimized.

Show comment Hide comment
@teukon

teukon Feb 25, 2014

Ah, I only scored 230 with: (removed - md5sum: 8148692a8f2a87d75e9f71f48c410369). I'll have to study your solution to work out what I've missed.

Edit: Wait a moment: Your regex matches:
0000 0001 0010 0011 0100 0101 0110 0111 1000 1001 1010 1011 1100 1101 1110 1111 hi

Edit: I just realised I can save 2 characters (232 points): md5sum: 780c231919a2e0d261c507feca8cf7c1. One was inspired by your regex.

teukon commented Feb 25, 2014

Ah, I only scored 230 with: (removed - md5sum: 8148692a8f2a87d75e9f71f48c410369). I'll have to study your solution to work out what I've missed.

Edit: Wait a moment: Your regex matches:
0000 0001 0010 0011 0100 0101 0110 0111 1000 1001 1010 1011 1100 1101 1110 1111 hi

Edit: I just realised I can save 2 characters (232 points): md5sum: 780c231919a2e0d261c507feca8cf7c1. One was inspired by your regex.

@Davidebyzero

This comment has been minimized.

Show comment Hide comment
@Davidebyzero

Davidebyzero Feb 25, 2014

Oh, of course. Silly me. Long Count (232) (robust): ^(?!.*\S{5})((.*)01* (?=\2()1)){15}1+$

Edit: I have to think about whether ^(?!.*\S{5})((.*)01* (?=\2+1)){15}1+$ (233) is robust. I think it doesn't match any sequences where a number is skipped.

Oh, of course. Silly me. Long Count (232) (robust): ^(?!.*\S{5})((.*)01* (?=\2()1)){15}1+$

Edit: I have to think about whether ^(?!.*\S{5})((.*)01* (?=\2+1)){15}1+$ (233) is robust. I think it doesn't match any sequences where a number is skipped.

@teukon

This comment has been minimized.

Show comment Hide comment
@teukon

teukon Feb 25, 2014

The 233 idea is no good. It matches:
0000 0001 0010 0011 0011 0101 0110 0111 1000 1001 1010 1011 1100 1101 1110 1111

My solution (232): ^((?=.{4} )(\d*)01* (?=\2[1])){15}\2.$

Edit: I think I prefer your solution. It seems simpler. Nice work.

teukon commented Feb 25, 2014

The 233 idea is no good. It matches:
0000 0001 0010 0011 0011 0101 0110 0111 1000 1001 1010 1011 1100 1101 1110 1111

My solution (232): ^((?=.{4} )(\d*)01* (?=\2[1])){15}\2.$

Edit: I think I prefer your solution. It seems simpler. Nice work.

@teukon

This comment has been minimized.

Show comment Hide comment
@teukon

teukon Feb 25, 2014

it doesn't match any sequences where a number is skipped.

0011 0101

teukon commented Feb 25, 2014

it doesn't match any sequences where a number is skipped.

0011 0101

@Davidebyzero

This comment has been minimized.

Show comment Hide comment
@Davidebyzero

Davidebyzero Feb 25, 2014

Yeah, just realized that I hadn't completely thought it out. Need to think some more.

Edit: I think I prefer your solution. It seems simpler. Nice work.

Thanks. But on the other hand, I like that yours does everything in one pass. That's a definite plus. Between two regexes of the same length, if one gives the regex engine less work, I'll tend to prefer it. I'll look at yours in detail then get back to you as to which I prefer.

Yeah, just realized that I hadn't completely thought it out. Need to think some more.

Edit: I think I prefer your solution. It seems simpler. Nice work.

Thanks. But on the other hand, I like that yours does everything in one pass. That's a definite plus. Between two regexes of the same length, if one gives the regex engine less work, I'll tend to prefer it. I'll look at yours in detail then get back to you as to which I prefer.

@teukon

This comment has been minimized.

Show comment Hide comment
@teukon

teukon Feb 25, 2014

Between two regexes of the same length, if one gives the regex engine less work, I'll tend to prefer it.

Great. Maybe we can swap.

Edit: I may have found a 235-point robust solution. md5sum: abc5fe6e47d924c5c788fcbca2a817a9

teukon commented Feb 25, 2014

Between two regexes of the same length, if one gives the regex engine less work, I'll tend to prefer it.

Great. Maybe we can swap.

Edit: I may have found a 235-point robust solution. md5sum: abc5fe6e47d924c5c788fcbca2a817a9

@Davidebyzero

This comment has been minimized.

Show comment Hide comment
@Davidebyzero

Davidebyzero Feb 25, 2014

Found a 236 point one I'm pretty sure is robust. md5sum = f740b1454c7c482a3c8cd2a0b1592b6e

And another 236 point one I think is robust: md5sum = b971969d83eb7f02e62b46571f12c2f3

And a 237 point, probably robust: md5sum = e06ddc5efec2e3feb5498b98e0f2f59d

Found a 236 point one I'm pretty sure is robust. md5sum = f740b1454c7c482a3c8cd2a0b1592b6e

And another 236 point one I think is robust: md5sum = b971969d83eb7f02e62b46571f12c2f3

And a 237 point, probably robust: md5sum = e06ddc5efec2e3feb5498b98e0f2f59d

@teukon

This comment has been minimized.

Show comment Hide comment
@teukon

teukon Feb 25, 2014

Ok, my 235-pointer was ^((?=(\S*)0).{4} (?=\2[1])){15}\2.$.

teukon commented Feb 25, 2014

Ok, my 235-pointer was ^((?=(\S*)0).{4} (?=\2[1])){15}\2.$.

@Davidebyzero

This comment has been minimized.

Show comment Hide comment
@Davidebyzero

Davidebyzero Feb 25, 2014

You're on the right track. Does your posting it mean that you've verified its robustness?

Ok, now I think I really do have a robust 237-pointer: md5sum = dfc41ef98caf8013b344907e394619ec

For real now? 236-pointer, probably robust: md5sum = 09e8ee85a3e3a4e7ca434f36b0d43f85

You're on the right track. Does your posting it mean that you've verified its robustness?

Ok, now I think I really do have a robust 237-pointer: md5sum = dfc41ef98caf8013b344907e394619ec

For real now? 236-pointer, probably robust: md5sum = 09e8ee85a3e3a4e7ca434f36b0d43f85

@teukon

This comment has been minimized.

Show comment Hide comment
@teukon

teukon Feb 25, 2014

My turn to play catch up it seems.

Long Count (236): ^((?=(\S*)01* \2[1]).... ){15}\2.$ (Davidebyzero shows this is not robust below)

teukon commented Feb 25, 2014

My turn to play catch up it seems.

Long Count (236): ^((?=(\S*)01* \2[1]).... ){15}\2.$ (Davidebyzero shows this is not robust below)

@teukon

This comment has been minimized.

Show comment Hide comment
@teukon

teukon Feb 25, 2014

Does your posting it mean that you've verified its robustness?

I've not verified any of these. I quickly look for a counter-example, and if I don't find one, I post it. If you can kill one of my regexes with a counter-example, please do.

If we get to a stage where neither of us can find a counter-example to the highest scoring solution, I'll set about proving it.

teukon commented Feb 25, 2014

Does your posting it mean that you've verified its robustness?

I've not verified any of these. I quickly look for a counter-example, and if I don't find one, I post it. If you can kill one of my regexes with a counter-example, please do.

If we get to a stage where neither of us can find a counter-example to the highest scoring solution, I'll set about proving it.

@Davidebyzero

This comment has been minimized.

Show comment Hide comment
@Davidebyzero

Davidebyzero Feb 25, 2014

Long Count (236): ^((?=(\S*)01* \2[1]).... ){15}\2.$

Nice try getting me to post my 236. Yours breaks with the following string:

0 10 0001 0010 0011 0100 0101 0110 0111 1000 1001 1010 1011 1100 1101 1110 1111

Edit: Mine wasn't robust either. Damn. Still stuck at 235, with md5sum 8b691a76274f2fbbf4d61fa859bd59ac.

Long Count (236): ^((?=(\S*)01* \2[1]).... ){15}\2.$

Nice try getting me to post my 236. Yours breaks with the following string:

0 10 0001 0010 0011 0100 0101 0110 0111 1000 1001 1010 1011 1100 1101 1110 1111

Edit: Mine wasn't robust either. Damn. Still stuck at 235, with md5sum 8b691a76274f2fbbf4d61fa859bd59ac.

@teukon

This comment has been minimized.

Show comment Hide comment
@teukon

teukon Feb 25, 2014

Well, I did ask for that. There doesn't seem to be an easy way to gain a point here though. Care to share your solution?

teukon commented Feb 25, 2014

Well, I did ask for that. There doesn't seem to be an easy way to gain a point here though. Care to share your solution?

@Davidebyzero

This comment has been minimized.

Show comment Hide comment
@Davidebyzero

Davidebyzero Feb 26, 2014

Okay, I've convinced myself that your 235 is robust, so here's mine: ^((?=(\S*)01* \2()10*\b)\S{4} )+1+$

The cool thing about this is that you can replace the {4} with any number of digits and it'll just work. For example with {3} it will match ^000 001 010 011 100 101 110 111$.

So with {7} or higher, it beats your solution because yours would require a {127}, making it a character longer.

Okay, I've convinced myself that your 235 is robust, so here's mine: ^((?=(\S*)01* \2()10*\b)\S{4} )+1+$

The cool thing about this is that you can replace the {4} with any number of digits and it'll just work. For example with {3} it will match ^000 001 010 011 100 101 110 111$.

So with {7} or higher, it beats your solution because yours would require a {127}, making it a character longer.

@teukon

This comment has been minimized.

Show comment Hide comment
@teukon

teukon Feb 26, 2014

Unfortunately, this matches 1110 1111.

teukon commented Feb 26, 2014

Unfortunately, this matches 1110 1111.

@Davidebyzero

This comment has been minimized.

Show comment Hide comment
@Davidebyzero

Davidebyzero Feb 26, 2014

Oh, right. Silly, silly me. You confused me for a second though — I didn't realize at first that you meant ^1110 1111$.

Oh, right. Silly, silly me. You confused me for a second though — I didn't realize at first that you meant ^1110 1111$.

@teukon

This comment has been minimized.

Show comment Hide comment
@teukon

teukon Feb 26, 2014

Don't worry about it. Can you find a counter-example for my 235-point regex?

teukon commented Feb 26, 2014

Don't worry about it. Can you find a counter-example for my 235-point regex?

@Davidebyzero

This comment has been minimized.

Show comment Hide comment
@Davidebyzero

Davidebyzero Feb 26, 2014

I'm pretty sure your Long Count (235) ^((?=(\S*)0).{4} (?=\2[1])){15}\2.$ is robust.

My regex still beats yours, though. I present to you, Very Long Count: ^(?=0+ )((?=(\S*)01* \2()10*\b)\S{30} )+1+$

^((?=(\S*)0).{30} (?=\2[1])){1073741823}\2.$ would be yours.

I'm pretty sure your Long Count (235) ^((?=(\S*)0).{4} (?=\2[1])){15}\2.$ is robust.

My regex still beats yours, though. I present to you, Very Long Count: ^(?=0+ )((?=(\S*)01* \2()10*\b)\S{30} )+1+$

^((?=(\S*)0).{30} (?=\2[1])){1073741823}\2.$ would be yours.

@teukon

This comment has been minimized.

Show comment Hide comment
@teukon

teukon Feb 26, 2014

Damn it!

teukon commented Feb 26, 2014

Damn it!

@Davidebyzero

This comment has been minimized.

Show comment Hide comment
@Davidebyzero

Davidebyzero Feb 26, 2014

I think nobody has been keeping an up-to-date list of the best solutions so far, so:
Best known Regex Golf solutions

Please let me know if I got anything wrong.

I think nobody has been keeping an up-to-date list of the best solutions so far, so:
Best known Regex Golf solutions

Please let me know if I got anything wrong.

@Davidebyzero

This comment has been minimized.

Show comment Hide comment
@Davidebyzero

Davidebyzero Feb 26, 2014

In all the excitement to get a robust Long Count v2, I think we've lost track of making a robust Long Count.

I would say the domain of Long Count is ^([01]{4}([1 ](?=.)|$)){16}$. Would you agree?
All we actually see is ^[01]{4} ([01]{4}[1 ]([01]{4} ){2}){3}[01]{4}[1 ]([01]{4}( (?=.)|$)){5}$, but it's a limited sample. Arguably the domain could be ^([01]{4}([01 ](?=.)|$)){16}$.

In all the excitement to get a robust Long Count v2, I think we've lost track of making a robust Long Count.

I would say the domain of Long Count is ^([01]{4}([1 ](?=.)|$)){16}$. Would you agree?
All we actually see is ^[01]{4} ([01]{4}[1 ]([01]{4} ){2}){3}[01]{4}[1 ]([01]{4}( (?=.)|$)){5}$, but it's a limited sample. Arguably the domain could be ^([01]{4}([01 ](?=.)|$)){16}$.

@Davidebyzero

This comment has been minimized.

Show comment Hide comment
@Davidebyzero

Davidebyzero Feb 27, 2014

I've beaten the published exact-match Balance record:

Balance (292): ^$|.{37}|^<(.(?!><*.>$).)*>$

This also exposes another bug in Opera's ECMAScript engine. Under Opera, only ^$|.{37}|^<(.(?!><*.>$).)*?>$ works, and ^<(?:..)*>$ is equivalent to ^<.*>$.

I've beaten the published exact-match Balance record:

Balance (292): ^$|.{37}|^<(.(?!><*.>$).)*>$

This also exposes another bug in Opera's ECMAScript engine. Under Opera, only ^$|.{37}|^<(.(?!><*.>$).)*?>$ works, and ^<(?:..)*>$ is equivalent to ^<.*>$.

@teukon

This comment has been minimized.

Show comment Hide comment
@teukon

teukon Feb 27, 2014

Arguably the domain could be ^([01]{4}([01 ](?=.)|$)){16}$.

Ha, I just asserted the possibility of this on your solutions gist. Unfortunately, I'm not much interested in Long Count anymore so I'll leave you to work on this one alone.

Balance (292): ^$|.{37}|^<(.(?!><*.>$).)*>$

Nice work!

teukon commented Feb 27, 2014

Arguably the domain could be ^([01]{4}([01 ](?=.)|$)){16}$.

Ha, I just asserted the possibility of this on your solutions gist. Unfortunately, I'm not much interested in Long Count anymore so I'll leave you to work on this one alone.

Balance (292): ^$|.{37}|^<(.(?!><*.>$).)*>$

Nice work!

@bbarry

This comment has been minimized.

Show comment Hide comment
@bbarry

bbarry Mar 4, 2014

tweaking 2 of my top 100 candidates has generated these 596 and 595 point solutions for triples:

00(3|6|9|15|$)|.1.+4|.17|[04].2|55
[07][0269]+3?$|0.2|1.4|3.*7.|58|015

Going off on a tangent and having my machine crash this past week has gotten me off to exhausting an additional 1/14th of my intended search space, but I have increased the speed it traverses it to approximately 1/56ths a day... (as opposed to 1/98ths) or perhaps more (the crash happened sometime during the weekend after I finished the latest improvements when the IT staff decided to update my machine; I am not sure exactly how much faster it is, only that it is somewhere between 1.25x and 3.5x; will know more tomorrow). My database of best matches found so far for any subset is currently at 147,949 solutions after pruning.

bbarry commented Mar 4, 2014

tweaking 2 of my top 100 candidates has generated these 596 and 595 point solutions for triples:

00(3|6|9|15|$)|.1.+4|.17|[04].2|55
[07][0269]+3?$|0.2|1.4|3.*7.|58|015

Going off on a tangent and having my machine crash this past week has gotten me off to exhausting an additional 1/14th of my intended search space, but I have increased the speed it traverses it to approximately 1/56ths a day... (as opposed to 1/98ths) or perhaps more (the crash happened sometime during the weekend after I finished the latest improvements when the IT staff decided to update my machine; I am not sure exactly how much faster it is, only that it is somewhere between 1.25x and 3.5x; will know more tomorrow). My database of best matches found so far for any subset is currently at 147,949 solutions after pruning.

@Davidebyzero

This comment has been minimized.

Show comment Hide comment
@Davidebyzero

Davidebyzero Mar 4, 2014

tweaking 2 of my top 100 candidates has generated these 596 and 595 point solutions for triples:

00(3|6|9|15|$)|.1.+4|.17|[04].2|55
[07][0269]+3?$|0.2|1.4|3.*7.|58|015

Nice work. Makes it look hopeful that you might beat 596.

Here it is compared with the similar published 596 point solution:

00($|3|6|9|15)|[04].2|.1.+4|.17|55
00($|3|6|9|12|15)|4.2|.1.+4|.17|55

And here is your 595 compared with the other published 596:

[07][0269]+3?$|[02-5][123][257]|55
[07][0269]+3?$|0.2|1.4|3.*7.|58|015

tweaking 2 of my top 100 candidates has generated these 596 and 595 point solutions for triples:

00(3|6|9|15|$)|.1.+4|.17|[04].2|55
[07][0269]+3?$|0.2|1.4|3.*7.|58|015

Nice work. Makes it look hopeful that you might beat 596.

Here it is compared with the similar published 596 point solution:

00($|3|6|9|15)|[04].2|.1.+4|.17|55
00($|3|6|9|12|15)|4.2|.1.+4|.17|55

And here is your 595 compared with the other published 596:

[07][0269]+3?$|[02-5][123][257]|55
[07][0269]+3?$|0.2|1.4|3.*7.|58|015
@teukon

This comment has been minimized.

Show comment Hide comment
@teukon

teukon Mar 4, 2014

Nice. Making progress bbarry.

I hope this computer search does yield a saving in the end. It certainly seems possible.

teukon commented Mar 4, 2014

Nice. Making progress bbarry.

I hope this computer search does yield a saving in the end. It certainly seems possible.

@Davidebyzero

This comment has been minimized.

Show comment Hide comment
@Davidebyzero

Davidebyzero Mar 23, 2014

Balance (454): .{37}|^(<(..(?!<.>$))*>)*$

Edit: Used to score 294, now 454 with new scoring

Balance (454): .{37}|^(<(..(?!<.>$))*>)*$

Edit: Used to score 294, now 454 with new scoring

@hkdobrev

This comment has been minimized.

Show comment Hide comment
@hkdobrev

hkdobrev May 7, 2014

Order (198): ^[^o].{1,5}$

Same points, but a better using shorter negating syntax and numbers instead of multiple dots.

hkdobrev commented May 7, 2014

Order (198): ^[^o].{1,5}$

Same points, but a better using shorter negating syntax and numbers instead of multiple dots.

@hkdobrev

This comment has been minimized.

Show comment Hide comment
@hkdobrev

hkdobrev May 7, 2014

Triples (543): ^(0+[0369]|0{7}1[25]|06.*|1.*[60]|3[1269].*[478]|32.*|4[04].*|7[14].*|8[17].*|9[05].*)$

hkdobrev commented May 7, 2014

Triples (543): ^(0+[0369]|0{7}1[25]|06.*|1.*[60]|3[1269].*[478]|32.*|4[04].*|7[14].*|8[17].*|9[05].*)$

@hkdobrev

This comment has been minimized.

Show comment Hide comment
@hkdobrev

hkdobrev May 7, 2014

Triples (574): ^(.*[07](6|1[25]|[056][^58])|([349][^378]|[78][147]).*)$

hkdobrev commented May 7, 2014

Triples (574): ^(.*[07](6|1[25]|[056][^58])|([349][^378]|[78][147]).*)$

@71104

This comment has been minimized.

Show comment Hide comment
@71104

71104 Jun 15, 2014

Triples (523): ^([0369]|[258][0369]*[147]|([147]|[258][0369]*[258])([0369]|[147][0369]*[258])*([147][0369]*[147]|[258]))*$

(I designed a DFA and converted it using Kleene's algorithm, it's the most correct way to do it although not the one that scores the most)

71104 commented Jun 15, 2014

Triples (523): ^([0369]|[258][0369]*[147]|([147]|[258][0369]*[258])([0369]|[147][0369]*[258])*([147][0369]*[147]|[258]))*$

(I designed a DFA and converted it using Kleene's algorithm, it's the most correct way to do it although not the one that scores the most)

@ranneyd

This comment has been minimized.

Show comment Hide comment
@ranneyd

ranneyd Jul 3, 2014

Balance (444): ^(<(<(<(<(<(<(<>)>)>)>)>)>)>)$

ranneyd commented Jul 3, 2014

Balance (444): ^(<(<(<(<(<(<(<>)>)>)>)>)>)>)$

@71104

This comment has been minimized.

Show comment Hide comment
@71104

71104 Jul 11, 2014

Balance (446): ^(<(<(<(<(<(<.*>)*>)*>)*>)*>)*>)*$

71104 commented Jul 11, 2014

Balance (446): ^(<(<(<(<(<(<.*>)*>)*>)*>)*>)*>)*$

@wapiflapi

This comment has been minimized.

Show comment Hide comment
@wapiflapi

wapiflapi Jul 25, 2014

Balance (447): ^(<(<(<(<(<(...)*>)*>)*>)*>)*>)*$

Balance (447): ^(<(<(<(<(<(...)*>)*>)*>)*>)*>)*$

@aeieli

This comment has been minimized.

Show comment Hide comment
@aeieli

aeieli Jan 29, 2015

Powers 2(82): ^(x{81})+$|^x$|^(x{3}){1,9}$

aeieli commented Jan 29, 2015

Powers 2(82): ^(x{81})+$|^x$|^(x{3}){1,9}$

@welwood08

This comment has been minimized.

Show comment Hide comment
@welwood08

welwood08 Feb 20, 2015

Powers 2: ^(.|(...)+)$ (88)

Matches 1 test that it shouldn't, but still the most points I could get within reasonable time. Several ways to get 86pts otherwise.

Powers 2: ^(.|(...)+)$ (88)

Matches 1 test that it shouldn't, but still the most points I could get within reasonable time. Several ways to get 86pts otherwise.

@wsergent

This comment has been minimized.

Show comment Hide comment
@wsergent

wsergent Oct 30, 2015

Long Count = 256 points: ((.+)0\2+1){8}

Long Count = 256 points: ((.+)0\2+1){8}

@nicklister

This comment has been minimized.

Show comment Hide comment
@nicklister

nicklister Feb 25, 2016

561 on Triples... might need optimized/refined

[0562][04]$|[30]03$|0[4629]$|12$|015$|[48]7$|9005$|7[6825]$|31$|58$

561 on Triples... might need optimized/refined

[0562][04]$|[30]03$|0[4629]$|12$|015$|[48]7$|9005$|7[6825]$|31$|58$

@mlindheimmarx

This comment has been minimized.

Show comment Hide comment
@mlindheimmarx

mlindheimmarx Jun 5, 2016

glob-- ^(*(er|f|i|p|t|v)|b|c(?!a)|d(?!i)|l|mi|p|re|w)

glob-- ^(*(er|f|i|p|t|v)|b|c(?!a)|d(?!i)|l|mi|p|re|w)

@Allypost

This comment has been minimized.

Show comment Hide comment
@Allypost

Allypost Jun 6, 2016

Triples (589): 003|009|0.2|015|00$|06|1.4|4.2|55|.17|819

Allypost commented Jun 6, 2016

Triples (589): 003|009|0.2|015|00$|06|1.4|4.2|55|.17|819

@donle-backup

This comment has been minimized.

Show comment Hide comment
@donle-backup

donle-backup Dec 7, 2016

Power 2: ^(?!xx$|(x(xxx)+)\1*$)

Power 2: ^(?!xx$|(x(xxx)+)\1*$)

@leiserfg

This comment has been minimized.

Show comment Hide comment
@leiserfg

leiserfg Jan 26, 2017

Power 2:^(?!((xxx)+x|xx)\1*$)

Power 2:^(?!((xxx)+x|xx)\1*$)

@vinodselvin

This comment has been minimized.

Show comment Hide comment
@vinodselvin

vinodselvin Sep 8, 2017

It never ends: u\b

It never ends: u\b

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment