Last active — forked from jpsim/answers.md

Embed URL

HTTPS clone URL

SSH clone URL

You can clone with HTTPS or SSH.

Download Gist

Best possible answers collected so far for [Regex golf](http://regex.alf.nu/). === WARNING: SPOILERS ===

View regex_golf.md

See Also:

Normal

Level Score Regex Credit
Warmup 207 foo
Anchors 208 k$
Ranges 202 ^[a-f]*$
Backrefs 201 (...).*\1 gorhill (HN)
Abba 195 ^(?!.*(.)\1)|ef BeniaminK (GH)
A man, a plan 177 ^(.)[^p].*\1$ hyp0 (HN)
Prime 286 ^(?!(..+)\1+$) josephlord (HN)
Four 199 (.)(.\1){3} chrismorgan (HN) / MereInterest (HN)
Order 199 ^.{5}[^e]?$ ekke (HN)
Triples 596 00($|3|6|9|12|15)|4.2|.1.+4|55|.17 alexandrosm (GH)
Glob 397 ai|c$|^p|[bcnrw][bnopr] nwellnhof (HN)
Balance 454 .{37}|^(<(..(?!<.>$))*>)*$ Davidebyzero (GH)
Powers 93 ^(?!(.(..)+)\1*$) plby (GH)
Long count 256 ((.+)0\2+1){8} Davidebyzero (GH)
Alphabetical 317 .r.{32}r|a.{10}te|n.n.. alexandrosm (GH)
Powers 2 88 ^(?!((xxx)+x|xx|)\1*$) muxrwc (GH)
Total 4075
Subtraction 180 ^(.+)(.+) - \1 = \2$ jonathanmorley (GH)
Typist 181 ^[adresbtcfxzgvw]+$ jonathanmorley (GH)

Hard Mode

Level Score Regex Credit
Warmup 207 foo
Anchors 206 ick$ Davidebyzero (GH)
Ranges 202 ^[a-f]*$
Backrefs 201 (...).*\1 gorhill (HN)
Abba 193 ^(?!.*(.)(.)\2\1) chingjun (HN) / josephlord (HN)
A man, a plan 150 ^(.?)(.?)(.?)(.?)(.?)(.?).?\6\5\4\3\2\1$ jonathanmorley (GH)
Prime 284 ^(?!(xx+)\1+$)xx Davidebyzero (GH)
Four 199 (.)(.\1){3} chrismorgan (HN) / MereInterest (HN)
Order 156 ^a*b*c*d*e*f*g*h*i*j*k*l*m*n*o*p*q*r*s*t*u*v*w*x*y*z*$ Davidebyzero (GH)
Triples 524 (?=((.*?[147]){3})*((.*?[147]|){2}))(?=((.*?[258]){3})*((.*?[258]|){2}))^.*$(\3\7|(?!\3|\7)\4\8|(?!\4|\8)) teukon (GH)
Glob 297 ai|c$|^p|[bcnrw][bnopr] nwellnhof (HN)
Balance 443 ^(<(<(<(<(<(<(<>)*>)*>)*>)*>)*>)*>)*$ gkucmierz (GH)
Powers 93 ^(?!(.(..)+)\1*$) plby (GH)
Long count 239 ^((?=(\S*)0).{4} (?=\2[1]))+1+$ Davidebyzero (GH)
Alphabetical 282 ^(?!.*\b(.*)(e|(n|r|(s|t))).* \1(a|(?!\3)[en]|(?!\4)[rs])) teukon (GH)
Powers 2 -12 ^(?!((xxx)+x|xx|)\1*$) muxrwc (GH)
Total 3664
Subtraction 180 ^(.+)(.+) - \1 = \2$ jonathanmorley (GH)
Typist 180 ^[adresbtcfxzgvwq]+$ jonathanmorley (GH)

13. Powers (69): ^(..?|(.{4}){1,2}|(.{16}){1,2}|(.{64})+)$

13. Powers (93): ^(?!(.(..)+)\1*$)
Doesn't even "cheat".

Thanks, updated

#10 Tripes (586): ^[387][12479]|00($|[369]|1[25])|5[54]|2[437]

#10 Triples (587): 00($|[369]|1[25])|5[58]|24|4.2|9..5|878|819

#12 Balance (288): ^(<(<(<(..)>)>)>)$
It doesn't pass all tests, but score is greater=)

10 Triples (588): 00($|[369]|1[25])|5[58]|24|4.2|9..5|^8[17]

I feel triples, being the longest, still has room for improvement.

#10 Triples (590): 00($|[369]|1[25])|(^.|9|3)..5|4.2|^[38]1

aww yeah!
triples (590): 32|.[25][345].|00([369]|$)|01[25]|[07]2$

wait... how did i not see the comment before mine? i swear i refreshed.

make that 591: 32|.[25][345].|00([369]|1[25]|$)|[07]2$

594 for #10: 00([369]|1[25]|$)|.1.+4|3.*7.|4.2|55

596 for #10: 00($|3|6|9|12|15)|4.2|.1.+4|55|.17

I had to write code for this one, the previous were by hand and excel. While there's still more space to search, I'm starting to think this is as good as a solution can get. At this point, after the preamble 00($|3|6|9|12|15) we're using 17 characters |4.2|.1.+4|55|.17 to match 13 lines, including the lead(ing) pipe.

Possible room for improvement:

  1. Maybe the preamble can be improved (17 chars for 8 lines)
  2. Maybe there's more regex tricks to add when looking for solutions
  3. My greedy algo is getting trapped in some local maximum
  4. I don't take into consideration cross-optimisations between expressions in the cost function
  5. I only use two numbers to find an ensemble member, maybe with three there's better possibilities?

I'll go try #5 now.

16: 262 points

^(?!.*(rne|eet |e.r...t|er.{6}r$))(a\S+\s)*(e\S+\s)*(r\S+\s)*([st]\S+(\s|$))*$

16: 294
[^et] ren|[er]( \w+)\1|tate r|a t| ae|rt r|e e

16: 295
[^et] ren|[er]( \w+)\1|(tat|r). r|a t| ae|e e

16: 299
s ren|[er]( \w+)\1|(tat|r). r|a t| ae|e e

16: 303
r sn|( t\w+)\1|(tat|r). r|a t| ae|e e

16: 304
( .+[ts]..)\1|(tat|r). r|a t| ae|e e

nice one @bbarry. love the technique you used. I'll try and beat it of course, but very nice :)

16: 307 -- phew!
( .+[ts]..)\1|(tat|r). r|a t|e .r

16: 317
.r.{32}r|a.{10}te|n.n..

Thought I'd give code a try. Nothing too elegant but definitely produced something no human would come up with.

That is a crazy solution, I am impressed. Mind sharing your code? I'm curious how you did it.

hey @jonathanmorley , here's my code it's coffeescript. It just iterates through some predefined pattern families to produce subexpressions that match many lines. Then I hand-tune and combine with other expressions. As I said, nothing too impressive, but it's fun to play with. I used a variation of this for #10 as well.

https://gist.github.com/alexandrosm/8204709

A solution for 12 that gets 288 points, same as the current one, and is a perfect match at the same time:

^(<(<(<(<(<>)*>|.{9})*>)*>)*>)*$

This one is by hand.

289 for 12. Minor-ish tweak on the last one
^(<(<(<(<<?>?>|.{9})*>)*>)*>)*$

According to the site’s built-in high scores list, which isn’t published yet, the first discoverers of the best solutions (minus obvious cheaters with impossible scores) were

  1. Plain strings (207) by alok
  2. Anchors (208) by alok
  3. Ranges (202) by alok
  4. Backrefs (201) by alok
  5. Abba (195) by cesium
  6. A man, a plan (177) by Scott
  7. Prime (286) by alok
  8. Four (199) by alok
  9. Order (199) by andersk
  10. Triples (596) by andersk
  11. Glob (403) by andersk
  12. Balance (296) by andersk
  13. Powers (93) by andersk
  14. Long count (256) by timloh
  15. Long count v2 (256) by timloh
  16. Alphabetical (317) by Jon

With the exception of Abba (195) and the last three new levels, these all predate the HN post.

My Order (199) and Triples (596) solutions were different:
^[^o]?.{5}$
[02-5][123][257]|[07][0269]+3?$|55
I’ll leave Glob (403) and Balance (296) as a challenge for now.

I am convinced I can do better with Triples (and glob but I haven't started on that one yet), I have nearly 1 billion regexes that are potential parts of the answer in a database and I am only part way through searching with a program similar to @alexandrosm's using an alphabet that is the power set of [0123456789] and some fancy footwork to avoid duplicates and shorten potential regexes (and a few more generation schemes).

Already it has revealed interesting candidates such as: [03][^1348]..+[23679] and [03].[1258][2578] (it hasn't crossed the [06] first letter yet or [02-5]).

14 and 15 with 254:
((.+)0 \2?1 ){7}

I rewrote my program nearly 4 weeks ago because it was running out of memory and had to deal with server restarts and issues like that so now it is simply counting in a base of around 14k and then translating each number into a regex and is about 1/14 of the way done with the 3 "digit" regexes.


If anyone was wondering, my search (for shortest exact solution on triples) through all alternations of sequences of set items from the set generated by the powerset of [0-9] with the repetition modifiers {a,b} where 0 <= a <= b <= 9 (basically counting through a space containing 14k items) is currently around the sequences *, 7500: [03568]+, 1099: [^059]+ (I say around because it is doing about 200k per second and is operating on 8 sequences at the same time; each thread doing the whole round of 14k before taking the next available one from the queue) which in that case are the 3 item regexes ending in [03568]+[^059]+.

So far the best it has done is 538 - [^07]{4}[47]|[036]{3}$|009|9...$ (I constrained it to search for solutions shorter than 36 characters).

My best that matches for case 5 (0012) and 6 (0015) is 536 - [03-6]{2}[^07][^0146]|[07][^3578]$. These cases are special in that they require a more complex expression than the rest of them (all others can be solved by 2 item sequences where these require 3 items to match). I should be through the 3 item sequences (which I know at least contains a 596 point solution that will cross) in another month or two (based on how long it ran to get to 0,0,1k and a coded theory that as it continues to find better matches it should move through the space faster by being able to discard potential sets earlier). At that point (exhausting the 3 item space) I will be uploading the program for anyone to play with ( ~500 lines C# code+mssql storage, could easily be converted to something else), potentially with the (currently 22k rows) table of best exact matches for subsets of the problem or the (currently 4k rows) best subset-robust rows (the former contains things that are over specified but were best found for that particular subset while the latter contains only rows that could potentially be in a solution). Or I can add the subset of the latter row set that does not contain alternations.

For the new "Power 2"
^(.|...|.{9}|(.{27})+)$ gives 87
^(.|(...)+)$ (non-perfect) gives 88

Abba: ^(?!.*(.)\1)|ef gives 195

Power 2 : 88

^(?!((xxx)+x|xx|)\1*$)

would be interesting to have the best scores for hard mode too

Balance 336 points
^<.*>$|^$
Matches add but also matches some of the others sides.

Is Balance input updated?
^(<(<(<(<(<(<(<>)>)>)>)>)>)>)*$
443 points

Powers 2(hard mode): ^(?!((xxx)+(x|xx)|xx|)\1*$)
83 points

Typist 189 points
^[^h-puy]+$

altoo commented

Powers 2(hard mod): ^(?!((xxx)+xx?|xx|)\1*$)
86 points

Addition 165 points
^(x*)(x*)...(x*)(x*)...\1\3...\2\4$

I can't figure out how you're supposed to cancel formatting inline; any pointers?

Glob (336 - hard)

^(.*)(\*?)(.*)(\*?)(.*)(\*?)(.*) .* \1((?!\2).+|\2)\3((?!\4).+|\4)\5((?!\6).+|\6)\7$

Powers 2 (86 - hard)

^(?!((...)+..?|..|)\1*$)

It seems like

^(?!((...)*..?|)\1*$)

should work, but it doesn't match anything and I can't figure out why.

Note: the following are obsoleted by https://gist.github.com/Davidebyzero/9090628, so I won't bother reporting any more solutions, but I'll leave these here anyways since they represent solutions I developed myself or tweaks/optimizations I made to others' solutions.

Subtraction (182 - both)

^(.*)(.*)- \2= \1$

Addition (167 - both)

^(.*)(.+) .(.+)(.*) =\3\1 . \2\4$

or

^(.+)(.*) .(.*)(.+) =\3\1 . \2\4$

or

^(.+)(.*) .(.+)(.*) =\3\1 . \2\4$

or

^(.+)(.*) .(.+)(.+) =\3\1 . \2\4$

Or if you're boring, you can just do this :tongue:

^(.*)(.*) .(.*)(.*) =\3\1 . \2\4$

Anyway (168 - normal; 68 - hard)

^[aeiouy]?(.[aeiouy])+[^aeiou]?$

This works for normal mode, but fails the extended cases for hard. I feel like I'm missing something obvious here.

Tic-tac-toe (142 - normal; 42 - hard)

(([OX])\2\2|([OX])...\3...\3|([OX])....\4....\4)

This doesn't match "OXX OX. XO.", but that doesn't look like a won/winnable tic-tac-toe game.

Edit: My current totals, after reading through a lot of comments, are: default levels: 4075 (normal), 3801 (hard); bonus levels: 1736 (normal), 1731 (hard). As far as I can tell, these are the best possible scores given currently known regexes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Something went wrong with that request. Please try again.