{{ message }}

Instantly share code, notes, and snippets.

# Davidebyzero/gist:9090628

Last active Nov 25, 2020
Solutions to teukon's Draft Regex Golf Bonus Levels (SPOILERS! SPOILERS!) and discussion of mathematical regexes

### Davidebyzero commented Feb 19, 2014

#### 1.0) Subtraction

182 points: `^(x*)(.+)- \2= \1\$`
180 points: `^(x+)(x+) - \2 = \1\$`

#### 2.0) Anyway

171 points (robust): `^((^|[aeiouy])([^aeiou]|\$))*\$`
161 points (robust): `^[^aeiou]?([aeiouy][^aeiou])*[aeiouy]?\$`
176 points (exact): `^[ae]?(.[aeiouy])*[^e]?\$`
174 points (exact): `^.?([aeiouy][^e])*[aeoy]?\$`

177 points: `^(.*)(.*)(.*) = \1\3\2\$` or `^(.*)(.*)(.*) = \2\1\3\$`
160 points: `^(x*)(x*)( . )(x*)(x*) = \1\3?\2\4\3?\5\$`

#### 4.0) Coprime

182 points: `^(?!(xx+)\1* \1+\$)`

#### 5.0) Typist

189 points: `^[^yuh-p]+\$`

#### 6.1) Dominoes

172 points: `^.?((.).? (\2|(?=.\2)))*\S+\$`
152 points: `^((.)(.) (?=(.?)(\2|\3)(.?)(\$| .?(\4\6).?)))*..\$`

#### 11.1) Dominoes 2

172 points: `^.?((.).? (\2|(?=.\2)))*\S+\$`

#### 7.0) Modulus

170 points: `^(?=.*% (x+))(?=\1*(x*)).* \2\$`
169 points: `^(?=.* (x+) = (x*))\1*(?!\1)\2 `
167 points: `^(?=.* (x+) )\1*(?!\1)(x*) .* \2\$`
161 points: `^(?=.* (x+) )(?=(\1*))\2(x*) % \1 = \3\$`
152 points: `^(x+)((?=(\1*))\3(x*) % \1 = \4| % \1(x+) = \1)\$`

#### 8.0) Euclid

174 points: `(?=\((x+)\1*, \1*\)).* \1\$`
173 points: `(?=(\((x+)(\2|\W)+= ))\1\2\$`
172 points: `(?=(\W(x+)\2*, \2+\W+))\1\2\$`
170 points: `\((?=((x+)\2*, \2+\)))\1 = \2\$`

#### 9.0) Latin squares

159 points (robust): `((.)(?!(.{5})*.{4}\2)(?=\S* \S*\2)| ){15}`
153 points (robust): `^(?!.*(\S)(.{5})*.{4}\1)((.)(?=(.* .*\4){3}))+ `
141 points (robust): `^(?!.*(\S)((.{5})*.{4}|\S*)\1)(.)(.)(.)(.)( |\4|\5|\6|\7)+\$`
134 points (robust): `^(?!.*(\S)(.{5})*.{4}\1)(.)(.)(.)(.)( ((\3|\4|\5|\6)(?!\w*\9))+)+\$`
143 points (exact): `^(.)(.)(.)(.)( ((\1|\2|\3|\4)(?!(\w*|(.{5})*.{4})\7))+)+\$`

#### 10.1) Matryoshka

168 points: `^((\w*)(\w*) (?=\2\w+\3\b))*\w+\$`

#### 12.0) Xor

162 points: `^(?!.*(.).{12}(0.{10}(?!\1)|1.{10}\1))`
160 points: `((.)(?=.{12}(\2.{10}0|(?!\2).{11}1))){8}`

#### 13.0) Tic-tac-toe

169 points: `(\w)(\1|..(\1|.\1.|..\1..)..)\1`

### teukon commented Feb 19, 2014

 Nice work. I have a 171-point robust solution for Anyway and a 165 point robust solution for Addition. I can post them if you want them but first wanted to let you know that improvements are possible.

### Davidebyzero commented Feb 19, 2014

 I'd like to try to find those solutions myself. I do have two questions regarding Anyway, though: Am I right about the conditions for robustness? If a `y` is phonologically a consonant, does that disqualify it from fitting into a vowel slot? Would `may`, `honey`, `holiday`, `aye`, etc., belong on the left or right? (i.e., if a `y` is phonologically a vowel, does that disqualify it from fitting into a consonant slot?) I'm guessing since you put `anyway` and `key` on the left side, the answer to #3 would be "on the left", and by extension, the answer to #2 would be the same. (Also the fact that it'd be impossible to solve with a regex otherwise.) But I thought it was an interesting point to bring up. Edit: I think there may actually be no words that satisfy #2.

### teukon commented Feb 19, 2014

 Yes that was the intent. Do you think the example strings need work?

### Davidebyzero commented Feb 19, 2014

 I think I've beaten your Addition solution. Pretty sure `^(.*)(.*)(.*) = \1\3\2\$` (177) is robust.

### Davidebyzero commented Feb 19, 2014

 Anyway: `^((^|[aeiouy])([^aeiou]|\$))*\$` (171) Cool trick :)

### teukon commented Feb 19, 2014

 Beautiful solution to Addition. Well done! I had `^(x*)(x*) . (x*)(x*) = \1\3 . \2\4\$`. Yes, that's what I have for Anyway. I think it's worth keeping just for the pretty golf.

### Davidebyzero commented Feb 19, 2014

 Dominos: `^((.)(.) (?=(.?)(\2|\3)(.?)(\$| .?(\4\6).?)))*..\$` (152) I have a feeling the solution you have in mind is shorter...

### Davidebyzero commented Feb 19, 2014

 Modulus: `^(x+)((?=(\1*))\3(x*) % \1 = \4| % \1(x+) = \1)\$` (152) For now, it seems like I have to treat a divisor greater than the dividend as a special case... I wonder if there is a way around that.

### Davidebyzero commented Feb 19, 2014

 Euclid: `\((?=((x+)\2*, \2+\)))\1 = \2\$` (170)

### teukon commented Feb 19, 2014

 You've been busy I see. I have 158 156 for Dominoes. My solution is very similar to yours so you can gain some points with a few tweaks. I'm sure you can do much better than that with Modulus. Your Euclid solution is excellent. I completely overlooked using greediness in this way and ended up with only 157 points. Combining the tricks of my solution with yours allows me to score 173 points.

### teukon commented Feb 19, 2014

 It's been a while so I'll state that the intended robust solution to Typist was `^[a-gq-tv-xz]*\$` (185 points). These are the letters under the left hand on the qwerty keyboard layout. All the words to be matched are difficult to touch-type on a qwerty keyboard.

### Davidebyzero commented Feb 19, 2014

 I started out with `^[a-gq-tv-xz]*\$` for Typist, and then inverted it, assuming that the domain was alphabetical characters only. I thought the inversion was what you intended. Actually I started out with `^[qwertasdfgzxcvb]*\$`, then sorted it, collapsed the contiguous ranges, then inverted it. Your Euclid solution is excellent. I completely overlooked using greediness in this way and ended up with only 157 points. Combining the tricks of my solution with yours allows me to score 173 points. Oooh.

### teukon commented Feb 19, 2014

 Doh! Somehow I had it in my mind that inverting made it longer. You're quite right and your solution is better than mine. You may assume that all strings to be tested in Typist are of the form `^[a-z]+\$` so your solution is robust.

### Davidebyzero commented Feb 19, 2014

 Latin squares: `^(.)(.)(.)(.)( ((\1|\2|\3|\4)(?!(\w*|(.{5})*.{4})\7))+)+\$` (143)

### teukon commented Feb 19, 2014

 Nice solution to Latin squares. My solution was 12 characters longer: `^(?!.*((\w)\w*(\2| (?!\w*\2))|\b(\w)(.)(.)(.).* (\4|.\5|..\6|...\7)))`

### Davidebyzero commented Feb 20, 2014

 Thanks! Modulus improvement: `^(?=.* (x+) )(?=(\1*))\2(x*) % \1 = \3\$` (161)

### teukon commented Feb 20, 2014

 Much better. There's still plenty of room for improvement though. My current solution scores 169 (and ends with a space character).

### Davidebyzero commented Feb 20, 2014 • edited

 Modulus: `^(?=.* (x+) = (x*))\1*(?!\1)\2 ` (169) gist.github.com's Markdown can't handle code blocks beginning or ending with spaces. It ate the trailing space. I had to use ...

### teukon commented Feb 20, 2014

 Well done.

### Davidebyzero commented Feb 20, 2014

 Thanks. You too! Matryoshka: `^((\w*)(\w*) (?=\2\w+\3\b))*\w+\$` (168) I'm having a lot of trouble getting the Euclid solution to 173 points.Can't seem to improve on `(?=(\W(x+)\2*, \2+\W+))\1\2\$` (172)

### Davidebyzero commented Feb 20, 2014

 Latin squares (robust): `^(?!.*(\S)(.{5})*.{4}\1)(.)(.)(.)(.)( ((\3|\4|\5|\6)(?!\w*\9))+)+\$` (134) 9 characters longer. So sad.

### teukon commented Feb 20, 2014

 My 173-point solution is fairly similar to your 172-pointer. Would you like my solution? or a hint? My Matryoshka solution is practically identical to yours. It's not so difficult to solve, but it was a royal pain to construct. I noticed that neither of our Latin squares solutions were robust so I tweaked the solution strings. I spent some time looking for a good robust solution (164 points), tweaking to stop cheating as I went.

### Davidebyzero commented Feb 20, 2014

 Still can't get 173 on Euclid, but don't want a hint.Found a silly alternative 172, though: `(?=(\((x+)(\2|\W+)+= ))\1\2\$` - would make a good cat toy.

### Davidebyzero commented Feb 20, 2014

 Euclid: `(?=(\((x+)(\2|\W)+= ))\1\2\$` (173)

### Davidebyzero commented Feb 20, 2014

 My Matryoshka solution is practically identical to yours. Not 100% identical, really? I don't see anything that can differ, except for the last two `*` and `+`, and changing some or all of the `\w` to `\S`. It's not so difficult to solve, but it was a royal pain to construct. The hard part is creating the right column, right? You've got to put in lots of entries that are almost, but not quite, matches. Anyway, I've really enjoyed your challenges so far. Very nicely done.

### teukon commented Feb 20, 2014

 Not 100% identical, really? I don't see anything that can differ, except for the last two * and +, and changing some or all of the \w to \S. Exactly. You know I like `*`. Anyway, I've really enjoyed your challenges so far. Very nicely done. Thanks. And thank you for helping me debug them.

### teukon commented Feb 20, 2014

 Well done on making it to 173 points. Interestingly, my solution was significantly different. Euclid (173): `\((?=((x+)(\2*. ){3}))\1\2\$`

### Davidebyzero commented Feb 20, 2014

 My cat helped me get 141 on Latin squares. `^(?!.*(\S)((.{5})*.{4}|\S*)\1)(.)(.)(.)(.)( |\4|\5|\6|\7)+\$`

### Davidebyzero commented Feb 20, 2014

 My other cat pitched in and we came up with `^(?!.*(\S)(.{5})*.{4}\1)((.)(?=(.* .*\4){3}))+ ` (153)

### Davidebyzero commented Feb 20, 2014

 Here's an exact solution for 9.2 that only shows its flaws in 9.0 and 9.1: `^(((\S)(?!(.{5})*.{4}\3)(?=.* .*\3))+ )+\S+\$` (156 on 9.2, 136 on 9.0 and 9.1)

### teukon commented Feb 20, 2014

 Nice progress, although it does seem that "other cat" is carrying the team. I really need to be careful with the strings in Latin squares. I'll study your latest solution and see if I can tweak things without breaking other natural flawed solutions.

### Davidebyzero commented Feb 20, 2014

 Why not allow yourself more than 20 test cases per column? I have space on my monitor for at least 53 test cases per column, at the default font size and landscape orientation. Not that I'm saying you should be that lavish (maybe even 25, 30 or 40 would be fine), but even those who don't have enough space on their monitor can scroll down or zoom out if need be.

### teukon commented Feb 20, 2014

 It's not about whether or not you'd have to scroll (I'm already scrolling at 20). I just picked a nice number and decided to stick to it. Even with 50 test cases I'd miss things. On looking at the problem with your almost-robust solution, I discovered that mine had a similar bug. A fix cost me 5 characters (so 159 points).

### Davidebyzero commented Feb 20, 2014

 Latin squares: `((.)(?!(.{5})*.{4}\2)(?=\S* \S*\2)| ){15}` (159) Note that `((.)(?!(.{5})*.{4}\2)(?=\S* \S*\2)| ){13}` and `^((.)(?!(.{5})*.{4}\2)(?=\S* \S*\2)| ){11}` also work. Not sure if you want to bother fixing that, because they don't score any better that with `{15}`. But this makes me wonder if you came up with the same solution.

### teukon commented Feb 20, 2014

 My solution was `((.)(?=\w* \w*\2)(?!.{4}(.{5})*\2) ?){12}` which is very similar to yours. I believe this and your `{15}` solution really are robust. Thanks for the note, and yes, I'm not so worried about the `{11}` and `{13}` exact solutions. There are probably many ways of breaking things like that, e.g. `((.)(?!.{4}\2|.{9}\2)(?=\S* \S*\2)| ){15}`. How are you getting on with the latest problems? I guess Dominoes 2 is a more appropriate difficulty for you (you've just stormed through the easier ones). I'm of half a mind to challenge you with what I believe is roughly a 9-star problem.

### Davidebyzero commented Feb 20, 2014

 Hope you don't mind if I do these a little out-of-order. Xor: `((.)(?=.{12}(\2.{10}0|(?!\2).{11}1))){8}` (160) Tic-tac-toe: `(\w)(\1|..(\1|.\1.|..\1..)..)\1` (169) Now for Dominoes 2...

### teukon commented Feb 20, 2014

 Not at all. Your Tic-tac-toe solution is character-for-character identical to mine. I have 162 points for Xor. I changed the difficulty for Dominoes 2 to 6 stars because my solution is long. I only have 89 points. Hopefully you can do better.

### teukon commented Feb 20, 2014

 Just an update. I'm making slow but steady progress on Dominoes 2: 92 95 97 98 points.

### Davidebyzero commented Feb 21, 2014

 Did you model Dominoes 2 after a real set of Dominoes rules that is actually played? Whatever they are, they seem to be really convoluted. Really, `64 46 46 66 26 26 26 32 23 23 23` is a match but `56 66 26 26 26 62 22 62 62 62` is not? Whaaaaat. The closest I've been able to get is `^((.)(.) (?!(?:\3\2|\2\3) (?!\3\2)\2\3 ((?!\2)).\3|(?:\2\3|\3\2) (?!\2\3)\3\2 ((?!\3)).\2)(?=(.?)(\2|\3)(.?)(\$| .?(\6\8).?)))*..\$`, and that is merely trying to find what the rules are, not bothering to optimize yet. Also, perhaps you should add more entries to the right side that are false positives when the regular Dominoes regex is used. Right now, your best Dominoes regex cheatily scores higher (118) on Dominoes 2 than your best robust one (98), because there are only four false positives. (Even my Dominoes regex scores higher, at 112.) Your Tic-tac-toe solution is character-for-character identical to mine. This is awesome.

### teukon commented Feb 21, 2014

 Ah, I didn't realise that the pattern was not clear. The idea of the dominoes problem was to determine if a given string of dominoes could be transformed to a valid dominoes position (with one natural simplifying assumption for how doubles are sometimes treated; no spinners) by rotating pieces in place. For example: `01 21 20` should match because the `21` can be rotated so that each domino touches its neighbours with matching values. `01 12 21 20` should not match because there is no way of rotating the pieces in place to make a valid dominoes position. I hadn't realised that the Dominoes solution scores better in Dominoes 2 with false positives than my Dominoes 2 solution. This really is ugly. Combining this with how subtle the parity issue is makes it clear that the strings need to be completely redone. I'll focus on more short strings at the beginning, drawing attention to the parity issue, and build very gently in complexity, making sure there are plenty of false positives for natural Dominoes solutions.

### Davidebyzero commented Feb 21, 2014

 The idea of the dominoes problem was to determine if a given string of dominoes could be transformed to a valid dominoes position (with one natural simplifying assumption for how doubles are treated) by rotating pieces in place. That... actually only confuses me even more. By "rotating a piece" I assume you mean flipping the first and second digits? That has absolutely no effect on whether a sequence is valid Dominoes or not, whether it's a single piece that's rotated, or all the pieces, or an arbitrary subset of pieces. For example: `01 21 20` should match because the `21` can be rotated so that each domino touches its neighbours with matching values. `01 12 21 20` should not match because there is no way of rotating the pieces in place to make a valid dominoes position. What the hell. My Dominoes regex already works this way, and it had to, otherwise it would have failed to be a robust or even an exact solution for Dominoes. If this were the rule for Dominoes 2, then those would be exactly the same rules as Dominoes.

### teukon commented Feb 21, 2014

 That... actually only confuses me even more. By "rotating a piece" I assume you mean flipping the first and second digits? That has absolutely no effect on whether a sequence is valid Dominoes or not, whether it's a single piece that's rotated, or all the pieces, or an arbitrary subset of pieces. Yes, perhaps "flipping" is a better word than rotating. I consider a valid dominoes sequence as being something like: `01 15 56`. Each piece is touching its neighbours at only matching values. `01 15 65` is not legal in real dominoes, but it would be legal if the `65` piece was flipped. This was the idea behind the Dominoes puzzle: can the given string be transformed into a legal dominoes position just by flipping some of the pieces. With Dominoes 2, I introduced the idea that the same domino could appear twice (in the first puzzle, each sequence is just a subset of a standard 28-piece dominoes set; each domino is different).

### teukon commented Feb 21, 2014

 What the hell. My Dominoes regex already works this way, and it had to, otherwise it would have failed to be a robust or even an exact solution for Dominoes. If this were the rule for Dominoes 2, then those would be exactly the same rules as Dominoes. Your solution to Dominoes matches both `01 12 21 10` and `01 12 21 20`, even though the second string cannot be resolved into a valid domino chain. You can test this here I've hand-crafted a set of strings for Dominoes 2 now to help make things clearer. The first set was computer generated.

### Davidebyzero commented Feb 21, 2014

 Oh, I understand now. In Dominoes 1, only every window of 3 consecutive pieces must be a valid sequence. I didn't even realize at the time that pieces further away could influence whether a sequence was valid, so my Dominoes solution wasn't actually even robust for what I thought the rules for Dominoes 1 were.

### teukon commented Feb 21, 2014

 I see, and I'm glad you worked this out. As the one creating the problems, I have an idea of what I want to do but find it difficult to convey my thoughts with a title and example strings alone. Do you think Dominoes 2 can be salvaged or should it just be scrapped? Are the new new strings better? Are there example strings you would like to see added to the set? Perhaps Dominoes 1 could be tweaked/scrapped. I could give the puzzles more helpful names or I could include textual hints à la Erling's original set.

### Davidebyzero commented Feb 21, 2014

 I think I should defer answering that question until I figure out what's involved in solving Dominoes 2. Right now I can't see how it can be done without faking recursion, which means setting a maximum run length. I got a whopping 6 points! Yay. `^((.)(.) (?=(.?)(\2|\3)(.?)(\$| (.?)(\4\6)(.?)(\$| (.?)(\8\10)(.?)(\$| (.?)(\12\14)(.?)(\$| (.?)(\16\18)(.?)(\$| (.?)(\20\22)(.?)(\$| (.?)(\24\26)(.?)(\$| (.?)(\28\30)(.?)(\$| .?(\32\34).?))))))))))*..\$` Oh, and for your Dominoes 2 update, I get 27 points. `^((.)(.) (?=(.?)(\2|\3)(.?)(\$| (.?)(\4\6)(.?)(\$| (.?)(\8\10)(.?)(\$| (.?)(\12\14)(.?)(\$| (.?)(\16\18)(.?)(\$| (.?)(\20\22)(.?)(\$| (.?)(\24\26)(.?)(\$| .?(\28\30).?)))))))))*..\$`

### teukon commented Feb 21, 2014

 Great. Making progress. Just to clarify: Recursion is not required. But, like Powers from the original set, it might be difficulty to find a way around it. I gave this problem 6 stars for a reason.

### Davidebyzero commented Feb 21, 2014

 I think I've just pulverized your solution to a fine mist :) It even beats our previous Dominoes 1 solutions. Dominoes 2: `^.?((.).? (\2|(?=.\2)))*\S+\$` (172) Dominoes 2: `^.?((.).?(\$| \2| (?=.\2)))*\$` (172 alternative) I'd be very curious to see what your solution was (including the optimization from 89 -> 92 -> 95 -> 97 -> 98, if you saved a record of that).

### Davidebyzero commented Feb 21, 2014

 I think solving the 3-piece-window version of Dominoes first actually made it harder to subsequently solve the general case of Dominoes — because it warped my expectations of how the solution was to be done. I'd imagine the same might apply to other people. But I totally had fun with having it happen that way, and it made solving the general form of Dominoes all the more satisfying! So by all means, keep both versions! It's delightfully evil. But I would indeed recommend putting in little textual hints. Like maybe, "Gaze through a window" in the first one, and "Gaze into infinity" in the second one. Or something like that.

### teukon commented Feb 21, 2014

 Wow, I'll have to look at that. I've just been working on my Dominoes 2 solution and worked it up to 118 points, but it seems I've missed something.

### teukon commented Feb 21, 2014

 Ha! how elegant. How it didn't occur to me to do it this way is beyond me. This realisation makes me want to scrap the first problem and downgrade the second one to 5 stars. It hardly seems worth posting now but here's my old solution. Dominoes 2 (118): `^((.)(.) (?=((.?)(\2|\3)(\d?))(?=((( (\4|\7\6\5)?){2})*))\8(\$|\d?\6| .?\5\7)))*..\$` While it employs a number of tricks, it's undoubtedly ugly compared to your solution. Well done indeed!

### teukon commented Feb 21, 2014

 As you expressed interest in my progress: (89) `^((.)(.) (?=((.?)(\2|\3)(.?))( \4| \7\6\5)?(( \4| \7\6\5){2})*(\$| (?!\4|\7\6\5)((?=\8).?\5\7|(?!\8).?\6))))*..\$` (92) `^((.)(.) (?=((.?)(\2|\3)(.?))(?=((( \4| \7\6\5){2})*( \4| \7\6\5)?))\8(\$| ((?=\11).?\5\7|(?!\11).?\6))))*..\$` (95) `^((.)(.) (?=((.?)(\2|\3)(.?))(?=(((( \4| \7\6\5)?){2})*))\8(\$| (\10(?!\8)).?\6| ((?!\10)|\8).?\5\7)))*..\$` (97) `^((.)(.) (?=((.?)(\2|\3)(.?))(?=((( \4| \7\6\5){2})*( \4| \7\6\5)?))\8(\$| \11.?\5\7| (?!\11).?\6)))*..\$` (97) `^((.)(.) (?=(.?)(\2|\3)(.?)(?=(((( \4\5\6| \6\5\4)?){2})*))\7(\$| (?!\8)\9.?\5| (\8|(?!\9)).?\4\6)))*..\$` (98) `^((.)(.) (?=(.?)(\2|\3)(.?)(?=(((( \4\5\6| \6\5\4)?){2})*))\7(\$| .?((?!\8)\9\5|(\8|(?!\9))\4\6))))*..\$` (110) `^((.)(.) (?=(.?)(\2|\3)(.?)(?=\$| )(?=((( (\4\5\6|\6\5\4)?){2})*))\7(\$|\d?\5| .?\4\6)))*..\$` (116) `^((.)(.) (?=(.?)(\2|\3)(\d?)(?=((( (\4\5\6|\6\5\4)?){2})*))\7(\$|\d?\5| .?\4\6)))*..\$` (118) `^((.)(.) (?=((.?)(\2|\3)(\d?))(?=((( (\4|\7\6\5)?){2})*))\8(\$|\d?\6| .?\5\7)))*..\$` It's almost a shame that your solution works; there's a lot of fun golf here.

### teukon commented Feb 21, 2014

 I think I have everything I need now to publish a decent complete set of bonus levels. I'll be cutting some stuff and re-ordering what's left. If you have a great idea for a puzzle which you think would go well with the set we have, now would be the time to share it.

### Davidebyzero commented Feb 21, 2014

 Here is one of my puzzle ideas: Powers 2 My solution has md5sum bdf0e3052c867d1a884cea5751ab3c88.

### teukon commented Feb 21, 2014

 May I ask how many points that scores?

### Davidebyzero commented Feb 21, 2014

 To say so might be a spoiler.

### teukon commented Feb 21, 2014

 Hmm... ok I've got 48 points. I can't hit your md5sum though. Edit: Also, my computer pauses for nearly 2 seconds every time I try to change my regex. :(

### Davidebyzero commented Feb 21, 2014

 Nice, you beat mine. I got `^((?=(x+)(\2{5})\$)\3)*x\$` (46) Would your technique beat 93 in Powers 1?

### teukon commented Feb 21, 2014

 I had `^((x+)\2{4}(?=\2\$))*x\$` so I guess it's just a bit tweaked. Anyway, nice problem! Counter-attack: Prime powers

### Davidebyzero commented Feb 21, 2014

 Nice! But I tried that in Powers 1 and it didn't work, so I didn't think of trying it on Powers 2... I don't understand why `^((x+)(?=\2\$))*x\$` is not a solution for Powers 1. Can you explain it?

### teukon commented Feb 21, 2014

 Your solution has a pretty way of using the `{5}` twice. I tried to tweak it to beat my own but I can't seem to. Also, I just checked, and my solution, tailored to Powers, just scores 93 (same as the usual method).

### teukon commented Feb 21, 2014

 I don't understand why `^((x+)(?=\2\$))*x\$` is not a solution for Powers 1. Can you explain it? What Powers 1 are you looking at? It works for me. Maybe the site doesn't like you.

### Davidebyzero commented Feb 21, 2014

 Holy crap, looks like I found a major bug in Opera 12.16's ECMAScript engine. `^((x+)(?=\2\$))*x\$` does not work as a solution for Powers. I tried it on another browser and it did work. Do you already have a solution in mind for Prime powers? If so have you implemented it yet?

### teukon commented Feb 21, 2014

 I have a solution for Prime powers now. md5sum: 58bfe7c876dcf4d53aee8b121eecb556. I'll try and tweak it. Imagine a problem which asks you to separate square numbers from non-squares. >:D Update: Yes, I was working on it while you were fighting with Opera. It's not too hard really.

### Davidebyzero commented Feb 21, 2014

 I don't know how you did Prime powers so fast. I'm having a lot of trouble stemming from the fact that there's no non-atomic lookahead. BTW, did you see my message about how my solution for Dominoes 2 gave me the idea to use that technique on Powers (and non-prime powers)? It was up for a short time, and then I deleted it because it occurred to me that I could use that as a puzzle for you.

### teukon commented Feb 21, 2014

 I didn't see that comment no. I did see one comment which disappeared but it didn't mention anything about powers. Actually, I thought your problem was more difficult. I started on it immediately, thinking of Powers, Modulus, and Euclid. This wasn't getting me anywhere but I very quickly realised that I could do Prime Powers. I spent a few more minutes thinking about the powers of 6, and asked for your score at this point (wondering if the solution was really long) when I somehow stumbled into your intended approach. I guess this was because I'd just been studying your solution to Dominoes 2 and it put me in the right frame of mind. While I was waiting for your replies, I put a very rough Prime Powers problem together (this takes 2 minutes thanks to OEIS and three lines in the python interpreter). But I didn't have time to try and solve it before you responded.

### teukon commented Feb 21, 2014

 I don't know how you did Prime powers so fast. I'm having a lot of trouble stemming from the fact that there's no non-atomic lookahead. I guess it all depends on your frame of mind. Look what happened to me with Dominoes 2. What does non-atomic lookahead mean?

### Davidebyzero commented Feb 21, 2014

 What does non-atomic lookahead mean? Lookahead with backtracking. Such that the regex engine tries all possible matches to the lookahead group until it find one that causes the expression after the lookahead group to match.

### teukon commented Feb 21, 2014

 I see. While I see how useful that would be in general, I'm interested in an approach to Prime powers that would use this. Hmm...

### teukon commented Feb 22, 2014

 I've just been looking at Powers 2 again and it got me wondering: was there some special reason for preferring 6 over 4? I was having trouble playing around with it because of the 46656 character string so I built a leaner set and just thought I'd share it. Despite only having 5 strings to match, and 20 to miss, I can't find an exact solution which is shorter than the robust one.

### Davidebyzero commented Feb 22, 2014

 Well, I wanted it to be too hard to do the test by using prime divisibility tests. Power of 4 could be tested just by seeing if it was divisible by 2 an even number of times, and no other primes. But power of 6 would have to be tested by having the number of times it's divisible by 3 equal to the number of times it's divisible by 2, and divisible by no other primes. The latter seems more complex than the former.

### teukon commented Feb 22, 2014

 I see. However, the "even number of times" bit seems quite hard to me. Is there some trick I'm missing.

### Davidebyzero commented Feb 22, 2014

 I don't know how to do either, but "equal number of times" seemed like it'd be harder. After you asked me about it 45 minutes ago, though, I had second thoughts. Now I think "even number of times" might actually be the harder one. P.S. Maybe on a subconscious level, I wanted to see what you'd come up with if you were crazy enough, and brilliant enough, to try for the "equal number of times" test ;)

### Davidebyzero commented Feb 22, 2014

 I finally solved Prime Powers. But you made a mistake and put 1 in the "not prime powers" column. Here is a correctly done Prime powers. And here is my solution: `^(?!((xx+)(?=\2*\$)x+)(?=\1*\$)\1*(?!\$)\2*(?!\2|\$))` (151) Now I'm dying to see what yours is. Or, at least having a hint, and/or your score.

### teukon commented Feb 22, 2014

 Here's mine: `^(?=(xx+?)\1*\$)(?!(xx+)\2*(?!\1*\$)\2\$)` (162). Since you've added 1 to the left-side, it only scores 152. However, I've never regarded 1 to be a prime power myself. Some mathematicians take 1 as a prime power but it's unusual in my experience. Not that these sources count for much, but both Wikipedia and Wolfram start from 2, and while OEIS does start with 1, the first comment on the page is that 1 is sometimes omitted. Perhaps it would be best to omit 1 from both sides.

### Davidebyzero commented Feb 22, 2014

 That's very nice. I see you used the same basic tactic that I ended up using. But I'd like to see your md5sum 58bfe7c876dcf4d53aee8b121eecb556 one please. The one you had 4 hours ago. Also, combining a trick from yours with mine, `^(?!((xx+)(?=\2*\$)x+)(?=\1*\$)\1*(?!\2*\$)\1\$)` (156)

### teukon commented Feb 22, 2014

 That's the one. Unless I'm making some horrible error. Maybe I just don't know what I'm doing on the command line. I'm running `echo '^(?=(xx+?)\1*\$)(?!(xx+)\2*(?!\1*\$)\2\$)' | md5sum` in bash.

### Davidebyzero commented Feb 22, 2014

 Oh, you didn't use `echo -n`. You did the md5sum with the newline at the end.

### teukon commented Feb 22, 2014

 Doh! I can be such a noob at times. :p

### teukon commented Feb 22, 2014

 I still haven't managed to work out how your solution works. Can you help me out?

### Davidebyzero commented Feb 22, 2014

 It's almost the same as yours: PrimePower(N): It is not true that for any two factors of N, a and b, such that a

### teukon commented Feb 22, 2014

 `^(?!((xx+)(?=\2*\$)x+)\1*(?!\2*\$)\1\$)` (164)

### Davidebyzero commented Feb 22, 2014

 `^(?!((xx+)(?=\2*\$)x+)\1*(?!\2*\$)\1\$)` (164) Very nice! It heartens me to see that a simpler (shorter) solution than both of our original ones made the match with 1 fall in line. But I still think you're probably right and 1 should probably be omitted from both columns. If for no other reason than to potentially see more variety in people's solutions.

### teukon commented Feb 22, 2014

 Aha, I'm with you. Nice approach!

### teukon commented Feb 22, 2014

 Well, I've really got to go to bed. I have a load of work to get done tomorrow.

### Davidebyzero commented Feb 22, 2014

 Goodnight! Today was a good day.

### teukon commented Feb 22, 2014

 It heartens me to see that simplying your solution made the match with 1 fall in line. But I still think you're probably right and 1 should probably be omitted from both columns. If for no other reason than to potentially see more variety in people's solutions. Indeed. I think these two different approaches emulate how different people think about prime powers, and hence why there is ambiguity with 1 in the first place.

### teukon commented Feb 22, 2014

 Goodnight! Today was a good day. Yes, the most fun I've had in a while. Cheers!

### Davidebyzero commented Feb 22, 2014

 Interestingly, it works even without excluding 1 as a factor: `^(?!((x+)(?=\2*\$)x+)\1*(?!\2*\$)\1\$)` (165) Heck, it works even without explicitly excluding 0 as a factor, and without explicitly making the second factor larger.

### teukon commented Feb 22, 2014

 Very nice. It probably wouldn't have occured to me to allow 0 as a factor, but then again, my name's not Davidebyzero. It would be a shame to have made so much progress and leave my primary task unfinished, so I took an hour to clean up the problem set I'd like to publish. You can see it here. I felt that things were getting a bit mathematics-heavy so I've not included our most recent stuff and have cut out Coprimes, Xor, and the evil Dominoes warm-up. Please feel free to reserve the first couple of comments if you'd like to present these extras and/or your own problems (I liked the "Powers of six" problem), or would like to post md5sums of your solutions. It would be good to get a bit of padding so that people with lots of vertical resolution don't see other's solutions when they visit the gist.

### Davidebyzero commented Feb 22, 2014

 Imagine a problem which asks you to separate square numbers from non-squares. >:D I've done more than imagine it! Perfect squares My solution has md5sum 7d0406080811edd44523b03247bcf201. I won't tell you its exact score yet, but it's between 80 and 220.

### Davidebyzero commented Feb 22, 2014

 I felt that things were getting a bit mathematics-heavy Aw, but that's the best part. And I still haven't matched or beaten your Xor score, which speaks well for that problem. But I definitely don't think it'd be complete without Coprimes. Also, I thought that when your problem set was ready, you'd make a clone page of http://regex.alf.nu with your problems on it, so that people's scores are saved and they don't have to open a separate page for each problem. I really like that you included the domain for each problem, though! That's very cool. (I liked the "Powers of six" problem) Actually I think you were right, powers of four would be better. It would be good to get a bit of padding so that people with lots of vertical resolution don't see other's solutions when they visit the gist. Good thinking. But should I put md5sums or scores? Or both? Maybe it'd be a good idea to post this on a forum that supports spoiler tags? Perhaps xkcd forums, though I'm not sure what subforum it should go under (Logic Puzzles? Computer Science? Coding?). You've probably seen the Regex Golf comic.

### teukon commented Feb 22, 2014

 I've done more than imagine it! Perfect squares My solution has md5sum 7d0406080811edd44523b03247bcf201. I won't tell you its exact score yet, but it's between 80 and 220. Ha! I thought this was possible but taxing. Alas, I won't have the time to look at this until at least Thursday of next week. Well done. Can you imagine how "Squares" would have been received if it were included alongside "Primes" and "Powers" in the original set. I think many people would have been claiming the problem impossible to do without cheating (heck, some were claiming that of Powers).

### Davidebyzero commented Feb 22, 2014

 BTW, `x{7560}` and `x{7920}` break pcregrep, when testing my Perfect Squares regex, as do all perfect squares `x{5625}` or higher. ``````bash-3.1\$ printf '%*s' 7920 ''|tr ' ' x|pgrep 'CENSORED' pcregrep: pcre_exec() gave error -8 while matching text that starts: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx pcregrep: Error -8, -21 or -27 means that a resource limit was exceeded. pcregrep: Check your regex for nested unlimited loops. `````` Another version, running on a different machine, says `grep: exceeded PCRE's backtracking limit` at the same thresholds, so it's probably being limited by a built-in limit, not available memory. EDIT: Using the --match-limit option allows preventing this error from occurring.

### Davidebyzero commented Feb 22, 2014

 Okay, a naive exact solution to Perfect squares as I presented it is `^(x|x{4}|x{9}|x{16}|x{25}|x{36}|x{49}|x{64}|x{81}|x{100}|x{121}|x{144}|x{169}|x{196}|x{225}|x{256}|x{289}|x{324}|x{361}|x{400}|x{441}|x{484}|x{529}|x{576}|x{625}|x{676}|x{729}|x{784}|x{841}|x{900})\$` which is worth 102 points. So I will reveal that my robust solution scores between 160 and 220. Edit: I really need to improve the right column. `^(x|x{4}|x{9}|x{16}|x{25}|x{36}|x{49}|x{64}|x{81}|x{100}|x{121}x*)\$` scores 233. Edit: Here we go. Perfect squares, fixed. Now scores 20 points higher with robust solutions. My unchanged solution now scores between 180 and 240.

### teukon commented Feb 22, 2014

 Nice to see that you managed to address your problem. It's also nice to see that you're not using any really long strings. I look forward to tackling it myself.

### Davidebyzero commented Feb 22, 2014

 This is a plus in favour of Xor but, in comparison with the other problems, the solution seemed a bit clunky/ugly. Hmm. Xor: `^(?!.*(.).{12}(0.{10}(?!\1)|1.{10}\1))` (162) Don't know why I didn't do it that way immediately. Well, I still think it would still make a good easy-level one, but I'm no longer broken-up about seeing it dropped.

### teukon commented Feb 22, 2014

 Character-for-character identical again! I don't think Xor is a bad problem. I was just cutting what I felt was silver to maximise the visibility of the gold. I wanted to make the bonus set as fun as possible so that more people feel motivated to make their own problems.

### Davidebyzero commented Feb 23, 2014

 Well, I wrote a program to simulate the algorithm my perfect-square regex uses. You see, I had a long version of it which I was sure was robust, but then I found that removing a huge portion of the string seemed not to compromise its robustness — but I was no longer sure it was truly robust with this portion removed. Turned out I was right to wonder. The short version indeed isn't robust. It gives false positives on two numbers greater than 9500 — too large for the regex engine to test. (I'm not revealing exactly which numbers, to avoid giving too many spoilers about how my algorithm works.) I've tested up to 170,000 so far, and it's still just those two false positives, but I've every reason to expect that as it goes to infinity there will be an infinite number of false positives. So I will go back to saying it scores between 100 and 240 points... hopefully I'll be able to find a robust way of shortening it. Also, I'm going to need to create a smaller test sample. My latest one (with 32 perfect squares on the left, and 100 non-matches on the right) can get pretty laggy. Edit: Oh dear! The larger of the two false positives happens even in the long version... I'm beginning to wonder if maybe this is impossible.

### teukon commented Feb 23, 2014

 Edit: Oh dear! The larger of the two false positives happens even in the long version... I'm beginning to wonder if maybe this is impossible. Really? I'm quite confident it's possible, just very difficult. Ah, if only I had the time. :( If you do discover that the problem is impossible, please don't tell me.

### Davidebyzero commented Feb 23, 2014

 Actually I have another idea now :) It might work... Though I if it does, I'm not sure I'll be able to prove it. It'd probably help if I knew more number theory. Edit: Oh wow. My idea made the algorithm much faster and much more robust (up to 28,400,000 so far, and counting [EDIT: as of 17 days after this comment was made]). And it's "short" again. Though I'm still going to have to prove that it's truly robust (or isn't). Which I have no idea how to do. For all I know it might be as hard to prove as Fermat's Last Theorem. Edit: Made a regex string out of this improved algorithm. It allows the regex engine to work up to larger numbers than before, with the same --match-limit.

### teukon commented Feb 23, 2014

 Right. I got up 2 hours early this morning to have a go at this problem. I was going around in circles for a long time but finally found what I think is a solution. md5sum: feda71258877293543e5cad3896bb13c (no trailing newline this time). I've got to move now but I expect this will prey on my mind and I'll want to come back to test it's validity further and/or tweak it for length/speed.

### Davidebyzero commented Feb 23, 2014

 I think this is of sufficient complexity that we're highly unlikely to have character-identical strings. But FWIW, my shortest candidate is 27a3f7250f873fdf097d902fd81b89a3. I see no reason not to share scores now; this solution is 101 characters long and scores 219 points on this test set (the last one I posted).

### Davidebyzero commented Feb 23, 2014

 Reduced it to 91 characters (229 points) with the algorithm identical as before: 69aafe0481051e148a6c8eafa25dd3fe

### teukon commented Feb 23, 2014

 It looks like I'm about to have my revenge for Dominoes. My solution was 65 characters long, but I've managed to compress it to 52 characters (268 points): d0532dd568a9189c32477bfce3d6d659

### Davidebyzero commented Feb 23, 2014

 Nice!! And funny how simply seeing a score can give hints. I now have an idea for an obviously robust way of doing it! :) Now all that remains is implementation... Edit: Er, no. That won't work.

### teukon commented Feb 23, 2014

 Thanks. I've just realised that your test set starts with 1 rather than 0. If I'm allowed to ignore the empty string, I can save one more character.

### Davidebyzero commented Feb 23, 2014

 Are you sure your solution is robust? i.e. have you only been able to test it empirically, or is it obviously robust? BTW, my regex in its minimal form matches zero. It would have to be made longer to not match zero.

### teukon commented Feb 23, 2014

 I'm not certain, no. It seems like the algorithm should work to me, but I've not proven that it works. It certainly copes with the test set, but you already mentioned that you've experienced problems with some 4-digit numbers, so it's possible mine has such a flaw. It would be good to know what these numbers were so I could try them. I've got a bit of time now so I'll try and prove that my algorithm works. I'll get my laptop to check some larger numbers in the background.

### Davidebyzero commented Feb 23, 2014

 I'm not certain, no. It seems like the algorithm should work to me, but I've not proven that it works. In that case perhaps you shouldn't examine my solution yet. I've edited it out. But I will tell you, some potential problem numbers are 5265, 17920, 216513, 393472, 690625, 1184625, 1792000, if your algorithm is anything like mine.

### teukon commented Feb 23, 2014

 Ah, my algorithm doesn't take a 2n+1 style route. I did spend some time thinking about biting off pieces of x in such a way that the property of being a square was preserved at each step, but my own grasp of number theory was insufficient for me to make much progress. When I say I'm not certain that mine works, I'm just being very cautious. The mathematical part is straight-forward, but there may be something in the regex that doesn't behave as I expect. A careful inspection of my solution has largely convinced me that it is robust. My regex doesn't match the first few of your problem numbers. It works for the first 10000 numbers at least.

### Davidebyzero commented Feb 23, 2014

 When I say I'm not certain that mine works, I'm just being very cautious. The mathematical part is straight-forward, but there may be something in the regex that doesn't behave as I expect. Ah, well that's very different from my situation. I'm not sure my mathematics are sound, but I'm 100% sure that the regex is working how I think it is.

### teukon commented Feb 23, 2014

 I did catch a glimpse of your solution with the algebraic robustness condition. Could you post that up again given that our algorithms are different. I'd like to work out what it's doing.

### Davidebyzero commented Feb 23, 2014 • edited

 `^(x\$|(x(x+))(?=\2+\$)(?=\3+\$)(?=x(x\2)*x\$)(?!(x+)(\2\5)+(?=\6\$)(?!(xx+)(?=\7*\$)\5\7*\$))\3)*\$` The robustness of this hinges on a question I haven't solved yet. Does there exist a b = n(n(nd+1)-d) such that its factor n is not coprime with any larger factor of b, such that there exists no e>n that is also a factor of b and for which b-1 is divisible by both e-1 and e+1 and e is not coprime with any factor of b larger than e, where b-2n+1 is a perfect square? (Where all variables are of course all natural numbers.) If it does exist, then my solution is not robust. Edit: d does not have to be an integer. It can be of the form f/2 where f is an integer, iff the resulting b is divisible by n.

### teukon commented Feb 23, 2014

 I don't understand. If n is not coprime with any larger factor of b then, in particular, it's not coprime with ((n^2-1)d+n). But ((n^2-1)d+n) is larger than n, so n cannot be the largest such factor. I'm probably just not parsing this right Edit: Don't worry, you cleared up my confusion with your edit.

### Davidebyzero commented Feb 23, 2014

 b=216513, n=33, d=6 is an example that satisfies all the above conditions except for b-2n+1 being a perfect square.

### teukon commented Feb 23, 2014

 Hmm... I have no intuition about whether or not such a collection of numbers exists, although it does seem plausible that if such numbers do exist that the smallest example is quite large. As for a proof that such numbers don't exist, might you start by ruling out b being a prime power (to kill the "not coprime" elements and focus on the rest)?

### teukon commented Feb 23, 2014

 Edit: d does not have to be an integer. It can be of the form f/2 where f is an integer, iff the resulting b is divisible by n. The iff doesn't seem necessary here. I think that if d is f/2 for some integer f then n will divide b in all cases. My mistake. Iff n is odd.

### Davidebyzero commented Feb 23, 2014

 As for a proof that such numbers don't exist, might you start by ruling out b being a prime power (to kill the "not coprime" elements and focus on the rest)? I don't know what you mean. Here's a list of my search for possible robustness-breaking numbers so far: b0=5265, n=15, d=3/2, b1=5236 b0=216513, n=33, d=6, b1=216448 b0=17920, n=10, d=18, b1=17901 b0=393472, n=58, d=2, b1=393357 b0=690625, n=65, d=5/2, b1=690496 b0=7997440, n=110, d=6, b1=7997221 b0=22576401, n=111, d=33/2, b1=22576180 b0=142277472, n=228, d=12, b1=142277017 b0=365056000, n=230, d=30, b1=365055541 b0=439238656, n=332, d=12, b1=439237993 b0=308093625, n=345, d=15/2, b1=308092936 b0=294299136, n=366, d=6, b1=294298405 b0=1016961561, n=231, d=165/2, b1=1016961100 b0=1480161600, n=282, d=66, b1=1480161037 b0=1686970881, n=321, d=51, b1=1686970240 b0=937086976, n=406, d=14, b1=937086165 b0=172896256, n=442, d=2, b1=172895373 If b1 were a perfect square in any of these, it would mean my algorithm was not robust.

### teukon commented Feb 23, 2014

 As for a proof that such numbers don't exist, might you start by ruling out b being a prime power (to kill the "not coprime" elements and focus on the rest)? I don't know what you mean. Sorry, that wasn't clear. I just meant, if you aren't able to prove it directly, could you prove a similar result where, in addition, b always had to be a prime power. That's where I would start, but then again, I don't know much number theory. I'm guessing you're better equipped to tackle this problem than I.

### Davidebyzero commented Feb 23, 2014

 But this is golf. I want the string to be as short as possible, and if it's robust as it is, why make it longer but adding extra conditions? :) I don't know much number theory either, BTW. Edit: Actually I still don't understand you. Most perfect squares are not prime powers, and b0 is the number being tested for perfect-squareness (and n would be the square root if b were a perfect square, and b1 is the next candidate for perfect square that it passes back to itself pseudo-recursively by eating away the rest). So how does it help testing if b is a prime power?

### teukon commented Feb 23, 2014

 Edit: Actually I still don't understand you. Most perfect squares are not prime powers, and b0 is the number being tested for perfect-squareness (and n would be the square root if b were a perfect square, and b1 is the next candidate for perfect square that it passes back to itself pseudo-recursively by eating away the rest). So how does it help testing if b is a prime power? Oh, it doesn't. I'm not suggesting a modification to the algorithm, or even that such an assumption would be at all useful in a proof of the condition. I was just thinking out loud, about where I'd start in looking for a proof (I'd need to build up my understanding of how everything fits together first). Alas, I'm mired in my own mathematics at the moment and am so tired (shouldn't have gotten up so early), otherwise I'd have a crack at proving this. Would you like me to post my solution?

### Davidebyzero commented Feb 24, 2014

 I'd like to spend some more time thinking about alternative solutions. Maybe I can come up with yours myself. Have to research what other properties perfect squares have. Seems to me you must have used a property that I'm not aware of.

### Davidebyzero commented Feb 24, 2014

 Okay, here's one that is mathematically very simple (and thus obviously robust), and 83 characters long (8 characters shorter than my extremely-hard-to-prove one): `^(x\$|(?=((x+)\3+)\2+\$)(?!(x+)(?=(\4*)\4\$)(?!\5(xx+)\6+\$|.*(?=\2\$)\4+\$)).*(?=\3\$))*\$` This can be easily modified to match other powers; can yours, too? Examples: (cubes, fourth powers, fifth powers) `^(x\$|(?=(((x+)\4+)\3+)\2+\$)(?!(x+)(?=(\5*)\5\$)(?!\6(xx+)\7+\$|.*(?=\3\$)\5+\$)).*(?=\4\$))*\$` `^(x\$|(?=((((x+)\5+)\4+)\3+)\2+\$)(?!(x+)(?=(\6*)\6\$)(?!\7(xx+)\8+\$|.*(?=\4\$)\6+\$)).*(?=\5\$))*\$` `^(x\$|(?=(((((x+)\6+)\5+)\4+)\3+)\2+\$)(?!(x+)(?=(\7*)\7\$)(?!\8(xx+)\9+\$|.*(?=\5\$)\7+\$)).*(?=\6\$))*\$` I don't see any way to generalize it to match all powers, though. I still have no clue how you could have matched perfect squares with only 52 characters.

### teukon commented Feb 24, 2014

 I'd like to spend some more time thinking about alternative solutions. Maybe I can come up with yours myself. Have to research what other properties perfect squares have. Seems to me you must have used a property that I'm not aware of. Ok, good luck!

### Davidebyzero commented Feb 24, 2014

 Thanks. Fifth powers and higher are shorter with this generalization. Execution is also faster, because backtracking is reduced.  5: `^(x\$|((?=(x+)(\3+)\$)(?!(x+)(?=(\5*)\5\$)(?!\6(xx+)\7+\$|.*(?=\3\$)\5+\$))\4){4}(?=(x+)(\8+)\$)\9)*\$` 10: `^(x\$|((?=(x+)(\3+)\$)(?!(x+)(?=(\5*)\5\$)(?!\6(xx+)\7+\$|.*(?=\3\$)\5+\$))\4){9}(?=(x+)(\8+)\$)\9)*\$`

### teukon commented Feb 24, 2014

 Clever.

### Davidebyzero commented Feb 24, 2014

 I think I've hit your original 65 character solution. Would be pretty cool if this were character-for-character identical. `^(x\$|(?=((x+)\3+)\2+\$)(?=(xx+?)\4+\$)(?=.*(?=\2\$)\4+\$).*(?=\3\$))*\$` (65 characters) Except, you said you didn't do it in such a way that eats away the string while keeping a perfect square at each step. I did spend some time thinking about biting off pieces of x in such a way that the property of being a square was preserved at each step, but my own grasp of number theory was insufficient for me to make much progress. Hmmm. You also said removing the requirement that zero be matched made your string shorter. So I guess this is not your string. Optimized some more: `^(x\$|(?=((x+)(\3+))(\2+)\$)(?=(xx+?)\6+\$)\5(?=\6+\$)\4)*\$` (55 characters) Generalized for higher powers, where N is the power minus one: `^(x\$|((?=(x+)(\3+)\$)(?=(xx+?)\5+\$)\4(?=\5+\$)){N}(?=(x+)(\6+)\$)\7)*\$` (67 characters) teukon, please show me your 52 character string now. I'm still stumped as to how you did it, and am dying to see.

### teukon commented Feb 24, 2014

 Except, you said you didn't do it in such a way that eats away the string while keeping a perfect square at each step. I did spend some time thinking about biting off pieces of x in such a way that the property of being a square was preserved at each step, but my own grasp of number theory was insufficient for me to make much progress. Would you like me to clarify this? I don't want to give hints if you don't want them. Also, I have a 50 character solution: c58c03a671fda234c3fb15b504ff6236

### Davidebyzero commented Feb 24, 2014

 Sure, that'd be fun. Please give me hints as to how you did it and I'll see if I can do what you did. My solution is currently just 5 characters longer than yours, but it sounds like you did it a completely different way.

### teukon commented Feb 25, 2014

 Phew. Except, you said you didn't do it in such a way that eats away the string while keeping a perfect square at each step. I did spend some time thinking about biting off pieces of x in such a way that the property of being a square was preserved at each step, but my own grasp of number theory was insufficient for me to make much progress. My sincere apologies, I didn't express myself well here at all. I certainly didn't mean to mislead you. Ah, my algorithm doesn't take a 2n+1 style route. I did spend some time thinking about biting off pieces of x in such a way that the property of being a square was preserved at each step, but my own grasp of number theory was insufficient for me to make much progress. I was speaking in the context of thinking about how two squares differ by 2n - 1. I wanted to try something along the lines of your 91-character solution but I didn't have the number theory to back this up. My solution does indeed bite pieces off, preserving perfect squareness at each step. Now, for a proper hint: Our algorithms are essentially the same. It was one of your recent posts that caused me to slap my forehead and make an easy 2-character saving. You can save at least 5 characters through good golf. Let me know if you'd like another hint.

### Davidebyzero commented Feb 25, 2014

 Well, now I see how you dropped the zero match. I don't like it, but I suppose it's good golf. `^((?=((x+)(\3+))(\2+)\$)(?=(xx+?)\6+\$)\5(?=\6+\$)\4)*x\$` (53 characters) Though it's possible to keep the zero match and still save a character. `^((?=((x+)(\3+))(\2+)\$)(?=(xx+?)\6+\$)\5(?=\6+\$)\4)*x?\$` (54 characters) Edit: Generalized powers optimization `^((?=(xx+?)\2+\$)((?=(x+)(\4+)\$)\5(?=\2+\$)){N}(?=(x+)(\6+)\$)\7)*x?\$` (66 characters)

### teukon commented Feb 25, 2014

 I didn't drop the zero match. If I did, I'd be at 49 characters (which would be cool).

### Davidebyzero commented Feb 25, 2014

 Well, I've got to say, well done, well done indeed. You pared this down to a very small form much faster than I did. Not to mention hitting upon this way of doing it much faster than I did (I went down the blind alley of trying the 2n-1 approach). Of course I'm going to try to get it down to 50 myself too. I didn't drop the zero match. If I did, I'd be at 49 characters (which would be cool). Because 49 is a perfect square? :D

### teukon commented Feb 25, 2014

 Not to mention hitting upon this way of doing it much faster than I did (I went down the blind alley of trying the 2n-1 approach). Once again, my cowardliness is to my credit. Of course I'm going to try to get it down to 50 myself too. Respect. Because 49 is a perfect square? :D You know it.

### teukon commented Feb 25, 2014

 AAAHHH!!

### Davidebyzero commented Feb 25, 2014

 What happened?

### teukon commented Feb 25, 2014

 I have a gift for you. I think you're really going to like it. :)

### Davidebyzero commented Feb 25, 2014

 You made a regex that matches all powers >=2?

### teukon commented Feb 25, 2014

 Oh, much better than that.

### Davidebyzero commented Feb 25, 2014

 You found a way to divide the current number by a backreference and get the quotient as the new current number?

### teukon commented Feb 25, 2014

 Hmm... not quite that good. Edit: You may well disagree. Nah. It's better!

### Davidebyzero commented Feb 25, 2014

 Well, can I have a hint? :D

### teukon commented Feb 25, 2014

 Triples (524): ``````(?=((.*?[147]){3})*((.*?[147]|){2}))(?=((.*?[258]){3})*((.*?[258]|){2}))^.*\$(\3\7|(?!\3|\7)\4\8|(?!\4|\8)) ``````

### Davidebyzero commented Feb 25, 2014

 HOLY SHIT I THOUGHT IT COULDN'T BE DONE

### teukon commented Feb 25, 2014

 Happy Birthday (I hope it's your birthday). 1 character better than the best FSM solution. Naturally, feel free to rearrange terms to your taste.

### teukon commented Feb 25, 2014

 I THOUGHT IT COULDN'T BE DONE It couldn't. Well, not by mortals anyway.

### Davidebyzero commented Feb 25, 2014

 All I can seem to do at the moment is this cute but crappy single character shortening of the higher-powers version: `^((?=(xx+?)\2+\$)((?=(x+)(\4+)\$)\5(?=\2+\$)){N}(?=(x+)(\6+)\$)\7)*x?\$` from this `^((?=(xx+?)\2+\$)((?=(\2*(x*))(\4+)\$)\6){N}(?=(x+)(\7+)\$\5)\8)*x?\$` to this It results in more than an order of magnitude of slowdown. Hardly worth it.

### teukon commented Feb 25, 2014

 That's a bit ugly. Although with golf, I feel length has to be taken as absolute, and a situation like this just suggests a bad problem.

### Davidebyzero commented Feb 25, 2014

 The fact that this happens with an obviously elegant problem suggests that that is a bad bad-problem schema ;-) But what concerns me is: when your best golf string is generalized for higher powers, does it beat my ugly slow one? And does incorporating this nasty length-optimization result in the shortest length even for your golf string? And a more general question — is your 50 character squares string slower than my 54 character string?

### teukon commented Feb 25, 2014

 Looking for more hints eh? yes, no, and no.

### teukon commented Feb 25, 2014

 I had another thought about your powers of 6 problem. Have you considered powers of 3? I know this can be solved by checking that every factor is a multiple of 3, but I don't think there's a way of doing this that is as short as the loop (imagine: "How the hell are you able to score 181 on Powers of three!?!"). I think I prefer powers of 4 to the other options, but just wanted to share this thought.

### teukon commented Feb 25, 2014

 Oh, and have you noticed? Someone's trying the bonus levels.

### Davidebyzero commented Feb 25, 2014

 Looking for more hints eh? Not exactly. I was feeling demoralized from coming up with the ugly slowness-inducing length-optimization, and was very worried that the only way to get any shorter was that same sort of thing. yes, no, and no. Now that is very encouraging.

### Davidebyzero commented Feb 25, 2014

 I had another thought about your powers of 6 problem. Have you considered powers of 3? (imagine: "How the hell are you able to score 181 on Powers of three!?!") Ooh, I like that. Sort of like the evil Dominoes 1. And an added bonus is that there can be more left-column matches in the problem set.

### teukon commented Feb 25, 2014

 Modulus (170): `^(?=.*% (x*))(?=\1*(x*)).* \2\$` Euclid (174): `(?=\((x+)\1*, \1*\)).* \1\$`

### Davidebyzero commented Feb 25, 2014

 Wow, you're cleaning up. Literally. Those solutions are a lot cleaner. Very nice. I hope you don't mind if I change the divisor to `x+` in your Modulus. Even though my nickname is Davidebyzero, I still like a change that gives the regex engine a little less work, and/or rejects strings that don't fit the logic of the problem (even for strings that are outside the problem's stated domain, as long as doing so doesn't change the length of the regex). This does both, rejecting the pattern `^ *% += +\$` which would otherwise be accepted, and short-circuiting a non-match of the pattern `^x+ % += x*\$` sooner. Modulus (170): `^(?=.*% (x+))(?=\1*(x*)).* \2\$`

### teukon commented Feb 25, 2014

 Sure, style them as you like. I certainly appreciate the extra motivations of runtime efficiency and extra robustness. Personally, I place my own idea of simplicity/readability over both of these, but each to his own. I recall Peter Norvig being pleased with his solution `f.o` to "Plain strings", because it used less ink than `foo`. I believe the total of the best known robust scores for the 10 bonus problems is now 1729 (a seemingly dull number). Just one more point and we'll have the average solution length down to 27 characters.

### Davidebyzero commented Feb 25, 2014

 Your problem set is proving to be quite hardy in the face of our raw maneuvering on.

### teukon commented Feb 25, 2014

 Your problem set is proving to be quite hardy in the face of our raw maneuvering on. Sorry, I don't understand.

### Davidebyzero commented Feb 25, 2014

 Your problem set is proving to be quite hardy in the face of our raw maneuvering on. Sorry, I don't understand. I hope that's not an unfavourable omen.

### teukon commented Feb 26, 2014

 It's the "raw maneuvering on" that threw me. How does that reference Ramanujan?

### Davidebyzero commented Feb 26, 2014

 "Raw ma new zhan" vs. "Raw ma neuvring on". Not quite the right number of syllables, but it's the best I could manage. It's hard to turn his name into a sequence of words that make sense, and I thought it wouldn't be fair to use only Hardy's name. Edit: Just checked out it's pronounced, and I guess I wasn't doing it right. I was saying "Ra mah NEW zhan", but apparently it's pronounced "Raw MAH new zhan". Is that the reason you didn't get it?

### teukon commented Feb 26, 2014

 Aha. I see. :D

### Davidebyzero commented Feb 26, 2014

 Also, I meant "raw maneuvering on" to idiomatically mean "forging onward", i.e., our efforts at coming up with the best solutions — where "raw" means "no holds barred", i.e. we're not holding back. It is "hardy" in the face of our efforts, in that it took more than 6 days before you found a way to improve two solutions by 1 character each.

### Davidebyzero commented Feb 27, 2014

 Hahahaha it's so obvious now that I realize what it is. `^((?=(xx+?)\2+\$)((?=\2+\$)(?=(x+)(\4+)\$)\5){N})*x?\$` (50 characters) So awesome. The best golf optimization automatically generalizes it for higher powers. And makes it faster than the former perfect square golf string (by reducing backtracking). And the power doesn't have to have 1 subtracted from it.

### teukon commented Feb 27, 2014

 Well done indeed! Mine was practically identical, I just preferred `*` to `+` in a few places. ``````^((?=(xx+?)\2*\$)((?=\2*\$)(?=(x*)(\4+)\$)\5){2})*x?\$ `````` It may seem simple now, but I maintain that this is a very difficult problem, one likely to make people argue its possibility (supposing they're not given warm-up problems such as Powers of four). Another bonus for there being such a short squares string is that a problem doesn't need a lot of strings for the robust solution to also be the highest scoring solution. I can't construct particularly short regexes which match the first few square numbers and miss all non-squares; for example, I used 41 characters for squares 0 through 10.

### Davidebyzero commented Feb 27, 2014

 I think the reason the last length-optimization didn't occur to me for almost 3 days might be that the speed-optimizer in me automatically discarded the idea of putting a test at the beginning of an inner loop that'd be useless every first iteration. But it turns out that in practice, the difference in speed is negligible (the short version is on the order of being a few percent slower for large squares, which is a sacrifice I make gladly; for non-squares, the difference is less than a percent). May I see your 65 and 52 character ones too please? Your 52 was the one with md5sum d0532dd568a9189c32477bfce3d6d659.

### teukon commented Feb 27, 2014

 Sure. 65 points. md5sum: feda71258877293543e5cad3896bb13c ``````^((?=(xx+?)\2+\$)(?=((xx+)\4*)\4\$)\3(?=\2+\$)(?=((x+)\6*)\6\$)\5)*x\$ `````` 52 point. md5sum: d0532dd568a9189c32477bfce3d6d659 ``````^((?=(xx+?)\2*\$)((?=\2*\$)(?=((x*)\5*)\5\$)\4){2})*x?\$ `````` As you can see, my first solution collapsed to 50 characters pretty easily. The last 2 characters were inspired by one of your solutions (although, I felt pretty foolish for not seeing it myself).

### Davidebyzero commented Feb 27, 2014

 Thanks! I found a mild curiosity. `(?=(x?x+?)\2*\$)` finds the smallest prime factor, but if the number is 1, it uses 1 instead of a prime. So far I can't think of any use for this, though.

### teukon commented Feb 27, 2014

 Interesting.

### Davidebyzero commented Mar 1, 2014

 Hi, teukon! I just completed a regex that matches correct multiplication statements, in the domain `^x+\*x+=x+\$`. It's very long. Would you like to tackle this problem yourself, or would you like me to share my regex? If you'd like to try it yourself, the rules are: Must be ECMAScript-compatible, but must not use ECMAScript non-participating capture group behavior. Here is a script for testing the regex: ```#!/bin/bash regex=\$(tr -d '[:space:]' < "regex for matching multiplication.txt") for (( a=1; a<=12; a++ )); do for (( b=1; b<=12; b++ )); do for (( c=1; c<=144; c++ )); do result=\$( perl -E "print 'x' x \$a; print '*'; print 'x' x \$b; print '='; print 'x' x \$c" | grep -P "\$regex" ) if [ -n "\$result" ]; then echo \$a \* \$b = \$c fi done done done```

### teukon commented Mar 1, 2014

 Great, that seems like fun. I'll give it a go.

### teukon commented Mar 1, 2014

 Ok, I think I have a solution. md5sum 7ee3f71ba9c611066072ae9ac281d087. I'm not using non-participating group capture. I ran it through your test and it printed the multiplication tables (eventually :p).

### teukon commented Mar 1, 2014

 I've managed to shave off 4 characters but can't find a way to make my regex any more efficient. md5sum 871497593e9561b06e508fc8e984abd9.

### Davidebyzero commented Mar 1, 2014

 Holy crap that was fast. It took me days to do this — were you thinking about the problem before? I started working on it at 2014-02-27 23:25 GMT, and was thinking about it days before that. My solution has md5sum 399e7a50c9feecc15639eded1f5303ff BTW, or 929be506fd5faafad13693bdc12dd670 with whitespace. I have not yet done any optimization of its length since I got the first fully working string (instead, I went to bed). The length without whitespace is 913 characters. What's the length of yours?

### teukon commented Mar 1, 2014

 I had briefly considered a multiplication problem like this before, but didn't dwell on it because I didn't want example strings longer than about 40 characters (long example strings are ok for things like squares because no essential information is lost when the problem is presented). This was long before we got to thinking about squares and at the time I thought it was probably impossible anyway. Your post reminded me of this musing, but it wasn't really much of a head start. It did take me longer to solve than squares. Are you sure you want the precise length of my solution? To be safe, I'll just say that it's shorter than 913 characters and try to assure myself that it really is robust. Edit: I've checked my logic and I'm fairly confident that my solution is robust. I can post the length or just a range if you prefer.

### Davidebyzero commented Mar 1, 2014

 Sure, a range would be nice. There's a decent amount of room for optimization in mine. I've gotten it down to 899 characters so far; this has md5sum 756e7246b706f94608ed0550fa5300c4 (and git commit hash be7edce04c30ac94dd0ce4bf099873078935d924). And here is the current version of my test script: ```#!/bin/bash # Trim whitespace and (?#...) style regex comments, allowing use of them without affecting length regex=\$(tr -d '[:space:]' < "regex for matching multiplication.txt" | sed -e 's/(?#[^)]*)//g') echo Regex length = \$(echo -n "\$regex"|wc -c) echo Regex md5sum = \$(echo -n "\$regex"|md5sum) echo for (( a=3; a<=12; a++ )); do for (( b=1; b<=12; b++ )); do for (( c=1; c<=144; c++ )); do result=\$( ( time (perl -E "print 'x' x \$a; print '*'; print 'x' x \$b; print '='; print 'x' x \$c" | grep -P "\$regex") ) 2>&1 ) grep_result="\${result%% *}" time_result="\${result#* }" time_result="\${time_result%% *}" correct=\$((\$a * \$b == \$c)) if [ -n "\${grep_result}" ]; then echo -n \$time_result: \$a \* \$b = \$c [[ \$correct -ne 0 ]] && echo "" || echo " - FALSE POSITIVE!" elif [[ \$correct -ne 0 ]]; then echo \$time_result: \$a \* \$b != \$c " - FALSE NEGATIVE!" fi done done done```