@Davidebyzero
Created March 24, 2014 06:36
Continued discussion of Regex Golf Bonus Levels and mathematical regexes (SPOILERS! SPOILERS!)
@teukon

teukon commented Apr 18, 2014

It's been a privilege, and great fun, playing Regex Tennis with you. And I don't think I will ever understand how you were able to come up with such heavily optimized solutions to Perfect squares and Multiplication so quickly.

Kind words indeed. I've been equally impressed by some of your work, notably the abundant numbers solution, and the beautiful solution to Dominoes.

And more generally, I'd like to see at least one thing proved impossible to do. Or (more astoundingly) a proof that all computable functions are possible to do in a regex (obviously excluding those whose output can exceed their input).

I expect there are some examples which are provably impossible. It would be nice to have a very simple one but I've run out of ideas.

Oh, and I think you've probably guessed where I'm leading, with that little golf challenge I posed to you. Matching a completely generalized Powers, ^x+^x*=x+$, is indeed possible. It's funny how hard I thought it'd be a month ago:

Yeah. Well done tying up this loose end. I know how you feel though, I remember thinking that square numbers would be incredibly difficult to handle.

This has been fun!

@Davidebyzero

The split of Fibonacci regexes between these two designs:

  1. Find a number n such that 5n² + 4 or 5n² − 4 is a perfect square; this locates the Fibonacci number whose index is half the Fibonacci index of the target number
  2. Build up the Fibonacci sequence, up to the one that has one-quarter the index of the target number

... is no longer just about brevity versus speed. If I am not mistaken, only algorithm #2 can be used to return the Fibonacci index as a match. (This can't be done for the Fibonacci numbers 2 and 3, which have indexes 3 and 4 — but this would be easily remedied by turning it into a problem of matching statements of the form ^F_x*=x*$.)

(Incidentally — it would be possible to implement algorithm #1 as going directly to the target number, by using multi-"digit" arithmetic with carry, as I described earlier. But I doubt this would be possible to do in 165 characters or less.)
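For reference, the core test behind design #1 can be sketched numerically (plain JavaScript rather than a regex; `isFib` and `isPerfectSquare` are names made up here for illustration):

```javascript
// n is a Fibonacci number exactly when 5*n^2 + 4 or 5*n^2 - 4 is a perfect square.
function isPerfectSquare(k) {
  if (k < 0) return false;
  const r = Math.round(Math.sqrt(k));
  return r * r === k;
}

function isFib(n) {
  return isPerfectSquare(5 * n * n + 4) || isPerfectSquare(5 * n * n - 4);
}

console.log([...Array(30).keys()].filter(isFib)); // [0, 1, 2, 3, 5, 8, 13, 21]
```

The regex version of design #1 has to express the squaring, the multiplication by 5, and the perfect-square check entirely with backreference arithmetic on unary strings, which is what makes the short solutions so remarkable.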

@Davidebyzero

In the interest of tying up another loose end (from Feb 25, 2014):

I had another thought about your powers of 6 problem. Have you considered powers of 3?

I know this can be solved by checking that every factor is a multiple of 3, but I don't think there's a way of doing this that is as short as the loop (imagine: "How the hell are you able to score 181 on Powers of three!?!"). I think I prefer powers of 4 to the other options, but just wanted to share this thought.

A solution to "Powers of 3" that is analogous to the ^(?!(x(xx)+|)\1*$) (18 character) solution of "Powers of 2":
^(?!(x(xxx)+x?|xx|)\1*$) (24 characters)

A solution to "Powers of 3" that is analogous to the ^((x+)(?=\2$))*x$ (17 character) solution of "Powers of 2":
^((x+)\2(?=\2$))*x$ (19 characters)

So yes, the former is not as short as the latter, but it's really close. "Powers of 4", or powers of any non-prime for that matter, would not be solvable at all using the former method.
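All four solutions can be verified mechanically in ECMAScript against unary strings (a quick test harness; `isPow` is a reference predicate written just for this check):

```javascript
// The four regexes quoted above, tested against unary strings of n x's.
const pow2neg  = /^(?!(x(xx)+|)\1*$)/;       // Powers of 2 (18 chars, lookahead style)
const pow3neg  = /^(?!(x(xxx)+x?|xx|)\1*$)/; // Powers of 3 (24 chars, lookahead style)
const pow2loop = /^((x+)(?=\2$))*x$/;        // Powers of 2 (17 chars, halving loop)
const pow3loop = /^((x+)\2(?=\2$))*x$/;      // Powers of 3 (19 chars, thirding loop)

// Reference predicate: is n a power of base b?
const isPow = (n, b) => { while (n % b === 0) n /= b; return n === 1; };

for (let n = 1; n <= 100; n++) {
  const s = 'x'.repeat(n);
  if (pow2neg.test(s)  !== isPow(n, 2)) throw new Error(`pow2neg fails at ${n}`);
  if (pow3neg.test(s)  !== isPow(n, 3)) throw new Error(`pow3neg fails at ${n}`);
  if (pow2loop.test(s) !== isPow(n, 2)) throw new Error(`pow2loop fails at ${n}`);
  if (pow3loop.test(s) !== isPow(n, 3)) throw new Error(`pow3loop fails at ${n}`);
}
console.log('all four regexes agree with isPow for n = 1..100');
```

The lookahead-style solutions work by asserting that the length has no divisor of the "wrong" form (for powers of 2, no odd divisor ≥ 3; for powers of 3, no divisor ≥ 2 that isn't a multiple of 3), which is why the method breaks down for non-prime bases.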

@Davidebyzero

Davidebyzero commented Dec 15, 2018

I've found some errors in this paper on the density of abundant numbers. On "page 142" (page 6 of the PDF) there's a table of the number of non-deficient numbers in a range of intervals. My C program agrees with those up to 10^8, but finds that there are 247610960 non-deficient numbers under 10^9, whereas the paper says 247610965. Looks like Deléglise may have used 32-bit math without properly checking for overflow (I'm using 64-bit math).

Also, the table leaves out the number for [1, 10^6] and accidentally shows the one for [1, 10^5] next to [1, 10^6]. And it uses [a, b] to represent closed-open intervals, but [a, b] is the notation for a closed interval — the paper's [1, 10^8], for example, does not include 10^8 itself.
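For anyone who wants to reproduce counts like these at small scale, a divisor-sum sieve is enough (a JavaScript sketch, not the C program mentioned above; at these sizes the sums stay well under 2^53, so doubles are exact and the 32-vs-64-bit overflow issue doesn't arise):

```javascript
// Count non-deficient numbers (those with sigma(n) >= 2n) up to N,
// using a divisor-sum sieve: add each d to sigma of all its multiples.
function countNonDeficient(N) {
  const sigma = new Float64Array(N + 1);
  for (let d = 1; d <= N; d++)
    for (let m = d; m <= N; m += d) sigma[m] += d;
  let count = 0;
  for (let n = 1; n <= N; n++)
    if (sigma[n] >= 2 * n) count++;
  return count;
}

console.log(countNonDeficient(100)); // 24: the perfect numbers 6, 28 plus 22 abundant numbers
console.log(countNonDeficient(1e6) / 1e6); // density estimate, already near 0.2476
```

The sieve is O(N log N), so it's fine up to 10^7 or so in a scripting language; the 10^9 count in question needs the compiled program.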

So it turns out that even back when I wrote this, there was already a paper that superseded it: On the density of abundant numbers, Kobayashi (2010). It just didn't show up in my searches at the time.

This is an excellent paper. It goes into depth on things I'd wished were in Deléglise's 1998 paper. It derives the time complexity of computing n digits of the abundant number density to be O(e^(e^n)). It includes the full C++ source code (and describes its workings) of the program that computes bounds on the density; I tried it, and it works! I was able to compute bounds narrowed down to [0.2476167, 0.2476528] in 245 minutes, single-threaded, by passing it N=200000000000000 on the command line. That's weird, because it's almost as narrow as the new bounds established by the paper: [0.2476171, 0.2476475]. I would have thought it would take a lot more computing time to reach that.

Edit: After 23.54 hours of computation with N = 5000000000000000, the program gave a result of [0.24761740, 0.24764507]. That's even tighter than the bounds stated in the paper.
Edit 2: After 40.48 hours of computation with N = 15000000000000000, it gave a result of [0.24761748, 0.24764422].

FWIW, it also gets the closed-open interval right, stating them as [a, b). It does have at least one error though:

any finite union of multiple sets is a periodic set

on page 6. I think he meant "any finite union of multiple periodic sets is a periodic set".

I do wish the paper discussed what limitation the use of native floating point data types (80-bit long double) has on the best bounds achievable by the program. There is some discussion on the matter starting at page 112, but if I understand correctly, this only describes how a problem regarding precision was solved. In scanning through the paper, I didn't find anything explicitly discussing limitations on precision that still apply. So I don't know whether there's a point at which the program will start narrowing the bounds without rigorous mathematical justification.

The same mathematician also co-wrote an article in 2014 about the error term in the count of abundant numbers. This pertains to another thing I said earlier here:

What would be much cooler, however, would be to prove that for every n, |A_n(2) − A(2)| < f(n) for some f(n) that asymptotically approaches zero, where an f(n) for which this is true is found.

The function they came up with, though, is very very pessimistic (extremely slow to converge, much slower than the density itself seems to converge). It is a start though, I suppose. The article itself is behind a paywall, unfortunately.

@Davidebyzero

Davidebyzero commented Dec 15, 2018

I thought of a pretty good analogy for what constructing ECMAScript regexes for functions on the natural numbers is like. I think it's much like doing straightedge and compass geometric construction. There are many things that are impossible to construct with those tools, but that hasn't stopped mathematicians from devoting much time to the subject.

I suspect that only a strict subset of primitive recursive functions are possible but at this point still can't prove it. I'm also pretty sure that with unbounded scratch space, all primitive recursive functions become possible (this would probably be easy to prove).

@Grimy

Grimy commented Jan 31, 2019

Sorry to intrude, but there's still one loose end: the log2 golf challenge. I have a solution of length 66, down from 108 and then 73 (8cf2ef821f9428e4a611b457ccc57344). What's the length of your b31a12cf9a9982d7d3204d198f87ba4f solution?

@Davidebyzero

Hello, Grimy, and welcome!

Wow, that is awesome. My b31a12cf9a9982d7d3204d198f87ba4f is 69 bytes, so you have beaten me! Very well done. I'd like to exchange our regexes, but privately, to maintain the viability of this challenge for a little longer. I'd like to see your earlier versions too!

I recently forked the log2 regex into:

  1. A version that strictly rounds down, which is 93 bytes with md5sum cef4f7a85f9703825c97b01012358d4d, and like the original, returns no-match for zero
  2. A base 10 version that returns the number of digits, which is 112 bytes with md5sum cc065c2c0e212933c0bcb1e3e9717a86 (it's the same as strictly rounded-down logarithm plus one – except with an input of zero, for which both regexes return no-match, because you can't return 1 with an input of 0)
  3. Logarithm in base N (passed as two comma-delimited unary numbers, base first, and including the base and comma in the returned result) at 165 bytes with md5sum df2b6dab8eb3fdf9a6c2d2e4e52257f3 (have not made a strictly-rounding-down or digit-counting version of this yet).
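The relation stated in #2 (digit count equals the strictly-rounded-down base-10 logarithm plus one) is easy to sanity-check numerically, on integers rather than unary strings (the function names here are illustrative, not from the hidden regexes):

```javascript
// Floor of log base b, by repeated integer division; no floating point.
const floorLog = (n, b) => {
  let k = 0;
  while (n >= b) { n = Math.floor(n / b); k++; }
  return k;
};
const digitCount = n => String(n).length;

for (let n = 1; n <= 100000; n++) {
  if (digitCount(n) !== floorLog(n, 10) + 1)
    throw new Error(`mismatch at ${n}`);
}
console.log('digitCount(n) === floorLog(n, 10) + 1 for all n in 1..100000');
```

As noted above, n = 0 is the one input where the correspondence breaks down, since a digit count of 1 can't be returned as a match of a zero-length input.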

Tying up another loose end, I also wrote a regex that takes a single unary number as input, divides it by sqrt(2) (rounding down), and returns the result as a match, using an algorithm very similar to the one I sketched above. It uses molecular lookahead. The first fully correct version was 1159 bytes and I got it down to 849. I plan on porting it to pure ECMAScript, but I expect it will roughly double in length.
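The spec of that divide-by-sqrt(2) regex can be restated with integer arithmetic only, which is the form any unary-regex algorithm ultimately has to take: floor(n/√2) = isqrt(floor(n²/2)), since n/√2 = √(n²/2) and floor(√x) = floor(√⌊x⌋) for real x ≥ 0. This is just a numeric restatement of the task, not the regex's actual algorithm:

```javascript
// Integer square root: float seed, then corrected so the result is exact.
function isqrt(k) {
  let r = Math.floor(Math.sqrt(k));
  while (r * r > k) r--;
  while ((r + 1) * (r + 1) <= k) r++;
  return r;
}

// floor(n / sqrt(2)) using integers only: n/sqrt(2) = sqrt(n^2 / 2).
const divBySqrt2 = n => isqrt(Math.floor(n * n / 2));

for (let n = 0; n <= 10000; n++) {
  if (divBySqrt2(n) !== Math.floor(n / Math.SQRT2))
    throw new Error(`mismatch at ${n}`);
}
console.log('divBySqrt2 agrees with floor(n / sqrt 2) for n = 0..10000');
```

(The floating-point comparison on the right-hand side is safe at these sizes because n/√2 is never close enough to an integer for double rounding to flip the floor.)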

@Grimy

Grimy commented Feb 4, 2019

Thanks!

I am now down to 57 chars on the log2 regex (md5sum c58bef1340954f1bbaf02c5790f33a01, --npcg+ only, --neam independent). I’ll gladly share it if you want.

I haven’t yet tried applying this algorithm to the base 10 or rounding down version of the task, but it should give significant improvements.

EDIT:

  • 76 chars for the version that strictly rounds down (or 74 if I’m allowed to return 0 instead of no match for a 0 input).
  • 97 chars for the base 10 version

@Davidebyzero

Davidebyzero commented Feb 7, 2019

Wow! 57 chars. That's incredible. I don't want you to just show it to me right away though; I'd like some time to try to figure it out. Can't work on it right away but I plan on it. I would indeed like to see them, please!

I did have an idea on how to make my floor-logarithm regexes much shorter. It did a little worse than the technique I already used, and frankly I'm very happy about that. The one I'd already used is much more interesting, and the one I just now tried would trivialize it if it did better.

And I'm also quite impressed to see that 76-57=19, a smaller difference than my 93-69=24.

@Davidebyzero

This discussion has been continued in the Mathematical Regexes Stack Exchange chat room.
