
@VictorTaelin
Last active April 30, 2024 05:44
Simple SAT Solver via superpositions

Solving SAT via interaction net superpositions

I've recently been amazed, if not mind-blown, by how a very simple, "one-line" SAT solver on Interaction Nets can outperform brute-force by orders of magnitude by exploiting "superposed booleans" and optimal evaluation of λ-expressions. In this brief note, I'll provide some background for you to understand how this works, and then I'll present a simple program you can run on your own computer to observe and replicate this effect. Note that this is a new observation, so I know little about how this algorithm behaves asymptotically, but I find it quite interesting that this simple approach performs much better than it should.

What are interaction net superpositions?

Interaction Nets are a new model of computation capable of evaluating lambdas optimally. HVM is a practical implementation of that system. Since INets are fully linear, variables must be "cloned" via "dup-nodes", which can be seen as "lazy cloners". Usually, these can only be linked to variables, and are "invisible" to the user. But what happens if we expose dup nodes as a first-class feature of the user-facing language?

Turns out that they behave like "superposed values", which can be seen as a new programming feature that semantically means "this expression has two values at the same time". In HVM, the superposition of X and Y is written as {X Y}. When you apply a function to a superposed value, as in, F({X Y}), F will be executed for each input, X and Y. So, for example, double({2 3}) computes to {4 6}. What makes this interesting, though, is that these calls are "overlapped", in a way that shares computations, giving you a huge performance boost.
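The semantics (though not the sharing) can be modeled in any language. Below is a toy Python sketch where a superposition {X Y} is just a list and application maps over every branch; the names sup and apply_sup are made up for illustration:

```python
# Toy model of superposed values: a superposition {X Y} is a plain list,
# and applying a function maps it over every branch. This captures only
# the meaning ("two values at once"), not HVM's sharing of work.

def sup(*values):
    """Build a superposition of the given values."""
    return list(values)

def apply_sup(f, x):
    """Apply f to a superposed value, yielding a superposed result."""
    return [f(v) for v in x]

def double(n):
    return n * 2

print(apply_sup(double, sup(2, 3)))  # [4, 6], i.e. {4 6}
```

The crucial difference is that in HVM the branches are overlapped, so work common to both is performed once rather than twice.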

What are its applications?

1. Fast function calls

When we apply a function to a tree of superposed values, it will behave like a "map", and be applied to each value. For example, if F(x) = x * 10, then F({1 2 3 4}) is seen as {F(1) F(2) F(3) F(4)}, which reduces to {10 20 30 40}, with the advantage of computations being shared optimally across different calls. For example, suppose that F has a heavy computation inside it:

def F(x):
    return x + slow(999999)

Here, slow(999999) would take a long time to compute. Yet, when F is applied to superposed values, that sub-expression would only be computed once, saving a lot of work. This is a very general behavior. For example, sub-expressions that depend on the inputs (and, thus, can't be computed once) will still be "partly computed only once", as much as possible, making your program much faster.
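HVM performs this sharing automatically; in Python we can only emulate it by hand. The sketch below hoists the input-independent sub-expression so it runs once across all four "branches" (the slow function and its argument are stand-ins, made up for illustration):

```python
# Emulating HVM's sharing manually: the input-independent sub-expression
# slow(...) is hoisted out and computed once, mirroring what HVM does
# automatically when F is applied to a superposed value.

calls = 0

def slow(n):
    global calls
    calls += 1
    return sum(range(n))  # stand-in for an expensive computation

def F(x, shared):
    return x + shared

shared = slow(1000)  # computed once, shared by every branch
results = [F(x, shared) for x in [1, 2, 3, 4]]
print(results, "- slow() ran", calls, "time(s)")
```

A naive map would call slow once per branch; the point of superpositions is that this hoisting (and finer-grained partial sharing) happens without the programmer doing it by hand.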

2. Automated vectorization

Another application is automated vectorization. If we compile eliminators and destructors of datatypes to lambdas with different labels, these behave like superpositions across different types. As such, if we evaluate, say, (Not (Pair True False)), it will reduce to (Pair False True). The Not eliminator "passes through" the Pair constructor, and only interacts with the bools.

3. Search

Now, for the most interesting part; and what made me hyped enough to make this post; we can also use superposition to search. For example, if we evaluate (And {True False} {True False}), it will output {{True False} {False False}}, which is the truth-table for the And function. In other words, applying a function to a superposition of its complete domain results in a superposition of its complete image.
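The domain-to-image claim can be checked directly in any language by mapping over the full domain. A plain Python sketch (without the sharing HVM provides):

```python
# Mapping And over a superposition of its complete domain yields a
# superposition of its complete image: the truth table.

def and_(a, b):
    return a and b

domain = [True, False]  # the complete domain of each argument
table = [[and_(a, b) for b in domain] for a in domain]
print(table)  # [[True, False], [False, False]], i.e. {{True False} {False False}}
```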

So, what happens if we're searching for a specific output? Turns out we can use superpositions to perform this "brute-force" search without loops. In other words, instead of:

for x0 in [T, F]:
    for x1 in [T, F]:
        for x2 in [T, F]:
            ...
            print F(x0, x1, x2, ...)

We do this instead:

print F({T,F}, {T,F}, {T,F}, ...)

As you may have expected, the latter will share a lot of computation across "guesses", resulting in a search that is much faster than a naive brute-force search would be. We exploit this technique below to find the solution of a SAT problem in HVM: a brute-force Rust program takes 3 minutes to solve a random instance with 32 variables, while HVM solves the same instance in 1 second using superpositions. An example SAT solver in HVM, for a random instance with 16 variables, follows.
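For reference, here is the naive search that the superposed call replaces, sketched in Python. The three-variable instance is made up for illustration (it is not the 16-variable instance used in the HVM code):

```python
# Naive brute-force 3-SAT: enumerate every assignment and test the formula.
# Clauses are lists of literals (variable_index, polarity); a literal holds
# when the assigned value equals its polarity.
from itertools import product

clauses = [  # hypothetical tiny instance
    [(0, True), (1, False), (2, True)],
    [(0, False), (1, True), (2, True)],
    [(1, True), (2, False), (0, True)],
]

def satisfied(assignment, clauses):
    return all(
        any(assignment[i] == pol for i, pol in clause)
        for clause in clauses
    )

solutions = [
    a for a in product([True, False], repeat=3)
    if satisfied(a, clauses)
]
print(solutions)
```

This enumerates all 2^n assignments with no sharing between them, which is exactly the redundancy that superposed inputs eliminate.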

// Bool/List constructors and functions
(Cons h t)  = λcλn(c h (t c n))
Nil         = λcλn(n)
T           = λt λf t
F           = λt λf f
(Id a)      = a
(Not a)     = λt λf (a f t)
(And a b)   = (a λx(x) λx(F) b)
(Or a b)    = (a λx(T) λx(x) b)
(Or3 a b c) = (Or a (Or b c))

// Pretty prints a bool
(Bool x) = (x 1 0)

// Converts a solution to a pretty-printed church-list singleton
(Log x0 x1 x2 x3 x4 x5 x6 x7 x8 x9 xA xB xC xD xE xF x) = (x (Cons λt(t
  (Bool x0) (Bool x1) (Bool x2) (Bool x3)
  (Bool x4) (Bool x5) (Bool x6) (Bool x7)
  (Bool x8) (Bool x9) (Bool xA) (Bool xB)
  (Bool xC) (Bool xD) (Bool xE) (Bool xF)
) Nil) Nil)

// A random 3-SAT instance with 16 variables
(Foo x0 x1 x2 x3 x4 x5 x6 x7 x8 x9 xA xB xC xD xE xF) =
  (Log x0 x1 x2 x3 x4 x5 x6 x7 x8 x9 xA xB xC xD xE xF
  (And (Or3 xC x8 (Not xB))
  (And (Or3 (Not xA) (Not x2) (Not x8))
  (And (Or3 xB (Not x9) x4)
  (And (Or3 xA x1 (Not x8))
  (And (Or3 (Not x9) x0 x5)
  (And (Or3 (Not x4) (Not x4) xD)
  (And (Or3 (Not x1) (Not x4) xB)
  (And (Or3 (Not x2) xE (Not xD))
  (And (Or3 xD (Not x8) (Not x7))
  (And (Or3 xD (Not x1) x7)
  (And (Or3 (Not x8) x4 x8)
  (And (Or3 x5 (Not x0) xC)
  (And (Or3 x1 x0 (Not x3))
  (And (Or3 x3 xE (Not xD))
  (And (Or3 x9 (Not xC) (Not xB))
  (And (Or3 x8 (Not xE) (Not x5))
  (And (Or3 (Not x7) (Not x9) (Not xF))
  (And (Or3 xF xB x2)
  (And (Or3 (Not x2) (Not xE) (Not x7))
  (And (Or3 (Not xE) x3 (Not x3))
  (And (Or3 x3 (Not xF) x1)
  (And (Or3 (Not x0) x0 xD)
  (And (Or3 (Not x8) (Not x3) xC)
  (And (Or3 xC (Not x1) x7)
  (And (Or3 x2 (Not xE) (Not x0))
  (And (Or3 x6 (Not x5) x1)
  (And (Or3 (Not xC) x3 (Not x3))
  (And (Or3 (Not x1) (Not xC) (Not x5))
  (And (Or3 xB xA x6)
  (And (Or3 xF x6 x9)
  (And (Or3 (Not xF) x3 (Not x9))
  (And (Or3 (Not xB) x1 x8)
  (And (Or3 x9 (Not xE) xE)
  (And (Or3 (Not xA) (Not x2) x2)
  (And (Or3 x5 xE (Not x2))
  (And (Or3 (Not xB) xC x2)
  (And (Or3 x1 xB (Not x2))
  (And (Or3 x8 (Not x6) (Not x7))
  (And (Or3 xD x9 (Not xA))
  (And (Or3 x0 x8 (Not x8))
  (And (Or3 (Not xA) x9 (Not x0))
  (And (Or3 (Not xE) (Not x7) (Not xC))
  (And (Or3 x9 xC (Not xA))
  (And (Or3 (Not x7) (Not xF) xD)
  (And (Or3 x0 (Not x2) xD)
  (And (Or3 xC (Not x9) (Not x0))
  (And (Or3 (Not xA) x8 (Not x1))
  (And (Or3 x4 x5 (Not xE))
  (And (Or3 (Not x0) x5 (Not x7))
  (And (Or3 (Not x1) (Not xC) xA)
  (And (Or3 x0 xF x9)
  (And (Or3 (Not xA) x3 (Not x2))
  (And (Or3 (Not x7) (Not xF) x6)
  (And (Or3 (Not x1) x4 (Not x5))
  (And (Or3 xC (Not x8) x2)
  (And (Or3 x4 x7 (Not x5))
  (And (Or3 (Not x9) (Not x0) (Not xA))
  (And (Or3 xC x1 (Not x2))
  (And (Or3 xB xE (Not xB))
  (And (Or3 xF x5 xA)
  (And (Or3 (Not x6) xE x3)
  (And (Or3 (Not xB) xB x1)
  (And (Or3 (Not xF) x0 x7)
       (Or3 xE (Not x3) (Not x2))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))

// Collapsers (to retrieve solutions from superpositions) - credit to @Franchu
Join = λaλbλcλn(a c (b c n))
Col0 = λx let #A0{x0 x1} = x; (Join x0 x1)
Col1 = λx let #A1{x0 x1} = x; (Join x0 x1)
Col2 = λx let #A2{x0 x1} = x; (Join x0 x1)
Col3 = λx let #A3{x0 x1} = x; (Join x0 x1)
Col4 = λx let #A4{x0 x1} = x; (Join x0 x1)
Col5 = λx let #A5{x0 x1} = x; (Join x0 x1)
Col6 = λx let #A6{x0 x1} = x; (Join x0 x1)
Col7 = λx let #A7{x0 x1} = x; (Join x0 x1)
Col8 = λx let #A8{x0 x1} = x; (Join x0 x1)
Col9 = λx let #A9{x0 x1} = x; (Join x0 x1)
ColA = λx let #AA{x0 x1} = x; (Join x0 x1)
ColB = λx let #AB{x0 x1} = x; (Join x0 x1)
ColC = λx let #AC{x0 x1} = x; (Join x0 x1)
ColD = λx let #AD{x0 x1} = x; (Join x0 x1)
ColE = λx let #AE{x0 x1} = x; (Join x0 x1)
ColF = λx let #AF{x0 x1} = x; (Join x0 x1)

// Finds a solution by applying Foo to superposed inputs
Main = 
  (Col0 (Col1 (Col2 (Col3
  (Col4 (Col5 (Col6 (Col7
  (Col8 (Col9 (ColA (ColB
  (ColC (ColD (ColE (ColF
  (Foo
    #A0{T F} #A1{T F} #A2{T F} #A3{T F}
    #A4{T F} #A5{T F} #A6{T F} #A7{T F}
    #A8{T F} #A9{T F} #AA{T F} #AB{T F} 
    #AC{T F} #AD{T F} #AE{T F} #AF{T F}
  )))))))))))))))))

Discussion: https://twitter.com/VictorTaelin/status/1744788091833917696

@Hirrolot
Copy link

Hirrolot commented Jan 9, 2024

Thinking out loud: if we perform the first Futamura projection, namely partially evaluate the optimal evaluator on the fixed program that solves SAT, we will obtain an efficient executable that solves SAT without interpretive overhead. (So Main from the code you've suggested would be dynamic data that "comes" at run-time, while all the other definitions are static.)

@hyperswine
Copy link

interesting

@diogomqbm
Copy link

exciting!

@MarisaKirisame
Copy link

My guess is it is doing BCP but not CDCL.
I think it is possible to construct instances where brute force sucks but BCP is very fast, and instances where BCP sucks but CDCL is very fast.
Of course, the deep question is then - how do we get CDCL without actually 'coding' it? Is it even possible?

@Trebor-Huang
Copy link

This algorithm is equivalent to a weakened conflict driven clause learning. The weakening lies in the fact that the binary boolean operators are biased. Or True _ and Or _ True cannot both simplify to True, by the nature of interaction nets. I haven't really thought this through, so there may be gaps.

Since inets are stepwise confluent, all evaluation orders result in equal time consumption. Let's look at a particular order.

  • We first push the dup node for the first variable all the way down. Then we have two copies of Foo where x1 is substituted by True/False respectively and we may simplify the expression as far as possible. Note that the aforementioned weakness kicks in here.
  • We then push the second dup node down, etc. This is essentially a backtracking algorithm, but parallelizable. At one point there will be enough information such that the function evaluates to False. If I understand the collapsers correctly, at this stage all the variables except those causing the conflict will be hit by erase nodes, which prevents the backtracking from ever trying this variable combination again. Therefore it effectively implements non-chronological backtracking.
  • Again, such a process may be stalled by the fact that boolean operators don't always simplify, and although the total time consumed is fixed, the algorithm may use parallelism to reduce time (in this case it will be at the cost of duplicating Foo more times, hence higher memory consumption).

@Trebor-Huang
Copy link

Also, it is prohibitively difficult for me to view any discussion on Twitter, so I would really hope people stop doing that.

@VictorTaelin
Copy link
Author

@Trebor-Huang what do you suggest instead? Twitter has been the only place where it makes sense to publish these quick notes and updates on the progress of my work. Reddit is problematic because most posts get ignored or go unseen, and, when they do get upvoted, people assume and expect high-effort work - many people complained about me publishing this note because it doesn't contain a deep analysis!

@Trebor-Huang
Copy link

Trebor-Huang commented Jan 11, 2024

Gists seem good enough, don't you think? Or you can use a blog with comment features (hosted using Github PRs, for example). I personally use mastodon for social media, but I don't want to force anybody to move there.

@VictorTaelin
Copy link
Author

Yes, but I mean, if I just post a Gist without publicizing it, nobody will see it. On Twitter I feel comfortable publishing anything that isn't well-polished enough to justify a Reddit thread.

@Trebor-Huang
Copy link

I mean, you could still post it on twitter, but I would really appreciate it if you encourage people to discuss here, instead of in a twitter thread.

@FranchuFranchu
Copy link

Also, it is prohibitively difficult for me to view any discussion on Twitter, so I would really hope people stop doing that.

@Trebor-Huang I have this problem too. What I've been doing is regularly visit https://nitter.net/VictorTaelin to see what he posts. Nitter is an alternate Twitter frontend that requires no account.

@MarisaKirisame
Copy link

Very weird that it does some CDCL but is also parallel.

@AHartNtkn
Copy link

Can these lines

(And a b)   = (a λx(x) λx(F) b)
(Or a b)    = (a λx(T) λx(x) b)

be simplified to

(And a b)   = (a b F)
(Or a b)    = (a T b)

?

@jduey
Copy link

jduey commented Jan 26, 2024

Here's an optimized version that runs much faster.

(Cons h t)  = λcλn(c h (t c n))
Nil         = λcλn(n)
T           = λt λf t
F           = λt λf f
(Id a)      = a
(Not a)     = λt λf (a f t)
(And a b)   = (a λx(x) λx(F) b)
(Or a b)    = (a λx(T) λx(x) b)
(Or3 a b c) = (Or a (Or b c))

// Pretty prints a bool
(Bool x) = (x 1 0)

// Converts a solution to a pretty-printed church-list singleton
(Log x) = (x (Cons λt(t
  (Bool x0) (Bool x1) (Bool x2) (Bool x3)
  (Bool x4) (Bool x5) (Bool x6) (Bool x7)
  (Bool x8) (Bool x9) (Bool xA) (Bool xB)
  (Bool xC) (Bool xD) (Bool xE) (Bool xF)
) Nil) Nil)

x0 = #A0{T F} 
x1 = #A1{T F} 
x2 = #A2{T F} 
x3 = #A3{T F} 
x4 = #A4{T F} 
x5 = #A5{T F} 
x6 = #A6{T F} 
x7 = #A7{T F} 
x8 = #A8{T F} 
x9 = #A9{T F} 
xA = #AA{T F} 
xB = #AB{T F} 
xC = #AC{T F} 
xD = #AD{T F} 
xE = #AE{T F} 
xF = #AF{T F} 

// A random 3-SAT instance with 16 variables
Foo =
  (Log
  (And (Or3 x0 x9 xF)
  (And (Or3 x0 x1 (Not x3))
  (And (Or3 x0 x5 (Not x9))
  (And (Or3 x0 x7 (Not xF))
  (And (Or3 x0 (Not x2) xD)
  (And (Or3 (Not x0) x2 (Not xE))
  (And (Or3 (Not x0) x5 xC)
  (And (Or3 (Not x0) x5 (Not x7))
  (And (Or3 (Not x0) x9 (Not xA))
  (And (Or3 (Not x0) (Not x9) xC)
  (And (Or3 (Not x0) (Not x9) (Not xA))
  (And (Or3 x1 x3 (Not xF))
  (And (Or3 x1 x8 (Not xB))
  (And (Or3 x1 (Not x2) xB)
  (And (Or3 x1 (Not x2) xC)
  (And (Or3 x1 (Not x5) x6)
  (And (Or3 x1 (Not x8) xA)
  (And (Or3 (Not x1) x7 xD)
  (And (Or3 (Not x1) x7 xC)
  (And (Or3 (Not x1) x8 (Not xA))
  (And (Or3 (Not x1) x4 (Not x5))
  (And (Or3 (Not x1) xA (Not xC))
  (And (Or3 (Not x1) (Not x4) xB)
  (And (Or3 (Not x1) (Not x5) (Not xC))
  (And (Or3 x2 xB xF)
  (And (Or3 x2 (Not x8) xC)
  (And (Or3 x2 (Not xB) xC)
  (And (Or3 (Not x2) x3 (Not xA))
  (And (Or3 (Not x2) x5 xE)
  (And (Or3 (Not x2) (Not x3) xE)
  (And (Or3 (Not x2) (Not x8) (Not xA))
  (And (Or3 (Not x2) (Not xD) xE)
  (And (Or3 (Not x2) (Not x7) (Not xE))
  (And (Or3 x3 (Not x6) xE)
  (And (Or3 x3 (Not x9) (Not xF))
  (And (Or3 x3 (Not xD) xE)
  (And (Or3 (Not x3) (Not x8) xC)
  (And (Or3 x4 x5 (Not xE))
  (And (Or3 x4 (Not x5) x7)
  (And (Or3 x4 (Not x9) xB)
  (And (Or  (Not x4) xD)
  (And (Or3 x5 xA xF)
  (And (Or3 (Not x5) x8 (Not xE))
  (And (Or3 x6 x9 xF)
  (And (Or3 x6 xA xB)
  (And (Or3 x6 (Not x7) (Not xF))
  (And (Or3 (Not x6) (Not x7) x8)
  (And (Or3 (Not x7) xD (Not xF))
  (And (Or3 (Not x7) (Not x8) xD)
  (And (Or3 (Not x7) (Not x9) (Not xF))
  (And (Or3 (Not x7) (Not xC) (Not xE))
  (And (Or3 x8 (Not xB) xC)
  (And (Or3 x9 (Not xB) (Not xC))
  (And (Or3 x9 xD (Not xA))
       (Or3 x9 (Not xA) xC))
  // (And (Or3 (Not x8) x4 x8)
  // (And (Or3 (Not xE) x3 (Not x3))
  // (And (Or3 (Not x0) x0 xD)
  // (And (Or3 (Not xC) x3 (Not x3))
  // (And (Or3 x9 (Not xE) xE)
  // (And (Or3 (Not xA) (Not x2) x2)
  // (And (Or3 x0 x8 (Not x8))
  // (And (Or3 xB xE (Not xB))
  // (And (Or3 (Not xB) xB x1)
	))))))))))))))))))))))))))))))))))))))))))))))))))))))
	// )))))))))

// Collapsers (to retrieve solutions from superpositions) - credit to @Franchu
Join = λaλbλcλn(a c (b c n))
Col0 = λx let #A0{x0 x1} = x; (Join x0 x1)
Col1 = λx let #A1{x0 x1} = x; (Join x0 x1)
Col2 = λx let #A2{x0 x1} = x; (Join x0 x1)
Col3 = λx let #A3{x0 x1} = x; (Join x0 x1)
Col4 = λx let #A4{x0 x1} = x; (Join x0 x1)
Col5 = λx let #A5{x0 x1} = x; (Join x0 x1)
Col6 = λx let #A6{x0 x1} = x; (Join x0 x1)
Col7 = λx let #A7{x0 x1} = x; (Join x0 x1)
Col8 = λx let #A8{x0 x1} = x; (Join x0 x1)
Col9 = λx let #A9{x0 x1} = x; (Join x0 x1)
ColA = λx let #AA{x0 x1} = x; (Join x0 x1)
ColB = λx let #AB{x0 x1} = x; (Join x0 x1)
ColC = λx let #AC{x0 x1} = x; (Join x0 x1)
ColD = λx let #AD{x0 x1} = x; (Join x0 x1)
ColE = λx let #AE{x0 x1} = x; (Join x0 x1)
ColF = λx let #AF{x0 x1} = x; (Join x0 x1)

// Finds a solution by applying Foo to superposed inputs
Main = 
  (Col0 (Col1 (Col2 (Col3
  (Col4 (Col5 (Col6 (Col7
  (Col8 (Col9 (ColA (ColB
  (ColC (ColD (ColE (ColF
  (Foo)))))))))))))))))

@jduey
Copy link

jduey commented Jan 26, 2024

Sorting the clauses by bit position and then making each bit superposition be a ref really cut down on the number of rewrites needed. There might be more gains to be had if you bailed after finding a single solution. But maybe not. Nothing is as it seems with HVM and only concrete examples give guidance right now. It would be interesting to see if this corresponds with some known SAT optimization.

@JonathanPlasse
Copy link

Yes but I mean, if I just post a Gist without publishing it, nobody will see. On Twitter I feel comfortable to publish anything that isn't well-polished enough to justify a Reddit thread.

@VictorTaelin, @Trebor-Huang,
You could check out https://www.heymaven.com/.
This is an alternative social network where there are no likes or follows.
I discovered it just today from SethBling, the YouTuber who made MarI/O.
Ten years later, he interviewed the man behind NEAT, the evolutionary algorithm he used to play Mario.
It was very interesting; here is the video link: https://www.youtube.com/watch?v=5zg_5hg8Ydo
