Skip to content

Instantly share code, notes, and snippets.

@lspector
Created December 9, 2011 17:27
Show Gist options
  • Star 2 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save lspector/1452484 to your computer and use it in GitHub Desktop.
Save lspector/1452484 to your computer and use it in GitHub Desktop.
Clojure code for a simple genetic programming system with trivial geography, for demonstration purposes.
;; Lee Spector (lspector@hampshire.edu) 20111018 - 20111121
(ns evolvegeo
(:require [clojure.zip :as zip]))
;; Like evolvefn.clj (https://gist.github.com/1335696), this this code
;; defines and runs a genetic programming system on the problem
;; of finding a function that fits a particular set of [x y] pairs.
;; Unlike evolvefn.clj, this code incorporates trivial geography
;; (http://hampshire.edu/lspector/pubs/trivial-geography-toappear.pdf).
;; The aim here is mostly to demonstrate how genetic programming can be
;; implemented in Clojure simply and clearly, and several things are
;; done in somewhat inefficient and/or non-standard ways. But this should
;; provide a reasonable starting point for developing more efficient/
;; standard/capable systems.
;; Note also that this code, as written, will not always find a solution.
;; There are a variety of changes that one might make to improve its
;; problem-solving performance on the given problem.
;; We'll use data from x^2 + x + 1 (the problem from chapter 4 of
;; http://www.gp-field-guide.org.uk/, although our gp algorithm won't
;; be the same, and we'll use some different parameters as well).
;; We'll use input (x) values ranging from -1.0 to 1.0 in increments
;; of 0.1, and we'll generate the target [x y] pairs algorithmically.
;; If you want to evolve a function to fit your own data then you could
;; just paste a vector of pairs into the definition of target-data instead.
(def target-data
(map #(vector % (+ (* % %) % 1))
(range -1.0 1.0 0.1)))
;; An individual will be an expression made of functions +, -, *, and
;; pd (protected division), along with terminals x and randomly chosen
;; constants between -5.0 and 5.0. Note that for this problem the
;; presence of the constants actually makes it much harder, but that
;; may not be the case for other problems.
;; We'll the functions and the arities in a map.
(def function-table (zipmap '(+ - * pd)
'(2 2 2 2 )))
(defn random-function
[]
(rand-nth (keys function-table)))
(defn random-terminal
[]
(rand-nth (list 'x (- (rand 10) 5))))
(defn random-code
[depth]
(if (or (zero? depth)
(zero? (rand-int 2)))
(random-terminal)
(let [f (random-function)]
(cons f (repeatedly (get function-table f)
#(random-code (dec depth)))))))
;; And we have to define pd (protected division):
(defn pd
"Protected division; returns 0 if the denominator is zero."
[num denom]
(if (zero? denom)
0
(/ num denom)))
;; We can now evaluate the error of an individual by creating a function
;; built around the individual, calling it on all of the x values, and
;; adding up all of the differences between the results and the
;; corresponding y values.
(defn error
[individual]
(let [value-function (eval (list 'fn '[x] individual))]
(reduce + (map (fn [[x y]]
(Math/abs
(- (value-function x) y)))
target-data))))
;; We can now generate and evaluate random small programs, as with:
;; (let [i (random-code 3)] (println (error i) "from individual" i))
;; To help write mutation and crossover functions we'll write a utility
;; function that extracts something from an expression and another that
;; inserts something into an expression.
(defn codesize [c]
(if (seq? c)
(count (flatten c))
1))
(defn at-index
"Returns a subtree of tree indexed by point-index in a depth first traversal."
[tree point-index]
(let [index (mod (Math/abs point-index) (codesize tree))
zipper (zip/seq-zip tree)]
(loop [z zipper i index]
(if (zero? i)
(zip/node z)
(if (seq? (zip/node z))
(recur (zip/next (zip/next z)) (dec i))
(recur (zip/next z) (dec i)))))))
(defn insert-at-index
"Returns a copy of tree with the subtree formerly indexed by
point-index (in a depth-first traversal) replaced by new-subtree."
[tree point-index new-subtree]
(let [index (mod (Math/abs point-index) (codesize tree))
zipper (zip/seq-zip tree)]
(loop [z zipper i index]
(if (zero? i)
(zip/root (zip/replace z new-subtree))
(if (seq? (zip/node z))
(recur (zip/next (zip/next z)) (dec i))
(recur (zip/next z) (dec i)))))))
;; Now the mutate and crossover functions are easy to write:
(defn mutate
[i]
(insert-at-index i
(rand-int (codesize i))
(random-code 2)))
(defn crossover
[i j]
(insert-at-index i
(rand-int (codesize i))
(at-index j (rand-int (codesize j)))))
;; We can see some mutations with:
;; (let [i (random-code 2)] (println (mutate i) "from individual" i))
;; and crossovers with:
;; (let [i (random-code 2) j (random-code 2)]
;; (println (crossover i j) "from" i "and" j))
;; During evolution we'll maintain the population as a sequence of
;; [program error] pairs.
(defn pair-with-errors
"Returns a vector of [program error] pairs."
[programs]
(vec (map #(vector % (error %)) programs)))
(defn select
"Returns the program of the best [program error] pair in a tournament
set with the specified size, location, and radius."
[prog-err-pairs tournament-size location radius]
(let [limit (count prog-err-pairs)
tournament-set (repeatedly
tournament-size
#(nth prog-err-pairs
(mod (+ location
(- (rand-int (inc (* radius 2)))
radius))
limit)))]
(first (first (sort #(< (second %1) (second %2)) (vec tournament-set))))))
;; Now we can evolve a solution by starting with a random population and
;; repeatedly evaluating, checking for a solution, and producing a new
;; population.
(defn evolve
[popsize]
(println "Starting evolution...")
(loop [generation 0
population (pair-with-errors (repeatedly popsize #(random-code 2)))]
(let [sorted (sort #(< (second %1) (second %2)) population)]
(println "Generation:" generation)
(println "Best error:" (second (first sorted)))
(println "Best program:" (first (first sorted)))
(println " Median error:" (second (nth sorted
(int (/ popsize 2)))))
(println " Average program size:"
(float (/ (reduce + (map count (map flatten (map first population))))
(count population))))
(if (< (second (first sorted)) 0.1) ;; good enough to count as success
(println "Success:" (first (first sorted)))
(recur
(inc generation)
(pair-with-errors
(for [i (range popsize)]
(let [operator (rand)
tsize 7
radius 10]
(cond (< operator 0.5) (mutate (select population tsize i radius))
(< operator 0.75) (crossover (select population tsize i radius)
(select population tsize i radius))
:else (select population tsize i radius))))))))))
;; Run it with a population of 1000:
(evolve 1000)
;; Exercises:
;; - Remove the numerical constants and see how this affects problem-solving
;; performance.
;; - Add the "inc" function (arity 1) and see how this affects problem-solving
;; performance.
;; - Run this on a different data set of your own choosing.
;; - Replace various hard-coded parameters with variables or arguments to
;; allow for easier experimentation with different parameter sets.
;; - Add additional functions of various arities to the function set and see
;; how this affects problem-solving performance.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment