Skip to content

Instantly share code, notes, and snippets.

@valtih1978
Last active June 27, 2017 17:23
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save valtih1978/77e3ffde50a9a3cf19bf to your computer and use it in GitHub Desktop.
Save valtih1978/77e3ffde50a9a3cf19bf to your computer and use it in GitHub Desktop.
Describes how to exploint coin change algorithm + permutations for anagram search
package forcomp
object SicpAnagrams {
// Inplementing Odersky anagrams (FP Scala coursera) using SICP algorithm, http://mitpress.mit.edu/sicp/full-text/sicp/book/node16.html
// PART I: SICP count change algorithm
// Odersky recomends SICP book as the course resource material. Before taking the course,
// I studied it. Might be for this reason, I realized that the anagram assignment is the
// count change problem (exposed in SICP to demonstrate the power of recursion). Odersky
// upgrades it to teach recursion in Scala.
// The change problem finds out all integers (out of the given list) that sum up to a given
// sum. That is, if cc(4, List(1,2,3,5)) should produce 4 since the changes of 4 are
// 4 = 3+1 = 2+2 = = 2+1+1 = 1+1+1+1. Meantime, permutations 1+1+2, 1+2+1 and 2+1+1 are
// considered identical and only one of them contributes to the result.
// In SICP, count-change is defined as follows, (here is my Scala translation)
def cc(sum: Int, coins: List[Int]): Int = {
if (sum == 0) 1 else // match
if (sum < 0 || coins.isEmpty) 0 else
cc(sum, coins.tail) + cc(sum-coins.head, coins) // =choice declined + choice accepted
} //> cc: (sum: Int, coins: List[Int])Int
// there should be 4 solutions, 1+1+1+1 = 1+1+2 = 2+2 = 3+1 all = 4
cc(4, List(1, 2, 3, 5)) //> res0: Int = 4
// It can come in may variations, depending on what are our coin objects (and their values),
// what is counted as combination and whether we want to simple count the combinations or
// enumerate them (yes, the algorithm is constructive -- we can name all combinations
// that we counted!)
// Part II: Denominations vs. concrete coins
// The first question is if coins are exhaustible or not. SICP considers coins as denominations
// rather than concere objects so that they do not run out as you pay with them. This means that
// if you want to find out that sum, say 4, can be decomposed into ones, 5 = 1+1+1+1, it is
// sufficient to say cc(4, List(1)). Lookup with cc(5, List(1,1,1,1)) would be a stupid mistake
// since it says that you have 5 denominations of coin 1. It is however is possible to modify
// thetask that you cannot repeatedly pay with the same coin. Then it is very simple. Just replace
// coins => coins.tail in the last sum above:
var withRepetitions = true //> withRepetitions : Boolean = true
def accountRepetitions[A](coins: List[A]): List[A] = if (withRepetitions) coins else coins.tail
//> accountRepetitions: [A](coins: List[A])List[A]
def exhaustableCC(sum: Int, coins: List[Int]): Int = {
if (sum == 0) 1 else // match
if (sum < 0 || coins.isEmpty) 0 else
exhaustableCC(sum, coins.tail) + exhaustableCC(sum-coins.head, accountRepetitions(coins))
} //> exhaustableCC: (sum: Int, coins: List[Int])Int
withRepetitions = false
exhaustableCC(4, List(1, 2, 3, 5)) //> res1: Int = 1
// This is what was not clear to me in the anagram excesise, what is considered combination
// in this assignment: are words in the dictionary replacable when I pick then so that I pick
// the same word infinite amount of times or I can pick a word only once. That is, given sentence
// "aab", can I say that the "baa" is an anagram if dictionary contains only two words, "a" and
// "b"? I think that dictionary implies that words can be drawn unexhaustably, yet, when I star
// ted to fulfil this assignment and discovered the striking similarity with Odersky-advertized
// book, I was sure in opposite, then hesitated and asked a question, https://class.coursera.org/progfun-004/forum/thread?thread_id=1440
// Part III: enumeration of combinations instead of simple counting them
// That exercise asks to list all combinations of words whose letters add up to the letters of
// given sentence. The first thing is "list combinations" rather than count them. This can be
// fixed in two steps. First, convert recursion into accumulator
def aCC(sum: Int, coins: List[Int], acc:Int): Int = {
if (sum == 0) acc + 1 else // match
if (sum < 0 || coins.isEmpty) acc /*+ 0*/ else
aCC(sum-coins.head, accountRepetitions(coins), aCC(sum, coins.tail, acc))
} //> aCC: (sum: Int, coins: List[Int], acc: Int)Int
// results are the same
withRepetitions = true
aCC(4, List(1, 2, 3, 5), 0) //> res2: Int = 4
withRepetitions = false
aCC(4, List(1, 2, 3, 5), 0) //> res3: Int = 1
// Second, convert integer accumulator into List[List[Int]]. It will accumulate paths (lists of
// coins) that lead us to the solution
def collectChange(sum: Int, path: List[Int], coins: List[Int], acc: List[List[Int]]): List[List[Int]] = {
if (sum == 0) path :: acc else // match
if (sum < 0 || coins.isEmpty) acc /*+ 0*/ else {
val firstHalf = collectChange(sum, path, coins.tail, acc)
collectChange(sum-coins.head, coins.head :: path, accountRepetitions(coins), firstHalf) // choice taken
}
} //> collectChange: (sum: Int, path: List[Int], coins: List[Int], acc: List[List
//| [Int]])List[List[Int]]
withRepetitions = true
collectChange(4, Nil, List(1, 2, 3, 5), List())
//> res4: List[List[Int]] = List(List(1, 1, 1, 1), List(2, 1, 1), List(3, 1), L
//| ist(2, 2))
withRepetitions = false
collectChange(4, Nil, List(1, 2, 3, 5), List())
//> res5: List[List[Int]] = List(List(3, 1))
// You see now those combinations that we counterd earlier.
// Another thing that should not confuse you are the values. The coins have only one dimension whereas
// anagrams are addable per frequency components. That is, Odersky maps every word to its spectrum
// (he calls it "occurrences"): which says how many of each letter you have, e.g. "aab" would have
// specturm (2,1) and 0 for all other letters (zero occurrences should not be listed in the word spectrum).
// Specturm of a sentence is a sum of specturms of the words it consists of. Yes, the spectrums are
// (component-wise) additive and you can say that [1, -1] + [0, 5] = [1, 4]!
// You may have different words but they are anagrams since their letter occurrences match. This is
// as if your coins look different but still have the same deonminations.
// Now we can try adding "coins" (their specturms) from the dictionary and see if they sum up into our
// goal sentence specturm, in the same way as in SICP change-count! That is the same problem! Odersky has
// even specified the function subract(spectrum1, spectrum2), admitting that negative frequencies are
// possible if second argument has components larger than the first. That is exactly the (last) line
// of every our cc function, sum-c!
// Part 4: Generic count change
// Now we can look for the anagrams. But let's first factor out the generic part of SICP algorithm and
//equip it with user tuning so that user could choose which task is he going to solve with this
// combinatorial search.
abstract class CC_State[Choice, Result] {
def terminated(path: List[Choice], acc: Result): Option[(Result)] // if result is produced then terminate
def -(option: Choice): CC_State[Choice, Result]
protected def cc(path: List[Choice], options: List[Choice], acc: Result): Result =
terminated(path, acc) match {
case Some(result) => result // terminated
case None => options match {
case Nil => acc // no more options -- terminate
case c :: cs =>
(this-c).cc(c :: path, accountRepetitions(options), cc(path, cs, acc))
}
}
def cc(options: List[Choice]): Result
}
// We can now try how does the strategy work with simple counting
// Part 4.1: applying generic change to change counting
class CountChanges(sum: Int) extends CC_State[Int, Int] {
// our implementation ignores the path. We are interested in counting matches (acc) only.
override def terminated(path: List[Int], acc: Int) = {
if (sum == 0) Some(acc + 1) // budged converget to 0 -- we have got an anagram!
else if (sum < 0) Some(acc) // none was produced means that we have overshoot our budget -- no result
else None // not terminated -- keep on trying more coins
}
override def -(option: Int) = {
new CountChanges(sum - option) // next candidate to try, sum - dime_nomination
}
override def cc(options: List[Int]): Int =
cc(Nil, options, 0)
}
withRepetitions = true
new CountChanges(4) cc List(1, 2, 3, 5)
//> res6: Int = 4
withRepetitions = false
new CountChanges(4) cc List(1, 2, 3, 5)
//> res7: Int = 1
// and for path recording
// Part 4.3: applying generic coing change to collecting the change combinations
class CollectChanges(sum: Int) extends CC_State[Int, List[List[Int]]] {
// our implementation ignores the path. We are interested in counting matches (acc) only.
override def terminated(path: List[Int], acc: List[List[Int]]) = {
if (sum == 0) Some(path :: acc) // budged converget to 0 -- we have got an anagram!
else if (sum < 0) Some(acc) // none was produced means that we have overshoot our budget -- no result
else None // not terminated -- keep on trying more coins
}
override def -(option: Int) = {
new CollectChanges(sum - option) // next candidate to try, sum - dime_nomination
}
override def cc(options: List[Int]): List[List[Int]] =
cc(Nil, options, Nil)
}
withRepetitions = true
new CollectChanges(4) cc List(1, 2, 3, 5)
//> res8: List[List[Int]] = List(List(1, 1, 1, 1), List(2, 1, 1), List(3, 1), L
//| ist(2, 2))
withRepetitions = false
new CollectChanges(4) cc List(1, 2, 3, 5)
//> res9: List[List[Int]] = List(List(3, 1))
// Again, result is expexted. Now, we need some preliminary declarations for anagrams
// Part 4.4: applyging generic coin change to collecting anagrams
type Word = String
type Sentence = List[Word]
type Occurrences = List[(Char, Int)]
def wordOccurrences(w: Word): Occurrences = {
//w.groupBy(c => c.toLower).map({case (c, list) => (c -> list.length)}).toList
// It is more efficient to use default value, https://class.coursera.org/progfun-004/forum/thread?thread_id=1425
var map = Map[Char, Int]().withDefaultValue(0)
w.toLowerCase.foldLeft(map)((m, c) => m + (c -> (m(c) + 1))).toList
} //> wordOccurrences: (w: forcomp.SicpAnagrams.Word)forcomp.SicpAnagrams.Occurre
//| nces
wordOccurrences("aabc") //> res10: forcomp.SicpAnagrams.Occurrences = List((a,2), (b,1), (c,1))
def sentenceOccurrences(s: Sentence): Occurrences = wordOccurrences(s.flatten mkString "")
//> sentenceOccurrences: (s: forcomp.SicpAnagrams.Sentence)forcomp.SicpAnagrams
//| .Occurrences
sentenceOccurrences(List("aab", "aac")) //> res11: forcomp.SicpAnagrams.Occurrences = List((a,4), (b,1), (c,1))
// function used in the reduce step, notorious sum-c. I do not understand why Scala does not let me to
// put it into the the "-" method.
def subtract(x: Occurrences, y: Occurrences): Occurrences = {
// this implementation admits negatives
y.foldLeft(x.toMap.withDefaultValue(0)) {case (m, (k, v)) =>
val res = m(k) - v
if (res == 0) m - k else m.updated(k, res) // filter out empty items
}
}.toList //> subtract: (x: forcomp.SicpAnagrams.Occurrences, y: forcomp.SicpAnagrams.Occ
//| urrences)forcomp.SicpAnagrams.Occurrences
// Now, override the algorithm to collect word paths (senteces) that lead to specified sentece
// stpectrum by dressing it into anagram closes
class CollectAnagrams(s: Occurrences) extends CC_State[Word, List[Sentence]] {
def terminated(path: Sentence, acc: List[Sentence]) = {
if (s.length == 0) Some(path :: acc) // budged converget to 0 -- we have got an anagram
else if (s.toMap.values.exists(_ < 0)) Some(acc) // none was produced means that we have overshoot our budget -- no result
else None // not terminated -- keep on descending into the words
}
def -(option: Word) = new CollectAnagrams(subtract(s, wordOccurrences(option)))
def cc(options: List[Word]): List[Sentence] = cc(Nil, options, Nil)
}
withRepetitions = true
new CollectAnagrams(sentenceOccurrences(List("aab", "aac", "none"))) cc List("a", "b", "aac", "neon")
//> res12: List[forcomp.SicpAnagrams.Sentence] = List(List(neon, aac, b, a, a)
//| )
withRepetitions = false
new CollectAnagrams(sentenceOccurrences(List("aab", "aac", "none"))) cc List("a", "b", "aac", "neon")
//> res13: List[forcomp.SicpAnagrams.Sentence] = List()
// This generic implemenation is somewhat more concise than my alternative implementation, http://gist.github.com/valtih1978/534f853a13b51577164b
// That one seems to have too must structural decomposition, which allows for instance counting anagrams
// instead of collecting them, once you have anagram values and counting collector. This implementation
// however achieves conciseness at cost of coupling accumulator with value type.
// Purify anagram change. It looks much simpler without any genericness/customization.
// Part 4.5: anagram change algorithm puriried
def collectAnagrams(sum: Occurrences, path: Sentence, choices: Sentence, acc: List[Sentence]): List[Sentence] = {
if (sum.isEmpty) path :: acc else // match
choices match {
case c :: cs if sum.toMap.values.forall(_ >= 0) => {
val firstHalf = collectAnagrams(sum, path, cs, acc) // choice declained
collectAnagrams(subtract(sum, wordOccurrences(c)), c :: path, accountRepetitions(choices), firstHalf)} // choice taken
case _ => acc
}
} //> collectAnagrams: (sum: forcomp.SicpAnagrams.Occurrences, path: forcomp.Sic
//| pAnagrams.Sentence, choices: forcomp.SicpAnagrams.Sentence, acc: List[forc
//| omp.SicpAnagrams.Sentence])List[forcomp.SicpAnagrams.Sentence]
// test
withRepetitions = true
collectAnagrams(sentenceOccurrences(List("aab", "aac", "none")), Nil, List("a", "b", "aac", "neon"), Nil)
//> res14: List[forcomp.SicpAnagrams.Sentence] = List(List(neon, aac, b, a, a)
//| )
withRepetitions = false
collectAnagrams(sentenceOccurrences(List("aab", "aac", "none")), Nil, List("a", "b", "aac", "neon"), Nil)
//> res15: List[forcomp.SicpAnagrams.Sentence] = List()
// Part 5: Performance of Sicp Coin Change vs. Odersky Aanagram search
// You see, we may find the anagrams, using SICP coint change algorithm. The first thing that bothered
// me is that I had to increase stack size (creating another thread) http://stackoverflow.com/a/24468790/1083704
// since chainging anagram into ~50 000 "coins" (that was Oderski's) dictionary size in words means
// that we descend into recrusion that deep and may be deeper if we change with repetitions. Another
// thing that bothered me is that SICP algorithm considers changes 4+3 and 3+4 identical. It therefore
// produces only one of the anagrams. Yet, despite of that Odersy asked for all permutations. This sounds
// stupid if you do not have another algorithm in mind since, though we can permute the couple coin
// changes and runtime will not be increased since the most expensive part, looking through 50 000
// dictionary is over, this neverless adds complexity to the algorithm but no information to the user.
// The final thing is that our algorithm proceeded without using neither func combinations nor wordAnagrams
// /dictionaryByOccurrences and does not rely on sorted frequencies, which Odersky demanded. At this
// point I have resorted to the Internet for the right solution, https://gist.github.com/jkdeveyra/4030815.
// It does involve the Odersky recommeneded for-loops. Here is my translation of it to solve money cahnge
// problem
def for_loop(s: Int, coins: List[Int]): Int = {
def helper(s: Int): Int = {
if (s == 0) 1 else {
val canUse = (1 to s) filter ({ coins contains _})
canUse.foldLeft(0)((acc, c) => acc + helper(s-c))
}
}
helper(s)
} //> for_loop: (s: Int, coins: List[Int])Int
//repetitions do not matter here
for_loop(4, List(1, 2, 3, 5)) //> res16: Int = 7
// You see, we have got much more solutions because
// 4 = 1+1+1+1 = 2+1+1 = 3 + 1 = 2 + 2 is expanded with 3 more: 1+3, 1+2+1 and 1+1+2
// Which of the solutions is more efficient? Obviously SICP, since its search space is much smaller.
// It must be smaller because it does not consider some combinations while trying large dictionary.
// Right? Indeed, it occasionally enters recursion with negative sumes but quickly backtracks. Let's
// compare
def runTests (f: (String, Int, List[Int]) => Unit) {
//println ("bypassed the tests") ; return
List(
("4, (1,2,3,5)", 4, List(1,2,3,5)),
("25, (1 to 40)", 25, (1 to 40).toList),
("20, (1 to 40)", 20, (1 to 40).toList),
("15, (1 to 40)", 15, (1 to 40).toList),
("10, (1 to 40)", 10, (1 to 40).toList),
("10, (1 to 400)", 10, (1 to 400).toList),
("10, (1 to 4000)", 10, (1 to 4000).toList),
("10, (1 to 40000)", 10, (1 to 40000).toList)
) foreach (f.tupled (_))
// tests foreach {t => (compare _).tupled(t)}
} //> runTests: (f: (String, Int, List[Int]) => Unit)Unit
def performanceSICPvsOderskyLoops(title: String, sum: Int, coins: List[Int]) {
def format(n: Long): String = {
(n.toString.reverse.grouped(3).flatMap(s => ' ' + s) mkString "").reverse.trim
}
println(title)
var invocations: Long = 0
def sicp(s: Int, coins: List[Int]): Int = {
//println("cc " + (s, coins))
invocations += 1 ;
//var newInvocation = s :: coins ; if (invocations contains newInvocation) throw new Exception(newInvocation + " has repeated") ; invocations += newInvocation
if (s == 0) 1
else if (s < 0 || coins.isEmpty) 0
else sicp(s, coins.tail) + sicp(s-coins.head, coins)
}
val t = new Thread(null, new Runnable() {
override def run() {
val n = format(sicp(sum, coins))
println(" SICP = " + n + ", loops = " + format(invocations))
}
}, "good stack", 10 * 1000 * 1000)
t.start
t.join
var filtered: Long = 0 ; var recursions: Long = 0
def for_loop(s: Int, coins: List[Int]): Int = {
def helper(s: Int): Int = {
recursions += 1
if (s == 0) 1 else {
val canUse = (1 to s) filter ({ filtered += 1; coins contains _})
canUse.foldLeft(0)((acc, c) => acc + helper(s-c))
}
}
helper(s)
}
println(" Odersky = " + format(for_loop(sum, coins)) + ", filt " + format(filtered) + " , recur " + format(recursions))
} //> performanceSICPvsOderskyLoops: (title: String, sum: Int, coins: List[Int])
//| Unit
// for some reason scalac-compiled version hangs here
runTests(performanceSICPvsOderskyLoops) //> 4, (1,2,3,5)
//| SICP = 4, loops = 49
//| Odersky = 7, filt 8 , recur 15
//| 25, (1 to 40)
//| SICP = 1 958, loops = 491 423
//| Odersky = 16 777 216, filt 16 777 216 , recur 33 554 432
//| 20, (1 to 40)
//| SICP = 627, loops = 144 763
//| Odersky = 524 288, filt 524 288 , recur 1 048 576
//| 15, (1 to 40)
//| SICP = 176, loops = 36 561
//| Odersky = 16 384, filt 16 384 , recur 32 768
//| 10, (1 to 40)
//| SICP = 42, loops = 7 263
//| Odersky = 512, filt 512 , recur 1 024
//| 10, (1 to 400)
//| SICP = 42, loops = 77 103
//| Odersky = 512, filt 512 , recur 1 024
//| 10, (1 to 4000)
//| SICP = 42, loops = 775 503
//| Odersky = 512, filt 512 , recur 1 024
//| 10, (1 to 40000)
//| SICP = 42, loops = 7 759 503
//| Odersky = 512, filt 512 , recur 1 024
// You see, SICP is better indeed for the small dictionary sizes. When we have nominals from 1
// to 40 and ask to change some small sums, ranging from 10 to 25, we see that Odersky for-loops
// blow up in the number of produced combinations (and for loops/recursions executed). Amount
// of recursions in SICP also grows exponentially but with factor of x4 rather than x33 that we
// have for Odersky filtered loops. Odersky on the other hand performs much better in the case
// of large dictionaly of useless words. As it finds all 512 combinations, it stops growing as
// we add extreemly large coins in our dictionary of denominations. So, it might be a better
// solution for the assignment, indeed.
// Part 6: Filtering candidate words for performance
// However, we can revise why does Odersky outperform SICP. That is because of filtering.
// Instead of descending 50 000 times into stack (when rejecting the option, for instance)
// we are often left with a very small sum and considering all those choinces makes no sense.
// We can therefore combine both algorithms: make SICP but with preliminary denomination filtering
// so that we do not pass the useless denominations down the stack. This will also eliminate
// the stackoverflow condition. We then can build permutaions to satisfy odersky assignment.
def cc2(s: Int, coins: List[Int]): Int = {
if (s == 0) 1 else {
val canUse = coins filter (_ <= s)
canUse match {
case Nil => 0
case c :: cs => cc2(s, cs) + cc2(s-c, canUse)
}
}
} //> cc2: (s: Int, coins: List[Int])Int
cc2(4, List(1,2,3,5)) //> res17: Int = 4
def cc2Performance(title: String, sum: Int, coins: List[Int]): Any = {
var iterations: Long = 0
def cc2(s: Int, coins: List[Int]): Int = {
//println("cc " + (s, coins))
iterations += 1 ;
//var newInvocation = s :: coins ; if (invocations contains newInvocation) throw new Exception(newInvocation + " has repeated") ; invocations += newInvocation
if (s == 0) 1 else {
val canUse = coins filter (_ <= s)
canUse match {
case Nil => 0
case c :: cs => cc2(s, cs) + cc2(s-c, canUse)
}
}
}
val cc = cc2(sum, coins)
println("cc(" + title + " = " + cc + " with " + iterations + " iterations")
} //> cc2Performance: (title: String, sum: Int, coins: List[Int])Any
runTests(cc2Performance) //> cc(4, (1,2,3,5) = 4 with 21 iterations
//| cc(25, (1 to 40) = 1958 with 18591 iterations
//| cc(20, (1 to 40) = 627 with 5427 iterations
//| cc(15, (1 to 40) = 176 with 1367 iterations
//| cc(10, (1 to 40) = 42 with 277 iterations
//| cc(10, (1 to 400) = 42 with 277 iterations
//| cc(10, (1 to 4000) = 42 with 277 iterations
//| cc(10, (1 to 40000) = 42 with 277 iterations
// Beaten! Odersky says that his algorithm can be further imporved by caching the invocations.
// Repeated computation is still possible in the SICP solution, http://cs.stackexchange.com/a/28071/2879.
// Let's leave the caching for optional enthuziasts.
// Part 7: Permuting
// In order to fulfil the Odersky assignment, we need to expand our "coin changes" with permutations.
// Let's start taking our solution from Part 4.5 and pass only suitable coins into recursion
def sicp4Anagrams(sentence: Sentence, dictionary: Sentence): List[Sentence] = {
/**Leaves only choices which can be subtracted from sum without producing negative frequencies*/
// simple filter, do not use combinations/dictonaryByOccurrences.
def filterSuitableOccurrences(sum: Occurrences, choices: List[Word]): Sentence = choices filter {
choice => subtract(sum, wordOccurrences(choice)).forall({case (c, int) => int >= 0})
}
def aa2(sum: Occurrences, path: Sentence, coins: Sentence, acc: List[Sentence]): List[Sentence]= {
if (sum.isEmpty) path :: acc else {
filterSuitableOccurrences(sum, coins) match {
case Nil => acc
case c :: cs => {
val chained = aa2(sum, path, cs, acc)
aa2(subtract(sum, wordOccurrences(c)), c :: path,
//cs // without repetitions
coins // with repetitions
, chained)
}
}
}
}
// and permuting from http://valjok.blogspot.com/2014/07/permutations.html
def permutationsGeneric[A](chars: List[A], n: Int/* = chars.length*/): List[List[A]] = {
import scala.collection.immutable.Queue
type Set = Queue[Bubble]
type Bubble = (A, Int)
type Result = List[List[A]]
def siblings(set: (Bubble, Set), offset: Int, path: List[A], acc: Result): Result = set match {case ((char, avail), queue) =>
val children = descend(if (avail - 1 == 0) queue else queue.enqueue(char -> {avail-1}), char :: path, acc) // childern can reuse the symbol while if it is available
if (offset == 0) children else siblings(queue.enqueue((char, avail)).dequeue, offset - 1, path, children)
}
def descend(set: Set, path: List[A], acc: Result): Result = {
if (path.length == n) path :: acc else siblings(set.dequeue, set.size-1, path, acc)
}
val set = Queue[Bubble]((chars.toList groupBy (c => c) map {case (k, v) => (k, v.length)}).toList: _*)
val res = descend(set, List.empty, List.empty)
//println("there are " + res.length + " multiset " + n + "-permutations for " + set + " = " + res)
res
}
val sicp = aa2(sentenceOccurrences(sentence), Nil, dictionary, Nil)
sicp flatMap (result => permutationsGeneric(result, result.length)) // permute for extra results
} //> sicp4Anagrams: (sentence: forcomp.SicpAnagrams.Sentence, dictionary: forco
//| mp.SicpAnagrams.Sentence)List[forcomp.SicpAnagrams.Sentence]
//withRepetitions = true // replacements hardcoded to "true" for anagrams
assert(sicp4Anagrams(List("Linux"), List("lin", "ux")).toSet == List(List("lin", "ux"), List("ux", "lin")).toSet)
// Let's compare with a reference implementation from the Internet.
def sicpVsForAnagrams(sentence: Sentence, dictionary: Sentence) {
val forloop = OderskyWayAnagrams.sentenceAnagrams(sentence, dictionary)
val sicp = sicp4Anagrams(sentence, dictionary)
// length check to ensure that toSet does not collapse anything
assert(forloop.length == sicp.length && forloop.toSet == sicp.toSet, forloop + " != " + sicp)
println("both algorithms produced " + sicp)
} //> sicpVsForAnagrams: (sentence: forcomp.SicpAnagrams.Sentence, dictionary: f
//| orcomp.SicpAnagrams.Sentence)Unit
sicpVsForAnagrams(List("aab"), List("a", "b"))
//> both algorithms produced List(List(b, a, a), List(a, b, a), List(a, a, b))
//|
def main(args: Array[String]) {
OderskyWayAnagrams.main_internal(args, sicp4Anagrams _)
} //> main: (args: Array[String])Unit
// I will be thankful to anybody who will teach me how to avoid the withRepetitions variable.
// Using constant only implies that I must pass withRecursion in every cc recursion declaraion
// /invocation. This is tedious. I could curr the argument out, as http://scastie.org/5812
// exemplifies in part I. This makes withRepetitions visible to all recursive functions/calls.
// Nevertheless, it is also tedious (you basically need to create another wrapper function)
// and not amenable to passing part of the recursion state using objects, as I did here in
// Part 4, eliminates even this ephimerical advantage.
}
object OderskyWayAnagrams {
type Word = String
type Sentence = List[Word]
type Occurrences = List[(Char, Int)]
def wordOccurrences(w: Word): Occurrences = {
// w.groupBy(c => c.toLower).map({case (c, list) => (c -> list.length)}).toList.sorted
// It is more efficient to use default value, https://class.coursera.org/progfun-004/forum/thread?thread_id=1425
var map = Map[Char, Int]().withDefaultValue(0)
w.toLowerCase.foldLeft(map)((m, c) => m + (c -> (m(c) + 1))).toList.sorted
}
wordOccurrences("aabc")
def sentenceOccurrences(s: Sentence): Occurrences = {
wordOccurrences(s.flatten mkString "")
}
// Sortedness of occurrences must be maintained here, which was not necessary for mine SICP alorithm
def expand(acc: List[Occurrences], occurrence: (Char, Int)): List[Occurrences] = occurrence match {
case (c, int) => (for (i <- (0 to int) ; ao <- acc) yield (if (i == 0) ao else {(c, i) :: ao})).toList
}
val el: List[Occurrences] = List(List())
def combinations(occurrences: Occurrences): List[Occurrences] = {
occurrences.reverse.foldLeft(el) (expand)
}
def sentenceAnagrams(sentence: Sentence, dictionary: Sentence): List[Sentence] = {
val dictionaryByOccurrences: Map[Occurrences, List[Word]] = dictionary.groupBy(wordOccurrences(_))
def subtract(x: Occurrences, y: Occurrences): Occurrences = {
// Actually implementation must be that short but unfortunately Eclipse plugin
// does not support object references, http://scala-ide-portfolio.assembla.com/spaces/scala-ide/support/tickets/1002172-referencing--object--predefined-declarations
//SicpAnagrams.subtract(x, y).sortWith((a,b) => a. _1 < b._1)
y.foldLeft(x.toMap.withDefaultValue(0)) {case (m, (k, v)) =>
val res = m(k) - v
if (res == 0) m - k else m.updated(k, res) // filter out empty items
}
}.toList.sortWith((a,b) => a. _1 < b._1)
def computeAnagrams(occ: Occurrences): List[Sentence] = {
//println(occ + " consists of " + combinations(occ))
if (occ.isEmpty) List(List())
else
for {
x <- combinations(occ)
y <- dictionaryByOccurrences.getOrElse (x, List())
z <- computeAnagrams(subtract(occ, x))
} yield {
//println("yielding " + y + " :: " + z)
y :: z
}
}
computeAnagrams(sentenceOccurrences(sentence))
}
val sampleSentence = List("aab", "aac", "none")
val sampleDict = List("a", "b", "aac", "neon")
//sentenceAnagrams(sampleSentence, sampleDict)
//sentenceAnagrams(List("yes", "man"), List("yem", "san"))
sentenceAnagrams(List("a"), List("aa"))
sentenceAnagrams(List("a"), List("a"))
sentenceAnagrams(List("a"), List("a", "a"))
sentenceAnagrams(List("aa"), List("a"))
sentenceAnagrams(List("aa"), List("a", "a"))
def main_internal(args: Array[String], method: (Sentence, Sentence) => List[Sentence]) {
def mkList(str: String) = str.split(" +").toList
if (args.length < 2) {
println("supply a sentence to anagram. Another argument must be adictionary. If it is -l then third arg must be a word file")
println("example: \"" + (sampleSentence mkString " ") + " \" \"" + (sampleDict mkString " ") + "\"")
sys.exit(1)
}
val dict = if (args(1) == "-l") {
val res = scala.io.Source.fromFile(args(2)).getLines.toList
println("loaded dict of " + res.length + " words")
res
} else mkList(args(1))
val time1 = System.currentTimeMillis()
val result = method(args(0).split(" ").toList, dict)
val timeDelta = (System.currentTimeMillis() - time1) / 1000
println(timeDelta + " sec to compute " + result. length + " anagrams: ")
result map (list => list mkString(" ")) foreach println
//println("running " + mkList(args(0)) + " with dict = " + dict)
}
def main(args: Array[String]) {
main_internal(args, sentenceAnagrams _)
}
// Benchmarks on linux.txt, 2gb per heap:
// You see mine SICP-based algorithm is noticably faster though not orders of magnitude
// as I expected. It is also more concise since we can combine two well-known altorithms
// instead of writing a lot of custom code.
/*
1. yes man think joke (Sicp 91 sec to compute 1381098 anagrams) (OderskyWay 130 sec
2. yes man think joke (Sicp 86 sec to compute 1381098 anagrams) (OderskyWay 133 sec
3. yes man think joke (Sicp 93 sec to compute 1381098 anagrams) (OderskyWay 127 sec
4. yes man think joke (Sicp 91 sec to compute 1381098 anagrams) (OderskyWay 134 sec
5. yes man think joke (Sicp 85 sec to compute 1381098 anagrams) (OderskyWay 125 sec
1. welcom chopskick fa (Sicp 158 sec to compute 1381380 anagrams) (OderskyWay 274 sec
2. welcom chopskick fa (Sicp 162 sec to compute 1381380 anagrams) (OderskyWay 302 sec
3. welcom chopskick fa (Sicp 157 sec to compute 1381380 anagrams) (OderskyWay 287 sec
4. welcom chopskick fa (Sicp 160 sec to compute 1381380 anagrams) (OderskyWay 274 sec
5. welcom chopskick fa (Sicp 154 sec to compute 1381380 anagrams) (OderskyWay 278 sec
1. Odersky Anagrams (Sicp 164 sec to compute 4385844 anagrams) (OderskyWay 194 sec
2. Odersky Anagrams (Sicp 163 sec to compute 4385844 anagrams) (OderskyWay 189 sec
3. Odersky Anagrams (Sicp 161 sec to compute 4385844 anagrams) (OderskyWay 188 sec
4. Odersky Anagrams (Sicp 164 sec to compute 4385844 anagrams) (OderskyWay 176 sec
5. Odersky Anagrams (Sicp 167 sec to compute 4385844 anagrams) (OderskyWay 178 sec
*/
// SICP is 2x faster if character cases are not converted
}
// In 2017, based on powerset
def powerset(chars: List[Char]): List[String] = chars match {
case Nil => List("")
case x :: xs =>
val subPowerset = powerset(xs)
subPowerset.foldLeft(subPowerset){case (acc, str) => (x + str) :: acc}
} //> powerset: (chars: List[Char])List[String]
powerset("abc".toList) //> res4: List[String] = List(a, ac, abc, ab, b, bc, c, "")
// I have implemented combinations the long way
//https://www.coursera.org/learn/progfun1/discussions/weeks/6/threads/RkuD2z5iEead6Qo6D3cLkQ/replies/9Zbl_1TPEee3RwoqcUym2A/comments/dwHpm1qKEeeqKgpTZZjjFg
def combinations(chars: Occurrences): List[Occurrences] = chars match {
case Nil => List(Nil)
case (c, freq) :: tail =>
val subPowerset = combinations(tail)
subPowerset.foldLeft(subPowerset){case (acc, str) =>
(1 to freq).foldLeft(acc){case (acc, freq) => (c -> freq :: str) :: acc}
}
} //> combinations: (chars: worksheet6.Occurrences)List[worksheet6.Occurrences]
combinations(List('a'->1, 'b'->1)) //> res4: List[worksheet6.Occurrences] = List(List((a,1)), List((a,1), (b,1)),
//| List((b,1)), List())
combinations(List('a'->2, 'b'->1)) //> res5: List[worksheet6.Occurrences] = List(List((a,2)), List((a,1)), List((a
//| ,2), (b,1)), List((a,1), (b,1)), List((b,1)), List())
combinations(List('a'->2, 'b'->2)) //> res6: List[worksheet6.Occurrences] = List(List((a,2)), List((a,1)), List((a
//| ,2), (b,1)), List((a,1), (b,1)), List((a,2), (b,2)), List((a,1), (b,2)), Li
//| st((b,2)), List((b,1)), List())
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment