Idiomatic way of implementing zipWithIndex
on List[A]
in FP is using a State Monad,
import scalaz.State
def zipWithIndex[A](list: List[A]): List[(A, Int)] =
list.foldLeft(State.state[Int, List[(A, Int)]](List.empty))({ (state, a) =>
for {
ls <- state
i <- State.get[Int]
_ <- State.put[Int](i + 1)
} yield (ls :+ (a, i))
}).eval(0)
There are two problems for this impl. First, it is appending item to the List
during fold
, which brings over performance penalty. The second one is more severe. Can you find out what that is? Let's try this implementation through console anyway,
scala> zipWithIndex(List.fill(10)('a'))
res1: List[(Char, Int)] = List((a,0), (a,1), (a,2), (a,3), (a,4), (a,5), (a,6), (a,7), (a,8), (a,9))
scala> zipWithIndex(List.fill(100)('a')).size
res2: Int = 100
so far so good. Now let's try a list with really large size,
scala> zipWithIndex(List.fill(100000)('a')).size
java.lang.StackOverflowError
at scalaz.IndexedStateT$$anonfun$flatMap$1.apply(StateT.scala:62)
at scalaz.IndexedStateT$$anon$10.apply(StateT.scala:95)
at scalaz.IndexedStateT$$anonfun$flatMap$1.apply(StateT.scala:62)
at scalaz.IndexedStateT$$anon$10.apply(StateT.scala:95)
at scalaz.IndexedStateT$$anonfun$flatMap$1.apply(StateT.scala:62)
at scalaz.IndexedStateT$$anon$10.apply(StateT.scala:95)
...
Ah - the nested flatMap
call on State
monad blows the stack space, in other words, Fold
on State
is not stack safe! Now that we know what the problem is, is there a way we can get away with it? In FP, whenever recursion raises issue of SOE, one should consider refactoring the code by trampolining it. Can we count on Trampoline
for the rescue in this case? Yes, we can!
First of all, we need a Monad Transformer of State
on Trampoline
. Let's implement that,
import scalaz.{ StateT, Free, Trampoline }
type TrampolineState[S, A] = StateT[Free.Trampoline, S, A]
object TrampolineState {
def apply[S, A](f: S => (S, A)) = StateT[Free.Trampoline, S, A] { s => Trampoline.done(f(s)) }
def get[S]: TrampolineState[S, S] = TrampolineState { s => (s, s) }
def put[S](s: S): TrampolineState[S, Unit] = TrampolineState { _ => (s, ()) }
}
Now, let's re-impement zipWithIndex
using TrampolineState
.
import scalaz.std.list._
import scalaz.syntax.foldable._
type M[A] = TrampolineState[Int, A]
def zipWithIndex2[A](list: List[A]): List[(A, Int)] =
list.foldLeftM[M, List[(A, Int)]](List.empty)({ (ls, a) =>
for {
i <- TrampolineState.get[Int]
_ <- TrampolineState.put[Int](i + 1)
} yield (ls :+ (a, i))
}).eval(0).run
Do not worry about what foldLeftM
is. It is just a generic version of foldLeft
to fold on top of type M
as long as M
itself is a monad. Ok, let's give our second draft a try,
scala> zipWithIndex2(List.fill(10)('a'))
res13: List[(Char, Int)] = List((a,0), (a,1), (a,2), (a,3), (a,4), (a,5), (a,6), (a,7), (a,8), (a,9))
scala> zipWithIndex2(List.fill(100)('a')).size
res14: Int = 100
so far so good - let's now try the one that blows the stack space for the 1st draft of zipWithIndex
,
scala> zipWithIndex2(List.fill(100000)('a')).size
res16: Int = 100000
Yay, so the new implementation is indeed stack safe now ! Note that it does take more than usual to have the result returned, since, remember, we are doing list appending all the time in the fold
operation.
But this implementation is still a little unsatisfying in that the type alias of M
seems to be unnecessary. Let's try if we can get rid of it,
scala> :paste
// Entering paste mode (ctrl-D to finish)
def zipWithIndex2[A](list: List[A]): List[(A, Int)] =
list.foldLeftM[({type M[x] = TrampolineState[Int, x]})#M, List[(A, Int)]](List.empty)({ (ls, a) =>
for {
i <- TrampolineState.get[Int]
_ <- TrampolineState.put[Int](i + 1)
} yield (ls :+ (a, i))
}).eval(0).run
// Exiting paste mode, now interpreting.
<console>:20: error: could not find implicit value for parameter M: scalaz.Monad[[x]scalaz.IndexedStateT[[A]scalaz.Free[Function0,A],Int,Int,x]]
list.foldLeftM[({type M[x] = TrampolineState[Int, x]})#M, List[(A, Int)]](List.empty)({ (ls, a) =>
Urgh the type projection does not work, as we are hitting the limitation of Scala compiler here :-(
(Miles Sabin recently opened a pull request (https://github.com/milessabin/si2712fix-plugin) to address SI-2712. If accepted, foldLeftM
should be able to be called without specifying types explicitly. Daniel Spiewak has a great gist to introduce it: https://gist.github.com/djspiewak/7a81a395c461fd3a09a6941d4cd040f2)
Is there still a way to reduce the boilerplate at all? Actually, yes - thanks to some very genius work done in scalaz
.
import scalaz.std.list._
import scalaz.syntax.traverse._
def zipWithIndex3[A](list: List[A]): List[(A, Int)] =
list.traverseU { a =>
for {
i <- TrampolineState.get[Int]
_ <- TrampolineState.put[Int](i + 1)
} yield (a, i)
} eval 0 run
traverseU
makes use of an implicit Unapply
instance to help compiler figure out the right type to traverse the List with. Check out Travis Brown's blog if you are interested in more details about that,
https://meta.plasm.us/posts/2015/07/11/roll-your-own-scala/