Instantly share code, notes, and snippets.

# djspiewak/0introduction.md

Last active November 28, 2023 15:03
Show Gist options
• Save djspiewak/2ae2570c8856037a7738 to your computer and use it in GitHub Desktop.
Scala Collections Proposal

# Collections Redesign Proposal

I'm going to start off by motivating what I'm doing here. And I want to be clear that I'm not "dissing" the existing collections implementation or anything as unproductively negative as that. It was a really good experiment, it was a huge step forward given what we knew back in 2.8, but now it's time to learn from that experiment and do better. This proposal uses what I believe are the lessons we can learn about what worked, what didn't work, and what is and isn't important about collections in Scala.

This is going to start out sounding really negative and pervasively dismissive, but bear with me! There's a point to all my ranting. I want to be really clear about my motivations for the proposal being the way that it is.

## Problems

### Generic Interfaces

Not. Good. At least, not in the way that they're used by the existing Scala collections library, or the Java collections library for that matter. Not good at all. Here's why:

```def foo(is: Seq[Int]): MyThingy = {
// ...
}```

What can we do with `is` here? Well, a better question is what can we sanely do, because the Scala collections library allows us to do a lot of things that we really shouldn't. For example, we can run `is :+ 42` and get a result of type `Seq[Int]`. Does that even make sense? What if someone chooses to pass in a `Seq` where prepend is efficient by append isn't, like `List`? What sort of collection do we get as a result? Does it share any structure with the input? I have no idea.

Here's an even better example. Let's fill out the implementation of this function a bit:

```def foo(is: Seq[Int]): MyThingy = {
val t = new MyThingy
for (idx <- 0 until is.length) {
t.accumulate(is(idx))
}
t
}```

If I showed you this code and asked you what the big-O complexity of it was, could you tell me the answer? No, you cannot, because it depends on the runtime type of `is`! Collections and complexity are inherently tied together. When you ask for a value of type `Seq`, what you're saying is that you want a "bunch of stuff", but you don't care about the efficiency of accessing or manipulating that stuff.

Look at it this way: do you ever create a value of type `Seq`? I mean, the `Seq` companion object does offer a constructor method. It's totally possible. And yet, we never ever do it. So why do we constantly ask for these values?

Choosing a collection is a very deliberate and intentional thing. We want to select our collection to match the complexity bounds of our problem. Encouraging (and even enabling!) users to work in terms of generic types that throw away this information is wrong, and very very much an anti-pattern.

There are some useful abstraction lines that we can draw, but something as blunt as `Seq` simply isn't going to cut it, and it encourages absolutely awful behavior.

And of course, the best part about all of this is that the above type signature isn't even right. If we really wanted to do this correctly, we would have to do something like the following:

`def foo[CC[a] <: Seq[a]](is: CC[Int])(implicit cbf: CanBuildFrom[CC, Int, CC]): MyThing = ...`

You get the picture.

#### `Set` vs `Set`

Have you ever gotten a type error that says something about "expected `Set[A]`, got `scala.collection.immutable.Set[A]`"? Yeah. That should never ever happen. The fact that this even exists is terrifying and confusing, and they are honestly the type errors that I dread the most when working with Scala (monad transformer stacks are easier to diagnose!).

These errors, of course, come from the parallel and inherited hierarchy between mutable and immutable collections. Frankly, I don't find this hierarchy to be even slightly useful. If I have a mutable collection, I don't want to pass it around masquerading behind a generic immutable interface. It's just not a helpful feature, and it causes a lot of headache. The mutable collections are necessary, and some of them (e.g. `ListBuffer` and `TrieMap`) are phenomenally useful, but in no way do they need to be part of any shared pantheon.

### Implementation Inheritence

Half of the performance issues in the current collections library stem from specific collection `A` inheriting an implementation that is shared with `B` and `C`, but which was written against `Iterable` and thus is not optimized for any of the above! The `map` function is an amazingly apt example of this. If you run `map` against `Vector`, it will create a new vector, incrementally, building from left to right and growing the trie through successive array copies, inserting the data with a head pointer each time. This is tremendously less efficient (in constant factors), than what could be done with a custom `map` implementation on `Vector`, which is to say allocate the entire trie at once, in place, and then mutate the data in a nice tight loop amenable to cache line optimization.

`map` is a trivial example, but you get the idea. The current collections library pervasively inherits implementations, mostly because it is tremendously convenient to do so, and all of those implementations fall into the category of "functional, but not great". As a heavy user of the collections library, this becomes a big problem, since I have to think very carefully about what methods are inherited and what methods are overridden and where the performance pitfalls might be. The `IList` type in Scalaz very notably replicates the functionality of `List`, but because it is using specialized functions that were implemented specifically for `IList`, its performance numbers are generally much better than those of `List`. And this is the most basic collection imaginable! (other than perhaps `Option`)

Implementation inheritence was originally done because of the large number of duplicated implementations between collections in 2.7, and the number of "gaps" in functionality, where a method you expected to exist everywhere only existed on certain collections, or returned the wrong type. This is a problem that is resolvable in two ways that are not implementation inheritence. First, if your implementations are really copy/paste copies of each other, then why do you have two separate collections? What is the difference between the two at that point? Second, what utilities do we really need and where do they make sense? Implementation inheritence gives us everything everywhere, and with the same type signatures, but that's not necessarily what we want.

### `CanBuildFrom`

You knew it was coming... I have never, ever called `map` on `BitSet` passing a function of type `Int => String`. Not even once. The majority of the schenanigans in `CanBuildFrom` exist to support the following three features:

• Bizarre `map`s on things
• `BitSet`
• `String`
• `SortedSet`
• `TypeTag` capturing on `Array`
• Mapping on `Range`

Honestly, I just don't need those features, and goals like making `String` and `Array` "collectiony" can be done in better ways, so long as we're not trying to inherit implementations. As for `Range`, that's a problem of specialization (our lazy collection type only works on numbers).

### Custom Collections

Have you ever tried to implement a new collection that sits in the collections hierarchy? I have. It's awful. It is unbelievably difficult to get right. And as a consequence, NO ONE DOES IT! Pretty much everyone just sits outside of the collections hierarchy and has conversion methods to and fro, because that's a radically easier thing to do. And so, part of the major selling point of `CanBuildFrom`, all of the `Like` traits and the inheritence, simply goes away! These aren't useful features because no one can use them, because they're far to complicated and arcane.

It's also worth noting that custom collection types have been hard-broken in the past, even in a way that cannot be cross-built around, with the conversion from `Traversable` to `GenTraversable` in 2.9. So the compatibility precedent isn't particularly good here, and seeing as I was basically the only one who complained about that particular breakage at the time, I feel confident in saying that very few (if any) would notice if we broke it again.

## Goals

Here's what we should strive for:

• Transparent efficiency, declared in the types (use `List` if you mean `List`; `Vector` if you mean `Vector`; etc)
• Implementation reuse of things that are trivially and efficiently defined in terms of other top-level functions
• `cbf() ++= xs` is not a "top-level" function
• `zip` defined in terms of `zipWith` is a good example
• `sum` defined in terms of `fold` is another good example
• It's very easy to go too far here. Only things that really make sense!
• Abstraction along higher-level boundaries, offering less and more tailored power
• If you just want to `map` over a collection, require `Functor`
• If you just want to reduce a collection (e.g. `sum`, etc), require `Foldable`
• Convenient and seamless syntax
• Convenient and seamless errors
• Types that you can put in front of a beginner without hand-waving about "use-case"
• Useful extensibility
• Defining your own collection that can be abstracted over in the same ways as built-ins
• Getting shared implementations "for free" where it makes sense (e.g. `zipWith`)
• Surface compatibility, with deprecation
• I hate this restriction, but I think it's necessary
• I will give the caveat though that, any code that uses `CanBuildFrom` explicitly, any code which uses `GenTraversable` or any of the crazy inheritence stuff, is to be considered "out of scope"
• Highly bizarre and never-used "misfeatures" (such as mapping a `BitSet` to `String`) are out of scope

The compatibility goal is really important, because every piece of Scala code in existence touches the collections library. However, I have no qualms about hurling vast quantities of deprecation warnings at people who are doing bad things that I want to get rid of. And I also have no qualms about breaking code that exploits internal machinery and implementation details of the current collections API, which is exactly what `CanBuildFrom` is.

## Implementation

Without further ado, have some code. Obviously this is a rough-ish sketch, but I tried to show enough to convince myself (and others) that it would be possible to do everything. The collections library is huge.

This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters. Learn more about bidirectional Unicode characters
 sealed trait List[+A] { def map[B](f: A => B): List[B] = this match { case head :: tail => f(head) :: (tail map f) case Nil => Nil } } object List { implicit object instances extends Applicative[List] { /* ... */ } } final case class ::[+A](head: A, tail: List[A]) extends List[A] case object Nil extends List[Nothing] // btw, we should have a specialized SingletonVector and EmptyVector instance, because tiny vectors are VERY common and VERY slow final class Vector[+A] private (trie: Array[AnyRef], val length: Int) { // note that we can even optimistically respecialize here! def map[B](f: A => B): Vector[B] = { val trie2 = allocateTrie(length) var i = 0 while (i < length) { // mutate in place! trie2(i) = trie(i) i += 1 } new Vector[B](trie2, length) } } object Vector { implicit object instances extends Applicative[List] { /* ... */ } } // I have ALWAYS wanted these object +: { def unapply[A](vec: Vector[A]): Option[(A, Vector[A])] = ??? } object :+ { def unapply[A](vec: Vector[A]): Option[(Vector[A], A)] = ??? } // type Map[A, B] = HashMap[A, B] final class HashMap[A, B] private (trie: Array[AnyRef], val size: Int) { def mapValues[C](f: B => C): HashMap[A, C] = ??? // could tuple it for compatibility if we wanted def map[C](f: (A, B) => C): HashMap[A, C] = ??? } object HashMap { // conspicuously absent here: there is no Functor[HashMap]! I'm not going to get all religious on this one, but it DOES get weird to implement // there would, of course, be a Foldable[HashMap], where fold = foldLeft = foldRight. } // typeclasses // Mappable/CanMap might be less scary of a name, but it doesn't matter trait Functor[F[_]] { def create[A](a: A): F[A] def map[A](fa: F[A])(f: A => B): F[B] } object Functor { def apply[F[_]](implicit F: Functor[F]) = F // oh yeah, I went there! implicit object ArrayFunctor extends Functor[Array] { // "but how?!", you ask // "ah, through the magic of respecialization!", I answer // // it is possible to efficiently inspect the runtime types involved here (notably, the functions) // and reconstruct the type tags we need to perform this operation. not only that, but the process // of doing this inspection allows us to safely cast our way into efficient, unboxed access in tight loops // // note that we have more information with which to do this sort of thing than we ever used to, // due to the magic of specialization on Function1 and the explicit nature of TypedTag. In other words, // we're not recreating the 2.7 arrays madness. we're simply getting sane arrays that finally interop // in the way you would expect, without imposing the assumptions and inefficiencies of a collections // framework on top of them def map[A, B](arr: Array[A])(f: A => B): Array[B] = ??? } } implicit class FunctorSyntax[F[_], A](self: F[A])(implicit F: Functor[F]) { def map[B](f: A => B): F[B] = Functor[F].map(self)(f) } // views // TODO insert jsuereth's proposal, because it's awesome and works perfectly here // note that in his proposal, his View would have a Foldable instance // if you want to force a view, you just use one of the reductions on Foldable (which could include stuff like toList/toVector, if we want) // ranges are just views, btw. this completely sidesteps the problem of mapping a Range with a function that returns String // the problem it introduces is that mapping ranges, in general, would be lazy. I don't see a way around this, unfortunately. // could catch it relatively easily with a compiler warning, but there's no question that it would be a breaking semantic change. // // foreach would continue to work as expected (since foreach would be a forcing reduction to Unit) // Zippable? trait Applicative[F[_]] extends Functor[F] { // I think this is a more accessible definition of applicative than apply2 // TODO actually this is not a very useful definition. revision coming soon... def zipWith[A, B, C](fa: F[A], fb: F[B])(f: (A, B) => F[C]): F[C] def zip[A, B](fa: F[A], fb: F[B]): F[(A, B)] = zipWith(fa, fb) { (_, _) } } // Traversable? trait Foldable[F[_]] { // folds in an unspecified order, for efficiency def fold[A, B](fa: F[A])(init: B)(f: (B, A) => B): B def foldLeft[A, B](fa: F[A])(init: B)(f: (B, A) => B): B def foldRight[A, B](fa: F[A])(init: B)(f: (A, B) => B): B def length[A](fa: F[A]): Int = ??? def sum[A](f: F[A])(implicit N: Numeric[A]): A = ??? // more reductions here! } trait Indexable[F[_]] extends Foldable[F] { def get[A](fa: F[A])(idx: Int): Option[A] // we will use `apply` for the syntax enrichment, for obvious reasons def unsafeGet(fa: F[A])(idx: Int): A = get(fa).get // TODO more efficient impl } // compatibility shims trait Seq[+A] { @deprecated def map[B](f: A => B): Seq[B] @deprecated def :+[B >: A](b: B): Seq[B] // etc } object Seq { implicit def lift[F[_]: Foldable: Functor, A](fa: F[A]): Seq[A] = new Seq[A] { // hey, if you want to do something like this, the efficiency is on your own head! def :+[B >: A](b: B): Seq[B] = lift(fa.foldLeft(Vector.empty[B]) { _ :+ _ } :+ b) // though we will take some small stab at efficiency here... (also there are dirty tricks we can play to make repeated +:/:+ on Seq faster) def +:[B >: A](b: B): Seq[B] = lift(b :: fa.foldRight(List.empty[B]) { _ :: _ }) // etc } } // we can define other shims analogously // arrays implicit class RichArray[A: TypeTag](val arr: Array[A]) extends AnyVal { // note: we can get REALLY cute here and do some extremely fun optimistic respecialization! // in other words, we can actually make this function absurdly more efficient than current impls, // even without compiler help! def map[B: TypeTag](f: A => B): Array[B] = ... } // we can define RichString in a similar way if we want to

Let's start at the top...

## Base Collection Types

`List` is a `List`. It's not `ListLike`. It's not `Iterable`. It's just a `List`. Now, for the first time, that "implement a `List`" example that you show in Scala training material can be "implement THE `List`"! That seems nice. Due to the insane niceness of `ListBuffer`, we could make `tail` a `var` and I wouldn't quibble too much.

`Vector` is a `Vector`. It is unconnected to `List` in any direct sense. Note that both `List` and `Vector` have their own `map` implementations. These implementations aren't overriding anything; nobody inherits `map`! You have to implement your own `map` if you want it, and in doing so, you can make it much more efficient. These implementations are significantly faster than the `map` implementations on these collection types in the existing library, and they even open up the possibility for further optimizations currently impossible in the library. For example, it's possible to fully specialize the tip `Array`(s) inside of `Vector` and then optimistically respecialize (without compiler help or type signature changes) the `map` function (and similar) to treat primitives specially, operating unboxed in `while` loops without loss of generality. How exciting would it be to have a functional, immutable `Vector` that beats `java.util.ArrayList` for primitive operations, but remains a general purpose collection? Pretty darn exciting.

I added extractors for `Vector`, because I've always wanted them and they make a heck of a lot more sense (and can be implemented much more efficiently) if they only work on `Vector`. Call it a wishlist.

`Map[A, B] = HashMap[A, B]`. One interesting thing here is that the `map` function on `HashMap` is different from `map` on `List` or `Vector`. For compatibility reasons, we could tuple this function without a problem, but the point is that we don't have to. Everyone has their own `map`, and functions like `zip` aren't even defined on `HashMap` (which is good, because I have no idea what that even means in the current library).

## Implicits and Reuse

Here's where we start getting into more interesting waters. While declaring a function that expects a `Seq` is not a useful thing to do, as a general rule, declaring a function that expects something for which there is a `map` operation is. This is where the typeclasses come into play. I named them `Functor`, `Foldable`, `Indexable`, and there would be a `Monad`, but I really don't care about the names. Call it `HasMap` or `Mappable` or something if it seems more user-friendly, and that's not a facetious suggestion! Usability is important.

This comes round to the goal of having abstraction defined along boundaries where it makes sense. `Seq` is not a sensible abstraction boundary, but `map`, `zip`, `foldLeft`, `sum`, and more are. We can also pick and choose more precisely which elements we need (do we need `zip`, or just `map`? do we really need `fold`?), as opposed to being stuck with the "all or nothing" that is inheritence.

This also gives us implementation reuse along reasonable lines. The `zip` implementation is trivially defined in terms of `zipWith` (which is how I defined `Applicative`, which you could call `Zippable` if you wanted). This sort of implementation reuse makes sense. Unless your collection happens to have some magically strange treatment of tuples, there would never really be a reason to have an implementation of `zip` that is distinct from the implementation of `zipWith`. So that's the litmus test. If the use-cases which falsify the sharing are really outlandish (but still achievable), then it's ok to share.

Critically, because we're using type classes and syntax enrichments (which I exemplify in my code), it is very easy to override these implementations and get the most-specific types, because the structure of the typeclass will force you to do so. We don't need `CanBuildFrom` or even a weaker variant of it to support sharing of implementation. We can rely on more traditional typeclasses.

## Views

@jsuereth's views proposal (implementing everything on top of `foldLeft`) is fantastic, though I would argue it would be better implemented on top of `fold` instead of `foldLeft`, to allow for more efficient traversals in some cases. His proposal also fits beautifully into my broader collections framework. In fact, a lot of his machinery would simply disappear by implementing `Foldable[View]`. Reductions are forcing, obviously, and their implementations are neatly reused since they're all defined in terms of `fold`!

On another sidebar, I consider `toList`, `toVector` and such to be reductions. You're just folding your `Foldable` (or `View`, as the case may be) into a specific collection type. I use `toList` and `toVector` quite a bit in practical code in order to specifically side-step `CanBuildFrom` witchcraft on `Seq` and `IndexedSeq`.

### Ranges

In my proposal, `Range` no longer exists, and instead we simply use `View`! Obviously it's possible to define `foreach` on `Foldable` (as indeed we should), and it would be an eager implementation (effectively, reducing to `Unit`). This completely sidesteps the whole "mapping a `Range` to `String`" problem, because `Range` itself is nothing special. There's no problem with a `View[String]` any more than there is a problem with `View[Int]`.

Here's the cloud to the silver lining though, and those who remember the whole `withFilter` debate are probably already thinking about it: `map` on `View` is lazy:

```// in my proposal, this prints 0, 1, 2, ..., 9
for (i <- 0 until 10) {
println(i)
}

// in my proposal, this... does nothing
for (i <- 0 until 10) yield {
println(i)
}```

Back in the day, this was a huge gotcha for Scala beginners, and I have very little reason to believe it wouldn't still be a huge gotcha. I honestly don't know what to do about this. Defining `type Range = View[Int]` is just incredibly appealing, and it makes everything much much simpler. One possible idea would be to have a special compiler warning that checks for "discarded" values of type `View`, which are likely to be errors of the sort we see above. Honestly, I'd like a compiler warning that checks for discarded values in general, but that's just me.

This would break backwards compatibility. There's no two ways around it. This is a common use-case, and my proposal breaks it (possibly with a warning, but still...) and recreates a situation which was widely considered a "beginner trap" back in the day. I'm still thinking about this problem, and I'd like to find a solution. For now though... #frownyface

## Shims

Here we get to the really insane part of the proposal: compatibility shims. See, the problem is that, as much as I hate `Seq` and I honestly and truly believe it is never the right thing to use, it does exist and has existed for a long time. People have written vast swaths of code which use it, and we can't just arbitrarily decide to break it.

My implementation of `Seq` in this proposal is available as an implicit conversion from any type which has the appopriate typeclasses defined (in this case, `Foldable` and `Functor`, though we may want something else). Within `Seq`, we can implement all of the existing signatures in terms of these typeclasses. The implementations will not be efficient, but that doesn't matter. Anyone who is using `Seq` to do real things doesn't know what their efficiency properties are now (I certainly don't), so I feel pretty justified in making this a slow path. Everything in `Seq` would be @deprecated.

We can play the same trick with `IndexedSeq`, `Iterable` and similar. In fact, with a bit of inheritence, we can reuse implementations between these shims (here, for a change, reuse by inheritence doesn't hurt us), so I don't expect too much code duplication here.

## `Array`, `String` and other

Ah, `Array`! Worked about 80% of the time in 2.7. Works 100% of the time in 2.8, but no one understands why, and it seems to have a very bipolar identity. This proposal has a third idea for how we can make it work.

Optimistic respecialization is something that people have been toying with a lot recently. Basically, the trick takes advantage of two things. First, the value of `classOf[Array[Int]]` is static and can be acquired at class load time without penalty. Second, the value of `classOf[Int => Int]` (and all similar functions) is also static, unique and can be acquired at class load time without penalty. Being able to acquire and use these values allows us to write generic type signatures which preserve parametricity (such as the `map` function on `RichArray`, which takes a `TypeTag` in my proposal but doesn't actually need it) and which recover the underlying type information. (remember, anything that is not a primitive array is just `Array[AnyRef]`)

This is really exciting, because it means that, for most of what we want to do inside of `RichArray`, we can implement generic code that operates directly on arrays in tight loops without boxing. This suddenly makes the entirety of our collections operations usable on `Array`, which is a type that is generally only used by people who really want efficiency.

There may be some weird corner cases currently possible with the "seq-like" `String` and `Array` variants in present-day Scala that we cannot make possible in this proposal. I don't see anything important though. Feedback welcome. The tl;dr though is that syntax enrichment (via implicit classes) is a far better way to make `String` and `Array` behave in nice ways than is magical implicit coercion with `CanBuildFrom` constraints, which is what we have today.

### m50d commented Oct 12, 2015

@tpolecat I'm not sure I see it. If my program is written in the subset of scala in which I never mutate an `Array` and never call `Array#equals`, then the functor law transformations are valid, no? (You might say it makes no sense to use `Array` in that case, but some library APIs require it).

### jrudolph commented Oct 12, 2015

(remember, anything that is not a primitive array is just Array[AnyRef])

This is not correct and a big reason for lots of complexity and bugs:

```scala> new Array[String](12).asInstanceOf[Array[Object]]
res1: Array[Object] = Array(null, null, null, null, null, null, null, null, null, null, null, null)

scala> res1(0) = java.lang.Integer.valueOf(12)
java.lang.ArrayStoreException: java.lang.Integer
... 32 elided

scala> res1.getClass
res7: Class[_ <: Array[Object]] = class [Ljava.lang.String;```

but also:

```scala> new Array[java.lang.Integer](12).asInstanceOf[Array[java.lang.Double]]
java.lang.ClassCastException: [Ljava.lang.Integer; cannot be cast to [Ljava.lang.Double;
... 32 elided```

Huh?

### jrudolph commented Oct 12, 2015

In other words, covariant arrays are not a Java feature but one of the JVM, i.e. the static type checking of the JVM is not strict enough to prevent runtime errors when storing into arrays (see here).

### rgwilton commented Oct 12, 2015

+1 on the "Set vs Set" comment.

I hate it whenever I get this issue, and then seem to have to spend half an hour adding/deleting random scala collection imports until the code compiles again.

But, I can appreciate that there are cases where supporting mutable collections are important for performance reasons.

I wonder whether anyone has considered anything like Rust's owned pointers to allow a mutable collection to be safely passed to an API with a guarantee that the caller (and depending on the type - the callee) won't modify the data structure over the body of the API call? But perhaps that would require JVM support?

Rob

### djspiewak commented Oct 12, 2015

Hmm, I think your interpretation of laws is too narrow. The reason laws are useful is because they provide additional avenues for equational reasoning. The functor identities are not a hand-wavy statement about equality, they describe syntactic transformations under which programs are invariant (and they are not, in the cases I mentioned above). If a transformation's applicability depends on the concrete type under consideration then the whole exercise is pointless.

Using the syntactic transformation definition of equality in the laws (which I believe is the most useful one), the functor laws hold for `Functor[Array]` provided that the functions you pass in are actually functions (i.e. aren't doing stupid things like capturing the source array in a closure and mutating it). But you can violate any law by using functions that are not functions…

In other words, covariant arrays are not a Java feature but one of the JVM, i.e. the static type checking of the JVM is not strict enough to prevent runtime errors when storing into arrays (see here).

In order to reach the error cases, you need to have a bug elsewhere in which you select (or more importantly, store) a type which doesn't match the input type. You're right though that the creation of the output array is probably an issue…

### tpolecat commented Oct 12, 2015

@m50d @djspiewak the problem with `Functor[Array]` is that it violates the identity law. If you take an array reference `a` and replace it with `a.map(identity)` or vice-versa, the functor law says that this must be a no-op. But doing this can change the behavior of your program because what you get back from `map` is a new reference, and reference equality is important for mutable structures (as opposed to immutable structures, where [ostensible] parametricity renders it unobservable). In general this is why scalaz has no instances for mutable types.

### non commented Oct 12, 2015

Quick comment -- I'm pretty sure your `Applicative` needs at least one of these:

```def pure[A](a: A): F[A]
def unit: F[Unit]```

Otherwise you wouldn't be able to derive `ap`, which is the method that powers `Applicative`:

`def ap[A, B](fa: F[A], ff: F[A => B]): F[B]`

Also you probably want to be careful calling the other method `zip`. To be consistent with monadic laws (which not all applicatives are but some should be) the method ends up looking more like a cartesian product and less like a pairwise zip.

### djspiewak commented Oct 12, 2015

the problem with Functor[Array] is that it violates the identity law. If you take an array reference a and replace it with a.map(identity) or vice-versa, the functor law says that this must be a no-op. But doing this can change the behavior of your program because what you get back from map is a new reference, and reference equality is important for mutable structures (as opposed to immutable structures, where [ostensible] parametricity renders it unobservable). In general this is why scalaz has no instances for mutable types.

The difference is only visible when mutating either array. If your usage is read-only, it is irrelevant. I tend to lean toward the view that you're right though (i.e. right that it's a problem). The problem is that we completely lose code reuse into the mutable hierarchy (which, perhaps, is definitional). That's not the end of the world, but it's a thing.

Also you probably want to be careful calling the other method zip. To be consistent with monadic laws (which not all applicatives are but some should be) the method ends up looking more like a cartesian product and less like a pairwise zip.

A good point, which @puffnfresh also stated. I'll revise.

### puffnfresh commented Oct 13, 2015

@djspiewak you can also get rid of `zipWith` and define it in terms of `zip`. In scalaz this is called `tuple2`. An alternative formulation could look like:

```class Apply[F[_]] extends Functor[F] {
def tuple[A, B](fa: F[A], fb: F[B]): F[(A, B)]
}```

### djspiewak commented Oct 13, 2015

you can also get rid of zipWith and define it in terms of zip. In scalaz this is called tuple2. An alternative formulation could look like

If there weren't a preexpectation of the semantics of `zip`, that might be interesting (though the overhead of tuple boxing is annoying). Given that `zip` is expected to be linear for sequences rather than Cartesian, it's probably saner to just use `apply2` as the basis, or similar.

### fanf commented Oct 15, 2015

@djspiewak for the view part, I really think that is should be the default on immutable collection, without - if possible - any call to an intermediary method.
I.E, I would be expecting that List(1,2,3).filter( _ % 2 == 0).map( _ + 1) to minimize the nunber of operation, and I would even prefer to have to write: List(1,2,3).filter( _ % 2 == 0).map( _ + 1).toList than List(1,2,3). stagedView.filter( _ % 2 == 0).map( _ + 1) (because in the last case, I will forget most of the time that the default behavior is not the one I always want).

### djspiewak commented Oct 15, 2015

I don't agree that lazy staged behavior should be the default. I do agree that it makes sense in a very large number of cases, far more than is generally believed right now. But when I use a function on a collection, I want to know what its performance characteristics are and I want to know where that time/memory is being spent. Furthermore, staged operations of that sort inhibit compositionality, since you can no longer have a `map` which is of type `List[A] => List[B]`. Instead, you get `List[A] => Staged[List, B]`, with a paired function `Staged[F, _] ~> F`, which is far less useful IMO.

Furthermore, you miss out on efficiencies which could otherwise be achieved. For example, `drop` on `List` is very efficient to implement in a direct fashion. If you stage the operation, you get the flexibility of fusing it with `map`, `filter` and other such operations to achieve potentially higher efficiency. However, if you stage that operation and it isn't fused with anything else (i.e. you really just wanted `drop` and nothing else), then the resulting efficiency of the `drop . toList` is enormously lower than simply evaluating `drop` directly.

So there are tradeoffs, and while staging and fusing is quite a bit more common than one would naively expect, it may or may not be the most common use-case. And even if it is, it's certainly not what people expect, and the tradeoffs it carries are severe and unintuitive.

### fanf commented Oct 16, 2015

@djspiewak sorry for the possibly dumb question, but for the missing on some efficiency part, is it an inherent limitation of the viewducer model? There is no way to make the toList be a noop?
And next question: how Paul achieve it in psp? Not with viewducer, I suppose, and if so, does his method is uncompatible with your proposition (again, sorry for what my be silly question/evident answer).

### m50d commented Oct 16, 2015

@tpolecat sure, if you mutate arrays. Equally the instance for `List` is invalid if you call `System.identityHashCode`. The question is then which subset of Scala we define the functor laws in relation to.

I'd argue:

1. Through accidents of history - partly Java's varargs but mostly the lack of immutable collections in the Java standard library - many JVM libraries accept or return `Array` even when there is no intent to mutate the array.
2. Many users find that the most effective approach when using such libraries is to assume that arrays are not mutated and/or that they're used linearly (rather than e.g. defensively copy). It is unfortunate that these assumptions are unchecked by the type system and they undoubtedly increase the defect rate for such code; nevertheless it remains so.
3. In code which is already making that assumption (i.e. will already be incorrect if that assumption is violated), the existence of a `Functor[Array]` instance is a strict improvement. I'm not against drawing a distinction (feature flag, different imports, or the like) between `Functor` instances which are valid only under the assumption that `Array`s are not mutated and `Functor` instances which are valid under weaker assumptions, but I do want the expressiveness advantages of `Functor[Array]` to at least be available to code which explicitly opts into it.
4. Use cases in which it is desirable both to call existing JVM libraries and to use abstractions like `Functor` are Scala's most important constituency, because they form the language's USP - use cases for a Scala-like language which don't involve calling existing JVM libraries are well served by Haskell, and use cases which don't involve immutability and/or `Functor`-like abstractions have a range of language options available on the JVM.

### mancvso commented Oct 31, 2015

I just wanted to note that, in a latter stage, the amount of implicits and traits should be reduced to avoid delivering even grater compile times.

Artima's article
http://www.artima.com/articles/compile_time.html

### yawaramin commented Dec 12, 2016

@mancvso note that implicit resolution times are about to drop significantly thanks to Miles Sabin's work on inductive resolution https://twitter.com/milessabin/status/779298833981435904