// Sample report base-class:
trait Report {
  type T
  def runReport(runtimeConf: MyConf)(implicit encoder: spark.sql.Encoder[T]): Unit
}

// Sample implementation:
class OrdersReport extends Report {
  type T = OrderRecord
  def runReport(runtimeConf: MyConf)(implicit encoder: spark.sql.Encoder[OrderRecord]): Unit = ???
}

// Record type that the report uses
case class OrderRecord(...)
object RunReports {
  def main(args: Array[String]): Unit = {
    // I want to do this:
    val reports = List(new OrdersReport, new SalesReport)
    reports.find(_.condition == something).get.runReport(runtimeConf)
    // ... but it throws an exception (encoder cannot be found)

    // So maybe write a wrapper like this:
    class Wrapper[R <: Report](rpt: R)(implicit enc: Encoder[rpt.T]) {
      def runDefer(runtimeConf: MyConf) = rpt.runReport(runtimeConf)
    }
    // ... and do this:
    val reports = List(new Wrapper(new OrdersReport), new Wrapper(new SalesReport))
    reports.find(_.condition == something).get.runDefer(runtimeConf)
    // ... but that doesn't work either (still 'encoder cannot be found')

    // Maybe try the Aux pattern:
    type Aux[TE] = Report { type T = TE }
    class Wrapper[TEA](aux: Aux[TEA])(implicit enc: Encoder[TEA]) {
      def run(runtimeConf: MyConf) = aux.runReport(runtimeConf)
    }
    val reports = List(new Wrapper(new OrdersReport), new Wrapper(new SalesReport))
    // ... still no luck!
  }
}
Why doesn't this work:

val wrapped = new Wrapper(new OrdersReport)
// error: unable to find encoder for type stored in a Dataset

I'm not even using a list there and it still says 'encoder not found'.
It should work, if both of the following are true:

- you are using your `Aux` version
- `(new OrdersReport).runReport(cfg)` works
Working with Spark `Encoder`s can be tricky. I believe the inference only works on proper stable identifiers. You are trying to bind an `Encoder` to the type `Report#T`, and Spark hates this. If your `T` were outside a class, it would probably unify. There are some compiler intricacies (not bugs) that result from the interaction between FP and OOP (Scala is not functional, it is object-functional). You are probably running into one of these corner cases.
Aaaaahrg! This is more like Type-Voodoo than Type-math! Somehow this works:
new Wrapper(new OrdersReport)(newProductEncoder(typeTag[OrderRecord]))
... and yet this does not!
new Wrapper(new OrdersReport)
What mystic incantations have I missed!? What animal sacrifices have I not performed???
The type `=` is not a stable `=`. When you use a path-dependent type like `Report#T`, the compiler cannot prove the path-dependent type equal to the instance type. This is what I was alluding to in my previous comment. If the type for the `Encoder` is path dependent, the compiler has trouble (it cannot infer `Report#T`), but when the type for the encoder is not path dependent, the compiler is ok (it can infer `OrderRecord`).
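This distinction can be seen without any Spark at all. Below is a minimal sketch with a toy `Enc` typeclass standing in for `spark.sql.Encoder` (the names `Enc`, `Demo`, and the `String`-returning `runReport` are illustrative, not from the gist). When the concrete `OrderRecord` is still visible to the compiler via the `Aux` refinement, the implicit is found; given only an abstract `Report#T`, the same lookup has nothing to resolve to:

```scala
// Toy model of the problem, with no Spark dependency. `Enc` stands in
// for spark.sql.Encoder; all names here are illustrative.
case class OrderRecord(id: Int)

trait Enc[A] { def describe: String }
object Enc {
  implicit val orderEnc: Enc[OrderRecord] =
    new Enc[OrderRecord] { def describe = "OrderRecord" }
}

trait Report {
  type T
  def runReport(implicit enc: Enc[T]): String = enc.describe
}
object Report { type Aux[TE] = Report { type T = TE } }

// T is fixed to a concrete type inside the class definition
class OrdersReport extends Report { type T = OrderRecord }

// TE is inferred as the concrete OrderRecord, so Enc[TE] is found here,
// at the one point where the concrete type is still visible.
class Wrapper[TE](rpt: Report.Aux[TE])(implicit enc: Enc[TE]) {
  def runDefer: String = rpt.runReport(enc)
}

object Demo {
  val result: String = new Wrapper(new OrdersReport).runDefer
  // Given only `r: Report`, the analogous lookup fails: Enc[r.T] is a
  // path-dependent type with no instance in scope.
  // implicitly[Enc[r.T]]   // does not compile
}
```

With real Spark encoders there is the extra wrinkle that the implicit derivation also needs a `TypeTag` for a concrete type, which is another reason `Report#T` cannot satisfy it.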
Think of the type language as a dynamic language like JavaScript. Sometimes you can get into trouble, since `=` is not the mathematical `=`; it is a (mostly) reasonable approximation.
The same thing happens with shapeless `Poly` functions over `HList` instances. The compiler can only unify the types if the `Poly` instance is a stable identifier. This is why all poly instances are written as

object MyPoly extends Poly1 { ... }

Making it an `object` makes it a stable identifier. If poly instances are put into path-dependent identifiers (these are dependent upon the instance of the enclosing type, not stable) then the compiler cannot unify.
It may work if you say

object OrdersReport extends Report {
  type T = OrderRecord
  def runReport(runtimeConf: MyConf)(implicit encoder: spark.sql.Encoder[OrderRecord]): Unit = ???
}

since an object is a stable identifier.
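A compilable sketch of that point, again with a hypothetical `Enc` typeclass in place of Spark's `Encoder`: because `OrdersReport` is an object, `OrdersReport.T` is a stable path that the compiler can dealias to `OrderRecord`, so the implicit resolves:

```scala
// Hypothetical Enc typeclass, so the sketch compiles without Spark.
case class OrderRecord(id: Int)

trait Enc[A]
object Enc {
  implicit val orderEnc: Enc[OrderRecord] = new Enc[OrderRecord] {}
}

trait Report { type T }

// An object is a stable identifier, so OrdersReport.T is a stable path.
object OrdersReport extends Report { type T = OrderRecord }

object Check {
  // The compiler dealiases OrdersReport.T to OrderRecord
  // and finds the implicit instance.
  val enc: Enc[OrdersReport.T] = implicitly[Enc[OrdersReport.T]]
}
```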
Actually my production class `Report` can't be an object because it takes input arguments! Nooooooo! The vortex of un-unifiability is sucking me in!
Report doesn't need to be stable, just its Instances!
How can Report instances be objects if they need to take constructor arguments? I.e. what I actually have is:

trait Report {
  type T
  def runReport(runtimeConf: MyConf)(implicit encoder: spark.sql.Encoder[T]): Unit
}
abstract class BaseReport(someArgs: SomeCaseClass) extends Report
class OrdersReport(someArgs: SomeCaseClass) extends BaseReport(someArgs)
// How do I make OrdersReport stable?
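One workaround worth noting (not suggested in the thread itself): the class does not need to become an object — binding the instance to a `val` inside an object also yields a stable identifier, even though the class keeps its constructor arguments. A sketch with a toy `Enc` typeclass; the concrete `type T = OrderRecord` member on `OrdersReport` is assumed, since the snippet above elides it:

```scala
// Toy Enc typeclass in place of spark.sql.Encoder.
case class OrderRecord(id: Int)
case class SomeCaseClass(label: String)

trait Enc[A]
object Enc {
  implicit val orderEnc: Enc[OrderRecord] = new Enc[OrderRecord] {}
}

trait Report { type T }
abstract class BaseReport(someArgs: SomeCaseClass) extends Report
class OrdersReport(someArgs: SomeCaseClass) extends BaseReport(someArgs) {
  type T = OrderRecord // assumed: the concrete member the real class would have
}

// The class keeps its constructor arguments; the *instance* is made
// stable by binding it to a val inside an object.
object Reports {
  val orders = new OrdersReport(SomeCaseClass("orders"))
}

object Check {
  // Reports.orders is a stable path, so Reports.orders.T dealiases
  // to OrderRecord and the implicit is found.
  val enc: Enc[Reports.orders.T] = implicitly[Enc[Reports.orders.T]]
}
```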
You are mixing the type world (at compile time) with the value world. The `List` type is destructive: when you add to a `List`, the compiler may give you less type information than it received. What you need is an isomorphic (at the type level) container type. `Tuple` is the standard, `HList` is the shapeless style. What is happening is that when you add to a `List`, all uncommon type information is assumed to be either `Any` or `Nothing`, depending on variance position. Your `Report#T` type information is being lost.

Of course, this is not trivial and will take some added plumbing to get right, but the result is a compile-time verified list of reports.
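The loss (and its tuple-shaped fix) can be seen directly in plain Scala, with no shapeless involved — an `HList` behaves like a tuple of arbitrary arity:

```scala
object TypePreservation {
  // A List unifies its element types: the least upper bound of
  // Int and String is Any, so the per-element types are destroyed.
  val asList: List[Any] = List(1, "one")

  // A tuple is isomorphic at the type level: each position keeps its
  // exact type, just as a shapeless HList would.
  val asTuple: (Int, String) = (1, "one")

  val n: Int    = asTuple._1  // compiler still knows this is Int
  val s: String = asTuple._2  // compiler still knows this is String
  // val m: Int = asList.head // does not compile: head is only Any
}
```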