This is a learning note on "Functional Programming Principles in Scala" (Coursera) by Martin Odersky.
########Queries with For#######
//the insight from Mr. Odersky
The for expression is very similar to the common query operations of a database query language.
"if" in a for plays the role of a filter, and nested generators correspond to a join in a database.
(The returned collection can then be reduced/aggregated, deduped with distinct, etc.)
For example, suppose we have a database of books, represented as a list of books.
case class Book(title: String, authors: List[String])
//arguments can be passed in "parameter name = value" form
val books: List[Book] = List(
  Book(title = "Structure and Interpretation of Computer Programs",
    authors = List("Abelson, Harald", "Sussman, Gerald J.")),
  Book(title = "Introduction to Functional Programming",
    authors = List("Bird, Richard", "Wadler, Phil")),
  Book(title = "Effective Java",
    authors = List("Bloch, Joshua")),
  Book(title = "Java Puzzlers",
    authors = List("Bloch, Joshua", "Gafter, Neal")),
  Book(title = "Programming in Scala",
    authors = List("Odersky, Martin", "Spoon, Lex", "Venners, Bill"))
)
val x = for (book <- books; author <- book.authors if author.startsWith("Bird"))
  yield book.title //yield follows the for expression and produces one result per iteration
val y = for (book <- books if book.title.indexOf("Program") >= 0)
  yield book.title
Find the names of authors who have written at least two books :
//equivalent to a group-by case
(for {
  b1 <- books
  b2 <- books
  if b1.title < b2.title //avoids symmetric duplicate pairs, but an author with
                         //three books still appears more than once, hence the distinct
  a1 <- b1.authors       //(alternatively, collect the results into a Set)
  a2 <- b2.authors
  if a1 == a2
} yield a1).distinct
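As a minimal runnable sketch of the "authors with at least two books" query, here is the same for-expression over a two-book subset of the database above:

```scala
// Assumes the corrected Book definition with an `authors` field.
case class Book(title: String, authors: List[String])

val books: List[Book] = List(
  Book("Effective Java", List("Bloch, Joshua")),
  Book("Java Puzzlers", List("Bloch, Joshua", "Gafter, Neal"))
)

val repeated =
  (for {
    b1 <- books
    b2 <- books
    if b1.title < b2.title //compare titles to keep only one of each symmetric pair
    a1 <- b1.authors
    a2 <- b2.authors
    if a1 == a2
  } yield a1).distinct
// repeated == List("Bloch, Joshua")
```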
########Translations of for loops ############
The syntax of for is closely related to the higher-order functions map, flatMap and filter.
First of all, all of these functions can be defined in terms of for :
def mapFun[T, U](xs: List[T], f: T => U): List[U] =
  for (x <- xs) yield f(x)
def flatMap[T, U](xs: List[T], f: T => Iterable[U]): List[U] =
  for (x <- xs; y <- f(x)) yield y //nested for expression
def filter[T](xs: List[T], p: T => Boolean): List[T] =
  for (x <- xs if p(x)) yield x
In reality, it goes the other way: the Scala compiler expresses for-expressions in terms of
map, flatMap and a lazy variant of filter.
//the filter in a for expression is lazy (withFilter)
> A simple for-expression is translated to map :
for (x <- e1) yield e2
becomes
e1 map (x => e2)
> A nested for-expression :
for (x <- e1; y <- e2; s) yield e3
//s is a (possibly empty) sequence of generators and filters
is translated to
e1 flatMap (x => for (y <- e2; s) yield e3)
> A for-expression
for (x <- e1 if f; s) yield e2
where f is a filter and s is a (possibly empty) sequence of generators and filters, is translated to :
for (x <- e1.withFilter(x => f); s) yield e2
//withFilter is a variant of filter that does not produce an intermediate list; instead
//it filters the subsequent map and flatMap applications.
//The translation then continues on the new expression until only the first two cases remain.
Example :
for {
  i <- 1 until n
  j <- 1 until i
  if isPrime(i + j)
} yield (i, j)
is translated to :
(1 until n) flatMap (i =>
  (1 until i).withFilter(j => isPrime(i + j))
    .map(j => (i, j)))
which is exactly what we wrote in the first attempt, but the for expression is more concise and easier to understand.
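A small check (with an assumed n = 7) that the for-expression and its hand-desugared combinator form agree:

```scala
// isPrime as defined later in this note; n = 7 is an arbitrary example value.
def isPrime(n: Int): Boolean = (2 until n) forall (n % _ != 0)

val n = 7
val viaFor = for {
  i <- 1 until n
  j <- 1 until i
  if isPrime(i + j)
} yield (i, j)

val viaCombinators =
  (1 until n) flatMap (i =>
    (1 until i).withFilter(j => isPrime(i + j)).map(j => (i, j)))
// viaFor == viaCombinators
```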
--Generalization of for --
Interestingly, the translation of for is not limited to lists or sequences, or even to collections.
It is based solely on the presence of the methods map, flatMap and withFilter.
This lets you use the for syntax for your own types as well -- you only need to define map,
flatMap and withFilter for these types.
---for and databases---
As long as the client interface to the database defines the methods map, flatMap and withFilter, we can
use the for syntax for querying the database.
This is the basis of the Scala database connection frameworks ScalaQuery and Slick.
Similar ideas underlie Microsoft's LINQ (Language Integrated Query).
Immutable collections enable expressing algorithms in a very high-level
and concise way that, in addition, has a high chance of being correct :).
Seq
/ \
List Array
Two important differences between Lists and Arrays in Scala :
> Lists are immutable, while Arrays are mutable.
> Lists are recursive, while Arrays are flat (elements are equally accessible and visible).
The List type:
> Lists are homogeneous : the elements of a list must all have the same type.
> A list is constructed from List() (the empty list, Nil) and x :: xs ("::" pronounced cons).
In Scala, operators ending in ":" are right-associative,
so A :: B :: C == A :: (B :: C).
Besides, operators ending in ":" also differ in the way they are seen as
method calls: they are calls on the right-hand operand, i.e. the right operand is the target object (receiver).
So the expressions below are equivalent:
val x = 1 :: 2 :: Nil
val x = Nil.::(2).::(1)
In FP, objects are immutable for the sake of referential transparency,
and many algorithms based on the lambda calculus are recursive.
Defining the data structure recursively as well makes it isomorphic to the algorithm,
so each step of the algorithm corresponds to a part of the data structure -- easy to understand and simple to operate on.
See the explanation of isomorphism in "Why recursive data structures":
"Isomorphic means, fundamentally, “having the same shape.”
Obviously, a quadtree doesn’t look anything like the code
in rotateQuadTree or multirec. So how can a quadtree “look like”
an algorithm? The answer is that the quadtree’s data structure
looks very much like the way rotateQuadTree behaves at run time.
More precisely, the elements of the quadtree and the
relationships between them can be put into a
one-to-one correspondence with the call graph
of rotateQuadTree when acting on that quadtree."
Going further, the author states: "If the interface between algorithm and data structure is so complicated,
the answer can often be found by imagining a data structure that looks like
the algorithm's basic form."
Take scala.collection.immutable.List as an example :
All operations on lists can be expressed in terms of the following
three operations :
> head     the first element of the list
> tail     the list composed of all elements except the first
> isEmpty  "true" if the list is empty, "false" otherwise
Here you can see the correspondence with recursive algorithms:
> head + tail is the recursive decomposition of the list
> isEmpty is the base case / boundary
A perfect match.
e.g. Insertion sort :
def isort(xs: List[Int]): List[Int] = xs match {
  case List()  => List()
  case y :: ys => insertSort(y, isort(ys))
}
def insertSort(y: Int, ys: List[Int]): List[Int] = ys match {
  case List()  => List(y)
  case z :: zs =>
    if (y <= z) y :: ys
    else z :: insertSort(y, zs)
}
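The insertion sort above, assembled into a runnable sketch and checked on a small list:

```scala
// insertSort inserts y at its place in the already-sorted list ys.
def insertSort(y: Int, ys: List[Int]): List[Int] = ys match {
  case List()  => List(y)
  case z :: zs => if (y <= z) y :: ys else z :: insertSort(y, zs)
}

// isort sorts the tail recursively, then inserts the head into the result.
def isort(xs: List[Int]): List[Int] = xs match {
  case List()  => List()
  case y :: ys => insertSort(y, isort(ys))
}

val sorted = isort(List(7, 3, 9, 2))   // List(2, 3, 7, 9)
```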
#######More functions on List#####
Sublists and accessors:
val x = List(1,2,3,4,5,6)
x.last : 6 //the last element of a List; throws an exception if the list is empty
x.init : List(1,2,3,4,5) //a list consisting of all elements except the last one
x take 2 : List(1,2) //the first n elements
x drop 2 : List(3,4,5,6) //the rest of the collection after dropping the first n elements
x(4) : 5 //the element at index 4
Creating new lists :
xs ++ ys //concatenate lists
xs.reverse //the list in reversed order
xs updated (n,x) //same list as xs, except at index n where it contains x
ys ::: xs //prepend ys to xs, same as xs.:::(ys)
Finding elements :
xs indexOf x //the index of the first element equal to x, -1 if none
xs contains x //same as xs indexOf x >= 0
-----Implementations --------
Define them as external functions.
//notice the Scala match style
def last[T](xs: List[T]): T = xs match {
  case List() => throw new Error("empty.last")
  case x :: Nil => x
  //the y :: ys case below would also match x :: Nil, but patterns are tried
  //in sequence, so a single-element list is matched by x :: Nil first.
  case y :: ys => last(ys)
}
def init[T](xs: List[T]): List[T] = xs match {
  case List() => throw new Error("empty.init")
  case List(x) => List()
  case y1 :: ys => y1 :: init(ys)
}
// ++ recurses on the elements of xs
def ++[T](xs: List[T], ys: List[T]): List[T] = xs match {
  case List() => ys
  case z :: zs => z :: ++(zs, ys)
}
//there is no append on List, so we have to use list concatenation
//notice the Scala match style
//the complexity here is quadratic
def reverse[T](xs: List[T]): List[T] = xs match {
  case List() => List()
  case y :: ys => reverse(ys) ++ List(y)
}
def removeAt[T](n: Int, xs: List[T]): List[T] =
  if (n > xs.length) xs
  else xs match {
    case List() => List()
    case y :: ys => if (n == 0) ys else y :: removeAt(n - 1, ys)
  }
or using take/drop :
def removeAt[T](n: Int, xs: List[T]): List[T] = (xs take n) ::: (xs drop (n + 1))
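A quick check of the take/drop version (parentheses around n + 1 added for clarity; the behavior is the same):

```scala
// Remove the element at index n by keeping the first n elements
// and everything after position n.
def removeAt[T](n: Int, xs: List[T]): List[T] = (xs take n) ::: (xs drop (n + 1))

val r = removeAt(1, List('a', 'b', 'c', 'd'))
// r == List('a', 'c', 'd')
```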
##########Tuples and Pairs############
A Pair is a shorthand for Tuple2.
//a case class automatically gets apply/unapply; it can be seen as a simple way to combine a class with its companion object.
//Conventionally a case class has a single constructor (that is where the name "case" comes from).
//You could define additional constructors,
//but pattern matching (PM) would not work with them.
case class Tuple2[+T1, +T2](_1: T1, _2: T2) {
  override def toString = "(" + _1 + "," + _2 + ")"
  def swap: Tuple2[T2, T1] = Tuple2(_2, _1)
}
There are also tuples with 3 elements, 4 elements, etc., usable in pattern matching.
Pairs and tuples make pattern matching on more than one variable simpler.
Take the merge step of merge sort as an example :
//the usual pattern for such methods: first define a function that accepts the input, then define the private helper functions that solve the problem.
With a single pattern match at a time :
def merge(xs: List[Int], ys: List[Int]): List[Int] =
  xs match {
    case List() => ys
    case x :: x1 =>
      ys match {
        case List() => xs
        case y :: y1 =>
          if (x < y) x :: merge(x1, ys)
          else y :: merge(xs, y1)
      }
  }
//whenever you see boilerplate code in pattern matching,
//there is a chance to make it simpler and cleaner
With a Pair/Tuple : clearly, it is much more compact and clean.
def merge(xs: List[Int], ys: List[Int]): List[Int] =
  (xs, ys) match {
    case (Nil, _) => ys
    case (_, Nil) => xs
    case (x :: xl, y :: yl) =>
      if (x < y) x :: merge(xl, ys)
      else y :: merge(xs, yl)
  }
#####Implicit######
/*When using implicits, you should follow the conventions and best practices;
their sole purpose is to make programs simpler or shorter.
Here, implicit parameters make msort generic without passing the
comparison function explicitly at every call site.
*/
def msort[T](xs: List[T])(lt: (T, T) => Boolean): List[T] = {
  val n = xs.length / 2
  if (n == 0) xs
  else {
    def merge(xs: List[T], ys: List[T]): List[T] =
      (xs, ys) match {
        case (Nil, _) => ys
        case (_, Nil) => xs
        case (x :: xl, y :: yl) =>
          if (lt(x, y)) x :: merge(xl, ys)
          else y :: merge(xs, yl)
      }
    val (fst, snd) = xs splitAt n
    merge(msort(fst)(lt), msort(snd)(lt))
  }
}
/*
The lt parameter above appears four times, but only the two calls inside merge do real work;
with many calls to msort you have to pass it along every time.
Scala has the Ordering type class for common types.
By making the ordering parameter implicit, you do not need to
fill it in every time: the compiler will search for a
proper value in scope.
*/
def msort[T](xs: List[T])(implicit ord: Ordering[T]): List[T] = {
  val n = xs.length / 2
  if (n == 0) xs
  else {
    def merge(xs: List[T], ys: List[T]): List[T] =
      (xs, ys) match {
        case (Nil, _) => ys
        case (_, Nil) => xs
        case (x :: xl, y :: yl) =>
          if (ord.lt(x, y)) x :: merge(xl, ys)
          else y :: merge(xs, yl)
      }
    val (fst, snd) = xs splitAt n
    merge(msort(fst), msort(snd))
  }
}
val xs = List(1,-2,5,4,9,7)
msort(xs) //List(-2, 1, 4, 5, 7, 9); scala.math.Ordering.Int is found implicitly
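A runnable sketch of the implicit-Ordering msort: scala.math.Ordering provides instances for Int, String, etc., which the compiler fills in automatically.

```scala
def msort[T](xs: List[T])(implicit ord: Ordering[T]): List[T] = {
  val n = xs.length / 2
  if (n == 0) xs
  else {
    // merge two sorted lists using the implicit ordering
    def merge(xs: List[T], ys: List[T]): List[T] = (xs, ys) match {
      case (Nil, _) => ys
      case (_, Nil) => xs
      case (x :: xl, y :: yl) =>
        if (ord.lt(x, y)) x :: merge(xl, ys) else y :: merge(xs, yl)
    }
    val (fst, snd) = xs splitAt n
    merge(msort(fst), msort(snd)) //the implicit ord is passed along automatically
  }
}

val nums  = msort(List(1, -2, 5, 4, 9, 7))  // List(-2, 1, 4, 5, 7, 9)
val words = msort(List("pear", "apple"))    // List("apple", "pear")
```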
---Rules for implicit parameters--
Say a function takes an implicit parameter of type T.
The compiler will search for an implicit definition that :
> is marked implicit
> has a type compatible with T
> is visible at the point of the function call,
or is defined in a companion object associated with T.
If there is a single (most specific) definition, it will be
taken as the actual argument for the implicit parameter.
e.g.
1. def f(implicit x: Int) = x
implicit val y: Int = 20
f //equivalent to f(y)
2. class C
object C {
  implicit val x: C = new C
}
def f(implicit x: C) = x
f //equivalent to f(C.x)
More details :
https://docs.scala-lang.org/tutorials/FAQ/finding-implicits.html
http://jsuereth.com/scala/2011/02/18/2011-implicits-without-tax.html
#######Recurring patterns on List we have seen so far ############
> transforming each element to get another list.
> retrieving the elements satisfying a criterion (like take, drop).
> combining the elements of a list using an operator (like sum).
In this session, we abstract these patterns into general functions.
The first pattern is generalized by the map function:
abstract class List[T] {
  def map[U](f: T => U): List[U] = this match {
    case Nil => Nil
    case y :: ys => f(y) :: ys.map(f)
  }
}
The second pattern (filter):
def filter(f: T => Boolean): List[T] = this match {
  case Nil => Nil
  case y :: ys =>
    if (f(y)) y :: ys.filter(f)
    else ys.filter(f)
}
Other variations that extract sublists based on a predicate:
xs filterNot p
xs partition p
xs takeWhile p
xs dropWhile p
xs span p //same as (xs takeWhile p, xs dropWhile p) but
          //computed in a single traversal of the list xs
########Exercise1########
Write a function pack that packs consecutive duplicates
of list elements into sublists. For instance:
pack(List("a","a","a","b","c","c","a"))
should give
List(List("a","a","a"),List("b"),List("c","c"),List("a"))
//first attempt :
//thinking of the solution like iterative programming:
//iterate the list from the first element and compare it with the next element;
//if they match, compose them together as a list, otherwise start a new sublist.
def pack(xs: List[String]): List[List[String]] = xs match {
  case Nil => List()
  case x :: x1 => x1 match {
    case Nil => List(List(x))
    //once we see that we have to inspect the structure of x1 before deciding, we should
    //realize we would have to do the same for ys as well, so this algorithm does not work.
    //What we actually need to implement is the span function.
    case y :: ys => if (x == y) List(List(x, y)) ++ pack(ys)
                    else List(List(x), List(y)) ++ pack(ys)
  }
}
pack(List("a","a","a","b")) //wrong: List(List("a","a"), List("a"), List("b"))
//second attempt :
//thinking of the solution with functional programming:
//in recursion, 1) assume the pack function already exists and works well;
//2) at each step, handle only the head of the list and its relation to pack(tail)
//(think recursively; do not try to control the whole process flow!).
//Recursion is really the induction step of mathematics: base case + a recursive call relating the
//current problem to a sub-problem, while making progress towards the base case.
//The list is effectively processed from right to left, while imperative programming processes it from left to right.
def pack[T](xs: List[T]): List[List[T]] = xs match {
  case Nil => List()
  //this piece of code resembles the span function; the library uses a mutable ListBuffer[T]
  //(which supports appending) for better performance.
  case x :: x1 =>
    if (pack(x1).isEmpty) List(List(x)) //pack(x1) returns only after traversing to the last element
    else pack(x1).head match {
      case Nil => List(List(x))
      case y :: ys =>
        if (x == y) (x :: pack(x1).head) :: pack(x1).tail
        else List(x) :: pack(x1)
    }
}
//third attempt
//The problem is really to extract the sub-list of elements equal to the head.
//We can still process the list from left to right; this is much more concise and readable.
//So, once you have grasped recursion, think one level higher and use the library functions
//to make your code concise and fast.
def pack[T](xs: List[T]): List[List[T]] = xs match {
  case Nil => List()
  case x :: ys =>
    val (first, rest) = xs span (y => y == x)
    first :: pack(rest)
}
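The span-based pack, checked against the example from the exercise:

```scala
// span splits the list at the first element that differs from the head,
// giving the run of duplicates and the remainder in one traversal.
def pack[T](xs: List[T]): List[List[T]] = xs match {
  case Nil => Nil
  case x :: _ =>
    val (first, rest) = xs span (y => y == x)
    first :: pack(rest)
}

val r = pack(List("a", "a", "a", "b", "c", "c", "a"))
// r == List(List("a", "a", "a"), List("b"), List("c", "c"), List("a"))
```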
########Exercise2########
Using pack, write a function encode implementing run-length encoding: consecutive duplicates
are encoded as a single element plus the number of duplicates.
//most of the work is done by pack; after that we iterate over the sublists
//and map each one to a pair (T, Int). Obviously, we can use the map function.
def encode[T](xs: List[T]): List[(T, Int)] =
  //for the empty-list case, pack returns an empty list and map returns empty as well, so the code below is fine
  pack(xs) map (y => (y.head, y.length))
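The encoder assembled with pack into a self-contained sketch:

```scala
def pack[T](xs: List[T]): List[List[T]] = xs match {
  case Nil => Nil
  case x :: _ =>
    val (first, rest) = xs span (y => y == x)
    first :: pack(rest)
}

// run-length encoding: each run becomes (element, run length)
def encode[T](xs: List[T]): List[(T, Int)] =
  pack(xs) map (ys => (ys.head, ys.length))

val r = encode(List("a", "a", "a", "b", "c", "c", "a"))
// r == List(("a", 3), ("b", 1), ("c", 2), ("a", 1))
```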
######Reduce & Fold combinators #######
The third recurring pattern is combining the elements of a list with an operator; this is called reduce or fold.
//as defined in earlier sessions (this is actually foldRight, as it needs a zero for the empty list):
def reduce[T, U](xs: List[T])(f: (T, U) => U)(zero: U): U = xs match {
  case Nil => zero
  case x :: x1 => f(x, reduce(x1)(f)(zero))
}
reduceLeft //reduces from the left end of a list; throws an error when called on an empty list
sum could be expressed as : def sum(xs: List[Int]) = (0 :: xs) reduceLeft (_ + _) //compact and readable
product could be expressed as : def product(xs: List[Int]) = (1 :: xs) reduceLeft (_ * _)
foldLeft //a more general form of reduceLeft that takes an accumulator z as an additional parameter, which is returned when
//foldLeft is called on an empty list. xs reduceLeft op = xs.tail.foldLeft(xs.head)(op)
(List(e1, e2, ..., en) foldLeft z)(op) = (...((z op e1) op e2) ...) op en
Similarly,
foldRight : (List(e1, e2, ..., en) foldRight z)(op) = e1 op (e2 op (... (en op z))) //z is on the right-hand side
reduceRight : List(e1, e2, ..., en) reduceRight op = e1 op (e2 op (... (en-1 op en)))
abstract class List[T] {
  def foldLeft[U](z: U)(op: (U, T) => U): U = this match {
    case Nil => z
    case x :: xs => xs.foldLeft(op(z, x))(op)
  }
}
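foldLeft and foldRight in action. Subtraction is neither associative nor commutative, so the two folds give different results and make the bracketing above visible:

```scala
def sum(xs: List[Int]): Int = xs.foldLeft(0)(_ + _)

val s = sum(List(1, 2, 3))                 // 6
val l = List(1, 2, 3).foldLeft(0)(_ - _)   // ((0 - 1) - 2) - 3 == -6
val r = List(1, 2, 3).foldRight(0)(_ - _)  // 1 - (2 - (3 - 0)) == 2
```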
-------Difference between foldLeft and foldRight-----------
For operators that are associative and commutative, foldLeft and foldRight are equivalent.
Sometimes only one of the two is appropriate, e.g. the concat operation, which requires foldRight and cannot be replaced with foldLeft:
def concat[T](xs: List[T], ys: List[T]): List[T] =
  (xs foldRight ys)(_ :: _) //what a concise definition in combinator style!
  //this cannot be replaced by foldLeft: you would get a type error
  //(you would be doing ys :: x, and :: is not defined on the element type T)
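The foldRight concat, checked on a small example:

```scala
// each element of xs is cons'ed onto the front of ys, right to left
def concat[T](xs: List[T], ys: List[T]): List[T] =
  (xs foldRight ys)(_ :: _)

val r = concat(List(1, 2), List(3, 4))   // List(1, 2, 3, 4)
```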
#######Reasoning#######
How do you check that a program is correct? Often the operations satisfy a few laws,
represented as equalities between terms.
Reasoning is a core claim of FP, namely that it is more amenable to reasoning about programs.
-----Referential transparency -------
(FP in Scala -- the red book -- introduces this in the first chapter.)
Note that a proof can freely apply reduction steps as equalities to some part of a term.
That works because pure FP does not have side effects, so a term is equivalent to the term to which it
reduces.
This principle is called referential transparency.
-----Structural Induction------
The principle of structural induction is analogous to natural induction :
To prove a property P(xs) for all lists xs,
> show that P(Nil) holds (base case);
> for a list xs and some element x, show the induction step :
if P(xs) holds, then P(x :: xs) also holds.
Example :
concat is associative, and it admits the empty list Nil as neutral element to the left and to the right;
the laws of concat :
(xs ++ ys) ++ zs = xs ++ (ys ++ zs)
xs ++ Nil = xs
Nil ++ ys = ys
Q : How can we prove properties like these ?
A : By structural induction on lists.
The implementation of concat is :
def concat[T](xs: List[T], ys: List[T]): List[T] = xs match {
  case Nil => ys
  case x :: x1 => x :: concat(x1, ys)
}
This distills two defining clauses of ++ :
Nil ++ ys = ys //first clause
(x :: x1) ++ ys = x :: (x1 ++ ys) //second clause
Base case : xs = Nil
left-hand side : (Nil ++ ys) ++ zs = ys ++ zs //first clause
right-hand side : Nil ++ (ys ++ zs) = ys ++ zs
Induction step : x :: xs
left-hand side : ((x :: xs) ++ ys) ++ zs
= (x :: (xs ++ ys)) ++ zs //second clause
= x :: ((xs ++ ys) ++ zs) //second clause
= x :: (xs ++ (ys ++ zs)) //induction hypothesis
= (x :: xs) ++ (ys ++ zs) //second clause, i.e. the right-hand side
So the associativity law is established.
---Exercise---
Show by induction on xs that xs ++ Nil = xs.
Base case : xs = Nil
Nil ++ Nil
= Nil //first clause
= xs
Induction step : x :: xs
(x :: xs) ++ Nil
= x :: (xs ++ Nil) //second clause
= x :: xs //induction hypothesis
So the law is established.
#####Other sequences(immutable)########
Element access in a List is linear in the element's position: accessing the first element is much
faster than accessing the middle or end of the list.
The Scala library also defines an alternative sequence implementation, Vector,
which has more evenly balanced access patterns than List.
Essentially, it is represented as a shallow tree:
> up to 32 elements are stored in a single array;
> above 32 elements, the tree spans 32 sub-trees, each holding 32 elements,
and so on as the vector grows:
32 elements in the root array, then 32 sub-arrays of 32 elements each, then 32 * 32 sub-arrays, ...
So the capacities are 32 (2^5), 32*32 (2^10), 32*32*32 (2^15), ...
1) element access time is log32(N);
2) bulk operations on vectors (e.g. map, filter) run faster, working in chunks of 32-element
arrays that may fit in a single cache line;
3) but if your algorithm matches the recursive structure of a list well (head + tail) and the size is not large,
a list is the better option.
The supported operations are the same as for lists, except for "::" (which on lists serves both
for pattern matching and as a constructor of new lists). Instead of x :: xs, there is:
x +: xs //creates a new vector with leading element x, followed by all elements of xs
xs :+ x //creates a new vector with trailing element x, preceded by all elements of xs
(Note that the ":" always points to the sequence.)
Arrays and Strings support the same operations as Seq and can implicitly be converted to sequences where needed.
(They cannot be subclasses of Seq because they come from Java.)
      Iterable
     /    |    \
   Seq   Set   Map
  /  |  \
List Vector Range     (String and Array sit outside the hierarchy)
val xs = Array(1, 2, 3)
xs map (x => x * 2) filter (x => x > 3)
----------Range----------
A Range represents a sequence of evenly spaced integers.
Three operators :
to (inclusive), until (exclusive), by (to determine the step value)
val r : Range = 1 to 10 by 3 //1, 4, 7, 10
val s : Range = 1 to 5 //1, 2, 3, 4, 5
anonymous ranges : 1 to 10 by 3, 6 to 1 by -2, etc.
A Range is represented as a single object with three fields : lower bound,
upper bound and step value.
-----Other operations on Seq ----
xs exists p //true if there is an element of xs for which p holds
xs forall p //true if p(x) holds for all elements x of xs
xs zip ys //a sequence of pairs drawn from corresponding elements of xs and ys
e.g. List(1,2,3) zip "abcd" //List((1,'a'), (2,'b'), (3,'c'))
xs flatMap f //applies the collection-valued function f to all elements of xs and concatenates the results
//flatMap is the combination of map + flatten
xs.sum //the sum of the elements of a numeric collection
xs.product //the product of the elements
xs.max / xs.min //the maximum or minimum of the collection; the elements must be comparable
---Pattern matching for pairs ---
Generally, the function value
{ case p1 => e1 ... case pn => en }
is equivalent to :
x => x match { case p1 => e1 ... case pn => en }
but is much shorter and more readable.
###Exercise###
Define isPrime to test whether n is a prime number (ignoring performance):
def isPrime(n: Int) = (2 until n) forall (x => n % x != 0)
For the numbers 1 to n and each value x, find all pairs (y, x) with 1 <= y <= x
such that y + x is prime:
def findPrime(n: Int): Seq[(Int, Int)] =
  ((1 to n) flatMap (x => (1 to x) map (y => (y, x)))) filter { case (m, k) => isPrime(m + k) }
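A runnable sketch of the two definitions above, checked on a small board of n = 3:

```scala
def isPrime(n: Int): Boolean = (2 until n) forall (x => n % x != 0)

// all pairs (y, x) with 1 <= y <= x <= n whose sum is prime
def findPrime(n: Int): Seq[(Int, Int)] =
  ((1 to n) flatMap (x => (1 to x) map (y => (y, x)))) filter {
    case (m, k) => isPrime(m + k)
  }

val r = findPrime(3)
// pairs generated: (1,1),(1,2),(2,2),(1,3),(2,3),(3,3); sums 2,3,4,4,5,6
// r == Vector((1,1), (1,2), (2,3))
```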
######6.2 For expressions ####
Higher-order functions such as map, flatMap or filter provide powerful constructs for manipulating lists.
But sometimes the level of abstraction required by these functions makes a program difficult to understand.
In this case, Scala's for expression notation can help.
Example :
case class Person(name: String, age: Int)
To obtain the names of persons over 20 years old, you can write :
for (p <- persons if p.age > 20) yield p.name //readable and easy to understand
which is equivalent to :
persons filter (p => p.age > 20) map (p => p.name)
The for expression is similar to loops in imperative languages, except that it builds (yields) a list of the results of all iterations.
Syntax of for :
for (s) yield e //or for { s } yield e; then s may span multiple lines without semicolons
where s is a sequence of generators and filters, and e is an expression whose value is returned by an iteration.
> A generator is of the form p <- e, where p is a pattern and e an expression whose value is a collection.
> A filter is of the form if f, where f is a boolean expression evaluated for each iteration value.
> The sequence must start with a generator.
> If there are several generators in the sequence, the later generators vary faster than the earlier ones.
Let us rewrite findPrime with a for expression :
for {
  i <- 1 until n
  j <- 1 until i
  if isPrime(i + j)
} yield (i, j)
#####Exercise #######
Write a version of scalarProduct that makes use of a for :
def scalarProduct(xs: List[Int], ys: List[Int]): Int =
  (for ((x, y) <- xs zip ys) yield x * y).sum
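The for-based scalarProduct, checked on a small example:

```scala
// zip pairs up corresponding elements; the for multiplies each pair, sum adds them
def scalarProduct(xs: List[Int], ys: List[Int]): Int =
  (for ((x, y) <- xs zip ys) yield x * y).sum

val r = scalarProduct(List(1, 2, 3), List(4, 5, 6))
// r == 4 + 10 + 18 == 32
```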
#######Set collection ########
A set is written analogously to a sequence :
val fruit = Set("apple","banana","pear")
val s = (1 to 6).toSet
Differences between Seq and Set :
1. Sets are unordered.
2. Sets do not have duplicate elements.
3. The fundamental operation on sets is contains :
s contains 5 //true
---The N-Queens problem---
The eight queens problem is to place eight queens on a chessboard so that no queen is threatened by another.
In other words, there cannot be two queens in the same row, column or diagonal.
Think of it recursively as well !
Once we have placed k - 1 queens (one per column), we must place the kth queen in a row where it is not
"in check" with any other queen on the board.
Take the 4-queens as an example :
o x o o
o o o x
x o o o
o o x o
Here is the algorithm with recursion.
Remember the rules of recursive algorithms :
1. pretend the algorithm, named algo, already exists and works;
2. provide a base case;
3. write the processing logic for step k, given that algo(k-1) works and returns its results.
For the n-queens problem:
//Mr. Odersky explains things very concisely; the following points you have to work out yourself.
/*assume the previous k-1 queens have been placed properly on the chessboard;
the elements of the returned list hold the row numbers, and the column numbers
run from 0 to list.length - 1.
the key idea for isSafe is :
1) no two placements share the same row number (we enumerate by column index,
so the column indices are always different);
2) for the diagonal, consider the line through two positions in diagonal position:
its characteristic is that math.abs(colA - colB) == math.abs(rowA - rowB).
some key points about the program:
1. since the "::" (prepend) operation on lists is constant time
and usually preferred, in the isSafe part,
when we add a row number for the new column to queenList, the existing columns are
indexed from list.length - 1 down to 0 (the most recent column's placement sits at
the head of the list).
2. the candidate places always run from 0 to the constant n (of the
n-queens problem), so we need to remember the initial parameter;
that is why the queens function takes 2 parameters.
A better approach is to define an inner recursive placeQueens function that
has only one parameter and can always access the n value.
3. a variant of the solution is to put each new placement at the end of the list,
using the ":+" method of a list.
it may look more natural than the first version, but may cost more resources.
*/
def queens(n: Int, k: Int): Set[List[Int]] = {
  if (n == 0) Set(List())
  else {
    for {
      queenList <- queens(n - 1, k) //solutions for the first n - 1 columns
      place <- 0 until k //not 0 until n but 0 until k, the original board size
      if isSafe(queenList, place)
    } yield place :: queenList //or queenList :+ place, see the variant below
  }
}
def isSafe(queenList: List[Int], rowplace: Int): Boolean = {
  val listLen = queenList.length //the new queen goes into column listLen
  val pairs = (listLen - 1 to 0 by -1) zip queenList
  //println(pairs + ":" + rowplace)
  pairs forall {
    case (col, row) => row != rowplace &&
      math.abs(listLen - col) != math.abs(rowplace - row)
  }
}
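The queens/isSafe pair assembled into a self-contained sketch. A 4x4 board has exactly two solutions, which gives a handy check:

```scala
// queenList holds row numbers; the head is the most recently placed column.
def isSafe(queenList: List[Int], rowplace: Int): Boolean = {
  val listLen = queenList.length //the new queen goes into column listLen
  val pairs = (listLen - 1 to 0 by -1) zip queenList
  pairs forall { case (col, row) =>
    row != rowplace &&
      math.abs(listLen - col) != math.abs(rowplace - row) //diagonal check
  }
}

def queens(n: Int, k: Int): Set[List[Int]] =
  if (n == 0) Set(List())
  else
    for {
      queenList <- queens(n - 1, k)
      place <- 0 until k
      if isSafe(queenList, place)
    } yield place :: queenList

val solutions = queens(4, 4)
// solutions.size == 2; one solution, read head-first, is List(2, 0, 3, 1)
```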
def queensTail(n: Int, k: Int): Set[List[Int]] = {
  if (n == 0) Set(List())
  else {
    for {
      queenList <- queensTail(n - 1, k)
      place <- 0 until k
      if isSafeTail(queenList, place)
    } yield queenList :+ place //the append variant
  }
}
def isSafeTail(queenList: List[Int], rowplace: Int): Boolean = {
  val listLen = queenList.length
  val pairs = (0 until listLen) zip queenList
  //println(pairs + ":" + rowplace)
  pairs forall {
    case (col, row) => row != rowplace &&
      math.abs(listLen - col) != math.abs(rowplace - row)
  }
}
---to show the output, we can use the function below---
//there are small pitfalls here too; without practicing, watching the lectures teaches you nothing...
//first attempt :
def show(set: Set[List[Int]]): Unit = {
  val x = for {
    list <- set
    ele <- list.reverse //notice the list is in reversed column order
  } yield Vector.fill(list.length)("*").updated(ele, "X").mkString(" ")
  println(x.mkString("\n"))
}
//problem: the returned x is a Set[String], and a Set removes duplicates, so only
//the distinct rows get printed -- this took quite a while to debug...
//second attempt: convert the Set to a List; will that be fine ?
def show(set: List[List[Int]]): Unit = {
  val x = for {
    list <- set
    ele <- list.reverse //notice the list is in reversed column order
  } yield Vector.fill(list.length)("*").updated(ele, "X").mkString(" ")
  println(x.mkString("\n"))
}
//a small problem remains: all the solutions run together. Since the yielded rows no longer
//carry which solution they belong to, separating them would require printing an extra
//newline every n rows -- clearly too complex.
//Mr. Odersky's approach fixes both problems above: show one solution at a time and map it over the solution set.
def show(list: List[Int]): Unit = {
  val x = for (ele <- list.reverse)
    yield Vector.fill(list.length)("*").updated(ele, "X").mkString(" ")
  println(x.mkString("\n"))
  println()
}
queens(4, 4) map show
#####Map#####
A map of type Map[Key, Value] is a data structure that associates keys of type Key with values of type Value.
Examples :
val romanNumerals = Map("I" -> 1, "V" -> 5, "X" -> 10)
val capitalOfCountry = Map("US" -> "Washington", "Switzerland" -> "Bern")
Map extends Iterable[(Key, Value)], so maps support the same collection operations as other iterables :
val countryOfCapital = capitalOfCountry map {
  case (x, y) => (y, x)
}
In fact, the syntax key -> value is just an alternative way to write the pair (key, value).
A Map is also a function: it extends the function type Key => Value, so maps can be used
everywhere functions can.
In particular, maps can be applied to key arguments :
capitalOfCountry("US")
def testMap(f: String => String): String = {
  f("US")
}
val k = testMap(capitalOfCountry)
A map will throw a NoSuchElementException if no value is found for the key argument.
Maps provide a better way to query: the get method returns an Option[Value], which
can then be decomposed or tested for emptiness.
capitalOfCountry get "US" //Option[String] = Some("Washington")
val k2 = capitalOfCountry get "UK" //Option[String] = None
k2.isEmpty //true
//Scala recommends using pattern matching instead of if/else here.
Since Options are defined as case classes, they can be decomposed using pattern matching:
def showCapital(country: String) = capitalOfCountry.get(country) match {
  case Some(capital) => capital
  case None => "missing data"
}
----Sorted and GroupBy-----
Two useful operations known from SQL queries, in addition to for-expressions, are groupBy and orderBy.
orderBy on a collection can be expressed by sortWith and sorted :
val fruit = List("apple","pear","orange","pineapple")
fruit.sortWith(_.length < _.length)
fruit.sorted //the natural order defined on String
groupBy is available on Scala collections.
It partitions a collection into a map of collections according to a
discriminator function f (the "key" generator of the map) :
fruit.groupBy(_.length) //immutable.Map[Int,List[String]] = Map(5 -> List(apple), 4 -> List(pear), 9 -> List(pineapple), 6 -> List(orange))
---Example--
Polynomials can be expressed with a map : x^2 - 2x + 5 can be expressed as Map(2 -> 1, 1 -> -2, 0 -> 5).
class Poly(val terms: Map[Int,Int]) {
  def + (that: Poly): Poly = ???
}
The key function for this "+" method of Poly is the "combineMaps" :
//this first version is rather verbose
def combineMaps(xs: Map[Int,Int], ys: Map[Int,Int]): Map[Int,Int] = {
  val allKeys = xs.keys ++ ys.keys
  (for (key <- allKeys) yield {
    (xs.get(key), ys.get(key)) match {
      case (Some(xk), Some(yk)) => (key, xk + yk)
      case (Some(xk), None)     => (key, xk)
      case (None, Some(yk))     => (key, yk)
      case (None, None)         => (key, 0) //unreachable: every key comes from xs or ys
    }
  }).toMap
}
//this version looks shorter but uses a mutable variable, which is not recommended
def combineMaps(xs: Map[Int,Int], ys: Map[Int,Int]): Map[Int,Int] = {
  var rmap: Map[Int,Int] = ys
  for (ele <- xs) {
    ys.get(ele._1) match {
      case None         => rmap = rmap + ele
      case Some(yvalue) => rmap = rmap.updated(ele._1, ele._2 + yvalue)
    }
  }
  rmap
}
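As an aside, the groupBy operation introduced earlier gives a third, variable-free way to merge the two maps; a sketch (not from the lecture):

```scala
// Concatenate the entry lists of both maps, group by key, and sum the
// values within each group. No mutable state, one pass over each map.
def combineMaps(xs: Map[Int, Int], ys: Map[Int, Int]): Map[Int, Int] =
  (xs.toList ++ ys.toList)
    .groupBy(_._1)
    .map { case (k, pairs) => k -> pairs.map(_._2).sum }
```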
//Professor M's version : "++" concatenates 2 maps, but it replaces an entry in the first
//map when the key matches. That is slightly unexpected here, as we need to sum the coefficients,
//hence the adjust step.
//better than the above 2 versions : simple and concise (every step only as needed, no more)
class Poly(val terms: Map[Int,Int]) {
  def + (that: Poly) = new Poly(terms ++ (that.terms map adjust)) //adjust the value from the second map if the key also occurs in the first
  def adjust(term: (Int,Int)): (Int,Int) = {
    val (exp, coeff) = term
    //below tests whether terms contains the exp key
    terms get exp match {
      case Some(coeff1) => exp -> (coeff + coeff1)
      case None         => exp -> coeff
    }
  }
}
//the adjust could be simplified again by using the withDefaultValue.
class Poly(val terms0: Map[Int,Int]) {
  val terms = terms0 withDefaultValue 0
  def + (that: Poly) = new Poly(terms ++ (that.terms map adjust))
  def adjust(term: (Int,Int)): (Int,Int) = {
    val (exp, coeff) = term
    exp -> (coeff + terms(exp))
  }
}
---Default value --
So far, maps are partial functions (only defined for some inputs) : applying a map to a key with map(key)
can lead to an exception if the key is not stored in the map.
There is an operation withDefaultValue that turns a map into a total function:
val cap1 = capitalOfCountry withDefaultValue "<unknown>"
cap1("Andorra") //"<unknown>"
---variable argument---
The constructor of Poly above looks a little long, and we have to create an intermediate map.
val mapx = new Poly(Map(1 -> 2, 2 -> -2, 4 -> 6))
Scala also supports varargs, so Poly can have a constructor accepting a variable-length
argument list of pairs (a sequence of pairs) :
def this(bindings : (Int,Int) *) = {
this (bindings.toMap)
}
Then we can construct a Poly with : val mapx = new Poly(1 -> 3, 4 -> 5)
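Putting the pieces together (withDefaultValue plus the varargs constructor), a complete Poly assembled from the snippets above that can be pasted and tried end to end:

```scala
class Poly(terms0: Map[Int, Int]) {
  // auxiliary constructor: accept a variable-length list of (exp, coeff) pairs
  def this(bindings: (Int, Int)*) = this(bindings.toMap)
  // missing exponents default to coefficient 0
  val terms = terms0.withDefaultValue(0)
  // ++ replaces matching keys, so adjust sums the coefficients first
  def +(that: Poly): Poly = new Poly(terms ++ (that.terms map adjust))
  def adjust(term: (Int, Int)): (Int, Int) = {
    val (exp, coeff) = term
    exp -> (coeff + terms(exp))
  }
}
```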
####Exercise#######
1. the "+" operation on Poly used map concatenation with "++". Design
another version of "+" in terms of foldLeft :
def + (other: Poly) = {
  new Poly((other.terms foldLeft ???)(addTerm))
}
def addTerm(terms: Map[Int,Int], term: (Int,Int)) = ???
//never thought of a foldLeft solution, as I assumed it only works toward a single value
//it does, but the initial value can be a map and the output a map as well
//a "single value" does not mean an "Int" or a "Double" ; it can be any object !
[Keith]:
def + (other: Poly) = {
  new Poly((other.terms foldLeft this.terms.withDefaultValue(0))(addTerm))
}
def addTerm(xs: Map[Int,Int], ele: (Int,Int)): Map[Int,Int] = {
  val (key, value) = ele
  xs + (key -> (value + xs(key))) //xs carries the default value 0 from the fold's start value
}
2. Which of the two versions do you think is more efficient?
Obviously, the foldLeft version is more efficient, as it traverses the maps only once.
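A runnable sketch of the foldLeft exercise; it uses getOrElse instead of withDefaultValue so the helper stands alone, and plus is a free function here rather than a method on Poly:

```scala
// Fold the entries of b into a, summing coefficients on key collisions.
def addTerm(xs: Map[Int, Int], term: (Int, Int)): Map[Int, Int] = {
  val (exp, coeff) = term
  xs + (exp -> (coeff + xs.getOrElse(exp, 0)))
}

def plus(a: Map[Int, Int], b: Map[Int, Int]): Map[Int, Int] =
  (b foldLeft a)(addTerm)
```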
/*##Convert telephone numbers to sentences###*/
//Task :
//Phone keys have mnemonics assigned to them.
val mnemonics = Map(
  '2' -> "ABC", '3' -> "DEF", '4' -> "GHI", '5' -> "JKL", '6' -> "MNO",
  '7' -> "PQRS", '8' -> "TUV", '9' -> "WXYZ")
val sentences = Set("Scala is fun","We are here","See you later")
/*
Assume you are given a dictionary words as a list of words.
Design a method such that :
translate(phoneNumber)
produces all phrases of words that can serve as mnemonics for the phone number.
Example : The digit string 7225247386 should have the mnemonic "Scala is fun" as one element of
the solution set of phrases.
Why ? Because the digit 7 has "S" associated, the digit 2 has both "a" and "c" associated, and so on .
***Background***
This example was taken from:
Lutz Prechelt : An Empirical Comparison of Seven Programming languages. IEEE(2000).
Tested with Tcl, Python, Perl, Rexx, Java, C++, C
Code size medians :
> 100 loc for scripting languages
> 200-300 loc for the others.
Let us see if we could resolve this task in Scala in a more concise way.
*/
/*--first attempt after 3 hours thinking and coding ---
My recursive version solution :
Choose recursive version as the prototype as it is easier to code and shorter.
Since the digit string is not too long, the recursive version actually works well.
Then we need to consider the following for the recursion :
> What are the input and output types ?
Assume there is already a solution returning Set[String] for an input digit string,
so we only need to care about the current case .
The input type is String and the output type is Set[String].
But we also need to record at which position to check the matched sentences
for the current digit, so we need a Map[String,Int] to record the expected match position.
> What is the base case ?
There is only a single digit in the string and we need to match it against all the sentences.
*/
val zeros = List.fill(sentences.size)(1)
val sentencesMap : Map[String,Int] = (sentences zip zeros).toMap
def translate(phoneNumber: String) : Set[String] = {
val phoneLen = phoneNumber.length
//filter out spaces(punctuations actually)
val candidates = sentencesMap.filter {case (x,y) =>x.filter(_ !=' ').length == phoneLen }
def trans(digits : String) : Map[String,Int] = {
if (digits.length == 1) {
var baseMap:Map[String,Int] = Map()
for (x <- mnemonics(digits.head) ) {
baseMap = baseMap ++ candidates.filter(_._1.last.toUpper == x)
// println("x in base case is :" + x)
// println("baseMap is " + baseMap)
}
baseMap
} else {
val xs : Map[String,Int] = trans(digits.tail)
var itMap : Map[String,Int] = Map()
for( echar <- mnemonics(digits.head)) {
itMap = itMap ++ xs.filter { case (x,y) => {
// println("it case :" + echar)
// println("itMap : " + itMap)
val str = x.filter(_ != ' ').toUpperCase
str.charAt(str.length -1 - y) == echar //y is index from right side
}
}.map { case (m,n) =>(m,n+1) }
}
itMap
}
}
trans(phoneNumber).keys.toSet
}
//Second attempt (iterative optimization ^-^)
/*the first attempt is not good when it checks whether one of the mnemonic characters matches
//the sentences at a specific position, as its complexity is m x n where m is the mnemonics length
//and n is the Map size .
//can we avoid this, just as we did in Week 6.6 ?
Actually , such operations can be replaced by a fold operation.
LOC dropped from 27 to 23, and it looks better than the first one.
And it does not use any mutable variable.
A few learnings :
> Map.get returns an Option, and Option has a contains method and some other collection methods as well ;
you may wrongly use contains as if it applied to the element inside the Option, as in the String case.
So you can use the map as a function instead, with a default value if necessary.
*/
val zeros = List.fill(sentences.size)(0)
val sentencesMap: Map[String, Int] = (sentences zip zeros).toMap
def translate(phoneNumber: String): Set[String] = {
val candidates = sentencesMap.filter { case (x, y) => x.filter(_ != ' ').length == phoneNumber.length }
def mnemonicsCheck(ch: Char, str: String, postIndex: Int): Boolean = {
mnemonics(ch).contains(str.charAt(str.length - 1 - postIndex).toUpper)
}
def addItem(xm: Map[String, Int], ele: (String, Int), digitNum: Char): Map[String, Int] = {
ele match {
case (k, v) if mnemonicsCheck(digitNum, k.filter(_ != ' '), v) => xm + (k -> (v + 1))
case _ => xm
}
}
def trans(digits: String): Map[String, Int] = {
if (digits.length == 1) {
candidates.foldLeft(Map[String, Int]())((map, entry) => addItem(map, entry, digits.head))
} else {
val xs: Map[String, Int] = trans(digits.tail)
xs.foldLeft(Map[String, Int]())((map, entry) => addItem(map, entry, digits.head))
}
}
trans(phoneNumber).keys.toSet
}
println( translate("7225247386"))
println( translate("932734373") )
//Professor M's version
/* let us think from the other side : instead of mapping a digit string to sentences or words,
how about mapping the sentences or words to a digit string, and then comparing the 2 strings ?
If they match, we get one mnemonic for the digit string .
Besides that , saving the digit string for each word in a Map, instead of re-computing it each time,
is more efficient than my version :). //a programmer's insight for the problem.
Learned from the code :
> "str.toUpperCase map charCode" is more efficient and shorter than
"str map (x => charCode(x.toUpper))"
> since we check the whole dictionary for each digit sequence (each query) ,
we can save the mapping from words to digit strings once and then
simply check the dictionary in constant time.
> Interface-oriented (more general) instead of implementation-oriented programming :
wordsForNum is declared as Map[String,Seq[String]] instead of Map[String,List[String]]
> The most important part is that there is no recursion at all.
*/
/*But actually I did not fully understand the problem at first.
The dictionary is of words, so we may match any substring of the input digit string
and compose "phrases" or "sentences" from the matched words.
But even for the simplified version above, Professor M's solution is better .
So the real solution is as below :
*/
//Scala worksheet version
object Test {
val mnemonics = Map(
'2' -> "ABC",
'3' -> "DEF",
'4' -> "GHI",
'5' -> "JKL",
'6' -> "MNO",
'7' -> "PQRS",
'8' -> "TUV",
'9' -> "WXYZ")
import scala.io.Source
val in = Source.fromURL("https://lamp.epfl.ch/files/content/sites/lamp/files/teaching/progfun/linuxwords.txt")
val words = in.getLines.toList.filter( str => str forall (_.isLetter) )
//Invert the mnemonics to give a map from chars 'A' -'Z' to '2'-'9'
val charCode:Map[Char,Char] =
for {
ele <- mnemonics
ch <- ele._2
} yield (ch,ele._1)
//Maps a word to the digit string it can represent, e.g. "Java" -> "5282"
def wordCode(word:String ) : String = word.toUpperCase map charCode
val wordsForNum : Map [String, Seq[String]] = {
words groupBy wordCode withDefaultValue(Seq())
}
def encode(number:String) : Set[List[String]] = {
if(number.isEmpty) Set(List())
else
(for {
split <- 1 to number.length
word <- wordsForNum(number take split )
rest <- encode(number drop split )
} yield word :: rest
).toSet
}
//format not good
def translate(number:String) : Set[String] ={
val words = encode(number)
words map ( _ mkString(" "))
}
translate("7225247386")
}
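The worksheet above needs a network download; here is a self-contained miniature of the same solution, with a tiny hard-coded word list standing in for linuxwords.txt:

```scala
val mnemonics = Map('2' -> "ABC", '3' -> "DEF", '4' -> "GHI", '5' -> "JKL",
  '6' -> "MNO", '7' -> "PQRS", '8' -> "TUV", '9' -> "WXYZ")

// tiny stand-in dictionary instead of the linuxwords.txt download
val words = List("Scala", "is", "fun", "Java")

// Invert mnemonics to get a map from chars 'A'..'Z' to digits '2'..'9'
val charCode: Map[Char, Char] =
  for ((digit, str) <- mnemonics; ch <- str) yield ch -> digit

// The digit string a word encodes, e.g. "Java" -> "5282"
def wordCode(word: String): String = word.toUpperCase map charCode

val wordsForNum: Map[String, Seq[String]] =
  (words groupBy wordCode).withDefaultValue(Seq())

// Try every split of the number into a word prefix and an encoded rest.
def encode(number: String): Set[List[String]] =
  if (number.isEmpty) Set(List())
  else (for {
    split <- 1 to number.length
    word  <- wordsForNum(number take split)
    rest  <- encode(number drop split)
  } yield word :: rest).toSet

def translate(number: String): Set[String] =
  encode(number) map (_ mkString " ")
```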
/*Continue with proofs about programs, now on trees, a more general data structure.
FP is important as it is very close to the mathematical theory of data structures.
####Structural Induction on Trees ######
Structural induction is not limited to lists ; it applies to any tree structure.
The general induction principle is the following :
To prove a property P(t) for all trees t of a certain type,
> show that P(L) holds for all leaves L of the tree type,
> for each type of internal node t with subtrees s1, ..., sn, show that
P(s1) /\ ... /\ P(sn) implies P(t), i.e. under the assumption that P(s1) to P(sn)
hold, P(t) holds as well.
//then we can conclude that P(t) holds for every tree t, including all of its subtrees
###Example: IntSets######
Show some interesting insights about the IntSet.
Recall our definition of IntSet with the operations "contains" and "incl":
*/
abstract class IntSet {
def incl(x:Int) : IntSet
def contains (x:Int) : Boolean
}
object Empty extends IntSet {
  def incl(x: Int): IntSet = new NonEmpty(x, Empty, Empty)
  def contains(x: Int): Boolean = false
}
class NonEmpty(node: Int, leftsub: IntSet, rightsub: IntSet) extends IntSet {
  def incl(x: Int): IntSet = {
    if (x > node) new NonEmpty(node, leftsub, rightsub incl x) //Immutable!
    else if (x < node) new NonEmpty(node, leftsub incl x, rightsub)
    else this
  }
  def contains(x: Int): Boolean = {
    if (x > node) rightsub contains x
    else if (x < node) leftsub contains x
    else true
  }
}
/* Now let us prove that the IntSet implementation is correct .
The Laws of IntSet
What does it mean to prove the correctness of this implementation ?
One way to define and show the correctness of an implementation consists of
proving the laws that it respects, i.e define some laws that our implementation should
satisfy and then show that the implementation indeed does that.
In the case of IntSet, we have the following three laws :
For any set s , and elements x and y :
Empty contains x = false
(s incl x) contains x = true
(s incl x) contains y = s contains y if x != y
In fact, these 3 laws define the characteristics of IntSet completely (Keith: no idea about this ...)
###Proving the laws of IntSet (L)#########
How can we prove these laws ?
--Proposition 1 : Empty contains x = false.
Proof : According to the definition of contains in Empty .
--Proposition 2 : (s incl x ) contains x = true
Proof by structural induction on s.
Base case : Empty
(Empty incl x ) contains x
= NonEmpty(x, Empty , Empty) contains x
= true // by definition of NonEmpty.contains
Induction step :
Assume the subtrees l and r of s satisfy "(s incl x) contains x = true" ;
then we prove that
(s incl x) contains x
holds, where s is a general non-empty IntSet, NonEmpty(z, l, r).
There are 3 cases here :
1) z == x
(NonEmpty(x, l, r) incl x) contains x
= NonEmpty(x, l, r) contains x //by definition of NonEmpty.incl
= true //by definition of NonEmpty.contains
2) x > z
(NonEmpty(z, l, r) incl x) contains x
= NonEmpty(z, l, r incl x) contains x //by definition of NonEmpty.incl
= (r incl x) contains x //by definition of NonEmpty.contains
= true //by the induction hypothesis
3) ditto for the third case where x < z
--Proposition 3 : (s incl x) contains y = s contains y if x != y
Base case : Empty
(Empty incl x) contains y
= NonEmpty(x, Empty, Empty) contains y
and since y != x, this reduces to Empty contains y = false.
On the right side, Empty contains y is always false as well,
so the proposition holds for the base case.
Induction step :
s is a general NonEmpty(z, l, r), and assume the proposition holds for the subtrees l and r.
If x == z, then s incl x = s and both sides coincide immediately.
Otherwise there are 2 cases for the incl operation :
1) x >z
(s incl x ) contains y
= (NonEmpty(z, l, r ) incl x ) contains y
= NonEmpty(z, l, r incl x ) contains y
for any of the three cases of y (y == z, y>z, y <z)
we have
NonEmpty(z, l, r incl x ) contains y
if y == z, then it is true
if y < z, then it is l contains y
if y > z, then it is (r incl x) contains y = r contains y //the induction hypothesis
( in each case this is equal to NonEmpty(z, l, r) contains y ;
so the proof works by reducing both sides to a common form, or by
reducing one side until it matches the other )
On the right side, for the 3 cases of y
we have
NonEmpty(z, l, r) contains y
if y == z, then it is true
if y <z , it is l contains y
if y >z , it is r contains y
so the left and right side exactly matched for any case of y in the case of x >z
2) x <z
Ditto
So we conclude that (s incl x ) contains y = s contains y if x != y.
*/
//#########Exercise ########
Suppose we add a function union to IntSet :
abstract class IntSet {
def union (other : IntSet) : IntSet
}
object Empty extends IntSet {
def union (other:IntSet) = other
}
class NonEmpty (x :Int, l :IntSet, r : IntSet) extends IntSet {
def union (other:IntSet) :IntSet =(l union (r union (other))) incl x
}
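A runnable version of IntSet with union, handy for experimenting with Proposition 4 before attempting the proof:

```scala
abstract class IntSet {
  def incl(x: Int): IntSet
  def contains(x: Int): Boolean
  def union(other: IntSet): IntSet
}

object Empty extends IntSet {
  def incl(x: Int): IntSet = new NonEmpty(x, Empty, Empty)
  def contains(x: Int): Boolean = false
  def union(other: IntSet): IntSet = other
}

class NonEmpty(node: Int, left: IntSet, right: IntSet) extends IntSet {
  def incl(x: Int): IntSet =
    if (x > node) new NonEmpty(node, left, right incl x) // immutable!
    else if (x < node) new NonEmpty(node, left incl x, right)
    else this
  def contains(x: Int): Boolean =
    if (x > node) right contains x
    else if (x < node) left contains x
    else true
  // union both subtrees with other, then re-include this node
  def union(other: IntSet): IntSet = (left union (right union other)) incl node
}
```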
To prove the correctness of union , it can be translated into the following law :
Proposition 4 :
(xs union ys ) contains x = xs contains x || ys contains x //xs contains or ys contains x
Show proposition 4 by using structural induction on xs.
[Keith]: To do
/*##Delayed evaluation####
((100 to 1000) filter isPrime)(1) //take the second prime number between 100 and 1000
This computes the full list of primes between 100 and 1000, while we only use the
first 2 of them, which is quite bad for performance.
However, we can make the short code efficient by using a trick :
Avoid computing the tail of a sequence until it is needed for the
evaluation result (which might be never).
The idea is implemented in a new class, the Stream.
Streams are similar to lists, but their tail is evaluated only on demand.
#####Defining Streams#####
Streams are defined from a constant Stream.empty and a constructor Stream.cons.
*/
//For instance.
val x = Stream.cons (1, Stream.cons(2,Stream.empty))
//They can also be defined like the other collections by using the object Stream as a factory
Stream(1,2,3)
//The toStream method on a collection will turn the collection into a Stream:
(1 to 1000).toStream // Stream(1,?) , the Stream structure is similar to List, but the tail is not evaluated.
//#####Stream ranges#######
Let us try to write a function that returns (lo until hi).toStream directly :
def streamRange(lo:Int, hi:Int) : Stream[Int] =
if (lo >= hi) Stream.empty
else Stream.cons(lo,streamRange(lo+1,hi))
def listRange(lo:Int,hi:Int) : List[Int] =
if (lo>=hi) Nil
else lo :: listRange(lo+1,hi)
It will not create the whole range as listRange does ; it creates a structure like this :
----
/ \
1 ? //the head is 1 and the tail part is not evaluated yet .
//####Methods on Stream#####
It basically supports the same operations as List does, except for the "cons" operator .
x :: xs always produces a list, never a stream.
There is however an alternative operator #:: which produces a stream.
x #:: xs == Stream.cons(x, xs )
#:: can be used in expressions as well as patterns.
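A small demonstration of #:: in both positions; LazyList is used here, the replacement for the (now deprecated) Stream in current Scala:

```scala
// Expression position: build a lazy sequence with #::
val s = 1 #:: 2 #:: LazyList.empty

// Pattern position: destructure head and tail with #::
val described = s match {
  case x #:: _ => s"head is $x"
  case _       => "empty"
}
```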
//#####Implementation of Stream######
The implementation is quite similar to List.
Concrete implementation of streams are defined in the Stream companion object. Here is the
first draft.
object Stream {
//cons returns a Stream[T] object with the tl thunk encapsulated in it
//tl (the tail) must not be evaluated in the cons body, otherwise it would not be lazy
def cons[T](hd: T, tl: => Stream[T]) = new Stream[T] {
  def isEmpty = false
  def head = hd //hd is call-by-value, so it was already evaluated at the call site
  def tail = tl //notice the "def" keyword, not "val" :
  //with "val", even though tl is passed by name, it would be evaluated on assignment
}
val empty = new Stream[Nothing] {
def isEmpty = true
def head = throw new NoSuchElementException("empty.head")
def tail = throw new NoSuchElementException("empty.tail")
}
}
The tl thunk is first dereferenced when the tail method is called.
//#####Lazy evaluation#######
Roughly, laziness means doing things as late as possible and never doing them twice.
Streams are good, but if we evaluate the tail multiple times, the corresponding
stream will be recomputed each time .
This problem can be avoided by storing the result of the first evaluation of tail and
re-using the stored result instead of recomputing tail.
This optimization is sound, since in a purely functional language an expression produces
the same result each time it is evaluated. (If there are side effects, i.e. not purely
functional, the result may be unexpected.)
We call this scheme "lazy evaluation" (as opposed to by-name evaluation, where
everything is recomputed, and strict evaluation for normal parameters and "val" definitions).
//###Lazy evaluation in Scala###
Lazy evaluation is so attractive that Haskell, a functional language, uses lazy evaluation
by default.
Why does Scala not do that ?
There are two problems with lazy evaluation : it is quite unpredictable when the
computation happens and how much space it takes. In a pure functional language it does not
matter when the computation happens, but once you add side effects like IO or mutable
structures, it matters.
Scala uses strict evaluation by default, but allows lazy evaluation of value definitions with the
lazy val form :
lazy val x = exp
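A tiny demonstration of "as late as possible and never twice", using a counter to observe when the body actually runs:

```scala
var evaluations = 0
lazy val x = { evaluations += 1; 40 + 2 } // body runs only on first access

val before = evaluations // still 0: nothing evaluated yet
val first  = x           // first access runs the body
val second = x           // memoized: body does not run again
```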
/*[Keith] comment
A call-by-name parameter is actually implemented by passing an object reference (a thunk) to the function.
So, when the parameter is referenced, it is just calling a method of that object,
whose body is exactly the parameter's defining expression.
Lazy evaluation solves the "multiple evaluation" problem, where a call-by-name argument is
re-evaluated each time it is referenced.
It actually stores the value the first time it is referenced, similar to the Java singleton pattern
(lazy-initialization mode).
*/
//using a lazy value for tail , the Stream.cons can be implemented more efficiently :
def cons[T](hd:T, tl : => Stream[T]) = new Stream[T] {
def head = hd
lazy val tail = tl
...
}
/*[Keith Comment]: that is the reason why Spark transformations are fast : they are lazily evaluated.
Also, the RDD computation is only triggered by an "action", so the result depends purely on
the action ; the RDD cannot be reused unless configured, in which case intermediate RDDs can be cached.
As Professor M noted, lazy evaluation eliminates the intermediate data objects, but it also
has its cons : it is unpredictable when the computation happens (in Spark this is fine, as it is only
triggered by an "action") and how much space each call will take .
*/
###Seeing it in Action###
//Let's observe the execution trace of the expression using the substitution model :
// (streamRange(1000,10000) filter isPrime ) apply (1)
//and we can assume filter and apply are defined (inside Stream[T]) as below :
def filter(f: T => Boolean): Stream[T] = {
  this match {
    case Stream.empty => Stream.empty
    case x #:: xs if f(x) => x #:: (xs filter f)
    case _ => this.tail filter f
  }
}
def apply (x:Int) = if (x==0) head else this.tail.apply(x-1)
def streamRange(lo:Int, hi:Int) : Stream[Int] =
if (lo >= hi) Stream.empty
else Stream.cons(lo,streamRange(lo + 1,hi)) //recursive structure
//let us reducing the expression
streamRange(1000,10000).filter(isPrime).apply(1)
--> if ( 1000 >= 10000) empty
else cons(1000,streamRange(1000 + 1, 10000) )
.filter(isPrime).apply(1)
-->cons(1000,streamRange(1000+1,10000)).filter(isPrime).apply(1)
//abbreviate the cons(1000,streamRange(1000+1,10000) as C1 and so on
-->C1.filter(isPrime).apply(1)
-->(if (C1.isEmpty) C1 //expanding filter
    else if (isPrime(C1.head)) cons(C1.head, C1.tail.filter(isPrime))
    else C1.tail.filter(isPrime))
    .apply(1)
--> C1.tail.filter(isPrime).apply(1) //evaluating if
--> streamRange(1001,10000).filter(isPrime).apply(1)
the evaluation continues like this until :
--> streamRange(1009,10000).filter(isPrime).apply(1)
--> cons(1009, streamRange(1009 + 1, 10000)).filter(isPrime).apply(1)
--> C9.filter(isPrime).apply(1)
--> cons(C9.head, C9.tail.filter(isPrime)).apply(1) //isPrime(C9.head) is true
/*the filter is pushed to the tail, and now we get another stream without a filter operation.
So now we can proceed to evaluate the apply function.
As we need to take the second element, we proceed to apply it
on the tail stream : C9.tail.filter(isPrime).apply(0).
So now it is clear how lazy evaluation helps eliminate the intermediate results :
the final function call (an "action", so to speak) triggers the whole evaluation chain.
Each element is sent to the next function whenever the current function call is done.
(filter returns when it meets the first prime number and sends it to the next function for evaluation ;
when apply is not yet satisfied, it triggers the following evaluation to start on Stream.tail.
So if there is another function on the output of apply, say f, then as long as f does not return,
it will trigger the whole evaluation chain to repeat.
That is the same logic as Stream in Java 8, and should be the same for RDDs in Spark as well.
So the reason why streamRange is more efficient than listRange is that it emits
elements only as the final "action" requires, no more.)
*/
--> cons(C9.head, C9.tail.filter(isPrime)).apply(1)
--> C9.tail.filter(isPrime).apply(0)
...
--> cons(C13.head, C13.tail.filter(isPrime)).apply(0)
-->C13.head //1013
So it is confirmed that streamRange never went beyond the second prime number !
/*Keith Comment
I just came across the reason why it is not allowed to insert into & select from the same table in Spark.
The reason is that an RDD from the same location in HDFS is immutable.
To support fault tolerance, the underlying source of an RDD must not be changed.
As we may have multiple stages, and an RDD may be cached or re-computed due to node failure,
it is required that the underlying data of an RDD does not change.
For Hive, by contrast, there is no caching anyway, as each step is actually a
map-reduce job with no way to cache the data.
So it is safe to insert into the same table, as it is an atomic operation and works well
with Hive. (Were we to do the same thing in Spark, the RDD of the select query might be reused.)
The advantage of RDDs is also their disadvantage (no free lunch !).
The immutability of RDDs cannot support on-the-fly parameter updates in ML ;
one has to reset the whole thing and re-run the process with different parameters, and
cannot adjust the parameter values during the computation.
Check the Angel project of Tencent for more details.
*/
#######7.4 Continue with laziness######
One nice aspect of laziness is that it makes it easy to deal with infinite amounts of data.
For instance, the stream of all integers :
/*notice the recursive call of "from" here : it neither defines a base case nor
moves toward one in each step.
This works because from(n+1) is a call-by-name argument of cons and is not evaluated
when passed. In the cons body, from(n+1) is bound to a "def" tail, which does NOT
trigger the evaluation of from either.
Whenever tail is referenced within an expression, it is evaluated (once), and
we get a head, which is call-by-value, and another stream with a call-by-name tail.
Learn this way to construct an infinite stream.
*/
def from(n:Int) : Stream[Int] = n #:: from(n+1)
//the stream of all natural numbers
val nats = from(0)
//the stream of all multiples of 4
val m4s = from(0) map (_ * 4)
val list = (m4s take 10).toList //will trigger the evaluation
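The snippets above, written with LazyList (current Scala's replacement for Stream) so they run as-is:

```scala
// An infinite stream of integers starting at n; safe because #:: is lazy.
def from(n: Int): LazyList[Int] = n #:: from(n + 1)

val nats = from(0)
val m4s  = nats map (_ * 4)      // all multiples of 4, still unevaluated
val ten  = (m4s take 10).toList  // toList forces evaluation of 10 elements
```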
/*The Sieve of Eratosthenes
An ancient technique to calculate prime numbers.
> Start with all integers from 2, the first prime number
> Eliminate all multiples of 2
> The first element of the resulting list is 3, a prime number
> Eliminate all multiples of 3
> Iterate forever. At each step, the first number in the list is a
prime number, and we eliminate all its multiples.
So now let us define the sieve method :
*/
//it is also "recursive" but lazy, so it is fine
//so actually we could say an infinite stream is a lazy recursive function call
//(isPrime is assumed to be defined as in the earlier lectures)
def sieve(n:Int) : Stream[Int] = {
//println(n)
if(isPrime(n)) n #:: (sieve(n+1) filter (_ % n != 0))
else sieve(n+1).filter (_ % n != 0)
}
println(sieve(2).take(100).toList)
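The sieve can also be written without isPrime at all, in the classic formulation from the lectures: keep the head, lazily filter its multiples out of the tail, recurse. A sketch with LazyList:

```scala
// Classic Sieve of Eratosthenes on a lazy stream: the head of the input
// is prime by construction, so keep it and drop its multiples from the tail.
def sieve(s: LazyList[Int]): LazyList[Int] =
  s.head #:: sieve(s.tail filter (_ % s.head != 0))

val primes = sieve(LazyList.from(2))
```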
/*The square roots
Our previous algorithm for square roots always used an isGoodEnough test to tell when to
terminate the iteration.
With streams we can now express the concept of a converging sequence without having to worry
about when to terminate it :
*/
def sqrtStream(x:Double) : Stream[Double] = {
def improve(guess:Double) = (guess + x/guess)/2
lazy val guesses : Stream[Double] = 1 #:: (guesses map improve)
guesses
}
sqrtStream(2).take(3).toList
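The same sqrtStream with LazyList; the guesses converge quadratically, so a handful of elements already approximate math.sqrt closely:

```scala
def sqrtStream(x: Double): LazyList[Double] = {
  def improve(guess: Double) = (guess + x / guess) / 2
  // lazy self-reference: each guess improves the previous one on demand
  lazy val guesses: LazyList[Double] = 1.0 #:: (guesses map improve)
  guesses
}

val approx = sqrtStream(2).take(6).toList.last
```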
/*[Keith Comment] : the key point to understanding this "lazy" recursion is to remember that
"lazy" means : only evaluate when accessed . So when the expression :
lazy val guesses : Stream[Double] = 1 #:: (guesses map improve)
is evaluated, the "guesses" on the right side is not actually inspected ; just take it as
a black box (the question mark ?). So it type-checks on both sides with no error.
(BTW, I struggled at this point for a few hours :) )
"lazy val" means : the "val" holds a value, and "lazy" means it is initialized as late as possible.
The recursive call here works as below :
Take C as shorthand for : (guesses map improve).
sqrtStream(2).take(4).toList
--> guesses.take(4).toList
--> guesses.head :: guesses.tail.take(3)
--> (1 #:: C).head :: guesses.tail.take(3)
--> 1 :: C.take(3)
--> 1 :: ((1 #:: C) map improve).head :: ((1 #:: C) map improve ).tail.take(2) //evaluating C
--> 1 :: (improve(1) #:: (C map improve)).head :: (C map improve) take (2) //evaluating map
--> 1 :: improve(1) :: (C map improve ) take (2)
--> 1 :: improve(1) :: ((1 #:: C) map improve ) take (2)
--> 1 :: improve(1) :: (improve(1) #:: (C map improve)) take(1)
--> 1 :: improve(1) :: improve(improve(1)) :: ( C map improve) take (1)
--> 1 :: improve(1) :: improve(improve(1)) :: (((1 #:: C) map improve) map improve) take(1) //evaluating C
...
--> 1 :: improve(1) :: improve(improve(1)) :: improve(improve(improve(1)))
If the above were true, I would be concerned about performance. But when I added a "println"
inside improve, I did not see multiple evaluations of the same value 1 .
So I went to SO for help to find out how this is implemented in Scala.
When I typed the question, SO suggested a few similar questions, and I found exactly the one I was raising.
Fortunately, Will Ness, the FP guru, gave a detailed answer, and I realized I had missed one important
characteristic of lazy val : memoization. (The similar question is about Haskell, where by default
every val is lazy, but it still applies to Scala.)
sqrtStream(2).take(4).toList
--> guesses.take(4).toList
--> (1 #:: C).head :: guesses.tail.take(3)
--> 1 :: C.take(3)
--> 1 :: ((1 #:: C) map improve).head :: ((1 #:: C) map improve ).tail.take(2) // evaluating C
//tail is evaluated as tail_1 = guesses map improve = C
#*then evaluated as tail_1 = improve(1) #:: tail_1 map improve*#
//let us name it as tail_1 = head_1 #:: tail_2
//head_1 is a value and tail_2 is uninitialized at this moment.
--> 1 :: head_1 :: tail_2 take (2) // evaluating map
--> 1 :: head_1 :: (tail_1 map improve) take (2) //evaluating tail_2
--> 1 :: head_1 :: ((head_1 #:: tail_2) map improve) take (2) //tail_1 is a value now
--> 1 :: head_1 :: (improve(head_1) #:: (tail_2 map improve)) take(2)
#*so the tail_2 is evaluated as improve(head_1) #:: tail_2 map improve *#
//let us name it as tail_2 = head_2 #:: tail_3
//and we see the head_1 is reused as it is a value
--> 1 :: head_1 :: head_2 :: tail_3 take (1)
--> 1 :: head_1 :: head_2 :: (tail_2 map improve) take(1) //evaluating tail_3
--> 1 :: head_1 :: head_2 :: ((head_2 #:: tail_3) map improve) take(1) //tail_2 is a value now
...
--> 1 :: head_1 :: head_2 :: improve(head_2)
So we see that when Scala evaluates this infinite stream with "lazy val", each lazy val is
evaluated once and then memoized. The recursion on lazy val proceeds as in the trace above.
The recursion MUST be transferred to the tail element and then continue in each tail evaluation.
As we see here, the recursion on lazy val actually proceeds like normal recursion, starting at the
base case and moving toward the final problem, but it is written in the reverse direction .
Recursion usually means we write code from f(n) down to f(1) (the base case), while with lazy val
we can write from f(1) up to f(n), which is more intuitive.
*/
Suppose there are a few glasses without any markings but with known capacities,
plus a faucet and a sink below the faucet.
There is an unknown amount of water in one of the glasses (say x), and we want to figure out
this amount. The allowed operations are :
> fill a glass from the faucet.
> empty a glass, pouring it either into the other glass or into the sink.
//Actually the classic water pouring problem is to find a pouring sequence that produces an expected amount of water.
//Anyway, over shoes, over boots ! This is the "reversed" version of the classic water pouring problem.
For example , 2 glasses like below:
glass A = 4 deciliter ,name it as MA
glass B = 9 deciliter ,name it as MB
and there is an unknown amount of water in glass B ( called X and take it as 6 for an example) now.
Think about the problem with the example case and then generalize the idea.
The solution for this case would be :
> Step1: pour the water from B into A; whenever A is full, empty A into the sink.
After step1, we get an unknown amount in A, and we record the number of passes in which
we emptied A during this operation, say PA.
B : 6 -> 2 //B pour to A
A : 0 -> 4
A : 4 -> 0 //A pour to sink
B : 2 -> 0 //B pour to A
A : 0 -> 2
> Step2: now the problem reduces to identifying the smaller amount in glass A, call it SA.
Then X = SA + PA * MA , 6 = 2 + 1*4
X = 1*4 + 2
> Step3 : the only move we can make now is to empty A back into B.
A is empty and B contains 2 deciliters now.
A: 2 -> 0 //A pour to B
B: 0 -> 2
> Step4 : now B holds SA and A is empty, and SA is smaller than X.
Now we fill A and pour it into B until B is full.
The amount remaining in A when B is full is the key to the problem.
> Step5 : when B becomes full, pour B into the sink, then continue filling A and pouring A into B.
If some amount remains in A, we continue; otherwise the problem is solved.
Repeat until A is empty exactly when B becomes full after A pours into B.
A: 0 -> 4 //pour from sink to A
A: 4 -> 0 //A pour to B
B: 2 -> 6
A: 0 -> 4 //pour from sink to A
A: 4 -> 1 //A pour to B
B: 6 -> 9
B: 9 ->0 //B pour to sink
Now we know the relation between X and the amount remaining in A, call it RA_1 :
X = SA + PA * MA
--> = (9 - (2*4 - RA_1)) + 1 * 4
--> = 9 - 1*4 + RA_1
So the answer depends on RA_1 now.
Repeat steps 4 and 5 until we reach the case where A is empty and B is full:
A: 1 -> 0 //A pour to B
B: 0 -> 1
A: 0 -> 4 //pour from sink to A
A: 4 -> 0 //A pour to B
B: 1 -> 5
A: 0 -> 4 //pour from sink to A
A: 4 -> 0
B: 5 -> 9
Then RA_1 = 9 - 2*4 = 1, and therefore X = 6 :
X = 9 - 1*4 + RA_1
--> = 9 - 1*4 + (9 - 2*4)
--> = 2*9 - 3*4
--> = 6
Now the solution is clear for the 2-glasses case : the only units of measurement we have are the
4-deciliter and 9-deciliter capacities, and the answer is expressed as a combination of 4 and 9 : an algebraic equation.
Strictly, we should first prove that any amount can be expressed as an integer combination of
two distinct capacities X, Y ; this follows from Bézout's identity whenever gcd(X, Y) divides the amount.
So the idea is to find the coefficients of A and B and express the unknown water amount as :
Answer = Capacity_B*Coeff_B + Capacity_A * Coeff_A.
Now let us generalize the solution :
Suppose B is not empty and A is empty in the initial state, and also Capacity_B > Capacity_A.
> Step1: pour the water from B into A until either
B is empty or A is full.
If A is full and B is empty, then Capacity_A is the answer.
If A is full and B is not empty, pour A into the sink, increase the coefficient of A (Coeff_A) by 1,
and then repeat this step.
This step ends with B empty and the volume in A less than the capacity of A.
The problem is now reduced to finding the amount in A.
Notice the amount in A is less than min(Capacity_A, Capacity_B).
> Step2: pour the water from A into B :
A is empty and B is not full //obviously, see above.
Fill A from the faucet and then pour it into B, until either
OR A is empty and B is full : decrease Coeff_A by 1, increase Coeff_B by 1 and return.
The problem is solved, as we already know the answer is
Capacity_B*Coeff_B + Capacity_A * Coeff_A.
OR A is empty and B is not full : decrease Coeff_A by 1 and repeat step 2.
OR A is not empty and B is full : increase Coeff_B by 1 and decrease Coeff_A by 1, empty B and repeat step 2.
(Notice that Capacity_B > Capacity_A, so we measure the amount remaining in B in units of Capacity_A;
the opposite direction is impossible and would end in a dead loop.)
---First version ---
//Review : 45 LOC and uses var; hard to read and understand.
//Uppercase for constants, which also avoids trouble in pattern matching.
def waterPourProblem(cap: (Int, Int), init: (Int, Int)): Int = {
val coeff_pair = waterPour(
//By using min/max we no longer care which glass is larger or which one initially
//holds the water. The trick is to define a function that only works for a specific
//order and then pass in the sorted values, avoiding verbose if/else.
(Math.min(cap._1, cap._2), Math.max(cap._1, cap._2)),
(Math.max(init._1, init._2), Math.min(init._1, init._2)),
(0, 0))
println(coeff_pair)
Math.min(cap._1, cap._2) * coeff_pair._1 + Math.max(cap._1, cap._2) * coeff_pair._2
}
/*below assumption must be matched:
1. cap._1 < cap._2
2. vol._1 != 0 and vol._2 == 0
*/
def waterPour(cap: (Int, Int), vol: (Int, Int), coeff: (Int, Int)): (Int, Int) = {
var (coeff_a, coeff_b) = coeff
var (vol_a, vol_b) = vol
val (cap_a, cap_b) = cap
//move the unknown water to B; it could exist in A or B in the initial state.
//if initially in A, then this is the next move;
//if initially in B, then we are at the first move.
vol_b = vol_a
vol_a = 0
//Thread.sleep(1000)
while (cap_b - vol_b > cap_a) {
println("while_loop", cap_a, cap_b,vol_a, vol_b, coeff_a, coeff_b)
coeff_a = coeff_a - 1
vol_b = vol_b + cap_a
}
if (cap_b - vol_b == cap_a) {
coeff_a = coeff_a - 1
coeff_b = coeff_b + 1
(coeff_a, coeff_b)
} else {
vol_a = cap_a - (cap_b - vol_b)
vol_b = 0
coeff_a = coeff_a - 1
coeff_b = coeff_b + 1
waterPour(cap, (vol_a, vol_b), (coeff_a, coeff_b))
}
}
---Second Version---
/*The insight we get now is that the water to be measured will always be in B,
as we measure the amount using the smaller glass to reach the point where A is empty
and B is full. So we can remove the vol_a variable.
Embed the waterPour function inside waterPourProblem to reduce the parameters passed to waterPour
(one problem in FP is parameter explosion; simplify when possible).
Remove the "var" variables, as they are discouraged and make the code above hard to follow
(writing the code the way the problem reads is the philosophy of a high-level language).
Also fix the code format and adjust the "if".
What we learn from this case : the recursion here is a tail recursion.
We do NOT write the code the usual recursive way (assume we have the solution for k-1, then
handle the k case). Instead, we define a base case where we return the result
and iterate (a tail recursion is the same as a while loop or iteration).
I even wonder whether this case could be converted to the "express k in terms of k-1" form.
Review : 20 LOC, no var, and a tail-recursive version.
*/
def waterPourProblem(cap_a:Int,cap_b:Int, init:Int ): Int = {
/*Assumption: cap_a < cap_b
(the @tailrec annotation below requires: import scala.annotation.tailrec)
*/
@tailrec
def waterPour(vol: Int, coeff_a:Int,coeff_b:Int): (Int, Int) = {
//println("call now:", cap_a,cap_b, vol, coeff_a,coeff_b)
if (cap_b - vol == cap_a) {
(coeff_a - 1, coeff_b + 1)
}else if (cap_b - vol > cap_a ) {
waterPour(vol + cap_a, coeff_a - 1 , coeff_b)
} else {
//skip the operation which is to move the remained water in A to B
//vol = cap_a - (cap_b - vol_b)
waterPour(cap_a - (cap_b - vol), coeff_a - 1, coeff_b + 1)
}
}
val coeff_pair = waterPour(init,0, 0)
println(coeff_pair)
cap_a * coeff_pair._1 + cap_b * coeff_pair._2
}
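To sanity-check the tail-recursive version, here is a self-contained copy (printlns removed, tuple unpacked) together with the 4/9-deciliter example from above, where the unknown amount is 6:

```scala
import scala.annotation.tailrec

// assumption: cap_a < cap_b and the unknown amount starts in glass B
def waterPourProblem(cap_a: Int, cap_b: Int, init: Int): Int = {
  @tailrec
  def waterPour(vol: Int, coeff_a: Int, coeff_b: Int): (Int, Int) =
    if (cap_b - vol == cap_a) (coeff_a - 1, coeff_b + 1)       // A empties exactly as B fills: done
    else if (cap_b - vol > cap_a) waterPour(vol + cap_a, coeff_a - 1, coeff_b)
    else waterPour(cap_a - (cap_b - vol), coeff_a - 1, coeff_b + 1)

  val (ca, cb) = waterPour(init, 0, 0)
  cap_a * ca + cap_b * cb
}

println(waterPourProblem(4, 9, 6)) // 6 = -3*4 + 2*9
```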
//Go back to the "right" understanding of the Water Pouring problem.
Now let us generalize further to an arbitrary number of glasses of arbitrary given
capacities, and an arbitrary target amount in one of the glasses, i.e. there exists a glass
that contains the target amount after the operations.
/*
This problem is actually a search problem (for when we cannot handle the complex logic with an if/else sequence).
We model the water in the glasses as a state; after each operation, the state changes.
The operations are :
empty //empty the glass
fill //fill the glass with water
pour from one glass to another
The states of the glasses change as in a state machine,
and the solutions are a set of sequences of Moves, called Paths.
Having modeled this as a search problem (searching for solutions in a space), we can decide between
BFS and DFS. Here we use BFS first, as it is easier to code: each generation just extends every
previous path by one arbitrary move (whether it is more efficient or not depends on the
real problem and the input).
Also notice that the search space grows exponentially in the number of moves if we do not
remember the traversed states. By recording the traversed states, the solution becomes an example
of "dynamic programming".
Let us try it !
*/
//By combining OO and FP, we build the models below; the actions on the classes are always FP (no side effects).
States and Moves :
Glass: Int //0 to number of glasses - 1
State : Vector[Int] (one entry per glass) //represents the water in each glass
Moves :
>Empty(glass)
>Fill(glass)
>Pour(from,to)
Path(List[Move]) //initialized with a list of Moves
from(paths: Set[Path]): Stream[Set[Path]] //as this is a search problem and the search space can explode,
//we use a Stream as the final output, so paths are generated on request.
//Professor Odersky's solution
//the problem is initialized with the list of glass capacities.
class MyPouring(glasses:List[Int]) {
//a state is just a list of Int, each entry the current water volume of one glass,
//indexed from 0 to number of glasses - 1
type State = List[Int]
val initialState : State = glasses map ( _ => 0)
trait Move {
def change(state:State) : State
}
//A move is an action on a glass, so it is constructed with a glass number
case class Empty(glass:Int) extends Move {
//"updated" produces a new value from the old one; it does not update the existing object
def change(state:State) :State = state.updated(glass,0)
}
case class Fill(glass:Int) extends Move {
def change(state:State) : State = state.updated(glass,glasses(glass))
}
case class Pour(from:Int,to:Int ) extends Move {
def change(state:State) :State = {
//compute the amount to subtract from the source and add to the target glass
val amount = Math.min(state(from),glasses(to) - state(to))
state.updated(from,state(from) - amount).updated(to, state(to) + amount)
}
}
/**
* Define the function in its natural scope.
* A Path is constructed from a list of Moves.
* next takes a move and generates the new Path after that Move.
* The latest move is added to the head of the list.
*/
class Path(val path:List[Move], val endState : State) {
def next(move:Move) : Path = new Path(move :: path, move change endState)
//The endState is built from the initial state; we do not want to recompute it
//on every call, so why not remember it (pass it in the constructor)?
//def endState : State = path.foldRight(initialState)(_ change _)
override def toString = path.reverse.mkString(" ") + "--->" +endState
}
val moves = {
val range = 0 until glasses.length
(for {
glassa <- range
glassb <- range
if (glassb != glassa)
} yield Pour(glassa, glassb)) ++
(for(glass<- range) yield Empty(glass)) ++
(for(glass<- range) yield Fill(glass))
}
/**Finally, let us define the solution to the problem.
We generate all the possible paths from the initial state.
In this case, the end condition is not a method parameter,
so the output is a Stream, as we cannot generate all the paths at once.
Notice the recursive call of "from" : the input type and output type need not be the same,
and we construct the output value by the recursive call.
If we did not use the states set to record the states we have traversed, it could take so much
time to get a result that it looks like a dead loop, while actually it is not.
The number of paths explodes by a factor of 6 per step : 6, 36, 216, 1296, 7776, 46656, ...
For glasses of 4 and 7 with target 6, it takes just 5 steps to find the first solution.
For glasses of 4 and 9 with target 6, it takes 8 steps, i.e. on the order of 46656*6*6 = 1.7 million paths.
Besides that, an action like "take 3" may return answers that are actually duplicates,
just with some extra moves over states already traversed.
*/
def from(paths:Set[Path], states: Set[State]) : Stream[Set[Path]] = {
if(paths.isEmpty) Stream.empty
else {
val more : Set[Path] = {
for {
path <- paths
next <- moves map path.next
if ! states.contains(next.endState)
} yield next
}
val newStates : Set[State] = states ++ (more map(_.endState))
paths #:: from(more,newStates)
}
}
val initialPaths = Set(new Path( Nil ,initialState))
val paths = from(initialPaths, Set(initialState))
def solutions(target:Int) : Stream[Path] = {
for{
pathSet <- paths
path <- pathSet
if path.endState contains(target)
}yield {path }
}
}
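For a quick end-to-end check, here is a condensed, self-contained variant of the class above (printlns removed, Vector used for the state, class renamed Pouring to keep it distinct), with a sample query for the 4/9 glasses and target 6:

```scala
class Pouring(capacity: Vector[Int]) {
  type State = Vector[Int]
  val initialState: State = capacity map (_ => 0)

  trait Move { def change(state: State): State }
  case class Empty(glass: Int) extends Move {
    def change(state: State) = state.updated(glass, 0)
  }
  case class Fill(glass: Int) extends Move {
    def change(state: State) = state.updated(glass, capacity(glass))
  }
  case class Pour(from: Int, to: Int) extends Move {
    def change(state: State) = {
      val amount = state(from) min (capacity(to) - state(to))
      state.updated(from, state(from) - amount).updated(to, state(to) + amount)
    }
  }

  val glasses = 0 until capacity.length
  val moves: Seq[Move] =
    (for (g <- glasses) yield Empty(g)) ++
    (for (g <- glasses) yield Fill(g)) ++
    (for (from <- glasses; to <- glasses if from != to) yield Pour(from, to))

  class Path(history: List[Move], val endState: State) {
    def extend(move: Move) = new Path(move :: history, move change endState)
    override def toString = history.reverse.mkString(" ") + " --> " + endState
  }

  // breadth-first search, remembering every state already reached
  def from(paths: Set[Path], explored: Set[State]): Stream[Set[Path]] =
    if (paths.isEmpty) Stream.empty
    else {
      val more = for {
        path <- paths
        next <- moves map path.extend
        if !explored.contains(next.endState)
      } yield next
      paths #:: from(more, explored ++ (more map (_.endState)))
    }

  val pathSets = from(Set(new Path(Nil, initialState)), Set(initialState))

  def solutions(target: Int): Stream[Path] =
    for (pathSet <- pathSets; path <- pathSet if path.endState contains target)
      yield path
}

val problem = new Pouring(Vector(4, 9))
println(problem.solutions(6).head) // shortest move sequence reaching 6 in one glass
```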
/*
Some other thoughts/solutions or variants:
In a program of the complexity of the pouring program, there are many choices to be made.
Choices of representations (Models) :
> Specific classes for moves and paths , or some encoding ?
> OO methods or naked data structures with functions?
> Can we have a recursive solution for this case ? //Keith added
*/
/*
Guidelines for programming : it depends on your experience. As the old saying goes, you can learn
a programming language in ten days, and then you improve for ten years.
Some guidelines considered useful :
> Modularity : name everything you can.
Break things up into little pieces, each with only a limited set of operations.
Naming each piece makes the program much more intelligible and readable.
> Put operations into their natural scopes.
For example, we put change inside a Move, because a move is what changes things.
> Keep degrees of freedom for future refinements.
You do not need to keep everything flexible, which would be over-design;
just keep flexible the parts that are supposed to be refined in the future.
> Add parentheses where possible, as the FP style applies functions to parameters a lot.
*/
-----------original code------------
def sumInts(a:Int, b:Int) :Int = {
if(a>b) 0 else a + sumInts(a + 1, b)
}
sumInts(1,10)
def cube(x:Int) :Int = x * x * x
def sumCube (a:Int, b:Int ) :Int = {
if(a>b) 0 else cube(a) + sumCube(a + 1 , b)
}
sumCube(1,10)
def fact(x:Int) :Int = {
if(x==0) 1 else x * fact(x-1)
}
def sumFact(a:Int, b:Int) : Int = {
if(a>b) 0 else fact(a) + sumFact(a + 1, b)
}
sumFact(1,10)
-----------first attempt---------------
We see the common part of this code; the only difference is the function applied to each Int,
so make that function a parameter.
def sum(f:(Int) => Int, a:Int, b:Int ):Int = {
if(a>b) 0 else f(a) + sum(f,a + 1, b)
}
def sumInts(a:Int, b:Int) = sum(x => x, a, b)
def sumCube(a:Int, b:Int) = sum(x => x * x * x, a, b)
def sumFact(a:Int, b:Int) = sum(fact, a, b)
---------second attempt----------------
Now we see the duplicated parameters a, b on both sides; can we make this less verbose?
In the previous attempt we used a higher-order function which takes another function as input.
Now the function not only takes another function as input, but also returns a new function as its value.
def sum(f:(Int) => Int) : (Int, Int) => Int = {
def sumF(a:Int, b:Int) :Int = {
if(a>b) 0 else f(a) + sumF(a + 1,b)
}
sumF
}
def sumInts = sum(x => x)
def sumCube = sum(x => x * x * x)
def sumFact = sum(fact)
sumInts(1,10)
sumCube(1,10)
sumFact(1,10)
--------third attempt---------------
those defined functions sumInts, sumCube, sumFact are still a little verbose; can we make them shorter?
sumInts(1,10) is equivalent to sum(x => x)(1,10)
sumCube(1,10) is equivalent to sum(x => x * x * x)(1,10)
sumFact(1,10) is equivalent to sum(x => fact(x))(1,10)
since function application associates to the left.
--------Fourth attempt----
There is special syntax for those functions that return functions. For example,
the above sum can be rewritten as(i.e. another syntax sugar) :
def sum(f:(Int) => Int)(a:Int, b:Int): Int = {
if(a>b) 0 else f(a) + sum(f)(a + 1, b)
}
sum(x => x)(1,10)
sum(x => x * x * x)(1,10)
sum(x => fact(x))(1,10)
In this way, the sum function in the original code with 3 parameters has been "curried" into a
higher-order function of this form.
The benefit is that the parameters can be applied in different contexts, i.e. partially applied.
Applying sum(f:Int=>Int) returns a function : (Int, Int) => Int
For example :
val f = sum(x => x * x) _ //the underscore makes this a partially applied function
f(1,10) //385
----Concepts--
1. The type A => B is the type of a function that takes an argument of type A and return a result of type B.
So, Int => Int is the type of a function that map integers to integers.
2. In general, a definition of function with multiple parameter lists :
def f(arg1)(arg2)...(argn) = E
is equivalent to
def f(arg1)(arg2)...(argn-1) = {def g(argn) = E; g} where g is a fresh identifier.
(more precisely : def f(arg1)(arg2)...(argn-1) : (argn) => E = {def g(argn) = E; g})
This is exactly what we did when defining sumF : we dropped the outer argument list and defined a new
sum function which returns sumF to handle the outer arguments.
Or for short , we can just write an anonymous function :
def f(arg1)(arg2)...(argn-1) = (argn) =>E
By repeating the process n times
def f(arg1)(arg2)...(argn) = E
is shown to be equivalent to
def f = (arg1 =>(arg2 =>...(argn =>E)...))
This style of definition and function application is called currying.
Function types associate to the right.
For example ,
def sum(f:Int=>Int)(a:Int, b:Int) :Int =...
The type of sum, written out in full, is (Int => Int) => ((Int, Int) => Int).
---Exercise---
1. write a product function to calculate the product between a and b ?
2. write factorial in terms of product
3. write a more general function, which generalize both sum and product ?
Answers :
1.
def product(a:Int, b:Int) : Int = {
if(a>b) 1 else a * product(a + 1, b)
}
2.
def factorial(a:Int) = product(1,a);
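A quick check of answers 1 and 2 (the definitions are repeated so the snippet runs on its own):

```scala
def product(a: Int, b: Int): Int =
  if (a > b) 1 else a * product(a + 1, b)

def factorial(a: Int): Int = product(1, a)

println(product(3, 5)) // 60
println(factorial(5))  // 120
```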
3.
The parameters we need :
1) the function f to map a to f(a)
2) the function combine to combine f(a) with the rest of the values
3) the parameters a, b
4) the default value for when a > b
So it is a kind of map+combine
--version1--
def mapReduce(f:(Int) => Int, combine:(Int, Int) => Int,zero:Int)(a:Int,b:Int) : Int = {
if(a>b) zero else combine(f(a),mapReduce(f,combine, zero)(a + 1,b))
}
mapReduce(x => x,(a,b) => a + b,0)(1,10) //sum
mapReduce(x => x,(a,b) => a * b,1)(1,10) //product
--version2 with full currying---
def mapReduce(f:(Int) =>Int)(combine:(Int,Int) => Int) (a:Int, b:Int)(zero:Int) :Int ={
if(a>b)zero else combine(f(a),mapReduce(f)(combine)(a+1,b)(zero))
}
--version2.1 with parameters in different places--
def mapReduce(f:(Int) =>Int)(a:Int, b:Int)(zero:Int)(combine:(Int,Int) => Int) :Int ={
if(a>b)zero else combine(f(a),mapReduce(f)(a+1,b)(zero)(combine))
}
//only a recursive function application makes a function recursive;
//the combine call above is not recursive, since the second combine is actually a parameter.
mapReduce(x => x)(1,10)(0)((a,b) => a + b)
Is actually equivalent to :
def mapReduce(f:(Int) =>Int) : (Int,Int) => ((Int) => (((Int,Int) => Int) => Int)) = {
def mapReduceF1(a:Int,b:Int) : (Int) => (((Int,Int) => Int) => Int) = {
def mapReduceF2(zero:Int) : ((Int,Int) => Int) => Int = {
def mapReduceF3(combine:(Int,Int) => Int) : Int = {
if(a>b) zero else combine(f(a),mapReduce(f)(a + 1, b)(zero)(combine))
}
mapReduceF3
}
mapReduceF2
}
mapReduceF1
}
mapReduce(x => x)(1,10)(0)((a,b) => a + b) //sum
mapReduce(x => x)(1,5)(1)((a,b) => a * b) //product
mapReduce(x => x * x * x )(1,10)(0)((a,b) => a + b) //sum of cube
The return types can be omitted, except the outermost one, since there is a recursive call.
But we need to add an underscore to distinguish a partially applied function from a function application.
def mapReduce(f:(Int) =>Int) : (Int,Int) => ((Int) => (((Int,Int) => Int) => Int)) = {
def mapReduceF1(a:Int, b:Int) = {
def mapReduceF2(zero:Int) = {
def mapReduceF3(combine:(Int,Int) => Int) = {
//partial application does not change the fact that the function body is only
//applied/evaluated when the whole parameter list has been supplied.
if(a>b) zero else combine(f(a),mapReduce(f)(a + 1, b)(zero)(combine))
}
mapReduceF3 _
}
mapReduceF2 _
}
mapReduceF1 _
}
mapReduce(x => x)(1,10)(0)((a,b) => a + b) //sum
mapReduce(x => x)(1,5)(1)((a,b) => a * b) //product
mapReduce(x => x * x * x )(1,10)(0)((a,b) => a + b) //sum of cube
---Further improvement ---
Change the recursive function into an iterative one, that is,
make it tail-recursive,
e.g:
def sum(f:Int => Int)(a:Int,b:Int) : Int = {
if(a>b) 0 else f(a) + sum(f)(a + 1 , b)
}
changed this to tail-recursion :
def sum(f:Int => Int)(a:Int,b:Int) : Int = {
def loop(x:Int,acc:Int) :Int = {
if(x > b) acc else loop(x + 1,f(x) + acc)
}
loop(a,0)
}
for the mapReduce one :
def mapReduce(f:Int => Int,combine: (Int,Int) => Int, zero:Int)(a:Int,b:Int) :Int = {
def loop(x:Int,acc:Int) :Int = {
if(x > b) acc else loop(x + 1, combine(f(x),acc))
}
loop(a,zero)
}
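A few quick checks of the tail-recursive versions above (the definitions are repeated so the snippet stands on its own):

```scala
def sum(f: Int => Int)(a: Int, b: Int): Int = {
  def loop(x: Int, acc: Int): Int =
    if (x > b) acc else loop(x + 1, f(x) + acc)
  loop(a, 0)
}

def mapReduce(f: Int => Int, combine: (Int, Int) => Int, zero: Int)(a: Int, b: Int): Int = {
  def loop(x: Int, acc: Int): Int =
    if (x > b) acc else loop(x + 1, combine(f(x), acc))
  loop(a, zero)
}

println(sum(x => x)(1, 10))                       // 55
println(mapReduce(x => x, _ * _, 1)(1, 5))        // 120, i.e. product
println(mapReduce(x => x * x * x, _ + _, 0)(1, 10)) // 3025, sum of cubes
```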
---------------Polymorphism---------------
Two principal forms of polymorphism:
1.subtyping
2.generics
A method to abstract over and compose types.
Type parameterization means that classes as well as methods can have types as parameters.
We start from List, an immutable linked list and a fundamental data structure in many functional languages.
It is constructed from two building blocks:
Nil : the empty list
Cons : a cell containing an element and the remainder of the list, e.g. List(1,2,3) :
_____
|c1|p1|
/ \
1 \
____
|c2|p2|
/ \
2 \
____
|c3|p3|
/ \
3 \
Nil
trait List[T] //not only Int, but Boolean, Double...
class Cons[T](val head:T,val tail :List[T]) extends List[T]
class Nil[T] extends List[T]
Type parameters are written in square brackets, e.g. [T]
Generic Functions:
Like classes, functions can have type parameters.
For instance, here is a function that creates a list consisting of a single element.
def singleton[T](elem:T) = new Cons[T](elem,new Nil[T])
Then we can use singleton[Int](1) or singleton[Boolean](true), or just singleton(1) and singleton(true),
as long as Scala can infer the type parameters from the context.
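A minimal, self-contained sketch of the parameterized list and singleton (using the class names from above; they shadow the standard List and Nil inside this snippet):

```scala
trait List[T]
class Cons[T](val head: T, val tail: List[T]) extends List[T]
class Nil[T] extends List[T]

def singleton[T](elem: T) = new Cons[T](elem, new Nil[T])

val a = singleton[Int](1)  // explicit type argument
val b = singleton(true)    // Boolean inferred from the argument
println(a.head) // 1
println(b.head) // true
```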
--------Type erasure---
Type parameters do not affect evaluation in Scala.
We can assume that all type parameters and type arguments are removed before evaluating the program.
This is called type erasure (types are erased at run time).
In effect, type parameters are only for the compiler to verify that the program satisfies certain correctness
properties, but they are not relevant for the actual execution.
--------Polymorphism--------
So what we have seen is a form of polymorphism.
Polymorphism means that a function or type comes "in many forms".
In programming it means that :
1. a function can be applied to arguments of many types (subtyping), or
2. a type can have instances of many types (generics).
We have seen two principal forms of polymorphism:
1. subtyping : an instance of a subclass can be passed where the base class is expected.
e.g. val l:List[Int] = new Nil[Int]
val l:List[Int] = new Cons(1,new Nil[Int])
2. generics : instances of a function or class are created by type parameterization.
Subtyping came first from OOP; generics came first from FP.
------------------------------------------------------
Week4.2
In this session, we are going to see the interactions of subtyping and generics , the two principal forms
of polymorphism.
Two main areas:
1. bounds : subject type parameters to subtype constraints.
2. variance : defines how parameterized types behave under subtyping.
Type bounds:
The same issue as in Java : a generic class or method cannot operate on subtypes even when
that would be reasonable, so we need a way to lift this restriction.
Bounds can help.
def assertAllPos[S <: IntSet](r:S) : S = ...
type parameter S is defined as subtype of IntSet.
Notation:
1. S <: T means : S is a subtype of T and
2. S >: T means : S is a super type of T.
3. S >: L <: U restricts S to any type on the interval between L and U.
------Variance--------
Covariance
Given NonEmpty <: IntSet,
is
List[NonEmpty] <: List[IntSet] ?
If this is true, we call types for which this relationship holds covariant, because their subtyping
relationship varies with the type parameter.
For example, arrays in Java are covariant:
Array[NonEmpty] <: Array[IntSet]
Use case : a function that counts the elements of an Array of IntSet should also work on an Array of NonEmpty.
In such cases what we do does not depend on the real element type,
like a sort function that works on Array[Object] also working on arrays of other reference types.
But it does cause an error if we store an element whose type differs from the actual element type:
Java keeps the array element type at run time and throws a run-time error (ArrayStoreException) for such a write.
###But in Scala, Array is NOT covariant.
You can NOT do an assignment like this
val b :Array[IntSet] = new Array[NonEmpty]
If elements are written (assigned) and not only read, the type parameter occurs in contravariant
position, so the type cannot be covariant.
------Liskov principle------
When do we make the generic covariant or contra-variant ?
The following principle, stated by Barbara Liskov, tells us when a type can be subtype of another:
If A <: B, then everything one can do with a value of type B, one should also be able to do with
a value of type A.
The formal statement :
Let q(x) be a property provable about objects x of type B. Then
q(y) should be provable for objects y of type A, where A <: B.
Function types are class types:
The function type (Int) => Int, a function with a single Int argument and an Int result,
is actually equivalent to this in Scala :
trait Function1[Int,Int] { //trait Function1 with Int as the type parameters
def apply(x:Int):Int
}
A function value is an object :
val f = (x:Int) => x * x is equivalent to :
val f = {
class AnnoFunc extends Function1[Int,Int] {
def apply(x:Int) : Int = x * x
}
new AnnoFunc
}
the shorter form is just like Java :
val f = new Function1[Int,Int] {
def apply(x:Int) = x * x
}
A function call such as f(a,b), where f is a value of a function class type, is expanded to :
f.apply(a,b)
Note : a method is not itself a function value, otherwise the expansion would be infinite:
if apply in the above definition were expanded to a class in the same way, the expansion would never terminate.
Method definition :
def f(x:Int): Boolean = ...
is not itself a function value.
But if f is used in a place where a function type is expected, it is converted automatically to the function
value (we declare the type in the parameter; a value of this type is supplied when the method is called):
(x:Int) => f(x) //i.e. the input parameter is an Int and the result is f(x), as defined in the method body
or expanded to :
new Function1[Int,Boolean] = {
def apply(x:Int) = f(x)
}
for example :
def fc(anno:(Int) => Int,x:Int) :Int = {
anno(x)
}
could be called with fc(f,100); in this case, f is converted to a function value.
This conversion of a method name f to an anonymous function is called eta-expansion in the lambda calculus.
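A small sketch of eta-expansion (the names f, fc, and g here are just illustrative):

```scala
def f(x: Int): Boolean = x % 2 == 0       // a method, not yet a function value

def fc(anno: Int => Boolean, x: Int): Boolean = anno(x)

// f is eta-expanded to the function value (x: Int) => f(x) automatically
println(fc(f, 100)) // true

val g: Int => Boolean = f                 // explicit expansion via the expected type
println(g(3))   // false
```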
--------------Exercise--------------------
define an
object List{
....
}
with 3 functions in it so that users can create list of length 0-2 using the syntax
List() //the empty List
List(1) //the list with single element 1
List(2,3)//the list with elements 2 and 3
Answer : take List(1) as an example
def apply[T](x:T): List[T] = new Cons(x, new Nil[T])
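Filling in the other two cases, a complete sketch of the exercise (reusing the Cons/Nil classes from earlier; the object shadows the standard List inside this snippet):

```scala
trait List[T]
class Cons[T](val head: T, val tail: List[T]) extends List[T]
class Nil[T] extends List[T]

object List {
  def apply[T](): List[T] = new Nil[T]
  def apply[T](x: T): List[T] = new Cons(x, new Nil[T])
  def apply[T](x1: T, x2: T): List[T] = new Cons(x1, new Cons(x2, new Nil[T]))
}

val xs = List(2, 3)
println(xs.asInstanceOf[Cons[Int]].head) // 2
```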
#######2018-02-04#######
----------Week4.3 Primitive type as class ---
Boolean
The approach here is exactly in line with the earlier study of the lambda calculus; without that
background and the thinking done then, this lecture would be hard to follow.
In the lambda calculus everything is a function; even numbers are represented as functions.
This is a higher level of abstraction: people defined numbers to represent values and store them
in memory, but the real purpose is the operations on them.
In the lambda calculus, a Boolean is represented as a selection function, and all other operations
on Booleans can be expressed through that one function.
Let us see how to define that in Scala:
abstract class Boolean {
//the "if (condition) expression1 else expression2" construct is abstracted as
//the ifThenElse function called on a boolean value.
//It is the only primitive needed for ALL boolean operations:
//if the boolean value (object) is true, then t is returned, else e.
//The behavior differs between the true and false objects,
//so it is abstract/undefined in the Boolean class, and the
//bodies are defined in the 2 different subclasses (singleton objects).
//As boolean evaluation is short-circuit, use call by name.
def ifThenElse[T](t: =>T, e: =>T) : T
def && (x: => Boolean) : Boolean = ifThenElse(x,false)
def || (x: => Boolean) : Boolean = ifThenElse(true,x)
def unary_! : Boolean = ifThenElse(false, true)
def == (x:Boolean) : Boolean = ifThenElse(x,x.unary_!)
def != (x:Boolean) : Boolean = ifThenElse(x.unary_!,x)
//Exercise, define < operator to compare Boolean
//and assume false<true
def < (x:Boolean) : Boolean = ifThenElse(false,x)
}
Now we define true/false (idealized: true and false are reserved words, so real code would use names like True and False):
object true extends Boolean {
def ifThenElse[T](t: =>T, e: =>T) :T = t
}
object false extends Boolean {
def ifThenElse[T](t: =>T, e: =>T) : T = e
}
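Since true and false are reserved words, here is a version that actually compiles, with the objects renamed True/False and toString overridden for readability (&& and || take their argument by name so evaluation stays short-circuit):

```scala
abstract class Bool {
  def ifThenElse[T](t: => T, e: => T): T
  def && (x: => Bool): Bool = ifThenElse(x, False)
  def || (x: => Bool): Bool = ifThenElse(True, x)
  def unary_! : Bool = ifThenElse(False, True)
  def < (x: Bool): Bool = ifThenElse(False, x)  // assuming False < True
}
object True extends Bool {
  def ifThenElse[T](t: => T, e: => T): T = t
  override def toString = "True"
}
object False extends Bool {
  def ifThenElse[T](t: => T, e: => T): T = e
  override def toString = "False"
}

println(True && False) // False
println(False < True)  // True
```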
----Int(Natural Number)-----------
Following the Peano numbers, the natural numbers are built from zero and a successor class.
Let us define a simple version that supports only the non-negative integers.
abstract class Nat {
def isZero: Boolean
def predecessor : Nat
def successor : Nat
def + (x:Nat) : Nat
def - (x:Nat) : Nat
//the comparison operator, walking down both predecessors
def < (that : Nat) : Boolean = {
if(that.isZero) false
else if(this.isZero) true
else this.predecessor < that.predecessor
}
}
object Zero extends Nat {
def isZero = true
def predecessor = throw new Error("zero predecessor")
def successor = new Succ(Zero)
def + (that : Nat) : Nat = that
def - (that : Nat) : Nat = if (that.isZero) Zero else throw new Error("negative subtraction: " + that )
override def toString() = "0"
}
//the parameter of a class becomes a field of the object,
//so the recursive calls in the +/- methods can access the field named n
class Succ(n : Nat) extends Nat {
def isZero = false
def predecessor = n
def successor = new Succ(this)
//succ(n) + m = succ(n+m), this is a recursive call
def + (that : Nat) : Nat = {
println(this + " add " + that)
new Succ(this.predecessor + that)
}
//succ(n) - m = n - m.predecessor, a recursive call as well
def - (that : Nat) : Nat = if(that.isZero) this else n - that.predecessor
def product(that:Nat) : Nat = {
if(that.isZero) Zero
else {
this + this.product(that.predecessor)
}
}
override def toString(): String = {
//we could use "|" or "[" as a tally mark; numbers are used here
//just for readability
(predecessor.toString.toInt + 1).toString
}
}
object HelloWorld {
def main(args: Array[String]) :Unit = {
val n1 = new Succ(Zero);
val n2 = n1.successor
val n3 = n2.successor
val n4 = n3.successor
println(n4 + n1) //prints 5
}
}
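A compact, compilable version of the Peano encoding above (debug printlns dropped, `<` moved into Nat, toString without the deprecated Integer constructor):

```scala
abstract class Nat {
  def isZero: Boolean
  def predecessor: Nat
  def successor: Nat = new Succ(this)
  def + (that: Nat): Nat
  def - (that: Nat): Nat
  def < (that: Nat): Boolean =
    if (that.isZero) false
    else if (this.isZero) true
    else this.predecessor < that.predecessor
}
object Zero extends Nat {
  def isZero = true
  def predecessor = throw new Error("Zero.predecessor")
  def + (that: Nat) = that
  def - (that: Nat) = if (that.isZero) Zero else throw new Error("negative result")
  override def toString = "0"
}
class Succ(n: Nat) extends Nat {
  def isZero = false
  def predecessor = n
  def + (that: Nat): Nat = new Succ(n + that)  // succ(n) + m = succ(n + m)
  def - (that: Nat): Nat = if (that.isZero) this else n - that.predecessor
  override def toString = (predecessor.toString.toInt + 1).toString
}

val one  = Zero.successor
val four = one + one + one + one
println(four)        // 4
println(four - one)  // 3
println(one < four)  // true
```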
###Variance Definition#####
Variance : how subtyping relates to genericity.
Roughly speaking, a type that accepts mutations of its elements should not be
covariant.
But immutable types can be covariant, if some conditions on their methods are met
(e.g. contravariant in arguments and covariant in return types).
Definition of variance:
Say C[T] is a parameterized type and A, B are types such that A <: B.
In general, there are three possible relationships between C[A] and C[B] :
C[A] <: C[B] then C is covariant
C[A] >: C[B] then C is contravariant
Neither C[A] nor C[B] is a subtype of the other, then C is nonvariant
-----Note:Liskov substitution principle-----
Substitutability is a principle in object-oriented programming stating that,
in a computer program, if S is a subtype of T,
then objects of type T may be replaced with objects of type S
(i.e. an object of type T may be substituted with any object of a subtype S)
without altering any of the desirable properties of T
(correctness, task performed, etc.).
Scala uses declaration-site variance, annotating type parameters with :
class C[+A] {...} --covariant
class C[-A] {...} --contravariant
class C[A] {...} --nonvariant
###Typing rules for Functions#####
Generally, we have the following rule for subtyping between function types:
If A2 <: A1 and B1 <: B2, then
A1 => B1 <: A2 => B2
i.e. arguments are contravariant and the return type is covariant.
So this leads to the following definition of Function1 trait :
trait Function1[-T, +U] {
def apply(x:T) : U
}
For example, if a client expects a function of type : String => AnyRef
then you can pass a function of type : AnyRef => String
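A tiny illustration: because Function1 is contravariant in its argument and covariant in its result, a value of type AnyRef => String conforms to the type String => AnyRef:

```scala
// the literal on the right has type AnyRef => String,
// which is a subtype of the declared type String => AnyRef
val f: String => AnyRef = (x: AnyRef) => x.toString

println(f("hi")) // hi
```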
####Scala variance checks######
So in Scala, the compiler will check that there are no problematic combinations when
compiling a class with variance annotations.
**Roughly**:
> covariant type parameters can only appear in method results.
> contravariant type parameters can only appear in method parameters.
> nonvariant type parameters can appear anywhere.
Example, in List library:
Nil is a singleton object : case object Nil extends List[Nothing]
Without variance, we would have to define a separate Nil[T] for every List[T].
Now, as Nothing is the bottom type and List is covariant, it type checks
to declare a singleton object Nil which is a subtype of List[T] for any type T.
(As Nil is an object, it can only appear as a value, not as a type.)
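This is exactly why, in the standard library, one Nil value serves as the empty list for every element type. A small sketch:

```scala
// Nothing <: T for every T, and List is covariant,
// so List[Nothing] <: List[Int] and List[Nothing] <: List[String].
val xs: List[Int] = Nil
val ys: List[String] = Nil
```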
####Bounds for type parameters#####
Considering adding a prepend method to List which prepends a given element,
yielding a new List.
A first attempt would be :
trait List[+T] {
def prepend(ele: T): List[T] = new Cons(ele, this)
}
However, it does not work.
Although List is immutable, and so does not have the soundness problem
that mutable Arrays have under covariance, it still does not compile;
the compiler reports:
"Covariant type T occurs in contravariant position in type T of value ele."
The root cause:
Take this class hierarchy as an example:
IntSet
/ \
/ \
NonEmpty Empty
val x: List[IntSet]
val y: List[NonEmpty]
x = y
x.prepend(new Empty)
If the signature prepend(ele: T) were allowed, this call would type check
(x has static type List[IntSet]), yet x refers to y, a List[NonEmpty],
and prepending an Empty to it violates the Liskov substitution principle:
a List[NonEmpty] could no longer safely be used where a List[IntSet] is expected.
How to fix that: to prevent this issue, we give ele a lower bound:
def prepend[U >: T](ele: U): List[U] = new Cons(ele, this)
Now it type checks: ele has type U and "this" has type List[T];
since U is a supertype of T and List is covariant, List[T] <: List[U],
so new Cons(ele, this) has the result type List[U].
Rules:
> covariant type parameters may appear in lower bounds of method type parameters.
> contravariant type parameters may appear in upper bounds of method type parameters.
A
/ \
/ \
B C
/ \
/ \
D E
/ \
/ \
K G
For example,
if the type parameter of class F is covariant,
then to prevent cases like the one above (D -> B -> A, and then putting C/E values in via A),
the type parameter of method M must have the lower bound T.
Besides that, according to Scala's type inference rules:
def f(xs: List[NonEmpty], x: Empty) = xs prepend x  // the result type is List[IntSet]
Since Empty is not a supertype of NonEmpty, the inferencer
finds the smallest supertype of both Empty and NonEmpty,
which is IntSet in this case, so U is instantiated to IntSet.
(Note that when we widen Empty to IntSet, we lose access to the
customized methods/fields of Empty; no free lunch.)
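Putting the lower bound and the inference rule together, a self-contained sketch (the hierarchy follows the lecture; the list type is renamed MyList here to avoid clashing with the standard library):

```scala
trait IntSet
class NonEmpty extends IntSet
class Empty extends IntSet

trait MyList[+T] {
  // U is a supertype of T, so the result widens to MyList[U]
  def prepend[U >: T](ele: U): MyList[U] = new Cons(ele, this)
}
class Cons[T](val head: T, val tail: MyList[T]) extends MyList[T]
object MyNil extends MyList[Nothing]

val ys: MyList[NonEmpty] = new Cons(new NonEmpty, MyNil)
// U is inferred as the least supertype of Empty and NonEmpty: IntSet
val zs: MyList[IntSet] = ys.prepend(new Empty)
```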
If the type parameter of class F is contravariant,
then we use [U <: T] as the method type parameter.
In that case, it is actually fine to declare prepend like this:
trait List[-T] {
def prepend(ele: T): List[T] = new Cons(ele, this)
}
For example:
val x:List[B] = new Cons(new B, Nil)
val y:List[D] = x
x.prepend(new K) //it type checks
(For prepend, the expected argument type is B; since K is a
subtype of B, adding a K to the List[B] (the underlying type)
yields a List[B] again.
So it type checks.)
But this may not be as flexible as expected: the result type is fixed
by the initial type parameter of the List.
Sometimes, when we add a D, we expect the result typed as List[D].
So we could define prepend like this :
trait List[-T] {
def prepend[U <: T](ele: U): List[U] = new Cons(ele, this)
}
######Decomposition#######
When composing tree-like structures, how do we use the class hierarchy and define methods for element access?
For example, consider an arithmetic expression language (for simplicity, only numbers and sums):
then
Expr ::= Number | Sum
Number ::= [0-9]+
Sum ::= Expr '+' Expr
As a first attempt, we define an Expr trait and two subclasses, Number and Sum:
trait Expr {
//the first 2 are classification methods and the other 3 are access methods
def isNumber : Boolean
def isSum : Boolean
def numValue : Int //if it is number, then access the value
def leftOp : Expr //if it is sum, then leftOp is the left operand
def rightOp : Expr
}
class Number(n:Int ) extends Expr {
def isNumber = true
def isSum = false
def numValue = n
def leftOp = throw new Error ("number.leftOp")
def rightOp = throw new Error ("number.rightOp")
}
class Sum(left:Expr, right:Expr ) extends Expr {
def isNumber = false
def isSum = true
def numValue :Int = throw new Error("sum.numValue")
def leftOp = left
def rightOp = right
}
###Evaluation of Expressions####
def eval (e :Expr ) :Int = {
if (e.isNumber) e.numValue
else if(e.isSum) eval (e.leftOp) + eval (e.rightOp)
else throw new Error("Unknown expression " + e) //a prudent clause in case someone add a new subclass of Expr
}
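With those definitions, a small usage sketch of this first attempt (the classes are restated in condensed form so the snippet stands alone):

```scala
trait Expr {
  def isNumber: Boolean; def isSum: Boolean
  def numValue: Int; def leftOp: Expr; def rightOp: Expr
}
class Number(n: Int) extends Expr {
  def isNumber = true; def isSum = false; def numValue = n
  def leftOp = throw new Error("number.leftOp")
  def rightOp = throw new Error("number.rightOp")
}
class Sum(left: Expr, right: Expr) extends Expr {
  def isNumber = false; def isSum = true
  def numValue: Int = throw new Error("sum.numValue")
  def leftOp = left; def rightOp = right
}
def eval(e: Expr): Int =
  if (e.isNumber) e.numValue
  else if (e.isSum) eval(e.leftOp) + eval(e.rightOp)
  else throw new Error("Unknown expression " + e)

eval(new Sum(new Number(1), new Number(2))) // 3
```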
It may look good.
But it becomes tedious and error-prone if we add new subclasses or new methods:
we have to change all existing subclasses!
For example, if we add 2 new expressions, Prod and Var,
the total method count becomes 5 x 8 = 40 (2 new classification methods plus 1 access method for Var),
minus the previous method count of 15, so we need to add another 25 methods.
That is quadratic in the number of classes (trait + 4 subclasses)!
For new subclasses, the possible solutions are:
0) Use runtime type checks and type casts:
def isInstanceOf[T]: Boolean //java: x instanceof T
def asInstanceOf[T]: T //java: (T) x
But that is not type safe and is discouraged in Scala, as it exists mostly for interoperability with Java.
Its pros: no classification methods are needed, and access methods are added only for
classes where the value is defined.
1) OO decomposition
If what you want to do is evaluate the expression, then we can simply define eval in
the trait and its subclasses:
trait Expr {
def eval : Int
}
class Number(n:Int) extends Expr {
def eval = n
}
class Sum(e1:Expr,e2:Expr) extends Expr {
def eval = e1.eval + e2.eval
}
However, it is still not flexible when adding new methods like product or display: all subclasses need
to be changed.
A further limitation appears when we want to simplify the expression instead of evaluating it,
like a*b + a*c -> a*(b+c). This is a non-local simplification: it involves a whole subtree,
not a single object, so it cannot be encapsulated in the method of a single object.
We would still need to check the expression types, which puts us back at the first attempt:
test and access methods for all the different subclasses.
2) Default implementations in the trait.
Add a default implementation of the new method in the trait (as with Java 8 interfaces) and override it in subclasses as needed.
It has the same limitation as solution (1).
3) Pattern matching
####Pattern matching######
The task above is to find a general and convenient way to access objects in an extensible
class hierarchy.
Observation:
the sole purpose of the test (classification) and accessor functions is to
reverse the construction process:
> Which subclass was used (classification)?
> What were the arguments of the constructor (to get the values)?
This situation is so common in functional languages, Scala included,
that the process is automated with **pattern matching**.
Here are the changes we need when using pattern matching on subclasses:
> declare the subclasses with the "case" modifier; the Scala compiler will then
auto-generate a companion object (with an apply factory method) and methods like unapply, etc.
Pattern matching is a generalization of switch from C/Java to class hierarchies.
It is expressed in Scala using the keyword match.
Example :
def eval(e: Expr): Int = e match {
  case Number(n) => n
  case Sum(e1, e2) => eval(e1) + eval(e2)
}
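The whole thing, as a self-contained sketch using case classes:

```scala
trait Expr
case class Number(n: Int) extends Expr
case class Sum(e1: Expr, e2: Expr) extends Expr

def eval(e: Expr): Int = e match {
  case Number(n)   => n
  case Sum(e1, e2) => eval(e1) + eval(e2)
}

// the auto-generated apply methods let us drop "new"
eval(Sum(Number(1), Number(2))) // 3
```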
Syntax Rules :
> match is followed by a sequence of cases, pat => expr.
> Each case associates an expression expr with a pattern pat.
> A MatchError exception is thrown if no pattern matches the value of the selector.
Forms of patterns (with their syntactic restrictions):
> constructors, e.g. Number, Sum
> variables, e.g. n, e1, e2, which must always begin with a lowercase letter
and may appear only once in a pattern.
> the wildcard pattern _, which matches anything;
it may appear multiple times in a pattern, each occurrence matching independently.
> constants, e.g. 1, true, or constant identifiers like PI or N,
which must begin with an uppercase letter (with the exception of null and literals like 1, 2, true, false).
These forms of patterns can be composed to build more complicated ones,
like Sum(e1, e2), which matches a Sum constructed with two Expr arguments bound to e1 and e2.
In the case body, references to the pattern variables are replaced by the corresponding parts of the
selector.
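A small sketch combining the pattern forms above (a made-up example):

```scala
val Pi = 3.14159 // an uppercase identifier is treated as a constant pattern

def describe(x: Any): String = x match {
  case 1      => "the literal 1"        // constant (literal)
  case true   => "the literal true"     // constant (literal)
  case Pi     => "the constant Pi"      // constant (uppercase identifier)
  case (_, _) => "a pair of anything"   // the wildcard may repeat
  case n      => "bound to n: " + n     // variable pattern: matches and binds
}

describe(3.14159) // "the constant Pi"
describe((1, 2))  // "a pair of anything"
```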
What do patterns match? (Semantics of pattern matching)
> A constructor pattern C(p1, ..., pn) matches all the values
of type C (or its subtypes) that have been constructed with
arguments matching the patterns p1, ..., pn.
> A variable pattern x matches any value, and binds the name of the
variable to this value.
> A constant pattern c matches values that are equal to c (in the sense of ==).
The matching process itself is recursive,
e.g. eval( Sum(Number(1), Number(2)) ):
first, the value matches Sum(e1, e2), and e1 binds to Number(1), e2 to Number(2);
second, Number(1) matches Number(n), and n binds to 1 (ditto for Number(2)).
Such a process can go arbitrarily deep and eventually terminates.
Also, it is possible to define eval inside the Expr trait
(note that there eval takes no parameter, so the recursive calls become method calls):
trait Expr {
  def eval: Int = this match {
    case Number(n) => n
    case Sum(e1, e2) => e1.eval + e2.eval
  }
}
Which one is preferred (this or the OO one) depends on the future
extensibility and the likely extension path of your system.
If you add subclasses more often, then the OO one is preferred,
as you only need to write the new subclasses.
If you add new methods more frequently and the class hierarchy stays stable,
then the pattern-matching one is preferred, as each new method is added
in the trait body only once.
(Pattern matching inside the trait against subclasses that do not exist yet
actually shows that methods are separable from classes. The main purpose of
attaching a method to an object/class is polymorphism; a method/function can
also exist on its own, as in C, so this way of defining methods is perfectly fine.)
--Exercise----
Add case classes Var, for variables x, and Prod, for products x * y, as
discussed previously.
Change your show function so that it also deals with products.
Pay attention to getting operator precedence right while using as few
parentheses as possible.
Example:
Sum(Prod(Number(2), Var("x")), Var("y"))
should print "2 * x + y".
Prod(Sum(Number(2), Var("x")), Var("y"))
should print "(2 + x) * y".
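One possible sketch of a solution (not necessarily the course's reference answer; showFactor is a helper introduced here to parenthesize sums inside products):

```scala
trait Expr
case class Number(n: Int) extends Expr
case class Var(name: String) extends Expr
case class Sum(e1: Expr, e2: Expr) extends Expr
case class Prod(e1: Expr, e2: Expr) extends Expr

def show(e: Expr): String = e match {
  case Number(n)  => n.toString
  case Var(x)     => x
  case Sum(a, b)  => show(a) + " + " + show(b)
  case Prod(a, b) => showFactor(a) + " * " + showFactor(b)
}

// a sum binds looser than a product, so it needs parentheses inside one
def showFactor(e: Expr): String = e match {
  case Sum(_, _) => "(" + show(e) + ")"
  case _         => show(e)
}

show(Sum(Prod(Number(2), Var("x")), Var("y")))  // "2 * x + y"
show(Prod(Sum(Number(2), Var("x")), Var("y"))) // "(2 + x) * y"
```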