@lenards
Last active October 1, 2022 23:32
A short introduction to Scala syntax and operations reworked and heavily borrowed from Holden Karau's "Scala Crash Course"

Just Enough Scala

(a moderately, well, shameless rework of Holden Karau's "Scala - Crash Course")

Scala is a multi-paradigm high-level language for the JVM.

It offers the ability to use both Object-oriented & Functional approaches.

Scala is statically typed, and type inference removes the need for many explicit type declarations. It is intended to work alongside Java code: Scala can use any Java class (and inherit from it, etc.), and Scala code can be called from Java classes.
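
As a rough illustration of the Java interoperability described here, the sketch below uses java.util.ArrayList directly from Scala and then inherits from it. The names JavaInteropSketch and LoudList are hypothetical and exist only for this example.

```scala
import java.util.ArrayList

object JavaInteropSketch {
  // A plain Java collection, created and mutated from Scala.
  val names = new ArrayList[String]()
  names.add("spark")
  names.add("scala")

  // A Scala class inheriting from a Java class and overriding one of its methods.
  // LoudList is a made-up name for this sketch.
  class LoudList extends ArrayList[String] {
    override def add(s: String): Boolean = super.add(s.toUpperCase)
  }

  val loud = new LoudList
  loud.add("hello") // stored as "HELLO"
}
```

Type inference still applies to Java types: loud is inferred to be a LoudList without any annotation.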

What we need

To follow examples in Spark, it helps if you have an understanding of the following concepts:

  • variables
  • functions
  • closures
  • Scala Collections API
  • tuples & case classes

We can explore Scala interactively through sbt console or dse spark.

We'll refer to this as the REPL (which stands for Read-Evaluate-Print Loop).

Variables

In Java, defining a primitive int and an immutable, read-only reference to a String looks like:

  int x = 7;

  final String y = "hello";

In Scala, we'd do the following:

scala> var x: Int = 7
x: Int = 7

scala> val y: String = "hello"
y: String = hello

As we mentioned, we don't always need to declare the types - they can be inferred:

scala> var x = 7
x: Int = 7

scala> val y = "hello"
y: String = hello
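
To make the difference between var and val concrete, here is a minimal sketch. The object name VariablesSketch and the variable names are invented for illustration; the commented-out line shows what the compiler would reject.

```scala
object VariablesSketch {
  var counter: Int = 7     // var: the reference can be reassigned
  counter = counter + 1    // counter is now 8

  val greeting = "hello"   // val: read-only, like Java's final
  // greeting = "goodbye"  // would not compile: reassignment to val

  // The type is inferred from the right-hand side.
  val ratio = 0.5          // inferred as Double
  val total = counter * 2  // inferred as Int
}
```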

Functions

In Java, we might define a static function that squares a primitive integer like so:

...
    public static int square(int x) {
        return x*x;
    }
...

In Scala, named one-line functions can be defined simply as:

scala> def square(x: Int): Int = x*x
square: (x: Int)Int

scala> square(3)
res0: Int = 9

And we can use a code block to define the body of the function too:

scala> def square(x: Int): Int = {
     |   x*x
     | }
square: (x: Int)Int

scala> square(3)
res1: Int = 9

The pipes, |, are added by the REPL to show that the definition continues on the next line.

We might find it handy to print out variables to text, and maybe even define a function to help.

In Java, we'd do:

...
    void announce(String text) {
        System.out.println(text);
    }
...

In Scala, we'd do:

def announce(text: String) = {
    println(text)
}
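
The function forms above can be collected into one small sketch. FunctionsSketch, cube, and sumOfSquares are hypothetical names added for illustration; square and announce follow the definitions shown earlier.

```scala
object FunctionsSketch {
  // One-line function with an explicit result type.
  def square(x: Int): Int = x * x

  // The result type can be left out and inferred (Int here).
  def cube(x: Int) = x * square(x)

  // Unit plays the role of Java's void: the function is called for its side effect.
  def announce(text: String): Unit = println(text)

  // A block body: the last expression is the result.
  def sumOfSquares(a: Int, b: Int): Int = {
    val a2 = square(a)
    val b2 = square(b)
    a2 + b2
  }
}
```

Writing the result type explicitly, as in square, is optional in most cases but tends to make signatures easier to read.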

Closures

With Scala, we can define closures (which we might also call anonymous functions or lambdas).

We can do so with varying levels of ceremony, depending on type inference to help reduce declarations...

(x: Int) => x + 2 // full version, indicating the argument type

x => x + 2 // dropping the type; it can be determined via type inference

_ + 2 // 'placeholder' syntax; the declaration of 'x' is removed entirely

x => { // just like functions, the body can be a block of code
    val numberToAdd = 2
    x + numberToAdd
}

Note: As in some scripting languages, the value of the last expression in a block of code is returned.

These closures are not far from regular functions. They're just missing the defined names:

x => { // just like functions, the body can be a block of code
    val numberToAdd = 2
    x + numberToAdd
}

def addTwo(x: Int): Int = {
    val numberToAdd = 2
    x + numberToAdd
}
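
A sketch of how these anonymous functions can be stored in values, passed around, and made to capture a variable from the enclosing scope (the capture is what makes them closures). The names ClosuresSketch, makeAdder, addTen, and applyTwice are hypothetical.

```scala
object ClosuresSketch {
  // The declared type Int => Int lets the compiler infer the type of x.
  val addTwo: Int => Int = x => x + 2
  val addTwoShort: Int => Int = _ + 2

  // The returned function captures n from the enclosing scope.
  def makeAdder(n: Int): Int => Int = x => x + n
  val addTen = makeAdder(10)

  // A function that takes another function as an argument.
  def applyTwice(f: Int => Int, x: Int): Int = f(f(x))
}
```

For example, ClosuresSketch.applyTwice(ClosuresSketch.addTwo, 1) adds 2 twice and evaluates to 5.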

Scala Collections API

The Scala Collections API was partly the inspiration for the API used by Spark. It helps to be familiar with the operations you'll find in Scala Collections (such as foreach, map, filter, and reduce). For a more thorough listing of functions, see Seq.

scala> val lst = List(1, 2, 3)
lst: List[Int] = List(1, 2, 3)

scala> lst.foreach(x => println(x))
1
2
3

scala> lst.foreach(println)
1
2
3

scala> lst.map(x => x + 2)
res5: List[Int] = List(3, 4, 5)

scala> lst.map(_ + 2)
res6: List[Int] = List(3, 4, 5)

scala> lst.filter(x => x % 2 == 1)
res7: List[Int] = List(1, 3)

scala> lst.filter(_ % 2 == 1)
res8: List[Int] = List(1, 3)

scala> lst.reduce((x, y) => x + y)
res9: Int = 6

scala> lst.reduce(_ + _)
res10: Int = 6

scala> lst
res11: List[Int] = List(1, 2, 3)

Note: a new result was shown with each operation, but lst was not modified. All of these operations leave the list unchanged.
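
Because each operation returns a new collection, the calls can be chained. A minimal sketch, with invented names (CollectionsSketch, sumOfOddSquares, doubled) and a slightly longer list than the one in the session above:

```scala
object CollectionsSketch {
  val lst = List(1, 2, 3, 4, 5)

  // Keep the odd numbers, square them, then add them up: 1 + 9 + 25.
  val sumOfOddSquares = lst
    .filter(_ % 2 == 1)
    .map(x => x * x)
    .reduce(_ + _)

  // map builds a new list; lst itself is left untouched.
  val doubled = lst.map(_ * 2)
}
```

This chained style is close to how transformations are typically written in the Spark examples this introduction is meant to prepare for.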

Tuples & Case Classes

Scala provides a simple tuple definition: (val1, val2)

A case class in Scala is similar to a namedtuple in Python, or to a Java Bean or Plain Old Java Object (POJO). It offers a simple definition and provides an equality method and a formatted toString.

scala> val points = List((0.1,1.3),(1.1,2.2),(0.1,1.2))
points: List[(Double, Double)] = List((0.1,1.3), (1.1,2.2), (0.1,1.2))

scala> case class Point(x: Double, y: Double)
defined class Point

scala> val pt1 = Point(points(0)._1, points(0)._2)
pt1: Point = Point(0.1,1.3)

This shows that you can refer to fields in a tuple with ordinal numberings (x._1 or x._2).

If we have a tuple that also holds a z-coordinate, referring to the third value would be:

scala> val t = (0.1,1.3,4.2)
t: (Double, Double, Double) = (0.1,1.3,4.2)

scala> t._3
res2: Double = 4.2
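
The equality and toString that a case class provides can be checked with a short sketch. It reuses the Point and points definitions from above; CaseClassSketch and the other value names are hypothetical, and the `case (x, y)` form is an assumption about how one might unpack each tuple instead of using ._1 and ._2.

```scala
object CaseClassSketch {
  case class Point(x: Double, y: Double)

  val points = List((0.1, 1.3), (1.1, 2.2), (0.1, 1.2))

  // Unpack each tuple into named parts and build a Point from them.
  val asPoints = points.map { case (x, y) => Point(x, y) }

  // Case classes compare by field values, not by reference.
  val sameValues = Point(0.1, 1.3) == asPoints.head

  // The generated toString lists the fields.
  val shown = asPoints.head.toString

  // Fields are accessed by name rather than by position.
  val xs = asPoints.map(_.x)
}
```

Compared with raw tuples, named fields such as x and y tend to make code that processes records easier to follow.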

lenards commented Jan 27, 2015

Notes to self ...

I think I want to add some coverage of Option and Some to avoid nulls in collections as that comes up in Spark usage as well.

It might be nice to just have the running conversation in the REPL; you can include comments in the REPL without issues, and that would give a narrative.


lenards commented Jan 27, 2015

It would be nice to have an example of .map(<blah>).flatten and .flatMap(<blah>).

Also might be nice to take a string, split it, and reduceByKey for a count and lo-fi word count.
