
Scaladays

Scaladays Keynotes

Scala 2.12 is the new big thing! It has 33 new features, but more importantly it contains optimizations for Java 8, using Java 8 language features such as lambdas and default methods, while still being backwards compatible with Java 6.

Scala 2.13 is more focused on improving Scala's built-in libraries, including the collections library. In particular, the collections will become more in line with Spark's collections and gain more robust lazy collection support. There are also plans to separate Scala into a core and a platform module.

Neat things happening with Scala:

  • Scala.js
  • Scala Native
  • Dotty compiler

Dotty Compiler

A lot of effort has gone into creating DOT (Dependent Object Types) and a DOT calculus. With the DOT calculus, formal statements can be made and proven, and therefore used to reason about the correctness of certain language features. The rest of the language can be encoded in it.

Dotty is double the speed and half the size of the current Scala compiler! It uses TASTY (typed ASTs) as an intermediate representation, which allows more compiler insights and optimizations.

Things that will be removed:

  • procedure syntax
  • macros
  • early initializers (in favor of trait parameters)
  • existential types
  • general type projection (but class projection stays, C#T)

New features:

  • Intersection types (T & U) replace "with" types -- unlike with, & is commutative! (see the sketch after this list)
  • Union types (T | U) avoid huge least upper bounds (LUBs)
  • function arity adaptation
  • trait parameters
  • static methods and fields (mainly for Java interop)
  • non-blocking lazy vals (Originally, initializing a lazy val locked the whole object, which could cause deadlocks. Instead, lazy vals will be made thread-local by default; the @volatile annotation can be used if we want thread-safe lazy vals.)
  • multiversal equality, with compile-time type checking
  • named type parameters
  • scala.meta -- a more principled approach to metaprogramming and macros -- eliminates a lot of boilerplate
  • an effect system -- a new model for side-effect checking, done by passing implicit parameters
  • null safety -- model nullability as a union (e.g. T | Null) to make types precise
  • generic programming that can abstract over arity; basically a faster shapeless
  • better record types -- also shapeless-inspired, implemented via hashmaps
  • @infix to denote infix operators, such as min. Everything else will be required to use dot notation.
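
To make the intersection and union bullets concrete, here is a small sketch in Dotty syntax (the type names are illustrative, not from the talk):

trait Resettable { def reset(): Unit }
trait Growable[T] { def add(t: T): Unit }

// Intersection: x must be both Resettable and Growable[String].
// Unlike "A with B", "A & B" is commutative.
def f(x: Resettable & Growable[String]): Unit = {
  x.reset()
  x.add("first")
}

// Union: a mixed value gets a small union type rather than a huge LUB.
case class UserName(name: String)
case class Password(hash: Long)
def help(id: UserName | Password): Unit = id match {
  case UserName(n) => println("user: " + n)
  case Password(h) => println("hash: " + h)
}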

Scala is functional, strict, and pragmatic. Scala is a great foundation for implementing cool new PL advancements.

Scaladex

A neat new way to discover new libraries! Try it out.

Precise types bring performance

Building fast Scala code

In order to write fast Scala code, we need to find out why code is slow. Scala code is slow for two reasons:

  • the libraries are inefficient
  • user code uses efficient libraries in inefficient ways

An example: Scala's average implementation is much slower than Java's. The Java version is just 3 checks per iteration; the Scala version is implemented in terms of foreach and works over boxed objects.

Boxing is very expensive: a single boxing (allocation) costs as much as 5 dynamic dispatches, 15 static dispatches, and 20 additions.

This is because Scala has a lot of useful features to help programmers be productive, but the JVM doesn't support many of them. The compiler is forced to conservatively lower them to plain objects to make sure the code still runs correctly, and this makes Scala run slower.
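
An illustrative sketch (not the talk's exact code): a generic, foreach-based average traffics in boxed values, while a manual loop over an Array[Int] works on unboxed primitives.

def averageBoxed(xs: Seq[Int]): Double = {
  var sum = 0L
  var count = 0
  xs.foreach { x => sum += x; count += 1 } // each element goes through a boxed Function1 call
  sum.toDouble / count
}

def averageManual(xs: Array[Int]): Double = {
  var sum = 0L
  var i = 0
  while (i < xs.length) { sum += xs(i); i += 1 } // primitives only, no allocation
  sum.toDouble / xs.length
}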

What are some alternatives?

Cloned methods

An alternative is having different methods for each primitive type.

BUT Scala has 9 primitive types. With n class type parameters and m method type parameters, we would get 10^(n+m) specializations (the 9 primitives plus the erased reference type)!

What if we specialized for a subset?

We can use the @specialized annotation to mark type parameters to be specialized.
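
A brief sketch of what that looks like: separate bytecode variants are generated only for the listed primitives, avoiding boxing for those types.

// Specialize only where it pays off: variants for Int and Long only.
class Box[@specialized(Int, Long) T](val value: T)

def sumWith[@specialized(Int, Long) T](xs: Array[T], zero: T, add: (T, T) => T): T = {
  var acc = zero
  var i = 0
  while (i < xs.length) { acc = add(acc, xs(i)); i += 1 }
  acc
}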

Another alternative is @miniboxed, which includes some specialization but has substantial slowdowns for arrays.

However, all specialization breaks modularity, because some choices have to be made about what to specialize. This requires people to know in advance what has been specialized in order to write performant code!

Auto-specialization in Dotty linker

Instead, the linker can analyze how the code is actually used and specialize specifically for that usage!

Performance becomes closer to Java, with a hit in compilation time.

Limitations to Approach

  • Slows down compilation
  • Requires dependencies to have TASTY
  • Does not help if the library does tricks, e.g. -- depends on boxing behaviour -- uses null as a special value -- has a typed API but internals that are completely untyped (casting everything to Any and then back)

User code optimizations

We can also perform user-code optimizations by replacing inefficient code with efficient implementations. In an ideal world, we could write code without worrying about efficiency up front, or choose more maintainable implementations without taking a performance hit (e.g. using a global reduce on a collection).

We need library-specific optimizations. For example:

  • Replace the linear-time check x.size == 0 with x.isEmpty
  • Rewrite any division by a power of two into a bit shift
  • Merge multiple filters into a single filter

There are naturally some complications, and we can't apply optimizations blindly. For example, we can't always join two filters, because the filter predicates may have side effects.

The linker has to check that the functions are PURE, in the sense that there are no OBSERVABLE side effects.
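
A sketch of the filter-merging rewrite and of why purity matters (illustrative code, not from the talk): the fused version produces the same result, but interleaves the predicate calls, so observable side effects happen in a different order.

val xs = List(1, 2, 3, 4)
def p(x: Int) = { println("p(" + x + ")"); x % 2 == 0 }
def q(x: Int) = { println("q(" + x + ")"); x > 2 }

xs.filter(p).filter(q)       // two passes: prints p(1) p(2) p(3) p(4), then q(2) q(4)
xs.filter(x => p(x) && q(x)) // one pass:   prints p(1) p(2) q(2) p(3) p(4) q(4)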

Ideally, we would also have custom warnings and error messages for nonsensical code, such as collection.toPar.reduceLeft (parallelizing a collection only to immediately reduce it sequentially).

Scala.js Track

Scala.js and beyond

Let’s not write things twice when we don’t have to!

Scala.js fits easily into an sbt workflow: sbt fastOptJS produces the JavaScript files and the source maps.

Scala.js provides us with everything we need to work with front-end:

  • DOM manipulation
  • Type safe CSS and HTML
  • Type safe client server interactions!

It can be used with wrappers for existing client libraries such as React and Angular.

But is Scala.js all or nothing? What about mixed teams? Should front-end teams learn Scala?

A mixed approach

Instead, we can create services that compile to JS that can be consumed by the front-end team. The services can have type-safe AJAX calls.

Akka.js: Drop your actors outside the JVM boundaries

How to use the actor model within Scala.js.

Why?

  • code re-use
  • code portability
  • high modularity
  • same programming model everywhere
  • transparent communication (between platforms)
  • concurrency management

https://github.com/typesafehub/akka-js

Scala.js internals

The Scala.js compiler has three parts:

  1. A compiler plugin (around 5000 loc): transforms the scalac tree into an intermediate representation
  2. Emitter: Takes the IR and generates JavaScript
  3. Optimizer (around 4418 loc)

Optimizer

The optimizer is the most interesting part! Usually, optimizers are applied repeatedly until the code reaches a fixed point; the Scala.js optimizer, however, is single-pass.

As a result, it has to be a little clever not to miss optimizations. Most optimizations go through a pre-transform phase, which is a virtualization of the transformations: it attempts an optimization, and if it fails, it rolls back and attempts the next-best one. Backtracking is enabled by using CPS (continuation-passing style).

Some improvements include optimizing multiplication into binary shifts and tuple destructuring. Operators always behave the same, but for built-in methods the optimizer has to verify that they haven't been overridden.
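
A hedged illustration of that last point (example code, not from the talk):

// For primitive Int, * always means machine multiplication,
// so the optimizer can safely rewrite multiplication by a power of two:
def times8(x: Int): Int = x * 8 // can become x << 3

// A user-defined method, by contrast, can shadow a familiar name, so a
// rewrite of a method call must first prove which method the receiver has:
class Weird(n: Int) {
  def size: Int = { println("side effect!"); n } // not the collection's size
}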

Perfect scalability

  • What is scalability?
  • What characterizes a scalable architecture and design?
  • What is perfect scalability?

What is scalability?

Performance and scalability are related, but not the same. Increasing scalability means increasing the number of requests that can be handled. Increasing performance means processing the same load in less time, which does not necessarily increase the number of requests handled.

What is perfect scalability?

When we add more resources, we can handle proportionally higher load (a linear relation).

What characterizes a scalable architecture and design?

  • No state
  • No contention (i.e. share nothing)
  • Independent computations

BUT this is still flawed. If we have a single HTTP namespace, we may need a single hardware load balancer, and we will eventually hit network limits. The bottleneck is the shared hardware.

We can fix this by breaking up the namespace, that is, by sharing less!

From Amdahl's law and Gunther's universal scalability law, we know there are fundamental limits to how far a design can scale. In fact, past a certain point, additional resources decrease the ability to handle load rather than increase it (coordination time keeps growing and growing). The best we can do is prevent negative returns and aim for no return instead.

Enemies of scaling

  • Contention/shared resources -- We can avoid sharing by using eventual consistency, event sourcing, and CQRS, keeping only private state. The state is updated using deltas from domain events. -- However, this requires communication and therefore creates overhead.
  • Communication is another enemy of scalability! -- See it as a cost. We should limit communication, especially point-to-point communication, which is a form of coupling.
  • Ordering -- sequencing leads to shared state, which leads to contention -- stay commutative
  • Linear time sequences -- Use FSMs and single-use actors to avoid linear processing -- Communication between services must be async and non-blocking

Designing for perfect scalability must be done upfront. Build services designed to adhere to these principles.

What does a scalable architecture look like?

  • Elastic -- Elasticity allows reduction in cost when possible -- Spike load is not solved by being scalable: your system needs to be elastic and predictive
  • Command sourcing -- A command is a request that can fail -- An event is something that already happened -- If we persist commands and handle them asynchronously, load can spike above capacity and we can always queue/retry
  • Degrade gracefully -- "An escalator can never break; it can only become stairs"
  • Microservices
  • Simple is good -- Simple patterns, consistently applied, are easier to scale
  • No global "now" -- Causal ordering is better than clock-time ordering -- The worst kind of coordination is temporal
  • Persistence is (not) futile -- Systems often have too much persistence -- It is only required when you need to recover!
  • Don't share databases! -- If your services share a database, your database is a monolith
  • Distributed transactions are an anti-pattern. Don't stop the world!
  • Idempotence avoids the need for sequencing to some degree, because it lets events be handled in any order -- It also reduces the need for persistence, because we can reprocess from the original command

Monitoring

Monitoring is SUPER IMPORTANT, the log is not enough. However, monitoring is also a cost, so we must be prudent.

Conclusion

Perfect scalability is achievable -- but not without design. Avoid the enemies of scalability and favor patterns that don't rely on them. We must also monitor and adjust at runtime.

Databases

Quill

Compile-time language-integrated queries

Quoted functions are converted to an AST, which is normalized into a SQL query string. Normalization happens at compile time, inside quote { … }, so running each query string is as performant as running vanilla SQL.

You can abstract over predicates and pass parameters at runtime.
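
A minimal sketch of what that looks like, using Quill's mirror context (exact API details vary across Quill versions):

import io.getquill._

case class Person(name: String, age: Int)

val ctx = new SqlMirrorContext(MirrorSqlDialect, Literal)
import ctx._

// The query is quoted and normalized to SQL at compile time;
// lift(...) injects a runtime value as a bound parameter.
def adults(minAge: Int) = ctx.run {
  query[Person].filter(p => p.age > lift(minAge))
}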

Slick - Polymorphic record types in lifted embedding

Lifted embedding builds a Slick AST that reifies the computation. The AST is then compiled into SQL.

Toy slick implementation: szeiger/slick/tree/toy-slick-scaladays2016

What’s in the name?

Lifted: every type T is lifted into Rep[T]. Embedding: because it's embedded in Scala.

Detour: functional dependencies (between type parameters)

Given some type parameters, we can do an implicit search to resolve another, unknown type, provided the resolution is unambiguous.

CanBuildFrom[From, Elem, To] uses functional dependencies: the result type To is uniquely determined by the first two type parameters.
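
A toy illustration of the idea (example code, not from the talk): given the input types, implicit search determines the dependent output type.

trait Zipper[A, B] {
  type Out
  def zip(a: A, b: B): Out
}

object Zipper {
  // Given A = List[X] and B = List[Y], Out is forced to be List[(X, Y)].
  implicit def listZipper[X, Y]: Zipper[List[X], List[Y]] { type Out = List[(X, Y)] } =
    new Zipper[List[X], List[Y]] {
      type Out = List[(X, Y)]
      def zip(a: List[X], b: List[Y]) = a zip b
    }
}

def zipped[A, B](a: A, b: B)(implicit z: Zipper[A, B]): z.Out = z.zip(a, b)
// zipped(List(1, 2), List("a", "b")) : List[(Int, String)]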

Reactors - The road to composable distributed computing

What makes a programming model good?

  • It should be comprehensible -- x86 assembly, for example, is not
  • But it should also be concise -- the lambda calculus is Turing complete and made up of very few rules, yet programs written in it are not concise and are still hard to read

Programs written must be quick and easy to understand

For a distributed system, location transparency is very important, i.e. a program will run correctly no matter where it's deployed and regardless of its relationship with other nodes.

The actor model does this very well...but there is a problem.

The actor model is not composable. When combining behaviors, the receive blocks override one another. We can try using multiple receives, but messages intended for one receive may be prematurely caught by another (sketch below).
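
An illustrative Akka sketch of the problem (example code, not from the talk):

import akka.actor.Actor

trait Pings { this: Actor =>
  val pings: Actor.Receive = {
    case msg => sender() ! "pong" // intends to handle only ping traffic...
  }
}

trait Counts { this: Actor =>
  var n = 0
  val counts: Actor.Receive = { case "inc" => n += 1 }
}

class Both extends Actor with Pings with Counts {
  // orElse is the best we can do, but pings' patterns shadow counts':
  // "inc" is swallowed by the first block and never increments n.
  def receive = pings orElse counts
}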

Solution: Reactors

Scala Native

Working with the JVM can be hard.

  • Making benchmarks is hard because of the warm-up time
  • It can be TOO safe -- if you want to manipulate RAM directly, it becomes really, really hard
  • Interop with anything that is not on the JVM is hard

The dream

  • The code runs immediately without a warmup period
  • Lower-level data structures, such as structs allocated on the stack
  • More control with memory management
  • Easier calls to other languages

Waking up

Imagine rewriting something from C++ (which uses vectors of structs) into Scala. Suddenly it becomes MUCH slower and consumes memory like crazy, because the boxed objects exert extreme memory pressure.

However, if we had structs, we would no longer allocate on the heap, and it would be much faster because there would be no need for GC.
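
A hypothetical sketch of what stack-allocated structs look like (names taken from later Scala Native releases; the 2016 preview's API may differ):

import scala.scalanative.unsafe._

def distance(): Double = {
  val p = stackalloc[CStruct2[CInt, CInt]]() // on the stack, never on the GC heap
  p._1 = 3
  p._2 = 4
  math.sqrt((p._1 * p._1 + p._2 * p._2).toDouble)
}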

How does Scala Native work under the hood?

It is an LLVM-based compiler that produces native binaries. It can optimize tail calls, even mutually recursive tail calls.

Some questions:

  • Is it the same language? Yes, mostly. Except with extra low-level primitives.
  • Is it just a back end? Not quite.
  • Will it use GC? Yes, for now. A slower one than the JVM's GC, but it can only get better. -- Of course, you don't need the GC all the time if you use lower-level constructs.
  • Hardware support? 64-bit Intel.
  • Libraries? Java libraries have been ported to make them work and to make the experience the least surprising possible.
  • When? Developer preview in the near future.
  • How can you profile GC? GC is just a library so you can just profile it as you would normally.

Meta-programming Track

Scalan: A reasonably typed meta-programming framework (Alexander Slesarenko)

Meta-programming: takes programs as input, gives programs as output

How Scalan can help?

DAG-based intermediate representation
  • unified and extensible data structure
  • immutable and higher-order
  • configurable visualization (Graphviz)
Composable and generic representation for
  • functions (lambdas) as DAGs
  • virtualized user-defined types
  • domain-specific isomorphisms
  • domain-specific converters

You can really define your own types as first-class citizens -- not just prettifying.

Basic Machinery: Code Virtualization

  • based on standard Scala compiler
  • systematic transformation to make code abstract
  1. Process the AST of the source code
  2. Associate nodes of the AST with calls of a virtualised API
  3. Generate virtualised code containing only the calls of the virtualised API

Basic Machinery: Standard Evaluation/Staged Evaluation

Instead of making method calls on data values, make calls on symbols (nodes of the graph) that are mapped to some data (or anything?!). Staged evaluation can be understood as a self-reproducing process: when a program is staged-evaluated, it reproduces itself in a graph-based IR.

Metaprogramming 2.0 (Eugene Burmako)

What is Scala.meta?

  • Vendor neutral AST (tree interchange format), not Intellij PSI trees, nor Scala internal AST.
  • Trees are designed such that no syntactic details, such as formatting and comments, will get left out
  • Contains a new abstraction: tokens, which represent elementary parts of Scala’s grammar, such as white spaces/comments/identifiers. Each token has a bunch of associated metadata, such as the location on the line.

It’s very easy to use:

import scala.meta._
"x + y".parse[Term]

Why should you care?

Because macros are bad.

  • they have a lot of boilerplate
  • Intellij needs out of band support from compiler to support macros
  • macro code changes won't trigger an sbt recompile

BUT they still enable unique functionality that is leveraged by many library authors, so we can't just get rid of them!

What will happen next?

A new future for macros! With a lot of thought and work, it was found that complexity with macros was largely incidental. There are two orthogonal concepts in the essence of macros:

  1. Meta programming at compile time
  2. Inlining code at a call site
  • Scala.meta will have a lot less boilerplate
  • Scala IntelliJ plugin for macros -- with in-editor expansion of macros -- works by converting PSI trees into scala.meta trees and running them through scala.meta to expand them.

Macros are not going away!

They will be replaced with a better version based on scala.meta, one that is easier to write and has better IDE support.

Other cool things that use (or could use) scala.meta

  • Dotty linker can use Scala.meta to implement rewrite rules
  • Codacy (static code analysis tool for github/bitbucket) uses the output of Scala.meta parser, since it produces a very precise model of the Scala code.
  • Scalafmt (code formatter for Scala) -- Works by breaking a line up into tokens and inserting/deleting things as needed

Principles of Elegance - Jon Pretty, @propensive

"Combining simplicity, power, and a certain ineffable grace of design" Intuitive, readable code Typesafety - robustness and confidence Often typesafety and readability don't combine well

How is Scala different from other languages, e.g. Haskell or Python?

  • Scala is more complex, has more features
  • transcends different user abilities; can be used quickly by beginners and also can do pretty advanced stuff
  • inventiveness is encouraged

Principle 1: Keep public APIs small

  • easier to learn, use, maintain, understand
  • easier to compose
  • avoid polluting public APIs with types and terms not intended for end-users; hide details if it's not relevant, keep internals internal
  • expect users to want to use wildcard imports; make sure everything in your package is relevant
  • use fewer, generic methods instead of many specific methods

How to keep public APIs small?

  • make use of private and protected modifiers
  • use typeclasses instead of overloading (sketch after this list)
  • nest things more deeply
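
A sketch of "typeclasses instead of overloading" (example code, not from the talk): one generic method, with behavior supplied per type, keeps the public API to a single name.

trait Render[A] { def render(a: A): String }

object Render {
  implicit val int: Render[Int] =
    new Render[Int] { def render(a: Int) = a.toString }
  implicit val bool: Render[Boolean] =
    new Render[Boolean] { def render(a: Boolean) = if (a) "yes" else "no" }
}

def render[A](a: A)(implicit r: Render[A]): String = r.render(a)
// render(3) == "3"; render(true) == "yes"; render("x") does not compile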

Principle 2: Name wisely

  • classes, types, values, methods all need names
  • name should communicate something
  • consider: does the method implement a familiar concept?
  • short names are good for pervasive types/methods
  • if in doubt, use longer names
  • especially important for implicits to avoid accidental shadowing from duplicate names
  • see Haoyi Li's blog post Conciseness and Names

Principle 3: Embrace the type system

  • empowers us with constraints
  • types give us confidence to reason about our code
  • avoid primitive types like Int and String; use types to express semantic ideas (see the sketch after this list)
  • avoid generic structural types like Option, Either, and tuples for domain data (use a case class instead)
  • really low bar to introducing new types
  • promote values to types where possible; let the compiler help you!
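
A small sketch of promoting semantic ideas to types so the compiler can catch mix-ups (illustrative names, not from the talk):

case class UserId(value: Long) extends AnyVal
case class OrderId(value: Long) extends AnyVal

def cancelOrder(user: UserId, order: OrderId): Unit = ???
// cancelOrder(orderId, userId) no longer compiles,
// unlike a (Long, Long) signature where swapped arguments slip through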

Principle 4: Consider user experience

  • use many of the same principles that apply to UX design
  • your users are all programmers
  • users have different abilities and expectations
  • take advantage of familiar expectations (use empathy)
  • but educate where necessary (pre-empt common misunderstandings, explain if you are going against standard practice)
  • keep boilerplate minimal
  • each line should be significant and meaningful
  • casual users should be able to understand what code does

API useability heuristics - from UX design heuristics

  • draw parallels with the real world, try to associate real things with your code
  • consistency and standards
  • error prevention
  • recognition, not recall (recognise what a method does by its signature and shape, not by its name)
  • flexibility and efficiency of use
  • aesthetic and minimalist design

Use site-first design

  • optimize for the use site
  • write some sample code you would like to compile
  • then try to write the definitions to make your samples compile
  • only compromise on sample code when all possibilities in the library layer have been exhausted
  • it's like test-first... but different

Implicits Inspected and Explained

  • OO: is a and has a
  • implicits: is viewable as - a different use for the object
  • examples: implicit execution context in Futures, implicit sender in Akka
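
For instance, the Futures example above: Future.apply takes its ExecutionContext implicitly, so call sites stay clean while the capability is still passed along.

import scala.concurrent.{ExecutionContext, Future}
import scala.concurrent.ExecutionContext.Implicits.global

val f: Future[Int] = Future(21 * 2) // the global context is supplied implicitly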

Microservices - Vaughn Vernon

ACID everywhere - why?
  • consistency is overrated
  • businesses used to run OK without ACID
  • ask: "how much time would be acceptable to allow between the consistency of this data or the consistency of this other data?"
  • eventual consistency: happens organically, usually matches business needs

Typelevel - Miles Sabin

Typelevel is...

a community of projects and individuals organised around

  • pure, typeful, functional programming in Scala
  • independent, free/libre and open source software; an open, accessible model for best practices
  • a desire to share ideas and code; recognise that people want to use software with other people, other projects
  • accessible and idiomatic learning resources
  • an inclusive, welcoming, and safe environment

http://typelevel.org
http://github.com/typelevel/general
http://gitter.im/typelevel/general
