agocorona/transient-url.md

## transient-url.md

      
This tutorial teaches how to invoke any part of a transient-universe program using HTTP requests. It is also useful for understanding the mechanism of serialization and remote execution in distributed transient programs.
This is not the REST API that is also included in transient-universe. That API is shown in examples like api.hs, which is undocumented but, I hope, self-explanatory.
Note that this refers to the api.hs version in the "new" branch, which is the branch detailed here.
Transient is a Haskell library that provides high-level effects such as parallelism, concurrency, asynchronicity, streaming and distributed computing, and manages them without special constructions.
To run the examples you need to install transient-universe:
cabal install transient-universe

With the new version of transient, any component of a transient program that uses the cloud monad can be invoked by an HTTP request.
As an example, this program:
main= keep $ initNode $ inputNodes <|> do
  ....
  ....
(NOTE: add complete examples with includes, compilation, etc)
That program initializes a server node and lets the user enter the hosts and ports of other nodes, either on the command line or as interactive console input. It is the standard way to initialize a node:
> runghc program.hs -p start/localhost/8000/add/localhost/3000/n

That initializes the node at localhost:8000 (read by initNode) and adds the node localhost:3000 to the list of known nodes (the latter is processed by inputNodes).
In the interactive mode the parameters are entered as menu options with a guided dialog:
>runghc program.hs
...
Enter  start            to: re/start node
...
>start

option: start
hostname of this node. (Must be reachable)? > localhost
"localhost"
port to listen? > 8000
8000
Connected to port: 8000
Enter  list             to: list nodes
Enter  add              to: add a new node
option: add
Hostname of the node (none): > localhost
"localhost"
port? > 3000
3000
services? ([]) > n

connect to the node to interchange node lists? (n) "n"
Added node: ("localhost",3000,[])

Now, for free and without any additional code, it is possible to tell the program to add a node localhost:3001 by invoking it remotely through an HTTP request with this URL:
> curl 'http://localhost:8000/0/1/e/f/()/w/"add"/"localhost"/3001/[]/[]/"n"/' --globoff

(Note: the final form of this request may vary)
It returns the following message over HTTP:
SMore/1/100103000/("localhost",3001,[])/()/e/e/

and prints the following in the terminal of the node:
Added node: ("localhost",3001,[])

Current limitations:

No authentication.
No proper HTTP responses, just raw messages.
An undetermined number of response messages can be received; for now there is no total length and no chunked encoding.

To disable HTTP requests in a branch of the computation or in the whole program you can use noHTTP:
main= keep $ initNode $ this <|> that

that= do
   noHTTP
   ...

In this program, this executes until it finds an asynchronous operation such as abduce, async, react, spawn, parallel, waitEvents or empty. Then the second term, that, is executed.
Every HTTP request that tries to execute that branch receives a 403 response.
This tutorial is about how to obtain such URLs and about how and why they work. It also gives insight into how closure serialization and remote execution work in transient-universe.
Suppose that I have this program:
import Transient.Base
import Transient.Move
import Transient.Move.Utils

main = keep $ initNode $ 
  localIO $ putStrLn "hello world"
This program initializes a server, but it is also a console application. When you run it, it produces:
>runghc program.hs -p start/localhost/8000
...
...
option: start
hostname of this node. (Must be reachable)? "localhost"
port to listen? 8000
Connected to port: 8000
hello world

But we can make the program begin execution at any point if it is invoked by an HTTP request with the appropriate URL. To discover that URL, we insert showURL at the place that we want to call:
main = keep $ initNode $ do
  showURL 
  localIO $ putStrLn "hello world"
Now the program reveals the URL:
>program -p start/localhost/8000
... (additional output..)
option: start
hostname of this node. (Must be reachable)? "localhost"
port to listen? 8000
Connected to port: 8000
'http://localhost:8000/0/0/e/'
hello world

Before "hello world", the program prints a URL. If you invoke it using curl:
> curl http://localhost:8000/0/0/e/

you will see that the program executes again and prints in the terminal:
'http://localhost:8000/0/0/e/'
hello world

The effect is as if we had executed the continuation from the location of showURL onwards.
You will also see that curl receives no response and keeps waiting. With this program, a response will never arrive.
To receive something in response, you need to tell the program to send it. Transient has teleport, which transports the closure that the program is executing to the connected node; in this case, the program that invoked the URL.
If we add teleport:
main = keep $ initNode $ do
  showURL
  localIO $ putStrLn "hello world"
  teleport
If we invoke curl against this program we will see a response:
> curl http://localhost:8000/0/0/e/
SMore/0/10002000/e/()/e/

What is that? Like the URL entered, the response is the serialization of the closure which ran in the server. In that response, the program says the following:

SMore: there may be more responses
0/ : I send to destination 0 in the calling program
10002000/ :  identifier of the teleport which made the response, which is also a place in the program, a closure.
e/ : I executed what is inside of the first statement (which is initNode)
showURL is executed but has no trace
()/ : is the result of  putStrLn "hello world"
e/ :  I executed the next statement (the teleport)
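To make that reading concrete, here is a minimal sketch that splits one raw response line into the parts listed above. It uses only base; the `Response` record and the function names are hypothetical helpers of mine, not part of transient-universe, and the split is naive (it assumes that no serialized value contains a `/`).

```haskell
-- Hypothetical helper types; none of these names come from transient-universe.
data Response = Response
  { tag           :: String    -- "SMore": more responses may follow
  , destClosure   :: String    -- closure in the caller that receives the data
  , sourceClosure :: String    -- identifier of the teleport that answered
  , logSegments   :: [String]  -- one serialized entry per executed statement
  } deriving (Show, Eq)

-- Split on '/', dropping empty pieces. Naive on purpose: it assumes that
-- no serialized value contains a '/' of its own.
splitSegments :: String -> [String]
splitSegments = filter (not . null) . go
  where
    go str = case break (== '/') str of
      (seg, [])       -> [seg]
      (seg, _ : rest) -> seg : go rest

-- Decode one raw response line into its parts.
parseResponse :: String -> Maybe Response
parseResponse line = case splitSegments line of
  (t : dest : src : segs) -> Just (Response t dest src segs)
  _                       -> Nothing
```

Applied to `SMore/0/10002000/e/()/e/`, this yields the destination `"0"`, the teleport identifier `"10002000"` and the log `["e","()","e"]`.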

In the same way, the calling URL contains /0/0/e/, which means: I call closure 0, that is, the beginning of the program; send the response to my closure 0; and e/ means execute what is inside the first statement (initNode). And that is what the program executes.
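The shape of the calling URL can be sketched in the same spirit. This is only an illustration with base: the `Invocation` record and `invocationURL` are hypothetical names. I read the first number as the closure to run in the server and the second as the closure in the caller that receives the response, which is how the later examples in this tutorial behave.

```haskell
-- Hypothetical record describing an invocation; the names are illustrative.
data Invocation = Invocation
  { host          :: String
  , port          :: Int
  , targetClosure :: String    -- closure to run in the server ("0" = start)
  , callerClosure :: String    -- closure in the caller that gets the reply
  , pathLog       :: [String]  -- serialized steps to replay, e.g. ["e"]
  }

-- Assemble the URL: every path element is followed by a '/'.
invocationURL :: Invocation -> String
invocationURL inv =
  "http://" ++ host inv ++ ":" ++ show (port inv) ++ "/"
    ++ concatMap (++ "/") (targetClosure inv : callerClosure inv : pathLog inv)
```

For example, `invocationURL (Invocation "localhost" 8000 "0" "0" ["e"])` gives `http://localhost:8000/0/0/e/`, the URL printed by showURL above.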
The results of all the lines that use local or localIO are serialized in the response. This program:
main = keep $ initNode $ do
  showURL
  localIO $ putStrLn "hello world"
  local $ return (42 :: Int)
  teleport
produces this response when invoked by curl:
> curl http://localhost:8000/0/0/e/
SMore/0/10002000/e/()/42/e/

Do you see the difference?
However, suppose I change the second 0 to any other number, for example 1:
>curl http://localhost:8000/0/1/e/
SMore/1/20002000/()/42/e/

The output includes only the segment of the path that has been generated by the server: ()/42/e/.
The reason is that the response is now sent to closure 1 instead of closure 0 (the beginning of the calling program). Since a transient program assumes that it is talking to another transient node, the node assumes that the caller needs only what is new. That avoids unnecessary repetition.
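The two responses seen so far can be mimicked with a toy model. This is not how transient implements it; it is just a base-only sketch of the observed behaviour, with the hypothetical function `responseLog` and the assumption that "what is new" means the full log minus the segments that came in the request.

```haskell
import Data.List (stripPrefix)
import Data.Maybe (fromMaybe)

-- Toy model, not transient code: the log that travels back to the caller.
--   callerClosure: second number of the URL ("0" = beginning of the caller)
--   requestLog:    segments sent in the URL, e.g. ["e"]
--   fullLog:       everything the server has executed so far
responseLog :: String -> [String] -> [String] -> [String]
responseLog callerClosure requestLog fullLog
  | callerClosure == "0" = fullLog  -- caller starts from scratch: send it all
  | otherwise            = fromMaybe fullLog (stripPrefix requestLog fullLog)
```

With the full log `["e","()","42","e"]` and the request log `["e"]`, caller closure `"0"` gets the whole log back, while caller closure `"1"` gets only `["()","42","e"]`.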
This program:
main = keep $ initNode $ do
  showURL
  localIO $ putStrLn "hello world"
  local $ choose [1..3]
  teleport
produces three responses when executed:
>curl http://localhost:8000/0/1/e/
SMore/0/10002000/1/e/
SMore/0/10002000/2/e/
SMore/0/10002000/3/e/

These correspond to the three values returned by choose. It is possible to receive a finite or an infinite stream.
A program can be invoked with different URLs for executing different things. This program:
main = keep $ initNode $ hi "hello" <|> hi "world"
  where
  hi text= do
     showURL
     localIO $ putStrLn text
     teleport 
produces this output in the console:
> program -p start/localhost/8000
...
Connected to port: 8000
'http://localhost:8000/0/0/e/'
hello
'http://localhost:8000/0/0/e/()/w/'
world 

Now there are two URLs. That is because <|> is the alternative operator in Haskell: when the first term, hi "hello", returns nothing, the second is executed. Since teleport sends the data to the remote caller and stops, the other term is executed. The second URL is a bit longer since it contains the execution of two extra lines of code: () for the first print of "hello" and w for the second teleport.
> curl http://localhost:8000/0/0/e/
SMore/0/10002000/e/()/e/
SMore/0/20102000/e/()/w/()/e/

Now there are two responses since both teleports are executed. Notice that they have different closure identifiers. The program displays the same output in the console again.
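If `<|>` is new to you, its basic rule can be tried with `Maybe` from base. This is only an analogy: in `Maybe` the second term runs only when the first is empty, whereas in transient the first term does its work and then stops at teleport, which is why both "hello" and "world" are printed. The two names below are mine.

```haskell
import Control.Applicative ((<|>), empty)

-- In Maybe, 'empty' is Nothing, so the second term gets its turn.
firstIsEmpty :: Maybe String
firstIsEmpty = empty <|> Just "world"

-- If the first term yields a value, Maybe never looks at the second.
firstSucceeds :: Maybe String
firstSucceeds = Just "hello" <|> Just "world"
```

Here `firstIsEmpty` evaluates to `Just "world"` and `firstSucceeds` to `Just "hello"`.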
If we execute the second URL:
> curl 'http://localhost:8000/0/0/e/()/w/'
SMore/0/20102000/e/()/w/()/e/

Only one response is produced, as expected, since we are addressing the second term. Now we need to quote the URL, since the unquoted parentheses in '()' would be interpreted by the shell.
To execute the first hi "hello" without executing the second, we have to instruct teleport not to let alternative terms execute. This involves some tweaking that is not worth covering in this tutorial, since teleport is not normally used directly by the programmer. Usually, higher-level primitives built on teleport, like atRemote or runAt, are used instead. They are a computation between two teleports: the first teleport transports the computation to the remote node and the second transports the results back to the caller.
We will see this shortly, after this example of managing console input:
main = keep $ initNode $ do
  local $ option "r" "run"
  showURL
  localIO $ putStrLn "hello world"
  local $ return (42 ::Int)
  teleport
> program -p start/localhost:8000
Connected to port: 8000
Enter  r                to: run
r
"r"

option: r
'http://localhost:8000/0/0/e/"r"/'
hello world 

option waits for "r" in the console input. Once it is entered, since showURL comes after it, the URL that is displayed includes "r". By invoking it:
> curl 'http://localhost:8000/0/0/e/"r"/'
SMore/0/30002000/e/"r"/()/42/e/

The URL will pass over option and execute from showURL on.
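The value segments in these URLs and responses look like what Haskell's `show` prints for the same values. Taking that as an assumption (I am not claiming this is the serializer transient uses), the paths above can be rebuilt with base alone. `toPath`, `requestPath` and `responseTail` are hypothetical names, and the `e` markers are copied from the output above since `show` does not produce them.

```haskell
-- Join already-serialized segments into a path, each followed by '/'.
toPath :: [String] -> String
toPath = concatMap (++ "/")

-- Assumption: value segments look like the output of 'show'.
-- The "e" markers are copied from the URLs above, not produced by 'show'.
requestPath :: String
requestPath = toPath ["0", "0", "e", show "r"]

-- The log part of the response: option result, putStrLn result, 42, teleport.
responseTail :: String
responseTail = toPath ["e", show "r", show (), show (42 :: Int), "e"]
```

`requestPath` is `0/0/e/"r"/` and `responseTail` is `e/"r"/()/42/e/`, matching the URL and the end of the response shown above.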
main = keep $ initNode $ do
    local $ option "h" "hello"
    atRemote $ do
       showURL
       localIO $ putStrLn "hello world"
Once this program is started (as usual) and "h" is entered, atRemote executes locally, since the node is not connected to any other node (it is connected to itself). showURL is within atRemote. If the printed URL is invoked, then:
> curl 'http://localhost:8000/0/0/e/"h"/e/()/e/'
SMore/0/30004000/e/"h"/e/()/e/()/e/

A response is produced. The reason is that now there is a remote connection and the final teleport within atRemote sends the response back.
main = keep $ initNode $ hi "hello" <|> hi  "world"
  where
  hi text= atRemote $ do
     showURL
     localIO $ putStrLn text
      
produces:
> program -p start/localhost/8000
...
Connected to port: 8000
'http://localhost:8000/0/0/e/e/()/e/'
hello
'http://localhost:8000/0/0/e/w/e/()/e/'
world

If we invoke the first URL:
> curl 'http://localhost:8000/0/0/e/e/()/e/'
SMore/0/20004000/e/e/()/e/()/e/

The alternative term is not executed, since the tweak for teleport mentioned earlier is present in atRemote.
And now something more fun is coming. This program:
main = keep $ initNode $ inputNodes <|> hi
  where
  hi = do
        showURL
        localIO $ putStrLn "hello"
        let x= "hello "
        teleport
        showURL 
        localIO $ print $ x ++ "world"
        teleport
prints the following when initialized:
> runghc program.hs -p start/localhost/8000
...
Connected to port: 8000
Enter  list             to: list nodes
Enter  add              to: add a new node
'http://localhost:8000/0/0/e/f/w/'
hello
'http://localhost:8000/0/0/e/f/w/()/()/'
"hello world" 

It has two teleports, one after another. Locally, teleport transfers its closure to itself and continues executing. That is why the whole program is executed.
If we invoke the first URL remotely:
> curl 'http://localhost:8000/0/1/e/f/w/'
SMore/1/20102000/()/e/

The program displays in the console:
'http://localhost:8000/0/1/e/f/w/'
hello

It stops at the first teleport since it is a remote invocation.
We can continue from that first teleport by calling it with its teleport identifier, 20102000:
>curl 'http://localhost:8000/20102000/1/'

This prints in the console of the node:
'http://localhost:8000/0/0/()/()/'
"hello world"

The closure invoked includes the variable x, already instantiated with "hello ". That closure, with its teleport, becomes a new endpoint.
This works for now as long as both requests share the same connection. If the connection terminates, the closure is garbage collected. Currently a transient node closes the connection after one to three minutes, depending on the version. In fact, this example relies on a bug in transient, which assumes that calling nodes reuse connections; curl does not do so across different invocations. That is why the second invocation receives no response.
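The continuation URL used above can be derived from the response line itself: the teleport identifier becomes the closure to call and the destination becomes the caller. Here is a small base-only sketch of that derivation; `continuationPath` is a hypothetical helper, and the naive split assumes that no serialized value contains a `/`.

```haskell
-- Split on '/', dropping empty pieces (naive: values must not contain '/').
splitSegments :: String -> [String]
splitSegments = filter (not . null) . go
  where
    go str = case break (== '/') str of
      (seg, [])       -> [seg]
      (seg, _ : rest) -> seg : go rest

-- Hypothetical helper: from "SMore/<dest>/<teleport id>/..." build the
-- path "/<teleport id>/<dest>/" that resumes after that teleport.
continuationPath :: String -> Maybe String
continuationPath line = case splitSegments line of
  (_tag : dest : src : _) -> Just ("/" ++ src ++ "/" ++ dest ++ "/")
  _                       -> Nothing
```

For the response `SMore/1/20102000/()/e/` it returns `Just "/20102000/1/"`, the path of the second curl call.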
Any program can be invoked via HTTP. This program uses the runAt primitive, which allows a transient program to call a copy of itself at another network address and return the result.
main = keep $ initNode $ inputNodes <|>  do
    local $ option "r" "run"
    i <- atOtherNode $ do 
       showURL
       localIO $ print "hello"
       i <- local $ threads 0 $ choose[1:: Int ..]
       localIO $ threadDelay 1000000
       return i
    localIO $ print i
   where
   atOtherNode doit= do
     node <- local $ do
           nodes <-  getNodes
           guard $ length nodes > 1
           return $ nodes !! 1
     runAt node  doit
nodes contains the list of known nodes. The first node is the local node. If there is a second node, the program calls it with runAt and returns a stream of numbers, produced by a single thread: the current one. Since choose applies as much parallelism as it can, with as many threads as it can, guess what would happen if we did not limit the threads available for this infinite stream.
Now we execute it in two different consoles.
The first node is started:
runghc program.hs -p start/8000

The second node is started and told about the existence of the first:
runghc program.hs -p start/3000/add/localhost/8000/n

Then, in the console of the second node, we enter the "r" option and the remote node returns a stream of increasing numbers.
The first node displays the URL for the invocation, since showURL is included:
...
Connected to port: 8000
Enter  list             to: list nodes
Enter  r                to: run
Enter  add              to: add a new node
'http://localhost:8000/0/30104000/e/f/w/"r"/("localhost",8000,[])/e/e/e/e/'
"hello"

In the second, a stream of values is presented, one every second:
...
option: add
Hostname of the node (none): "localhost"
port? 8000
services? ([]) 
connect to the node to interchange node lists? (n) "n"
Added node: ("localhost",8000,[])
> r

option: r
1
2
3
4
5
6
7
...

If we invoke the URL, we will see the messages that the remote node at port 8000 sends to the calling node:
> curl 'http://localhost:8000/0/30104000/e/f/w/"r"/("localhost",8000,[])/e/e/e/e/' --globoff
SMore/30104000/60106000/()/1/()/e/
SMore/30104000/60106000/()/2/()/e/
SMore/30104000/60106000/()/3/()/e/
SMore/30104000/60106000/()/4/()/e/
SMore/30104000/60106000/()/5/()/e/
SMore/30104000/60106000/()/6/()/e/
SMore/30104000/60106000/()/7/()/e/
....
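A client that is not a transient node has to pick the numbers out of these raw lines by itself. Here is a minimal sketch with base only, assuming the message shape shown above; `lineValues` and `streamValues` are hypothetical names, and the split is naive (no `/` inside values).

```haskell
import Data.Maybe (mapMaybe)
import Text.Read (readMaybe)

-- Split on '/', dropping empty pieces (naive: values must not contain '/').
splitSegments :: String -> [String]
splitSegments = filter (not . null) . go
  where
    go str = case break (== '/') str of
      (seg, [])       -> [seg]
      (seg, _ : rest) -> seg : go rest

-- Values carried by one response line: skip the tag and the two closure
-- identifiers, then keep every segment that reads as an Int.
lineValues :: String -> [Int]
lineValues = mapMaybe readMaybe . drop 3 . splitSegments

-- Values carried by a whole sequence of response lines, in order.
streamValues :: [String] -> [Int]
streamValues = concatMap lineValues
```

`lineValues "SMore/30104000/60106000/()/1/()/e/"` gives `[1]`, since `()` and `e` do not read as an Int.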

Serialization can be defined by the user through the Loggable class in the module Transient.Logged. (To be detailed)