@nat-418
Last active April 1, 2024 03:23

Why Tcl?

Introduction

I use Tcl as my scripting language of choice, and recently someone asked me why. This article is an attempt to answer that question.

Ousterhout's dichotomy claims that there are two general categories of programming languages:

  • low-level, statically-typed systems languages
  • high-level, dynamic scripting languages

Systems languages are best for efficiently handling large quantities of data, implementing algorithms, and managing significant internal complexity. Scripting languages are best for gluing other programs together, exploring a problem, and extending applications. Ousterhout designed Tcl with this dichotomy in mind:

Thus I designed Tcl to make it really easy to drop down into C or C++ when you come across tasks that make more sense in a lower-level language. This way Tcl doesn't have to solve all of the world's problems. Stallman appears to prefer an approach where a single language is used for everything, but I don't know of a successful instance of this approach. Even Emacs uses substantial amounts of C internally, no?

Tcl is a very simple language that you can learn quickly and keep in your head. It is a mature platform with a comprehensive standard library. Unlike some other scripting languages, Tcl is stable and does not require complex virtual environments or dependency management. To get started, all you need is the interpreter tclsh and the standard library tcllib.
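As a sketch of how little ceremony is involved (the proc name here is my own invention, not anything standard):

```tcl
#!/usr/bin/env tclsh

# Everything in Tcl is a command; procedures are defined with [proc],
# and double-quoted strings interpolate variables directly.
proc greet {name} {
    return "Hello, $name!"
}

puts [greet "world"]   ;# Hello, world!
```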

I reach for Tcl whenever I have a shell script that gets more than trivially complex, because shell scripts are brittle: not only is the syntax tricky—I always use shellcheck to make sure I catch all the necessary double-quotes, for example—but many of the basic tools used in shell scripting are subtly different from machine to machine—even a simple call to echo can break. By contrast, Tcl programs will work across Linux, BSD, macOS, and Windows.
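One reason for that portability is that Tcl ships its own file-handling commands, so scripts need not shell out to platform-specific utilities. A small illustration (the path is made up):

```tcl
#!/usr/bin/env tclsh

# [file] subcommands replace external tools like basename and dirname,
# and behave the same on Linux, BSD, macOS, and Windows.
set path [file join home user notes.txt]
puts [file tail $path]        ;# notes.txt
puts [file extension $path]   ;# .txt
puts [file dirname $path]     ;# home/user
```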

I also like to use Tcl to prototype and test ideas. Hal Abelson called Tcl "Lisp without a brain", and I find the strange blend of Unix and Lisp ideas in Tcl enjoyable to work with. Tcl is command-oriented and generally procedural, but can accommodate object-oriented, functional, and metaprogramming styles.
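As a rough sketch of those styles, here is functional-style mapping with [lmap] and [apply], plus a metaprogramming example; the `unless` command is my own toy construct, not part of the language:

```tcl
#!/usr/bin/env tclsh

# Functional style: an anonymous function applied over a list.
set double {x {expr {$x * 2}}}
puts [lmap n {1 2 3} {apply $double $n}]   ;# 2 4 6

# Metaprogramming: code is just data, so new control structures
# can be defined as ordinary procs that evaluate their arguments.
proc unless {condition body} {
    if {![uplevel 1 [list expr $condition]]} {
        uplevel 1 $body
    }
}

unless {1 > 2} {puts "one is not greater than two"}
```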

Example programs

Recently I have been working through The Go Programming Language by Donovan and Kernighan, and thought I would give a few examples of how Tcl compares to Go, Python, and shell scripts. Each example will begin with the Go code from the book and end with a performance benchmark made using hyperfine.

Echo command-line input

To begin with, let's simply take command-line input and print it out.

Go

package main

import (
	"fmt"
	"os"
)

func main() {
	var s, sep string
	for i := 1; i < len(os.Args); i++ {
		s += sep + os.Args[i]
		sep = " "
	}
	fmt.Println(s)
}

Python

#!/usr/bin/env python

import sys

output = " ".join(sys.argv[1:])

print(output)

Tcl

#!/usr/bin/env tclsh

puts $argv

Benchmark

Command                          | Mean [ms]  | Min [ms] | Max [ms] | Relative
./echo "Hello, world!"           | 0.5 ± 0.1  | 0.4      | 1.0      | 1.00
echo "Hello, world!"             | 1.0 ± 0.1  | 0.8      | 1.6      | 1.88 ± 0.35
tclsh ./echo.tcl "Hello, world!" | 1.9 ± 0.1  | 1.6      | 2.8      | 3.44 ± 0.57
python ./echo.py "Hello, world!" | 10.9 ± 0.3 | 10.3     | 11.7     | 19.90 ± 2.94

We can see that Tcl is very straightforward to write and fast to start up.

Count duplicate lines in files

Given some files like this example.txt:

foo
hello
foo bar
hello
foo bar
bar

We produce the result:

2        foo bar
2        hello

Go

package main

import (
	"fmt"
	"io/ioutil"
	"os"
	"strings"
)

func main() {
	counts := make(map[string]int)
	for _, filename := range os.Args[1:] {
		data, err := ioutil.ReadFile(filename)
		if err != nil {
			fmt.Fprintf(os.Stderr, "dup3: %v\n", err)
			continue
		}
		for _, line := range strings.Split(string(data), "\n") {
			counts[line]++
		}
	}
	for line, n := range counts {
		if n > 1 {
			fmt.Printf("%d\t%s\n", n, line)
		}
	}
}

Shell

#!/bin/bash

# pipefail is a bash feature, not POSIX sh, and the long options to
# uniq (--count, --repeated) assume GNU coreutils.
set -e
set -o pipefail

for path in "$@"
do
	sort "$path" | uniq --count --repeated || {
		echo "dup: error reading $path" >&2
		continue
	}
done

Note how awkward it is to handle errors in this shell script.

Python

#!/usr/bin/env python

import collections
import sys

for p in sys.argv[1:]:
    try:
        with open(p) as f:
            c = collections.Counter(f.readlines())
            for k, v in c.most_common():
                if v > 1:
                    print(v, "\t", k.replace("\n", ""))
    except Exception as e:
        print("dup: ", str(e))
        continue

Tcl

#!/usr/bin/env tclsh

proc main {paths} {
    foreach path $paths {
        try {
            set channel [open $path r]
            set file [read $channel]
            close $channel
        } on error {err} {
            puts "dup: $err"
            continue
        }

        foreach line [split $file \n] {
            incr count($line)
        }
    }

    # Report once after all files are counted, matching the Go version.
    foreach {line n} [array get count] {
        if {$n > 1} {
            puts "$n\t$line"
        }
    }
}

main $argv

Benchmarks

To compare these programs, I ran a few rounds of testing: first with just one small file, then with a batch of files, and finally including a 230-megabyte torture test.

First Round

Command               | Mean [ms]  | Min [ms] | Max [ms] | Relative
./dup 0.txt           | 0.6 ± 0.1  | 0.5      | 1.4      | 1.00
tclsh dup.tcl 0.txt   | 2.1 ± 0.2  | 1.8      | 2.8      | 3.36 ± 0.69
sh dup.sh 0.txt       | 3.3 ± 0.3  | 2.8      | 4.7      | 5.32 ± 1.15
python dup.py 0.txt   | 12.1 ± 0.5 | 10.9     | 13.6     | 19.25 ± 3.76

Second Round

Command                                        | Mean [ms]  | Min [ms] | Max [ms] | Relative
./dup 0.txt err 1.txt 2.txt 3.txt 4.txt        | 4.6 ± 0.6  | 3.6      | 7.0      | 1.00
python dup.py 0.txt err 1.txt 2.txt 3.txt 4.txt | 17.8 ± 0.8 | 15.9     | 20.7     | 3.84 ± 0.50
sh dup.sh 0.txt err 1.txt 2.txt 3.txt 4.txt    | 30.1 ± 0.9 | 27.9     | 32.3     | 6.51 ± 0.82
tclsh dup.tcl 0.txt err 1.txt 2.txt 3.txt 4.txt | 34.8 ± 1.4 | 32.0     | 37.9     | 7.53 ± 0.97

Third Round

Command                                                    | Mean [ms]      | Min [ms] | Max [ms] | Relative
./dup 0.txt err 1.txt 2.txt 3.txt 4.txt torture.txt        | 201.8 ± 7.6    | 190.7    | 214.9    | 1.00
python dup.py 0.txt err 1.txt 2.txt 3.txt 4.txt torture.txt | 875.6 ± 28.2   | 848.8    | 940.7    | 4.34 ± 0.21
tclsh dup.tcl 0.txt err 1.txt 2.txt 3.txt 4.txt torture.txt | 1292.4 ± 121.4 | 1165.9   | 1484.7   | 6.40 ± 0.65
sh dup.sh 0.txt err 1.txt 2.txt 3.txt 4.txt torture.txt    | 2212.5 ± 47.7  | 2146.4   | 2293.4   | 10.96 ± 0.47

Although I prefer the Go and Tcl approach above of directly implementing the solution to the problem, I can see the performance advantage Python gains here from leaning on optimized library code such as collections.Counter.

GET request

We do the equivalent of curl for a given URL and print the result.

Go

package main

import (
	"fmt"
	"io/ioutil"
	"net/http"
	"os"
)

func main() {
	for _, url := range os.Args[1:] {
		resp, err := http.Get(url)
		if err != nil {
			fmt.Fprintf(os.Stderr, "fetch: %v\n", err)
			os.Exit(1)
		}
		b, err := ioutil.ReadAll(resp.Body)
		resp.Body.Close()
		if err != nil {
			fmt.Fprintf(os.Stderr, "fetch: reading %s: %v\n", url, err)
			os.Exit(1)
		}
		fmt.Printf("%s", b)
	}
}

Python

#!/usr/bin/env python

import requests
import sys

url = sys.argv[1]

try:
    response = requests.get(url)
except Exception as e:
    print("fetch: ", e)
    sys.exit(1)

print(response.text)

requests here is a third-party package that required installation in a virtual environment to use. I don't know how to catch more granular exceptions in this example.

Tcl

#!/usr/bin/env tclsh

package require http

set url [lindex $argv 0]

proc main {url} {
    try {
        set http [::http::geturl $url]
    } on error {err} {
        puts stderr "fetch: $err"
        exit 1
    }

    try {
        set html [::http::data $http]
    } on error {err} {
        puts stderr "fetch: reading $url: $err"
        exit 1
    }

    ::http::cleanup $http

    puts $html
}

main $url

Benchmarks

The first test is with a local file served with miniserve, to see how fast it is possible to complete the request. The second test is with a remote server, to simulate real-world use.

First Round

Command                           | Mean [ms]  | Min [ms] | Max [ms] | Relative
./fetch http://[::1]:8080         | 1.7 ± 0.2  | 1.3      | 2.6      | 1.00
sh fetch.sh http://[::1]:8080     | 5.1 ± 0.4  | 4.4      | 7.0      | 3.07 ± 0.43
tclsh fetch.tcl http://[::1]:8080 | 10.2 ± 0.4 | 9.5      | 11.9     | 6.16 ± 0.76
python fetch.py http://[::1]:8080 | 89.5 ± 1.0 | 88.0     | 91.5     | 54.03 ± 6.34

Second Round

Command                            | Mean [ms]    | Min [ms] | Max [ms] | Relative
sh fetch.sh http://example.com     | 251.4 ± 22.7 | 235.2    | 289.2    | 1.00
./fetch http://example.com         | 253.5 ± 22.9 | 232.5    | 282.7    | 1.01 ± 0.13
tclsh fetch.tcl http://example.com | 259.2 ± 21.9 | 240.7    | 292.6    | 1.03 ± 0.13
python fetch.py http://example.com | 342.8 ± 21.9 | 319.7    | 370.7    | 1.36 ± 0.15

It's possible I am simply not familiar enough with Python to know how to get that granular exception handling, but I like how Tcl strikes a balance: the code is explicit without being verbose, and I don't have to fuss with third-party libraries.

Parallel GET requests

This time we do multiple parallel requests, and then calculate the size of the response and how long it took to get it in seconds. The output should look like this:

$ ./script http://example.com http://ddg.gg
0.38s     1256  http://example.com
1.08s     5999  http://ddg.gg
1.08s elapsed

Go

package main

import (
    "fmt"
    "io"
    "io/ioutil"
    "net/http"
    "os"
    "time"
)

func main() {
    start := time.Now()
    ch := make(chan string)
    for _, url := range os.Args[1:] {
        go fetch(url, ch) // start a goroutine
    }
    for range os.Args[1:] {
        fmt.Println(<-ch) // receive from channel ch
    }
    fmt.Printf("%.2fs elapsed\n", time.Since(start).Seconds())
}

func fetch(url string, ch chan<- string) {
    start := time.Now()
    resp, err := http.Get(url)
    if err != nil {
        ch <- fmt.Sprint(err) // send to channel ch
        return
    }

    nbytes, err := io.Copy(ioutil.Discard, resp.Body)
    resp.Body.Close() // avoid leaking resources
    if err != nil {
        ch <- fmt.Sprintf("while reading %s: %v", url, err)
        return
    }
    secs := time.Since(start).Seconds()
    ch <- fmt.Sprintf("%.2fs  %7d  %s", secs, nbytes, url)
}

Python

#!/usr/bin/env python

import requests
import sys
import time

from concurrent.futures import ThreadPoolExecutor

def fetch_url(data):
    index, url = data
    try:
        r = requests.get(url, timeout=10)
    except requests.exceptions.ConnectTimeout:
        return

    time_taken = round(time.time()-start, 2)
    print('{}s \t{}\t{}'.format(time_taken, len(r.text), url))

start = time.time()
with ThreadPoolExecutor(max_workers=10) as runner:
    # The with block shuts the pool down when the map completes.
    for _ in runner.map(fetch_url, enumerate(sys.argv[1:])):
        pass

time_taken = round(time.time()-start, 2)
print('{}s elapsed'.format(time_taken))

Tcl

#!/usr/bin/env tclsh

package require http

set done false
set requests 0
set responses 0
set start [clock milliseconds]

proc main {urls} {
    global done start
    
    foreach url $urls {
        fetch $url
    }

    vwait done

    set stop [clock milliseconds]
    puts [format {%0.2fs elapsed} [expr {($stop - $start) / 1000.0}]]
}

proc fetch {url} {
    global requests

    try {
        ::http::geturl $url -command callback
        incr requests
    } on error {err} {
        puts stderr "fetch: $err"
        exit 1
    }
}

proc callback {token} {
    global done requests responses start

    # The request URL is kept in the token's state array.
    upvar #0 $token state

    try {
        set data [::http::data $token]
    } on error {err} {
        puts stderr "fetch: reading $state(url): $err"
        exit 1
    }

    set stop [clock milliseconds]
    set elapsed [format {%0.2fs} [expr {($stop - $start) / 1000.0}]]
    set length [string length $data]

    puts "$elapsed\t$length\t$state(url)"

    ::http::cleanup $token

    incr responses

    if {$requests == $responses} {
        set done true
    }
}

main $argv

Note that in the above code, expr is a command that interprets a domain-specific language for mathematical expressions. Tcl allows you to write your own DSLs as well.
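As a hedged illustration, here is a toy DSL in the same spirit: a spec command (my own invention, not a standard Tcl facility) that interprets a block of key/value pairs, much as expr interprets arithmetic strings:

```tcl
#!/usr/bin/env tclsh

# A braced block is just an unevaluated string, so a proc can
# interpret it however it likes -- here, as key/value pairs.
proc spec {body} {
    set result [dict create]
    foreach {key value} $body {
        dict set result $key $value
    }
    return $result
}

set server [spec {
    host localhost
    port 8080
}]

puts [dict get $server port]   ;# 8080
```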

Benchmark

Command                                                                                                               | Mean [ms]    | Min [ms] | Max [ms] | Relative
tclsh ./fetchall.tcl http://example.com http://github.com http://127.0.0.1:8080 http://127.0.0.1:1222 http://127.0.0.1:1223 | 245.1 ± 29.0 | 220.6    | 336.6    | 1.00
./fetchall http://example.com http://github.com http://127.0.0.1:8080 http://127.0.0.1:1222 http://127.0.0.1:1223     | 253.6 ± 64.4 | 213.3    | 444.4    | 1.03 ± 0.29
python ./fetchall.py http://example.com http://github.com http://127.0.0.1:8080 http://127.0.0.1:1222 http://127.0.0.1:1223 | 348.5 ± 27.0 | 321.9    | 400.8    | 1.42 ± 0.20

I was surprised to see Tcl be faster than Go on average. Perhaps this is due to network variance, but my home connection is pretty stable. In any case, all three programs perform acceptably but work differently. Go of course used its version of light-weight processes, goroutines. The Python code was multi-threaded. Tcl supports multiple threads and coroutines, but for this task it seemed best to use the event loop and callbacks. The Tcl program is longer, but the logic of how it works is all on the page—Python again required third-party packages and a virtual environment to get working.
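For readers unfamiliar with Tcl's event loop, the core mechanism the fetchall script relies on can be sketched in a few lines: after schedules a callback, and vwait runs the event loop until a variable is written.

```tcl
#!/usr/bin/env tclsh

# [after ms script] queues a callback; [vwait var] enters the event
# loop and returns once var is set from inside it.
set done 0

after 100 {
    puts "timer fired"
    set done 1
}

puts "waiting..."
vwait done
puts "finished"
```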

Conclusion

Hopefully the above examples give a sense of Tcl. In general, I think the sales pitch for Tcl is that it is simple, fast, and expressive. Tcl has also been extended for automating interactions with Expect and for writing cross-platform GUI applications with Tk.
