The journey continues: in this session we are going to move the famous TDFS (Tableau Distributed File System) with its Zookeeper service to one of our favorite operating systems: Linux. The goal is the same: run each and every Tableau Server process on Linux without the need for the Windows OS. And a short disclaimer: this is 100% unsupported by Tableau, and you need valid licenses for your Linux box, otherwise you are going to violate their EULA.
Previously on “Tableau Server on Linux – Part 1 – Data Engine”
And today we are not just going to install these two services on Linux. No, we'll do a lot more! We'll start transforming our single-node Tableau Server into a cluster without even touching the GUI.
TDFS – or as Tableau calls it, the File Store service – is installed along with the Data Engine and controls the storage of extracts. In highly available environments, the File Store ensures that extracts are synchronized to other File Store nodes, so they remain available if one File Store node stops running. How does it work in practice? If you refresh a data source, then:
- Backgrounder receives an extract refresh task
- Backgrounder gets the data and passes it to the tdeserver process
- tdeserver writes the new local tde file with its new unique name
- Backgrounder connects to the File Store service and reports the new file
- File Store puts the file into TDFS
- The TDFS implementation ensures that the file is replicated to all nodes. Node configurations are stored in Zookeeper under the /tdfs directory.
In order to use our tdeserver without needing to copy files between Tableau Server and our Linux hosts, we need Zookeeper and TDFS – that's it. So, let's configure them.
First of all, what is Zookeeper? According to Zookeeper's website, Zookeeper is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services. Tableau uses Zookeeper to store cluster information and to check who is doing what and who is available. Most of these functionalities are implemented in the Coordination Service. But enough of the theory, let's jump into practice!
Tableau Server 9.0 uses Zookeeper 3.4.6, which is the latest stable release. Zookeeper is written purely in Java, thus the binaries should work on all platforms where Java is supported. You can download this version from any Apache mirror.
$ wget http://www.us.apache.org/dist/zookeeper/zookeeper-3.4.6/zookeeper-3.4.6.tar.gz
[..]
$ tar xvzf zookeeper-3.4.6.tar.gz
Installation is done; the Zookeeper distribution is ready to serve in your zookeeper-3.4.6 folder. To have everything up and running we need a Tableau-compatible configuration. The configuration file should be named zoo.cfg and placed in the conf folder:
$ cat zookeeper-3.4.6/conf/zoo.cfg
tickTime=2000
initLimit=30
syncLimit=2
snapCount=100000
dataDir=/home/ec2-user/zookeeper-data
clientPort=12000
maxClientCnxns=0
quorumListenOnAllIPs=true
server.1=54.203.245.18:13000:14000
server.2=54.212.254.40:13000:14000
A couple of things: server.1 should be our original Windows Tableau Server, while server.2 is the Linux one. The dataDir should point to Zookeeper's local copy of its data, thus you must create it with mkdir ~/zookeeper-data. Also, you should create a file called myid inside dataDir to tell Zookeeper the node's id:
echo 2 > ~/zookeeper-data/myid
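Putting the data directory preparation together in one place, a minimal sketch (the home directory path is the one from our zoo.cfg above):

```shell
# Prepare this node's Zookeeper data directory. The path must match
# dataDir in zoo.cfg, and the id must match this host's server.N line.
mkdir -p ~/zookeeper-data
echo 2 > ~/zookeeper-data/myid

# Sanity check: the id Zookeeper will read at startup
cat ~/zookeeper-data/myid    # prints: 2
```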
Good. Now switch to the Windows box and add the server.2 line to Tableau Server's zoo.cfg, located in %PROGRAMDATA%\Tableau\Tableau Server\data\tabsvc\config\zookeeper\zoo.cfg. That's it. Restart Tableau Server, then start our own Linux Zookeeper instance with:
$ ./bin/zkServer.sh start
JMX enabled by default
Using config: /home/ec2-user/zookeeper-3.4.6/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
You can quickly check zookeeper.out to see that everything is okay.
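Besides tailing zookeeper.out, Zookeeper's built-in four-letter admin commands give a quick health check: a healthy server answers ruok with imok. A small sketch, assuming nc is installed; the clientPort 12000 comes from our zoo.cfg, and the function name is mine:

```shell
# Ask Zookeeper "are you ok?" over its client port; returns success
# only when the server replies "imok". Host/port default to this node.
zk_ok() {
  reply=$(echo ruok | nc -w 2 "${1:-127.0.0.1}" "${2:-12000}")
  [ "$reply" = "imok" ]
}

# Usage: zk_ok && echo "zookeeper is healthy"
```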
We built a Zookeeper cluster and joined it to our Tableau Server. But what's inside? Well, let's have a look:
$ bin/zkCli.sh -server 127.0.0.1:12000
[zk: 127.0.0.1:12000(CONNECTED) 0] ls /
[configs, tdfs, zookeeper, clusterstate.json, aliases.json, clustercontroller, live_nodes, postgres, overseer, collections, overseer_elect]
Nice, it seems we can access everything locally from Linux. Or maybe not:
[zk: 127.0.0.1:12000(CONNECTED) 1] ls /tdfs
Authentication is not valid : /tdfs
Time to authenticate ourselves. You can get the password from %PROGRAMDATA%\Tableau\Tableau Server\data\tabsvc\config\filestores.properties:
filestore.zookeeper.username=fszkuser
filestore.zookeeper.password=95d2cb4f8464d1560db0f8276b59e4bfe2e6ad5d
Now, let's authenticate and retry reading the /tdfs directory:
[zk: 127.0.0.1:12000(CONNECTED) 2] addauth digest fszkuser:95d2cb4f8464d1560db0f8276b59e4bfe2e6ad5d
[zk: 127.0.0.1:12000(CONNECTED) 4] ls /tdfs
[hostslock, totransferperhost, status, clock, totransferperfolder, hosts, transferring]
Everything as expected. Zookeeper: job done.
And now, something different. Until now, we dealt only with ready-to-use services. Now, let's move something really Tableau-specific. We should start moving the Tableau Java packages (jars) to our Linux box. Here is what and how:
- create a new folder called tableau-apps. This is where the code will go
- create a folder tableau-apps/bin. Copy all jar files from Tableau Server's bin/ folder recursively. If you are doin' it right you should have repo-jars and repo-migrate-jars subfolders with jar files as well. You do not need everything now, but this is only part two – and we will move all services, not just TDFS!
- create a new folder tableau-apps/lib. Just like in the case of bin, copy all jar files from Tableau Server's lib/ folder. Here you don't need recursion; the first level is enough.
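The three copy steps above can be sketched as a small shell function. The source path and the function name are my assumptions – substitute wherever you staged the Windows Tableau Server files on the Linux box (GNU cp is assumed for --parents):

```shell
# copy_tableau_jars SRC DST
#   SRC: staged copy of Tableau Server's install folder (with bin/ and lib/)
#   DST: the tableau-apps folder to create
copy_tableau_jars() {
  src=$1
  dst=$(mkdir -p "$2/bin" "$2/lib" && cd "$2" && pwd)  # absolute path to DST
  # bin/: recursive copy, preserving repo-jars and repo-migrate-jars subfolders
  (cd "$src/bin" && find . -name '*.jar' -exec cp --parents {} "$dst/bin/" \;)
  # lib/: first level only, no recursion needed
  cp "$src"/lib/*.jar "$dst/lib/"
}

# Usage: copy_tableau_jars /mnt/tableau-server tableau-apps
```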
That's it, binaries are done. How about configuration?
Create a new folder filestore and create the following three files inside it:
log4j.xml – to see what is going on:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE log4j:configuration PUBLIC "-//LOGGER" "log4j.dtd">
<log4j:configuration xmlns:log4j="http://jakarta.apache.org/log4j/">
<!-- Appenders -->
<appender name="file" class="org.apache.log4j.DailyRollingFileAppender">
<param name="File" value="/home/ec2-user/filestore/filestore.log" />
<param name="DatePattern" value="'.'yyyy-MM-dd" />
<param name="encoding" value="UTF-8" />
<layout class="org.apache.log4j.EnhancedPatternLayout">
<param name="ConversionPattern" value="%d{yyyy-MM-dd HH:mm:ss.SSS Z}{UTC} %t %X{siteName} %X{userName} %-5p %X{requestId}: %c - %m%n" />
</layout>
</appender>
<!-- 3rdparty Loggers -->
<logger name="org.apache">
<level value="warn" />
</logger>
<!-- Root Logger -->
<root>
<priority value="info" />
<appender-ref ref="file" />
</root>
</log4j:configuration>
connections.properties – this is required to know where to connect:
# 54.203.245.18 - this is our windows box
#Thu May 28 07:12:36 UTC 2015
pgsql.host=54.203.245.18
jdbc.url=jdbc\:postgresql\://54.203.245.18\:8060/workgroup
primary.host=54.203.245.18
pgsql.port=8060
primary.port=8060
And finally the filestore.properties:
coordinationservice.hosts=localhost:12000
coordinationservice.operationretrylimit=5
coordinationservice.operationretrydelay=5000
coordinationservice.operationtimeout=30000
coordinationservice.sessiontimeout=60000
filestore.zookeeper.username=fszkuser
filestore.zookeeper.password=95d2cb4f8464d1560db0f8276b59e4bfe2e6ad5d
filestore.maxmutexretries=5
filestore.hostname=54.212.254.40
filestore.maxentriesinfilestofetch=4
filestore.root=/home/ec2-user/dataengine
filestore.port=9345
filestore.status.port=9346
filestore.transferreportintervalms=30000
filestore.reapholdoffms=7500000
filestore.inusereapholdoffms=86400000
filestore.filetypes=extract
filestore.allfileprocessingholdoffms=300000
filestore.somefileprocessingholdoffms=300000
filestore.reapfailedtransfersholdoffms=3600000
filestore_stale_folder_reap.delay_s=3600
filestore_zookeeper_cleaner.delay_s=60
filestore_missing_folder_fetch.delay_s=60
filestore_scheduled_folder_fetch.delay_s=60
filestore_scheduled_internal_folder_fetch.delay_s=60
filestore_failed_transfers_reap.frequency_s=86400
filestore.maxservertimeoffsetms=900000
worker.hosts=54.203.245.18,54.212.254.40
The Windows server is still 54.203.245.18 while 54.212.254.40 is the Linux node. The filestore.root directory should point to our Data Engine directory (which was created in part 1). And don't forget to change the fszkuser user's password to your own. The Linux part is done; switch to Windows.
In addition to Zookeeper authentication, TDFS blocks all connections that aren't coming from worker nodes. Thus, we should add this node as a worker in the following files:
filestore.properties
connections.properties
connections.yaml
backgrounder.properties
clustercontroller.properties
dataengine/tdeserver_standalone0.yml
Practically you must:
- search and replace the localhost string with the external IP of the server in all the files listed above
- change worker.hosts to worker.hosts=windows_ip,linux_ip in filestore.properties and tdeserver_standalone0.yml, due to whitelisting
You can find these files in %PROGRAMDATA%\Tableau\Tableau Server\data\tabsvc\config.
Config done, let's start TDFS:
$ java -Dconnections.properties=file:///connections.properties -Dconfig.properties=file:///$PWD/filestore.properties -cp ".:../tableau-apps/bin/app-tdfs-filestore-latest-jar.jar:../tableau-apps/bin/repo-jars/*:../tableau-apps/lib/*" com.tableausoftware.tdfs.filestore.app.Main
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/ec2-user/tableau-apps/bin/repo-jars/slf4j-log4j12-1.7.6.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/ec2-user/tableau-apps/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/ec2-user/tableau-apps/lib/slf4j-log4j12-1.7.6.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Now, you should see some nice log messages in filestore/filestore.log:
2015-08-04 12:49:24.709 +0000 Thread-2 INFO : com.tableausoftware.tdfs.filestore.status.StatusService - Starting Status Service on port 9346
2015-08-04 12:49:24.729 +0000 main INFO : com.tableausoftware.tdfs.filestore.app.Main - FileStore Server started
2015-08-04 12:49:24.731 +0000 main INFO : com.tableausoftware.tdfs.filestore.controller.ControllerService - Registering filestore node with zookeeper...
2015-08-04 12:49:24.841 +0000 main INFO : com.tableausoftware.tdfs.filestore.controller.ControllerService - Registered filestore node with zookeeper.
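Another quick sanity check from the shell is to look for the two listen ports (9345 and 9346, as set in filestore.properties above). A minimal sketch, assuming iproute2's ss is available; the function name is mine:

```shell
# Print listening sockets on the File Store ports configured above.
# Exits nonzero when neither port is listening.
filestore_listening() {
  ss -ltn | grep -E ':(9345|9346)( |$)'
}

# Usage: filestore_listening && echo "File Store is up"
```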
If you are still with me, then you have just accomplished part 2: you have your TDFS and Zookeeper on your Linux node, in cluster mode.
A typical test case would be an extract refresh. After the refresh completes, we should see the generated TDE file both on Windows and Linux.
(Screenshot: extract refresh)
Now in the backgrounder.log we can see that it was able to communicate with TDFS:
2015-08-04 13:00:38.547 +0000 (Default,,,) pool-2-thread-1 : INFO com.tableausoftware.tdfs.common.ExtractsListHelper - Wrote extracts to file C:\ProgramData\Tableau\Tableau Server\data\tabsvc\temp\allValidFolderIds651283455269617809\allValidFolderIds1576681360460061073.tmp
2015-08-04 13:00:38.562 +0000 (Default,,,) pool-2-thread-1 : INFO com.tableausoftware.model.workgroup.service.FileStoreService - Uploaded allValidFolderIds file to File Store on host 54.203.245.18
2015-08-04 13:00:38.578 +0000 (,,,) backgroundJobRunnerScheduler-1 : INFO com.tableausoftware.backgrounder.runner.BackgroundJobRunner - Job finished: SUCCESS; name: List Extracts for TDFS Reaping; type :list_extracts_for_tdfs_reaping; notes: null; total time: 1 sec; run time: 0 sec
2015-08-04 13:00:38.578 +0000 (,,,) backgroundJobRunnerScheduler-1 : INFO com.tableausoftware.backgrounder.runner.BackgroundJobRunner - Running job of type :list_extracts_for_tdfs_propagation; no timeout; priority: 10; id: 19339; args: []
2015-08-04 13:00:38.594 +0000 (Default,,,) pool-2-thread-1 : INFO com.tableausoftware.model.workgroup.workers.ListExtractsForTDFSPropagationWorker - Deleted 0 extract_sessions created prior to last DB start time
2015-08-04 13:00:38.609 +0000 (Default,,,) pool-2-thread-1 : INFO com.tableausoftware.model.workgroup.workers.ListExtractsForTDFSPropagationWorker - done fetching orphans
2015-08-04 13:00:38.609 +0000 (Default,,,) pool-2-thread-1 : INFO com.tableausoftware.model.workgroup.workers.ListExtractsForTDFSPropagationWorker - Found 4 recent valid extract records
On Windows:
c:\ProgramData\Tableau\Tableau Server\data\tabsvc\dataengine\extract>dir "a7\c0\{BE3565D0-4390-48E8-89D8-5A254A8FC675}\comments.tde"
08/04/2015 01:00 PM 49,034 comments.tde
On Linux:
$ find dataengine/ -exec ls -l {} \; | grep -i aug | grep com
-rw-rw-r-- 1 ec2-user ec2-user 49034 Aug 4 13:01 dataengine/extract/a7/c0/{BE3565D0-4390-48E8-89D8-5A254A8FC675}/comments.tde
Hurray, our file was replicated successfully in our newly built cluster. This is the end, the happy end.
If you have questions or comments, just let me know – and stay tuned to learn about more services running on Linux.