@rbuckland, last active December 2, 2020
Detailed notes for Installing mr4c on RedHat EL6 and Ubuntu 14.04

Installing mr4c - MapReduce for C/C++

https://github.com/google/mr4c http://google-opensource.blogspot.com.au/2015/02/mapreduce-for-c-run-native-code-in.html

General Notes

mr4c and its dependencies must be installed on every node that runs your algorithms. This goes without saying, but if you see exceptions like the one below, that is exactly what has happened.

Caused by: java.lang.UnsatisfiedLinkError: Unable to load library '/data/nvme/yarn/nm/usercache/genuser/filecache/10/libmr4c.so': liblog4cxx.so.10: cannot open shared object file: No such file or directory
	at com.sun.jna.NativeLibrary.loadLibrary(NativeLibrary.java:169)
	at com.sun.jna.NativeLibrary.getInstance(NativeLibrary.java:242)
	at com.google.mr4c.nativec.jna.lib.Mr4cLibrary.<clinit>(Mr4cLibrary.java:21)
	at com.google.mr4c.nativec.jna.JnaExternalEntry.<clinit>(JnaExternalEntry.java:41)
	at com.google.mr4c.nativec.jna.JnaExternalFactory.newEntry(JnaExternalFactory.java:48)
	at com.google.mr4c.nativec.NativeAlgorithm.<init>(NativeAlgorithm.java:56)
	at com.google.mr4c.nativec.jna.JnaNativeAlgorithm.<init>(JnaNativeAlgorithm.java:52)
	at com.google.mr4c.algorithm.Algorithms$NativeFactory.create(Algorithms.java:138)
	at com.google.mr4c.algorithm.Algorithms.getAlgorithm(Algorithms.java:99)
	at com.google.mr4c.sources.ConfiguredExecutionSource.getAlgorithm(ConfiguredExecutionSource.java:112)
	at com.google.mr4c.hadoop.HadoopMapper.configure(HadoopMapper.java:52)
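When that exception appears, a quick `ldd` on the node shows exactly which shared objects are missing. A minimal sketch (`check_lib` is a hypothetical helper; on a real node you would point it at the cached libmr4c.so path from the error, not /bin/sh):

```shell
#!/bin/sh
# Print any shared-object dependencies of a library that do NOT resolve
# on this node; no output means the node has everything the library needs.
check_lib() {
    ldd "$1" 2>/dev/null | grep 'not found'
}

# Example run against a binary every node has; a real check would target
# the cached libmr4c.so path shown in the exception above.
check_lib /bin/sh || echo "all dependencies of /bin/sh resolve"
```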

Ubuntu 14.04 Install Notes

Ubuntu 14.04 is pretty close to having everything we need.

Pre-setup details

  • download Apache ANT - unpack to /opt
  • create symlink for /opt/apache-ant -> /opt/apache-ant-version
  • download Apache Ivy (deps) and place ivy*.jar and lib/*.jar into ANT_HOME/lib
  • download and install OpenJDK - create the /opt/jdk symlink and add /opt/apache-ant/bin and /opt/jdk/bin to the PATH

An example /opt layout:

rbuckland@host2:~$ ls -al /opt
total 20
drwxr-xr-x  5 root         root         4096 Sep  9 15:49 .
drwxr-xr-x 24 root         root         4096 Sep  9 21:20 ..
lrwxrwxrwx  1 root         root           17 Sep  9 15:49 apache-ant -> apache-ant-1.9.6/
drwxr-xr-x  6 root         root         4096 Sep  9 15:28 apache-ant-1.9.6
drwxr-xr-x  6 cloudera-scm cloudera-scm 4096 Sep  8 14:34 cloudera
lrwxrwxrwx  1 root         root           11 Sep  9 15:49 jdk -> jdk1.8.0_60
drwxr-xr-x  8 root         root         4096 Sep  9 15:47 jdk1.8.0_60
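Recreating that layout can be scripted; here is a sketch done under a scratch directory (on a real host you would work under /opt with sudo, and the version numbers are just the examples shown above):

```shell
#!/bin/sh
# Recreate the /opt layout in a scratch dir: versioned installs plus
# stable "apache-ant" and "jdk" symlinks that the PATH entries point at.
base=/tmp/mr4c_opt_demo
rm -rf "$base"
mkdir -p "$base/apache-ant-1.9.6/bin" "$base/jdk1.8.0_60/bin"
ln -s apache-ant-1.9.6 "$base/apache-ant"
ln -s jdk1.8.0_60 "$base/jdk"
ls -l "$base"
```

Upgrading ant or the JDK later is then a matter of unpacking the new version and re-pointing the symlink; PATH and JAVA_HOME never change.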

A common setup used

rbuckland@host2:~$ cat /etc/environment
PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/opt/apache-ant/bin:/opt/jdk/bin"
JAVA_HOME=/opt/jdk

Install the basic Requirements for mr4c

Install the prerequisites for mr4c:

sudo apt-get install libgdal1-dev libgdal1h subversion \
build-essential autoconf libproj-dev libproj0 \
libcppunit-dev libcppunit-1.13-0 libjansson-dev libjansson4 \
git

Install log4cxx (need to compile a new version)

svn checkout http://svn.apache.org/repos/asf/incubator/log4cxx/trunk apache-log4cxx

First, the Apache Portable Runtime and its utilities:

sudo apt-get install libaprutil1 libaprutil1-dev libapr1-dev libapr1

log4cxx - The packaged version in Ubuntu is out of date (trunk has fixes), so it needs to be built from source:

  1. ./autogen.sh
  2. ./configure
  3. make
  4. sudo make install
  5. sudo ldconfig # update cache of the new lib installed

Install Apache Ant

mr4c needs ant at the very end of the build to package and deploy all the jars.

Install Apache Ivy into ant (lib/*.jars and deps)

  • have ant in the PATH
  • have Ivy installed in Ant

Finally, clone and build mr4c

export CPLUS_INCLUDE_PATH=/usr/include/gdal
export C_INCLUDE_PATH=/usr/include/gdal
./build_all

Red Hat EL6 Install Notes

We need to install a few dependencies in order to compile mr4c. You will need the RedHat Optional repo; some dependencies further down require it. https://github.com/google/mr4c#dependencies

Download the src for mr4c

git clone https://github.com/google/mr4c.git

Installing log4cxx

RPMS for EL6 (trunk version 11)

I have already done this and packaged it up as an RPM. You will want the RPM because you need to distribute it across all the data nodes in the cluster. The details on how it is built are below.

Here are the (S)RPMs

and the other bits

Building it on RedHat EL6

Download it first using subversion

svn checkout http://svn.apache.org/repos/asf/incubator/log4cxx/trunk apache-log4cxx

Install Apache Runtime

sudo yum install apr apr-util apr-devel apr-util-devel

EL6 - the default BinUtils has a bug

Before we begin building log4cxx, we need to upgrade binutils. If you raced ahead, the symptom is that make fails; there are also some odd errors when running autogen.sh.

When running make for log4cxx, we get the error:

/tmp/cc1ghG2e.s: Assembler messages:
/tmp/cc1ghG2e.s:1088: Error: expecting string instruction after `rep'

This is documented as a bug at https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57017 and noted as well in other projects (mxe/mxe#404).

The solution is to upgrade to binutils 2.23.52.0.1, which resolves the ld linker bug.

RHEL6 has made the upgrade package available as a separate dependency.

https://rhn.redhat.com/errata/RHBA-2014-0270.html

sudo yum install devtoolset-2-binutils

log4cxx is not available on RedHat, so it needs to be compiled and installed.

Make sure you have installed the devtoolset-2-binutils above first. As on Ubuntu, the packaged log4cxx is out of date (trunk has fixes), so it needs to be built from source.

Building it on RedHat has proven trickier - a bit messy.

svn checkout http://svn.apache.org/repos/asf/incubator/log4cxx/trunk apache-log4cxx
cd apache-log4cxx
./autogen.sh # expect some strange errors here
./autogen.sh # the first run errors; for some reason the second run works (pass :-) )
./configure
make
sudo make install
echo /usr/local/lib | sudo tee /etc/ld.so.conf.d/usrlocal.conf # sudo must apply to the write, not just the echo
sudo ldconfig # update cache of the new lib installed

mr4c other Deps - (simpler)

Next, tackle the extra packages we need that are not in the default RHEL repos. EPEL (Extra Packages for Enterprise Linux) gives us what we need.

sudo rpm -Uvh https://dl.fedoraproject.org/pub/epel/epel-release-latest-6.noarch.rpm

We also need the ELGIS libraries; the ELGIS repo depends on EPEL too, so add it in:

sudo rpm -Uvh http://elgis.argeo.org/repos/6/elgis-release-6-6_0.noarch.rpm

armadillo

It seems that gdal needs an old version of armadillo installed, which in turn requires some other (in-repo) dependencies: lapack and blas (both linear-algebra libraries), plus cblas and clapack, provided by atlas.

So first install these

sudo yum -y install blas lapack atlas

And then armadillo (v3.800.2-1), which can be found here:

sudo rpm -Uvh http://proj.badc.rl.ac.uk/cedaservices/raw-attachment/ticket/670/armadillo-3.800.2-1.el6.x86_64.rpm

gdal

sudo yum install gdal gdal-devel

# these two lines are required each time you set up a new shell;
# the build (below) will fail if the environment is not set up correctly.
export CPLUS_INCLUDE_PATH=/usr/include/gdal
export C_INCLUDE_PATH=/usr/include/gdal

cppunit

sudo yum install cppunit cppunit-devel

jansson

sudo yum install jansson jansson-devel

PROJ.4

sudo yum install proj proj-devel

apache ant (the RedHat way)

sudo yum install apache-ant apache-ivy ant-junit

Finally, compile and install mr4c

./build_all
sudo ./deploy_all

Semi Auto Deploy mr4c Dependencies Across Data Nodes

mr4c requires that any program's library dependencies are also installed on every node in the cluster. There is a basic minimum set required by mr4c, and the following script automates that deployment for you.

  1. Download the prebuilt log4cxx RPM: log4cxx-0.11.0.trunk.20150916-19.el6.x86_64.rpm

  2. Save this script into the same directory on your edge node (or any workstation with access to the cluster)

  3. Run mr4c_node_prepartion.sh. This will create a .tar.gz of the required dependencies, plus two other shell scripts.

  4. Run mr4c_node_deploy_runner.sh <dn_hostname>

    Example

    cat hostnames.txt | xargs -L1 -Ixx sh mr4c_node_deploy_runner.sh xx

This will (as root) copy the tar.gz and the simple installer across to each node in hostnames.txt
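The per-host fan-out used above can be dry-run safely by substituting echo for the real deploy step (the hostnames below are made up for the demonstration; note that -I already implies one invocation per input line):

```shell
#!/bin/sh
# Dry-run of the fan-out: xargs feeds each hostname into the template,
# one invocation per line, with echo standing in for the deploy script.
printf 'dn1\ndn2\ndn3\n' > /tmp/hostnames_demo.txt
cat /tmp/hostnames_demo.txt \
    | xargs -L1 -Ixx echo "would run: sh mr4c_node_deploy_runner.sh xx"
```

Once the output looks right, swap echo back out for the real `sh mr4c_node_deploy_runner.sh xx`.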

RedHat EL6 Binary Installation

sudo yum install git
git clone https://github.com/google/mr4c.git
sudo yum install apr apr-util 
mkdir mr4c_deps
wget https://www.dropbox.com/s/o7t0jsqn4ejv8eb/log4cxx-0.11.0.trunk.20150916-19.el6.x86_64.rpm?dl=0
sudo yum localinstall log4cxx-0.11.0.trunk.20150916-19.el6.x86_64.rpm*
sudo rpm -Uvh https://dl.fedoraproject.org/pub/epel/epel-release-latest-6.noarch.rpm
sudo rpm -Uvh http://elgis.argeo.org/repos/6/elgis-release-6-6_0.noarch.rpm
sudo yum -y install blas lapack atlas
sudo rpm -Uvh http://proj.badc.rl.ac.uk/cedaservices/raw-attachment/ticket/670/armadillo-3.800.2-1.el6.x86_64.rpm
sudo yum install gdal gdal-devel
sudo yum install cppunit cppunit-devel
sudo yum install jansson jansson-devel
sudo yum install apache-ant apache-ivy
sudo yum install proj proj-devel
sudo yum install cmake gcc gcc-c++ 
sudo yum localinstall log4cxx-devel-0.11.0.trunk.20150916-19.el6.x86_64.rpm

Other Random Issues

You will need to upgrade gcc to > 4.6; RHEL 6 ships 4.4 by default. https://groups.google.com/forum/#!msg/mr4c/uJf1en6iCTU/7PuI7_aDuUMJ

This, of course, needs to come from devtoolset-2.

Install the CentOS repo package and modify the repo URL:

sudo wget http://people.centos.org/tru/devtools-2/devtools-2.repo -O /etc/yum.repos.d/devtools-2.repo
sudo vi /etc/yum.repos.d/devtools-2.repo
# and change the baseurl=http://people.centos.org/tru/devtools-2/$releasever/$basearch/RPMS
# to baseurl=http://people.centos.org/tru/devtools-2/6/$basearch/RPMS

Then install devtoolset-2-gcc devtoolset-2-gcc-c++ devtoolset-2-binutils, and set the PATH and environment to the tools:

. /opt/rh/devtoolset-2/enable
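The manual vi edit can also be scripted with sed. The sketch below exercises it against a scratch copy of the repo file (the real path is /etc/yum.repos.d/devtools-2.repo):

```shell
#!/bin/sh
# Pin the devtools-2 repo to the EL6 path instead of $releasever,
# which RHEL 6 expands to a value the repo does not serve.
repo=/tmp/devtools-2.repo.demo
cat > "$repo" <<'EOF'
baseurl=http://people.centos.org/tru/devtools-2/$releasever/$basearch/RPMS
EOF
sed -i 's|devtools-2/\$releasever/|devtools-2/6/|' "$repo"
cat "$repo"
```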

When compiling you may get this error

In file included from ./src/cpp/api/gdal/gdal_api.h:20:0,
                 from src/cpp/impl/gdal/GDALLocalFile.cpp:21:
./src/cpp/api/gdal/GDALCoordTrans.h:20:28: fatal error: ogr_spatialref.h: No such file or directory
 #include "ogr_spatialref.h"
                            ^
compilation terminated.
make: *** [objs/impl/gdal/GDALLocalFile.o] Error 1

This is caused by the gdal headers not being on the include path. The fix:

export CPLUS_INCLUDE_PATH=/usr/include/gdal
export C_INCLUDE_PATH=/usr/include/gdal
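A small guard at the top of a build shell catches the missing exports before the long compile starts; the `${VAR:?message}` expansion aborts the shell with the message when the variable is unset (check_env is a hypothetical helper name):

```shell
#!/bin/sh
# Abort early, with a hint, if the gdal include paths are not exported.
# Run this before ./build_all.
check_env() {
    : "${CPLUS_INCLUDE_PATH:?export CPLUS_INCLUDE_PATH=/usr/include/gdal first}"
    : "${C_INCLUDE_PATH:?export C_INCLUDE_PATH=/usr/include/gdal first}"
}

# Demonstration: the guard fails while unset, succeeds once exported.
( unset CPLUS_INCLUDE_PATH C_INCLUDE_PATH; check_env ) 2>/dev/null \
    || echo "guard caught the missing variables"
export CPLUS_INCLUDE_PATH=/usr/include/gdal C_INCLUDE_PATH=/usr/include/gdal
check_env && echo "environment ok"
```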

When running mr4c ./build_all you may come across this error

src/cpp/impl/gdal/GDALMemoryFile.cpp: In member function ‘virtual void MR4C::GDALMemoryFileImpl::storeContent(const string&, std::shared_ptr<MR4C::DataFileSource>&)’:
src/cpp/impl/gdal/GDALMemoryFile.cpp:69:4: error: ‘VSILFILE’ was not declared in this scope
    VSILFILE* fileHandle = VSIFileFromMemBuffer(
    ^
src/cpp/impl/gdal/GDALMemoryFile.cpp:69:14: error: ‘fileHandle’ was not declared in this scope
    VSILFILE* fileHandle = VSIFileFromMemBuffer(
              ^
src/cpp/impl/gdal/GDALMemoryFile.cpp: In destructor ‘MR4C::GDALMemoryFile::~GDALMemoryFile()’:
src/cpp/impl/gdal/GDALMemoryFile.cpp:144:9: warning: deleting object of polymorphic class type ‘MR4C::GDALMemoryFileImpl’ which has non-virtual destructor might cause undefined behaviour [-Wdelete-non-virtual-dtor]
  delete m_impl;

Your gdal is the wrong version; it needs to be > 1.10 according to the mr4c dependencies list.

[ec2-user@cloudera1 mr4c]$ sudo yum info gdal
Loaded plugins: amazon-id, rhui-lb, security
Installed Packages
Name        : gdal
Arch        : x86_64
Version     : 1.7.3

What I have found is that it compiles with 1.9.x, which is in the EPEL repos for RedHat. So it is a choice: so far I have not hit issues with 1.9 versus the required 1.10, but then we are not using gdal directly. I suggest you upgrade to 1.10 regardless.

The fast cheat path for compilation is:

sudo yum upgrade gdal gdal-devel

The node-preparation script from the semi-auto deploy steps above (mr4c_node_prepartion.sh):

#!/bin/sh
#
# RHEL 6
# ramon@thebuckland.com
# 16 Sep 2015
#
# this was the first cluster - all devel files are not needed ..
# see below for a leaner operation
#
if [ ! -d mr4c_node_deploy ]; then
    mkdir mr4c_node_deploy
    cd mr4c_node_deploy
    if [ ! -f ../log4cxx-0.11.0.trunk.20150916-19.el6.x86_64.rpm ]; then
        echo "STOPPING: You need a prebuilt log4cxx RPM. Download from https://www.dropbox.com/s/o7t0jsqn4ejv8eb/log4cxx-0.11.0.trunk.20150916-19.el6.x86_64.rpm?dl=0"
        exit 1
    else
        cp ../log4cxx-0.11.0.trunk.20150916-19.el6.x86_64.rpm .
    fi
    wget https://dl.fedoraproject.org/pub/epel/epel-release-latest-6.noarch.rpm
    wget http://elgis.argeo.org/repos/6/elgis-release-6-6_0.noarch.rpm
    wget http://proj.badc.rl.ac.uk/cedaservices/raw-attachment/ticket/670/armadillo-3.800.2-1.el6.x86_64.rpm
    cd ..
fi
rm -f mr4c_node_deploy.tar.gz
tar cvfz mr4c_node_deploy.tar.gz mr4c_node_deploy/
#
# Create a runner script that installs all the MR4C dependencies on a node
#
cat << EOF > mr4c_build_on_node.sh
#!/bin/sh
tar xvfz mr4c_node_deploy.tar.gz
cd mr4c_node_deploy
# Repos and the one odd one out
rpm -Uvh epel-release-latest-6.noarch.rpm
rpm -Uvh elgis-release-6-6_0.noarch.rpm
# needed for armadillo
yum -y install blas lapack atlas
rpm -Uvh armadillo-3.800.2-1.el6.x86_64.rpm
rpm -Uvh log4cxx-0.11.0.trunk.20150916-19.el6.x86_64.rpm
# RPMs
yum -y install apr apr-util apr-devel apr-util-devel
yum -y install devtoolset-2-binutils
# the devel libs are not required on the run nodes
yum -y install gdal # gdal-devel
yum -y install cppunit # cppunit-devel
yum -y install jansson #jansson-devel
yum -y install proj #proj-devel
echo /usr/local/lib > /etc/ld.so.conf.d/usrlocal.conf
ldconfig
EOF
cat << 'EOOF' > mr4c_node_deploy_runner.sh
#!/bin/sh
HOST=$1
echo "::: Deploying mr4c dependencies onto $HOST"
scp mr4c_node_deploy.tar.gz mr4c_build_on_node.sh root@$HOST:~
ssh root@$HOST 'sh mr4c_build_on_node.sh'
EOOF

Deploying jobs onto your cluster requires that each execution node has the libraries mr4c needs in order for your job to execute. (See "8. Dependent Native Libraries.md" below for more on why and what.)

Through trial and error, it has been found that you can't easily include these native libraries as dependencies in the algorithm configuration; instead you have to install them on each node in your cluster.

So without further delay, here is the full list

  • log4cxx - use the RPM I built above
  • jansson
  • gdal
  • proj4
  • armadillo - gdal needs an older version of it; the install is manual, and it needs lapack, atlas and blas

This is what I ran on every data node in the cluster to make it 'work' (the tests and examples)

    curl 'https://www.dropbox.com/s/9una8sm9en9sjv0/log4cxx-devel-0.11.0.trunk.20150916-19.el6.x86_64.rpm?dl=0' -o /tmp/log4cxx-devel-0.11.0.trunk.20150916-19.el6.x86_64.rpm
    sudo yum -y localinstall /tmp/log4cxx-0.11.0.trunk.20150916-19.el6.x86_64.rpm
    sudo yum -y localinstall https://dl.fedoraproject.org/pub/epel/epel-release-latest-6.noarch.rpm
    sudo yum -y localinstall http://elgis.argeo.org/repos/6/elgis-release-6-6_0.noarch.rpm
    sudo yum -y localinstall http://proj.badc.rl.ac.uk/cedaservices/raw-attachment/ticket/670/armadillo-3.800.2-1.el6.x86_64.rpm
    sudo yum -y install jansson gdal proj4 blas lapack atlas
    sudo yum -y upgrade gdal

Remote deployment

Remote deployment of dependencies onto / across your cluster is best handled with a private key (pem) and sudo from a user. The default settings in RedHat prohibit sudo over a non-tty, so you will need to disable that.

@see http://unix.stackexchange.com/questions/122616/why-do-i-need-a-tty-to-run-sudo-if-i-can-sudo-without-a-password

vi /etc/sudoers and change the following (comment it out)
...
Defaults    requiretty  <-- comment out this line

Once you have that in place you can do things like

$ cat installs
sudo yum -y install jansson
$ cat clusterhosts |  xargs -L1 -IYY ssh -i mysecret.pem user@YY  `cat installs`

Install Requirements

log4cxx may be (in practice, is) needed on every node, as mr4c does not seem to copy it across.

When I try to run algorithms without it on the node, they fail. When I try to include the library as a native dependency using the JSON file, mr4c seems to locate the library, but then fails to execute.

log4cxx is found in /usr/lib64. My quick way to deploy it across the cluster:

cat cluster-hosts | xargs -L1 -IYY scp -i specialkey.pem ~/Dropbox/web-shared/development/log4cxx/log4cxx-0.11.0.trunk.20150916-19.el6.x86_64.rpm user@YY:/tmp
cat cluster-hosts | xargs -L1 -IYY ssh -i specialkey.pem user@YY sudo yum localinstall /tmp/log4cxx-0.11.0.trunk.20150916-19.el6.x86_64.rpm

The environment needs to be right. This is for RHEL

[user@cloudera1 test]$ cat ../../environment_setup
#!/bin/sh

. /opt/rh/devtoolset-2/enable
export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/usr/lib64
export CPLUS_INCLUDE_PATH=/usr/include/gdal
export C_INCLUDE_PATH=/usr/include/gdal
[user@cloudera1 test]$ . ../../environment_setup

When you launch the mr4c tests (mr4c/test), you may need to edit the run jobs depending on your cluster. Essentially, the tests pass URLs into hadoop for the Name Node, and the remote Data Nodes use these URLs to communicate back to the cluster.

So check mr4c/test/conf/* and mr4c/test/bin/* for the right jobtracker and nameNode host names.

What Libs do I need?

When you run your own algorithm with 3rd-party libraries, it will sometimes fail to load. At first the mechanics seem complex, but after some unpicking the following picture emerges.

  1. mr4c bundles all the dependent native libraries you specify in your job.json file. For example:

     "algoConfig" : { "inline" : { "artifact" : "Map",
     "name" : "map", 
     "type" : "NATIVEC",
     "extras" : ["libarmadillo.so.3","gdal","mr4cgeo"],
    

When mr4c bundles up and executes your job, it will list the various libraries it found. For example:

2015-09-24 08:09:33,676 INFO  mr4c.java.nativec.jna.JnaNativeAlgorithm: jna.platform.library.path=/usr/lib64:/lib64:/usr/lib:/lib
2015-09-24 08:09:33,676 INFO  mr4c.java.nativec.jna.JnaNativeAlgorithm: LD_LIBRARY_PATH=/home/ec2-user/build/mr4c/tutorial/example7_yarn/lib:/usr/local/mr4c/geospatial/dist:/opt/rh/devtoolset-2/root/usr/lib64:/opt/rh/devtoolset-2/root/usr/lib:/usr/lib64:/usr/local/lib
2015-09-24 08:09:33,676 INFO  mr4c.java.nativec.jna.JnaNativeAlgorithm: MR4C native library found at [/usr/local/lib/libmr4c.so]
2015-09-24 08:09:33,676 INFO  mr4c.java.nativec.jna.JnaNativeAlgorithm: Loading native algorithm library [Map]
2015-09-24 08:09:33,676 INFO  mr4c.java.nativec.jna.JnaNativeAlgorithm: Native algorithm library found at [/home/ec2-user/build/mr4c/tutorial/example7_yarn/lib/libMap.so]
2015-09-24 08:09:33,676 INFO  mr4c.java.nativec.jna.JnaNativeAlgorithm: Loading extra native library [freexl]
2015-09-24 08:09:33,676 INFO  mr4c.java.nativec.jna.JnaNativeAlgorithm: Extra native library found at [/usr/lib64/libfreexl.so.1]
2015-09-24 08:09:33,676 INFO  mr4c.java.nativec.jna.JnaNativeAlgorithm: Loading extra native library [libarmadillo.so.3]
2015-09-24 08:09:33,676 INFO  mr4c.java.nativec.jna.JnaNativeAlgorithm: Extra native library found at [/home/ec2-user/build/mr4c/tutorial/example7_yarn/lib/libarmadillo.so.3]
2015-09-24 08:09:33,676 INFO  mr4c.java.nativec.jna.JnaNativeAlgorithm: Loading extra native library [gdal]
2015-09-24 08:09:33,676 INFO  mr4c.java.nativec.jna.JnaNativeAlgorithm: Extra native library found at [/usr/lib64/libgdal.so]
2015-09-24 08:09:33,677 INFO  mr4c.java.nativec.jna.JnaNativeAlgorithm: Loading extra native library [mr4cgeo]
2015-09-24 08:09:33,677 INFO  mr4c.java.nativec.jna.JnaNativeAlgorithm: Extra native library found at [/usr/local/mr4c/geospatial/dist/libmr4cgeo.so]
2015-09-24 08:09:33,677 INFO  mr4c.java.nativec.jna.JnaNativeAlgorithm: End loading native libraries

You can see here which libraries mr4c is bundling: all the ones configured in the .json file. Now if ANY of the native libraries depends on another, then either put that dependency in the list too (.json file) or have it installed on all data nodes.
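Those "found at" lines can be mined mechanically from a captured yarn log; the sketch below reuses two entries from the log shown above:

```shell
#!/bin/sh
# Extract which extra native libraries mr4c actually resolved, from a yarn log.
cat > /tmp/mr4c_load_demo.log <<'EOF'
2015-09-24 08:09:33,676 INFO  mr4c.java.nativec.jna.JnaNativeAlgorithm: Extra native library found at [/usr/lib64/libfreexl.so.1]
2015-09-24 08:09:33,676 INFO  mr4c.java.nativec.jna.JnaNativeAlgorithm: Extra native library found at [/usr/lib64/libgdal.so]
EOF
grep -o 'Extra native library found at \[[^]]*\]' /tmp/mr4c_load_demo.log
```

Comparing that list against the "extras" array in the .json file quickly shows which configured library was not resolved.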

You will know it has failed to load them by looking at the yarn logs -applicationId <application_xxxxxxxxxx_xxxx>.

The first hint that a load failed: a stack trace shows that libgdal failed to load, even though the initial part of the log shows it being loaded in.

            Caused by: java.lang.UnsatisfiedLinkError: Unable to load library 'Map': libgdal.so.1: cannot open shared object file: No such file or directory
            	at com.sun.jna.NativeLibrary.loadLibrary(NativeLibrary.java:169)
            	at com.sun.jna.NativeLibrary.getInstance(NativeLibrary.java:242)
            	at com.sun.jna.NativeLibrary.getInstance(NativeLibrary.java:205)
            	at com.google.mr4c.nativec.jna.JnaUtils.doLoadLibrary(JnaUtils.java:127)
            	at com.google.mr4c.nativec.jna.JnaUtils.loadLibraryWithTiming(JnaUtils.java:118)
            	at com.google.mr4c.nativec.jna.JnaUtils.loadLibrary(JnaUtils.java:102)
            	at com.google.mr4c.nativec.jna.JnaNativeAlgorithm.loadNativeLibraries(JnaNativeAlgorithm.java:63)
            	at com.google.mr4c.nativec.NativeAlgorithm.init(NativeAlgorithm.java:61)
            	at com.google.mr4c.algorithm.Algorithms.getAlgorithm(Algorithms.java:102)
            	at com.google.mr4c.sources.ConfiguredExecutionSource.getAlgorithm(ConfiguredExecutionSource.java:112)
            	at com.google.mr4c.hadoop.HadoopMapper.configure(HadoopMapper.java:52)
            	... 22 more
Look a little further up the yarn logs and you will find that (in this case) libgdal had a dependency on libgeos_c, which was NOT listed in the .json config:

            Loading libgdal.so failed (java.lang.UnsatisfiedLinkError: /opt/data/yarn/nm/usercache/ec2-user/filecache/143/libgdal.so: libgeos_c.so.1: cannot open shared object file: No such file or directory)
            Loading libmr4cgeo.so failed (java.lang.UnsatisfiedLinkError: /opt/data/yarn/nm/usercache/ec2-user/filecache/145/libmr4cgeo.so: libgdal.so.1: cannot open shared object file: No such file or directory)
            Loading libMap.so failed (java.lang.UnsatisfiedLinkError: /opt/data/yarn/nm/usercache/ec2-user/filecache/146/libMap.so: libgdal.so.1: cannot open shared object file: No such file or directory)
            Loading libgdal.so failed (java.lang.UnsatisfiedLinkError: /opt/data/yarn/nm/usercache/ec2-user/filecache/143/libgdal.so: libgeos_c.so.1: cannot open shared object file: No such file or directory)
            Loading libmr4cgeo.so failed (java.lang.UnsatisfiedLinkError: /opt/data/yarn/nm/usercache/ec2-user/filecache/145/libmr4cgeo.so: libgdal.so.1: cannot open shared object file: No such file or directory)
            Loading libMap.so failed (java.lang.UnsatisfiedLinkError: /opt/data/yarn/nm/usercache/ec2-user/filecache/146/libMap.so: libgdal.so.1: cannot open shared object file: No such file or directory)
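The missing dependency can be pulled straight out of those failure lines; the sample log below reuses two of the lines above, and the pipeline reduces the spam to the unique missing soname(s):

```shell
#!/bin/sh
# Reduce load-failure spam from a yarn log to the unique missing .so names.
cat > /tmp/mr4c_fail_demo.log <<'EOF'
Loading libgdal.so failed (java.lang.UnsatisfiedLinkError: /opt/data/yarn/nm/usercache/ec2-user/filecache/143/libgdal.so: libgeos_c.so.1: cannot open shared object file: No such file or directory)
Loading libmr4cgeo.so failed (java.lang.UnsatisfiedLinkError: /opt/data/yarn/nm/usercache/ec2-user/filecache/145/libmr4cgeo.so: libgdal.so.1: cannot open shared object file: No such file or directory)
EOF
grep -o '[A-Za-z0-9_.]*\.so[.0-9]*: cannot open' /tmp/mr4c_fail_demo.log \
    | sed 's/: cannot open//' | sort -u
```

Each name that comes out either goes into the "extras" list or gets installed on every data node.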

So if you have a library with a large dependency list, you may want to consider installing each dependency on the cluster data nodes: there is less to explicitly define, less to configure, and less to transfer in.

How does it locate the libraries ?

This may be wrapped up in how JNA works, but once on the cluster, if native lib 1 depends on native lib 2, and native lib 2's file is called libspecial.so.3.440 while the dependency is recorded as libspecial.so.3, then mr4c/JNA may error saying it cannot find the file. The reason comes down to what is copied in.

This was discovered by trial and error.

  • libarmadillo is a dependency for mr4c, but I wanted to not install it on the data nodes and see what happened.

  • gdal has libarmadillo.so.3 down as a dependency.

  • I added ["armadillo"] to the "extras" list, and mr4c copied across libarmadillo.so.3.800.2. This is correct, because of the symlinks:

      [ec2-user@cloudera1 example7_yarn]$ ls -al /usr/lib64/libarma*
      lrwxrwxrwx. 1 root root    23 Sep 24 08:21 /usr/lib64/libarmadillo.so.3 -> libarmadillo.so.3.800.2
      -rwxr-xr-x. 1 root root 26608 Apr 14  2013 /usr/lib64/libarmadillo.so.3.800.2
    

But when the job went to run, mr4c failed (actually the JNA portion fails) because it was looking for libarmadillo.so.3 and did not find it. The problem is that HDFS does not support symlinks, and mr4c does not rename the file to the name it was originally depended on as.

So, the fix is simple enough:

  1. either add the native lib to the cluster (data nodes) so that it is there when NATIVE lib 1 tries to load
  2. make a local copy of the lib with the "non-versioned" name, AND add that filename as a dependency (mr4c supports a filename as well as a library name), e.g. "extras" : ["freexl", "libarmadillo.so.3", "gdal", "mr4cgeo"]
  3. BAD - remove the soft symlink and create a hard link to the library

I think this is a bug. What would be good is for mr4c to copy the "found" library into the job.jar under the expected (searched-for) filename, e.g. copy /usr/lib64/libarmadillo.so.3.800.2 into the job.jar as libarmadillo.so.3.
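Fix 2 can be sketched as follows, using scratch paths and a fake payload in place of the real library (which lives in /usr/lib64): materialise a regular file under the soname the dependant expects, since HDFS will drop the symlink.

```shell
#!/bin/sh
# HDFS-safe copy: replace the "libarmadillo.so.3" symlink with a real file
# carrying the exact name the dependent library asks for.
libdir=/tmp/mr4c_lib_demo
rm -rf "$libdir" && mkdir -p "$libdir"
echo 'fake library payload' > "$libdir/libarmadillo.so.3.800.2"  # stand-in .so
ln -s libarmadillo.so.3.800.2 "$libdir/libarmadillo.so.3"        # the usual layout

# Materialise: remove the symlink, copy the target under the expected name.
rm "$libdir/libarmadillo.so.3"
cp "$libdir/libarmadillo.so.3.800.2" "$libdir/libarmadillo.so.3"
ls -l "$libdir"
```

The directory with the real-file copy is what you then point the "extras" filename entry at.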

From time to time, when you first develop (or change) your application to run on mr4c, you may get a stack exception like the one below.

Exception in thread "main" java.lang.UnsatisfiedLinkError: Unable to load library 'MyLibSpecial': /home/user/build/project1/MyLibSpecial/lib/libMyLibSpecial.so: undefined symbol: _ZN15TApplicationImp11ShowMembersER16TMemberInspector
        at com.sun.jna.NativeLibrary.loadLibrary(NativeLibrary.java:169)
        at com.sun.jna.NativeLibrary.getInstance(NativeLibrary.java:242)
        at com.sun.jna.NativeLibrary.getInstance(NativeLibrary.java:205)
        at com.google.mr4c.nativec.jna.JnaUtils.doLoadLibrary(JnaUtils.java:127)
        at com.google.mr4c.nativec.jna.JnaUtils.loadLibraryWithTiming(JnaUtils.java:118)
        at com.google.mr4c.nativec.jna.JnaUtils.loadLibrary(JnaUtils.java:102)
        at com.google.mr4c.nativec.jna.JnaNativeAlgorithm.loadNativeLibraries(JnaNativeAlgorithm.java:63)
        at com.google.mr4c.nativec.NativeAlgorithm.init(NativeAlgorithm.java:61)
        at com.google.mr4c.algorithm.Algorithms.getAlgorithm(Algorithms.java:102)
        at com.google.mr4c.sources.ConfiguredExecutionSource.getAlgorithm(ConfiguredExecutionSource.java:112)
        at com.google.mr4c.sources.ConfiguredExecutionSource.getAlgorithm(ConfiguredExecutionSource.java:107)
        at com.google.mr4c.AlgoRunner.validateExecutionSource(AlgoRunner.java:106)
        at com.google.mr4c.AlgoRunner.<init>(AlgoRunner.java:102)
        at com.google.mr4c.AlgoRunner.<init>(AlgoRunner.java:86)
        at com.google.mr4c.hadoop.HadoopAlgoRunner.getAlgoRunner(HadoopAlgoRunner.java:242)
        at com.google.mr4c.hadoop.RemoteAlgoRunner.addFiles(RemoteAlgoRunner.java:82)
        at com.google.mr4c.hadoop.RemoteAlgoRunner.doBuildJob(RemoteAlgoRunner.java:77)
        at com.google.mr4c.hadoop.HadoopAlgoRunner.buildJob(HadoopAlgoRunner.java:99)
        at com.google.mr4c.hadoop.HadoopAlgoRunner.execute(HadoopAlgoRunner.java:87)
        at com.google.mr4c.hadoop.RemoteAlgoRunner.main(RemoteAlgoRunner.java:50)

What is going on is that mr4c has found your library, but it has been unable to locate the dependent shared libraries, specifically the one that provides '_ZN15TApplicationImp11ShowMembersER16TMemberInspector'. The short answer is that you are missing a library in the "extras" : [ "libName" ] section of your mr4c .json file.

The second part is to work out from which shared library this symbol comes (and thus which shared lib you need to declare).

Simply, I had compiled against the 3rd-party shared library and its include/*.h files.

Running

find /home/custom/dir -name '*.so*' -exec nm --print-file-name --defined-only --dynamic {} \;  | grep _ZN15TApplicationImp11ShowMembersER16TMemberInspector

will locate the shared library, in this case "root"/libCore.so.

Then make sure its path (the directory where the lib is located) is in LD_LIBRARY_PATH, so that mr4c can locate the file:

/home/user/builds/root/lib/libCore.so:000000000029ba24 T _ZN15TApplicationImp11ShowMembersER16TMemberInspector
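The part before the first colon in that nm output is the library you need; it can be peeled off with a POSIX parameter expansion (the path is just the example one from above):

```shell
#!/bin/sh
# Pull the defining library's path out of an `nm --print-file-name` hit.
hit='/home/user/builds/root/lib/libCore.so:000000000029ba24 T _ZN15TApplicationImp11ShowMembersER16TMemberInspector'
libpath=${hit%%:*}   # strip everything from the first ':' onwards
echo "$libpath"
```

`dirname "$libpath"` then gives you the directory to append to LD_LIBRARY_PATH.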

Check two things

  1. Make sure your LD_LIBRARY_PATH has the path to the library above in question
  2. Make sure that when you compiled the shared library, you linked it against the lib in question (-lsomelib)

The following error was encountered a number of times during initial development. The cause is a stale linked library (an older compile).

The marker of a stale link is that mr4c cannot load the native algorithm. You want to see both the "Loading native ..." and the "End loading ..." lines:

2015-09-28 21:58:27,856 INFO  mr4c.java.nativec.jna.JnaNativeAlgorithm: Loading native algorithm library [Worker]
2015-09-28 21:58:27,860 INFO  mr4c.java.nativec.jna.JnaNativeAlgorithm: Native algorithm library found at [/home/ec2-user/build/someproj/lib/libWorker.so]
...
2015-09-28 21:58:27,860 INFO  mr4c.java.nativec.jna.JnaNativeAlgorithm: End loading native libraries

These two lines are the real markers that the library could be loaded without weird unresolved symbols. Of course, loading on the gateway workstation will "work" there, because its LD_LIBRARY_PATH is okay. You must make sure that all the libs you identified as required during the linking phase are also in the <mr4c>project.json file.

The only other hint that this may be the cause (the stack dump below can occur in other scenarios) is that it happens at the point where mr4c is trying to load the algorithm (from the macro that calls ".create()").

This error

C  [libmr4c.so+0x1849e4]  CExternalAlgorithm_getSerializedAlgorithm+0x10

in the stack is what signifies this problem. So check your compilation targets, outputs and all shared libs: are they up to date? Are the paths correct? (Mine were not.)

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007f5f4398b9e4, pid=22701, tid=140047407781632
#
# JRE version: OpenJDK Runtime Environment (7.0_65-b17) (build 1.7.0_65-mockbuild_2014_07_14_06_19-b00)
# Java VM: OpenJDK 64-Bit Server VM (24.65-b04 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# C  [libmr4c.so+0x1849e4]  CExternalAlgorithm_getSerializedAlgorithm+0x10
#
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /tmp/jvm-22701/hs_error.log
#
# If you would like to submit a bug report, please include
# instructions on how to reproduce the bug and visit:
#   http://icedtea.classpath.org/bugzilla
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#
/usr/local/bin/mr4c_hadoop_remote: line 23: 22701 Aborted                 (core dumped) java -cp "$MR4C_CLASSPATH" -Djna.library.path=$MR4C_LIBPATH -Dmr4c.hadoop.algorithm.classpath=$MR4C_ALGORITHM_CLASSPATH -Dmr4c.log4j=$MR4C_LOG4J_CONFIG -Dmr4c.site=$MR4C_SITE $MR4C_JAVA_OPTS com.google.mr4c.hadoop.RemoteAlgoRunner $MR4C_JAR_WITH_LIBS $*

Inside the dump file:

Current thread (0x00007f5f4c00d000):  JavaThread "main" [_thread_in_native, id=22702, stack(0x00007f5f53ee1000,0x00007f5f53fe2000)]

siginfo:si_signo=SIGSEGV: si_errno=0, si_code=1 (SEGV_MAPERR), si_addr=0x0000000000000000

and the java JVM call stack.

```
Stack: [0x00007f5f53ee1000,0x00007f5f53fe2000],  sp=0x00007f5f53fdf7c0,  free space=1017k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
C  [libmr4c.so+0x1849e4]  CExternalAlgorithm_getSerializedAlgorithm+0x10
C  [jna8562777558505049598.tmp+0x12034]  ffi_call_unix64+0x4c

Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
j  com.sun.jna.Native.invokePointer(JI[Ljava/lang/Object;)J+0
j  com.sun.jna.Function.invokePointer(I[Ljava/lang/Object;)Lcom/sun/jna/Pointer;+6
j  com.sun.jna.Function.invokeString(I[Ljava/lang/Object;Z)Ljava/lang/String;+3
j  com.sun.jna.Function.invoke([Ljava/lang/Object;Ljava/lang/Class;Z)Ljava/lang/Object;+544
j  com.sun.jna.Function.invoke(Ljava/lang/Class;[Ljava/lang/Object;Ljava/util/Map;)Ljava/lang/Object;+214
j  com.sun.jna.Library$Handler.invoke(Ljava/lang/Object;Ljava/lang/reflect/Method;[Ljava/lang/Object;)Ljava/lang/Object;+341
j  com.sun.proxy.$Proxy16.CExternalAlgorithm_getSerializedAlgorithm(Lcom/google/mr4c/nativec/jna/lib/Mr4cLibrary$CExternalAlgorithmPtr;)Ljava/lang/String;+16
j  com.google.mr4c.nativec.jna.JnaExternalAlgorithm.getSerializedAlgorithm()Ljava/lang/String;+7
j  com.google.mr4c.nativec.ExternalAlgorithmSerializer.deserializeAlgorithm(Lcom/google/mr4c/nativec/ExternalAlgorithm;)Lcom/google/mr4c/algorithm/AlgorithmSchema;+9
j  com.google.mr4c.nativec.NativeAlgorithm.loadAlgorithm()V+22
j  com.google.mr4c.nativec.NativeAlgorithm.init()V+9
j  com.google.mr4c.algorithm.Algorithms.getAlgorithm(Lcom/google/mr4c/config/algorithm/AlgorithmConfig;Lcom/google/mr4c/algorithm/AlgorithmEnvironment;)Lcom/google/mr4c/algorithm/Algorithm;+81
j  com.google.mr4c.sources.ConfiguredExecutionSource.getAlgorithm(Lcom/google/mr4c/algorithm/AlgorithmEnvironment;)Lcom/google/mr4c/algorithm/Algorithm;+13
j  com.google.mr4c.sources.ConfiguredExecutionSource.getAlgorithm()Lcom/google/mr4c/algorithm/Algorithm;+8
j  com.google.mr4c.AlgoRunner.validateExecutionSource()V+4
j  com.google.mr4c.AlgoRunner.<init>(Lcom/google/mr4c/sources/ExecutionSource;Lcom/google/mr4c/config/category/MR4CConfig;)V+42
j  com.google.mr4c.AlgoRunner.<init>(Lcom/google/mr4c/AlgoRunner$AlgoRunnerConfig;)V+12
```

This happens right after the algorithm has finished loading its native shared libs.

When linking your compiled source, GCC will not, by default, warn you about undefined symbols from a library you forgot to link in (`-lsomelib`). To catch these, add `-Wl,-z,defs` to the linker command; the link will then fail with an error at build time instead of deferring the problem to load time.

For example, without the flag:

```
[ec2-user@cloudera1 workerLib]$ make -f worker.makefile
rm -rf ./lib
rm -rf ./objs
mkdir -p ./lib
mkdir -p ./objs
g++ -c -I/usr/local/boost_1_59_0/include -I/usr/local/mr4c/native/include -I./src/cpp -fPIC -Wall -std=c++0x -o ./objs/worker.o ./src/cpp/worker.cpp
g++ -L/usr/local/mr4c/native/lib -L/usr/local/boost_1_59_0/lib -rdynamic -shared -fPIC ./objs/worker.o -o ./lib/libWorker.so -lmr4c -lrt -lm -pthread
```

And with the flag (telling the linker to error on missing symbols):

```
...
    g++ -L/usr/local/mr4c/native/lib -L/usr/local/boost_1_59_0/lib -rdynamic -shared -fPIC ./objs/worker.o -o ./lib/libWorker.so -lmr4c -lrt -lm -pthread -Wl,-z,defs
./objs/worker.o: In function `__static_initialization_and_destruction_0(int, int)':
worker.cpp:(.text+0xc97): undefined reference to `boost::system::generic_category()'
worker.cpp:(.text+0xca3): undefined reference to `boost::system::generic_category()'
worker.cpp:(.text+0xcaf): undefined reference to `boost::system::system_category()'
collect2: error: ld returned 1 exit status
make: *** [libWorker] Error 1
```

:-) Very helpful. Without this flag you will tear your hair out chasing missing libraries and other nasties during mr4c launch.
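The effect of `-Wl,-z,defs` can also be seen in isolation. The tiny demo below (file and symbol names are made up for illustration) builds a shared library that references a symbol nobody defines: by default the link "succeeds", but with the flag it fails immediately.

```shell
# A shared library that references a symbol nobody defines.
cat > demo.c <<'EOF'
extern int missing_symbol(void);
int wrapper(void) { return missing_symbol(); }
EOF

# Default behaviour: the link "succeeds" and the failure is
# deferred until the library is loaded (e.g. at mr4c launch).
gcc -shared -fPIC demo.c -o libdemo.so && echo "linked without complaint"

# With -Wl,-z,defs the undefined reference fails the build immediately.
gcc -shared -fPIC demo.c -o libdemo.so -Wl,-z,defs \
    || echo "link failed as expected: missing_symbol is undefined"
```

This is the same behaviour the worker.makefile run above shows with the missing boost_system references.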

You may get a stack trace from loading an extra (3rd Party) library.

```
2015-09-28 22:08:37,893 INFO  mr4c.java.nativec.jna.JnaNativeAlgorithm: Loading extra native library [somelibrary]
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x0000000000001300, pid=32653, tid=140387226277632
#
# JRE version: OpenJDK Runtime Environment (7.0_65-b17) (build 1.7.0_65-mockbuild_2014_07_14_06_19-b00)
# Java VM: OpenJDK 64-Bit Server VM (24.65-b04 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# C  0x0000000000001300
```

The problematic frame `0x0000000000001300` has come up a few times, and it seems to relate to a missing dependency of the library you included.

Take a look at the library in question and determine what it depends on:

```
[ec2-user@cloudera1 myproject]$ readelf -d /usr/local/3rdparty/lib/libsomelibrary.so | grep NEED
 0x0000000000000001 (NEEDED)             Shared library: [libboost_chrono.so.1.59.0]
 0x0000000000000001 (NEEDED)             Shared library: [libstdc++.so.6]
 0x0000000000000001 (NEEDED)             Shared library: [libm.so.6]
 0x0000000000000001 (NEEDED)             Shared library: [libgomp.so.1]
 0x0000000000000001 (NEEDED)             Shared library: [libgcc_s.so.1]
 0x0000000000000001 (NEEDED)             Shared library: [libpthread.so.0]
 0x0000000000000001 (NEEDED)             Shared library: [libc.so.6]
 0x000000006ffffffe (VERNEED)            0x2a8a0
 0x000000006fffffff (VERNEEDNUM)         5
```

Of these libraries, boost_chrono will be required (the others are available on the system), so add that to your <mr4c>project.json file.
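To check all the NEEDED entries in one pass, you can loop over them and ask the loader cache whether each one is known. A sketch (the library path is an example; on some systems `ldconfig` lives in `/sbin`):

```shell
# Report each NEEDED entry of a library and whether the dynamic
# loader's cache knows about it (path is an example; use your own).
for lib in $(readelf -d /usr/local/3rdparty/lib/libsomelibrary.so \
             | awk '/NEEDED/ { gsub(/[][]/, ""); print $NF }'); do
    if ldconfig -p | grep -q "$lib"; then
        echo "found    $lib"
    else
        echo "MISSING  $lib"
    fi
done
```

Any `MISSING` line is a candidate for either installing on every node or adding to your mr4c config.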

This is perhaps the hardest issue to resolve, and one I am still working on.

Your code looks good, you link an extra library, and then:

```
...
2015-09-29 02:32:33,543 INFO  mr4c.java.nativec.jna.JnaNativeAlgorithm: End loading native libraries
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007fd8fde8e9e4, pid=23280, tid=140570224473856
#
# JRE version: OpenJDK Runtime Environment (7.0_65-b17) (build 1.7.0_65-mockbuild_2014_07_14_06_19-b00)
# Java VM: OpenJDK 64-Bit Server VM (24.65-b04 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# C  [libmr4c.so+0x1849e4]  CExternalAlgorithm_getSerializedAlgorithm+0x10
#
# Core dump written. Default location: /home/ec2-user/build/project/app/core or core.23280
#
# An error report file with more information is saved as:
# /tmp/jvm-23280/hs_error.log
#
# If you would like to submit a bug report, please include
# instructions on how to reproduce the bug and visit:
#   http://icedtea.classpath.org/bugzilla
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#
```

Looking into the error report, we see that it was trying to call CExternalAlgorithm_getSerializedAlgorithm, which is the portion that loads your algorithm registered via the macro MR4C_REGISTER_ALGORITHM(map,Map::create());

```
...
Stack: [0x00007fd90e3bb000,0x00007fd90e4bc000],  sp=0x00007fd90e4b97c0,  free space=1017k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
C  [libmr4c.so+0x1849e4]  CExternalAlgorithm_getSerializedAlgorithm+0x10
C  [jna3284628887308968805.tmp+0x12034]  ffi_call_unix64+0x4c

Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
j  com.sun.jna.Native.invokePointer(JI[Ljava/lang/Object;)J+0
j  com.sun.jna.Function.invokePointer(I[Ljava/lang/Object;)Lcom/sun/jna/Pointer;+6
j  com.sun.jna.Function.invokeString(I[Ljava/lang/Object;Z)Ljava/lang/String;+3
j  com.sun.jna.Function.invoke([Ljava/lang/Object;Ljava/lang/Class;Z)Ljava/lang/Object;+544
j  com.sun.jna.Function.invoke(Ljava/lang/Class;[Ljava/lang/Object;Ljava/util/Map;)Ljava/lang/Object;+214
j  com.sun.jna.Library$Handler.invoke(Ljava/lang/Object;Ljava/lang/reflect/Method;[Ljava/lang/Object;)Ljava/lang/Object;+341
j  com.sun.proxy.$Proxy16.CExternalAlgorithm_getSerializedAlgorithm(Lcom/google/mr4c/nativec/jna/lib/Mr4cLibrary$CExternalAlgorithmPtr;)Ljava/lang/String;+16
j  com.google.mr4c.nativec.jna.JnaExternalAlgorithm.getSerializedAlgorithm()Ljava/lang/String;+7
j  com.google.mr4c.nativec.ExternalAlgorithmSerializer.deserializeAlgorithm(Lcom/google/mr4c/nativec/ExternalAlgorithm;)Lcom/google/mr4c/algorithm/AlgorithmSchema;+9
j  com.google.mr4c.nativec.NativeAlgorithm.loadAlgorithm()V+22
j  com.google.mr4c.nativec.NativeAlgorithm.init()V+9
j  com.google.mr4c.algorithm.Algorithms.getAlgorithm(Lcom/google/mr4c/config/algorithm/AlgorithmConfig;Lcom/google/mr4c/algorithm/AlgorithmEnvironment;)Lcom/google/mr4c/algorithm/Algorithm;+81
j  com.google.mr4c.sources.ConfiguredExecutionSource.getAlgorithm(Lcom/google/mr4c/algorithm/AlgorithmEnvironment;)Lcom/google/mr4c/algorithm/Algorithm;+13
j  com.google.mr4c.sources.ConfiguredExecutionSource.getAlgorithm()Lcom/google/mr4c/algorithm/Algorithm;+8
j  com.google.mr4c.AlgoRunner.validateExecutionSource()V+4
j  com.google.mr4c.AlgoRunner.<init>(Lcom/google/mr4c/sources/ExecutionSource;Lcom/google/mr4c/config/category/MR4CConfig;)V+42
j  com.google.mr4c.AlgoRunner.<init>(Lcom/google/mr4c/AlgoRunner$AlgoRunnerConfig;)V+12
```

This issue is currently unresolved.

This is a little unusual, and I have not had the time to chase down exactly which library is at fault, or why. The recommendation is to ensure that ALL the libs you load in the .json file are also exactly the ones specified for linking, and no more.

I have come across a scenario where the lib was linked and loaded in the .json file, but the code did not use it at all.

There is a linker flag `--as-needed` that strips out unused libraries; at the time I was not using it, and it seems that the "extra" lib was causing issues.
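The effect of `--as-needed` can be demonstrated in isolation. The sketch below (file and names are made up for illustration) links a do-nothing library against libm twice and compares the recorded NEEDED entries:

```shell
# A library that uses nothing from libm.
cat > noop.c <<'EOF'
int noop(void) { return 0; }
EOF

# Force the unused libm to be recorded as a dependency.
gcc -shared -fPIC noop.c -o libnoop.so -Wl,--no-as-needed -lm
readelf -d libnoop.so | grep libm      # libm listed as NEEDED

# With --as-needed the unused library is dropped from NEEDED.
gcc -shared -fPIC noop.c -o libnoop.so -Wl,--as-needed -lm
readelf -d libnoop.so | grep libm      # no output
```

Note that some distros pass `--as-needed` by default, which is why the first link forces `--no-as-needed` to make the difference visible.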

The error (the JVM crash) occurs right at the first point of loading your shared library. (You won't see the next normal line, Native algorithm library found at ...; instead it dies before that.)

```
2015-09-29 22:28:54,672 INFO  mr4c.java.nativec.jna.JnaNativeAlgorithm: MR4C native library found at [/usr/local/lib/libmr4c.so]
2015-09-29 22:28:54,672 INFO  mr4c.java.nativec.jna.JnaNativeAlgorithm: Loading native algorithm library [Map]
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x0000000000001300, pid=8382, tid=140276540757760
#
# JRE version: OpenJDK Runtime Environment (7.0_65-b17) (build 1.7.0_65-mockbuild_2014_07_14_06_19-b00)
# Java VM: OpenJDK 64-Bit Server VM (24.65-b04 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# C  0x0000000000001300
#
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /tmp/jvm-8382/hs_error.log
#
# If you would like to submit a bug report, please include
# instructions on how to reproduce the bug and visit:
#   http://icedtea.classpath.org/bugzilla
#
```