https://github.com/google/mr4c http://google-opensource.blogspot.com.au/2015/02/mapreduce-for-c-run-native-code-in.html
mr4c and its dependencies must be installed on every node that runs your algorithms. If you get an exception like the one below, a native dependency (here liblog4cxx.so.10) is missing on that node.
Caused by: java.lang.UnsatisfiedLinkError: Unable to load library '/data/nvme/yarn/nm/usercache/genuser/filecache/10/libmr4c.so': liblog4cxx.so.10: cannot open shared object file: No such file or directory
at com.sun.jna.NativeLibrary.loadLibrary(NativeLibrary.java:169)
at com.sun.jna.NativeLibrary.getInstance(NativeLibrary.java:242)
at com.google.mr4c.nativec.jna.lib.Mr4cLibrary.<clinit>(Mr4cLibrary.java:21)
at com.google.mr4c.nativec.jna.JnaExternalEntry.<clinit>(JnaExternalEntry.java:41)
at com.google.mr4c.nativec.jna.JnaExternalFactory.newEntry(JnaExternalFactory.java:48)
at com.google.mr4c.nativec.NativeAlgorithm.<init>(NativeAlgorithm.java:56)
at com.google.mr4c.nativec.jna.JnaNativeAlgorithm.<init>(JnaNativeAlgorithm.java:52)
at com.google.mr4c.algorithm.Algorithms$NativeFactory.create(Algorithms.java:138)
at com.google.mr4c.algorithm.Algorithms.getAlgorithm(Algorithms.java:99)
at com.google.mr4c.sources.ConfiguredExecutionSource.getAlgorithm(ConfiguredExecutionSource.java:112)
at com.google.mr4c.hadoop.HadoopMapper.configure(HadoopMapper.java:52)
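To find out which shared libraries a node is missing, one option is to run `ldd` against the native library and keep only the unresolved entries. The following is a minimal sketch, not part of mr4c: the function names `filter_missing` and `missing_libs` are made up for illustration, and the path to libmr4c.so is a placeholder you will need to adjust.

```shell
# Keep only the library names that ldd could not resolve.
# Reads ldd output on stdin (hypothetical helper name).
filter_missing() {
  awk '/not found/ { print $1 }'
}

# Print the unresolved dependencies of the shared object given as $1.
missing_libs() {
  ldd "$1" | filter_missing
}

# Example (replace the placeholder with the real location on the node):
#   missing_libs /path/to/libmr4c.so
```

An empty result means every dependency resolved on that node; any name printed (such as liblog4cxx.so.10 in the trace above) still has to be installed there.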
Ubuntu 14.04 is pretty close to having everything we need.
- download Apache ANT - unpack to /opt
- create symlink for /opt/apache-ant -> /opt/apache-ant-version
- download Apache Ivy (deps) and place ivy*.jar and lib/*.jar into ANT_HOME/lib
- download and install OpenJDK, create a symlink /opt/jdk -> /opt/jdk-version, and add /opt/apache-ant/bin:/opt/jdk/bin to the PATH
An example of the /opt layout:
rbuckland@host2:~$ ls -al /opt
total 20
drwxr-xr-x 5 root root 4096 Sep 9 15:49 .
drwxr-xr-x 24 root root 4096 Sep 9 21:20 ..
lrwxrwxrwx 1 root root 17 Sep 9 15:49 apache-ant -> apache-ant-1.9.6/
drwxr-xr-x 6 root root 4096 Sep 9 15:28 apache-ant-1.9.6
drwxr-xr-x 6 cloudera-scm cloudera-scm 4096 Sep 8 14:34 cloudera
lrwxrwxrwx 1 root root 11 Sep 9 15:49 jdk -> jdk1.8.0_60
drwxr-xr-x 8 root root 4096 Sep 9 15:47 jdk1.8.0_60
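The symlinks in the listing above can be created with `ln -sfn`, which also repoints an existing link when you later upgrade. This is a small sketch; `link_current` is a hypothetical helper name, and the version directories are simply the ones shown in the listing.

```shell
# Point a stable name at a versioned directory inside an install root.
# $1 = install root, $2 = stable link name, $3 = versioned directory name
link_current() {
  ln -sfn "$3" "$1/$2"
}

# Example, matching the listing above (run as root for /opt):
#   link_current /opt apache-ant apache-ant-1.9.6
#   link_current /opt jdk jdk1.8.0_60
```

Using a relative link target keeps the link valid even if the whole install root is moved.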
A common setup in /etc/environment:
rbuckland@host2:~$ cat /etc/environment
PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/opt/apache-ant/bin:/opt/jdk/bin"
JAVA_HOME=/opt/jdk
Install the prerequisites for mr4c
sudo apt-get install libgdal1-dev libgdal1h subversion \
build-essential autoconf libproj-dev libproj0 \
libcppunit-dev libcppunit-1.13-0 libjansson-dev libjansson4 \
git
svn checkout http://svn.apache.org/repos/asf/incubator/log4cxx/trunk apache-log4cxx
First, the Apache Portable Runtime (APR) and its utilities
sudo apt-get install libaprutil1 libaprutil1-dev libapr1-dev libapr1
log4cxx: the version packaged in Ubuntu is out of date (trunk has fixes), so it needs to be built from source and installed
- ./autogen.sh
- ./configure
- make
- sudo make install
- sudo ldconfig # update cache of the new lib installed
mr4c will need ant at the very end of the build to package and deploy all the jars
Install Apache Ivy into ant (lib/*.jars and deps)
- have ant in the PATH
- have Ivy installed in Ant
### Finally, clone and build mr4c
export CPLUS_INCLUDE_PATH=/usr/include/gdal
export C_INCLUDE_PATH=/usr/include/gdal
./build_all
We need to install a few dependencies in order to compile mr4c. You will need the RedHat Optional repository enabled, because some dependencies further down require it. See https://github.com/google/mr4c#dependencies
git clone https://github.com/google/mr4c.git
### Installing log4cxx
I have already done this and packaged the result as an RPM. You will need the RPM because log4cxx has to be distributed to every data node in the cluster. The details of how it is built are below.
Here are the (S)RPMs: the main package is log4cxx-0.11.0.trunk.20150916-19.el6.x86_64.rpm, and the other bits are:
- log4cxx-devel-0.11.0.trunk.20150916-19.el6.x86_64.rpm
- log4cxx-debuginfo-0.11.0.trunk.20150916-19.el6.x86_64.rpm
- log4cxx-0.11.0.trunk.20150916-19.el6.src.rpm
Download it first using subversion
svn checkout http://svn.apache.org/repos/asf/incubator/log4cxx/trunk apache-log4cxx
sudo yum install apr apr-util apr-devel apr-util-devel
Before we begin building log4cxx, we need to upgrade binutils. If you raced ahead, make fails, and there are also some odd errors when running autogen.sh.
When running 'make' for log4cxx, we get the error:
/tmp/cc1ghG2e.s: Assembler messages:
/tmp/cc1ghG2e.s:1088: Error: expecting string instruction after `rep'
This is documented as a bug at https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57017 and is also noted in the issue trackers of other projects, for example mxe/mxe#404.
The solution is to upgrade to binutils 2.23.52.0.1 (ld linker bug resolved).
RHEL 6 makes the upgraded package available as a separate dependency:
https://rhn.redhat.com/errata/RHBA-2014-0270.html
sudo yum install devtoolset-2-binutils
log4cxx is not available on RedHat, so it needs to be compiled and installed.
Make sure you have installed devtoolset-2-binutils (above) first.
Building it on RedHat has proven trickier and a bit messy.
svn checkout http://svn.apache.org/repos/asf/incubator/log4cxx/trunk apache-log4cxx
cd apache-log4cxx
./autogen.sh # expect some strange errors here
./autogen.sh # the first time errors .. and for some reason the 2nd time it works.. (pass :-) )
./configure
make
sudo make install
echo /usr/local/lib | sudo tee /etc/ld.so.conf.d/usrlocal.conf # sudo does not apply to a shell redirect, so write the file with tee
sudo ldconfig # update cache of the new lib installed
Next, tackle the extra packages we need that are not in default RHEL. EPEL (Extra Packages for Enterprise Linux) gives us what we need.
sudo rpm -Uvh https://dl.fedoraproject.org/pub/epel/epel-release-latest-6.noarch.rpm
We also need the ELGIS libraries, which depend on EPEL too, so add that repository as well
sudo rpm -Uvh http://elgis.argeo.org/repos/6/elgis-release-6-6_0.noarch.rpm
#### armadillo
It seems that gdal needs an old version of Armadillo installed. Armadillo in turn requires some other (in-repo) dependencies: lapack and blas (both linear algebra libraries), and cblas and clapack, provided by atlas.
So first install these:
sudo yum -y install blas lapack atlas
And then Armadillo (v3.800.2-1), installed directly from its RPM:
sudo rpm -Uvh http://proj.badc.rl.ac.uk/cedaservices/raw-attachment/ticket/670/armadillo-3.800.2-1.el6.x86_64.rpm
sudo yum install gdal gdal-devel
# these two lines are required each time you set up a new shell
# the build (below) will fail if the environment is not set up correctly.
export CPLUS_INCLUDE_PATH=/usr/include/gdal
export C_INCLUDE_PATH=/usr/include/gdal
sudo yum install cppunit cppunit-devel
sudo yum install jansson jansson-devel
sudo yum install proj proj-devel
sudo yum install apache-ant apache-ivy ant-junit
./build_all
sudo ./deploy_all
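Because the build fails when the GDAL include path is not set, it can help to check for the header before running ./build_all. The sketch below assumes the include variable holds a single directory, as it does in this guide; `has_gdal_headers` is a hypothetical helper name.

```shell
# Succeed only if the given directory contains the main GDAL header.
# $1 = include directory (this guide uses /usr/include/gdal)
has_gdal_headers() {
  [ -f "$1/gdal.h" ]
}

# Example: stop early when the environment is not set up.
#   has_gdal_headers "${CPLUS_INCLUDE_PATH:-}" || echo "set CPLUS_INCLUDE_PATH first" >&2
```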
mr4c requires that any program's library dependencies are also installed on every node in the cluster. There is a basic minimum set required by mr4c, and the following script automates that deployment for you.
- Download the prebuilt log4cxx RPM: log4cxx-0.11.0.trunk.20150916-19.el6.x86_64.rpm
- Save this script into the same directory on your edge node (or any workstation with access to the cluster)
- Run mr4c_node_prepartion.sh. This will make a .tar.gz of the required dependencies, plus 2 other shell scripts.
- Run mr4c_node_deploy_runner.sh <dn_hostname> for each node, or for every host at once:
cat hostnames.txt | xargs -L1 -Ixx sh mr4c_node_deploy_runner.sh xx
This will (as root) copy the tar.gz and the simple installer across to each node in hostnames.txt
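The xargs one-liner gives no summary of which hosts failed. As an alternative, here is a rough sketch of a loop that skips blank lines and comments in hostnames.txt and reports failures per host. `deploy_to_hosts` is a hypothetical name, and the sketch assumes the runner script exits non-zero on failure, which the original script may or may not do.

```shell
# Run a command once per host listed in a file, appending the hostname
# as the last argument. Blank lines and lines starting with # are skipped.
# $1 = host file, remaining arguments = command to run
deploy_to_hosts() {
  hostfile=$1
  shift
  failed=0
  while IFS= read -r host; do
    case $host in
      ''|'#'*) continue ;;
    esac
    # stdin is redirected so the command cannot swallow the host list
    "$@" "$host" </dev/null || { echo "deploy failed: $host" >&2; failed=1; }
  done < "$hostfile"
  return "$failed"
}

# Example:
#   deploy_to_hosts hostnames.txt sh mr4c_node_deploy_runner.sh
```

The function returns non-zero if any host failed, so it can be used in a larger script that should stop on a partial deployment.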
sudo yum install git
git clone https://github.com/google/mr4c.git
sudo yum install apr apr-util
mkdir mr4c_deps
wget https://www.dropbox.com/s/o7t0jsqn4ejv8eb/log4cxx-0.11.0.trunk.20150916-19.el6.x86_64.rpm?dl=0
sudo yum localinstall log4cxx-0.11.0.trunk.20150916-19.el6.x86_64.rpm*
sudo rpm -Uvh https://dl.fedoraproject.org/pub/epel/epel-release-latest-6.noarch.rpm
sudo rpm -Uvh http://elgis.argeo.org/repos/6/elgis-release-6-6_0.noarch.rpm
sudo yum -y install blas lapack atlas
sudo rpm -Uvh http://proj.badc.rl.ac.uk/cedaservices/raw-attachment/ticket/670/armadillo-3.800.2-1.el6.x86_64.rpm
sudo yum install gdal gdal-devel
sudo yum install cppunit cppunit-devel
sudo yum install jansson jansson-devel
sudo yum install apache-ant apache-ivy
sudo yum install proj proj-devel
sudo yum install cmake gcc gcc-c++
sudo yum localinstall log4cxx-devel-0.11.0.trunk.20150916-19.el6.x86_64.rpm
## Other Random Issues
You will need to upgrade gcc to > 4.6; RHEL 6 ships 4.4 by default. See https://groups.google.com/forum/#!msg/mr4c/uJf1en6iCTU/7PuI7_aDuUMJ
This, of course, needs to come from devtoolset-2.
You will need to install the CentOS repo file and modify the repo URL:
sudo wget http://people.centos.org/tru/devtools-2/devtools-2.repo -O /etc/yum.repos.d/devtools-2.repo
sudo vi /etc/yum.repos.d/devtools-2.repo
# and change the baseurl=http://people.centos.org/tru/devtools-2/$releasever/$basearch/RPMS
# to baseurl=http://people.centos.org/tru/devtools-2/6/$basearch/RPMS
Then install the toolchain:
sudo yum install devtoolset-2-gcc devtoolset-2-gcc-c++ devtoolset-2-binutils
Then set the PATH and environment to the tools:
. /opt/rh/devtoolset-2/enable
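Once the devtoolset environment is enabled, you can confirm that the gcc on the PATH is new enough. This is a minimal sketch of the "> 4.6" check stated above; `gcc_newer_than_46` is a hypothetical helper, and it only compares the major and minor components.

```shell
# Succeed if the version string in $1 is newer than 4.6.
# Only the major and minor parts are compared.
gcc_newer_than_46() {
  major=${1%%.*}
  rest=${1#*.}
  minor=${rest%%.*}
  [ "$major" -gt 4 ] || { [ "$major" -eq 4 ] && [ "$minor" -gt 6 ]; }
}

# Example:
#   gcc_newer_than_46 "$(gcc -dumpversion)" || echo "gcc is too old for mr4c" >&2
```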