Documentation on the integration between Apache Spark and SoftLayer object store.
-------------------------------------------------------------------------
Copyright IBM Corp. 2015, 2015 All Rights Reserved
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-------------------------------------------------------------------------
@author: Gil Vernik
Apache Spark integration with SoftLayer object store
****************************************************
Background
----------
Spark accesses OpenStack Swift via the hadoop-openstack library. The current Swift
driver implements the v2.0 authentication model based on Keystone.
Authentication with swiftauth (v1.0) is provided by the pending patch
https://issues.apache.org/jira/browse/HADOOP-10420, which has not yet been merged into
Hadoop. SoftLayer Object Store requires the v1.0 authentication model, so a temporary
solution is to download the Hadoop sources locally, apply the patch, and build and deploy
the result into the local Maven repository. Hadoop can be any version from 2.4.0 and up.
Hadoop patch for SoftLayer object store
---------------------------------------
We demonstrate version 2.6.0.
1. Download hadoop-2.6.0-src.tar.gz and extract it into a hadoop-2.6.0 folder.
2. Download https://issues.apache.org/jira/secure/attachment/12662347/HADOOP-10420-007.patch
and save it under the hadoop-2.6.0/ folder. Change directory to this folder.
3. Check which files the patch changes by executing: git apply --stat HADOOP-10420-007.patch
4. Check that the patch can be applied by executing: git apply --check HADOOP-10420-007.patch
5. Apply the patch: git apply HADOOP-10420-007.patch
6. Navigate to the hadoop-2.6.0/hadoop-tools/hadoop-openstack folder and
execute: mvn -DskipTests package
7. After the build succeeds, install the resulting jar into the local Maven repository by
executing: mvn -DskipTests install
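The seven steps above can be condensed into the following shell sketch. The Apache
archive mirror URL and the extracted directory name are assumptions (the official
source tarball unpacks into hadoop-2.6.0-src); adjust them for your environment.

```shell
# Condensed sketch of steps 1-7. The mirror URL and directory names are
# assumptions; adjust them for your environment.
HADOOP_VERSION=2.6.0
SRC_DIR="hadoop-${HADOOP_VERSION}-src"
PATCH=HADOOP-10420-007.patch

# Step 1: fetch and unpack the Hadoop source tree
curl -LO "https://archive.apache.org/dist/hadoop/common/hadoop-${HADOOP_VERSION}/${SRC_DIR}.tar.gz"
tar xzf "${SRC_DIR}.tar.gz"
cd "${SRC_DIR}"

# Step 2: fetch the patch into the source root
curl -LO "https://issues.apache.org/jira/secure/attachment/12662347/${PATCH}"

# Steps 3-5: preview the changed files, dry-run, then apply the patch
git apply --stat  "${PATCH}"
git apply --check "${PATCH}"
git apply "${PATCH}"

# Steps 6-7: build hadoop-openstack and install it into the local Maven repository
cd hadoop-tools/hadoop-openstack
mvn -DskipTests package
mvn -DskipTests install
```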
Building Spark with SoftLayer Object store
------------------------------------------
Now that hadoop-openstack is built and ready to be used, we can build Spark.
Instructions on how to configure and build Spark with Swift support can be found at
https://spark.apache.org/docs/latest/storage-openstack-swift.html
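As a sketch only (the linked page is authoritative): once the patched
hadoop-openstack 2.6.0 artifact is in the local Maven repository, the Spark build
can be pointed at that Hadoop version via the standard -Dhadoop.version flag, which
makes Maven resolve the locally installed artifact instead of the released one.

```shell
# Sketch only - see the Spark documentation linked above for the
# authoritative build instructions and any required build profiles.
# -Dhadoop.version makes the Spark build resolve Hadoop 2.6.0 artifacts,
# including the patched hadoop-openstack jar installed above.
HADOOP_VERSION=2.6.0
cd spark
mvn -Dhadoop.version="${HADOOP_VERSION}" -DskipTests clean package
```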
Configuring Spark
-----------------
The v1.0 authentication model requires different keys compared to the Keystone model.
The following keys should be configured in <spark-home>/conf/core-site.xml.
We demonstrate the configuration for the SoftLayer Dallas (dal05) server.
<configuration>
  <property>
    <name>fs.swift.impl</name>
    <value>org.apache.hadoop.fs.swift.snative.SwiftNativeFileSystem</value>
  </property>
  <property>
    <name>fs.swift.service.dal05.auth.url</name>
    <value>https://dal05.objectstorage.softlayer.net/auth/v1.0</value>
  </property>
  <property>
    <name>fs.swift.service.dal05.http.port</name>
    <value>8080</value>
  </property>
  <property>
    <name>fs.swift.service.dal05.public</name>
    <value>true</value>
  </property>
  <property>
    <name>fs.swift.service.dal05.location-aware</name>
    <value>false</value>
  </property>
  <property>
    <name>fs.swift.service.ibm.dal05.endpoint.prefix</name>
    <value>endpoints</value>
  </property>
  <property>
    <name>fs.swift.service.dal05.apikey</name>
    <value>API_KEY</value>
  </property>
  <property>
    <name>fs.swift.service.dal05.username</name>
    <value>ACCOUNT:USER</value>
  </property>
  <property>
    <name>fs.swift.service.dal05.use.get.auth</name>
    <value>true</value>
  </property>
</configuration>
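As an alternative to editing core-site.xml, the same keys can be supplied through
Spark's spark.hadoop.* property prefix (Spark copies any property with this prefix
into the Hadoop configuration). A sketch for <spark-home>/conf/spark-defaults.conf,
reusing the dal05 values above (API_KEY and ACCOUNT:USER are placeholders, as in the
XML example):

```
spark.hadoop.fs.swift.impl                        org.apache.hadoop.fs.swift.snative.SwiftNativeFileSystem
spark.hadoop.fs.swift.service.dal05.auth.url      https://dal05.objectstorage.softlayer.net/auth/v1.0
spark.hadoop.fs.swift.service.dal05.username      ACCOUNT:USER
spark.hadoop.fs.swift.service.dal05.apikey        API_KEY
spark.hadoop.fs.swift.service.dal05.use.get.auth  true
```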
Usage Example
-------------
Assume the SoftLayer object store contains a container "sparky".
In ./bin/spark-shell you can access container "sparky" with
scala> val data = sc.textFile("swift://sparky.dal05/*")
scala> data.count()
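If Spark itself was not rebuilt with Swift support, one alternative sketch is to
attach the locally installed hadoop-openstack jar when launching the shell. The
repository path below assumes the default ~/.m2 layout after the
"mvn -DskipTests install" step; the jar's transitive dependencies may also be needed
on the classpath.

```shell
# Assumed path: default local Maven repository layout after the
# "mvn -DskipTests install" step in the patch section.
OPENSTACK_JAR="$HOME/.m2/repository/org/apache/hadoop/hadoop-openstack/2.6.0/hadoop-openstack-2.6.0.jar"
./bin/spark-shell --jars "${OPENSTACK_JAR}"
```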