Created
April 21, 2015 08:21
Documentation on the integration between Apache Spark and SoftLayer object store.
-------------------------------------------------------------------------
Copyright IBM Corp. 2015, 2015 All Rights Reserved
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-------------------------------------------------------------------------
@author: Gil Vernik

Apache Spark integration with SoftLayer object store
****************************************************
Background
----------
Spark accesses OpenStack Swift through the hadoop-openstack library. The current Swift
driver implements the V2.0 authentication model, which is based on Keystone.
Authentication with swiftauth (V1.0) is provided by the pending patch
https://issues.apache.org/jira/browse/HADOOP-10420, which is not yet merged into Hadoop.
SoftLayer Object Store requires the V1.0 authentication model. A temporary solution is
to download the Hadoop sources locally, apply the patch, build, and install the result
into the local Maven repository. Any Hadoop version from 2.4.0 onward can be used.
Hadoop patch for SoftLayer object store
---------------------------------------
We demonstrate with Hadoop version 2.6.0.
1. Download hadoop-2.6.0-src.tar.gz and extract it into the hadoop-2.6.0 folder.
2. Download the file https://issues.apache.org/jira/secure/attachment/12662347/HADOOP-10420-007.patch
   and save it in the hadoop-2.6.0/ folder. Change directory to this folder.
3. Check which files the patch changes by executing: git apply --stat HADOOP-10420-007.patch
4. Check that the patch can be applied by executing: git apply --check HADOOP-10420-007.patch
5. Apply the patch by executing: git apply HADOOP-10420-007.patch
6. Navigate to the hadoop-2.6.0/hadoop-tools/hadoop-openstack folder and
   execute: mvn -DskipTests package
7. After the build succeeds, install the jar into the local Maven repository by
   executing: mvn -DskipTests install
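The steps above can be collected into one script. This is only a sketch under a few
assumptions: hadoop-2.6.0-src.tar.gz is already in the current directory, and curl, tar,
git, and mvn are on the PATH. The names run, build_hadoop_openstack, and DRY_RUN are
hypothetical helpers introduced for illustration; they are not part of Hadoop or Spark.

```shell
#!/usr/bin/env bash
# Sketch of steps 1-7. Assumes hadoop-2.6.0-src.tar.gz is in the current directory.

HADOOP_VERSION="2.6.0"
PATCH_FILE="HADOOP-10420-007.patch"
PATCH_URL="https://issues.apache.org/jira/secure/attachment/12662347/${PATCH_FILE}"

# Hypothetical helper: print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Hypothetical wrapper. The body is a subshell, so the caller's working
# directory is left unchanged. The chain stops at the first failing command.
build_hadoop_openstack() (
    src_dir="hadoop-${HADOOP_VERSION}"
    # Step 1: extract the sources into hadoop-2.6.0/
    run mkdir -p "$src_dir" &&
    run tar -xzf "hadoop-${HADOOP_VERSION}-src.tar.gz" -C "$src_dir" --strip-components=1 &&
    # Step 2: fetch the patch into that folder
    run cd "$src_dir" &&
    run curl -fSL -o "$PATCH_FILE" "$PATCH_URL" &&
    # Steps 3-5: inspect, verify, and apply the patch
    run git apply --stat "$PATCH_FILE" &&
    run git apply --check "$PATCH_FILE" &&
    run git apply "$PATCH_FILE" &&
    # Steps 6-7: build hadoop-openstack and install it into the local Maven repository
    run cd hadoop-tools/hadoop-openstack &&
    run mvn -DskipTests package &&
    run mvn -DskipTests install
)
```

Running DRY_RUN=1 build_hadoop_openstack only prints the commands, which is a cheap way
to review them first; running build_hadoop_openstack without DRY_RUN executes them.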
Building Spark with SoftLayer Object store
------------------------------------------
Now that hadoop-openstack is built and installed, we can build Spark.
Instructions on how to configure and build Spark with Swift support can be found at
https://spark.apache.org/docs/latest/storage-openstack-swift.html
Configuring Spark
-----------------
The V1 authentication model requires different keys than the Keystone model.
The following keys should be configured in <spark-home>/conf/core-site.xml.
We demonstrate the configuration for the SoftLayer Dallas server.
<configuration>
  <property>
    <name>fs.swift.impl</name>
    <value>org.apache.hadoop.fs.swift.snative.SwiftNativeFileSystem</value>
  </property>
  <property>
    <name>fs.swift.service.dal05.auth.url</name>
    <value>https://dal05.objectstorage.softlayer.net/auth/v1.0</value>
  </property>
  <property>
    <name>fs.swift.service.dal05.http.port</name>
    <value>8080</value>
  </property>
  <property>
    <name>fs.swift.service.dal05.public</name>
    <value>true</value>
  </property>
  <property>
    <name>fs.swift.service.dal05.location-aware</name>
    <value>false</value>
  </property>
  <property>
    <name>fs.swift.service.dal05.endpoint.prefix</name>
    <value>endpoints</value>
  </property>
  <property>
    <name>fs.swift.service.dal05.apikey</name>
    <value>API_KEY</value>
  </property>
  <property>
    <name>fs.swift.service.dal05.username</name>
    <value>ACCOUNT:USER</value>
  </property>
  <property>
    <name>fs.swift.service.dal05.use.get.auth</name>
    <value>true</value>
  </property>
</configuration>
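Before starting Spark, it can help to check the credentials from the configuration above
directly against the auth URL. The sketch below assumes the endpoint follows the usual
Swift V1.0 convention: a GET request carrying X-Auth-User and X-Auth-Key headers, answered
with X-Auth-Token and X-Storage-Url headers on success. The function names swift_v1_auth
and header_value are hypothetical helpers, and ACCOUNT:USER and API_KEY are the same
placeholders used in core-site.xml.

```shell
#!/usr/bin/env bash
# Sketch: check V1.0 credentials against the SoftLayer Dallas auth endpoint.

AUTH_URL="https://dal05.objectstorage.softlayer.net/auth/v1.0"

# Hypothetical helper: send the V1.0 auth GET and print only the response headers.
#   $1 = username in ACCOUNT:USER form, $2 = API key
swift_v1_auth() {
    curl -sS -D - -o /dev/null \
        -H "X-Auth-User: $1" \
        -H "X-Auth-Key: $2" \
        "$AUTH_URL"
}

# Hypothetical helper: read HTTP headers on stdin and print the value of the
# header named in $1 (case-insensitive match, trailing carriage return removed).
header_value() {
    awk -F': ' -v name="$1" \
        'tolower($1) == tolower(name) { sub(/\r$/, "", $2); print $2 }'
}
```

For example, swift_v1_auth "ACCOUNT:USER" "API_KEY" | header_value X-Auth-Token should
print a token if the credentials are accepted, and nothing if authentication fails.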
Usage Example
-------------
Assume the SoftLayer object store contains a container named "sparky".
In ./bin/spark-shell you can access the container "sparky" with
scala> val data = sc.textFile("swift://sparky.dal05/*")
scala> data.count()