@maheshakya
Created May 28, 2015 03:56
Spark Linear regression test
6,148,72,35,0,336,627,50,1
1,85,66,29,0,266,351,31,0
8,183,64,0,0,233,672,32,1
1,89,66,23,94,281,167,21,0
0,137,40,35,168,431,2288,33,1
5,116,74,0,0,256,201,30,0
3,78,50,32,88,310,248,26,1
10,115,0,0,0,353,134,29,0
2,197,70,45,543,305,158,53,1
8,125,96,0,0,0,232,54,1
4,110,92,0,0,376,191,30,0
10,168,74,0,0,380,537,34,1
10,139,80,0,0,271,1441,57,0
1,189,60,23,846,301,398,59,1
5,166,72,19,175,258,587,51,1
7,100,0,0,0,300,484,32,1
0,118,84,47,230,458,551,31,1
7,107,74,0,0,296,254,31,1
1,103,30,38,83,433,183,33,0
1,115,70,30,96,346,529,32,1
3,126,88,41,235,393,704,27,0
import sys
from pyspark import SparkContext
from pyspark.mllib.regression import LabeledPoint, LinearRegressionWithSGD
from numpy import array
# Load and parse data
def parse_point(line):
    # The first value is the label; the remaining values are the features.
    values = [float(x) for x in line.split(',')]
    return LabeledPoint(values[0], values[1:])
sc = SparkContext(appName='LinearRegression')
# Add path to your dataset.
data = sc.textFile('dummy_data_sest.csv')
parsedData = data.map(parse_point)
# Build the model
model = LinearRegressionWithSGD.train(parsedData)
# Check model weight vector
print(model.weights)
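
To make the parsing step concrete, here is a minimal sketch in plain Python (no Spark required) of how one row is split into a label and a feature list. The helper name `split_label_features` is hypothetical and not part of the script above. It only mirrors the logic of `parse_point`, assuming the row is comma-separated as the `split(',')` call expects.

```python
def split_label_features(line, sep=','):
    """Hypothetical stand-in for parse_point that returns a plain tuple.

    The first value is treated as the label and the remaining values
    as the feature vector, matching LabeledPoint(values[0], values[1:]).
    """
    values = [float(x) for x in line.split(sep)]
    return values[0], values[1:]


# First row of the dataset, assumed to be comma-separated.
label, features = split_label_features('6,148,72,35,0,336,627,50,1')
print(label)     # 6.0
print(features)  # [148.0, 72.0, 35.0, 0.0, 336.0, 627.0, 50.0, 1.0]
```

With this layout the first column becomes the regression target and the other eight columns become the features, so the trained model should report eight weights.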