Skip to content

Instantly share code, notes, and snippets.

@krishnanraman
Last active April 6, 2016 17:48
Show Gist options
  • Save krishnanraman/6346307 to your computer and use it in GitHub Desktop.
Save krishnanraman/6346307 to your computer and use it in GitHub Desktop.
Trivia: Generate primes between 1 and 100 million with Scalding Map-reduce ( well, just Map :) thanks @squarecog
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char* argv) {
FILE * fp;
int i = 0;
fp = fopen ("numbers", "w+");
for(i = 1;i<100000000;i++) {
fprintf(fp, "%d\n", i);
}
fclose(fp);
return(0);
}
import com.twitter.scalding._
class PrimesMapReduce(args:Args) extends Job(args) {
val input = Tsv("numbers", 'number)
val output = Tsv("primes")
// sixTest : Every prime >3 can be expressed as 6n-1 or 6n+1. Handle the primes 2 or 3 separately
def isPrime(x:Int) = {
x match {
case 1 => false
case twoOrThree if (twoOrThree ==2 || twoOrThree == 3) => true
case sixTest if ((sixTest+1)%6 == 0 || (sixTest-1)%6 == 0) => (3 to math.sqrt(sixTest).toInt by 2).filter(d => sixTest%d==0).size == 0
case _ => false
}
}
input
.filter('number) {
x:Int => isPrime(x)
}
.write(output)
}
Trivia problem: Generate primes between 1 and 100 million with Scalding
Input: Numbers from 1 to 100 million, 1 per line - use a simple C program for this
Output: Filter input for primes using the filter function on pipes in Scalding
Execution time: 13:11:00 to 13:25:54 = 15 minutes in hdfs-local mode
Part files: 27
$ ls
part-00000 part-00002 part-00004 part-00006 part-00008 part-00010 part-00012 part-00014 part-00016 part-00018 part-00020 part-00022 part-00024 part-00026 part-00001 part-00003 part-00005 part-00007 part-00009 part-00011 part-00013 part-00015 part-00017 part-00019 part-00021 part-00023 part-00025
$ tail part-00026
99999787
99999821
99999827
99999839
99999847
99999931
99999941
99999959
99999971
99999989
$ more part-00000
2
3
5
7
11
13
17
19
23
29
31
Verify: Is 99999989 a prime ? http://primes.utm.edu/curios/page.php/99999989.html
How many primes between 1 and 100 million ?
$ wc -l part*
305004 part-00000
267880 part-00001
240917 part-00002
226206 part-00003
223190 part-00004
220649 part-00005
218586 part-00006
217030 part-00007
215116 part-00008
214170 part-00009
212899 part-00010
211929 part-00011
210716 part-00012
209982 part-00013
209057 part-00014
208377 part-00015
207515 part-00016
207115 part-00017
206321 part-00018
205839 part-00019
204994 part-00020
204775 part-00021
204086 part-00022
203646 part-00023
203174 part-00024
202923 part-00025
99359 part-00026
5761455 total
Only 5.76 million primes between 1 to 100 million. ( you could compute this with a Reduce step - Exercise for the motivated student with too much time on his hands )
@krishnanraman
Copy link
Author

wilf - No, 121 will not be on my list, nor will any other 6n-1 composite. Notice I use trial division upon all 6n-1 composites ( (3 to math.sqrt(sixTest).toInt by 2).filter(d => sixTest%d==0).size == 0 )
Only the primes will survive that.
In fact, lets look at the part file.
$ more part-00000
2
3
5
7
11
13
17
19
23
29
31
37
41
43
47
53
59
61
67
71
73
79
83
89
97
101
103
107
109
113
127
131
137
139

No 121.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment