Vitaly Pavlenko vpavlenko


The pyspark documentation doesn't include an example for the aggregateByKey RDD method. I didn't find any nice examples online, so I wrote my own.

Here's what the documentation does say:

aggregateByKey(self, zeroValue, seqFunc, combFunc, numPartitions=None)

Aggregate the values of each key, using given combine functions and a neutral "zero value". This function can return a different result type, U, than the type of the values in this RDD, V. Thus, we need one operation for merging a V into a U and one operation for merging two U's. The former operation is used for merging values within a partition, and the latter is used for merging values between partitions. To avoid memory allocation, both of these functions are allowed to modify and return their first argument instead of creating a new U.

reduceByKey and aggregateByKey are much more efficient than groupByKey and should be used for aggregations as much as possible.
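
To make the description above concrete, here is a minimal pure-Python sketch of the same semantics. It is not Spark code: `aggregate_by_key`, `seq_func`, `comb_func` and the hand-written `partitions` list are hypothetical stand-ins, chosen only to show where each function is applied. The example computes a per-key (sum, count) pair, so the result type U (a tuple) differs from the value type V (an int).

```python
def aggregate_by_key(partitions, zero_value, seq_func, comb_func):
    """Simulate the documented semantics on plain lists (an illustration, not Spark).

    partitions: list of lists of (key, value) pairs, one inner list per partition.
    zero_value: neutral starting accumulator of type U (kept immutable here,
                so sharing it between keys is safe in this sketch).
    seq_func:   merges a value V into an accumulator U (within a partition).
    comb_func:  merges two accumulators U (between partitions).
    """
    per_partition = []
    for part in partitions:
        acc = {}
        for key, value in part:
            # The first value of a key in a partition is merged into zero_value.
            acc[key] = seq_func(acc.get(key, zero_value), value)
        per_partition.append(acc)

    merged = {}
    for acc in per_partition:
        for key, u in acc.items():
            # Accumulators for the same key from different partitions are combined.
            merged[key] = comb_func(merged[key], u) if key in merged else u
    return merged


def seq_func(acc, value):
    # U is (running_sum, running_count); V is a number.
    return (acc[0] + value, acc[1] + 1)


def comb_func(left, right):
    return (left[0] + right[0], left[1] + right[1])


# Two hand-made "partitions" of (key, value) pairs.
partitions = [
    [("a", 1), ("b", 4)],
    [("a", 3), ("a", 5), ("b", 6)],
]

result = aggregate_by_key(partitions, (0, 0), seq_func, comb_func)
means = {key: total / count for key, (total, count) in result.items()}
print(result)
print(means)
```

With the signature quoted above, the corresponding call on a real pair RDD would presumably be `rdd.aggregateByKey((0, 0), seq_func, comb_func)`, with Spark deciding how the data is partitioned.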

@vpavlenko
vpavlenko / dabblet.css
Created February 19, 2014 07:47 — forked from basvasilich/dabblet.css
block elements are not aligned,
.wrapper
{
padding-left: 4em;
text-align: right;
}
.wrapper p
{
/*
block elements are not aligned,
@vpavlenko
vpavlenko / dabblet.css
Created February 19, 2014 07:42 — forked from basvasilich/dabblet.css
font-family != font.ttf,
body
{
/*
font-family != font.ttf,
but it can consist of a single font
don't forget the quotes
if the name contains a space
*/
@vpavlenko
vpavlenko / dabblet.css
Created February 19, 2014 05:43 — forked from basvasilich/dabblet.css
1em = 100%
/*
1em = 100%
1ex = about 1/2em, or the height of the letter x
*/
body
{
font-size: 5em;
}
.em-box {
@vpavlenko
vpavlenko / dabblet.css
Created February 19, 2014 05:39 — forked from basvasilich/dabblet.css
percentage values are usually computed
/*
percentage values are usually computed
relative to the parent
*/
html, body
{
height: 100%;
}
.main
@vpavlenko
vpavlenko / dabblet.css
Created February 19, 2014 05:35 — forked from basvasilich/dabblet.css
Untitled
body {
font-size: 36px;
}
.outer /* , .outer > div */ {
color: red; /* inline (text) property - inherited */
border: 2px solid black; /* block (box) property - not inherited */
}
@vpavlenko
vpavlenko / dabblet.css
Created February 19, 2014 05:21 — forked from basvasilich/dabblet.css
any styles for :visited
/*
any styles for :visited
are disabled in WebKit for
security reasons
*/
:visited
{
/* only color is allowed to change */
color: red;
text-decoration: none;
@vpavlenko
vpavlenko / dabblet.css
Created February 19, 2014 05:15 — forked from basvasilich/dabblet.css
Untitled
.parent li:first-child
{
border: 1px solid red;
}
.parent li:last-child
{
border: 1px solid red;
}
@vpavlenko
vpavlenko / dabblet.css
Created February 19, 2014 05:13 — forked from basvasilich/dabblet.css
the most common: all descendants
/* the most common: all descendants */
.parent .child
{
border: solid 3px red;
}
/*
any selector can be qualified
with element names: ol.parent li.child
*/
@vpavlenko
vpavlenko / dabblet.css
Created February 19, 2014 05:09 — forked from basvasilich/dabblet.css
two identical ids are invalid
/* Selectors. Class, ID */
body
{
font-size: 32px
}
#myid
{
color: red;