Reynold Xin rxin

/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*

I was curious about the results reported here, which claim that Scala's mutable maps are slower than Java's: http://www.infoq.com/news/2011/11/yammer-scala

In my tests, Scala's OpenHashMap equals or beats Java's HashMap:

Insertion time for 100k elements (String keys), in ms:

  • Scala HashMap: 92.75
  • Scala OpenHashMap: 14.03125
  • Java HashMap: 15.78125
@rxin
rxin / lzf-1k.dump
Created July 14, 2014 06:59
Compression codec buffer allocation
Every 1000 com.ning.compress.lzf.LZFOutputStream instances allocate 198976424 bytes.
Every 1000 org.xerial.snappy.SnappyOutputStream instances allocate 67660104 bytes.
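The two totals above are easier to compare per stream. A minimal sketch of that arithmetic, assuming the allocation is spread evenly across the 1000 instances (the dump does not say so); the helper name `per_instance_kib` is made up for illustration:

```python
# Totals copied from the two lines above; each covers 1000 stream instances.
INSTANCES = 1000
TOTAL_BYTES = {
    "com.ning.compress.lzf.LZFOutputStream": 198_976_424,
    "org.xerial.snappy.SnappyOutputStream": 67_660_104,
}


def per_instance_kib(total_bytes, instances=INSTANCES):
    """Average allocation per instance in KiB (assumes an even spread)."""
    return total_bytes / instances / 1024


for cls, total in TOTAL_BYTES.items():
    print(f"{cls}: {per_instance_kib(total):.1f} KiB per instance")

lzf = TOTAL_BYTES["com.ning.compress.lzf.LZFOutputStream"]
snappy = TOTAL_BYTES["org.xerial.snappy.SnappyOutputStream"]
print(f"LZF / Snappy ratio: {lzf / snappy:.2f}")
```

Under that assumption, each LZF stream accounts for roughly 194 KiB and each Snappy stream for roughly 66 KiB, a difference of a little under 3x.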
@rxin
rxin / snappy-framed-input-1k.dump
Created July 15, 2014 06:58
snappy framed vs non framed jmap
 num     #instances         #bytes  class name
----------------------------------------------
   1:          6443       80317016  [B
   2:         15102        1939472  <methodKlass>
   3:         15102        1749784  <constMethodKlass>
   4:          1029        1161976  <constantPoolKlass>
   5:          1029         959320  <instanceKlassKlass>
   6:           891         697312  <constantPoolCacheKlass>
   7:          3680         245496  [C
@rxin
rxin / gist:37554d9f3e2e93220884
Last active August 29, 2015 14:05
Scala @inline final val impact on accessors
scala> class A {
     |   private[this] val abc = 1
     |
     |   def print() = println(abc)
     | }
# abc gets inlined by the compiler as a field
[[syntax trees at end of cleanup]] // <console>
package $line4 {
@rxin
rxin / dstat
Created September 15, 2014 08:06
High sys usage with Transparent Huge Pages (THP) enabled
date/time | used buff cach free|usr sys idl wai hiq siq| read writ| recv send
15-09 04:55:05|52.6G 56.8M 176G 11.4G| 4 2 82 12 0 0| 527M 0 | 0 198B
15-09 04:55:06|52.6G 56.8M 176G 10.9G| 3 2 80 15 0 0| 542M 64k| 581B 36k
15-09 04:55:07|52.6G 56.8M 177G 10.4G| 3 1 82 13 0 0| 535M 0 | 0 0
15-09 04:55:08|52.6G 56.8M 177G 9.87G| 2 1 85 12 0 0| 520M 0 | 0 506B
15-09 04:55:09|52.6G 56.8M 178G 9558M| 2 1 84 12 0 0| 549M 0 | 260B 520B
15-09 04:55:10|52.6G 56.8M 179G 9009M| 3 1 82 14 0 0| 557M 0 | 104B 594B
15-09 04:55:11|52.6G 56.8M 179G 8463M| 3 2 83 13 0 0| 530M 72k| 104B 272B
15-09 04:55:12|52.6G 56.8M 180G 7940M| 3 2 81 15 0 0| 532M 0 | 200B 888B
15-09 04:55:13|52.6G 56.8M 180G 7417M| 3 2 82 12 0 0| 510M 0 | 0 198B
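To confirm whether a host is actually running with THP enabled, the active mode can be read from sysfs. A minimal sketch, assuming a Linux host that exposes the usual `/sys/kernel/mm/transparent_hugepage/enabled` file; the function names are hypothetical and not part of the original gist:

```python
from pathlib import Path

# Usual sysfs location on Linux kernels built with THP support (assumed here).
THP_ENABLED = Path("/sys/kernel/mm/transparent_hugepage/enabled")


def parse_thp_mode(text):
    """Return the active mode, which the kernel marks with square brackets.

    The file typically reads like: "[always] madvise never".
    """
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    raise ValueError(f"no active mode found in {text!r}")


def current_thp_mode(path=THP_ENABLED):
    """Return the active THP mode, or None if the sysfs file is absent."""
    try:
        return parse_thp_mode(path.read_text())
    except FileNotFoundError:
        return None
```

Calling `current_thp_mode()` on the affected machine returns `"always"`, `"madvise"`, or `"never"`, or `None` where the kernel does not expose the file.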
@rxin
rxin / gist:6be132f46b72c27d8f89
Created November 1, 2014 21:22
test.scala on constructor parameter shadowing
class LegalPerson(name: String) {
  def aaaaaaaaaa = name
}
class DoomedPerson(name: String) extends LegalPerson(name) {
  def curName = name
}
@rxin
rxin / UnsafeBenchmark.arrayTraversal
Created March 13, 2015 07:38
Unsafe vs primitive array traversal speed
# {method} 'arrayTraversal' '()J' in 'com/databricks/unsafe/util/benchmark/UnsafeBenchmark'
0x000000010a8c9ae0: callq 0x000000010a2165ee ; {runtime_call}
0x000000010a8c9ae5: data32 data32 nopw 0x0(%rax,%rax,1)
0x000000010a8c9af0: mov %eax,-0x14000(%rsp)
0x000000010a8c9af7: push %rbp
0x000000010a8c9af8: sub $0x30,%rsp
0x000000010a8c9afc: mov (%rsi),%r13d
0x000000010a8c9aff: mov 0x18(%rsi),%rbp
0x000000010a8c9b03: mov 0x8(%rsi),%rbx
0x000000010a8c9b07: mov %rsi,%rdi
In [1]: df = sqlContext.read.json("examples/src/main/resources/people.json")
In [2]: df.withColumn('a b', df.age)
Out[2]: DataFrame[age: bigint, name: string, a b: bigint]
In [3]: df.withColumn('a b', df.age).write.parquet('test-parquet.out')
15/06/03 01:14:56 ERROR InsertIntoHadoopFsRelation: Aborting job.
java.lang.RuntimeException: Attribute name "a b" contains invalid character(s) among " ,;{}() =". Please use alias to rename it.
at scala.sys.package$.error(package.scala:27)
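The error above asks for the column to be renamed before writing to Parquet. A minimal sketch of one way to compute safe names in plain Python, assuming an underscore is an acceptable replacement; the helper names are hypothetical, and treating tab and newline as invalid is an assumption beyond the characters visible in the message:

```python
import re

# Characters named in the error message (space , ; { } ( ) =),
# plus tab and newline as an assumption.
INVALID_CHARS = re.compile(r"[ ,;{}()\n\t=]")


def sanitize_column_name(name, replacement="_"):
    """Replace every character the Parquet writer rejected with `replacement`."""
    return INVALID_CHARS.sub(replacement, name)


def rename_plan(columns):
    """Return (old, new) pairs only for the columns that need renaming."""
    return [(c, sanitize_column_name(c)) for c in columns if INVALID_CHARS.search(c)]


# Applying the plan to a PySpark DataFrame would look like this
# (left as a comment so the block runs without a Spark installation):
#
#   for old, new in rename_plan(df.columns):
#       df = df.withColumnRenamed(old, new)

print(rename_plan(["age", "name", "a b"]))
```

For the session above, the only column in the plan is `a b`, which becomes `a_b`. Note that two distinct names can collapse to the same sanitized name, so a real pipeline may need a collision check.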
+----+-----+---+--------+---------+--------+---------+-------+-------+------+------+----+--------+--------+----+------+
|year|month|day|dep_time|dep_delay|arr_time|arr_delay|carrier|tailnum|flight|origin|dest|air_time|distance|hour|minute|
+----+-----+---+--------+---------+--------+---------+-------+-------+------+------+----+--------+--------+----+------+
|2013|    1|  1|   517.0|      2.0|   830.0|     11.0|     UA| N14228|  1545|   EWR| IAH|   227.0|    1400| 5.0|  17.0|
|2013|    1|  1|   533.0|      4.0|   850.0|     20.0|     UA| N24211|  1714|   LGA| IAH|   227.0|    1416| 5.0|  33.0|
|2013|    1|  1|   542.0|      2.0|   923.0|     33.0|     AA| N619AA|  1141|   JFK| MIA|   160.0|    1089| 5.0|  42.0|
|2013|    1|  1|   544.0|     -1.0|  1004.0|    -18.0|     B6| N804JB|   725|   JFK| BQN|   183.0|    1576| 5.0|  44.0|
|2013|    1|  1|   554.0|     -6.0|   812.0|    -25.0|     DL| N668DN|   461|   LGA| ATL|   116.0|     762| 5.0|  54.0|
+----+-----+---+--------+---------+--------+---------+-------+--