- If values are integers in [0, 255], Parquet will automatically compress them to 1-byte unsigned integers, decreasing the size of the saved DataFrame by a factor of 8.
- Partition DataFrames to have evenly-distributed, ~128MB partition sizes (empirical finding). Always err on the higher side w.r.t. number of partitions.
- Pay particular attention to the number of partitions when using `flatMap`, especially if the following operation will result in high memory usage. The `flatMap` op usually results in a DataFrame with a [much] larger number of rows, yet the number of partitions will remain the same. Thus, if a subsequent op causes a large expansion of memory usage (e.g. converting a DataFrame of indices to a DataFrame of large Vectors), the memory usage per partition may become too high. In this case, it is beneficial to repartition the output of `flatMap` to a number of partitions that will safely allow for appropriate partition memory sizes, based upon the
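
The partition-sizing tips above come down to simple arithmetic, sketched here in plain Python so it runs without a Spark cluster. The helper names (`partitions_needed`, `partitions_after_expansion`) and the per-row byte estimate are hypothetical and not part of any Spark API; the ~128MB target is the empirical figure from the notes. Treat it as a rough estimate under those assumptions, not a tuned recommendation.

```python
# Target partition size from the tip above (~128MB, an empirical figure).
TARGET_PARTITION_BYTES = 128 * 1024 * 1024


def partitions_needed(total_bytes, target_bytes=TARGET_PARTITION_BYTES):
    """Smallest partition count whose average partition size is <= target_bytes.

    Rounds up, so the result errs on the side of more partitions.
    """
    if total_bytes <= 0:
        return 1
    # Integer ceiling division avoids floating-point rounding on large sizes.
    return (total_bytes + target_bytes - 1) // target_bytes


def partitions_after_expansion(num_rows, bytes_per_row_after,
                               target_bytes=TARGET_PARTITION_BYTES):
    """Partition count for the data as it will look AFTER a memory-expanding op.

    bytes_per_row_after is an estimate supplied by the caller (assumption),
    e.g. 8 bytes * vector length for a dense vector of doubles.
    """
    return partitions_needed(num_rows * bytes_per_row_after, target_bytes)


# Example: flatMap yields 50 million rows of indices, and the next step turns
# each row into a dense vector of 1000 doubles (~8000 bytes per row).
n = partitions_after_expansion(50_000_000, 8 * 1000)
print(n)  # 2981
```

The resulting count would then be passed to `repartition` on the `flatMap` output before the memory-heavy step runs. Sizing by the expanded row size rather than the current one is the design choice here, since the partition count does not change on its own between the two ops.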
#!/bin/sh
# Configure homebrew permissions to allow multiple users on Mac OS X.
# Any user from the admin group will be able to manage the homebrew and cask installation on the machine.
# allow admins to manage homebrew's local install directory
chgrp -R admin /usr/local
chmod -R g+w /usr/local
# allow admins to manage homebrew's local cache of formulae and source files
chgrp -R admin /Library/Caches/Homebrew
<filetype binary="false" description="Logstash Config" name="Logstash Config">
  <highlighting>
    <options>
      <option name="LINE_COMMENT" value="#" />
      <option name="COMMENT_START" value="" />
      <option name="COMMENT_END" value="" />
      <option name="HEX_PREFIX" value="" />
      <option name="NUM_POSTFIXES" value="" />
      <option name="HAS_BRACES" value="true" />
      <option name="HAS_BRACKETS" value="true" />
- Install IntelliJ + Scala Plugin
- Don’t do the Coursera courses yet.
- Don’t do the “red book” Functional Programming in Scala yet.
- Do: http://underscore.io/books/
  - Essential Scala
  - Essential Play
  - Essential Slick
- Do Scala for the Impatient: https://www.amazon.com/Scala-Impatient-Cay-S-Horstmann/dp/0321774094
- The Neophyte’s Guide to Scala: http://danielwestheide.com/scala/neophytes.html