Hendry Suwanda hendrysuwanda
  • Depok, Jawa Barat - Indonesia

Quick Tips for Fast Code on the JVM

I was talking to a coworker recently about general techniques that almost always form the core of any effort to write very fast, down-to-the-metal hot path code on the JVM, and they pointed out that there really isn't a particularly good place to go for this information. It occurred to me that, really, I had more or less picked up all of it by word of mouth and experience, and there just aren't any good reference sources on the topic. So… here's my word of mouth.

This is by no means a comprehensive gist. It's also important to understand that the techniques I outline here are not absolute rules either. Performance on the JVM is an incredibly complicated subject, and while there are rules that almost always hold true, the "almost" remains very salient. Also, for many or even most applications, there will be other techniques that I'm not mentioning which will have a greater impact. JMH, Java Flight Recorder, and a good profiler are your very best friends! Mea…

ravibhure / git_rebase.md
Last active March 3, 2024 06:54
Git rebase from remote fork repo

In your local clone of your forked repository, you can add the original GitHub repository as a "remote". ("Remotes" are like nicknames for the URLs of repositories; origin is one, for example.) Then you can fetch all the branches from that upstream repository and rebase your work to continue working on the upstream version. In terms of commands, that might look like:

Add the remote, call it "upstream":

git remote add upstream https://github.com/whoever/whatever.git

Fetch all the branches of that remote into remote-tracking branches, such as upstream/master:

git fetch upstream
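The commands above stop just before the rebase itself. Below is a minimal, self-contained sketch of the whole flow, assuming the default branch is called `master` (as in `upstream/master` above). It runs against throwaway local repositories instead of GitHub, so the directory names `upstream`, `fork.git` and `work`, and the demo commit identity, are hypothetical stand-ins.

```shell
#!/bin/sh
# Demo of the fork + upstream rebase workflow in throwaway local repositories.
# "upstream", "fork.git" and "work" are hypothetical stand-ins for the original
# project, your fork, and your local clone of the fork.
set -eu

# Identity used only for the demo commits (assumption: none is configured).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Work in a fresh temporary directory (left in place for inspection).
tmp=$(mktemp -d)
cd "$tmp"

# The original project, with one commit on master.
git init -q upstream
git -C upstream symbolic-ref HEAD refs/heads/master
echo base > upstream/README
git -C upstream add README
git -C upstream commit -q -m "initial commit"

# Your fork (a bare copy, like the one on GitHub) and your local clone of it.
git clone -q --bare upstream fork.git
git clone -q fork.git work

# You commit work locally...
echo feature > work/feature.txt
git -C work add feature.txt
git -C work commit -q -m "my feature"

# ...while the original project moves ahead.
echo more > upstream/upstream.txt
git -C upstream add upstream.txt
git -C upstream commit -q -m "upstream change"

# The steps from the text: add the remote, fetch it, rebase onto it.
cd work
git remote add upstream "$tmp/upstream"
git fetch -q upstream
git rebase -q upstream/master

# Your commit now sits on top of the latest upstream commit.
git merge-base --is-ancestor upstream/master HEAD && echo "rebased onto upstream/master"
git --no-pager log --format=%s
```

Against a real fork you would run only the last three workflow commands (`git remote add`, `git fetch`, `git rebase`) with the real upstream URL; everything before them just builds the repositories the demo needs.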

epiphani / CDHTez.md
Last active February 14, 2024 08:03
Getting Tez enabled on CDH5.4+

So Hive in CDH is horribly, painfully slow. Cloudera ships Hive 1.1, which is actually moderately modern. It is, however, very badly configured out of the box and patched with custom code from Cloudera. With a bit of effort, we managed to improve Hive performance considerably. We really shouldn't have to do this, but Cloudera is actively working against supporting a performant Hive.

First, building Tez was fairly straightforward. Using the instructions at https://github.com/apache/tez/blob/master/docs/src/site/markdown/install.md, the only change was to use the version string "2.6.0" for the build. I believe that was the default. Don't use the CDH version string; it won't work.

At the bottom of the installation instructions, they mention that to use the local Hadoop jars (rather than those packaged with Tez), you must unpack the jars in HDFS rather than use the tarball. In this case, unpack the tez-minimal tarball and upload the contents to /apps/tez-0.7.0 (or whatever you prefer). Don't fo…