@henryr
henryr / screen-shot.png
Last active August 17, 2017 23:05
Screenshot of /backends tab
use tpch_parquet;
with subquery as (select /* +straight_join */ count(*) from lineitem a join /* +shuffle */
orders b on a.l_orderkey = b.o_orderkey where a.l_orderkey = b.o_orderkey and l_shipdate
between "1992-01-01" and "1994-01-01" and o_orderdate between "1992-01-01" and
"1994-01-01" group by a.l_orderkey having count(*) > 9999999999) select * from (select *
from subquery) a1, (select * from subquery) a2, (select * from subquery) a3, (select *
from subquery) a4, (select * from subquery) a5, (select * from subquery) a6, (select *
from subquery) a7, (select * from subquery) a8, (select * from subquery) a9, (select *
from subquery) a10, (select * from subquery) a11, (select * from subquery) a12, (select *
from subquery) a13, (select * from subquery) a14, (select * from subquery) a15, (select *
use tpch_300_parquet;
with subquery as (select /* +straight_join */ count(*) from lineitem a join /* +shuffle */
orders b on a.l_orderkey = b.o_orderkey where a.l_orderkey = b.o_orderkey and l_shipdate
between "1992-01-01" and "1994-01-01" and o_orderdate between "1992-01-01" and
"1994-01-01" group by a.l_orderkey having count(*) > 9999999999)
select * from
(select * from subquery) a1, (select * from subquery) a2, (select * from subquery) a3,
(select * from subquery) a4, (select * from subquery) a5, (select * from subquery) a6,
(select * from subquery) a7, (select * from subquery) a8, (select * from subquery) a9,
(select * from subquery) a10, (select * from subquery) a11, (select * from subquery) a12,
(select * from subquery) a13, (select * from subquery) a14, (select * from subquery) a15,
(select * from subquery) a16, (select * from subquery) a17, (select * from subquery) a18,
(select * from subquery) a19, (select * from subquery) a20, (select * from subquery) a21,
(select * from subquery) a22, (select * fro
=============================================================================
== Baseline KRPC
== many_independent_fragments takes 378s
- 13.12% impalad impalad [.] tcmalloc::CentralFreeList::FetchFromOneSpans(int, void**, void**)
   - tcmalloc::CentralFreeList::FetchFromOneSpans(int, void**, void**)
      - 93.95% tcmalloc::CentralFreeList::RemoveRange(void**, void**, int)
         - tcmalloc::ThreadCache::FetchFromCentralCache(unsigned long, unsigned long)
            - 98.16% operator new[](unsigned long)
                 29.20% impala::RowDescriptor::RowDescriptor(impala::RowDescriptor const&)
@henryr
henryr / pre-commit
Last active September 19, 2016 21:59
#!/bin/bash
# If this is an amend, not a new commit, need to take diff against HEAD~1, not HEAD, to do
# formatting.
RELATIVE_COMMIT="HEAD"
GIT_CMD="$(ps -ocommand= -p $PPID)"
if [[ ${GIT_CMD} == *"--amend"* ]]
then
RELATIVE_COMMIT="HEAD~1"
fi
@henryr
henryr / gist:8655574
Created January 27, 2014 19:22
External consistency?
The claim for 'external consistency' is as follows (all quotes are from the journal paper):
"external-consistency invariant: if the start of a transaction T2 occurs after the commit of a transaction T1, then the commit time-stamp of T2 must be greater than the commit timestamp of T1."
But when assigning a commit timestamp, section 4.2.1 has:
"The commit timestamp s must be greater than or equal to all pre- pare timestamps (to satisfy the constraints discussed in Section 4.1.3), greater than TT.now().latest at the time the coordinator received its commit message, and greater than any timestamps the leader has assigned to previous transactions (again, to pre- serve monotonicity)."
This led me down the following path:
1. Is TT.now().latest monotonically increasing? Presumably not, otherwise the third requirement (that the timestamp be greater than any previously assigned one) would be implied; besides, epsilon is not monotonically increasing.
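The three constraints from section 4.2.1 can be sketched together. This is a hypothetical illustration, not Spanner's code: the class, the `assign` method, and the `ttNowLatest` parameter (standing in for TT.now().latest at commit-message receipt) are all assumptions.

```java
import java.util.List;

// Sketch of the commit-timestamp rule: s >= all prepare timestamps,
// s > TT.now().latest when the commit message arrived, and s > any
// timestamp this leader previously assigned (monotonicity).
public class CommitTimestamp {
  // Last timestamp assigned by this (hypothetical) leader.
  private long lastAssigned = 0;

  public long assign(List<Long> prepareTimestamps, long ttNowLatest) {
    long s = ttNowLatest + 1;            // strictly greater than TT.now().latest
    for (long p : prepareTimestamps) {
      s = Math.max(s, p);                // >= every prepare timestamp
    }
    s = Math.max(s, lastAssigned + 1);   // strictly greater than previous assignments
    lastAssigned = s;
    return s;
  }

  public static void main(String[] args) {
    CommitTimestamp leader = new CommitTimestamp();
    System.out.println(leader.assign(List.of(5L, 9L), 7L)); // 9: prepare ts dominates
    System.out.println(leader.assign(List.of(3L), 4L));     // 10: monotonicity dominates
  }
}
```

Note how the second call shows why the third rule is not implied by the second: TT.now().latest alone would have allowed 5, below the previously assigned 9.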
import java.io.ByteArrayOutputStream;

public class TestByteArray {
  static byte[] chunk = new byte[1024 * 1024];

  public static void main(String[] args) {
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    int numChunks = 2 * 1024 * 1024;
    for (int i = 0; i < numChunks; ++i) {
      long start = System.currentTimeMillis();
      // Completion (the gist is truncated here): time each 1 MiB write;
      // writes that force the backing array to regrow stand out.
      baos.write(chunk, 0, chunk.length);
      long elapsed = System.currentTimeMillis() - start;
      if (elapsed > 0) System.out.println("chunk " + i + ": " + elapsed + " ms");
    }
  }
}
Let's say this: an operation is durable if its effects persist following any sequence of crash-restart failures, including a total restart. Note that this implies the entire cluster is available after the failure sequence; we can weaken this to "sufficiently many" (e.g. N/2+1) nodes being available after the failure sequence if we want.
Meeting this requirement naturally requires persistent storage, because an in-memory-only approach can't survive a total restart. Since the failures may not be staggered, this also rules out anti-entropy-style dissemination in which restarted nodes are told about already-committed operations.
Do we want to propose a parameterised form of durability, where we tolerate up to F crash-restart faults before sacrificing durability? Maybe: then you'd be able to achieve such durability by guaranteeing that data were written to F+1 nodes (i.e. the DF formulation). Or if you didn't care much about the recency of the version available after F failures, you could have a very weak requirem
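The parameterised form can be sketched as follows. This is a toy illustration of the F+1 intuition, not any real system: `Node`, its `persist` method, and the `write` helper are all assumptions; persistence is modelled as a list that survives crash-restart.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: an operation is durable against f crash-restart faults once it is
// persisted on f + 1 nodes, since any f failures leave at least one copy.
public class DurableWrite {
  // Stand-in for a storage node with persistent media.
  static class Node {
    final List<String> disk = new ArrayList<>(); // survives crash-restart
    void persist(String op) { disk.add(op); }
  }

  // Persist op on f + 1 nodes; true once enough copies exist.
  static boolean write(List<Node> cluster, String op, int f) {
    if (cluster.size() < f + 1) return false;    // not enough replicas
    int acks = 0;
    for (Node n : cluster) {
      n.persist(op);
      if (++acks == f + 1) return true;          // durable against f faults
    }
    return false;
  }

  public static void main(String[] args) {
    List<Node> cluster = List.of(new Node(), new Node(), new Node());
    System.out.println(write(cluster, "put k=v", 1)); // true: on 2 of 3 nodes
  }
}
```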
# [thread 140550751581952 also had an error]
[thread 140550776760064 also had an error]
[thread 140550785152768 also had an error]
[thread 140550759974656 also had an error]
[thread 140550734796544 also had an error]
[thread 140550676047616 also had an error]
C [impalad+0x985df7] impala::HdfsOp::Execute() const+0x119
#
# An error report file with more information is saved as:
# /home/henry/src/cloudera/impala/hs_err_pid14403.log