2010-09-29 10:28:27
Full thread dump Java HotSpot(TM) 64-Bit Server VM (17.0-b16 mixed mode):
"org.apache.hadoop.hdfs.server.datanode.DataBlockScanner@1278dc4c" daemon prio=10 tid=0x0000000050e67000 nid=0x153b waiting on condition [0x0000000042a7b000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
        at java.lang.Thread.sleep(Native Method)
        at org.apache.hadoop.hdfs.server.datanode.DataBlockScanner.run(DataBlockScanner.java:625)
        at java.lang.Thread.run(Thread.java:619)
"org.apache.hadoop.hdfs.server.datanode.DataXceiverServer@406754d6" daemon prio=10 tid=0x0000000050e64800 nid=0x1537 runnable [0x000000004297a000]
2010-09-29 11:03:14,157 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.registerDatanode: node registration from 127.0.0.1:50010 storage DS-622878021-127.0.0.1-50010-1285783394153
2010-09-29 11:03:14,160 INFO org.apache.hadoop.net.NetworkTopology: Adding a new node: /default-rack/127.0.0.1:50010
2010-09-29 11:03:40,476 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: ugi=root,root,bin,daemon,sys,adm,disk,wheel ip=/127.0.0.1 cmd=create src=/user/root/core-site.xml dst=null perm=root:supergroup:rw-r--r--
2010-09-29 11:03:40,486 WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Not able to place enough replicas, still in need of 1(excluded: 127.0.0.1:50010)
2010-09-29 11:03:40,488 INFO org.apache.hadoop.ipc.Server: IPC Server handler 9 on 8020, call addBlock(/user/root/core-site.xml, DFSClient_1907075850, null) from 127.0.0.1:39665: error: java.io.IOException: File /user/root/core-site.xml could only be replicated to 0 nodes, instead of 1
java.io.IOException
// Build an Avro HTTP transceiver pointing at the local test server, then obtain a
// SpecificRequestor proxy for the NozzleIPC protocol.
URL url = new URL("http://localhost:" + server.getPort());
xceiver = new HttpTransceiver(url);
proxy = (NozzleIPC) SpecificRequestor.getClient(NozzleIPC.class, xceiver);
henryr / gist:1977470
Created March 5, 2012 08:28
HDFS-2834 test code
#include <time.h>

/* Returns the current CLOCK_MONOTONIC reading in nanoseconds. */
long get_time() {
  struct timespec tp;
  clock_gettime(CLOCK_MONOTONIC, &tp);
  return (long)((tp.tv_sec * 1000000000L) + tp.tv_nsec);
}
#include "../hadoop-common/hadoop-hdfs-project/hadoop-hdfs
hive> create table oh_hive(col int) partitioned by (part int);
OK
Time taken: 0.091 seconds
hive> insert into table oh_hive partition(part) select NULL, NULL from functional.alltypes limit 1;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Loading data to table hank.oh_hive partition (part=null)
Loading partition {part=__HIVE_DEFAULT_PARTITION__}
Partition hank.oh_hive{part=__HIVE_DEFAULT_PARTITION__} stats: [num_files: 1, num_rows: 0, total_size: 3, raw_data_size: 0]
Table hank.oh_hive stats: [num_partitions: 1, num_files: 1, num_rows: 0, total_size: 3, raw_data_size: 0]
#
[thread 140550751581952 also had an error]
[thread 140550776760064 also had an error]
[thread 140550785152768 also had an error]
[thread 140550759974656 also had an error]
[thread 140550734796544 also had an error]
[thread 140550676047616 also had an error]
C [impalad+0x985df7] impala::HdfsOp::Execute() const+0x119
#
# An error report file with more information is saved as:
# /home/henry/src/cloudera/impala/hs_err_pid14403.log
Let's say this: an operation is durable if its effects persist following any sequence of crash-restart failures, including a total restart. Note that this assumes the entire cluster becomes available again after the failure sequence; we can weaken this to requiring only that "sufficiently many" (e.g. N/2+1) nodes are available after the failure sequence if we want.
Meeting this requirement naturally requires persistent storage, because any in-memory-only approach can't survive a total restart. Since the failures may not be staggered, this also rules out anti-entropy-style dissemination, where restarted nodes are told about already-committed operations.
Do we want to propose a parameterised form of durability, where we tolerate up to F crash-restart faults before sacrificing durability? Maybe: then you'd be able to achieve such durability by guaranteeing that data were written to F+1 nodes (i.e. the DF formulation). Or, if you didn't care much about the recency of the version available after F failures, you could have a very weak requirement.
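To make the F+1 formulation concrete, here is a minimal sketch (mine, not from the note above) of acknowledging a write only once it has been persisted on F+1 nodes, so that any F crash-restart faults still leave at least one node holding a durable copy. The Replica type and its persist() method are hypothetical stand-ins for whatever fsync-backed write path the real system has:

import java.util.List;

// Sketch of the F+1 acknowledgement rule; Replica and persist() are hypothetical.
class DurableWriter {
  interface Replica {
    boolean persist(byte[] record);  // assumed to return true only once the bytes are on disk
  }

  private final List<Replica> replicas;
  private final int f;  // number of crash-restart faults to tolerate

  DurableWriter(List<Replica> replicas, int f) {
    this.replicas = replicas;
    this.f = f;
  }

  // Acknowledge a write only after f+1 replicas have persisted it, so any f
  // simultaneous crash-restarts still leave at least one durable copy.
  boolean write(byte[] record) {
    int persisted = 0;
    for (Replica r : replicas) {
      if (r.persist(record) && ++persisted >= f + 1) {
        return true;
      }
    }
    return false;  // fewer than f+1 durable copies: do not acknowledge
  }
}

A real implementation would issue the writes in parallel and apply timeouts, but the acknowledgement condition (f+1 persisted copies before the client sees success) is the point.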
import java.io.ByteArrayOutputStream;

public class TestByteArray {
  static byte[] chunk = new byte[1024 * 1024];

  public static void main(String[] args) {
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    int numChunks = 2 * 1024 * 1024;
    for (int i = 0; i < numChunks; ++i) {
      long start = System.currentTimeMillis();
      // Each append may copy-and-double the backing array; the run hits the ~2GB byte[] limit well before finishing.
      baos.write(chunk, 0, chunk.length);
      System.out.println("write " + i + ": " + (System.currentTimeMillis() - start) + " ms");
    }
  }
}
henryr / gist:8655574
Created January 27, 2014 19:22
External consistency?
The claim for 'external consistency' is as follows (all quotes are from the journal paper):
"external-consistency invariant: if the start of a transaction T2 occurs after the commit of a transaction T1, then the commit timestamp of T2 must be greater than the commit timestamp of T1."
But when assigning a commit timestamp, section 4.2.1 has:
"The commit timestamp s must be greater than or equal to all prepare timestamps (to satisfy the constraints discussed in Section 4.1.3), greater than TT.now().latest at the time the coordinator received its commit message, and greater than any timestamps the leader has assigned to previous transactions (again, to preserve monotonicity)."
This led me down the following path:
1. Is TT.now().latest monotonically increasing? Presumably not, otherwise the third requirement (about being larger than any previous transaction) would be implicit, and epsilon is not monotonically increasing either.
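As a toy restatement of the rule quoted from 4.2.1 (my sketch, not Spanner's code; the class and parameter names below are invented for illustration), the three constraints amount to taking a maximum on each leader:

// Toy sketch of the commit-timestamp rule quoted from section 4.2.1; all names
// here are invented for illustration, not Spanner's actual API.
class CommitTimestampPicker {
  private long lastAssigned = 0;  // largest timestamp this leader has handed out so far

  // s must be >= every prepare timestamp, > TT.now().latest at the time the
  // coordinator received the commit message, and > any previously assigned
  // timestamp ("+ 1" stands in for "strictly greater" on discrete timestamps).
  synchronized long chooseCommitTimestamp(long maxPrepareTimestamp, long ttNowLatestAtCommit) {
    long s = Math.max(maxPrepareTimestamp,
        Math.max(ttNowLatestAtCommit + 1, lastAssigned + 1));
    lastAssigned = s;
    return s;
  }
}

Written this way, the lastAssigned term is what carries per-leader monotonicity if TT.now().latest can itself move backwards between calls, which is exactly the question in (1).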
com.cloudera.impala.planner.PlannerTest.testJoinOrder
Failing for the past 1 build (since build #594)
Took 0.21 sec.
Error Message
section PLAN of query:
select
n_name,