
Let's say this: an operation is durable if its effects persist following any sequence of crash-restart failures, including a total restart. Note that this implies the entire cluster is available after the failure sequence; we can weaken this to requiring that "sufficiently many" (e.g. N/2+1) nodes are available after the failure sequence if we want.
Meeting this requirement naturally requires persistent storage, because any in-memory-only approach can't survive a total restart. Since the failures may not be staggered, this also rules out anti-entropy-style dissemination, where restarted nodes are told about already-committed operations.
Do we want to propose a parameterised form of durability, where we tolerate up to F crash-restart faults before sacrificing durability? Maybe: then you'd be able to achieve such durability by guaranteeing that data were written to F+1 nodes (i.e. the DF formulation). Or, if you didn't care much about the recency of the version available after F failures, you could have a very weak requirement.
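To make the F+1 formulation concrete, here is a minimal sketch (the Replica interface, appendAndSync, and DurableWriter names are hypothetical, not from these notes): the coordinator acknowledges a write only after F+1 replicas report that the record has been forced to persistent storage, so any F crash-restart faults leave at least one on-disk copy.

import java.util.List;

// Hypothetical sketch: acknowledge a write only once F+1 replicas confirm
// the record has been forced to persistent storage (fsync). With at most F
// crash-restart faults, at least one replica still holds the record on disk
// after any failure sequence, including a total restart.
interface Replica {
    // Returns true once the record is durably on this replica's disk.
    boolean appendAndSync(byte[] record);
}

class DurableWriter {
    private final List<Replica> replicas;
    private final int f; // number of crash-restart faults to tolerate

    DurableWriter(List<Replica> replicas, int f) {
        if (replicas.size() < f + 1) {
            throw new IllegalArgumentException("need at least F+1 replicas");
        }
        this.replicas = replicas;
        this.f = f;
    }

    // Blocks until F+1 replicas have persisted the record, then reports success.
    boolean write(byte[] record) {
        int persisted = 0;
        for (Replica r : replicas) {
            if (r.appendAndSync(record)) {
                persisted++;
            }
            if (persisted >= f + 1) {
                return true; // durable under up to F crash-restart faults
            }
        }
        return false; // could not reach F+1 acknowledgements
    }
}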
# [thread 140550751581952 also had an error]
[thread 140550776760064 also had an error]
[thread 140550785152768 also had an error]
[thread 140550759974656 also had an error]
[thread 140550734796544 also had an error]
[thread 140550676047616 also had an error]
C [impalad+0x985df7] impala::HdfsOp::Execute() const+0x119
#
# An error report file with more information is saved as:
# /home/henry/src/cloudera/impala/hs_err_pid14403.log
hive> create table oh_hive(col int) partitioned by (part int);
OK
Time taken: 0.091 seconds
hive> insert into table oh_hive partition(part) select NULL, NULL from functional.alltypes limit 1;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Loading data to table hank.oh_hive partition (part=null)
Loading partition {part=__HIVE_DEFAULT_PARTITION__}
Partition hank.oh_hive{part=__HIVE_DEFAULT_PARTITION__} stats: [num_files: 1, num_rows: 0, total_size: 3, raw_data_size: 0]
Table hank.oh_hive stats: [num_partitions: 1, num_files: 1, num_rows: 0, total_size: 3, raw_data_size: 0]
henryr / gist:1977470
Created March 5, 2012 08:28
HDFS-2834 test code
#include <time.h>

/* Returns the current CLOCK_MONOTONIC reading in nanoseconds. */
long get_time() {
  struct timespec tp;
  clock_gettime(CLOCK_MONOTONIC, &tp);
  /* Cast before multiplying so the seconds-to-nanoseconds conversion
     can't overflow on platforms where time_t is 32 bits. */
  return (long)(((long long)tp.tv_sec * 1000000000LL) + tp.tv_nsec);
}
#include "../hadoop-common/hadoop-hdfs-project/hadoop-hdfs
// Connect an Avro SpecificRequestor client over HTTP to the local server
// (NozzleIPC is the protocol interface defined elsewhere in this test).
URL url = new URL("http://localhost:" + server.getPort());
HttpTransceiver xceiver = new HttpTransceiver(url);
NozzleIPC proxy = (NozzleIPC) SpecificRequestor.getClient(NozzleIPC.class, xceiver);
2010-09-29 11:03:14,157 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.registerDatanode: node registration from 127.0.0.1:50010 storage DS-622878021-127.0.0.1-50010-1285783394153
2010-09-29 11:03:14,160 INFO org.apache.hadoop.net.NetworkTopology: Adding a new node: /default-rack/127.0.0.1:50010
2010-09-29 11:03:40,476 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: ugi=root,root,bin,daemon,sys,adm,disk,wheel ip=/127.0.0.1 cmd=create src=/user/root/core-site.xml dst=null perm=root:supergroup:rw-r--r--
2010-09-29 11:03:40,486 WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Not able to place enough replicas, still in need of 1(excluded: 127.0.0.1:50010)
2010-09-29 11:03:40,488 INFO org.apache.hadoop.ipc.Server: IPC Server handler 9 on 8020, call addBlock(/user/root/core-site.xml, DFSClient_1907075850, null) from 127.0.0.1:39665: error: java.io.IOException: File /user/root/core-site.xml could only be replicated to 0 nodes, instead of 1
java.io.IOException
2010-09-29 10:28:27
Full thread dump Java HotSpot(TM) 64-Bit Server VM (17.0-b16 mixed mode):
"org.apache.hadoop.hdfs.server.datanode.DataBlockScanner@1278dc4c" daemon prio=10 tid=0x0000000050e67000 nid=0x153b waiting on condition [0x0000000042a7b000]
java.lang.Thread.State: TIMED_WAITING (sleeping)
at java.lang.Thread.sleep(Native Method)
at org.apache.hadoop.hdfs.server.datanode.DataBlockScanner.run(DataBlockScanner.java:625)
at java.lang.Thread.run(Thread.java:619)
"org.apache.hadoop.hdfs.server.datanode.DataXceiverServer@406754d6" daemon prio=10 tid=0x0000000050e64800 nid=0x1537 runnable [0x000000004297a000]