This is a walkthrough of the Error Values proposal to confirm whether the proposal is functional enough to fix issue 18183.
As described in issue 18183, the core of the issue is error observability for a function that interacts with multiple external processes concurrently, because the returned error values may be mapped from faults in those external processes. Without error observability, it's hard to understand what the error values really mean and what the circumstances really are.
- The newly proposed features, inspection and printing, work fine.
- No new exposed API. The net package of the standard library seems to be in a metastable condition, and it's better to avoid adding new API right away until we draw the shape of a new transport-layer API that can accommodate various upcoming features from secure and multipath-capable transport technologies.
This walkthrough does not try to validate the proposal from the user experience point of view.
As a straightforward approach for this walkthrough, we may introduce a new internal type that holds multiple error values and return a list of error values, like the following:
	// An errorList holds multiple error values.
	type errorList []error

	// Error implements the Error method of built-in error interface.
	func (el errorList) Error() string {
		if len(el) == 0 {
			return "<nil>"
		}
		return fmt.Sprint(el)
	}

	// Unwrap implements the Unwrap method of errors.Wrapper interface.
	func (el errorList) Unwrap() error {
		if len(el) == 0 {
			return nil
		}
		return el[1:]
	}

	// Format implements the Format method of errors.Formatter interface.
	func (el errorList) Format(p errors.Printer) error {
		if len(el) == 0 {
			return nil
		}
		p.Print(el[0])
		return el[1:]
	}
	// An example of a modified Dial function:
	func Dial(network, address string) (Conn, error) {
		var el errorList
		// Dial runs multiple dial racers, which discover the
		// destination and service identifiers and try to
		// establish connections concurrently. It eventually
		// faces a list of errors when all attempts fail.
		for ... {
			// Receive a result of discovery and connection setup
			// racing.
			if err != nil {
				el = append(el, err)
				continue
			}
			// Cancel all inflight racers and return the
			// winning connection c.
			return c, nil
		}
		return nil, &OpError{Err: el}
	}
To satisfy real-world use cases, a verification test case should include at least two different error values caused by two different faults: a DNS fault and an IP transport fault. In the near future, we will probably face somewhat more complicated circumstances, for example, living in a walled garden that provides IPv4 transport for internal use only and IPv6 transport for everywhere, with external IPv4 connectivity provided by some IPv6 transition mechanism. DNS is still active and also carries a heavy burden for WebPKI bootstrapping and DANE/RPKI, and various failures caused by operational faults, such as lame delegation and PMTU misconfiguration for DNS transport, still exist.
	&OpError{
		Op:  "dial",
		Net: "tcp",
		Err: errorList{
			&DNSError{
				// transfers an error caused by
				// a misconfiguration on a recursive or
				// authoritative server
			},
			&OpError{
				Err: &DNSError{
					// transfers an error caused by
					// a misconfiguration on an IP
					// transport for DNS message exchange
				},
			},
			&OpError{
				Err: &os.SyscallError{
					// transfers an error caused by some
					// IP routing or transport fault
				},
			},
			&OpError{
				Err: errors.New(
					// of course, the package is full of bugs
				),
			},
		},
	}
As expected, users can display a list of error values and also extract each error value from the returned error value using the errors.Wrapper interface and the errors.Is and errors.As functions.
It's undesirable that users need to implement their own inspection functionality when they want to localize or diagnose faults using error values, but that's unavoidable because the net package of the standard library cannot guarantee that all of the error values are platform-agnostic.
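As an illustration of what such user-side inspection might look like, here is a small sketch that walks an error tree and labels each fault. The names fault, diagnose and sampleError are hypothetical, the classification rules are deliberately crude, and errors.Join (Go 1.20+) stands in for the internal list type. The syscall.Errno case is exactly the platform-dependent part mentioned above.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"os"
	"syscall"
)

// fault is a hypothetical record produced by user-side inspection.
type fault struct {
	Kind string // "dns", "transport" or "unknown"
	Err  error
}

// diagnose walks err depth-first, following both Unwrap() error and
// Unwrap() []error, and classifies every fault it can recognize.
func diagnose(err error) []fault {
	switch x := err.(type) {
	case nil:
		return nil
	case *net.DNSError:
		return []fault{{Kind: "dns", Err: x}}
	case syscall.Errno: // platform-specific error numbers
		return []fault{{Kind: "transport", Err: x}}
	case interface{ Unwrap() []error }:
		var out []fault
		for _, e := range x.Unwrap() {
			out = append(out, diagnose(e)...)
		}
		return out
	case interface{ Unwrap() error }:
		if inner := x.Unwrap(); inner != nil {
			return diagnose(inner)
		}
	}
	return []fault{{Kind: "unknown", Err: err}}
}

// sampleError builds an illustrative Dial failure with three faults.
func sampleError() error {
	return &net.OpError{Op: "dial", Net: "tcp", Err: errors.Join(
		&net.DNSError{Err: "no such host", Name: "example.com", IsNotFound: true},
		&net.OpError{Op: "dial", Net: "tcp",
			Err: os.NewSyscallError("connect", syscall.ENETUNREACH)},
		errors.New("internal bug"),
	)}
}

func main() {
	for _, f := range diagnose(sampleError()) {
		fmt.Printf("%-9s %v\n", f.Kind, f.Err)
	}
}
```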
To mitigate the pain of maintaining their own inspection functionality, it might be better to consider adding more contextual information to the existing exposed error types such as DNSError.
The implementation doesn't expose any new API, and it's possible to replace the internal type with a more efficient one when necessary.
On 11 October 2018, ICANN will execute the first root zone KSK rollover plan. In some circumstances, that might cause DNS failures due to a misconfiguration on a recursive or authoritative server. In such circumstances, the existing Dial function returns only
	// See issue 18183
	&OpError{
		Op:   "dial",
		Net:  "tcp",
		Addr: &TCPAddr{/* IPv6 address */},
		Err:  &os.SyscallError{/* no route or unreachable */}, // should be &DNSError{/* no record or timeout */}
	}
and might mislead users into fixing unrelated IPv6 routing or transport.
With the proposal of Error Values and modifications to the net package of the standard library, such misleading information can be prevented.