In this paper, the authors discuss "often silent" corrupt execution errors (CEEs), which cause ephemeral computational errors for a class of computations. They observe that a small (valid) code change that makes heavier use of rarely used instructions can lead to large shifts in reliability. These errors stem from manufacturing defects that were not detected during manufacturing tests and cannot always be mitigated by microcode updates. An affected core can give incorrect results for some inputs, and its errors can be obscured by undiagnosed software bugs. The authors term a core that develops such behavior mercurial. They observe on the order of a few mercurial cores per several thousand machines. The rate seen by their automatic detector is gradually increasing, but they don't know whether this reflects a change in the underlying rate.
CEEs may be detected nearly immediately through self-checks, exceptions, or segfaults. In other cases, wrong answers are detected early, detected too late in the computation, or never detected at all. Bad me