Comments on https://pubs.rsna.org/doi/10.1148/radiol.222176
-
The study looks at how a faked “AI assistant” affects the judgment of radiologists evaluating mammograms with a suspicious lesion. The radiologists rate each mammogram on a scale with these values: 2 -> benign; 3 -> probably benign; 4 -> suspicious; 5 -> highly suggestive of malignancy.
-
They extracted a set of 50 mammograms from a big archive. Either those mammograms had already been scored, or they had an experienced radiologist score them afresh (not sure which). I'm going to call this the pre-experiment score. In either case, the appropriateness of the score was confirmed. It's not clear, but I think it went like this: for "suspicious" and "highly suggestive", they only included mammograms for which a biopsy actually found a malignancy; for "benign" and "probably benign", they only included mammograms if four years had since passed without any problem being discovered.
-
The participants were shown the image, an AI score, and asked to produce