BoneXpert knows its own limitations

A new paper documents BoneXpert’s accuracy and self-validation

As AI systems for radiology become increasingly autonomous, it becomes ever more important that they know their own limitations. BoneXpert is a prime example of autonomous AI: most commonly, the radiologist accepts BoneXpert’s rating and only examines the hand radiograph for other findings, such as signs of abnormalities indicating an underlying disorder. A recent survey showed that 83% of BoneXpert users either never override the automated rating or override it in less than 5% of cases. The system can even be used to replace the radiologist’s viewing of the image entirely.

When radiologists hand over the quantitative interpretation to the AI, they assume that the system interprets accurately. They are perhaps less aware that they also effectively assume the system will refuse to interpret an image that lies outside its domain of validity.

The problem, however, is that this aspect has been neglected by most AI radiology vendors.

The new paper, published in Scientific Reports, focuses on this important aspect. It describes how BoneXpert’s self-validation mechanism was designed not only to validate the image as a whole but also to validate that each bone is properly recognised. BoneXpert gives visual feedback on the analysis of each bone, making the validation transparent, or “explained”. BoneXpert rejects only 0.4% of valid hand X-rays. Its accuracy (0.33 y) is far better than human accuracy (0.58 y), and the system makes 12 times fewer significant errors (>1.5 y) than a human rater, showing that the self-validation effectively curbs the tail of the error distribution.
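The two-level acceptance logic described above can be sketched in a few lines of code. This is purely an illustrative sketch, not BoneXpert’s actual implementation: the per-bone model score, the thresholds, and the minimum number of validated bones are all hypothetical placeholders standing in for the system’s internal criteria.

```python
# Hypothetical sketch of bone-by-bone self-validation: each bone's
# agreement with an internal shape model is scored, bones scoring below
# a threshold are rejected individually, and the image as a whole is
# rejected if too few bones validate. All numbers here are made up.

from dataclasses import dataclass

@dataclass
class BoneFit:
    name: str           # e.g. "metacarpal 2"
    model_score: float  # agreement with the internal bone model, 0..1 (hypothetical scale)

def validate_bones(fits, bone_threshold=0.8, min_valid_bones=2):
    """Return (accepted bone names, rejected bone names, image-level verdict)."""
    accepted = [f.name for f in fits if f.model_score >= bone_threshold]
    rejected = [f.name for f in fits if f.model_score < bone_threshold]
    image_is_valid = len(accepted) >= min_valid_bones
    return accepted, rejected, image_is_valid

# Example: a pollicised index finger would fit the thumb model poorly,
# so the thumb is rejected while the rest of the hand still validates.
fits = [BoneFit("radius", 0.95),
        BoneFit("ulna", 0.93),
        BoneFit("thumb distal phalanx", 0.40)]
accepted, rejected, ok = validate_bones(fits)
```

The point of the per-bone verdict is transparency: the doctor sees exactly which bones were rejected and why the overall result can still be trusted, rather than receiving a single opaque number.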

To illustrate BoneXpert’s bone-by-bone self-validation, consider the case of a girl (RSNA image 4210). She underwent surgery to move her index finger to the thumb position because her own thumb was absent or non-functional. The second metacarpal was removed in the process.

BoneXpert refused to recognise the thumb, because its bones do not agree with BoneXpert’s internal model (i.e. its knowledge) of what a thumb looks like.

(Technically, this is a pollicised index finger: pollicis is the genitive of pollex, Latin for thumb.)

A typical deep learning method would report only an overall bone age without any details, leaving the doctor to doubt whether the AI has sufficient understanding of the image.

You can test BoneXpert’s self-validation mechanism for free on your own images using BoneXpert Online.