BoneXpert validation study in Turkey

Researchers from Koç University Hospital in Istanbul have published a study validating two bone age systems: BoneXpert and Vuno.

A total of 292 images were rated with BoneXpert and Vuno Med bone age, and the results were compared to a reference formed from two manual ratings.

The accuracy of the two systems was found to be the same, as illustrated in the Bland-Altman plot below for girls.

Accuracy is one aspect of an automated method – the ability to explain the result is another, and here the two systems are very different, as the authors illustrated by juxtaposing the outputs from BoneXpert and Vuno:

BoneXpert’s main bone age result, 7.54 y, is derived as an average over the 21 tubular bones with equal weights on the bones. In addition, BoneXpert reports a carpal bone age – the average over the 7 carpals.

These results are “explained” by showing the contour of each bone as well as its bone age. Occasionally, a bone is left out due to abnormal shape, but not in this example.
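The equal-weight averaging described above can be sketched in a few lines. This is a minimal illustration with hypothetical per-bone values, not BoneXpert's actual implementation:

```python
# Hypothetical per-bone bone ages (in years) for the 21 tubular bones
# (radius, ulna, metacarpals and phalanges). Values are illustrative only;
# note the outlier of 9.0 y standing in for PP2 from the example above.
tubular_bone_ages = [7.5, 7.3, 7.6, 7.4, 7.8, 7.2, 7.5, 7.6, 7.4, 7.3,
                     7.7, 7.5, 7.4, 7.6, 7.3, 7.5, 9.0, 7.4, 7.6, 7.5, 7.3]

# Hypothetical bone ages for the 7 carpal bones.
carpal_bone_ages = [7.1, 7.0, 7.2, 6.9, 7.3, 7.0, 7.1]

# Main bone age: unweighted mean over the 21 tubular bones,
# so a single deviating bone is damped by the other 20.
main_bone_age = sum(tubular_bone_ages) / len(tubular_bone_ages)

# Carpal bone age: unweighted mean over the 7 carpals.
carpal_bone_age = sum(carpal_bone_ages) / len(carpal_bone_ages)

print(f"Main bone age:   {main_bone_age:.2f} y")
print(f"Carpal bone age: {carpal_bone_age:.2f} y")
```

With these made-up values the main bone age comes out near 7.54 y, showing how the equal weights dilute the influence of the single 9.0 y outlier.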

Vuno’s “explanation” is a heat map depicting where the deep neural net output is most sensitive to changes in the image. In this example, it showed high intensity in the radius, ulna, PP2 and DP3. Since the most reliable bone age is derived by averaging over as many bones as possible, it is of concern that the method collected information mainly from these four bones. BoneXpert’s output suggested that PP2 has bone age 9.0 y, much larger than the average of 7.5 y. BoneXpert assigns the same weight to all 21 bones, which tends to average out differences between the bones; this provides precision and standardisation. In a follow-up exam, the Vuno method could – at its own discretion – decide to focus on a different small subset of bones, and this would raise the question of whether a change in bone age was due to a biological change or merely a result of emphasising a different subset of bones.

The full article PDF is freely available here.

Bone age validation study from Leipzig

Radiologists at the University Hospital in Leipzig have published a comparison of three automated bone age methods: BoneXpert, BoneView from Gleamer and PANDA from ImageBiopsyLab.

The study collected images of 306 children covering the age range 1–18 years. This wide range allowed the study to reveal how the methods behaved at low and high bone ages.

A reference bone age (denoted “Ground Truth”) was formed as the average of three human ratings.

The overall agreement between each AI and the reference was almost the same for the three methods, although BoneXpert still showed the best correlation.

BoneXpert is the only method that covers the full bone age range 0–19 y, and it is interesting to see how the other methods handled the low and high ends of the bone age range.

BoneView rejected images if the chronological age in the DICOM file was below 3 y, and it also rejected images with a bone age of 17 y and above.

PANDA accepted all images, but gave large errors below 5 years, as was clearly visible. Also, for females with reference bone age above 15 y, PANDA tended to predict bone ages not much above 15 y – a kind of saturation effect.

The corresponding author is Dr Daniel Gräfe, and the full text is available online.

Swiss study compares two bone age algorithms

The accuracy of bone age determination by BoneXpert and Panda in 188 images was reported at ECR.

Federica Zanca from Leuven, together with four co-authors from Switzerland, presented a study comparing two bone age algorithms: BoneXpert from Visiana and Panda (based on deep learning) from ImageBiopsyLab. The study included 188 images taken under real-world conditions across 11 centres in Switzerland. The ground truth was provided by an exceptionally reliable manual rater. BoneXpert’s intended use includes autonomous operation, while Panda is intended only to assist the radiologist. In this study, however, both algorithms were run without human interference.

The mean absolute deviation between the algorithm and the ground truth was 0.36 y for BoneXpert and 0.42 y for Panda, and this difference was significant with p=0.01.
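The two summary statistics used here are straightforward to compute. The following sketch uses made-up paired ratings, not the study data:

```python
# Illustrative paired ratings (years): algorithm output vs. manual ground truth.
# These values are invented for the example; they are not the study data.
algorithm = [7.5, 10.2, 13.0, 5.8, 15.1]
ground_truth = [7.2, 10.6, 12.7, 6.3, 14.9]

deviations = [a - g for a, g in zip(algorithm, ground_truth)]

# Mean absolute deviation (MAD): the average magnitude of the error.
mad = sum(abs(d) for d in deviations) / len(deviations)

# Root mean square deviation (RMSD): penalises large errors more strongly.
rmsd = (sum(d * d for d in deviations) / len(deviations)) ** 0.5

print(f"MAD:  {mad:.2f} y")
print(f"RMSD: {rmsd:.2f} y")
```

Because the RMSD squares each deviation before averaging, a method with occasional large errors is penalised more on RMSD than on MAD, which is why both are worth reporting.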

In the Bland-Altman plots below, one can clearly see that the agreement is better with BoneXpert.

Figure 1: BoneXpert versus radiologist


Figure 2: Panda versus radiologist

The plots reveal that there are markedly fewer large deviations with BoneXpert.
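A Bland-Altman plot shows, for each image, the difference between the two measurements against their mean; the bias and the 95% limits of agreement summarise the scatter. A minimal sketch of those statistics, again with made-up data rather than the study's:

```python
import statistics

# Made-up paired bone ages (years): algorithm vs. radiologist.
algorithm = [7.5, 10.2, 13.0, 5.8, 15.1, 9.4]
radiologist = [7.2, 10.6, 12.7, 6.3, 14.9, 9.0]

diffs = [a - r for a, r in zip(algorithm, radiologist)]   # y-axis of the plot
means = [(a + r) / 2 for a, r in zip(algorithm, radiologist)]  # x-axis

bias = statistics.mean(diffs)      # systematic offset between the two methods
sd = statistics.stdev(diffs)       # sample standard deviation of the differences
loa_low = bias - 1.96 * sd         # lower 95% limit of agreement
loa_high = bias + 1.96 * sd        # upper 95% limit of agreement

print(f"Bias: {bias:.2f} y, limits of agreement: [{loa_low:.2f}, {loa_high:.2f}] y")
```

The narrower these limits, the fewer large deviations a method produces – which is the pattern the two plots above illustrate.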

So what is the clinical significance of the different performance? The authors addressed this by defining the clinically acceptable limit of agreement as ±1 year, and they found twice as many deviations beyond this limit for Panda as for BoneXpert.

The table below summarises all the findings.

The poster is available through myESR (login required).

Table: The deviation between the bone age algorithm and the radiologist

                             BoneXpert   Panda
Mean absolute deviation      0.36 y      0.42 y
Root mean square deviation   0.47 y      0.55 y
Number of deviations > 1 y   7           14
Largest deviation            1.2 y       1.9 y