BUSGen enhances breast ultrasound diagnosis.
a,
Early diagnosis of breast cancer
involves distinguishing DCIS
(ductal carcinoma in situ, an
early-stage cancer) from benign
lesions, which is considered
difficult for radiologists to do
from ultrasound images.
b,
Comparison of BUS-DM (red) with
Baseline-CLIP (blue) in
the early diagnosis task for
benign-DCIS classification.
BUS-DM achieved a higher AUC
(0.900; 95% CI: 0.849–0.938)
than Baseline-CLIP (0.846;
95% CI: 0.785–0.891;
P-value=0.0002).
c, Comparison
of BUS-DM with
board-certified radiologists
(n=9; 11 years of
experience on average) in breast
cancer early diagnosis.
The ROC curve of BUS-DM (red
curve) and diagnostic
results of radiologists (dots)
show that BUS-DM
outperformed radiologists by a
large margin. The dot colors
(blue, green, and orange)
indicate the different
thresholds used to compute the
radiologists' results.
d,
Accuracy improvements of
radiologists with the assistance
of BUS-DM. We report
the accuracy of radiologists in
early breast cancer diagnosis
before and after they considered
BUS-DM predictions. Accuracy is
calculated using BI-RADS 4A as
the threshold.
e, The data
scaling curves of test loss
(upper part of the left
plot) and AUC score (lower part
of the left plot) of
diagnostic models trained on
different scales of real
collected data (dark purple) and
BUSGen generated data
(light purple). The curves for
real and generated data
closely align at small data
scales, with the generated
data continuously enhancing
downstream performance as
the number of generated samples
increases. By scaling up
the generated data to 1 million
samples, we developed
BUS-DM (AUC: 0.929; 95% CI:
0.910–0.947), which achieved
performance comparable to NYU-AI
(trained on 288,767
real samples; AUC: 0.927; 95%
CI: 0.907–0.959) and
outperformed Baseline-CLIP (AUC:
0.876; 95% CI:
0.869–0.914; P-value=0.0006) on
the BUSI test set
(n=780).
f,
Comparison of BUS-DM (red)
with Baseline-CLIP (blue) on the
internal diagnosis test
set for benign-malignant
classification (n=579). BUS-DM
achieved a higher AUC (0.953;
95% CI: 0.932–0.968)
than Baseline-CLIP (0.925;
95% CI: 0.902–0.946;
P-value=0.0006).
g,
Comparison of BUS-DM (red) with
Baseline-CLIP (blue) on
the external diagnosis test set
for benign-malignant
classification (n=227). BUS-DM
achieved a higher AUC (0.951;
95% CI: 0.915–0.975)
than Baseline-CLIP (0.913;
95% CI: 0.868–0.946;
P-value=0.0007). Note that
BUS-DM, trained only on
generated data, generalized
better than the baseline models
trained on real data.
h, Comparison
of BUS-DM with
board-certified radiologists
(n=9) on the diagnosis task
(benign-malignant
classification) using the
external test
set. The ROC curve of BUS-DM
(red curve) and diagnostic
results of radiologists (dots)
show that BUS-DM
outperformed the average
performance of radiologists.
***P-value<0.001.
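
The caption reports each AUC with a 95% confidence interval but does not say how the intervals were obtained. As a minimal sketch, assuming a percentile bootstrap over test cases (an assumption, not a method confirmed by the caption), such an interval could be computed as follows. The function names `auc_score` and `bootstrap_auc_ci` are hypothetical helpers introduced only for illustration.

```python
import numpy as np


def auc_score(labels, scores):
    """AUC via the Mann-Whitney U statistic; tied scores get average ranks."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(scores, kind="mergesort")
    sorted_scores = scores[order]
    ranks = np.empty(len(scores), dtype=float)
    i = 0
    while i < len(scores):
        # Find the run of equal scores starting at position i.
        j = i
        while j + 1 < len(scores) and sorted_scores[j + 1] == sorted_scores[i]:
            j += 1
        # Assign the average 1-based rank to every member of the run.
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0
        i = j + 1
    n_pos = int(labels.sum())
    n_neg = len(labels) - n_pos
    u = ranks[labels].sum() - n_pos * (n_pos + 1) / 2.0
    return u / (n_pos * n_neg)


def bootstrap_auc_ci(labels, scores, n_boot=2000, alpha=0.05, seed=0):
    """Point AUC plus a percentile-bootstrap CI (assumed method, see lead-in)."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = rng.integers(0, n, size=n)  # resample cases with replacement
        if labels[idx].all() or not labels[idx].any():
            continue  # AUC is undefined when a resample has only one class
        stats.append(auc_score(labels[idx], scores[idx]))
    lo, hi = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return auc_score(labels, scores), float(lo), float(hi)
```

Calling `bootstrap_auc_ci(y_true, y_score)` with binary labels (1 for the positive class, for example DCIS or malignant) and model scores returns the point estimate and the interval bounds. The P-values in the caption come from a comparison between models, which this sketch does not cover.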