Paper Title: Image quality assessment by overlapping task-specific and task-agnostic measures:
application to prostate multiparametric MR images for cancer segmentation
Author(s) and Year: Shaheer U. Saeed, Wen Yan, Yunguan Fu, Francesco Giganti, Qianye
Yang, Zachary M. C. Baum, Mirabela Rusu, Richard E. Fan, Geoffrey A. Sonn, Mark Emberton,
Dean C. Barratt, Yipeng Hu (2022)
Journal: Machine Learning for Biomedical Imaging (MELBA) (open access)
Article title: Automatic Retakes
Have you ever had to retake a photograph on your phone because the sun was shining way too
brightly in the background, causing the subject to appear with a halo? Or maybe your arm just
couldn’t reach far enough and those 10 family members couldn’t all fit inside the frame, leaving
someone ever so slightly on the outskirts?
These are two types of general image quality (IQ) problems that also occur when capturing medical
images, producing low quality (LQ) images. The first example, overexposure, is a case where the
image contains what is called an artifact or noise. The second is a case where the image is marred
by the difficulty of capturing the required information (i.e. all 10 family members). These two
problems are not mutually exclusive – an image can be both noisy and return only 9 and a 1⁄2 family
members.
The image quality can greatly affect what the user plans to do afterwards, for example editing and
framing the photograph. In prostate cancer, an image of the patient is captured using medical
imaging technology, e.g. a Magnetic Resonance Imaging (MRI) scan. The MRI image is then
used to localise regions of interest (such as tumours) and is interpreted by clinicians who have
relevant expertise. The interpretation of such images is an example of a downstream task, i.e.
the task that the image is used for after its capture.
An LQ image may not necessarily affect its use later. For example, an image may have noise that
is far away from the tumour’s location. Then, the image can still be used for diagnosis.
Conversely, if the noise is near the tumour, it could cause difficulty in diagnosis later on. For the
rest of this post, I’ll refer to these impacts as task-agnostic and task-specific.
If an LQ image does not affect its use later, it is known as a task-agnostic (TA) image.
If an LQ image does affect its use later on, it is known as a task-specific (TS) image.
The use of an image later on after its capture is referred to as the downstream task.
It may appear straightforward, but detecting, grading, and identifying prostate cancer tumours
from MRI images is challenging because it depends strongly on image quality. Consequently, there
can be significant variation amongst clinicians, and between 7% and 14% of cancers are missed.
Inter-clinician variation in labels could in turn produce conflicting results when building
machine learning models.
Why should we try to ‘redo’ noisy images? We could automatically remove these images so that
they do not affect the task performance. However, correcting the artifacts in an image, rather
than simply removing it because it contains enough noise to affect performance, can lead to
more streamlined and efficient imaging protocols.
In their work, the authors focus on identifying poor quality images that suffer from artifacts
and could be corrected instead of being rejected. Clinicians can either recapture the image, or
perform necessary corrections before saving the image. The big picture is to create automatic IQ
assessment methods with 2 main aims: reproducibility and reduced inter-human variability.
Their proposed model has 2 main components. The first component is a TA model: a deep learning
model trained to identify LQ images. Simply put, if the image has noise and artifacts,
then the model will not be able to reconstruct the original image it was presented with and will
flag it as LQ.
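To make the reconstruction idea a little more concrete, here is a minimal sketch. It is not the authors' model: `toy_reconstruct` is a hypothetical stand-in (a simple 3-point moving average over a 1-D signal) for a trained deep learning model, and the `threshold` value is an arbitrary assumption chosen for illustration. The only point is that a smooth input is reconstructed almost perfectly, while a spiky, noisy input is not, so its reconstruction error is high and it gets flagged.

```python
from typing import List


def toy_reconstruct(signal: List[float]) -> List[float]:
    """Hypothetical stand-in for a trained reconstruction model.

    Uses a 3-point moving average: smooth signals pass through almost
    unchanged, while spiky noise is smoothed away (and so is not reproduced).
    """
    n = len(signal)
    reconstructed = []
    for i in range(n):
        window = signal[max(0, i - 1): min(n, i + 2)]
        reconstructed.append(sum(window) / len(window))
    return reconstructed


def reconstruction_error(signal: List[float]) -> float:
    """Mean squared error between the input and its reconstruction."""
    reconstructed = toy_reconstruct(signal)
    squared = [(a - b) ** 2 for a, b in zip(signal, reconstructed)]
    return sum(squared) / len(signal)


def flag_low_quality(signal: List[float], threshold: float = 0.01) -> bool:
    """Flag the input as LQ when it cannot be reconstructed well.

    The threshold is an illustrative assumption, not a value from the paper.
    """
    return reconstruction_error(signal) > threshold


# A smooth "image" (1-D for simplicity) and a noisy one.
clean = [0.50, 0.52, 0.54, 0.56, 0.58, 0.60]
noisy = [0.50, 0.95, 0.10, 0.90, 0.05, 0.60]

print("clean flagged:", flag_low_quality(clean))
print("noisy flagged:", flag_low_quality(noisy))
```

In a real setting the moving average would be replaced by a model trained on images, and the threshold would need to be chosen from data.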
In the second component, the authors use the TA model alongside a new TS model. The
importance of the TA model and TS model can be weighted using suitable parameters,
depending on the user’s needs. This gives users the flexibility to decide which they care
about more: the performance of the downstream task or the amount of noise
in the image. For example, if the user is interested in filtering all LQ images, then the TS model
is given priority, and vice versa.
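One way to picture this weighting is the sketch below. It rests on assumptions that are mine, not the paper's: each image is given a hypothetical TA quality score and TS quality score between 0 and 1 (higher is better), and the two are blended with a single weight `ts_weight` as a simple weighted average. The authors' actual formulation may differ; this only shows how changing one weight changes which images get flagged.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ImageScores:
    name: str
    ta_quality: float  # hypothetical task-agnostic score in [0, 1]
    ts_quality: float  # hypothetical task-specific score in [0, 1]


def combined_quality(scores: ImageScores, ts_weight: float) -> float:
    """Weighted average of the two scores (an illustrative assumption)."""
    if not 0.0 <= ts_weight <= 1.0:
        raise ValueError("ts_weight must be between 0 and 1")
    return ts_weight * scores.ts_quality + (1.0 - ts_weight) * scores.ta_quality


def flag_images(images: List[ImageScores], ts_weight: float,
                threshold: float = 0.5) -> List[str]:
    """Return the names of images whose combined quality is below threshold.

    Each image is scored on its own, not as part of a group average.
    """
    return [s.name for s in images
            if combined_quality(s, ts_weight) < threshold]


# Made-up scores purely for illustration.
images = [
    ImageScores("artifact_far_from_tumour", ta_quality=0.2, ts_quality=0.9),
    ImageScores("clean_but_hard_case", ta_quality=0.9, ts_quality=0.3),
    ImageScores("clean_and_easy", ta_quality=0.9, ts_quality=0.9),
]

print("TS only:", flag_images(images, ts_weight=1.0))
print("TA only:", flag_images(images, ts_weight=0.0))
print("Equal  :", flag_images(images, ts_weight=0.5))
```

With the weight fully on the TS score, only the image that hurts the downstream task is flagged; with the weight fully on the TA score, only the image with artifacts is flagged. Intermediate weights trade the two off.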
Each image is evaluated individually, unlike in many other models, where images are evaluated in
groups to get an average metric. Therefore, individual images that are LQ because of clinical
challenges can be distinguished from those that are LQ because of artifacts.
The results of their experiments show several findings – one of the more important findings was
that accounting for different user needs, by making the weighting of the TA and TS models
flexible, allows varying definitions of IQ assessment to be learnt. Such flexibility would ensure
that the algorithm would be more generalisable when considering different types of medical
images and downstream tasks.
Moving forward, could their method be incorporated into training programmes for clinicians in
training? For example, would a new trainee be able to identify images that have artifacts and
can be reacquired, versus those which are clinically challenging? If so, then they would be able to
retake the image on-the-fly and reduce the resources required later on. Conversely, if the image
is difficult to interpret because of additional complexities, then other senior clinicians could be
brought in to help analyse the image. In a high-pressure and time-sensitive environment, these
time-saving steps are not trivial and could help streamline and improve the medical image