Clinical Trial Imaging: A Focus on Reader Agreement

Joseph Pierro and David Raunig

The radiology literature has documented variation in radiologists’ interpretations for more than 70 years. In fact, an evaluation of “roentgenographic methods”1 begun as early as World War II demonstrated that variability between and among readers was a larger factor in discordance than image size (or the image recording medium). Many experts also cite a classic 1959 study by L. Harvey Garland, which reported intra-reader and inter-reader variability in chest radiograph assessments.2

Our previous blog posts provide background on Blinded Independent Central Reads (BICR) and explain their purpose. Here we focus on the many factors that influence overall reader performance and reader discordance. Even though some of these factors can be controlled, medical image interpretation is inherently variable, and that variability remains a considerable source of risk; BICR mitigates much of that risk.


The factors that affect reader agreement fall into three broad categories:

Imaging Expertise:

Experts who evaluate imaging studies (e.g., radiologists and cardiologists) are deemed competent based on:

  • Appropriate medical training obtained in residency and/or subspecialty fellowships
  • Medical board certification, plus expertise and experience gained in clinical practice (academic or private)
  • Experience with the study population and the imaging modality or modalities used
  • Experience with similar clinical research studies/therapeutic areas, including experience performing prior blinded reads (which is more important than you might think)
  • Continuing education required for state or regional licensing, including ongoing learning through quality or peer-review processes
  • Academic status and number of publications in the study’s area of interest


Clinical Study Protocol Design:

Key components include the disease and the subject population, since in some indications imaging assessments are more complex and response to treatment is harder to assess. For example, in ovarian, lung, breast, or prostate cancer it can be difficult to select measurable target lesions or to evaluate lymph nodes, and in diseases such as mesothelioma it is difficult to determine reliable lesion borders to measure over time. Additionally, differences in the mechanism of action of novel or standard-of-care treatments may have an unexpected impact on image interpretation. Other study design elements also add to the complexity of medical image assessments, including:

  • Study arms/treatments
  • Imaging modality
  • Schedule of imaging intervals
  • Standard-of-care versus specialized imaging protocols
  • Standard, validated response criteria versus novel assessment requirements
  • Inclusion of areas of disease treated with radiation therapy
  • Inclusion of clinical information, and the assessment endpoints (safety- or efficacy-related) used as adjudication variables, including endpoint agreement rules


Operational Aspects of Imaging Assessments:

  • Pre-study reader training on the protocol criteria and on the reading platform’s display and software, including the level of guidance the software provides to readers (e.g., automated response determination, or assistance with disease identification, measurement, and tracking)
  • Lesion selection criteria and guidance for different areas of disease (e.g., local versus nodal versus hematogenous spread of gastrointestinal cancer)
  • Optimal lesion selection according to the criteria (e.g., number and quality of target versus non-target lesions for RECIST 1.1)
  • Poor image quality or incomplete exams, which may lead to nonevaluable timepoints and missing data (e.g., a well-defined hypervascular liver tumor seen on a baseline contrast-enhanced CT may not be visible at a post-treatment timepoint where only a non-contrast CT of the liver was acquired)
  • Timeline constraints that affect both the time available for study reads and the reader’s attention when detecting new disease


“It is well known that variability exists in the interpretation of radiological examinations, even among experienced radiologists. Disagreements among radiologists differ in frequency, degree of severity, and potential consequences to the patients.”2 Early in their training, medical students are taught to consider individual differences among health care providers, and that even experienced experts may substantially disagree with one another’s assessments.

When imaging labs implement a BICR, they manage all of these factors to control reader variability. Experienced labs also use a restricted pool of readers with an appropriate level of experience in the study indication, the study response endpoints, and the imaging modality or modalities used in the study.

Experienced imaging partners train readers on the study requirements using sample images, aligning how readers apply the study assessments to their image reads (reader calibration), and they assess reader performance periodically throughout the entire study.
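One common statistic for quantifying the reader agreement that such periodic assessments track is Cohen’s kappa, which corrects raw percent agreement for the agreement expected by chance. The sketch below is illustrative only and is not a description of any particular lab’s procedure: it assumes two readers each assign one categorical response (here RECIST 1.1 categories such as PR, SD, and PD) per timepoint, and the function names and sample data are hypothetical.

```python
from collections import Counter


def percent_agreement(reader_a, reader_b):
    """Fraction of timepoints where both readers assigned the same category."""
    if len(reader_a) != len(reader_b) or not reader_a:
        raise ValueError("readers must rate the same, non-empty set of timepoints")
    return sum(a == b for a, b in zip(reader_a, reader_b)) / len(reader_a)


def cohens_kappa(reader_a, reader_b):
    """Cohen's kappa for two readers rating the same timepoints."""
    p_observed = percent_agreement(reader_a, reader_b)
    n = len(reader_a)
    counts_a = Counter(reader_a)
    counts_b = Counter(reader_b)
    # Chance agreement: for each category, multiply the two readers'
    # marginal frequencies, then sum over all categories either reader used.
    categories = set(counts_a) | set(counts_b)
    p_expected = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)
    if p_expected == 1.0:
        # Both readers used one identical category everywhere; kappa is
        # undefined here, so this sketch simply reports 1.0.
        return 1.0
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical timepoint responses from two readers (not real study data).
reader_a = ["PR", "SD", "SD", "PD", "PR", "SD", "PD", "SD"]
reader_b = ["PR", "SD", "PD", "PD", "SD", "SD", "PD", "SD"]

print(f"percent agreement: {percent_agreement(reader_a, reader_b):.2f}")
print(f"Cohen's kappa:     {cohens_kappa(reader_a, reader_b):.2f}")
```

With this made-up data the readers agree on 6 of 8 timepoints (0.75), but kappa is only 0.60 because part of that agreement would be expected by chance alone. Kappa is one of several possible agreement measures; an actual study would define its own metrics and acceptance thresholds.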


Joseph Pierro is the Medical Director of Imaging at ERT and David Raunig is the Senior Principal Imaging Statistician at ERT.



  1. Birkelo, C. C., et al. Tuberculosis case finding: a comparison of the effectiveness of various roentgenographic and photofluorographic methods. J. Amer. Med. Assoc. 133:359-66, 1947.
  2. Abujudeh, H. H., et al. Abdominal and pelvic computed tomography (CT) interpretation: discrepancy rates among experienced radiologists. Eur. Radiol. 20:1952-1957, 2010.