BreastScreen Health Care: Investigation into

TO
Dr Don Mackie (Chief Medical Officer)
Jill Lane (Director National Services Purchasing)
FROM
Dr Marli Gregory (Clinical Advisor, BreastScreen Aotearoa)
Jacqui Akuhata-Brown (Manager National Screening Unit)
DATE
17th April 2012
SUBJECT
BreastScreen Health Care:
Investigation into Internal Report and Recommendations for Further
Action.
EXECUTIVE SUMMARY
A recent internal report (the Report) from BreastScreen Health Care (BSHC) raised concerns that:
• there may have been systemic problems with the quality of reading screening mammograms
at BSHC, and
• a number of individual women may have experienced harm from delays in diagnosis of
cancer beyond what might be expected in a screening programme.
Accordingly the National Screening Unit (NSU) carried out a number of initial investigatory steps in
order to assess the likelihood that the Report findings reflect actual programme failure.
The purpose of this paper is to:
• report on the NSU’s investigation findings,
• provide an assessment of the materiality of the Report findings, and
• identify options for further investigation and/or action.
There is a specific risk with screening programmes in so far as false negative findings (and therefore
the potential for cancers not being detected in individual women) are an accepted aspect of
screening. In this context, and from the available information, there can be reasonable confidence
that there has not been unacceptable harm to women in this instance.
Various options could be pursued in order to investigate further but these all have a significant risk of
not providing any greater certainty than what already exists.
The results of this investigation, the routine audit carried out in 2011 and some of the results from
recent independent monitoring reports indicate that ongoing work is required to improve the quality
and level of performance at BSHC.
Recommendations
1. No further investigations should be carried out to establish whether there was systemic failure
of screening or harm to individual women during the period covered by the Report.
2. Resources should be fully directed at working with Southern DHB in supporting lasting quality
improvement at BSHC.
3. Further work should be carried out to refine the formal monitoring parameters for BSA.
4. Communicating the outcome of this investigation should include a clear description of the risks
as well as the benefits of population based screening programmes. This should include
clarifying that the result of a screening test provides a statement about the probability of
disease being present rather than a diagnosis.
Investigation into BSHC Internal Report and Options for Further Action
17th April 2012
On 13th March 2012 the National Screening Unit (NSU) was provided with the report of an ad hoc
review (the Report) carried out by a radiologist (Dr Y) formerly employed by BreastScreen Health
Care (BSHC). The review findings suggested that 47 out of 108 women (43.5%) who had previous
mammograms which could be reviewed may have experienced delays in identification of
malignancies. A subsequent reread of the mammograms of the 47 potentially affected women by a
second radiologist (Dr Z, the clinical director of another BSA provider) indicated that 28 (25 per cent)
of the women identified may have had delays in identification of malignancies.
The implications of the Report findings are that:
• there may have been systemic problems with the quality of reading screening mammograms
at BSHC, and
• a number of individual women may have experienced harm from delays in diagnosis of
cancer beyond what might be expected in a screening programme.
The potential for harm arising from both false positive and false negative findings is a recognised risk
of all screening programmes (hence the need for rigorous quality assurance and monitoring regimes).
The benefits of formal screening programmes lie at the population level by reducing mortality and
disability rates through early detection and treatment of cancer (or pre-cancerous conditions). The
screening test itself provides an individual woman with a probability statement about the likelihood
that disease is present – not a definitive diagnosis. Because the result is a statement of probability,
false negatives are inevitable; consequently it is inevitable that women in screening are exposed to
the possibility of harm. Quality assurance mechanisms within organised screening programmes are
aimed at ensuring that, as far as possible, the benefits of screening are maximised and inevitable
harm is minimised.
Questions of harm in a screening programme must be framed as unacceptable levels of harm as a
consequence of the programme deviating from defined standards.
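The point that a screening result is a probability statement rather than a diagnosis can be illustrated with Bayes' rule. The sketch below is illustrative only: the prevalence, sensitivity and specificity values are hypothetical assumptions chosen for the example, not BSA or BSHC figures, and the function name is invented here.

```python
def post_test_probabilities(prevalence, sensitivity, specificity):
    """Return (P(cancer | positive screen), P(cancer | negative screen)) via Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_neg = (1.0 - sensitivity) * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    true_neg = specificity * (1.0 - prevalence)
    p_given_positive = true_pos / (true_pos + false_pos)
    p_given_negative = false_neg / (false_neg + true_neg)
    return p_given_positive, p_given_negative


# Hypothetical inputs for illustration only (not programme data).
p_pos, p_neg = post_test_probabilities(
    prevalence=0.005, sensitivity=0.82, specificity=0.96
)
print(f"P(cancer | recalled to assessment) = {p_pos:.1%}")
print(f"P(cancer | normal screen result)   = {p_neg:.2%}")
```

Under these assumed inputs a normal result lowers the probability of cancer well below the starting prevalence but does not reduce it to zero, which is the sense in which false negatives are unavoidable.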
Taken in isolation, the Report findings do not provide sufficient evidence to establish whether there
has been systemic under-reporting or whether individuals may have been harmed. Accordingly the
NSU carried out a number of initial investigatory steps in order to assess the likelihood that the Report
findings reflect actual programme failure. The investigation included:
• an external peer review of the Report,
• a review of BSHC’s performance parameters from routine monitoring,
• a search of the published literature on findings from reviews similar to that carried out by Dr Y,
• a re-read of the mammograms in question by an external panel as part of a set including
control mammograms, and
• external review of BSHC radiologists’ performance statistics.
The purpose of this paper is to:
• report on the NSU’s investigation findings,
• provide an assessment of the materiality of the Report findings, and
• identify options for further investigation and/or action.
1) Investigation Findings
a) Peer Review and BSHC Performance
The Chair of the BreastScreen Aotearoa (BSA) Independent Monitoring Group (IMG),
Professor Richard Taylor, was asked to review the Report. Professor Taylor’s report is
attached as Appendix One. Key points from his report follow.
• The standard monitoring indices for BSHC as assessed by the IMG Report for
January 2009 to December 2010 including trends (to December 2011) have been
satisfactory, and the latest Interval Cancer Report (still in draft) shows no elevated
rates in BSHC. The time period for these reports overlaps that of the Report.
• Although for some monitoring indices BSHC is worse than other Lead Providers, this
would also be the case for some other Lead Providers.
• There is a significant issue of observer bias in the Report and its subsequent review
by Dr Z, since both radiologists who read the prior mammograms were aware of the
definitive diagnosis of cancer, and probably also the localisation of the lesion.
• There is evidence that the proportions of small cancers ≤10 mm and ≤15 mm in this
subset of 27 women are less than BSA targets, and less than that reported by
BSHC for January 2009 to December 2010. The routine monitoring indices for
small cancer detection for BSHC are in the target range based on the established
criteria, especially for subsequent screens (which constitute the majority of screens,
and were the only screens considered in the Report).
Although not statistically significant, the point estimates for detection of cancers less
than 15 mm diameter are very low.
The most recent Independent Monitoring Reports of screening and assessment (for the
periods July 2008 to June 2010 and January 2009 to December 2010) identify that BSHC is
mostly on target or exceeds targets for most biennial indicators. However targets not
achieved include:
• a high rate of referral to assessment for initial screens (target <10%)
  14.3%* (July ’08 – June ’10)
  11.6% ns (Jan ’09 – Dec ’10)
• a low percentage of women being offered assessment within timeframes and a
suggested link between this delay and the high recall rate for first screens (target >90%)
  48%* (July ’08 – June ’10)
  54%* (Jan ’09 – Dec ’10)
• a low percentage of cancers from referral to assessment for initial screens (target >9%)
  4.6%* (July ’08 – June ’10)
  5.8% ns (Jan ’09 – Dec ’10)
• a very low rate of detection for invasive cancers <15mm for initial screens (target
>30.5/10,000 screens)
  5.9 ns (July ’08 – June ’10)
  7.5 ns (Jan ’09 – Dec ’10)
* = statistically significant, ns = not significant
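A rate far below target can still be reported as "not significant" when the number of screens is small, because the confidence interval is wide. The sketch below illustrates this with a Wilson score interval. The event and screen counts are hypothetical, chosen only to give a rate near 7.5 per 10,000; the actual BSHC denominators are not given in this paper, and the IMG reports may use a different interval method.

```python
import math


def wilson_interval(events, n, z=1.96):
    """Approximate 95% Wilson score interval for the proportion events/n."""
    p = events / n
    denom = 1.0 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


TARGET_PER_10K = 30.5  # target quoted above for invasive cancers <15 mm, initial screens

# Hypothetical counts (assumptions for illustration, not BSHC data):
# the same underlying rate observed in a small and in a tenfold larger provider.
for events, screens in [(1, 1333), (10, 13330)]:
    low, high = wilson_interval(events, screens)
    rate = 10_000 * events / screens
    verdict = "includes" if low * 10_000 <= TARGET_PER_10K <= high * 10_000 else "excludes"
    print(
        f"{events}/{screens}: {rate:.1f} per 10,000 "
        f"(95% CI {low * 10_000:.1f} to {high * 10_000:.1f}), interval {verdict} target"
    )
```

With the smaller assumed denominator the interval spans the target, so the shortfall would not be flagged as significant; with the larger one the same point estimate would be.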
b) Literature Review of Mammogram Review Methodologies and Investigations into Breast
Screening Programme Failures
It is well recognised that different methods of mammogram review will lead to greater or
lesser numbers being classified as false negative or missed. The general principles for
these reviews are:
1. the less blinded the reviewers, the higher the proportion of false negatives identified,
2. the more reviewers there are, the lower the proportion of false negatives identified,
and
3. the more non-cancer (controls) in the mammogram set, the lower the proportion of
false negatives identified.
This leads to huge variation in identification of false negatives, even within the same set.
For example, when a group of five radiologists reviewed interval cancer mammograms, the
false negative rate varied from 6% (if 5/5 agreement required), through 14% (majority
opinion), to 38% (if 1/5 agreement required).[1] This variation in review methods makes it
difficult to compare results from different studies.
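The effect of the agreement rule can be sketched as follows. The vote counts are hypothetical (they are not the data from the cited study); each number is how many of five reviewers judged a prior mammogram to have been a false negative.

```python
def false_negative_rate(votes_per_case, n_reviewers, min_agreeing):
    """Share of cases classified as false negative when at least
    `min_agreeing` of `n_reviewers` reviewers must agree."""
    if any(v < 0 or v > n_reviewers for v in votes_per_case):
        raise ValueError("each vote count must lie between 0 and n_reviewers")
    flagged = sum(1 for v in votes_per_case if v >= min_agreeing)
    return flagged / len(votes_per_case)


# Hypothetical vote counts for ten prior mammograms (illustration only).
votes = [5, 4, 3, 2, 1, 1, 0, 0, 0, 0]

for label, threshold in [("5/5 agreement", 5), ("majority (3/5)", 3), ("1/5 agreement", 1)]:
    rate = false_negative_rate(votes, n_reviewers=5, min_agreeing=threshold)
    print(f"{label}: {rate:.0%} classified as false negative")
```

The same set of readings yields a low, middle or high false negative rate depending solely on the threshold chosen, which is why results from studies using different rules are hard to compare.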
A summary of published findings from broadly similar studies is provided in Appendix Two.
In summary, it can be seen that the methods used in radiological review are not
standardised, and can make a dramatic difference to recorded rates of false negatives. A
range of false negative rates at mammography from 14-50% can be observed in the
literature on similar studies. The false negative rates of 25% (Dr Z) and 43.5% (Dr Y) are
within this range.
c) Seeded Set Review
The mammograms of 44 women subsequently diagnosed with cancer (identified by Dr Y)
were seeded into a set along with 76 ‘normal’ mammograms chosen at random from the
same period (1st January 2007 to 31st December 2008). The ‘case’ mammograms were
distributed randomly among the ‘control’ mammograms and the resultant set was read
independently by three experienced, accredited BreastScreen radiologists.
In assessing the findings the panel then replicated the Interval Cancer Review Process
defined in the BSA National Policy and Quality Standards 2008 (Appendix Q, page 157).
The consensus finding of the panel was that twenty eight cases would have been recalled
for lesions that subsequently proved to be cancer and which were therefore interpreted as
having a delayed diagnosis (false negative). This represents 23% false negatives in the
seeded set review which compares favourably with findings from published studies using
similar methods.
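The construction of the seeded set and the arithmetic behind the quoted percentage can be sketched as below. The counts (44 cases, 76 controls, 28 flagged) come from the text; the shuffling routine and the seed are assumptions for illustration, since the paper does not describe how the random ordering was produced. The 23% figure is reproduced when the denominator is the whole 120-mammogram set, which is an inference from the numbers rather than something the paper states.

```python
import random

N_CASES, N_CONTROLS = 44, 76  # figures from the text
N_FLAGGED = 28                # cases the panel agreed would have been recalled


def build_seeded_set(n_cases, n_controls, seed=0):
    """Mix case and control mammograms in random order (illustrative only)."""
    films = [("case", i) for i in range(n_cases)]
    films += [("control", i) for i in range(n_controls)]
    rng = random.Random(seed)  # fixed seed so the ordering is reproducible
    rng.shuffle(films)
    return films


seeded = build_seeded_set(N_CASES, N_CONTROLS)
share_of_set = N_FLAGGED / len(seeded)
share_of_cases = N_FLAGGED / N_CASES

print(f"Set size: {len(seeded)} mammograms")
print(f"Flagged as a share of the whole set: {share_of_set:.1%}")
print(f"Flagged as a share of the case mammograms: {share_of_cases:.1%}")
```

When comparing with published studies it matters which denominator each study used, so both are shown.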
d) BSHC Radiologist Performance Statistics
De-identified BSHC radiologist screening statistics for the period, and subsequently, have
been reviewed by the Clinical Directors of two other BSA providers. In summary they found
(verbal and email communication to Dr Gregory):
o for the prevalent (initial) screening round – high rates of recall to assessment, low
cancer detection rates (low positive predictive value), high false positive rate,
o for incident (subsequent) screening rounds – low rates of recall to assessment,
cancer detection rates at or just below targets, high false negative rates with
sensitivity all in the low 80% range, and
o they did not identify a radiologist who was an outlier in terms of poor performance.
The reviewers’ overall comment was that from these data they would consider the
radiologists to be adequate within a larger unit and when paired with radiologists with better
performance; but that as a small unit, they should not be reading each other’s films.
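The measures named in this summary are related as in the sketch below. The counts are hypothetical, and the definition of sensitivity used here (screen-detected cancers divided by screen-detected plus interval cancers) is an assumption about how the reviewers calculated it; the paper does not give the formula or the underlying numbers.

```python
def reader_statistics(screened, recalled, screen_detected, interval_cancers):
    """Basic screening performance measures from raw counts."""
    recall_rate = recalled / screened
    # Positive predictive value: share of recalled women found to have cancer.
    ppv = screen_detected / recalled
    # Assumed definition: interval cancers are treated as the cancers missed at screening.
    sensitivity = screen_detected / (screen_detected + interval_cancers)
    return {
        "recall_rate": recall_rate,
        "ppv": ppv,
        "sensitivity": sensitivity,
        "false_negative_rate": 1.0 - sensitivity,
    }


# Hypothetical counts for illustration only (not BSHC data).
stats = reader_statistics(
    screened=10_000, recalled=300, screen_detected=50, interval_cancers=11
)
for name, value in stats.items():
    print(f"{name}: {value:.1%}")
```

A high recall rate with few cancers found lowers the positive predictive value, while a low recall rate tends to lower sensitivity; the two patterns described for prevalent and incident rounds sit at opposite ends of that trade-off.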
2) Assessment and Conclusions
There are two related questions to answer:
• has there been a systemic failure in the quality of reading screening mammograms
(beyond acceptable parameters for a screening programme),
and if so,
• has any individual woman, or women, experienced harm?
[1] Britton, P. D., J. McCann, et al. (2001). "Interval cancer peer review in East Anglia: implications for monitoring
doctors as well as the NHS breast screening programme." Clinical Radiology 56(1): 44-49.
Given that there is no ‘gold standard’ test that can provide a definitive answer to these
questions all the available information must be weighed in order to come to a determination
‘on the balance of probabilities’. Table One provides a framework for assessing the available
information.
Table One: Information Sources for Assessment
Has there been systemic under-reporting?
a) Data from service level benchmarks
o IMG reports
o interval cancer reports
b) Individual radiologist performance
o reviews of radiologist performance data
o external reviews of interval cancers
c) Ad hoc internal reports
o benchmark to published comparable studies
o peer review by Professor Taylor
o seeded set review results
d) Formal quality audits (IANZ audit)
Have any women experienced harm?
a) This would be established on the basis of clear evidence of failure at service or
individual practitioner level.
a) Assessment of evidence for systemic failure
Table Two provides a summary of the information currently available in respect to the
question of systemic under-reporting (see Section 1 of this paper for details).
Table Two: Information on Systemic Under-Reporting
Source: IMG reports
Commentary: BSHC is a small provider (in terms of the volume of women screened) and so has
wide confidence intervals around most performance measures. BSHC performs adequately
across standard indices but is a poor performer (relative to other providers) on some indices.
Of note in the latest IMG report is a very low detection rate for small (<15mm) invasive cancers
in initial screens (7.5 per 10,000 screens, target >30.5 per 10,000).
Source: Interval cancer reports
Commentary: The rate of interval cancers is a proxy outcome measure of programme quality.
The most recent (draft) interval cancer report shows no increase in rate for BSHC.
Source: De-identified radiologist statistics
Commentary: Review of statistics by two external radiologists found:
• relatively high false positive rates for prevalent round screening,
• relatively high false negative rates for incident round screening (sensitivity in low 80%), and
• no outliers in terms of poor performance.
Source: Interval cancer external reviews as required by NPQS
Commentary: Provided for BSHC by BreastScreen Waitemata and North (BSWN) but appears not
to have occurred in the last eighteen months.
Source: Ad hoc report and international comparisons
Commentary: The original ad hoc report (Dr Y) suggested that 47 out of 108 women (43.5%)
diagnosed with cancer had had previous mammograms ‘misread’. The follow up by Dr Z
suggested 25% ‘misreads’. The published literature indicates that, depending on the rigor of
the re-read methodology, ‘misread’ rates can range between 14-50%.
Source: Seeded-set review
Commentary: The false negative finding from the seeded set review was 23% which is
comparable with the results of published studies using similar methods.
Source: IANZ audit
Commentary: The formal audit conducted by IANZ in 2011 identified a number of quality
issues – some of which point toward fundamental problems with quality assurance processes
at BSHC and some of which were identified in previous audits.
Taken as a whole these data suggest that BSHC is a small provider that performs
adequately in regard to national monitoring parameters but for which there are concerns
around a number of quality measures.
The study method used by Dr Y does not establish definitively whether there has been
systemic under-reporting. Of note is the fact that the proportion of ‘misreads’ is within
the range of the findings of similar published studies. However the relatively high rate of
potentially missed small cancers does align with recent IMG reports and with the
suggestion that there is low sensitivity by radiologists in incident round screening.
b) Conclusions
Based on the information available, and on the balance of probabilities, there is
insufficient evidence to conclude that there has been a failure in the quality of reading
screening mammograms beyond acceptable parameters for a screening programme.
Accordingly, and acknowledging the recognised risks inherent in all screening
programmes, there can be a reasonable degree of confidence that it is unlikely that
women have experienced unacceptable harm.
The caveat on these conclusions is that further in-depth analysis could be carried out to
test the questions, which might provide a greater level of confidence, and these are
explored in Section 3.
3) Options for Further Action
Given that there is no gold standard for determining the threshold for when more intensive
investigation should be carried out, we looked at the published literature from formal
investigations in other countries – the results are provided in Appendix Three.
In summary, all three published investigations arose as a result of concerns around
individual radiologist performance. The two from the NHS resulted from peers
identifying poor performance at assessment clinics. The Canadian investigation resulted from
concerns around a screening radiologist working in isolation and outside of a formal quality
assurance system.
The circumstances that gave rise to these investigations (failure at assessment versus
screening, identification of outlier performance by peers and professional isolation) are not
directly comparable to our situation and so cannot provide a guide for decision making on
whether further investigation should be carried out here.
In deciding what, if any, further actions should be carried out consideration needs to be given
to:
• the degree of confidence in the findings of the initial investigation,
• the costs, benefits and impact of further investigation, and
• the likelihood of conclusive findings from further investigation.
a) Follow-up quality issues from IANZ audit
Under this option no further investigations would be undertaken in regard to the Report. The focus
for the NSU and SDHB would be on addressing the quality issues identified in the IANZ
audit report. Specific support and monitoring could also be put in place to lift the
sensitivity of screening and to raise the small cancer detection rate.
Professor Taylor’s recommendations on refining BSA monitoring parameters could be
progressed further by the NSU.
The strength of this option is that it focusses attention and resources on quality
improvement. The risk is that if there has been a systematic failure and if there has been
harm then this will not be identified.
b) Detailed investigation of cancer case series
This option involves a detailed case notes review of all 108 women with cancer identified
in the Report. The information gathered would be for the initial purpose of descriptive
epidemiology and hypothesis generation for subsequent investigation. The purpose
would be to identify individual women who are ‘outliers’ in respect to any particular
‘exposures’ (such as radiologist or location or time of screening) and hence make a
determination of likelihood of individual harm.
This would involve additional costs and would take some time to complete. There is also
a high probability that the results will again be inconclusive. There would also be an
outstanding question of what would be the result if the same sort of analysis was carried
out with another BSA provider.
The strength of this option is that it lifts the probability (although it does not guarantee)
that if there was harm then it will be identified. The risks are that such an investigation
will be time consuming, costly and potentially inconclusive while undermining confidence
in screening.
c) Carry out a larger blinded reread study
This involves a thoroughgoing formal reread of the mammograms of all 108 cases
identified in the Report. A robust methodology would be based on a formal study
protocol with features such as:
• three or more control mammograms for every case,
• re-reading by a panel of three or more suitably qualified screening radiologists
overseas, and
• each set being merged with the radiologists’ day-to-day work.
This would take considerable time to organise and would be very expensive. As there
would be no bench-mark to other BSA providers (unless another one was selected as a
control) the risk that the findings are inconclusive is high.
The strengths and risks for this option are as for option b).
d) Offer interval screening mammography to concerned women
This option would involve communication to the eligible population in Otago-Southland
that the investigation had not demonstrated that there had been a systemic failure of
screening. However women would be provided with a free ‘interval’ screening
mammogram (that is, before their next scheduled screening) if they had any concerns.
This option recognises that it is impossible to be sure that no harm has occurred, and aims
to provide concerned women with reassurance. However there is a risk of this option sending
a mixed message to women: that the service is considered safe enough to continue, yet
additional mammography is offered ‘just in case’.
On balance we consider that:
• there is a specific risk in screening programmes in so far as false negative
findings (and therefore the potential for cancers not being detected in individual
women) are an inevitable and accepted aspect of screening,
• in this context, and from the available information, there can be reasonable
confidence that there has not been unacceptable harm to women in this
instance, and
• the uncertainty of achieving any clearer outcome as well as the risks involved
with further studies mean that additional investigation to determine the materiality
of the Report is unwarranted.
Accordingly we make the following recommendations.
1. That no further investigations be carried out to establish whether there was
systemic failure of screening or harm to individual women during the period
covered by the Report.
2. Resources should be fully directed at working with Southern DHB in supporting
lasting quality improvement at BSHC.
3. Further work should be carried out to refine the formal monitoring parameters for
BSA.
4. Communicating the outcome of this investigation must include a clear description
of the risks as well as the benefits of population based screening programmes.
This should include clarifying that the result of a screening test provides a
statement about the probability of disease being present rather than a diagnosis.
APPENDIX ONE
BSHC Mammography Screening Issue, Taylor R., 10 April 2012
Comments on the BreastScreen Health Care (Southern South Island) mammography screening
issue
Dr Richard Taylor
MBBS(Syd), DTMH(Lon), FRCP(UK), PhD(Syd), FAFPHM.
Professor, School of Public Health and Community Medicine
UNSW, Sydney Australia
r.taylor@unsw.edu.au
Summary
1. The standard monitoring indices for BSHC as assessed by the IMG Report for Jan 2009-Dec 2010
including trends (Dec 2011) have been satisfactory, and the latest Interval Cancer Report (still in
draft) shows no elevated rates in BSHC. The time period for these reports overlaps that of the Audit
Report [Dr Y 2012]. However, BSHC has shown consistently worse performance, even if not
statistically significant, and consistently low rank in relation to other Lead Providers for some indices,
for example the initial screen cancer detection rates.
2. Although for some monitoring indices BSHC is worse than other Lead Providers, this would also be
the case for some other Lead Providers. Small numbers make interpretation difficult which is why
95% confidence intervals are supplied. In any comparison between areas there will be a range from
best to worst for all indices, but this does not necessarily mean that the worst performers are
unsatisfactory according to set criteria, and there may be only small differences between areas.
3. The Audit Report [Dr Y 2012] could have benefited from advice on a more structured methodology
and reporting framework, but this may not have been possible in the circumstances. It would also
have benefited from a review of the scientific literature on published findings from similar studies. It is
commendable that such an investigation was carried out.
4. There is a significant issue of observer bias in the radiologist assessments, since both radiologists
who read the prior mammograms were aware of the definitive diagnosis of cancer, and probably also
the localisation of the lesion. The re-readings by Dr Z of cases which Dr Y considered as “possible
misses” indicate that she would have recalled around 2/3. Further assessment of the readings of the
prior mammograms by inclusion into a Test Set for naive radiologists, unaware of the diagnosis and
circumstances, would provide better evidence of whether such women should have been recalled.
However, in Test Sets there is a higher expectation of abnormality and lesions for recall than in
screening practice, and readers also may be more attentive when reading test sets than in routine
practice.
5. The recall to assessment in BSHC is a little below the average of 3% (desirable target <4%), but is
similar to 2 other Lead Providers, and higher than one other.
6. The information and discussion on tumour size in the Audit Report [Dr Y 2012] is difficult to follow,
and mean values can be misleading because of outliers. The revised Spreadsheet data provides
information on 28 women who could have been subject to late diagnosis because of possible
missed lesions on prior mammograms (of the 47 patients mentioned in the Report by Dr Y); these
cases consist of 1 case of DCIS and 27 patients with solid tumours. There is evidence that the
proportions of small cancers ≤10 mm and ≤15 mm in this subset of 27 women are less than BSA
targets, and less than that reported by BSHC for Jan09-Dec10. The routine monitoring indices for
small cancer detection for BSHC are in the target range based on the set criteria, especially for
subsequent screens, which constitute the majority of screens, and were the only screens considered
in the Audit Report [Dr Y 2012] (since there is no prior film for initial screens).
7. In the Spreadsheet subset of 27 patients with solid tumours, the node negative rate (approximately
60%) is lower than the BSA target (>75%) and lower than that achieved by BSHC and BSA over
Jan09-Dec10, both of which were on target. The proportion of node negative patients from the
Spreadsheet data was higher than that reported by Dr Y for the “indeterminate” group.
8. Before conclusions are drawn on whether there is excessive late diagnosis because of possible
missed lesions on prior mammograms in BSHC, normative information should be obtained from: (a)
published studies from other mammographic screening programs; and (b) expert opinion from New
Zealand and international radiologists with extensive experience with mammographic screening.
Similar studies of other Lead Providers could be conducted, but this may produce more issues.
9. Increased recall of suspicious changes on screening mammograms for further assessment, which
may increase cancer detection, needs to be balanced by the higher recall and investigation of
mammographic changes which are subsequently found not to be malignant. Information on this
balance can be found in the scientific literature, and can be examined in test sets to some extent.
9. The hypothesised explanation (lack of “breast awareness”) for absence of elevated rates of interval
cancers, but diagnosis at the next screen, of lesions possibly missed on the prior mammogram,
should be investigated further with a properly constructed study.
10. My opinion of the much higher palpability of lumps by surgeons compared with women is that it is
partly related to prior knowledge of surgeons of a possible solid lesion visualised on mammography,
and its localisation.
Recommendations
Further studies to be considered
1. Investigate whether women in this instance at BSHC should have been recalled on the basis of the
prior mammogram by constructed studies:
(a) further assessment of the readings of the prior mammograms by inclusion into a Test Set for naive
radiologists (of differing experience and other predictors of expertise), unaware of the diagnosis and
circumstances;
(b) obtain information on normative performance of radiologists in similar circumstances from a review
of published papers and reports from other screening programs, expert opinion, and possibly a review
of other Lead Providers.
2. Investigation of the reason why the possibly missed cancers did not translate into elevated interval
cancer rates for BSHC, examining proffered hypotheses.
Monitoring interpretation and policy
Depending on conclusions from above investigations:
3. In future assessment of monitoring indices, give more weight to evidence of consistently
worse performance in relation to targets, even if not statistically significant, and of consistently low rank
in relation to other Lead Providers.
4. Standards for performance indicators could be reviewed and readjusted.
5. New monitoring indices could be introduced related to radiology reading performance.
Part I
Comments on: PRELIMINARY FINDINGS, IMG REPORT BREAST CANCER AUDIT
Dr Y (2011/12) covers July 2008 to June 2010 (24 months)
1. Page 1, Background.
1.1. BSHC have been consistently lowest compared with other Lead Providers for detection rate of
DCIS and invasive breast cancer (combined), for initial screens and subsequent screens trend
information based on 2 year data, with the upper 95% CIs and lower than the NZ average for
subsequent screens in the last few reports [IMG Report for Jan 2009-Dec 2010 (Dec2011)]. There are
no targets set for this parameter.
1.2. BSHC have manifested lower point estimates compared with targets for cancer detection based
on initial screens from trend data for the following indices, although these rates are not statistically
lower than the target based on 95% Confidence Intervals (CIs): proportion invasive cancers <15mm,
initial screens, 2 years data (low but not significantly so); Invasive cancers <15mm per 10,000 women
screened, initial screens, 2 years data (low but not significantly so); proportion invasive cancers
<10mm, initial screens, 2 years (low but not significantly so); invasive cancers <10mm per 10,000
women screened, initial screens, 2 years data (low but not significantly so) [IMG Report for Jan 2009-Dec 2010 (Dec2011)].
In a mature program like BSA (NZ), initial screens are mostly concentrated in younger women in the
target age range, and lesions are more difficult to detect on mammograms in younger women
because of residual breast tissue. Although initial screens become much less numerous as a
screening program matures, once the proportion of initial screens stabilises, they may be considered
as a sensitive quality indicator of mammography readings, even though they will not affect overall
performance because of their small number.
1.3. Other indicators of cancer detection from trend data appear satisfactory in relation to targets and
to NZ average [IMG Report for Jan 2009-Dec 2010 (Dec2011)], and this needs to be considered in
terms of BSHC performance. Small numbers are involved in parameter estimates, which is why 95%
CIs are calculated. However, consistently low estimates for the same indices, even though not
individually statistically significant, ought to be considered.
1.4. I agree that ranking in comparison to other Lead Providers ought to be considered if this is
consistent for certain indices and the differences are of reasonable magnitude.
1.5. The most recent BSA Interval Cancer Report (still in draft), which covers screening up to and
including 2007, with follow-up of interval cancers to 2009, did not indicate higher interval cancers for
BSHC compared to other Lead Providers. In fact, BSHC display lower point estimates for 1st or 2nd
year interval cancers, following initial or subsequent screens, compared to other Lead Providers. This
report covers interval cancers diagnosed up to and including 2009 (from screens up to and including
2007) and overlaps the period of the Audit Report [Dr Y] and the most recent IMG Report for Jan
2009-Dec 2010 (Dec2011).
2. Page 1, Purpose of the audit
2.1. Although it is commendable that such an audit has been undertaken, it would have been better if
there had been some input which would have refined the questions so each could have been
addressed by a specific methodology. It would also have benefited from a review of the scientific
literature on published findings from similar studies. However, this may not have been possible in the
circumstances.
3. Page 2, What I Did
3.1. This is the Methodology section, although it is not labelled as such. It would have been preferable
for a more nuanced and explicit rendering of the methodology in relation to the questions, including
methods of analysis, but in view of the circumstances, this may not have been possible.
4. Page 3, Results and analyses
In general, commentary or discussion of Results should not occur in this section, but rather in a
subsequent section. Below I will address the results, as well as the comments on them.
4.1. Tumour size
Most of the material here is background or explanation of methods, not Results.
I find the discussion on DCIS difficult to follow; it appears to be considered because of the effect on
average lesion size of solid tumours if DCIS is included. Similarly, I have difficulty understanding the
relevance of descriptive material on histological (morphological) type. In any case, despite the
implications related to size of tumour of DCIS or morphological type (lobular), these were not
excluded from subsequent analyses.
Data are presented on average tumour size for various subsets, and claims are later made that these
are larger than desirable. Normally tumour size data are presented as proportions of cancers between
certain cut-offs, such as: =<5mm, >5mm to =<10mm, >10mm to =<15mm, and =<10mm, =<15 mm,
and so on. Data are not usually presented as means since this parameter is significantly affected by
outliers.
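The point about outliers can be illustrated with a minimal sketch in Python. The tumour diameters below are hypothetical values chosen for illustration only; they are not data from the Audit Report, and the function name is an assumption.

```python
from statistics import mean


def proportion_at_or_below(sizes_mm, cutoff_mm):
    """Proportion of tumours whose diameter is at or below the cut-off."""
    return sum(1 for s in sizes_mm if s <= cutoff_mm) / len(sizes_mm)


# Hypothetical tumour diameters in mm (illustration only, not audit data).
sizes = [6, 8, 9, 12, 14, 15, 18, 22]
sizes_with_outlier = sizes + [90]  # the same series plus one very large tumour

mean_without = mean(sizes)
mean_with = mean(sizes_with_outlier)
prop15_without = proportion_at_or_below(sizes, 15)
prop15_with = proportion_at_or_below(sizes_with_outlier, 15)

print(f"mean size: {mean_without:.1f} mm -> {mean_with:.1f} mm")
print(f"proportion <=15 mm: {prop15_without:.0%} -> {prop15_with:.0%}")
```

In this hypothetical series a single large tumour moves the mean far more than it moves the proportion at or below 15 mm, which is why cut-off proportions are the more usual presentation.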
4.2. Page 6, Recalls
Since approximately 40% of mammograms were read by 3 radiologists, this is the proportion in which
the lesion was missed for a recall by one of them. One would need comparative data from other
services and the scientific literature to properly evaluate this finding.
The recall to assessment in BSHC is a little below the average of 3% (desirable target <4%) but is similar
to BSAL and BSSL and higher than BSCtoC [IMG Report for Jan 2009-Dec 2010 (Dec2011)].
4.3. Page 6. Prior Screens
The table on Prior Screens is a little difficult to read because of inadequate headings. It appears that
around 7% had evidence of a lesion suggestive of cancer on prior screens according to Dr Y.
However, this increases to 34% of cases if the indeterminate category is included. Dr Y indicates that
the ‘indeterminate’ category consists of “potential misses”, but does not indicate that they all should
have been recalled, or that she would have recalled all of them. It is stated in the Audit Report [Dr Y
2012] that a total of 34% of cases in ‘indeterminate’ and ‘probably malignant’ categories on prior
screens had “features of cancer”. This statement could imply that all or some should have been
recalled.
Dr Y admits to possible observer bias because she was aware of the diagnosis, and had seen the
recent mammogram which displayed and localised the tumour. The intention as a follow-up was to
use these images as part of a Test Set - along with normal mammograms and those with benign
lesions – to determine if other radiologists would have recalled them. It appears that this did not
occur. It should be noted that the expectation of malignant lesions in a Test Set is much higher than
found in screening mammography practice.
4.4. Page 8, Interval cancers
The monitoring evidence from the IMG Report for Jan 2009-Dec 2010 (Dec2011) does not suggest
that the cancer detection rate in BSHC is ‘poor’ according to the standards set. Most screens are
subsequent, and the trend data for subsequent screens indicate that the <15 mm invasive cancer
detection rate and proportion of cancers detected <15 mm are satisfactory, with the point estimates
on target, and similar to or better than BSCtoC. For the most recent biennium (Jan 2009-Dec 2010), the
subsequent screen invasive cancer detection rate =<10 mm was on target, although lower than other
areas, and the proportion of invasive cancers =<10 mm was on target and the second lowest, slightly
higher than BSAL. The subsequent invasive cancer rate (all sizes) was on target, and although the
lowest, was almost the same as BSSL.
The most recent BSA Interval Cancer (draft) report, which covers screens to and including 2007, with
follow-up to 2009, does not show higher interval cancers for BSHC compared to other Lead
Providers; in fact, the point estimates are lower for 1st or 2nd year intervals from initial or subsequent
screens. The period of cancer diagnosis in the most recent BSA Interval Cancer (draft) Report
overlaps the Audit Report [Dr Y].
It is hypothesised that cancers which should have appeared as intervals were diagnosed at the next
screen because the BSHC area consists of women who are not “breast aware”. This hypothesis
would require comparative investigation.
4.5. Page 8, Palpable lesions
It is stated that detected cancers are larger than desirable, but it is unclear upon which data this
statement is based. It appears to be based on mean size, but this is not usually used for categorising
tumour size. However, the revised Spreadsheet data (see below) indicate a low proportion of ≤10 mm
and ≤15mm tumours compared to targets in the subset of 27 women with solid tumours that should
possibly have been recalled on the prior mammogram.
There follows a series of hypotheses concerning why only around 7% of women reported lumps at
screening, but a lump was felt by the surgeon in 57%. However, the palpation by the surgeon was
presumably informed by prior knowledge that a lesion was seen on the mammogram and in a
particular location. The question is whether a palpable lump would have been found in asymptomatic
women without prior information from a mammogram. Without knowledge of the distribution of sizes
of breast cancers and the sizes of breasts they were in, it is difficult to determine what proportion
should have been palpable to women or clinicians.
4.6. Page 9, Nodal involvement
The most recent IMG Report for Jan 2009-Dec 2010 (Dec2011) indicates that BSHC at 76% was
better than target (75%) for the proportion of node negative cancers, and ahead of 2 other services.
The Audit Report [Dr Y] indicated approximately 70% node negative for the period July 2008 to June
2010 (24 months) which overlaps the period of the Dec 2011 IMG Report.
The Audit Report [Dr Y] states that only around 50% of malignant or indeterminate lesions (47 cases)
seen on a prior screen were node negative. This information is contradictory when compared to the
Spreadsheet data (below).
The revised Spreadsheet data supplied (see below) indicate that, for the subset of cases which may
have been missed on prior mammography (27 invasive solid tumours), the node negative proportion
was approximately 60%.
5. Overall comments
This states that the purpose of the audit was to indicate that there was a problem.
Part II
REPORT BY DR Z, with regards to a visit to BreastScreen Health Care, Dunedin over the
weekend of 10/3/2012 and 11/3/2012
Dr Z examined mammograms of some of the cases of possible missed diagnosis identified by Dr Y
(4/3/12), and would have recalled 7 of 9.
9-11/3/12 Dr Z examined 35 prior screens (3 missing) from cancers diagnosed 2009-10 which were
categorised by Dr Y as ‘indeterminate’ and were “potential misses”, and she would have recalled 20
(57%). If we assume that Dr Z would have recalled all 9 cases of ‘probably malignant’, then for the
categories of ‘indeterminate’ and ‘probably malignant’ together, Dr Z would have recalled 29/44
(66%) of those classified as “potential misses” by Dr Y.
Dr Z also examined 40 prior subsequent screening mammograms from 2011 breast cancer cases (not
covered by the Audit Report), and would have recalled 9 (23%).
Of course, in all instances above, Dr Z was aware of the subsequent confirmed diagnosis of breast
cancer, as was Dr Y.
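The recall proportions in this Part follow from simple counts, and the arithmetic can be checked with a short Python sketch. The counts are taken from the text above; the variable and function names are illustrative assumptions, and the combined figure rests on the stated assumption that Dr Z would have recalled all 9 ‘probably malignant’ cases.

```python
def percent(numerator, denominator):
    """Return numerator/denominator as a percentage (not rounded)."""
    return 100 * numerator / denominator


indeterminate_reviewed = 35   # 'indeterminate' prior screens examined by Dr Z
indeterminate_recalled = 20   # of those, the number Dr Z would have recalled
probably_malignant = 9        # assumed all recalled, as stated in the text

combined_recalled = indeterminate_recalled + probably_malignant
combined_total = indeterminate_reviewed + probably_malignant

pct_indeterminate = percent(indeterminate_recalled, indeterminate_reviewed)
pct_combined = percent(combined_recalled, combined_total)
pct_2011 = percent(9, 40)     # 2011 prior subsequent screens

print(f"indeterminate: {pct_indeterminate:.1f}%")
print(f"combined ({combined_recalled}/{combined_total}): {pct_combined:.1f}%")
print(f"2011 cases: {pct_2011:.1f}%")
```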
Part III
SPREADSHEET DATA
The revised Spreadsheet (April 2012) provides information on 28 women, including 1 case of DCIS,
which could have been subject to late diagnosis because of possible missed lesions on prior
mammograms (of the 47 patients mentioned in the Report by Dr Y). Thus there are 27 women with
solid invasive tumours for analysis.
Tumour size
As set out in the following table, women were classified according to the size (diameter) of the tumour
(mm) according to various standard criteria. If there were multiple tumours, the largest lesion was
chosen for each woman. On this basis, 17 cases or 63% were T1 according to TNM staging, 8 or
29.6% were T2, and 2 or 7.4% were T3.
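The classification step described above can be sketched in Python. The size thresholds are the standard TNM size bands for breast tumours (T1 up to 20 mm, T2 over 20 mm and up to 50 mm, T3 over 50 mm); T4 depends on local extension rather than size and is not covered. The two-lesion example is hypothetical, the function names are assumptions, and only the counts 17, 8 and 2 come from the Spreadsheet summary.

```python
def t_category(largest_diameter_mm):
    """TNM T category from tumour size alone (T4 is not size-based, so it is omitted)."""
    if largest_diameter_mm <= 20:
        return "T1"
    if largest_diameter_mm <= 50:
        return "T2"
    return "T3"


def largest_lesion(lesion_sizes_mm):
    """For a woman with multiple tumours, the largest lesion is used."""
    return max(lesion_sizes_mm)


# Hypothetical woman with two lesions of 12 mm and 23 mm (illustration only).
example_category = t_category(largest_lesion([12, 23]))

# Counts reported in the text for the 27 women with solid invasive tumours.
counts = {"T1": 17, "T2": 8, "T3": 2}
total = sum(counts.values())
shares = {stage: round(100 * n / total, 1) for stage, n in counts.items()}

print(example_category, total, shares)
```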
Employing the small cancer detection target categories used by screening programs, including BSA, 7
or 25.9% were ≤10 mm (target: ≥ 30%) and 12 or 44.4% were ≤15 mm (target: >50%). That is, in this
subset of patients, both small cancer detection proportions are below target, although these figures
are based on small numbers, as evidenced by wide 95% CIs.
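How wide the intervals are for a subset of 27 can be illustrated with a minimal Python sketch. The source does not state which confidence interval method was used, so the Wilson score interval below is an assumption made for illustration, and its limits should not be read as the report's own figures. The counts (7 and 12 of 27) and the targets are taken from the text.

```python
from math import sqrt


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives about 95%)."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


n_women = 27
low10, high10 = wilson_interval(7, n_women)    # tumours <=10 mm, target 30%
low15, high15 = wilson_interval(12, n_women)   # tumours <=15 mm, target 50%

print(f"<=10 mm: {7 / n_women:.1%} (approx. 95% CI {low10:.1%} to {high10:.1%})")
print(f"<=15 mm: {12 / n_women:.1%} (approx. 95% CI {low15:.1%} to {high15:.1%})")
```

Under this assumed method both intervals span roughly 30 percentage points or more and include the respective targets, which is consistent with the caution above about small numbers.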
For women screened by BSHC Jan09-Dec10, the ≤10 mm proportion was 29.8% (95% CIs: 20.3-40.7%) based on 25 cases, which was, with BSAL, the lowest of all Lead Providers, although on
target (≥ 30%); the BSA proportion for the same period was higher at 37.2% (95% CIs 34.5-39.9%).
For women screened by BSHC Jan09-Dec10, the ≤15 mm proportion was 52.4% (95% CIs: 41.2-63.4%) based on 44 cases, which was on target (>50%); the BSA proportion for the same period was
56.8% (95% CIs 54.0-59.6%).
There is evidence that the proportions of small cancers ≤10 mm and ≤15 mm in this subset of 27
women which could have been subject to late diagnosis because of possible missed lesions on prior
mammograms are less than target and less than that reported by BSHC for Jan09-Dec10.
Sizes of solid tumours. Revised Spreadsheet data. April 2012-04-06
APPENDIX TWO.
Summary of findings from published studies of screening mammography reviews.
Author, Programme | Cases | Blinding | Reviewers | Proportion of False Negatives
------------------|-------|----------|-----------|------------------------------
Warren et al. 2003, UK [2] | 602 (both screen detected and interval cancers) | Partial: Aware the set was 100% cancer, unaware of location/history | 3 radiologists | 14% visible on earlier screen
Baines et al (1991), Canada [3] | Mix: 677 (both screen detected and interval cancers) with 5200 randoms | Partial: Aware of the casemix, unaware of location/history | 1 radiologist | 17% visible on earlier screen
Saarenmaa et al (1999), Finland [4] | 131 (both screen detected and interval cancers) | Unblinded | 1 radiologist | 33% (Screen detected 43%, Interval 19%)
Broeders et al (2003), Netherlands [5] | 234 (both screen detected and interval cancers) | Unblinded | 1 radiologist, 1 radiographer | 50% (no significant difference between SDC and IC)
Daly et al. (1998), UK [6] | 100 cancers detected in the incident round | Partial: Aware the set was 100% cancer, unaware of location/history | 1 radiologist | 25%
Daly et al. (1998), UK [6] | 100 cancers detected in the incident round | Unblinded | 1 radiologist | 44%
Jones et al. (1996), UK [7] | Mix: 133 cancers detected in the incident round with ~400 normals | Partial: Aware of casemix, unaware of location/history | 4 radiologists (but only 1/4 required to diagnose a false negative) | 19%
2 Warren, R. M. L., J. R. Young, et al. (2003). "Radiology review of the UKCCCR Breast Screening Frequency
Trial: potential improvements in sensitivity and lead time of radiological signs." Clinical Radiology 58(2): 128-132.
3 Baines, C. J., D. V. McFarlane, et al. (1990). "The role of the reference radiologist. Estimates of inter-observer
agreement and potential delay in cancer detection in the national breast screening study." Investigative
Radiology 25(9): 971-976.
4 Baines, C. J., D. V. McFarlane, et al. (1990). "The role of the reference radiologist. Estimates of inter-observer
agreement and potential delay in cancer detection in the national breast screening study." Investigative
Radiology 25(9): 971-976.
5 Broeders, M. J. M., N. C. Onland-Moret, et al. (2003). "Use of previous screening mammograms to identify
features indicating cases that would have a possible gain in prognosis following earlier detection." European
Journal of Cancer 39(12): 1770-1775.
6 Daly, C. A., L. Apthorp, et al. (1998). "Second round cancers: how many were visible on the first round of the
UK National Breast Screening Programme, three years earlier?" Clinical Radiology 53(1): 25-28.
7 Jones, R.D., McLean L., et al. (1996). “Proportion of cancers detected at the first incident screen which were
false negative at the prevalent screen.” The Breast 5: 339-343
APPENDIX THREE
Summary of published investigations into breast screening false negative issues
Reports examined:
1. Burns, F. G. (2011). An independent external review of the breast screening unit at East
Lancashire NHS Trust (Final Version 5).
2. Wilson, R. (2006). Report on a review of breast imaging at Altnagelvin Hospital, Belfast City
Hospital and Antrim Area Hospital September 2002 to November 2005. Department of Health Social
Services and Public Safety.
3. Bélanger, H. and L. Charbonneau (2012). Rapport d’enquête. Révision des mammographies
et des tomodensitométries effectuées dans les cliniques de radiologie Fabreville, Jean-Talon-Bélanger et Domus medica, Collège des médecins du Québec 2008 – 2010
Summary:
The two NHS reviews are in relation to failure of individual clinicians to appropriately diagnose breast
cancer at assessment. The Quebec review focusses on the accuracy of an individual clinician to read
mammograms and the absence of mandated quality standards for private providers there.
Methods:
Retrospective review of mammograms/assessment notes is the methodology used in these 3 reports.
A panel of expert radiologists were used to determine if the read/assessment was correct.
Issues:
Some indicators of benchmarking of the investigated practitioner’s performance are mentioned, but
there is little commentary around this. With little attention paid to benchmarking, a review risks
exposing harms done through screening that are no greater than normal. Any review will discover cases that have slipped
through – the critical question is to determine if the amount discovered is quantitatively or qualitatively
above that which a well performing practitioner/provider would miss.
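The quantitative side of this benchmarking question can be sketched as a one-sided exact binomial calculation in Python. This is an illustration only and not a method used in any of the three reports: the benchmark rate, the number of reviewed cases and the observed counts are all hypothetical, and the function name is an assumption.

```python
from math import comb


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p): the chance of k or more misses at the benchmark rate."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical inputs (not taken from any of the reports):
n_reviewed = 100        # cancers whose prior screens were re-read
benchmark_rate = 0.25   # assumed miss proportion for a well performing provider

# Two hypothetical review outcomes compared against the benchmark.
p_near_benchmark = binomial_upper_tail(27, n_reviewed, benchmark_rate)
p_well_above = binomial_upper_tail(40, n_reviewed, benchmark_rate)

print(f"27 misses of 100: P = {p_near_benchmark:.3f}")
print(f"40 misses of 100: P = {p_well_above:.5f}")
```

A small tail probability would suggest the number of misses found is above what the assumed benchmark would produce by chance. It says nothing about the qualitative side of the question, and the result depends heavily on how blinded the review was.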
Contributory factors of relevance to BSHC:
• The limits of performance data.
• The risk of Radiologists working in isolation
• The risk of under staffing / busyness
Report 1
The Burns report is an investigation into the practice of Dr X, who was identified as missing breast
cancer diagnoses at assessment clinic. Dr X was the Clinical Director of the East Lancashire
screening unit. The single root cause identified was that Dr X failed to update his practical screening
assessment skills in line with guidelines. The overall cancer detection rates for the unit were above
national targets and this led to his practice not being suspected until his poor performance came to
light through the identification of an interval cancer which prompted a closer internal look at Dr X’s
assessment clinic outcomes, which resulted in the discovery of two false negative assessments in a
single clinic.
Process
An incident team was established and decided to get peer-review of all the cases in assessment clinic
at that screening Trust (patients seen by Dr X and other radiologists). The initial review confirmed that
the error rate of Dr X was significantly higher than the other 2 radiologists and overall was outside of
acceptable practice. The review period was extended several times further – but only to look at Dr X’s
cases.
Comments
This represents a failure of assessment clinic. The scenario at BSHC is potential failure at
mammography – a different stage of screening both epidemiologically and clinically. It is important
that Dr X’s performance was benchmarked to make sure the potential misses were in excess of
expected. He was also found to be in breach of the National standards for screening practice.
Report 2
The extended review of a single radiologist was prompted by colleagues who raised concerns to their
Clinical Director about his clinical competence and performance. This involved assessment clinic
performance for both screen detected and symptomatic cases. Of note, he was not performing
ultrasound examination or taking biopsies on cases where the standards would require this to be
done.
Process
A team of radiologists performed the review of cases. All cases which had been solely reviewed by Dr
X during a time period were re-examined to see if correct diagnosis had been made. Some cases
which had been poorly assessed were brought back and re-assessed. It does not detail what process
was used to benchmark Dr X’s performance but does comment that “screening assessment clinics
carried out by other radiologists has shown an overall high standard of care with no evidence of
general poor practice.”
Contributory factors
• Shortage of consultant breast radiologists in Northern Ireland is considered to be a significant
contributory factor.
• A reliance on external locums was believed to increase the rate of referral to assessment.
The recall rates to assessment doubled during the year when these deficiencies were observed. It is
felt this was due to a reliance on locum radiologists to double read the screening mammograms but
who did not have subsequent responsibility for the assessment of cases recalled. It is for this reason
that the NHSBSP guidelines for breast screening radiology recommend that those involved in screen
reading are also directly involved in the assessment at the same site.
• Otherwise the performance data of the Antrim screening programme had been satisfactory
and equivalent to other services. This illustrates the difficulty of relying on standard performance
statistics to measure the standard of care being provided.
• The review identified that most errors occurred at sites where the radiologist was working
single handed.
Comments
This represents a failure of assessment clinic. The scenario at BSHC is potential failure at
mammography – a different stage of screening both epidemiologically and clinically.
Report 3
This report is written in French. Although an approximate translation has been achieved through
Google Translate, there remains a limit on our ability to interpret this report accurately.
The report reviews the performance of mammography reading done by a Quebec radiologist during
the period from 9 October 2008 to 9 October 2010. These films were reviewed by other radiologists to
determine if the appropriate decision was made. It appears that this radiologist was practising in
private clinics at least some of the time and these did not have to meet usual quality assurance
standards for breast screening. It is not clear if the performance of this radiologist is benchmarked, or
if the false negative reads were above that which would be expected.
The report confirms the discrepancies in the readings of the radiologist under investigation and, less
significantly, those of other radiologists. It raises issues on quality assurance mechanisms
surrounding the practice of imaging in private clinics. The recommendations largely focus on
improving quality assurance systems about individual radiologist performance (performance
standards, second reads of all mammograms read by outliers).