Toward the Development of Data Governance Standards for Using Clinical Free-Text Data in Health Research: Position Paper

Background: Clinical free-text data (eg, outpatient letters or
nursing notes) represent a vast, untapped source of rich
information that, if more accessible for research, would clarify
and supplement information coded in structured data fields. Data
usually need to be deidentified or anonymized before they can be
reused for research, but there is a lack of established guidelines
to govern effective deidentification and use of free-text
information while avoiding damage to data utility as a by-product.
Objective: This study aimed to develop recommendations for the
creation of data governance standards to integrate with existing
frameworks for personal data use, to enable free-text data to be
used safely for research for patient and public benefit. Methods:
We outlined data protection legislation and regulations relating to
the United Kingdom for context and conducted a rapid literature
review and UK-based case studies to explore data governance models
used in working with free-text data. We also engaged with
stakeholders, including text-mining researchers and the general
public, to explore perceived barriers and solutions in working with
clinical free-text data. Results: We proposed a set of
recommendations, which include establishing authoritative guidance
on data governance for the reuse of free-text data, ensuring public
transparency in data flows and uses, treating deidentified free-text
data as potentially identifiable, with use limited to accredited
data safe havens, and committing to a culture of continuous
improvement to understand the relationship between the efficacy of
deidentification and reidentification risks, so that this can be
communicated to all stakeholders. Conclusions: By drawing together
the findings of a combination of activities, we present a position
paper to contribute to the development of data governance standards
for the reuse of clinical free-text data for secondary purposes.
Further work, carried out in accordance with existing data
governance frameworks and backed by commitment and investment, is
needed to take forward the recommendations we have proposed and to
assure and expand the safe reuse of clinical free-text data for
public benefit.

