Personally Identifiable Information Clinical Trial
Official title:
NLM Scrubber: NLM's Software Application to De-identify Clinical Text Documents
| Verified date | March 6, 2024 |
| Source | National Institutes of Health Clinical Center (CC) |
| Contact | n/a |
| Is FDA regulated | No |
| Health authority | |
| Study type | Observational |
Background: Electronic health records contain a vast amount of data about diseases and treatments. Researchers could use this data to test their ideas, but they would need records from more than just their own group of patients. Access to those records is restricted to protect patient privacy. The U.S. National Library of Medicine (NLM) has created a computer tool called NLM Scrubber. This program recognizes and deletes personal information from health records. The researchers who developed this program now need access to the original records. This will allow them to see how well the program removes personal information from patient records and how they can make it more accurate.

Objectives: To find ways to improve clinical text de-identification.

Eligibility: No new participants. Researchers will review data that have already been collected.

Design: Researchers will collect a random sample of reports. These will be from different doctors in different fields. Researchers will manually remove personal information from the records. Researchers will also automatically remove personal information from the original records using NLM-Scrubber. Researchers will compare the results of the computer program with the manual changes. They will note where the program failed to remove personal information. They will also note where the program incorrectly deleted nonpersonal health information. Researchers will use the results to revise the program. They will keep testing it until the de-identification process is complete.
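
The comparison step in the design can be illustrated with a minimal sketch. It assumes each report has been split into tokens and that each token carries two flags: one from the manual annotation and one from the automatic redaction. The `Token` class, the `compare` function, and the sample report are hypothetical illustrations, not part of NLM-Scrubber or its actual evaluation pipeline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    """One token of a report (hypothetical structure for illustration)."""
    text: str
    gold_pii: bool   # True if the manual annotators marked it as personal information
    redacted: bool   # True if the automatic tool removed it


def compare(tokens):
    """Return (missed, over_redacted) token texts for one report.

    missed:        personal information the tool left in place
    over_redacted: nonpersonal information the tool removed
    """
    missed = [t.text for t in tokens if t.gold_pii and not t.redacted]
    over_redacted = [t.text for t in tokens if not t.gold_pii and t.redacted]
    return missed, over_redacted


# Made-up example sentence; the names are invented for illustration only.
report = [
    Token("Patient", False, False),
    Token("John", True, True),
    Token("Smith", True, False),    # personal name the tool missed
    Token("has", False, False),
    Token("Graves", False, True),   # disease eponym removed by mistake
    Token("disease", False, False),
]

missed, over_redacted = compare(report)
print("Missed personal information:", missed)
print("Incorrectly removed:", over_redacted)
```

Lists like these correspond to the two kinds of error the design says researchers will note before revising the program.
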
| Status | Enrolling by invitation |
| Enrollment | 50000 |
| Est. completion date | January 31, 2027 |
| Est. primary completion date | January 31, 2027 |
| Accepts healthy volunteers | No |
| Gender | All |
| Age group | 1 Day and older |
| Eligibility | - No new participant enrollment. Researchers will review data that have already been collected. |
| Country | Name | City | State |
|---|---|---|---|
| United States | National Library of Medicine | Bethesda | Maryland |
| Lead Sponsor | Collaborator |
|---|---|
| National Library of Medicine (NLM) | National Cancer Institute (NCI), National Institutes of Health Clinical Center (CC) |
Kayaalp M, Browne AC, Callaghan FM, Dodd ZA, Divita G, Ozturk S, McDonald CJ. The pattern of name tokens in narrative clinical text and a comparison of five systems for redacting them. J Am Med Inform Assoc. 2014 May-Jun;21(3):423-31. doi: 10.1136/amiajnl-2013-001689. Epub 2013 Sep 11.
Kayaalp M, Browne AC, Dodd ZA, Sagan P, McDonald CJ. De-identification of Address, Date, and Alphanumeric Identifiers in Narrative Clinical Reports. AMIA Annu Symp Proc. 2014 Nov 14;2014:767-76. eCollection 2014.
Kayaalp M. Patient Privacy in the Era of Big Data. Balkan Med J. 2018 Jan 20;35(1):8-17. doi: 10.4274/balkanmedj.2017.0966. Epub 2017 Sep 13.
| Type | Measure | Description | Time frame | Safety issue |
|---|---|---|---|---|
| Primary | The rate of de-identification of PII | The HIPAA Privacy Rule defines 18 types of personally identifying information that need to be de-identified, including personal names, addresses, significant dates, and numeric identifiers (such as Social Security numbers). Our annotators label those words and numbers to create a gold standard, and NLM-Scrubber tries to recognize and eliminate all of them. The rate of de-identification of PII measures how successfully it does so. | 01/01/2017-01/31/2027 | |
| Secondary | The rate of erroneously redacted clinical information | NLM-Scrubber tries to eliminate only PII elements and preserve non-identifying study data, but it inadvertently deletes some non-identifying study data elements (non-protected health information) as well. The rate of erroneously redacted clinical information measures NLM-Scrubber's failure to preserve non-identifying health information. | 01/01/2017-01/31/2027 | |
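
As a rough illustration of how the two outcome measures might be computed, the sketch below treats both as token-level proportions against the gold standard. The record does not state the exact denominators, so the definitions here are assumptions: the primary rate is taken as the share of gold-standard PII tokens that were redacted, and the secondary rate as the share of non-PII tokens that were redacted. The function names and the sample labels are hypothetical.

```python
from typing import Optional, Sequence


def deidentification_rate(gold: Sequence[bool], redacted: Sequence[bool]) -> Optional[float]:
    """Share of gold-standard PII tokens that the tool redacted (assumed definition)."""
    pii_flags = [r for g, r in zip(gold, redacted) if g]
    if not pii_flags:
        return None  # no PII in the sample, so the rate is undefined
    return sum(pii_flags) / len(pii_flags)


def erroneous_redaction_rate(gold: Sequence[bool], redacted: Sequence[bool]) -> Optional[float]:
    """Share of non-PII tokens that the tool redacted anyway (assumed definition)."""
    non_pii_flags = [r for g, r in zip(gold, redacted) if not g]
    if not non_pii_flags:
        return None  # no non-PII tokens in the sample
    return sum(non_pii_flags) / len(non_pii_flags)


# Made-up labels for eight tokens: True means "is PII" (gold) or "was redacted" (tool).
gold     = [False, True, True,  False, False, False, True, False]
redacted = [False, True, False, False, True,  False, True, False]

primary = deidentification_rate(gold, redacted)       # 2 of 3 PII tokens redacted
secondary = erroneous_redaction_rate(gold, redacted)  # 1 of 5 non-PII tokens redacted
print(f"De-identification rate: {primary:.3f}")
print(f"Erroneous redaction rate: {secondary:.3f}")
```

Under these assumed definitions, a higher primary rate and a lower secondary rate both indicate better performance, and the two can trade off against each other as the program is revised.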