This tutorial describes how to obtain drug dosing information from unstructured clinical notes using Extract-Med module in the system.
EHR
package.This tutorial describes how to use the Extract-Med module to obtain drug dosing information from unstructured clinical notes. The Extract-Med module uses a natural language processing (NLP) system called medExtractR (see Choi et al.\(^{1}\) and Weeks et al.\(^{2}\) for details).
To begin we load the EHR
package, and the medExtractR
package.
The input to medExtractR is a clinical note like the one below.
Clinic Summary - Neurology
Appointment with **NAME[ZZZ, YYY XXX] on **DATE[Feb 05 2016] 12:30
Neurology Practice
**STREET-ADDRESS
**PLACE Research Bldg
**PLACE, TN, **ZIP-CODE
**PHONE
Vital Signs:
[**DATE[Feb 05 2016] 12:35]: Pulse: 94 bpm; BP: 128/91 mm Hg; O2Sat: 98 %
Health Problems (today or in the past):
Epilepsy
Neck pain
Headache [migrainous]
Medications:
lyrica 100mg prn twice daily
ltg 200 mg (1.5) daily
ltg xr 100 mg 3 in am, 2 in pm
Allergies:
penicillin (rash)
Clinical Instructions/Patient Education/Decision Aids:
Labs today
Significant Procedures:
2006: NECK SURGERY
Tobacco usage:
Patient has smoked in the past 12 months: Yes
Patient has smoked: Yes (Patient has smoked more than 100 cigarettes-5 packs)
Currently smokes: Heavy smoker (smokes more than 10 cigarettes or 1/2 pack per day or equivalent amount of pipe or cigar
tobacco)
Functional Status:
- Serious difficulty concentrating, remembering, or making decisions.
- Serious difficulty walking or climbing stairs.
- Difficulty doing errands alone.
Plan and Assessment:
Patient will continue taking Lamotrigine XR 300-200
Electronically Signed By: **[NAME XXX].
------------------------------------------------------------------------------------------------------------------------
Health Care Team:
- **NAME[CCC, XXX DDD] - Primary Care Physician
If we are interested in the medication Lamotrigine, we would need to extract three mentions from the above note:
The next section demonstrates how to use the extractMed
function to run the Extract-Med module using the example clinical note from above.
extractMed
The following arguments must be specified:
note_fn
: The file name of the note on which to run the system. This can be either a single file name (e.g., "clinical_note01.txt"
) or a vector or list of file names (e.g., c("clinical_note01.txt", "clinical_note02.txt")
or list("clinical_note01.txt", "clinical_note02.txt")
).
drugnames
: Names of the drugs for which we want to extract medication dosing information. This can include any way in which the drug name might be represented in the clinical note, such as generic name (e.g., "lamotrigine"
), brand name (e.g., "Lamictal"
), or an abbreviation (e.g., "LTG"
).
drgunit
: The unit of the drug(s) listed in drugnames
, for example "mg"
.
windowlength
: Length of the search window around each found drug name in which to search for dosing information. There is no default for this argument, requiring the user to carefully consider its value through tuning (see “1. EHR Vignette for Extract-Med and Pro-Med-NLP” for more information on tuning).
"Lamotrigine XR 300-200
Electronically Signed By: **[NAME XXX].
-------------------------------------------------------------------"
max_edit_dist
: The maximum edit distance allowed when identifying drugnames
. Maximum edit distance determines the difference between two strings, and is defined as the number of insertions, deletions, or substitutions required to change one string into the other. This allows us to capture misspellings in the drug names we are searching for, and its value should be carefully considered through tuning (see “1. EHR Vignette for Extract-Med and Pro-Med-NLP” for more information on tuning).
drugnames
. A value of 0 is always used for drug names with less than 5 characters regardless of the value set by max_edit_dist
.Below we show how we would run extractMed
using the example clinical note from the previous section.
mxr_out <- extractMed(note_fn = system.file("examples", "lampid1_2016-02-05_note5_1.txt", package = "EHR"),
drugnames = c("lamotrigine", "lamotrigine XR",
"lamictal", "lamictal XR",
"LTG", "LTG XR"),
drgunit = "mg",
windowlength = 130,
max_edit_dist = 1,
strength_sep="-")
running notes 1-1 in batch 1 of 1 (100%)
The additional argument, strength_sep
, allows users to specify special characters to separate doses administered at different times of day. For example, consider the drug mention “Lamotrigine XR 300-200” from our example clinical note. This indicates that the patient takes 300 mg of the drug in the morning and 200 mg in the evening. Setting strength_sep = c('-')
would allow extractMed
to identify 300 and 200 as “Dose” (i.e., dose given intake) since they are separated by the special character “-”. The default value is NULL
.
Another additional argument allowed in the extractMed
function is lastdose
. This is a logical input specifying whether or not the last dose time entity should be extracted. Default value is FALSE
. See “1. EHR Vignette for Extract-Med and Pro-Med-NLP” and the “Pro-Med-NLP Workshop” for more information on last dose.
extractMed
mxr_out
filename entity expr pos
1 lampid1_2016-02-05_note5_1.txt DrugName ltg 442:445
2 lampid1_2016-02-05_note5_1.txt Strength 200 mg 446:452
3 lampid1_2016-02-05_note5_1.txt DoseAmt 1.5 454:457
4 lampid1_2016-02-05_note5_1.txt Frequency daily 459:464
5 lampid1_2016-02-05_note5_1.txt DrugName ltg xr 465:471
6 lampid1_2016-02-05_note5_1.txt Strength 100 mg 472:478
7 lampid1_2016-02-05_note5_1.txt DoseAmt 3 479:480
8 lampid1_2016-02-05_note5_1.txt IntakeTime in am 481:486
9 lampid1_2016-02-05_note5_1.txt DoseAmt 2 488:489
10 lampid1_2016-02-05_note5_1.txt IntakeTime in pm 490:495
11 lampid1_2016-02-05_note5_1.txt DrugName Lamotrigine XR 1125:1139
12 lampid1_2016-02-05_note5_1.txt DoseStrength 300-200 1140:1147
The output from the Extract-Med module is a data.frame
with 4 columns:
filename
: The file name of the corresponding clinical note, to label results.entity
: The label of the entity for the extracted expression.expr
: Expression extracted from the clinical note.pos
: Position of the extracted expression in the note, in the format startPosition:stopPosition
In the above output, we see that all three lamotrigine mentions from our example clinical note have been extracted, and each expression has been assigned the appropriate entity label and position.
The output of extractMed
must be saved as a CSV file (see code below), the filename of which will serve as the first input to the Pro-Med-NLP module (see “Pro-Med-NLP Workshop”).
# save as csv file
write.csv(mxr_out, file='mxr_out.csv', row.names=FALSE)
Choi L, Beck C, McNeer E, Weeks HL, Williams ML, James NT, Niu X, Abou-Khalil BW, Birdwell KA, Roden DM, Stein CM. Development of a System for Post-marketing Population Pharmacokinetic and Pharmacodynamic Studies using Real-World Data from Electronic Health Records. Clinical Pharmacology & Therapeutics. 2020 Apr;107(4):934-43. doi: 10.1002/cpt.1787.
Weeks HL, Beck C, McNeer E, Williams ML, Bejan CA, Denny JC, Choi L. medExtractR: A targeted, customizable approach to medication extraction from electronic health records. Journal of the American Medical Informatics Association. 2020 Mar;27(3):407-18. doi: 10.1093/jamia/ocz207.
If you see mistakes or want to suggest changes, please create an issue on the source repository.