This tutorial describes how to obtain drug dosing information from unstructured clinical notes using the Extract-Med module in the system.

Elizabeth McNeer, Hannah L. Weeks


This tutorial describes how to use the Extract-Med module to obtain drug dosing information from unstructured clinical notes. The Extract-Med module uses a natural language processing (NLP) system called medExtractR (see Choi et al. [1] and Weeks et al. [2] for details).

To begin, we load the EHR package and the medExtractR package.

# load EHR package and dependency
library(EHR)
library(medExtractR)

Unstructured Clinical Notes

The input to medExtractR is a clinical note like the one below.

                                               Clinic Summary - Neurology
Appointment with **NAME[ZZZ, YYY XXX] on **DATE[Feb 05 2016] 12:30
Neurology Practice
**PLACE Research Bldg
Vital Signs:
[**DATE[Feb 05 2016] 12:35]: Pulse: 94 bpm; BP: 128/91 mm Hg; O2Sat: 98 %
Health Problems (today or in the past):
Neck pain
Headache [migrainous]
lyrica 100mg prn twice daily
ltg 200 mg (1.5) daily
ltg xr 100 mg 3 in am, 2 in pm
penicillin (rash)
Clinical Instructions/Patient Education/Decision Aids:
Labs today
Significant Procedures:
Tobacco usage:
Patient has smoked in the past 12 months: Yes
Patient has smoked: Yes (Patient has smoked more than 100 cigarettes-5 packs)
Currently smokes: Heavy smoker (smokes more than 10 cigarettes or 1/2 pack per day or equivalent amount of pipe or cigar
Functional Status:
- Serious difficulty concentrating, remembering, or making decisions.
- Serious difficulty walking or climbing stairs.
- Difficulty doing errands alone.
Plan and Assessment:
Patient will continue taking Lamotrigine XR 300-200
Electronically Signed By: **[NAME XXX].
Health Care Team:
- **NAME[CCC, XXX DDD] - Primary Care Physician

If we are interested in the medication lamotrigine, we would need to extract three mentions from the above note:

- ltg 200 mg (1.5) daily
- ltg xr 100 mg 3 in am, 2 in pm
- Lamotrigine XR 300-200
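As a rough illustration of what counts as a mention, the base R sketch below searches a few lines copied from the note for lamotrigine name variants. It is not how medExtractR works internally: the regular expression and the variable names are assumptions made for this example, and it only flags whole lines rather than labeling individual entities.

```r
# Lines copied from the example note (only those needed for the illustration)
note_lines <- c(
  "lyrica 100mg prn twice daily",
  "ltg 200 mg (1.5) daily",
  "ltg xr 100 mg 3 in am, 2 in pm",
  "penicillin (rash)",
  "Patient will continue taking Lamotrigine XR 300-200"
)

# Name variants for lamotrigine; this pattern is an illustrative assumption
drug_pattern <- "\\b(lamotrigine|lamictal|ltg)\\b"

# Case-insensitive search for lines that mention the drug
is_mention <- grepl(drug_pattern, note_lines, ignore.case = TRUE, perl = TRUE)
mentions <- note_lines[is_mention]
print(mentions)
```

A plain pattern search like this finds where the drug is named but says nothing about strength, dose amount, or frequency, which is the part the Extract-Med module adds.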

The next section demonstrates how to use the extractMed function to run the Extract-Med module using the example clinical note from above.

Running extractMed

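The following arguments must be specified:

- note_fn: the file name of the note to process (here, an example note shipped with the EHR package)
- drugnames: the names of the drug to search for, including any generic names, brand names, or abbreviations that may appear in the note
- drgunit: the unit of the drug, e.g. "mg"
- windowlength: the length, in characters, of the search window around each drug name in which dosing entities are sought
- max_edit_dist: the maximum edit distance allowed when matching drug names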

Below we show how we would run extractMed using the example clinical note from the previous section.

mxr_out <- extractMed(note_fn = system.file("examples", "lampid1_2016-02-05_note5_1.txt", package = "EHR"),
                      drugnames = c("lamotrigine", "lamotrigine XR",
                                    "lamictal", "lamictal XR",
                                    "LTG", "LTG XR"),
                      drgunit = "mg",
                      windowlength = 130,
                      max_edit_dist = 1,
                      strength_sep = "-")

running notes 1-1 in batch 1 of 1 (100%)
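The max_edit_dist argument in the call above sets the maximum edit distance allowed when matching drug names, so that minor misspellings can still be recognized. The sketch below illustrates the idea with base R's adist function, which computes Levenshtein distances. The misspelled candidates are hypothetical, and the package's own matching may apply additional rules that this sketch does not reproduce.

```r
# Drug names to match against (a subset of the drugnames argument above)
drugnames <- c("lamotrigine", "lamictal", "LTG")

# Hypothetical spellings as they might appear in a note
candidates <- c("lamotrigine", "lamotrigene", "lamictl", "lamotrgin")

# Levenshtein edit distance between each candidate and each drug name,
# ignoring case
d <- adist(candidates, drugnames, ignore.case = TRUE)
dimnames(d) <- list(candidates, drugnames)

# A candidate counts as a match when its closest drug name is within the limit
max_edit_dist <- 1
min_dist <- apply(d, 1, min)
matched <- min_dist <= max_edit_dist
print(data.frame(candidate = candidates, min_dist = min_dist, matched = matched))
```

With a limit of 1, "lamotrigene" and "lamictl" are accepted, while "lamotrgin" (two edits away from "lamotrigine") is not. Larger limits catch more misspellings at the cost of more false matches.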

The optional argument strength_sep lets users specify special characters that separate doses administered at different times of day. For example, consider the drug mention “Lamotrigine XR 300-200” from our example clinical note. This indicates that the patient takes 300 mg of the drug in the morning and 200 mg in the evening. Setting strength_sep = c('-') allows extractMed to identify 300 and 200 as “DoseStrength” (i.e., dose given intake), as labeled in the output below, since they are separated by the special character “-”. The default value is NULL.
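To make the role of the separator concrete, the following base R sketch splits the expression "300-200" on "-" and pairs each piece with an intake time. It only illustrates the interpretation described above; the variable names and the morning/evening labels are assumptions for this example, and the code is not part of extractMed.

```r
# Expression from the example note and the separator passed as strength_sep
expr <- "300-200"
strength_sep <- "-"

# Split on the literal separator and convert each piece to a number
doses <- as.numeric(strsplit(expr, strength_sep, fixed = TRUE)[[1]])

# Pair each dose with an intake time, following the interpretation in the text
schedule <- data.frame(intake = c("morning", "evening"), dose_mg = doses)
print(schedule)
```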

Another optional argument to extractMed is lastdose, a logical value specifying whether the last dose time entity should be extracted. The default value is FALSE. See “1. EHR Vignette for Extract-Med and Pro-Med-NLP” and the “Pro-Med-NLP Workshop” for more information on last dose.

Output of extractMed

                         filename       entity           expr       pos
1  lampid1_2016-02-05_note5_1.txt     DrugName            ltg   442:445
2  lampid1_2016-02-05_note5_1.txt     Strength         200 mg   446:452
3  lampid1_2016-02-05_note5_1.txt      DoseAmt            1.5   454:457
4  lampid1_2016-02-05_note5_1.txt    Frequency          daily   459:464
5  lampid1_2016-02-05_note5_1.txt     DrugName         ltg xr   465:471
6  lampid1_2016-02-05_note5_1.txt     Strength         100 mg   472:478
7  lampid1_2016-02-05_note5_1.txt      DoseAmt              3   479:480
8  lampid1_2016-02-05_note5_1.txt   IntakeTime          in am   481:486
9  lampid1_2016-02-05_note5_1.txt      DoseAmt              2   488:489
10 lampid1_2016-02-05_note5_1.txt   IntakeTime          in pm   490:495
11 lampid1_2016-02-05_note5_1.txt     DrugName Lamotrigine XR 1125:1139
12 lampid1_2016-02-05_note5_1.txt DoseStrength        300-200 1140:1147

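The output from the Extract-Med module is a data.frame with 4 columns:

- filename: the file name of the clinical note
- entity: the label assigned to the extracted expression (e.g., DrugName, Strength, DoseAmt)
- expr: the expression as it appears in the note
- pos: the position of the expression in the note, written as start:stop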

In the above output, we see that all three lamotrigine mentions from our example clinical note have been extracted, and each expression has been assigned the appropriate entity label and position.
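The pos values are stored as text, so they need to be split before they can be used numerically. The sketch below does this in base R for a few rows copied from the output above; the data.frame and its added column names (start, stop, width, nchars) are assumptions for illustration.

```r
# A few rows copied from the output shown above
mxr_rows <- data.frame(
  entity = c("DrugName", "Strength", "DoseAmt", "Frequency"),
  expr   = c("ltg", "200 mg", "1.5", "daily"),
  pos    = c("442:445", "446:452", "454:457", "459:464"),
  stringsAsFactors = FALSE
)

# Split "start:stop" into two integer columns
pos_parts <- strsplit(mxr_rows$pos, ":", fixed = TRUE)
mxr_rows$start <- as.integer(sapply(pos_parts, `[`, 1))
mxr_rows$stop  <- as.integer(sapply(pos_parts, `[`, 2))

# Compare the span width with the number of characters in each expression
mxr_rows$width  <- mxr_rows$stop - mxr_rows$start
mxr_rows$nchars <- nchar(mxr_rows$expr)
print(mxr_rows)
```

In these rows the difference between stop and start equals the length of the expression, which suggests that the stop position is one past the last character. This is an observation from the example output rather than documented behavior.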

The output of extractMed must be saved as a CSV file (see code below), the filename of which will serve as the first input to the Pro-Med-NLP module (see “Pro-Med-NLP Workshop”).

# save as csv file
write.csv(mxr_out, file='mxr_out.csv', row.names=FALSE)
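Before passing the file on, it can be worth reading it back to confirm that the four columns survived the round trip. The sketch below does this with a small stand-in data.frame (mxr_out_example, a hypothetical name) and a temporary file, so it runs without the EHR package and without writing to the working directory.

```r
# Stand-in for mxr_out with two rows copied from the output above
mxr_out_example <- data.frame(
  filename = "lampid1_2016-02-05_note5_1.txt",
  entity   = c("DrugName", "Strength"),
  expr     = c("ltg", "200 mg"),
  pos      = c("442:445", "446:452"),
  stringsAsFactors = FALSE
)

# Write to a temporary file so the working directory is left untouched
csv_path <- tempfile(fileext = ".csv")
write.csv(mxr_out_example, file = csv_path, row.names = FALSE)

# Read it back, keeping text columns as character
mxr_check <- read.csv(csv_path, stringsAsFactors = FALSE)
print(names(mxr_check))
```

Using row.names = FALSE, as in the tutorial's own call, keeps an extra unnamed index column out of the file.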


  1. Choi L, Beck C, McNeer E, Weeks HL, Williams ML, James NT, Niu X, Abou-Khalil BW, Birdwell KA, Roden DM, Stein CM. Development of a System for Post-marketing Population Pharmacokinetic and Pharmacodynamic Studies using Real-World Data from Electronic Health Records. Clinical Pharmacology & Therapeutics. 2020 Apr;107(4):934-43. doi: 10.1002/cpt.1787.

  2. Weeks HL, Beck C, McNeer E, Williams ML, Bejan CA, Denny JC, Choi L. medExtractR: A targeted, customizable approach to medication extraction from electronic health records. Journal of the American Medical Informatics Association. 2020 Mar;27(3):407-18. doi: 10.1093/jamia/ocz207.

