This tutorial describes how to process structured medication data, focusing on intravenously administered dose data, with the EHR package. It shows how to use the Pro-Med-Str Part I module of the EHR2PKPD system to process intravenous (IV) dose data (see Choi et al.\(^{1}\) for details).
To begin, we load the EHR package:
library(EHR)
We will use example dose data to demonstrate the Pro-Med-Str Part I module. The raw data is shown below.
mar.in0 <- read.csv(system.file("examples", "str_ex2","MAR_DATA.csv", package="EHR"), check.names = FALSE)
head(mar.in0)
Uniq.Id Date Time med:mDrug med:dosage med:route med:freq med:given
1 28579217 2017-02-04 19:15 Nicardipine 3 mcg/kg/min IV <NA> Given
2 28579217 2011-10-02 22:11 Famotidine 4.5 mg IV q12hrs Given
3 28579217 2011-10-02 20:17 Morphine sulfate 1 mg IV q2h prn Given
4 28579217 2011-10-03 02:28 Diphenhydramine injection 12 mg IV now Given
5 28579217 2011-10-02 22:11 Cefazolin 225 mg IV q8hrs Given
6 28579217 2011-10-02 23:30 Morphine sulfate 1 mg IV q2h prn Given
The data consist of a patient ID, date, time, medication name, dosage, route, frequency, and a flag for whether the medication was given.
The patient ID may need to be renamed so that all input datasets have the same name for the patient ID. This is necessary when combining the datasets to create a crosswalk between the original ID variables and the new ID variables used in the Pro-Med-Str Part I module. We demonstrate how to rename the patient ID variable below.
mar.new <- dataTransformation(mar.in0, rename = c('Uniq.Id' = 'subject_uid'))
In practice, the dose data will need to be combined with other input datasets. This process involves creating a crosswalk between the original ID variables and new ID variables. The new ID variables that must be the same across all datasets are mod_id, mod_visit, and mod_id_visit. See “2. EHR Vignette for Structured Data” of the EHR package and “Build-PK-IV - Comprehensive Workshop” for examples of this process.
For simplicity, we will skip this step in this tutorial.
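Although this tutorial skips the step, the idea behind a crosswalk can be illustrated in base R. The sketch below is not the EHR package's own procedure: the visit_date column, the example rows, and the "id.visit" format chosen for mod_id_visit are assumptions made only for illustration.

```r
# Hypothetical input: one row per subject visit (visit_date is an assumed column)
ids <- data.frame(
  subject_uid = c(28579217, 28579217, 34161967),
  visit_date  = c("2011-10-02", "2017-02-04", "2016-07-18")
)

# mod_id: sequential integer assigned to each distinct original subject ID
ids$mod_id <- match(ids$subject_uid, unique(ids$subject_uid))

# mod_visit: running visit counter within each subject
ids$mod_visit <- ave(seq_along(ids$mod_id), ids$mod_id, FUN = seq_along)

# mod_id_visit: combined key (the "id.visit" format is an assumption)
ids$mod_id_visit <- paste(ids$mod_id, ids$mod_visit, sep = ".")

ids
```

Each input dataset would then be merged with such a table on the original ID so that all datasets share the same mod_id, mod_visit, and mod_id_visit values.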
We save our dataset as an RDS file using saveRDS, as shown below (CSV and RData are also acceptable formats). Here, we create a temporary directory to store the files using tempdir. We define two directories: dataDir for processed data and checkDir for files used in interactive checking (optional). Instead of a temporary directory, dataDir and checkDir can be specific directories on your computer.
# define 2 directories
td <- tempdir()
dataDir <- file.path(td, 'data') # directory for processed data
dir.create(dataDir)
checkDir <- file.path(td, 'checks') # directory for interactive checking
dir.create(checkDir)
saveRDS(mar.new, file=file.path(dataDir,"mar_new.rds"))
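Since CSV and RData are also accepted, the same kind of data could be written in those formats instead. The following is a minimal base R sketch using a small made-up data frame (mar.demo) and a hypothetical output directory (data_alt). It also illustrates why check.names = FALSE matters when reading CSV files by hand: column names such as med:dosage are not syntactically valid R names and would otherwise be altered.

```r
# Small made-up data frame with a non-syntactic column name
mar.demo <- data.frame(subject_uid = 1L, `med:dosage` = "25 mcg",
                       check.names = FALSE)

# Hypothetical output directory inside the session's temporary directory
outDir <- file.path(tempdir(), "data_alt")
dir.create(outDir, showWarnings = FALSE, recursive = TRUE)

# Write the same data as CSV and as RData
write.csv(mar.demo, file.path(outDir, "mar_demo.csv"), row.names = FALSE)
save(mar.demo, file = file.path(outDir, "mar_demo.RData"))

# Reading the CSV back with check.names = FALSE keeps 'med:dosage' intact
back <- read.csv(file.path(outDir, "mar_demo.csv"), check.names = FALSE)
names(back)
```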
The next section demonstrates how to use the run_MedStrI function to run the Pro-Med-Str module.
run_MedStrI()
IV dose data can be in various forms, which may need to be pre-processed. IV dose data can be obtained from different data sources when using electronic health records (EHRs), but we illustrate this module using an example dataset (called “MAR”) that comes from one data source for simplicity (see “Build-PK-IV - Comprehensive Workshop” for a case when dose data are obtained from two data sources).
run_MedStrI() is the main function for processing IV dose data in the Pro-Med-Str Part I module.
The module can be semi-interactive for data checking (this is not required, but we recommend using the feature). If check.path is provided (the default is NULL), the module generates several files for checking potential data errors and getting feedback from an investigator; otherwise, the interactive checking is not performed. If corrected information (‘fix’ files) is provided, the module should be re-run to incorporate the corrections.
run_MedStrI() can take two types of raw IV dose data (e.g., flow data and MAR data). In this tutorial, we describe only the arguments relevant to processing one of these datasets, the MAR data. A detailed description of all arguments can be found in the run_MedStrI() entry of the EHR package manual.
mar.path: file name of the MAR data (CSV, RData, RDS), or a data.frame.
mar.columns: a named list specifying the columns in the MAR data; ‘id’, ‘datetime’, and ‘dose’ are required, and ‘drug’, ‘weight’, and ‘given’ may also be specified. ‘datetime’ is the date and time of the data measurement, which can refer to a single date-time variable (datetime = ‘date_time’) or two variables holding date and time separately (e.g., datetime = c(‘Date’, ‘Time’)). ‘dose’ can likewise be given as a single variable or two variables. If given as a single column, the column’s values should contain both dose and unit, such as ‘25 mcg’. If given as two column names, the dose column should come before the unit column (e.g., dose = c(‘doseamt’, ‘unit’)). If ‘drug’ is present, the ‘medchk.path’ argument should also be provided; it points to a CSV file listing the drug names used to subset the relevant drugs. The ‘given’ variable should be used in conjunction with the ‘medGivenReq’ argument.
medGivenReq: indicator of whether values in the MAR given column must equal “Given”; if this is FALSE (the default), NA values are also acceptable. The given column flags whether a medication was actually administered to an inpatient, because a record sometimes specifies a dose other than the one actually given for communication purposes (e.g., a scheduled dose change). Depending on the EHR system and data types, we recommend confirming the actual dose given, and this argument can be useful for that purpose.
medchk.path: (optional) file name of a dataset (CSV, RData, RDS), or a data.frame; it should have the column ‘medname’ listing the acceptable drug names used to filter the MAR data.
demo.list: (optional) demographic information; if available, the output from ‘run_Demo’ or a correctly formatted data.frame. If provided, ‘weight’ is required in demo.columns, as it is used to impute weight when missing.
demo.columns: a named list specifying the columns in the demographic data; ‘id’, ‘datetime’, and ‘weight’ are required.
check.path: (optional) file path where the generated files for data checking are stored and where the corresponding files with fixed data are expected. The default (NULL) will not produce any check files.

All of the following arguments are optional (i.e., default values can be used), but the function will not work if required data items for the dose data, such as ‘unit’, are not provided.
failunit_fn: filename stub for records with units other than those specified with infusion.unit and bolus.unit (default: ‘Unit’).
infusion.unit: string specifying units for infusion doses (default: ‘mcg/kg/hr’).
bolus.unit: string specifying units for bolus doses (default: ‘mcg’).
bol.rate.thresh: upper bound for retaining bolus doses. Bolus units with a rate above the threshold are dropped (default: Inf; i.e., keep all bolus doses).
rateunit: string specifying units for hourly rate (default: ‘mcg/hr’).
ratewgtunit: string specifying units for hourly rate by weight (default: ‘mcg/kg/hr’).
weightunit: string specifying units for weight (default: ‘kg’).
drugname: drug name of interest (e.g., dex, fent).

Below we show how to run run_MedStrI() using the cleaned IV dose dataset, “mar_new.rds”, as input.
# define parameters
drugname <- 'fent'
ivdose.out <- run_MedStrI(
  mar.path = file.path(dataDir, "mar_new.rds"),
  mar.columns = list(
    id = 'subject_uid',
    datetime = c('Date', 'Time'),
    dose = 'med:dosage',
    drug = 'med:mDrug',
    given = 'med:given'
  ),
  medGivenReq = TRUE,
  medchk.path = file.path(
    system.file("examples", "str_ex2", package = "EHR"),
    sprintf('medChecked-%s.csv', drugname)
  ),
  demo.list = NULL,
  demo.columns = list(),
  check.path = checkDir,
  failunit_fn = 'Unit',
  infusion.unit = 'mcg/kg/hr',
  bolus.unit = 'mcg',
  bol.rate.thresh = Inf,
  rateunit = "mcg/hr",
  ratewgtunit = "mcg/kg/hr",
  weightunit = "kg",
  drugname = drugname
)
no units other than mcg/kg/hr or mcg, file /var/folders/06/0qv1dr5508j_tbzqdjfqjf680000gn/T//RtmpqLJ9qE/checks/failUnit-fent.csv not created
#########################
75 rows from 2 subjects with "kg" in infusion unit but missing weight, see file /var/folders/06/0qv1dr5508j_tbzqdjfqjf680000gn/T//RtmpqLJ9qE/checks/failNoWgt-fent.csv AND create /var/folders/06/0qv1dr5508j_tbzqdjfqjf680000gn/T//RtmpqLJ9qE/checks/fixNoWgt-fent.csv
#########################
#########################
censor dates created, please see /var/folders/06/0qv1dr5508j_tbzqdjfqjf680000gn/T//RtmpqLJ9qE/checks/CensorTime-fent.csv
#########################
After run_MedStrI() is executed, it prints messages such as those shown above, which report the results of the data checking.
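The messages name files written to the check directory, such as failNoWgt-fent.csv and CensorTime-fent.csv. A minimal base R sketch for inspecting them is given here. It assumes checkDir is the 'checks' folder under tempdir(), as defined earlier, and that the drug name stub is 'fent'. The layout of the fail and fix files is not described in this tutorial, so the sketch only lists and reads them without assuming particular columns.

```r
# Same location as the check directory defined earlier in the tutorial
checkDir <- file.path(tempdir(), "checks")

# List the check files produced for the drug of interest
# (returns an empty character vector if none exist yet)
check.files <- list.files(checkDir, pattern = "-fent\\.csv$", full.names = TRUE)
basename(check.files)

# Read the missing-weight report if it exists and look at its structure
nowgt.file <- file.path(checkDir, "failNoWgt-fent.csv")
if (file.exists(nowgt.file)) {
  nowgt <- read.csv(nowgt.file, check.names = FALSE)
  str(nowgt)
} else {
  message("No missing-weight report found in ", checkDir)
}
```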
run_MedStrI()
head(ivdose.out)
subject_uid date.dose infuse.time.real infuse.time infuse.dose bolus.time bolus.dose given.dose maxint weight
1 28579217 2011-10-02 <NA> <NA> NA 2011-10-02 15:35:00 25 NA 0 NA
2 28579217 2011-10-02 <NA> <NA> NA 2011-10-02 17:26:00 25 NA 0 NA
3 28579217 2017-02-04 <NA> <NA> NA 2017-02-04 16:15:00 50 NA 0 NA
4 28579217 2017-02-04 <NA> <NA> NA 2017-02-04 16:30:00 20 NA 0 NA
5 28579217 2017-02-04 <NA> <NA> NA 2017-02-04 20:57:00 20 NA 0 NA
6 34161967 2016-07-18 <NA> <NA> NA 2016-07-18 14:56:00 100 NA 0 NA
The output is a data.frame with the following columns:
subject_uid: the new ID that will be used to merge datasets.
date.dose: date the dose was given.
infuse.time.real: infusion dose time as recorded in the raw data.
infuse.time: processed (rounded) infusion dose time, based on continuous infusion time.
infuse.dose: infusion dose amount (e.g., calculated as rate*weight if the unit includes weight).
bolus.time: bolus dose time.
bolus.dose: bolus dose.
given.dose: the raw dose given, preserved from the flow sheet data when available (identified as “finalunits” in flow.columns).
maxint: infusion recording interval (e.g., default: maxint = 15 min for MAR data; maxint = 60 min for flow data). In a typical setting this variable should be unused, but it may be specified as the ‘gap’ variable in the dose.columns argument of the run_Build_PK_IV function. The user may also modify this variable to set an appropriate recording interval. For more details, review the run_Build_PK_IV section of “2. EHR Vignette for Structured Data” of the EHR package.
weight: subject body weight recorded near the infusion time, which is used in the dose calculation.

This output is the input to the run_Build_PK_IV() function in the Build-PK-IV module. In the above output, all variables except given.dose are required and should be retained for use with run_Build_PK_IV().
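To make the single-column dose format (e.g., ‘25 mcg’) and the rate*weight calculation for infuse.dose concrete, here is a small base R sketch. It is only an illustration of the arithmetic, not the package's internal code: the helper split_dose, the example dose strings, and the 12 kg body weight are all hypothetical.

```r
# Hypothetical helper (not part of the EHR package): split "amount unit" strings
split_dose <- function(x) {
  x <- trimws(x)
  pattern <- "^([0-9.]+)[[:space:]]*(.*)$"
  data.frame(
    amount = as.numeric(sub(pattern, "\\1", x)),  # leading number
    unit   = sub(pattern, "\\2", x),              # everything after it
    stringsAsFactors = FALSE
  )
}

# Made-up examples: a bolus dose and a weight-based infusion rate
doses <- split_dose(c("25 mcg", "0.5 mcg/kg/hr"))

weight <- 12  # hypothetical body weight in kg

# If the unit is per kg, multiply the rate by weight; otherwise keep the amount
per.kg <- grepl("/kg", doses$unit, fixed = TRUE)
doses$calc <- ifelse(per.kg, doses$amount * weight, doses$amount)

doses
```

In this example the weight-based rate of 0.5 mcg/kg/hr for a 12 kg subject corresponds to 6 mcg/hr, while the bolus amount of 25 mcg is left unchanged. When weight is missing, this multiplication cannot be done, which is why rows with "kg" in the infusion unit but no weight are reported in the check files.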