See also “Example 1: Quick Data Building with Processed Datasets” in “2. EHR Vignette for Structured Data” of the EHR package.

Introduction

This tutorial describes a simple pharmacokinetic (PK) data building procedure in EHR2PKPD for medications that are intravenously (IV) administered. It demonstrates how to build PK data using Build-PK-IV when cleaned concentration, drug dose, demographic, and laboratory datasets are available in the appropriate data format. A comprehensive PK data building procedure with Build-PK-IV for IV medications, which requires several data processing modules, is described in Build-PK-IV - Comprehensive (see Choi et al.¹ for details).

To begin, we load the EHR, pkdata, and lubridate packages.

library(EHR)
library(pkdata)
library(lubridate)

Quick Data Building with Processed Datasets

There are three basic steps to build a PK dataset.

(1) Define directories

a directory for the raw data (rawDataDir in the example below)
a directory for interactive checking output files (checkDir in the example below)

rawDataDir <- system.file("examples", "str_ex1", package="EHR")
td <- tempdir()
checkDir <- file.path(td, 'check1')
dir.create(checkDir)
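Note that dir.create() emits a warning when the directory already exists, which happens if this step is re-run in the same R session. A minimal sketch of a re-runnable variant, using base R only; the directory name 'check_demo' is a hypothetical name chosen for illustration and is not used elsewhere in this tutorial:

```r
# Hypothetical, re-runnable variant of the directory setup (base R only).
td <- tempdir()
checkDir <- file.path(td, "check_demo")  # 'check_demo' is an illustrative name

# Create the directory only when it is missing, so re-running emits no warning.
if (!dir.exists(checkDir)) {
  dir.create(checkDir, recursive = TRUE)
}

# Fail early if the directory is still unavailable (e.g., no write permission).
stopifnot(dir.exists(checkDir))
```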

(2) Load cleaned and appropriately formatted data files

Four types of files are used in the Build-PK-IV module:

an IV dosing file
a drug concentration file
a demographic file
a laboratory file (optional)

ivdose.data <- read.csv(file.path(rawDataDir,"IVDose_DATA_simple.csv"),stringsAsFactors = FALSE)
head(ivdose.data, 3)

  patient_id  date.dose    infuse.time.real         infuse.time infuse.dose bolus.time bolus.dose given.dose maxint weight
1          1 2009-10-18 2009-10-18 11:35:00 2009-10-18 12:00:00         8.8       <NA>         NA          0     60    4.4
2          1 2009-10-18 2009-10-18 12:00:00 2009-10-18 12:00:00         8.8       <NA>         NA          0     60    4.4
3          1 2009-10-18 2009-10-18 13:00:00 2009-10-18 13:00:00         8.8       <NA>         NA          0     60    4.4

conc.data <- read.csv(file.path(rawDataDir,"Concentration_DATA_simple.csv"),stringsAsFactors = FALSE)
head(conc.data, 3)

  patient_id patient_visit_id event conc.level           date.time
1         10             10.1     4       0.17 2019-02-02 05:30:00
2         10             10.1     2       4.05 2019-02-24 14:00:00
3         10             10.1     3       0.64 2019-02-25 03:30:00

demo <- read.csv(file.path(rawDataDir,"Demographics_DATA_simple.csv"),stringsAsFactors = FALSE)
head(demo, 3)

  patient_id patient_visit_id gender weight height surgery_date ageatsurgery stat_sts cpb_sts date_icu_dc time_fromor
1          2              2.1      1  62.99 179.72    6/20/2015         6245        2      80   6/22/2015          NA
2          3              3.1      0   7.71  72.99   12/15/2018          574        3      67  12/16/2018          NA
3          4              4.1      1  12.00  92.02    1/12/2018         1214        1      70   1/13/2018          NA
  length_of_icu_stay surgery_date_time
1                  2              <NA>
2                  1              <NA>
3                  1              <NA>

creat.data <- read.csv(file.path(rawDataDir,"Creatinine_DATA_simple.csv"),stringsAsFactors = FALSE)
head(creat.data, 3)

  patient_id           date.time creat
1          2 2015-06-23 04:35:00  0.75
2          2 2015-06-22 04:00:00  0.69
3          2 2015-06-21 01:55:00  0.78

All of the datasets are in the appropriate data format and include the variable patient_id, a unique patient-level ID. The concentration and demographic files also contain a patient_visit_id variable, which is a unique visit-level ID.
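Before building the PK data, it can help to confirm that the ID columns described above are actually present and that subjects line up across files. The following is a sketch on toy data frames, not the example files; the helper has_ids() and all of the values are hypothetical.

```r
# Toy stand-ins for the loaded datasets; only the ID columns matter here.
dose_toy  <- data.frame(patient_id = c(1, 2, 2))
conc_toy  <- data.frame(patient_id = c(1, 1, 2),
                        patient_visit_id = c(1.2, 1.2, 2.1))
demo_toy  <- data.frame(patient_id = c(1, 2),
                        patient_visit_id = c(1.2, 2.1))
creat_toy <- data.frame(patient_id = c(1, 2))

# Hypothetical helper: TRUE when every required ID column is present.
has_ids <- function(dat, ids = "patient_id") all(ids %in% names(dat))

id_check <- c(
  dose  = has_ids(dose_toy),
  conc  = has_ids(conc_toy, c("patient_id", "patient_visit_id")),
  demo  = has_ids(demo_toy, c("patient_id", "patient_visit_id")),
  creat = has_ids(creat_toy)
)
print(id_check)

# Subjects with concentrations but no dosing records (none in this toy data).
no_dose <- setdiff(conc_toy$patient_id, dose_toy$patient_id)
print(no_dose)
```

The same checks can be pointed at ivdose.data, conc.data, demo, and creat.data once they are loaded.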

(3) Build a final PK dataset with the function `run_Build_PK_IV()`

The following arguments are used in the run_Build_PK_IV function:

conc: drug concentration data
conc.columns: a named list that should specify columns in concentration data
dose: IV dose data
dose.columns: a named list that should specify columns in dose data
demo.list: demographic data; if provided, ‘id’ is required in demo.columns
demo.columns: a named list that should specify columns in demographic data
lab.list: laboratory data
lab.columns: a named list that should specify columns in lab data
check.path: (optional) file path where the generated files for data checking are stored and where the corresponding files with fixed data are expected to exist

In this tutorial, we describe only arguments relevant to this example. A detailed description of all arguments can be found in the EHR package documentation for run_Build_PK_IV().
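Each *.columns argument is a named list that maps a role (such as id or datetime) to a column name in the corresponding data frame. A misspelled column name is an easy mistake to make, so a quick check before the call can be useful. The sketch below uses a toy data frame with the column names from this example; missing_columns() is a hypothetical helper, not a function from the EHR package.

```r
# Toy concentration data using the column names from this example.
conc_toy <- data.frame(
  patient_id       = c(10, 10),
  patient_visit_id = c(10.1, 10.1),
  conc.level       = c(0.17, 4.05),
  date.time        = c("2019-02-02 05:30:00", "2019-02-24 14:00:00"),
  stringsAsFactors = FALSE
)

# Role -> column name mapping, as passed to conc.columns.
conc_cols <- list(id = "patient_id", datetime = "date.time",
                  druglevel = "conc.level", idvisit = "patient_visit_id")

# Hypothetical helper: column names in the mapping that are absent from the data.
missing_columns <- function(dat, columns) {
  setdiff(unlist(columns, use.names = FALSE), names(dat))
}

ok_map  <- missing_columns(conc_toy, conc_cols)
bad_map <- missing_columns(conc_toy, list(id = "patient_id", datetime = "datetime"))

print(ok_map)   # empty: every mapped column exists
print(bad_map)  # "datetime": this column is not in conc_toy
```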

simple_pk_dat <- run_Build_PK_IV(
    conc=conc.data,
    conc.columns = list(id = 'patient_id', datetime = 'date.time', druglevel = 'conc.level', 
                        idvisit = 'patient_visit_id'),
    dose=ivdose.data,
    dose.columns = list(id = 'patient_id', date = 'date.dose', infuseDatetime = 'infuse.time', 
                        infuseDose = 'infuse.dose', infuseTimeExact= 'infuse.time.real', 
                        bolusDatetime = 'bolus.time', bolusDose = 'bolus.dose', 
                        gap = 'maxint', weight = 'weight'),
    demo.list = demo,
    demo.columns = list(id = 'patient_id', idvisit = 'patient_visit_id'),
    lab.list = list(creat.data),
    lab.columns = list(id = 'patient_id', datetime = 'date.time'),
    check.path=checkDir)

0 duplicated rows
The dimension of the PK data before merging with demographics: 149 x 9
The number of subjects in the PK data before merging with demographics: 10
The number of subjects in the demographic file, who meet the exclusion criteria: 0
check NA frequency in demographics, see file /var/folders/06/0qv1dr5508j_tbzqdjfqjf680000gn/T//RtmpqLJ9qE/check1/-missing-demo.csv
Some demographic variables are missing and will be excluded: 
The list of final demographic variables: gender
weight
height
surgery_date
ageatsurgery
stat_sts
cpb_sts
date_icu_dc
time_fromor
length_of_icu_stay
surgery_date_time
weight_demo
Checked: there are no missing creat
The dimension of the final PK data exported with the key demographics: 149 x 20 with 10 distinct subjects (patient_id)

The run_Build_PK_IV() function generates an automatic message that provides information about the data processing and the final dataset including the variables, the sample size, and missingness.

Below we show the final PK dataset.

head(simple_pk_dat,15)

   patient_visit_id  time amt   dv rate mdv evid gender weight height surgery_date ageatsurgery stat_sts cpb_sts date_icu_dc
1               1.2  0.00  50   NA    0   1    1      1  25.04 114.39     1/8/2016         2295        2      79    1/9/2016
2               1.2  0.75 100   NA    0   1    1      1  25.04 114.39     1/8/2016         2295        2      79    1/9/2016
3               1.2  1.65 100   NA    0   1    1      1  25.04 114.39     1/8/2016         2295        2      79    1/9/2016
4               1.2  1.77 250   NA    0   1    1      1  25.04 114.39     1/8/2016         2295        2      79    1/9/2016
5               1.2  2.05 250   NA    0   1    1      1  25.04 114.39     1/8/2016         2295        2      79    1/9/2016
6               1.2  3.72 250   NA    0   1    1      1  25.04 114.39     1/8/2016         2295        2      79    1/9/2016
7               1.2  5.23 100   NA    0   1    1      1  25.04 114.39     1/8/2016         2295        2      79    1/9/2016
8               1.2  6.25  NA 2.83   NA   0    0      1  25.04 114.39     1/8/2016         2295        2      79    1/9/2016
9               1.2 20.68  NA 0.41   NA   0    0      1  25.04 114.39     1/8/2016         2295        2      79    1/9/2016
10              1.2 70.90  NA 0.04   NA   0    0      1  25.04 114.39     1/8/2016         2295        2      79    1/9/2016
11              1.2 95.25  NA 0.01   NA   0    0      1  25.04 114.39     1/8/2016         2295        2      79    1/9/2016
12              2.1  0.00 100   NA    0   1    1      1  62.99 179.72    6/20/2015         6245        2      80   6/22/2015
13              2.1  0.25 150   NA    0   1    1      1  62.99 179.72    6/20/2015         6245        2      80   6/22/2015
14              2.1  1.68  NA 0.78   NA   0    0      1  62.99 179.72    6/20/2015         6245        2      80   6/22/2015
15              3.1  0.00  25   NA    0   1    1      0   7.71  72.99   12/15/2018          574        3      67  12/16/2018
   time_fromor length_of_icu_stay   surgery_date_time weight_demo creat
1         2020                  1 2016-01-08 20:20:00       25.04  0.60
2         2020                  1 2016-01-08 20:20:00       25.04  0.60
3         2020                  1 2016-01-08 20:20:00       25.04  0.60
4         2020                  1 2016-01-08 20:20:00       25.04  0.60
5         2020                  1 2016-01-08 20:20:00       25.04  0.60
6         2020                  1 2016-01-08 20:20:00       25.04  0.60
7         2020                  1 2016-01-08 20:20:00       25.04  0.60
8         2020                  1 2016-01-08 20:20:00       25.04  0.60
9         2020                  1 2016-01-08 20:20:00       25.04  0.50
10        2020                  1 2016-01-08 20:20:00       25.04  0.54
11        2020                  1 2016-01-08 20:20:00       25.04  0.57
12          NA                  2                <NA>       62.99  0.71
13          NA                  2                <NA>       62.99  0.71
14          NA                  2                <NA>       62.99  0.71
15          NA                  1                <NA>        7.71  0.58

This dataset includes the patient_visit_id variable and standard NONMEM formatted variables (for details of data items, see PK-Data-IV-Dosing).

time - time of dosing or concentration event
amt - dose amount administered (NA for concentration records)
dv - dependent variable; i.e., observed concentration (NA for dosing records)
rate - rate of drug administration (e.g., rate=0 for bolus doses)
mdv - missing dependent variable (dv) indicator (e.g., 0 = not missing dv, 1 = missing dv)
evid - event ID (e.g., 0 = observation, 1 = dose event)

If demographic data is provided, the demographic variables will also be included.
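The record conventions listed above can be checked mechanically. The sketch below copies three rows (one dosing record and two concentration records) from the output shown earlier into a toy data frame and tests that the flags agree with the definitions; it is an illustrative check in base R, not part of the EHR package.

```r
# Three records copied from the output above: one dose, two observations.
pk_toy <- data.frame(
  time = c(5.23, 6.25, 20.68),
  amt  = c(100, NA, NA),
  dv   = c(NA, 2.83, 0.41),
  rate = c(0, NA, NA),
  mdv  = c(1, 0, 0),
  evid = c(1, 0, 0)
)

is_dose <- pk_toy$evid == 1

checks <- c(
  dose_has_amt   = all(!is.na(pk_toy$amt[is_dose])),          # doses carry an amount
  dose_has_no_dv = all(is.na(pk_toy$dv[is_dose])),            # doses have no concentration
  obs_has_no_amt = all(is.na(pk_toy$amt[!is_dose])),          # observations have no amount
  mdv_flags_dv   = all((pk_toy$mdv == 1) == is.na(pk_toy$dv)) # mdv = 1 exactly when dv is NA
)
print(checks)
stopifnot(all(checks))
```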

Example with only two datasets

The run_Build_PK_IV() function can also be used with only two datasets:

an IV dosing file
a file with drug concentration, laboratory values, demographics, etc.

To illustrate this, we generate an example dataset with concentration, laboratory values, and demographics combined, which is shown below.

mrg0 <- merge(conc.data,creat.data,by=c('patient_id','date.time'),all=TRUE)
mrg1 <- merge(mrg0,demo,by=c('patient_id','patient_visit_id'),all=TRUE)
conc.combined <- mrg1[!is.na(mrg1$conc.level),]

head(conc.combined,3)

  patient_id patient_visit_id           date.time event conc.level creat gender weight height surgery_date ageatsurgery stat_sts
2          1              1.2 2016-01-12 06:45:00     5       0.01  0.57      1  25.04 114.39     1/8/2016         2295        2
3          1              1.2 2016-01-11 06:24:00     4       0.04  0.54      1  25.04 114.39     1/8/2016         2295        2
4          1              1.2 2016-01-09 04:11:00     3       0.41  0.50      1  25.04 114.39     1/8/2016         2295        2
  cpb_sts date_icu_dc time_fromor length_of_icu_stay   surgery_date_time
2      79    1/9/2016        2020                  1 2016-01-08 20:20:00
3      79    1/9/2016        2020                  1 2016-01-08 20:20:00
4      79    1/9/2016        2020                  1 2016-01-08 20:20:00
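The merge steps above perform a full outer join and then keep only the rows that carry a concentration. Because date.time is one of the join keys, a laboratory value is attached to a concentration record only when the two timestamps match exactly. The toy example below illustrates this behavior; the data frames are hypothetical, and the creatinine value of 0.52 is made up for illustration.

```r
# Toy concentration and laboratory data (values are illustrative only).
conc_toy <- data.frame(
  patient_id = c(1, 1),
  date.time  = c("2016-01-09 04:11:00", "2016-01-11 06:24:00"),
  conc.level = c(0.41, 0.04),
  stringsAsFactors = FALSE
)
lab_toy <- data.frame(
  patient_id = c(1, 1),
  date.time  = c("2016-01-09 04:11:00", "2016-01-10 05:00:00"),
  creat      = c(0.50, 0.52),
  stringsAsFactors = FALSE
)

# Full outer join: one matched row, one lab-only row, one concentration-only row.
merged <- merge(conc_toy, lab_toy, by = c("patient_id", "date.time"), all = TRUE)

# Keep concentration records; the lab-only row is dropped.
kept <- merged[!is.na(merged$conc.level), ]
print(kept)
```

In this toy data the second concentration has no creatinine measured at the same timestamp, so its creat value is NA after the merge.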

The following arguments are used in the run_Build_PK_IV function:

conc: drug concentration along with other data (laboratory, demographic, etc.)
conc.columns: a named list that should specify columns in concentration data
dose: IV dose data
dose.columns: a named list that should specify columns in dose data
pk.vars: variables to include in the returned PK data. If pk.vars includes ‘date’, the output also contains the original date-time to which ‘time’ is mapped. Users can use pk.vars to carry along demographic or laboratory variables that are already merged with the concentration dataset.

simple_pk_dat2 <- run_Build_PK_IV(
    conc = conc.combined,
    conc.columns = list(id='patient_id', datetime='date.time', druglevel='conc.level'),
    dose = ivdose.data,
    dose.columns = list(id='patient_id', date = 'date.dose', infuseDatetime = 'infuse.time',
                        infuseDose = 'infuse.dose', infuseTimeExact ='infuse.time.real',
                        bolusDatetime = 'bolus.time', bolusDose = 'bolus.dose',
                        gap = 'maxint', weight = 'weight'),
    pk.vars = c('date','weight','height','ageatsurgery','creat',
                'stat_sts','cpb_sts','length_of_icu_stay'))

0 duplicated rows
The dimension of the final PK data: 149 x 15 with 10 distinct subjects (patient_id)

# the final PK dataset
head(simple_pk_dat2,15)

   patient_id  time amt   dv rate mdv evid              date weight height ageatsurgery creat stat_sts cpb_sts length_of_icu_stay
1           1  0.00  50   NA    0   1    1 01/08/16 07:30:00  25.04 114.39         2295  0.60        2      79                  1
2           1  0.75 100   NA    0   1    1 01/08/16 08:15:00  25.04 114.39         2295  0.60        2      79                  1
3           1  1.65 100   NA    0   1    1 01/08/16 09:09:00  25.04 114.39         2295  0.60        2      79                  1
4           1  1.77 250   NA    0   1    1 01/08/16 09:16:00  25.04 114.39         2295  0.60        2      79                  1
5           1  2.05 250   NA    0   1    1 01/08/16 09:33:00  25.04 114.39         2295  0.60        2      79                  1
6           1  3.72 250   NA    0   1    1 01/08/16 11:13:00  25.04 114.39         2295  0.60        2      79                  1
7           1  5.23 100   NA    0   1    1 01/08/16 12:44:00  25.04 114.39         2295  0.60        2      79                  1
8           1  6.25  NA 2.83   NA   0    0 01/08/16 13:45:00  25.04 114.39         2295  0.60        2      79                  1
9           1 20.68  NA 0.41   NA   0    0 01/09/16 04:11:00  25.04 114.39         2295  0.50        2      79                  1
10          1 70.90  NA 0.04   NA   0    0 01/11/16 06:24:00  25.04 114.39         2295  0.54        2      79                  1
11          1 95.25  NA 0.01   NA   0    0 01/12/16 06:45:00  25.04 114.39         2295  0.57        2      79                  1
12          2  0.00 100   NA    0   1    1 06/14/15 13:30:00  62.99 179.72         6245  0.71        2      80                  2
13          2  0.25 150   NA    0   1    1 06/14/15 13:45:00  62.99 179.72         6245  0.71        2      80                  2
14          2  1.68  NA 0.78   NA   0    0 06/14/15 15:11:00  62.99 179.72         6245  0.71        2      80                  2
15          3  0.00  25   NA    0   1    1 12/15/18 08:00:00   7.71  72.99          574  0.44        3      67                  1
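Including ‘date’ in pk.vars makes the mapping between date and time easy to verify. In the rows shown above, time is consistent with the number of hours elapsed since the first dose of each subject. The sketch below recomputes it for a few records of patient 1, copied from the output; the date format string is an assumption inferred from the printed values.

```r
# A few date/time pairs copied from the output above (patient_id 1).
pk_dates <- c("01/08/16 07:30:00", "01/08/16 08:15:00", "01/08/16 09:09:00",
              "01/08/16 09:16:00", "01/08/16 13:45:00")
reported_time <- c(0.00, 0.75, 1.65, 1.77, 6.25)

# Assumed format: month/day/two-digit year; a fixed time zone avoids DST shifts.
dt <- as.POSIXct(pk_dates, format = "%m/%d/%y %H:%M:%S", tz = "UTC")

# Hours elapsed since the first record, rounded to two decimals as in the output.
hours_since_first <- round(as.numeric(difftime(dt, dt[1], units = "hours")), 2)

print(hours_since_first)
stopifnot(isTRUE(all.equal(hours_since_first, reported_time)))
```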

References

Choi L, Beck C, McNeer E, Weeks HL, Williams ML, James NT, Niu X, Abou-Khalil BW, Birdwell KA, Roden DM, Stein CM. Development of a System for Post-marketing Population Pharmacokinetic and Pharmacodynamic Studies using Real-World Data from Electronic Health Records. Clinical Pharmacology & Therapeutics. 2020 Apr;107(4):934-43. doi: 10.1002/cpt.1787.
