See also “Example 2: Complete Data Processing and Building from Raw Extracted Data to PK Data” in “2. EHR Vignette for Structured Data” of EHR package.

Introduction

This tutorial describes four modules for processing data (Pro-Demographic, Pro-Med-Str, Pro-Drug Level, Pro-Laboratory) and one module for PK data building (Build-PK-IV) using data extracted from a structured database.

To begin, we load the EHR package, the pkdata package, and the lubridate package.

# load EHR package and dependencies
library(EHR)
library(pkdata)
library(lubridate)

We first define three directories:
- one for raw structured data
- one containing files used for interactive checking
- one for processed data.
There are 4 types of raw data expected to exist in the raw data directory (i.e., rawDataDir below):
- a demographic file for use with the Pro-Demographic module (Demographics_DATA.csv)
- two files for the Pro-Drug Level module (SampleTimes_DATA.csv and SampleConcentration_DATA.csv)
- two dosing files for the Pro-Med-Str module (FLOW_DATA.csv and MAR_DATA.csv)
- two lab files for use with the Pro-Laboratory module (Creatinine_DATA.csv and Albumin_DATA.csv).

# define 3 directories
rawDataDir <- system.file("examples", "str_ex2", package="EHR") # directory for raw data

td <- tempdir()
checkDir <- file.path(td, 'checks') # directory for interactive checking
dir.create(checkDir)

dataDir <- file.path(td, 'data') # directory for processed data
dir.create(dataDir)

# examine raw data files in rawDataDir
dir(rawDataDir)

[1] "Albumin_DATA.csv"             "Creatinine_DATA.csv"          "Demographics_DATA.csv"        "e-rx_DATA.csv"               
[5] "FLOW_DATA.csv"                "MAR_DATA.csv"                 "medChecked-fent.csv"          "SampleConcentration_DATA.csv"
[9] "SampleTimes_DATA.csv"

Pre-Processing for Raw Extracted Data

The raw datasets must go through a pre-processing stage which creates new ID variables and datasets that can be used by the data processing modules. There are three pre-processing steps:

- read and clean raw data
- merge raw data to create new ID variables
- make new data for use with modules.

Each raw dataset should contain a subject unique ID, a subject visit ID, or both IDs. In this example the subject unique ID is called subject_uid and the subject visit ID is called subject_id. The subject visit ID is a combination of subject and visit/course – e.g., subject_id 14.0 is the first course for subject 14, subject_id 14.1 is the second course for subject 14, and so on. subject_uid is a unique ID that is the same across all of a subject's records. The integer part of subject_id has a 1-to-1 correspondence with subject_uid – for this example, subject_uid 62734832 is associated with both subject_id 14.0 and subject_id 14.1. If there is only a single visit/course per subject, only the subject unique ID is needed.
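The ID convention can be illustrated with a small base R sketch. It assumes, as in this example, that the course is encoded in a single decimal digit of subject_id. The vectors are illustrative rather than read from the raw files, and the second subject_uid is made up.

```r
# illustrative IDs following the convention described above (not read from the raw data)
subject_id  <- c(14.0, 14.1, 15.0)
subject_uid <- c(62734832, 62734832, 70000001)  # 70000001 is a hypothetical uid

# the integer part identifies the subject; the decimal digit identifies the course
subject_part <- floor(subject_id)
course_num   <- round((subject_id - subject_part) * 10) + 1

data.frame(subject_id, subject_uid, subject_part, course_num)

# each integer part should map to exactly one subject_uid
uid_per_subject <- tapply(subject_uid, subject_part, function(u) length(unique(u)))
uid_per_subject
```

A count other than 1 in uid_per_subject would indicate that the 1-to-1 correspondence does not hold for that subject.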

(1) Read and clean raw data

readTransform(): This function reads in a CSV file and makes optional modifications to the resulting dataframe.
Demographics raw data
- The example demographics data file contains ID variables subject_id and subject_uid, in addition to demographic variables such as gender, date of birth, height, weight, etc. As subject_id and subject_uid already exist, no further cleaning is needed.
- The Demographics_DATA.csv file is read in using the readTransform() function.

# demographics data
demo.in <- readTransform(file.path(rawDataDir, "Demographics_DATA.csv"))
head(demo.in)

  subject_id subject_uid gender weight height surgery_date ageatsurgery stat_sts cpb_sts in_hospital_mortality add_ecmo date_icu_dc
1       1106    34364670      0   5.14  59.18    6/28/2014          141        3     133                     0        0    7/2/2014
2       1444    36792472      1   5.67  62.90    1/10/2016          292        1      65                     0        0   1/12/2016
3       1465    36292449      0  23.67 118.02    3/19/2016         2591        2     357                     0        0   3/20/2016
4       1520    34161967      0  14.07  97.04    7/18/2016         1320        5      93                     0        0   7/19/2016
5       1524    37857374      1  23.40 102.80    7/23/2016         1561        3      87                     1        0   7/30/2016
6       1550    37826262      1   6.21  62.03     9/4/2016          208        1     203                     0        0   9/11/2016
  time_fromor
1        1657
2        1325
3          NA
4        1745
5        1847
6        1210

Concentration raw data
- The example concentration data consists of two files:
  - SampleTimes_DATA.csv: contains the concentration sampling times
  - SampleConcentration_DATA.csv: contains the concentration measurements
- If all concentration data is in one file, the user should transform the file so it contains a subject unique ID, a subject visit ID, or both IDs.
- Use the function readTransform()
  - to read SampleTimes_DATA.csv, rename the variable Study.ID to subject_id, and create a new variable called samp, which indexes the sample number, using the modify= argument.
  - to read SampleConcentration_DATA.csv and derive subject_id and samp from the name variable. The helper function sampId() cleans the name values before they are split into these two fields.

# read SampleTimes_DATA.csv
samp.in <- readTransform(file.path(rawDataDir, "SampleTimes_DATA.csv"),
    rename = c('Study.ID' = 'subject_id'),
    modify = list(samp = expression(as.numeric(sub('Sample ', '', Event.Name)))))
head(samp.in)

  subject_id Event.Name Sample.Collection.Date.and.Time samp
1      466.1   Sample 1                  2/3/2017 10:46    1
2      466.1   Sample 2                  2/4/2017 20:30    2
3     1106.0   Sample 1                 6/28/2014 13:40    1
4     1106.0   Sample 2                 6/29/2014 03:10    2
5     1106.0   Sample 3                 6/30/2014 03:35    3
6     1106.0   Sample 4                  7/1/2014 03:45    4

# helper function used to make subject_id
sampId <- function(x) {
  # remove leading zeroes or trailing periods
  subid <- gsub('(^0*|\\.$)', '', x)
  # change the underscore before a visit number (digits followed by another _) to .
  gsub('_([0-9]+[_].*)$', '.\\1', subid)
}

# read SampleConcentration_DATA.csv
conc.in <- readTransform(file.path(rawDataDir, "SampleConcentration_DATA.csv"),
  modify = list(
    subid = expression(sampId(name)),
    subject_id = expression(as.numeric(sub('[_].*', '', subid))),
    samp = expression(sub('[^_]*[_]', '', subid)),
    name = NULL,
    data_file = NULL,
    subid = NULL
    )
  )
head(conc.in)

  record_id fentanyl_calc_conc subject_id samp
1         1         0.01413622      466.1    1
2         2         0.27982075      466.1    2
3         3         6.11873679     1106.0    1
4         4         0.59161716     1106.0    2
5         5         0.11280471     1106.0    3
6         6         0.02112153     1106.0    4
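To see what sampId() and the two sub() calls do, the sketch below applies them to a few made-up name values. The actual contents of the name column are not shown in this tutorial, so the input strings are assumptions chosen to exercise each rule (leading zero, visit number, trailing period).

```r
# helper copied from the tutorial
sampId <- function(x) {
  subid <- gsub('(^0*|\\.$)', '', x)
  gsub('_([0-9]+[_].*)$', '.\\1', subid)
}

# hypothetical raw 'name' values: subject, optional visit, sample number
raw_name <- c('0466_1_2', '1106_3', '1566_1.')

# cleaned identifier: the visit separator becomes a period, the sample separator stays '_'
subid <- sampId(raw_name)

# everything before the remaining underscore is the subject visit ID
subject_id <- as.numeric(sub('[_].*', '', subid))
# everything after it is the sample number
samp <- sub('[^_]*[_]', '', subid)

data.frame(raw_name, subid, subject_id, samp)
```

In this sketch samp is still a character vector, as it is after the sub() call in the tutorial code.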

Dosing raw data
- The example drug dosing data consists of two files containing two sources of IV dose information:
  - FLOW_DATA.csv: contains aliases for both ID variables. It is read in with the readTransform() function, which renames the variables Subject.Id to subject_id and Subject.Uniq.Id to subject_uid.
  - MAR_DATA.csv: contains several variables with a colon (:) character. To preserve the colon in these variable names, the data can be read in without checking for syntactically valid R variable names. The data is read in using read.csv() with the argument check.names = FALSE and then passed to the dataTransformation() function, which renames Uniq.Id to subject_uid.
- If all dosing data is in one file, the user should transform the file so it contains a subject unique ID, a subject visit ID, or both IDs.

# FLOW dosing data
flow.in <- readTransform(file.path(rawDataDir, "FLOW_DATA.csv"),
                         rename = c('Subject.Id' = 'subject_id',
                                    'Subject.Uniq.Id' = 'subject_uid')) 
# pre-process the flow data 
# date.time variable should be in an appropriate form
flow.in[,'date.time'] <- pkdata::parse_dates(EHR:::fixDates(flow.in[,'Perform.Date']))
# unit and rate are required: separate unit and rate from 'Final.Rate..NFR.units.' if needed
flow.in[,'unit'] <- sub('.*[ ]', '', flow.in[,'Final.Rate..NFR.units.'])
flow.in[,'rate'] <- as.numeric(sub('([0-9.]+).*', '\\1', flow.in[,'Final.Rate..NFR.units.']))
head(flow.in)

  subject_id subject_uid     Perform.Date FOCUS_MEDNAME Final.Wt..kg. Final.Rate..NFR.units. Final.Units Flow           date.time      unit
1       1596    38340814   12/4/2016 5:30      Fentanyl          6.75            1 mcg/kg/hr       3.375   NA 2016-12-04 05:30:00 mcg/kg/hr
2       1596    38340814   12/4/2016 6:00      Fentanyl          6.75            1 mcg/kg/hr       6.750  0.1 2016-12-04 06:00:00 mcg/kg/hr
3       1596    38340814   12/4/2016 7:00      Fentanyl          6.75            1 mcg/kg/hr       4.500  0.1 2016-12-04 07:00:00 mcg/kg/hr
4       1596    38340814   12/4/2016 7:40      Fentanyl          6.75            0 mcg/kg/hr       0.000   NA 2016-12-04 07:40:00 mcg/kg/hr
5       1607    38551767 12/24/2016 19:30      Fentanyl          2.60            2 mcg/kg/hr       2.600   NA 2016-12-24 19:30:00 mcg/kg/hr
6       1607    38551767 12/24/2016 20:00      Fentanyl          2.60            2 mcg/kg/hr       5.200  0.2 2016-12-24 20:00:00 mcg/kg/hr
  rate
1    1
2    1
3    1
4    0
5    2
6    2
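The two sub() calls that split the combined rate column can be tried on a few strings in isolation. This is a minimal sketch: the first two values follow the format shown in the output above, and the third is a made-up value added to show that decimals are handled.

```r
# strings in the format of the Final.Rate..NFR.units. column
final_rate <- c('1 mcg/kg/hr', '0 mcg/kg/hr', '0.5 mcg/kg/hr')  # last value is hypothetical

# everything after the last space is the unit
unit <- sub('.*[ ]', '', final_rate)
# the leading run of digits and periods is the rate
rate <- as.numeric(sub('([0-9.]+).*', '\\1', final_rate))

data.frame(final_rate, unit, rate)
```

Both patterns rely on the value and the unit being separated by a space, so a differently formatted column would need different patterns.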

# MAR dosing data
mar.in0 <- read.csv(file.path(rawDataDir, "MAR_DATA.csv"), check.names = FALSE)
mar.in <- dataTransformation(mar.in0, rename = c('Uniq.Id' = 'subject_uid'))
head(mar.in)

  subject_uid       Date  Time                 med:mDrug   med:dosage med:route med:freq med:given
1    28579217 2017-02-04 19:15               Nicardipine 3 mcg/kg/min        IV     <NA>     Given
2    28579217 2011-10-02 22:11                Famotidine       4.5 mg        IV   q12hrs     Given
3    28579217 2011-10-02 20:17          Morphine sulfate         1 mg        IV  q2h prn     Given
4    28579217 2011-10-03 02:28 Diphenhydramine injection        12 mg        IV      now     Given
5    28579217 2011-10-02 22:11                 Cefazolin       225 mg        IV    q8hrs     Given
6    28579217 2011-10-02 23:30          Morphine sulfate         1 mg        IV  q2h prn     Given
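The effect of check.names = FALSE can be seen on a tiny in-memory CSV. The text below is a hypothetical two-line file, not an excerpt from MAR_DATA.csv.

```r
# hypothetical CSV text with colons in two column names
csv_text <- "Uniq.Id,med:mDrug,med:dosage\n1,Fentanyl,25 mcg"

# default behaviour: names are made syntactically valid, so ':' becomes '.'
default_names <- names(read.csv(text = csv_text))
default_names

# check.names = FALSE keeps the names exactly as written in the file
kept_names <- names(read.csv(text = csv_text, check.names = FALSE))
kept_names
```

Keeping the colon matters here because later calls refer to columns such as 'med:mDrug' and 'med:dosage' by their original names.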

Laboratory raw data
- The example laboratory data consists of two files, Creatinine_DATA.csv and Albumin_DATA.csv. Both files are read in using the readTransform() function and Subject.uniq is renamed to subject_uid.
- Each laboratory file should be transformed so it contains a subject unique ID, a subject visit ID, or both IDs.

# Serum creatinine lab data
creat.in <- readTransform(file.path(rawDataDir, "Creatinine_DATA.csv"),
    rename = c('Subject.uniq' = 'subject_uid'))
head(creat.in)

  subject_uid     date time creat
1    28579217 02/05/17 4:00  0.52
2    28579217 02/06/17 5:00  0.53
3    28579217 10/03/11 4:28  0.42
4    28579217 10/04/11 4:15  0.35
5    28579217 10/06/11 4:25  0.29
6    28579217 10/09/11 4:45  0.28

# Albumin lab data
alb.in <- readTransform(file.path(rawDataDir, "Albumin_DATA.csv"),
    rename = c('Subject.uniq' = 'subject_uid'))
head(alb.in)

(2) Merge data to create new ID variables

idCrosswalk(): This function merges all of the cleaned input datasets and creates new IDs.
- Input:
  - the data= argument of this function accepts a list of input datasets
  - the idcols= argument accepts a list of vectors or character strings that identify the ID variables in the corresponding input dataset.
- Output:
  - a crosswalk dataset between the original ID variables (subject_id, subject_uid) and the new ID variables (mod_id, mod_visit, and mod_id_visit).
  - the new variable mod_id_visit has a 1-to-1 correspondence to variable subject_id and uniquely identifies each subject's visit/course; the new variable mod_id has a 1-to-1 correspondence to variable subject_uid and uniquely identifies each subject.

# define list of input datasets
data <-  list(demo.in,
              samp.in,
              conc.in,
              flow.in,
              mar.in,
              creat.in,
              alb.in)

# define list of vectors or character strings that identify the ID variables
idcols <-  list(c('subject_id', 'subject_uid'), # id vars in demo.in
                'subject_id', # id var in samp.in
                'subject_id', # id var in conc.in
                c('subject_id', 'subject_uid'), # id vars in flow.in
                'subject_uid', # id var in mar.in
                'subject_uid', # id var in creat.in
                'subject_uid') # id var in alb.in

# merge all IDs from cleaned datasets and create new ID variables
id.xwalk <- idCrosswalk(data, idcols, visit.id="subject_id", uniq.id="subject_uid")
saveRDS(id.xwalk, file=file.path(dataDir,"module_id_xwalk.rds"))
head(id.xwalk)

  subject_id subject_uid mod_visit mod_id mod_id_visit
1      466.0    28579217         1      1          1.1
2      466.1    28579217         2      1          1.2
3     1106.0    34364670         1      2          2.1
4     1444.0    36792472         1      3          3.1
5     1465.0    36292449         1      4          4.1
6     1520.0    34161967         1      5          5.1
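The relationship between the original and new IDs can be reproduced by hand for the first three rows of the crosswalk. This is a base R sketch of the idea only; it is not how idCrosswalk() is implemented, and it assumes visits are numbered in order of subject_id within each subject.

```r
# first three ID pairs from the crosswalk output above
ids <- data.frame(
  subject_id  = c(466.0, 466.1, 1106.0),
  subject_uid = c(28579217, 28579217, 34364670)
)
ids <- ids[order(ids$subject_id), ]

# one sequential ID per subject
ids$mod_id <- match(ids$subject_uid, unique(ids$subject_uid))
# visit counter within each subject
ids$mod_visit <- ave(ids$subject_id, ids$subject_uid, FUN = seq_along)
# combined subject.visit label
ids$mod_id_visit <- paste(ids$mod_id, ids$mod_visit, sep = '.')

ids
```

The resulting mod_id, mod_visit, and mod_id_visit values agree with the first three rows printed by head(id.xwalk).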

(3) Make new data for use with modules

pullFakeId(data, id.xwalk, firstCols = NULL, orderBy = NULL)

pullFakeId(): This function replaces the original IDs – subject_id and subject_uid – with new IDs – mod_id, mod_visit, and mod_id_visit – to create datasets which can be used by the data processing modules.
- The dat= argument should contain the cleaned input data.frame from pre-processing step (1).
- The xwalk= argument should contain the crosswalk data.frame produced in step (2).
- Additional arguments firstCols= and orderBy= control which variables are in the first columns of the output and the sort order, respectively.
- The cleaned, structured data are saved as R objects for use with the modules.
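Conceptually, this step is a merge with the crosswalk followed by dropping the original IDs and reordering columns. The sketch below shows that idea in base R under stated assumptions: xwalk_demo and raw are small hypothetical data frames, and the code is a stand-in for illustration, not the implementation of pullFakeId().

```r
# small crosswalk in the same layout as id.xwalk
xwalk_demo <- data.frame(
  subject_id   = c(466.0, 466.1, 1106.0),
  subject_uid  = c(28579217, 28579217, 34364670),
  mod_visit    = c(1, 2, 1),
  mod_id       = c(1, 1, 2),
  mod_id_visit = c('1.1', '1.2', '2.1')
)

# hypothetical cleaned input that only carries subject_id
raw <- data.frame(subject_id = c(1106.0, 466.1), samp = c(1, 1))

# attach the new IDs
merged <- merge(raw, xwalk_demo, by = 'subject_id')

# put the new IDs first, drop the original IDs, and sort
first_cols <- c('mod_id', 'mod_visit', 'mod_id_visit')
keep <- setdiff(names(merged), c(first_cols, 'subject_id', 'subject_uid'))
out <- merged[order(merged$mod_id_visit), c(first_cols, keep)]
out
```

Here first_cols plays the role of firstCols= and the order() call plays the role of orderBy=.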

## demographics data
demo.cln <- pullFakeId(demo.in, id.xwalk,
    firstCols = c('mod_id', 'mod_visit', 'mod_id_visit'),
    uniq.id = 'subject_uid')
head(demo.cln)

  mod_id mod_visit mod_id_visit gender weight height surgery_date ageatsurgery stat_sts cpb_sts in_hospital_mortality add_ecmo date_icu_dc
1      2         1          2.1      0   5.14  59.18    6/28/2014          141        3     133                     0        0    7/2/2014
2      3         1          3.1      1   5.67  62.90    1/10/2016          292        1      65                     0        0   1/12/2016
3      4         1          4.1      0  23.67 118.02    3/19/2016         2591        2     357                     0        0   3/20/2016
4      5         1          5.1      0  14.07  97.04    7/18/2016         1320        5      93                     0        0   7/19/2016
5      6         1          6.1      1  23.40 102.80    7/23/2016         1561        3      87                     1        0   7/30/2016
6      7         1          7.1      1   6.21  62.03     9/4/2016          208        1     203                     0        0   9/11/2016
  time_fromor
1        1657
2        1325
3          NA
4        1745
5        1847
6        1210

saveRDS(demo.cln, file=file.path(dataDir,"demo_mod_id.rds"))

## drug level data
# sampling times
samp.cln <- pullFakeId(samp.in, id.xwalk,
    firstCols = c('mod_id', 'mod_visit', 'mod_id_visit', 'samp'), 
    orderBy = c('mod_id_visit','samp'),
    uniq.id = 'subject_uid')
head(samp.cln)

  mod_id mod_visit mod_id_visit samp Event.Name Sample.Collection.Date.and.Time
1      1         2          1.2    1   Sample 1                  2/3/2017 10:46
2      1         2          1.2    2   Sample 2                  2/4/2017 20:30
3     10         1         10.1    1   Sample 1                12/23/2016 05:15
4     10         1         10.1    2   Sample 2                12/24/2016 18:00
5     10         1         10.1    3   Sample 3                12/25/2016 03:00
6     10         1         10.1    4   Sample 4                12/26/2016 04:00

saveRDS(samp.cln, file=file.path(dataDir,"samp_mod_id.rds"))

# drug concentration measurements
conc.cln <- pullFakeId(conc.in, id.xwalk,
    firstCols = c('record_id', 'mod_id', 'mod_visit', 'mod_id_visit', 'samp'),
    orderBy = 'record_id',
    uniq.id = 'subject_uid')
head(conc.cln)

  record_id mod_id mod_visit mod_id_visit samp fentanyl_calc_conc
1         1      1         2          1.2    1         0.01413622
2         2      1         2          1.2    2         0.27982075
3         3      2         1          2.1    1         6.11873679
4         4      2         1          2.1    2         0.59161716
5         5      2         1          2.1    3         0.11280471
6         6      2         1          2.1    4         0.02112153

saveRDS(conc.cln, file=file.path(dataDir,"conc_mod_id.rds"))

## dosing data
# flow
flow.cln <- pullFakeId(flow.in, id.xwalk,
    firstCols = c('mod_id', 'mod_visit', 'mod_id_visit'),
    uniq.id = 'subject_uid')
head(flow.cln)

  mod_id mod_visit mod_id_visit     Perform.Date FOCUS_MEDNAME Final.Wt..kg. Final.Rate..NFR.units. Final.Units Flow           date.time
1      9         1          9.1   12/4/2016 5:30      Fentanyl          6.75            1 mcg/kg/hr       3.375   NA 2016-12-04 05:30:00
2      9         1          9.1   12/4/2016 6:00      Fentanyl          6.75            1 mcg/kg/hr       6.750  0.1 2016-12-04 06:00:00
3      9         1          9.1   12/4/2016 7:00      Fentanyl          6.75            1 mcg/kg/hr       4.500  0.1 2016-12-04 07:00:00
4      9         1          9.1   12/4/2016 7:40      Fentanyl          6.75            0 mcg/kg/hr       0.000   NA 2016-12-04 07:40:00
5     10         1         10.1 12/24/2016 19:30      Fentanyl          2.60            2 mcg/kg/hr       2.600   NA 2016-12-24 19:30:00
6     10         1         10.1 12/24/2016 20:00      Fentanyl          2.60            2 mcg/kg/hr       5.200  0.2 2016-12-24 20:00:00
       unit rate
1 mcg/kg/hr    1
2 mcg/kg/hr    1
3 mcg/kg/hr    1
4 mcg/kg/hr    0
5 mcg/kg/hr    2
6 mcg/kg/hr    2

saveRDS(flow.cln, file=file.path(dataDir,"flow_mod_id.rds"))

# mar
mar.cln <- pullFakeId(mar.in, id.xwalk, firstCols = 'mod_id', uniq.id = 'subject_uid')
head(mar.cln)

  mod_id       Date  Time                 med:mDrug   med:dosage med:route med:freq med:given
1      1 2017-02-04 19:15               Nicardipine 3 mcg/kg/min        IV     <NA>     Given
2      1 2011-10-02 22:11                Famotidine       4.5 mg        IV   q12hrs     Given
3      1 2011-10-02 20:17          Morphine sulfate         1 mg        IV  q2h prn     Given
4      1 2011-10-03 02:28 Diphenhydramine injection        12 mg        IV      now     Given
5      1 2011-10-02 22:11                 Cefazolin       225 mg        IV    q8hrs     Given
6      1 2011-10-02 23:30          Morphine sulfate         1 mg        IV  q2h prn     Given

saveRDS(mar.cln, file=file.path(dataDir,"mar_mod_id.rds"))

## laboratory data
# creatinine
creat.cln <- pullFakeId(creat.in, id.xwalk, 'mod_id',uniq.id = 'subject_uid')
head(creat.cln)

  mod_id     date time creat
1      1 02/05/17 4:00  0.52
2      1 02/06/17 5:00  0.53
3      1 10/03/11 4:28  0.42
4      1 10/04/11 4:15  0.35
5      1 10/06/11 4:25  0.29
6      1 10/09/11 4:45  0.28

saveRDS(creat.cln, file=file.path(dataDir,"creat_mod_id.rds"))

# albumin
alb.cln <- pullFakeId(alb.in, id.xwalk, 'mod_id', uniq.id = 'subject_uid')
head(alb.cln)

  mod_id     date  time alb
1      8 07/30/20  5:23 2.9
2      8 07/28/20  3:12 2.0
3      8 07/29/20  1:39 2.7
4      8 08/21/20 10:35 4.1
5      4 06/13/15 17:20 4.1
6      6 07/25/16  8:35 2.3

saveRDS(alb.cln, file=file.path(dataDir,"alb_mod_id.rds"))

Options and parameters: Before running the processing modules, it is necessary to define several options and parameters.
- Using options(pkxwalk =) allows the modules to access the crosswalk file.
- Create a drugname stub.
- Define the lower limit of quantification (LLOQ) for the drug concentration if applicable.

# set crosswalk option 
xwalk <- readRDS(file.path(dataDir, "module_id_xwalk.rds"))
options(pkxwalk = 'xwalk')

# define parameters
drugname <- 'fent'
LLOQ <- 0.05
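Note that the option stores the name of the crosswalk object as a character string, not the data frame itself. The sketch below shows how such a name can be resolved back to the object with get(). How the modules use the option internally is not shown in this tutorial, so the lookup is an assumption for illustration, and the small data frame is hypothetical.

```r
# hypothetical stand-in for the crosswalk read from module_id_xwalk.rds
xwalk <- data.frame(subject_id = c(466.0, 466.1), mod_id_visit = c('1.1', '1.2'))

# store the object's name, as in the tutorial
options(pkxwalk = 'xwalk')

# the option value is just a string
xwalk_name <- getOption('pkxwalk')
xwalk_name

# assumed lookup: resolve the name to the object it refers to
looked_up <- get(xwalk_name)
identical(looked_up, xwalk)
```

Because only the name is stored, the object called xwalk needs to exist in the session when it is looked up.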

Pro-Demographic

This module accepts the cleaned structured demographic dataset and a user-defined set of exclusion criteria. It returns a list containing the formatted demographic data and the IDs of the records that meet the exclusion criteria, in a form suitable for integration with the other modules.
For this example, we exclude subjects with a value of 1 for in_hospital_mortality or add_ecmo and create a new variable called length_of_icu_stay.
run_Demo() is the function to run this module.

# helper function
exclude_val <- function(x, val=1) { !is.na(x) & x == val }

demo.out <- run_Demo(demo.path = file.path(dataDir, "demo_mod_id.rds"),
    demo.columns = list(id = 'mod_id_visit'),
    toexclude = expression(exclude_val(in_hospital_mortality) | exclude_val(add_ecmo)),
    demo.mod.list = list(length_of_icu_stay = 
                        expression(daysDiff(surgery_date, date_icu_dc))))

The number of subjects in the demographic data, who meet the exclusion criteria: 2

head(demo.out$demo)

  mod_id mod_visit mod_id_visit gender weight height surgery_date ageatsurgery stat_sts cpb_sts in_hospital_mortality add_ecmo date_icu_dc
1      2         1          2.1      0   5.14  59.18    6/28/2014          141        3     133                     0        0    7/2/2014
2      3         1          3.1      1   5.67  62.90    1/10/2016          292        1      65                     0        0   1/12/2016
3      4         1          4.1      0  23.67 118.02    3/19/2016         2591        2     357                     0        0   3/20/2016
4      5         1          5.1      0  14.07  97.04    7/18/2016         1320        5      93                     0        0   7/19/2016
5      6         1          6.1      1  23.40 102.80    7/23/2016         1561        3      87                     1        0   7/30/2016
6      7         1          7.1      1   6.21  62.03     9/4/2016          208        1     203                     0        0   9/11/2016
  time_fromor length_of_icu_stay
1        1657                  4
2        1325                  2
3          NA                  1
4        1745                  1
5        1847                  7
6        1210                  7

demo.out$exclude

[1] "6.1"  "13.1"
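The exclusion helper and the length-of-stay calculation can be checked on a few values in base R. In this sketch the date arithmetic uses as.Date() as a stand-in for daysDiff(), which is an assumption about equivalent behaviour for these inputs; the dates are taken from rows 1 and 5 of the output above.

```r
# helper copied from the tutorial
exclude_val <- function(x, val = 1) { !is.na(x) & x == val }

# a missing flag is treated as "not excluded" rather than propagating NA
flag <- c(0, 1, NA)
excluded <- exclude_val(flag)
excluded

# length of ICU stay in days for two records from the example output
surgery_date <- c('6/28/2014', '7/23/2016')
date_icu_dc  <- c('7/2/2014', '7/30/2016')
los <- as.numeric(as.Date(date_icu_dc, format = '%m/%d/%Y') -
                  as.Date(surgery_date, format = '%m/%d/%Y'))
los
```

The two values match the length_of_icu_stay column shown for those records (4 and 7 days).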

Pro-Med-Str Part I: IV dose data

This module processes structured medication data. Only Part I, which handles IV dose data, is described here. For processing structured e-prescription medication data, see Pro-Med-Str - Part II.
The IV dose data comes from two sources:
- Flow data: patient flow sheets which at this institution record infusion rates and changes to all infusions for all inpatients outside of the operating room.
- Medication Administration Records (MAR) data: these data record all bolus doses of medications and infusions administered in the operating room.
The module is semi-interactive – it generates several files to check potential data errors and get feedback from an investigator. If corrected information (‘fix’ files) is provided, the module should be re-run to incorporate the corrections.
run_MedStrI() is the function to process IV dose data.

ivdose.out <- run_MedStrI(
    mar.path=file.path(dataDir,"mar_mod_id.rds"),
    mar.columns = list(id='mod_id', datetime=c('Date','Time'), dose='med:dosage', drug='med:mDrug', given='med:given'),
    medGivenReq = TRUE,
    flow.path=file.path(dataDir,"flow_mod_id.rds"),
    flow.columns = list(id = 'mod_id', datetime = 'date.time', finalunits = 'Final.Units', 
                        unit = 'unit', rate = 'rate', weight = 'Final.Wt..kg.'),
    medchk.path=file.path(system.file("examples", "str_ex2", package="EHR"), sprintf('medChecked-%s.csv', drugname)),
    demo.list = NULL,
    demo.columns = list(),
    missing.wgt.path = NULL,
    wgt.columns = list(),
    check.path = checkDir,
    failflow_fn = 'FailFlow',
    failunit_fn = 'Unit',
    failnowgt_fn = 'NoWgt',
    infusion.unit = 'mcg/kg/hr',
    bolus.unit = 'mcg',
    bol.rate.thresh = Inf,
    rateunit = 'mcg/hr',
    ratewgtunit = 'mcg/kg/hr',
    weightunit = 'kg',
    drugname = drugname)

The number of rows in the original data                124
The number of rows after removing the duplicates       124
no units other than mcg/kg/hr or mcg, file /var/folders/06/0qv1dr5508j_tbzqdjfqjf680000gn/T//RtmpqLJ9qE/checks/failUnit-fent.csv not created
#########################
33 rows from 1 subjects with "kg" in infusion unit but missing weight, see file /var/folders/06/0qv1dr5508j_tbzqdjfqjf680000gn/T//RtmpqLJ9qE/checks/failNoWgt-fent.csv AND create /var/folders/06/0qv1dr5508j_tbzqdjfqjf680000gn/T//RtmpqLJ9qE/checks/fixNoWgt-fent.csv
#########################
#########################
censor dates created, please see /var/folders/06/0qv1dr5508j_tbzqdjfqjf680000gn/T//RtmpqLJ9qE/checks/CensorTime-fent.csv
#########################


head(ivdose.out)

  mod_id  date.dose infuse.time.real infuse.time infuse.dose          bolus.time bolus.dose given.dose maxint weight
1      1 2011-10-02             <NA>        <NA>          NA 2011-10-02 15:35:00         25         NA      0     NA
2      1 2011-10-02             <NA>        <NA>          NA 2011-10-02 17:26:00         25         NA      0     NA
3      1 2017-02-04             <NA>        <NA>          NA 2017-02-04 16:15:00         50         NA      0     NA
4      1 2017-02-04             <NA>        <NA>          NA 2017-02-04 16:30:00         20         NA      0     NA
5      1 2017-02-04             <NA>        <NA>          NA 2017-02-04 20:57:00         20         NA      0     NA
6      2 2014-06-28             <NA>        <NA>          NA 2014-06-28 08:15:00         20         NA      0     NA
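The missing-weight message above concerns infusion rows whose unit is per kilogram but which have no weight recorded. The sketch below shows one way such rows could be flagged, and how a per-kg rate converts to an hourly rate when the weight is known. flow_demo is a hypothetical data frame, and the logic is an illustration of the check, not the code used by run_MedStrI().

```r
# hypothetical infusion records
flow_demo <- data.frame(
  mod_id = c(9, 9, 10),
  unit   = c('mcg/kg/hr', 'mcg/kg/hr', 'mcg/hr'),
  rate   = c(1, 1, 5),
  weight = c(6.75, NA, NA)
)

# rows dosed per kg need a weight to be converted
per_kg <- grepl('kg', flow_demo$unit)
needs_weight <- per_kg & is.na(flow_demo$weight)
flow_demo[needs_weight, ]

# mcg/kg/hr * kg = mcg/hr; rows already in mcg/hr are left unchanged
flow_demo$rate_mcg_hr <- ifelse(per_kg, flow_demo$rate * flow_demo$weight, flow_demo$rate)
flow_demo
```

The flagged row stays NA after conversion, which is why a weight has to be supplied before such doses can be used.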

Pro-Drug Level

This module processes drug concentration data that can be merged with medication dose data and other types of data.
This module is semi-interactive – it generates several files while processing in order to check missing data and potential data errors, and get feedback from an investigator. If corrected information (‘fix’ files) is provided, the module should be re-run to incorporate the corrections.
run_DrugLevel() is the function to process the drug concentration data.

conc.out <- run_DrugLevel(conc.path=file.path(dataDir,"conc_mod_id.rds"),
    conc.columns = list(id = 'mod_id', conc = 'conc.level', idvisit = 'mod_id_visit', samplinkid = 'mod_id_event'),
    conc.select=c('mod_id','mod_id_visit','samp','fentanyl_calc_conc'),
    conc.rename=c(fentanyl_calc_conc = 'conc.level', samp= 'event'),
    conc.mod.list=list(mod_id_event = expression(paste(mod_id_visit, event, sep = '_'))),
    samp.path=file.path(dataDir,"samp_mod_id.rds"),
    samp.columns = list(conclinkid = 'mod_id_event', datetime = 'Sample.Collection.Date.and.Time'),
    samp.mod.list=list(mod_id_event = expression(paste(mod_id_visit, samp, sep = '_'))),
    check.path=checkDir,
    failmiss_fn = 'MissingConcDate-',
    multsets_fn = 'multipleSetsConc-',
    faildup_fn = 'DuplicateConc-', 
    drugname=drugname,
    LLOQ=LLOQ,
    demo.list=demo.out,
    demo.columns = list(id = 'mod_id', idvisit = 'mod_id_visit'))

#########################
3 rows need review, see file /var/folders/06/0qv1dr5508j_tbzqdjfqjf680000gn/T//RtmpqLJ9qE/checks/failMissingConcDate-fent.csv AND create /var/folders/06/0qv1dr5508j_tbzqdjfqjf680000gn/T//RtmpqLJ9qE/checks/fixMissingConcDate-fent.csv
#########################
subjects with concentration missing from sample file
 mod_id mod_id_event
      8        8.1_1
      8        8.1_2
      8        8.1_3
1 subjects have multiple sets of concentration data
16 total unique subjects ids (including multiple visits) currently in the concentration data
15 total unique subjects in the concentration data
#########################
15 rows need review, see file /var/folders/06/0qv1dr5508j_tbzqdjfqjf680000gn/T//RtmpqLJ9qE/checks/multipleSetsConc-fent2023-11-02.csv
#########################
15 total unique subjects ids (after excluding multiple visits) in the concentration data
15 total unique subjects in the concentration data

head(conc.out)

   mod_id mod_id_visit event  conc.level mod_id_event           date.time eid
1       1          1.2     1 0.014136220        1.2_1 2017-02-03 10:46:00   1
2       1          1.2     2 0.279820752        1.2_2 2017-02-04 20:30:00   1
55     10         10.1     2 3.136047304       10.1_2 2016-12-24 18:00:00   1
56     10         10.1     9 0.004720171       10.1_9 2017-01-01 04:20:00   1
57     10         10.1    10 0.017136367      10.1_10 2017-01-02 04:42:00   1
58     10         10.1    12 0.006335571      10.1_12 2017-01-04 03:40:00   1

The output provides a message that 3 rows are missing a concentration date. The file ‘failMissingConcDate-fent.csv’ contains the 3 records with missing values for the date.time variable.

( fail.miss.conc.date <- read.csv(file.path(checkDir,"failMissingConcDate-fent.csv")) )

  subject_id subject_uid mod_id_event datetime
1       1566    35885929        8.1_1       NA
2       1566    35885929        8.1_2       NA
3       1566    35885929        8.1_3       NA
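The missing dates arise when a concentration record has no matching row in the sampling-time data. The sketch below builds the same kind of link key (mod_id_visit and sample number joined by an underscore) on two small hypothetical data frames and uses a left join to find unmatched concentrations. It illustrates the linking idea only and is not the code used by run_DrugLevel(); the concentration values are made up.

```r
# hypothetical concentration and sampling-time records
conc_demo <- data.frame(
  mod_id_visit = c('1.2', '1.2', '8.1'),
  event        = c(1, 2, 1),
  conc.level   = c(0.014, 0.280, 0.500)
)
samp_demo <- data.frame(
  mod_id_visit = c('1.2', '1.2'),
  samp         = c(1, 2),
  datetime     = c('2/3/2017 10:46', '2/4/2017 20:30')
)

# build the link key the same way in both datasets
conc_demo$mod_id_event <- paste(conc_demo$mod_id_visit, conc_demo$event, sep = '_')
samp_demo$mod_id_event <- paste(samp_demo$mod_id_visit, samp_demo$samp, sep = '_')

# left join: keep every concentration, attach the sampling time when one exists
linked <- merge(conc_demo, samp_demo[, c('mod_id_event', 'datetime')],
                by = 'mod_id_event', all.x = TRUE)

# concentrations without a sampling time
missing_key <- linked$mod_id_event[is.na(linked$datetime)]
missing_key
```

In this sketch the record with key 8.1_1 has no sampling time, which mirrors the kind of row written to the ‘fail’ file.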

We can correct the missing dates by providing an updated file called ‘fixMissingConcDate-fent.csv’ that contains the missing data.

fail.miss.conc.date[,"datetime"] <- c("9/30/2016 09:32","10/1/2016 19:20","10/2/2016 02:04")
fail.miss.conc.date

  subject_id subject_uid mod_id_event        datetime
1       1566    35885929        8.1_1 9/30/2016 09:32
2       1566    35885929        8.1_2 10/1/2016 19:20
3       1566    35885929        8.1_3 10/2/2016 02:04

write.csv(fail.miss.conc.date, file.path(checkDir,"fixMissingConcDate-fent.csv"))

After providing the updated file, the same run_DrugLevel() function should be re-run. The output now contains an additional message below the first message saying “fixMissingConcDate-fent.csv read with failures replaced”. The conc.out data.frame also contains 3 additional rows with the corrected data.

conc.out <- run_DrugLevel(conc.path=file.path(dataDir,"conc_mod_id.rds"),
    conc.columns = list(id = 'mod_id', conc = 'conc.level', idvisit = 'mod_id_visit', samplinkid = 'mod_id_event'),
    conc.select=c('mod_id','mod_id_visit','samp','fentanyl_calc_conc'),
    conc.rename=c(fentanyl_calc_conc = 'conc.level', samp= 'event'),
    conc.mod.list=list(mod_id_event = expression(paste(mod_id_visit, event, sep = '_'))),
    samp.path=file.path(dataDir,"samp_mod_id.rds"),
    samp.columns = list(conclinkid = 'mod_id_event', datetime = 'Sample.Collection.Date.and.Time'),
    samp.mod.list=list(mod_id_event = expression(paste(mod_id_visit, samp, sep = '_'))),
    check.path=checkDir,
    failmiss_fn = 'MissingConcDate-',
    multsets_fn = 'multipleSetsConc-',
    faildup_fn = 'DuplicateConc-',
    drugname=drugname,
    LLOQ=LLOQ,
    demo.list=demo.out,
    demo.columns = list(id = 'mod_id', idvisit = 'mod_id_visit'))

#########################
3 rows need review, see file /var/folders/06/0qv1dr5508j_tbzqdjfqjf680000gn/T//RtmpqLJ9qE/checks/failMissingConcDate-fent.csv AND create /var/folders/06/0qv1dr5508j_tbzqdjfqjf680000gn/T//RtmpqLJ9qE/checks/fixMissingConcDate-fent.csv
#########################
file /var/folders/06/0qv1dr5508j_tbzqdjfqjf680000gn/T//RtmpqLJ9qE/checks/fixMissingConcDate-fent.csv read with failures replaced
1 subjects have multiple sets of concentration data
16 total unique subjects ids (including multiple visits) currently in the concentration data
15 total unique subjects in the concentration data
#########################
15 rows need review, see file /var/folders/06/0qv1dr5508j_tbzqdjfqjf680000gn/T//RtmpqLJ9qE/checks/multipleSetsConc-fent2023-11-02.csv
#########################
15 total unique subjects ids (after excluding multiple visits) in the concentration data
15 total unique subjects in the concentration data
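
The fail/fix round trip described above can be sketched in base R. This is only an illustration under stated assumptions: the file names use a hypothetical "-demo" suffix, and the column names (mod_id_event, datetime) and all values are made up, so the real fail file should be inspected before a fix file is written for it.

```r
# Hypothetical stand-in for a 'fail' file; real column names may differ
check_dir <- file.path(tempdir(), "checks_sketch")
dir.create(check_dir, showWarnings = FALSE)
fail_file <- file.path(check_dir, "failMissingConcDate-demo.csv")
fix_file  <- file.path(check_dir, "fixMissingConcDate-demo.csv")

# Three flagged rows with missing sample date-times (made-up IDs)
fail <- data.frame(
  mod_id_event = c("1.1_3", "2.1_5", "4.1_2"),
  datetime = NA_character_,
  stringsAsFactors = FALSE
)
write.csv(fail, fail_file, row.names = FALSE)

# Read the flagged rows, fill in the missing values, and save under the 'fix' name
fix <- read.csv(fail_file, stringsAsFactors = FALSE)
fix$datetime <- c("2016-09-30 09:32", "2016-10-01 19:20", "2016-10-02 02:04")  # made-up values
write.csv(fix, fix_file, row.names = FALSE)

# Confirm that the fix file no longer has missing date-times
fixed <- read.csv(fix_file, stringsAsFactors = FALSE)
```

The fix file keeps the same rows and columns as the fail file; only the missing values are filled in, and the file name changes from the "fail" prefix to "fix".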

Pro-Laboratory

This module processes laboratory data that can be merged with data from other modules.
run_Labs() is the function to process the laboratory data.

# process creatinine data; build date.time from the separate date and time columns
creat.out <- run_Labs(lab.path=file.path(dataDir,"creat_mod_id.rds"),
    lab.select = c('mod_id','date.time','creat'),
    lab.mod.list = list(date.time = expression(parse_dates(fixDates(paste(date, time))))))

# process albumin data in the same way
alb.out <- run_Labs(lab.path=file.path(dataDir,"alb_mod_id.rds"),
    lab.select = c('mod_id','date.time','alb'),
    lab.mod.list = list(date.time = expression(parse_dates(fixDates(paste(date, time))))))

# combine both lab datasets into a single list
lab.out <- list(creat.out, alb.out)

str(lab.out)

List of 2
 $ :'data.frame':   266 obs. of  3 variables:
  ..$ mod_id   : int [1:266] 1 1 1 1 1 1 1 1 1 1 ...
  ..$ date.time: POSIXct[1:266], format: "2017-02-05 04:00:00" "2017-02-06 05:00:00" "2011-10-03 04:28:00" "2011-10-04 04:15:00" ...
  ..$ creat    : num [1:266] 0.52 0.53 0.42 0.35 0.29 0.28 0.34 0.59 0.54 0.26 ...
 $ :'data.frame':   44 obs. of  3 variables:
  ..$ mod_id   : int [1:44] 8 8 8 8 4 6 6 9 10 10 ...
  ..$ date.time: POSIXct[1:44], format: "2020-07-30 05:23:00" "2020-07-28 03:12:00" "2020-07-29 01:39:00" "2020-08-21 10:35:00" ...
  ..$ alb      : num [1:44] 2.9 2 2.7 4.1 4.1 2.3 2.6 3 3.1 4.2 ...
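
The lab.mod.list argument above is a named list of expressions, each of which defines a new column from existing ones. The sketch below shows one way such a list could be evaluated against a data frame in base R. It is not the package's implementation: the toy data are made up, and as.POSIXct() stands in for parse_dates(fixDates(...)) so that the example needs no extra packages.

```r
# Toy lab data with separate date and time columns (made-up values)
lab <- data.frame(
  mod_id = c(1L, 1L, 2L),
  date = c("2017-02-05", "2017-02-06", "2011-10-03"),
  time = c("04:00", "05:00", "04:28"),
  creat = c(0.52, 0.53, 0.42),
  stringsAsFactors = FALSE
)

# Named list of expressions, similar in shape to lab.mod.list
mod_list <- list(
  date.time = expression(
    as.POSIXct(paste(date, time), format = "%Y-%m-%d %H:%M", tz = "UTC")
  )
)

# Evaluate each expression with the data frame's columns in scope
for (nm in names(mod_list)) {
  lab[[nm]] <- eval(mod_list[[nm]], lab)
}

# Keep only the selected columns, analogous to lab.select
lab_out <- lab[, c("mod_id", "date.time", "creat")]
str(lab_out)
```

Because the expression is evaluated with the data frame as its environment, the names date and time refer to the columns rather than to any functions of the same name.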

Build-PK-IV

This module creates PK data for IV medications.
Both dose data in the format output by the Pro-Med-Str module and concentration data in the format output by the Pro-Drug Level module are required.
Demographic data from the Pro-Demographic module and laboratory data from the Pro-Laboratory module are optional.
The module is semi-interactive: it generates several files for checking potential data errors and obtaining feedback from an investigator. If corrected information (‘fix’ files) is provided, the module should be re-run to incorporate the corrections.
If pk.vars includes ‘date’, the output includes the original date-time to which each ‘time’ value is mapped. Users who prefer to provide a single concentration data file can also use pk.vars to carry through demographic or lab variables that are already merged into that file. A separate dose data file is still required in either case.
run_Build_PK_IV() is the function to build PK data with IV dosing data.

# build PK data from the concentration, IV dose, demographic, and lab data
pk_dat <- run_Build_PK_IV(
    conc=conc.out,
    conc.columns = list(id = 'mod_id', datetime = 'date.time', druglevel = 'conc.level', 
                        idvisit = 'mod_id_visit'),
    dose=ivdose.out,
    dose.columns = list(id = 'mod_id', date = 'date.dose', infuseDatetime = 'infuse.time', 
                        infuseDose = 'infuse.dose', infuseTimeExact= 'infuse.time.real',
                        bolusDatetime = 'bolus.time', bolusDose = 'bolus.dose', 
                        gap = 'maxint', weight = 'weight'),
    demo.list = demo.out,
    demo.columns = list(id = 'mod_id', idvisit = 'mod_id_visit'),
    lab.list = lab.out,
    lab.columns = list(id = 'mod_id', datetime = 'date.time'),
    pk.vars=c('date'),
    drugname=drugname,
    check.path=checkDir,
    missdemo_fn='-missing-demo',
    faildupbol_fn='DuplicateBolus-',
    date.format="%m/%d/%y %H:%M:%S",
    date.tz="America/Chicago")

0 duplicated rows
The dimension of the PK data before merging with demographics: 234 x 9
The number of subjects in the PK data before merging with demographics: 15
The number of subjects in the demographic file, who meet the exclusion criteria: 2
check NA frequency in demographics, see file /var/folders/06/0qv1dr5508j_tbzqdjfqjf680000gn/T//RtmpqLJ9qE/checks/fent-missing-demo.csv
Some demographic variables are missing and will be excluded: 
The list of final demographic variables: mod_visit
gender
weight
height
surgery_date
ageatsurgery
stat_sts
cpb_sts
in_hospital_mortality
add_ecmo
date_icu_dc
time_fromor
length_of_icu_stay
weight_demo
Checked: there are no missing creat
List of IDs missing at least 1 alb: 1.2
11.1
15.1
2.1
3.1
4.1
5.1
7.1
8.1
The dimension of the final PK data exported with the key demographics: 197 x 24 with 13 distinct subjects (mod_id)
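
To make the relationship between ‘time’ and ‘date’ concrete, the sketch below converts a few date-time strings into elapsed hours, using the same date.format and date.tz values as the call above. It assumes that ‘time’ is measured in hours from a subject's first record; the three date-times are illustrative and the variable names are hypothetical.

```r
# Illustrative date-time strings in the format given by date.format
date_chr <- c("02/04/17 16:15:00", "02/04/17 16:30:00", "02/04/17 20:30:00")

# Parse with the same format string and time zone as the call above
dt <- as.POSIXct(date_chr, format = "%m/%d/%y %H:%M:%S", tz = "America/Chicago")

# Elapsed hours since the first record (assumed definition of 'time')
time_hr <- as.numeric(difftime(dt, dt[1], units = "hours"))
time_hr
```

Under that assumption the three records fall at 0, 0.25, and 4.25 hours.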

Retrieving the original IDs:
- The function pullRealId() appends the original IDs (subject_id and subject_uid) to the data.
- The parameter remove.mod.id=TRUE can be used to also remove any module IDs (mod_id, mod_visit, and mod_id_visit).

# convert id back to original IDs
pk_dat <- pullRealId(pk_dat, remove.mod.id=TRUE)

head(pk_dat)

     subject_id subject_uid time   amt        dv rate mdv evid              date gender weight height surgery_date ageatsurgery stat_sts
2         466.1    28579217 0.00  50.0        NA  0.0   1    1 02/04/17 16:15:00      0  21.99 116.90     2/4/2017         2451        1
2.1       466.1    28579217 0.25  20.0        NA  0.0   1    1 02/04/17 16:30:00      0  21.99 116.90     2/4/2017         2451        1
2.2       466.1    28579217 4.25    NA 0.2798208   NA   0    0 02/04/17 20:30:00      0  21.99 116.90     2/4/2017         2451        1
12       1607.0    38551767 0.00 109.2        NA 10.4   1    1 12/24/16 07:15:00      0   2.60  45.94   12/24/2016           23        3
12.1     1607.0    38551767 0.00  10.0        NA  0.0   1    1 12/24/16 07:15:00      0   2.60  45.94   12/24/2016           23        3
12.2     1607.0    38551767 1.25  15.0        NA  0.0   1    1 12/24/16 08:30:00      0   2.60  45.94   12/24/2016           23        3
     cpb_sts in_hospital_mortality add_ecmo date_icu_dc time_fromor length_of_icu_stay weight_demo creat alb
2        107                     0        0    2/5/2017        1322                  1       21.99  0.54  NA
2.1      107                     0        0    2/5/2017        1322                  1       21.99  0.54  NA
2.2      107                     0        0    2/5/2017        1322                  1       21.99  0.54  NA
12       110                     0        0    1/5/2017          NA                 12        2.76  0.66 1.6
12.1     110                     0        0    1/5/2017          NA                 12        2.76  0.66 1.6
12.2     110                     0        0    1/5/2017          NA                 12        2.76  0.66 1.6
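
Conceptually, retrieving the original IDs amounts to merging the data against a crosswalk table and then dropping the module IDs. The base R sketch below illustrates that idea only; it is not how pullRealId() is implemented, and the crosswalk, the PK rows, and all ID values are made up.

```r
# Hypothetical crosswalk between original IDs and module IDs (made-up values)
xwalk <- data.frame(
  subject_id = c(101.1, 202.0),
  subject_uid = c(1111, 2222),
  mod_id = c(1L, 2L)
)

# Toy PK-style data keyed by mod_id (made-up values)
pk <- data.frame(
  mod_id = c(1L, 1L, 2L),
  time = c(0, 0.25, 0),
  amt = c(50, 20, 10)
)

# Attach the original IDs, then drop the module ID
merged <- merge(xwalk, pk, by = "mod_id")
merged <- merged[, setdiff(names(merged), "mod_id")]
merged
```

Dropping mod_id in the last step plays the role that remove.mod.id=TRUE plays in the call above.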

References

Choi L, Beck C, McNeer E, Weeks HL, Williams ML, James NT, Niu X, Abou-Khalil BW, Birdwell KA, Roden DM, Stein CM. Development of a System for Post-marketing Population Pharmacokinetic and Pharmacodynamic Studies using Real-World Data from Electronic Health Records. Clinical Pharmacology & Therapeutics. 2020 Apr;107(4):934-43. doi: 10.1002/cpt.1787.

Build-PK-IV - Comprehensive