This tutorial describes a comprehensive PK data building procedure for medications that are intravenously administered. There are two phases: data processing, which standardizes and combines the input data (Pro-Demographic, Pro-Med-Str, Pro-Drug Level, Pro-Laboratory), and data building, which creates the final PK data (Build-PK-IV).
The modules are part of the EHR package. Four of them process data (Pro-Demographic, Pro-Med-Str, Pro-Drug Level, Pro-Laboratory) and one builds the PK data (Build-PK-IV), using data extracted from a structured database.
To begin we load the EHR package, the pkdata package, and the lubridate package.
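We first define three directories: a raw data directory (rawDataDir), a directory for interactive checking (checkDir), and a directory for processed data (dataDir).
There are 4 types of raw data expected to exist in the raw data directory (i.e., rawDataDir below): demographic data, drug level data (sampling times and concentrations), dosing data (FLOW and MAR), and laboratory data (creatinine and albumin).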
# define 3 directories
rawDataDir <- system.file("examples", "str_ex2", package="EHR") # directory for raw data
td <- tempdir()
checkDir <- file.path(td, 'checks') # directory for interactive checking
dir.create(checkDir)
dataDir <- file.path(td, 'data') # directory for processed data
dir.create(dataDir)
# examine raw data files in rawDataDir
dir(rawDataDir)
[1] "Albumin_DATA.csv" "Creatinine_DATA.csv" "Demographics_DATA.csv" "e-rx_DATA.csv"
[5] "FLOW_DATA.csv" "MAR_DATA.csv" "medChecked-fent.csv" "SampleConcentration_DATA.csv"
[9] "SampleTimes_DATA.csv"
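The raw datasets must go through a pre-processing stage which creates new ID variables and datasets that can be used by the data processing modules. There are three pre-processing steps: (1) read and clean each raw dataset, mainly with readTransform(), (2) merge the IDs from the cleaned datasets into a crosswalk with idCrosswalk(), and (3) replace the original IDs with the new IDs using pullFakeId().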
Each raw dataset should contain a subject unique ID, a subject visit ID, or both. In this example the subject unique ID is called subject_uid and the subject visit ID is called subject_id. The subject visit ID is a combination of subject and visit/course: for example, subject_id 14.0 is the first course for subject 14, subject_id 14.1 is the second course for subject 14, and so on. subject_uid is a unique ID that is the same for all of a subject's records. The integer part of subject_id has a 1-to-1 correspondence with subject_uid; in this example, subject_uid 62734832 is associated with both subject_id 14.0 and subject_id 14.1. If there is only a single visit/course per subject, only the subject unique ID is needed.
readTransform(): This function reads in a CSV file and makes optional modifications to the resulting data frame.
Demographics raw data: The demographics file contains the ID variables subject_id and subject_uid, in addition to demographic variables such as gender, date of birth, height, weight, etc. As subject_id and subject_uid already exist, no further cleaning is needed. The file is read in with the readTransform() function.
# demographics data
demo.in <- readTransform(file.path(rawDataDir, "Demographics_DATA.csv"))
head(demo.in)
subject_id subject_uid gender weight height surgery_date ageatsurgery stat_sts cpb_sts in_hospital_mortality add_ecmo date_icu_dc
1 1106 34364670 0 5.14 59.18 6/28/2014 141 3 133 0 0 7/2/2014
2 1444 36792472 1 5.67 62.90 1/10/2016 292 1 65 0 0 1/12/2016
3 1465 36292449 0 23.67 118.02 3/19/2016 2591 2 357 0 0 3/20/2016
4 1520 34161967 0 14.07 97.04 7/18/2016 1320 5 93 0 0 7/19/2016
5 1524 37857374 1 23.40 102.80 7/23/2016 1561 3 87 1 0 7/30/2016
6 1550 37826262 1 6.21 62.03 9/4/2016 208 1 203 0 0 9/11/2016
time_fromor
1 1657
2 1325
3 NA
4 1745
5 1847
6 1210
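The sampling times data (SampleTimes_DATA.csv) is read in with readTransform(). We rename Study.ID to subject_id with the rename= argument and create a new variable called samp, which indexes the sample number, using the modify= argument.
# read SampleTimes_DATA.csv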
samp.in <- readTransform(file.path(rawDataDir, "SampleTimes_DATA.csv"),
rename = c('Study.ID' = 'subject_id'),
modify = list(samp = expression(as.numeric(sub('Sample ', '', Event.Name)))))
head(samp.in)
subject_id Event.Name Sample.Collection.Date.and.Time samp
1 466.1 Sample 1 2/3/2017 10:46 1
2 466.1 Sample 2 2/4/2017 20:30 2
3 1106.0 Sample 1 6/28/2014 13:40 1
4 1106.0 Sample 2 6/29/2014 03:10 2
5 1106.0 Sample 3 6/30/2014 03:35 3
6 1106.0 Sample 4 7/1/2014 03:45 4
# helper function used to make subject_id
sampId <- function(x) {
# remove leading zeroes or trailing periods
subid <- gsub('(^0*|\\.$)', '', x)
# change _ to .
gsub('_([0-9]+[_].*)$', '.\\1', subid)
}
# read SampleConcentration_DATA.csv
conc.in <- readTransform(file.path(rawDataDir, "SampleConcentration_DATA.csv"),
modify = list(
subid = expression(sampId(name)),
subject_id = expression(as.numeric(sub('[_].*', '', subid))),
samp = expression(sub('[^_]*[_]', '', subid)),
name = NULL,
data_file = NULL,
subid = NULL
)
)
head(conc.in)
record_id fentanyl_calc_conc subject_id samp
1 1 0.01413622 466.1 1
2 2 0.27982075 466.1 2
3 3 6.11873679 1106.0 1
4 4 0.59161716 1106.0 2
5 5 0.11280471 1106.0 3
6 6 0.02112153 1106.0 4
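To see what the helper and the modify= expressions do, the sketch below applies the same regular expressions to two made-up name values. The input strings are assumptions chosen for illustration; the actual format of the name column in SampleConcentration_DATA.csv is not shown in this tutorial.

```r
# helper copied from the tutorial
sampId <- function(x) {
  # remove leading zeroes or trailing periods
  subid <- gsub('(^0*|\\.$)', '', x)
  # change the first _ to . when a second _ follows (subject_visit_sample)
  gsub('_([0-9]+[_].*)$', '.\\1', subid)
}

# hypothetical raw names: subject 466, visit 1, sample 2; and subject 1106, sample 3
name <- c("0466_1_2", "1106_3.")

subid      <- sampId(name)                            # "466.1_2" "1106_3"
subject_id <- as.numeric(sub('[_].*', '', subid))     # part before the remaining _
samp       <- sub('[^_]*[_]', '', subid)              # part after the remaining _

print(data.frame(name, subid, subject_id, samp))
```

Note that samp is still a character vector at this point; only subject_id is converted to numeric.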
The FLOW dosing data is read in with the readTransform() function, which renames the variables Subject.Id to subject_id and Subject.Uniq.Id to subject_uid.
The MAR dosing data has column names, such as med:mDrug, that are not syntactically valid R variable names. The data is read in using read.csv() with the argument check.names = FALSE and then passed to the dataTransformation() function, which renames Uniq.Id to subject_uid.
# FLOW dosing data
flow.in <- readTransform(file.path(rawDataDir, "FLOW_DATA.csv"),
rename = c('Subject.Id' = 'subject_id',
'Subject.Uniq.Id' = 'subject_uid'))
# pre-process the flow data
# date.time variable should be in an appropriate form
flow.in[,'date.time'] <- pkdata::parse_dates(EHR:::fixDates(flow.in[,'Perform.Date']))
# unit and rate are required: separate unit and rate from 'Final.Rate..NFR.units.' if needed
flow.in[,'unit'] <- sub('.*[ ]', '', flow.in[,'Final.Rate..NFR.units.'])
flow.in[,'rate'] <- as.numeric(sub('([0-9.]+).*', '\\1', flow.in[,'Final.Rate..NFR.units.']))
head(flow.in)
subject_id subject_uid Perform.Date FOCUS_MEDNAME Final.Wt..kg. Final.Rate..NFR.units. Final.Units Flow date.time unit
1 1596 38340814 12/4/2016 5:30 Fentanyl 6.75 1 mcg/kg/hr 3.375 NA 2016-12-04 05:30:00 mcg/kg/hr
2 1596 38340814 12/4/2016 6:00 Fentanyl 6.75 1 mcg/kg/hr 6.750 0.1 2016-12-04 06:00:00 mcg/kg/hr
3 1596 38340814 12/4/2016 7:00 Fentanyl 6.75 1 mcg/kg/hr 4.500 0.1 2016-12-04 07:00:00 mcg/kg/hr
4 1596 38340814 12/4/2016 7:40 Fentanyl 6.75 0 mcg/kg/hr 0.000 NA 2016-12-04 07:40:00 mcg/kg/hr
5 1607 38551767 12/24/2016 19:30 Fentanyl 2.60 2 mcg/kg/hr 2.600 NA 2016-12-24 19:30:00 mcg/kg/hr
6 1607 38551767 12/24/2016 20:00 Fentanyl 2.60 2 mcg/kg/hr 5.200 0.2 2016-12-24 20:00:00 mcg/kg/hr
rate
1 1
2 1
3 1
4 0
5 2
6 2
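The two sub() calls used to split the combined rate column can be tried on their own. The sketch below uses strings in the style of Final.Rate..NFR.units.; the third value (0.5 mcg/kg/hr) is made up to show that decimals are handled.

```r
# example values in the style of the Final.Rate..NFR.units. column
raw <- c("1 mcg/kg/hr", "0 mcg/kg/hr", "0.5 mcg/kg/hr")

# drop everything up to and including the last space, leaving the unit
unit <- sub('.*[ ]', '', raw)

# keep only the leading run of digits and dots, then convert to numeric
rate <- as.numeric(sub('([0-9.]+).*', '\\1', raw))

print(data.frame(raw, unit, rate))
```

This approach assumes the value and unit are separated by a space and that the value comes first; strings in another layout would need a different pattern.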
# MAR dosing data
mar.in0 <- read.csv(file.path(rawDataDir, "MAR_DATA.csv"), check.names = FALSE)
mar.in <- dataTransformation(mar.in0, rename = c('Uniq.Id' = 'subject_uid'))
head(mar.in)
subject_uid Date Time med:mDrug med:dosage med:route med:freq med:given
1 28579217 2017-02-04 19:15 Nicardipine 3 mcg/kg/min IV <NA> Given
2 28579217 2011-10-02 22:11 Famotidine 4.5 mg IV q12hrs Given
3 28579217 2011-10-02 20:17 Morphine sulfate 1 mg IV q2h prn Given
4 28579217 2011-10-03 02:28 Diphenhydramine injection 12 mg IV now Given
5 28579217 2011-10-02 22:11 Cefazolin 225 mg IV q8hrs Given
6 28579217 2011-10-02 23:30 Morphine sulfate 1 mg IV q2h prn Given
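The effect of check.names = FALSE can be seen with a small in-memory CSV. The sketch below uses only base R; the two data rows are abbreviated from the MAR output above, and the final renaming step is a base R stand-in for what the rename is described as doing, not the dataTransformation() function itself.

```r
# a tiny CSV with colon-containing column names, similar to the MAR file
csv_text <- "Uniq.Id,Date,med:mDrug,med:dosage
28579217,2017-02-04,Nicardipine,3 mcg/kg/min
28579217,2011-10-02,Famotidine,4.5 mg"

# default behaviour: names are passed through make.names(), so ':' becomes '.'
default_names <- names(read.csv(text = csv_text))

# check.names = FALSE keeps the names exactly as they appear in the file
mar <- read.csv(text = csv_text, check.names = FALSE)
kept_names <- names(mar)

# base R equivalent of renaming Uniq.Id to subject_uid
names(mar)[names(mar) == "Uniq.Id"] <- "subject_uid"

print(default_names)
print(names(mar))
```

Columns with non-syntactic names are then accessed with quotes or backticks, for example mar[["med:mDrug"]].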
The laboratory data (creatinine and albumin) are read in with the readTransform() function and Subject.uniq is renamed to subject_uid.
# Serum creatinine lab data
creat.in <- readTransform(file.path(rawDataDir, "Creatinine_DATA.csv"),
rename = c('Subject.uniq' = 'subject_uid'))
head(creat.in)
subject_uid date time creat
1 28579217 02/05/17 4:00 0.52
2 28579217 02/06/17 5:00 0.53
3 28579217 10/03/11 4:28 0.42
4 28579217 10/04/11 4:15 0.35
5 28579217 10/06/11 4:25 0.29
6 28579217 10/09/11 4:45 0.28
# Albumin lab data
alb.in <- readTransform(file.path(rawDataDir, "Albumin_DATA.csv"),
rename = c('Subject.uniq' = 'subject_uid'))
head(alb.in)
idCrosswalk(): This function merges all of the cleaned input datasets and creates new IDs.
The data= argument of this function accepts a list of input datasets, and the idcols= argument accepts a list of vectors or character strings that identify the ID variables in the corresponding input dataset.
The output is a crosswalk between the original ID variables (subject_id, subject_uid) and the new ID variables (mod_id, mod_visit, and mod_id_visit).
The new variable mod_id_visit has a 1-to-1 correspondence to variable subject_id and uniquely identifies each subject's visit/course; the new variable mod_id has a 1-to-1 correspondence to variable subject_uid and uniquely identifies each subject.
# define list of input datasets
data <- list(demo.in,
samp.in,
conc.in,
flow.in,
mar.in,
creat.in,
alb.in)
# define list of vectors or character strings that identify the ID variables
idcols <- list(c('subject_id', 'subject_uid'), # id vars in demo.in
'subject_id', # id var in samp.in
'subject_id', # id var in conc.in
c('subject_id', 'subject_uid'), # id vars in flow.in
'subject_uid', # id var in mar.in
'subject_uid', # id var in creat.in
'subject_uid') # id var in alb.in
# merge all IDs from cleaned datasets and create new ID variables
id.xwalk <- idCrosswalk(data, idcols, visit.id="subject_id", uniq.id="subject_uid")
saveRDS(id.xwalk, file=file.path(dataDir,"module_id_xwalk.rds"))
head(id.xwalk)
subject_id subject_uid mod_visit mod_id mod_id_visit
1 466.0 28579217 1 1 1.1
2 466.1 28579217 2 1 1.2
3 1106.0 34364670 1 2 2.1
4 1444.0 36792472 1 3 3.1
5 1465.0 36292449 1 4 4.1
6 1520.0 34161967 1 5 5.1
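The relationship between the original and new IDs in the crosswalk can be reproduced on a toy table with base R. This is only a sketch of the numbering scheme visible in the output above; it is not the implementation of idCrosswalk(), and it assumes subjects are numbered in order of subject_id and visits in order within a subject.

```r
# toy ID table; values mirror the first rows of the crosswalk above
ids <- data.frame(
  subject_id  = c(466.0, 466.1, 1106.0, 1444.0),
  subject_uid = c(28579217, 28579217, 34364670, 36792472)
)

# sort so that visits within a subject are in order
ids <- ids[order(ids$subject_id), ]

# one integer per subject, in order of first appearance
ids$mod_id <- match(ids$subject_uid, unique(ids$subject_uid))

# visit counter within each subject
ids$mod_visit <- ave(ids$subject_id, ids$subject_uid, FUN = seq_along)

# combined subject.visit label
ids$mod_id_visit <- paste(ids$mod_id, ids$mod_visit, sep = ".")

print(ids)
```

In this toy table, both rows for subject_uid 28579217 share mod_id 1 and differ only in mod_visit, which is the 1-to-1 structure described above.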
pullFakeId(data, id.xwalk, firstCols = NULL, orderBy = NULL)
pullFakeId(): This function replaces the original IDs (subject_id and subject_uid) with new IDs (mod_id, mod_visit, and mod_id_visit) to create datasets which can be used by the data processing modules.
The dat= argument should contain the cleaned input data.frame from pre-processing step (1).
The xwalk= argument should contain the crosswalk data.frame produced in step (2).
The firstCols= and orderBy= arguments control which variables are in the first columns of the output and the sort order, respectively.
The cleaned data are saved as R objects for use with the modules.
## demographics data
demo.cln <- pullFakeId(demo.in, id.xwalk,
firstCols = c('mod_id', 'mod_visit', 'mod_id_visit'),
uniq.id = 'subject_uid')
head(demo.cln)
mod_id mod_visit mod_id_visit gender weight height surgery_date ageatsurgery stat_sts cpb_sts in_hospital_mortality add_ecmo date_icu_dc
1 2 1 2.1 0 5.14 59.18 6/28/2014 141 3 133 0 0 7/2/2014
2 3 1 3.1 1 5.67 62.90 1/10/2016 292 1 65 0 0 1/12/2016
3 4 1 4.1 0 23.67 118.02 3/19/2016 2591 2 357 0 0 3/20/2016
4 5 1 5.1 0 14.07 97.04 7/18/2016 1320 5 93 0 0 7/19/2016
5 6 1 6.1 1 23.40 102.80 7/23/2016 1561 3 87 1 0 7/30/2016
6 7 1 7.1 1 6.21 62.03 9/4/2016 208 1 203 0 0 9/11/2016
time_fromor
1 1657
2 1325
3 NA
4 1745
5 1847
6 1210
saveRDS(demo.cln, file=file.path(dataDir,"demo_mod_id.rds"))
## drug level data
# sampling times
samp.cln <- pullFakeId(samp.in, id.xwalk,
firstCols = c('mod_id', 'mod_visit', 'mod_id_visit', 'samp'),
orderBy = c('mod_id_visit','samp'),
uniq.id = 'subject_uid')
head(samp.cln)
mod_id mod_visit mod_id_visit samp Event.Name Sample.Collection.Date.and.Time
1 1 2 1.2 1 Sample 1 2/3/2017 10:46
2 1 2 1.2 2 Sample 2 2/4/2017 20:30
3 10 1 10.1 1 Sample 1 12/23/2016 05:15
4 10 1 10.1 2 Sample 2 12/24/2016 18:00
5 10 1 10.1 3 Sample 3 12/25/2016 03:00
6 10 1 10.1 4 Sample 4 12/26/2016 04:00
saveRDS(samp.cln, file=file.path(dataDir,"samp_mod_id.rds"))
# drug concentration measurements
conc.cln <- pullFakeId(conc.in, id.xwalk,
firstCols = c('record_id', 'mod_id', 'mod_visit', 'mod_id_visit', 'samp'),
orderBy = 'record_id',
uniq.id = 'subject_uid')
head(conc.cln)
record_id mod_id mod_visit mod_id_visit samp fentanyl_calc_conc
1 1 1 2 1.2 1 0.01413622
2 2 1 2 1.2 2 0.27982075
3 3 2 1 2.1 1 6.11873679
4 4 2 1 2.1 2 0.59161716
5 5 2 1 2.1 3 0.11280471
6 6 2 1 2.1 4 0.02112153
saveRDS(conc.cln, file=file.path(dataDir,"conc_mod_id.rds"))
## dosing data
# flow
flow.cln <- pullFakeId(flow.in, id.xwalk,
firstCols = c('mod_id', 'mod_visit', 'mod_id_visit'),
uniq.id = 'subject_uid')
head(flow.cln)
mod_id mod_visit mod_id_visit Perform.Date FOCUS_MEDNAME Final.Wt..kg. Final.Rate..NFR.units. Final.Units Flow date.time
1 9 1 9.1 12/4/2016 5:30 Fentanyl 6.75 1 mcg/kg/hr 3.375 NA 2016-12-04 05:30:00
2 9 1 9.1 12/4/2016 6:00 Fentanyl 6.75 1 mcg/kg/hr 6.750 0.1 2016-12-04 06:00:00
3 9 1 9.1 12/4/2016 7:00 Fentanyl 6.75 1 mcg/kg/hr 4.500 0.1 2016-12-04 07:00:00
4 9 1 9.1 12/4/2016 7:40 Fentanyl 6.75 0 mcg/kg/hr 0.000 NA 2016-12-04 07:40:00
5 10 1 10.1 12/24/2016 19:30 Fentanyl 2.60 2 mcg/kg/hr 2.600 NA 2016-12-24 19:30:00
6 10 1 10.1 12/24/2016 20:00 Fentanyl 2.60 2 mcg/kg/hr 5.200 0.2 2016-12-24 20:00:00
unit rate
1 mcg/kg/hr 1
2 mcg/kg/hr 1
3 mcg/kg/hr 1
4 mcg/kg/hr 0
5 mcg/kg/hr 2
6 mcg/kg/hr 2
saveRDS(flow.cln, file=file.path(dataDir,"flow_mod_id.rds"))
# mar
mar.cln <- pullFakeId(mar.in, id.xwalk, firstCols = 'mod_id', uniq.id = 'subject_uid')
head(mar.cln)
mod_id Date Time med:mDrug med:dosage med:route med:freq med:given
1 1 2017-02-04 19:15 Nicardipine 3 mcg/kg/min IV <NA> Given
2 1 2011-10-02 22:11 Famotidine 4.5 mg IV q12hrs Given
3 1 2011-10-02 20:17 Morphine sulfate 1 mg IV q2h prn Given
4 1 2011-10-03 02:28 Diphenhydramine injection 12 mg IV now Given
5 1 2011-10-02 22:11 Cefazolin 225 mg IV q8hrs Given
6 1 2011-10-02 23:30 Morphine sulfate 1 mg IV q2h prn Given
saveRDS(mar.cln, file=file.path(dataDir,"mar_mod_id.rds"))
## laboratory data
# creatinine
creat.cln <- pullFakeId(creat.in, id.xwalk, 'mod_id',uniq.id = 'subject_uid')
head(creat.cln)
mod_id date time creat
1 1 02/05/17 4:00 0.52
2 1 02/06/17 5:00 0.53
3 1 10/03/11 4:28 0.42
4 1 10/04/11 4:15 0.35
5 1 10/06/11 4:25 0.29
6 1 10/09/11 4:45 0.28
saveRDS(creat.cln, file=file.path(dataDir,"creat_mod_id.rds"))
# albumin
alb.cln <- pullFakeId(alb.in, id.xwalk, 'mod_id', uniq.id = 'subject_uid')
head(alb.cln)
mod_id date time alb
1 8 07/30/20 5:23 2.9
2 8 07/28/20 3:12 2.0
3 8 07/29/20 1:39 2.7
4 8 08/21/20 10:35 4.1
5 4 06/13/15 17:20 4.1
6 6 07/25/16 8:35 2.3
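The ID replacement used throughout this step amounts to a lookup against the crosswalk. The sketch below shows the idea in base R on a toy lab table; it is an illustration under simplified assumptions (one ID column, no visit-level IDs), not the code inside pullFakeId(), and the creat value in the first row is made up.

```r
# toy crosswalk: original unique ID and its module ID
xwalk <- data.frame(subject_uid = c(28579217, 34364670),
                    mod_id      = c(1L, 2L))

# toy lab data keyed by the original ID
labs <- data.frame(subject_uid = c(34364670, 28579217, 28579217),
                   creat       = c(0.66, 0.52, 0.53))

# look up the module ID for each row without changing the row order
labs$mod_id <- xwalk$mod_id[match(labs$subject_uid, xwalk$subject_uid)]

# drop the original ID and put mod_id first
labs_cln <- labs[, c("mod_id", "creat")]

print(labs_cln)
```

Using match() rather than merge() keeps the rows in their original order, which makes it easy to compare the result with the input.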
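Options and parameters: Before running the processing modules, it is necessary to define several options and parameters. Setting options(pkxwalk =) allows the modules to access the crosswalk file. The drugname stub must also be defined; it appears in file names such as medChecked-fent.csv. The LLOQ object passed to run_DrugLevel() must be defined as well.
Pro-Demographic: In this example, we exclude subjects with a value of 1 for in_hospital_mortality or add_ecmo and create a new variable called length_of_icu_stay. run_Demo() is the function to run this module.
# helper function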
exclude_val <- function(x, val=1) { !is.na(x) & x == val }
demo.out <- run_Demo(demo.path = file.path(dataDir, "demo_mod_id.rds"),
demo.columns = list(id = 'mod_id_visit'),
toexclude = expression(exclude_val(in_hospital_mortality) | exclude_val(add_ecmo)),
demo.mod.list = list(length_of_icu_stay =
expression(daysDiff(surgery_date, date_icu_dc))))
The number of subjects in the demographic data, who meet the exclusion criteria: 2
head(demo.out$demo)
mod_id mod_visit mod_id_visit gender weight height surgery_date ageatsurgery stat_sts cpb_sts in_hospital_mortality add_ecmo date_icu_dc
1 2 1 2.1 0 5.14 59.18 6/28/2014 141 3 133 0 0 7/2/2014
2 3 1 3.1 1 5.67 62.90 1/10/2016 292 1 65 0 0 1/12/2016
3 4 1 4.1 0 23.67 118.02 3/19/2016 2591 2 357 0 0 3/20/2016
4 5 1 5.1 0 14.07 97.04 7/18/2016 1320 5 93 0 0 7/19/2016
5 6 1 6.1 1 23.40 102.80 7/23/2016 1561 3 87 1 0 7/30/2016
6 7 1 7.1 1 6.21 62.03 9/4/2016 208 1 203 0 0 9/11/2016
time_fromor length_of_icu_stay
1 1657 4
2 1325 2
3 NA 1
4 1745 1
5 1847 7
6 1210 7
demo.out$exclude
[1] "6.1" "13.1"
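The exclusion expression can be checked in isolation. The sketch below applies the same helper to a small made-up table (the IDs and values are hypothetical) to show that a missing value never triggers an exclusion.

```r
# helper copied from the tutorial: TRUE only when x is present and equals val
exclude_val <- function(x, val = 1) { !is.na(x) & x == val }

# hypothetical demographic rows
demo <- data.frame(
  mod_id_visit          = c("101.1", "102.1", "103.1", "104.1"),
  in_hospital_mortality = c(0, 1, NA, 0),
  add_ecmo              = c(0, 0, 0, 1)
)

# a row is excluded when either criterion equals 1
flag <- exclude_val(demo$in_hospital_mortality) | exclude_val(demo$add_ecmo)
excluded <- demo$mod_id_visit[flag]

print(excluded)
```

Without the !is.na(x) guard, the comparison NA == 1 would return NA, and the logical index would then produce an NA entry in the list of excluded IDs.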
ivdose.out <- run_MedStrI(
mar.path=file.path(dataDir,"mar_mod_id.rds"),
mar.columns = list(id='mod_id', datetime=c('Date','Time'), dose='med:dosage', drug='med:mDrug', given='med:given'),
medGivenReq = TRUE,
flow.path=file.path(dataDir,"flow_mod_id.rds"),
flow.columns = list(id = 'mod_id', datetime = 'date.time', finalunits = 'Final.Units',
unit = 'unit', rate = 'rate', weight = 'Final.Wt..kg.'),
medchk.path=file.path(system.file("examples", "str_ex2", package="EHR"), sprintf('medChecked-%s.csv', drugname)),
demo.list = NULL,
demo.columns = list(),
missing.wgt.path = NULL,
wgt.columns = list(),
check.path = checkDir,
failflow_fn = 'FailFlow',
failunit_fn = 'Unit',
failnowgt_fn = 'NoWgt',
infusion.unit = 'mcg/kg/hr',
bolus.unit = 'mcg',
bol.rate.thresh = Inf,
rateunit = 'mcg/hr',
ratewgtunit = 'mcg/kg/hr',
weightunit = 'kg',
drugname = drugname)
The number of rows in the original data 124
The number of rows after removing the duplicates 124
no units other than mcg/kg/hr or mcg, file /var/folders/06/0qv1dr5508j_tbzqdjfqjf680000gn/T//RtmpqLJ9qE/checks/failUnit-fent.csv not created
#########################
33 rows from 1 subjects with "kg" in infusion unit but missing weight, see file /var/folders/06/0qv1dr5508j_tbzqdjfqjf680000gn/T//RtmpqLJ9qE/checks/failNoWgt-fent.csv AND create /var/folders/06/0qv1dr5508j_tbzqdjfqjf680000gn/T//RtmpqLJ9qE/checks/fixNoWgt-fent.csv
#########################
#########################
censor dates created, please see /var/folders/06/0qv1dr5508j_tbzqdjfqjf680000gn/T//RtmpqLJ9qE/checks/CensorTime-fent.csv
#########################
head(ivdose.out)
mod_id date.dose infuse.time.real infuse.time infuse.dose bolus.time bolus.dose given.dose maxint weight
1 1 2011-10-02 <NA> <NA> NA 2011-10-02 15:35:00 25 NA 0 NA
2 1 2011-10-02 <NA> <NA> NA 2011-10-02 17:26:00 25 NA 0 NA
3 1 2017-02-04 <NA> <NA> NA 2017-02-04 16:15:00 50 NA 0 NA
4 1 2017-02-04 <NA> <NA> NA 2017-02-04 16:30:00 20 NA 0 NA
5 1 2017-02-04 <NA> <NA> NA 2017-02-04 20:57:00 20 NA 0 NA
6 2 2014-06-28 <NA> <NA> NA 2014-06-28 08:15:00 20 NA 0 NA
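The messages above flag rows that have "kg" in the infusion unit but no weight. As a rough illustration of why weight matters for a unit such as mcg/kg/hr, the sketch below converts a weight-based rate to an hourly rate. It only shows the arithmetic implied by the units; it is not the module's internal code, and the variable names are chosen for this example.

```r
# weight-based rates (mcg/kg/hr) and weights (kg), as in the FLOW data above
rate_per_kg <- c(1, 2)
weight_kg   <- c(6.75, 2.6)

# hourly rate in mcg/hr
rate_per_hr <- rate_per_kg * weight_kg

# with a missing weight the hourly rate cannot be computed
rate_missing <- 1 * NA_real_

print(rate_per_hr)
print(rate_missing)
```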
conc.out <- run_DrugLevel(conc.path=file.path(dataDir,"conc_mod_id.rds"),
conc.columns = list(id = 'mod_id', conc = 'conc.level', idvisit = 'mod_id_visit', samplinkid = 'mod_id_event'),
conc.select=c('mod_id','mod_id_visit','samp','fentanyl_calc_conc'),
conc.rename=c(fentanyl_calc_conc = 'conc.level', samp= 'event'),
conc.mod.list=list(mod_id_event = expression(paste(mod_id_visit, event, sep = '_'))),
samp.path=file.path(dataDir,"samp_mod_id.rds"),
samp.columns = list(conclinkid = 'mod_id_event', datetime = 'Sample.Collection.Date.and.Time'),
samp.mod.list=list(mod_id_event = expression(paste(mod_id_visit, samp, sep = '_'))),
check.path=checkDir,
failmiss_fn = 'MissingConcDate-',
multsets_fn = 'multipleSetsConc-',
faildup_fn = 'DuplicateConc-',
drugname=drugname,
LLOQ=LLOQ,
demo.list=demo.out,
demo.columns = list(id = 'mod_id', idvisit = 'mod_id_visit'))
#########################
3 rows need review, see file /var/folders/06/0qv1dr5508j_tbzqdjfqjf680000gn/T//RtmpqLJ9qE/checks/failMissingConcDate-fent.csv AND create /var/folders/06/0qv1dr5508j_tbzqdjfqjf680000gn/T//RtmpqLJ9qE/checks/fixMissingConcDate-fent.csv
#########################
subjects with concentration missing from sample file
mod_id mod_id_event
8 8.1_1
8 8.1_2
8 8.1_3
1 subjects have multiple sets of concentration data
16 total unique subjects ids (including multiple visits) currently in the concentration data
15 total unique subjects in the concentration data
#########################
15 rows need review, see file /var/folders/06/0qv1dr5508j_tbzqdjfqjf680000gn/T//RtmpqLJ9qE/checks/multipleSetsConc-fent2023-11-02.csv
#########################
15 total unique subjects ids (after excluding multiple visits) in the concentration data
15 total unique subjects in the concentration data
head(conc.out)
mod_id mod_id_visit event conc.level mod_id_event date.time eid
1 1 1.2 1 0.014136220 1.2_1 2017-02-03 10:46:00 1
2 1 1.2 2 0.279820752 1.2_2 2017-02-04 20:30:00 1
55 10 10.1 2 3.136047304 10.1_2 2016-12-24 18:00:00 1
56 10 10.1 9 0.004720171 10.1_9 2017-01-01 04:20:00 1
57 10 10.1 10 0.017136367 10.1_10 2017-01-02 04:42:00 1
58 10 10.1 12 0.006335571 10.1_12 2017-01-04 03:40:00 1
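The first message reports 3 rows that need review. The file failMissingConcDate-fent.csv in the check directory lists these records, which have a missing value for the date.time variable.
# read the records that failed the check
fail.miss.conc.date <- read.csv(file.path(checkDir, "failMissingConcDate-fent.csv"))
fail.miss.conc.date
subject_id subject_uid mod_id_event datetime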
1 1566 35885929 8.1_1 NA
2 1566 35885929 8.1_2 NA
3 1566 35885929 8.1_3 NA
fail.miss.conc.date[,"datetime"] <- c("9/30/2016 09:32","10/1/2016 19:20","10/2/2016 02:04")
fail.miss.conc.date
subject_id subject_uid mod_id_event datetime
1 1566 35885929 8.1_1 9/30/2016 09:32
2 1566 35885929 8.1_2 10/1/2016 19:20
3 1566 35885929 8.1_3 10/2/2016 02:04
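The fail/fix file round trip can be rehearsed without running the module. The sketch below writes a stand-in fail file to a temporary directory, fills in the dates, and saves the result under the fix file name given in the message. The directory name and the use of row.names = FALSE are assumptions of this sketch.

```r
# temporary stand-in for the check directory
checkDir <- file.path(tempdir(), "checks_demo")
dir.create(checkDir, showWarnings = FALSE)

fail_file <- file.path(checkDir, "failMissingConcDate-fent.csv")
fix_file  <- file.path(checkDir, "fixMissingConcDate-fent.csv")

# stand-in for the file the module writes: three rows with no datetime
write.csv(data.frame(subject_id   = 1566,
                     subject_uid  = 35885929,
                     mod_id_event = c("8.1_1", "8.1_2", "8.1_3"),
                     datetime     = NA),
          fail_file, row.names = FALSE)

# read the failures, supply the missing values, and save the fix file
fail.miss.conc.date <- read.csv(fail_file)
fail.miss.conc.date[, "datetime"] <- c("9/30/2016 09:32", "10/1/2016 19:20", "10/2/2016 02:04")
# row.names = FALSE so the fix file has the same columns as the fail file
write.csv(fail.miss.conc.date, fix_file, row.names = FALSE)

fixed <- read.csv(fix_file)
print(fixed)
```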
After the corrected records are saved as fixMissingConcDate-fent.csv in the check directory (for example with write.csv()), the run_DrugLevel() function should be re-run. The output now contains an additional message below the first message saying “fixMissingConcDate-fent.csv read with failures replaced”. The conc.out data.frame also contains 3 additional rows with the corrected data.
conc.out <- run_DrugLevel(conc.path=file.path(dataDir,"conc_mod_id.rds"),
conc.columns = list(id = 'mod_id', conc = 'conc.level', idvisit = 'mod_id_visit', samplinkid = 'mod_id_event'),
conc.select=c('mod_id','mod_id_visit','samp','fentanyl_calc_conc'),
conc.rename=c(fentanyl_calc_conc = 'conc.level', samp= 'event'),
conc.mod.list=list(mod_id_event = expression(paste(mod_id_visit, event, sep = '_'))),
samp.path=file.path(dataDir,"samp_mod_id.rds"),
samp.columns = list(conclinkid = 'mod_id_event', datetime = 'Sample.Collection.Date.and.Time'),
samp.mod.list=list(mod_id_event = expression(paste(mod_id_visit, samp, sep = '_'))),
check.path=checkDir,
failmiss_fn = 'MissingConcDate-',
multsets_fn = 'multipleSetsConc-',
faildup_fn = 'DuplicateConc-',
drugname=drugname,
LLOQ=LLOQ,
demo.list=demo.out,
demo.columns = list(id = 'mod_id', idvisit = 'mod_id_visit'))
#########################
3 rows need review, see file /var/folders/06/0qv1dr5508j_tbzqdjfqjf680000gn/T//RtmpqLJ9qE/checks/failMissingConcDate-fent.csv AND create /var/folders/06/0qv1dr5508j_tbzqdjfqjf680000gn/T//RtmpqLJ9qE/checks/fixMissingConcDate-fent.csv
#########################
file /var/folders/06/0qv1dr5508j_tbzqdjfqjf680000gn/T//RtmpqLJ9qE/checks/fixMissingConcDate-fent.csv read with failures replaced
1 subjects have multiple sets of concentration data
16 total unique subjects ids (including multiple visits) currently in the concentration data
15 total unique subjects in the concentration data
#########################
15 rows need review, see file /var/folders/06/0qv1dr5508j_tbzqdjfqjf680000gn/T//RtmpqLJ9qE/checks/multipleSetsConc-fent2023-11-02.csv
#########################
15 total unique subjects ids (after excluding multiple visits) in the concentration data
15 total unique subjects in the concentration data
creat.out <- run_Labs(lab.path=file.path(dataDir,"creat_mod_id.rds"),
lab.select = c('mod_id','date.time','creat'),
lab.mod.list = list(date.time = expression(parse_dates(fixDates(paste(date, time))))))
alb.out <- run_Labs(lab.path=file.path(dataDir,"alb_mod_id.rds"),
lab.select = c('mod_id','date.time','alb'),
lab.mod.list = list(date.time = expression(parse_dates(fixDates(paste(date, time))))))
lab.out <- list(creat.out, alb.out)
str(lab.out)
List of 2
$ :'data.frame': 266 obs. of 3 variables:
..$ mod_id : int [1:266] 1 1 1 1 1 1 1 1 1 1 ...
..$ date.time: POSIXct[1:266], format: "2017-02-05 04:00:00" "2017-02-06 05:00:00" "2011-10-03 04:28:00" "2011-10-04 04:15:00" ...
..$ creat : num [1:266] 0.52 0.53 0.42 0.35 0.29 0.28 0.34 0.59 0.54 0.26 ...
$ :'data.frame': 44 obs. of 3 variables:
..$ mod_id : int [1:44] 8 8 8 8 4 6 6 9 10 10 ...
..$ date.time: POSIXct[1:44], format: "2020-07-30 05:23:00" "2020-07-28 03:12:00" "2020-07-29 01:39:00" "2020-08-21 10:35:00" ...
..$ alb : num [1:44] 2.9 2 2.7 4.1 4.1 2.3 2.6 3 3.1 4.2 ...
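The lab.mod.list expression builds date.time by pasting the date and time columns and parsing the result. The tutorial relies on parse_dates() and fixDates() for this; the sketch below uses only base R with one fixed format and an assumed UTC time zone, so it is a stand-in for the idea rather than a description of those functions.

```r
# two rows in the style of the creatinine data
labs <- data.frame(date = c("02/05/17", "10/03/11"),
                   time = c("4:00", "4:28"))

# paste date and time, then parse as month/day/two-digit-year hour:minute
labs$date.time <- as.POSIXct(paste(labs$date, labs$time),
                             format = "%m/%d/%y %H:%M", tz = "UTC")

parsed <- format(labs$date.time, "%Y-%m-%d %H:%M:%S")
print(parsed)
```

A single fixed format only works when every row follows the same layout; values in any other layout would come back as NA and should be checked before the data are used.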
Build-PK-IV: run_Build_PK_IV() creates the final PK data from the outputs of the processing modules. If pk.vars includes 'date', the output includes the original date-time to which 'time' is mapped. Users who prefer to provide a single concentration data file can use pk.vars to include demographic or lab variables that are already merged into that file. The concentration data are required in either case, and a separate dose data file is still required.
pk_dat <- run_Build_PK_IV(
conc=conc.out,
conc.columns = list(id = 'mod_id', datetime = 'date.time', druglevel = 'conc.level',
idvisit = 'mod_id_visit'),
dose=ivdose.out,
dose.columns = list(id = 'mod_id', date = 'date.dose', infuseDatetime = 'infuse.time',
infuseDose = 'infuse.dose', infuseTimeExact= 'infuse.time.real',
bolusDatetime = 'bolus.time', bolusDose = 'bolus.dose',
gap = 'maxint', weight = 'weight'),
demo.list = demo.out,
demo.columns = list(id = 'mod_id', idvisit = 'mod_id_visit'),
lab.list = lab.out,
lab.columns = list(id = 'mod_id', datetime = 'date.time'),
pk.vars=c('date'),
drugname=drugname,
check.path=checkDir,
missdemo_fn='-missing-demo',
faildupbol_fn='DuplicateBolus-',
date.format="%m/%d/%y %H:%M:%S",
date.tz="America/Chicago")
0 duplicated rows
The dimension of the PK data before merging with demographics: 234 x 9
The number of subjects in the PK data before merging with demographics: 15
The number of subjects in the demographic file, who meet the exclusion criteria: 2
check NA frequency in demographics, see file /var/folders/06/0qv1dr5508j_tbzqdjfqjf680000gn/T//RtmpqLJ9qE/checks/fent-missing-demo.csv
Some demographic variables are missing and will be excluded:
The list of final demographic variables: mod_visit
gender
weight
height
surgery_date
ageatsurgery
stat_sts
cpb_sts
in_hospital_mortality
add_ecmo
date_icu_dc
time_fromor
length_of_icu_stay
weight_demo
Checked: there are no missing creat
List of IDs missing at least 1 alb: 1.2
11.1
15.1
2.1
3.1
4.1
5.1
7.1
8.1
The dimension of the final PK data exported with the key demographics: 197 x 24 with 13 distinct subjects (mod_id)
pullRealId() appends the original IDs (subject_id and subject_uid) to the data. The argument remove.mod.id=TRUE can be used to also remove any module IDs (mod_id, mod_visit, and mod_id_visit).
# convert id back to original IDs
pk_dat <- pullRealId(pk_dat, remove.mod.id=TRUE)
head(pk_dat)
subject_id subject_uid time amt dv rate mdv evid date gender weight height surgery_date ageatsurgery stat_sts
2 466.1 28579217 0.00 50.0 NA 0.0 1 1 02/04/17 16:15:00 0 21.99 116.90 2/4/2017 2451 1
2.1 466.1 28579217 0.25 20.0 NA 0.0 1 1 02/04/17 16:30:00 0 21.99 116.90 2/4/2017 2451 1
2.2 466.1 28579217 4.25 NA 0.2798208 NA 0 0 02/04/17 20:30:00 0 21.99 116.90 2/4/2017 2451 1
12 1607.0 38551767 0.00 109.2 NA 10.4 1 1 12/24/16 07:15:00 0 2.60 45.94 12/24/2016 23 3
12.1 1607.0 38551767 0.00 10.0 NA 0.0 1 1 12/24/16 07:15:00 0 2.60 45.94 12/24/2016 23 3
12.2 1607.0 38551767 1.25 15.0 NA 0.0 1 1 12/24/16 08:30:00 0 2.60 45.94 12/24/2016 23 3
cpb_sts in_hospital_mortality add_ecmo date_icu_dc time_fromor length_of_icu_stay weight_demo creat alb
2 107 0 0 2/5/2017 1322 1 21.99 0.54 NA
2.1 107 0 0 2/5/2017 1322 1 21.99 0.54 NA
2.2 107 0 0 2/5/2017 1322 1 21.99 0.54 NA
12 110 0 0 1/5/2017 NA 12 2.76 0.66 1.6
12.1 110 0 0 1/5/2017 NA 12 2.76 0.66 1.6
12.2 110 0 0 1/5/2017 NA 12 2.76 0.66 1.6
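In the rows shown for subject_id 466.1, the time column is consistent with the number of hours elapsed since that subject's first record. The sketch below reproduces those three values from the date column using base R, with the same date format and time zone that were passed to run_Build_PK_IV(). It is a check of that reading of the output, not the function's actual algorithm.

```r
# date values from the first three rows of the final data
stamps <- c("02/04/17 16:15:00", "02/04/17 16:30:00", "02/04/17 20:30:00")

# parse with the format and time zone used in the tutorial
dt <- as.POSIXct(stamps, format = "%m/%d/%y %H:%M:%S", tz = "America/Chicago")

# hours elapsed since the first record
hours <- as.numeric(difftime(dt, dt[1], units = "hours"))

print(hours)
```

The result matches the time values 0.00, 0.25, and 4.25 in the output above.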