A Practical Example for Usage of the RMDL package

A key benefit of the RMDL package is that it enables further statistical analysis of RDML data. This section describes R packages that link with the RMDL package. It will provide examples on how other packages allow the manipulation of RDML objects.

In this example we will employ the qpcR package (???) to calculate Cq values after the selection of an optimal sigmoidal model, as suggested by the Akaike’s An Information Criterion. The Cq values will then be used to calculate the amplification efficiency from a calibration curve, based on the effcalc function of the chipPCR package (???).

Note: The data used here serve as example only. Overall, the quality of the measurement is not appropriate for further application in a study. Here, we show that the estimated amplification efficiency from the StepOne system differs from that of the proposed pipeline (RDML $\rightarrow$ qpcR $\rightarrow$ chipPCR).

In principle, the following code examples can be combined with report generating toolkits for data analysis pipelines, such as Nozzle (see (???)).

Preparation of the data

In the RDML vignette, it was demonstrated how to read-in RDML data. In this section we will continue with the built-in RDML example file stepone_std.rdml. This file was obtained during the measurement of human DNA concentration by a LightCycler 96 instrument (Roche) and the XY-Detect kit (Syntol, Russia). The file is opened as described before:

# Load the RDML package and use its functions to `extract` the required data
library(RDML)
filename <- system.file("extdata/stepone_std.rdml", package="RDML")
raw_data <- RDML$new(filename=filename)

Working with metadata of the example file

At this stage, all data from the stepone_std.rdml RDML file are available. For example, the amplification efficiency as provided by the StepOne™ Real-Time PCR System software can be fetched from the package via raw_data$target[["RNase P"]]$amplificationEfficiency. In another example, we extract the information about the target in this experiment.

raw_data$target
#> $`RNase P`
#> id: ~ idType
#> description: NFQ-MGB
#> documentation: []
#> xRef: []
#> type: ~ targetTypeType
#> amplificationEfficiencyMethod: 
#> amplificationEfficiency: 93.91181
#> amplificationEfficiencySE: 
#> detectionLimit: 
#> dyeId: ~ idReferencesType
#> sequences: ~ sequencesType
#> commercialAssay:

To get some information about the run we can use:

raw_data$experiment[["Standard Curve Example"]]$run
#> $Run001
#> id: ~ idType
#> description: 
#> documentation: []
#> experimenter: []
#> instrument: Applied Biosystems StepOne™ Instrument
#> dataCollectionSoftware: ~ dataCollectionSoftwareType
#> backgroundDeterminationMethod: Background Subtraction
#> cqDetectionMethod: ~ cqDetectionMethodType
#> thermalCyclingConditions: ~ idReferencesType
#> pcrFormat: ~ pcrFormatType
#> runDate: 2006-11-10T09:24:39.265
#> react: [1, 2, 3, 4, 5, 6, 7, 8, 13, 14, 15, 16, 17, 18, 19, 20, 25, 26, 27, 28, 29, 30, 31, 32]

Extraction of raw amplification curve data

For convenience, we use the pipe function %>% from the magrittr package for further analysis. In the following, we fetch the amplification curve data from the RDML file.

raw_data_tab <- raw_data$AsTable(
    # Custom name pattern 'position~sample~sample.type~target~run.id'
    name.pattern=paste(
        react$position,
        react$sample$id,
        private$.sample[[react$sample$id]]$type$value,
        data$tar$id,
        run$id$id, # run id added to names
        sep="~"))
# Get all fluorescence data and assign them to the object fdata
fdata <- as.data.frame(raw_data$GetFData(raw_data_tab, long.table=FALSE))

Plotting of raw amplification curve data

The plotting of the raw data is an important step to visually inspect the data. In this example, we use the ggplot2 package (???) instead of the default R graphics functions.

# Load the ggplot2 package for plotting
library(ggplot2)
# Load the reshape2 package to rearrange the data
library(reshape2)
# Rearrange and plot the raw data
fdata_gg <- melt(fdata, id.vars="cyc")
ggplot(data=fdata_gg, aes(x=cyc, y=value, color=variable)) + 
    geom_line() + labs(x="Cycle", y="RFU") + theme_light() +
    theme(legend.position="top",
    legend.direction="horizontal")

Analysis of the qPCR amplification curve data by a custom made function

During the next steps we will employ the qpcR package. The function mselect performs sigmoid model selection by different criteria (e.g., bias-corrected Akaike’s Information Criterion). Note: In most cases a sigmoidal model with seven parameters was selected.

The function efficiency calculates the qPCR Cq values, amplification efficiency and other important qPCR parameters. In this example, we set the parameter type of the efficiency function to Cy0, which will calculate and report the Cy0 value. According to (???), the Cy0 value is the intersection of a tangent on the first derivative maximum with the abscissa. However, for all further analyses we will use the second derivative maximum cycle as Cq value. This supplementary material will not focus on the selection of a certain Cq method. For an objective decision, we would like to guide the reader to the study by (???).

# Use the magrittr package to create pipes
library(magrittr)

# Write a custom function that calculates the Cq values and other curve parameters
library(qpcR)
#> Loading required package: MASS
#> Loading required package: minpack.lm
#> Loading required package: rgl
#> Loading required package: robustbase
#> Loading required package: Matrix
res_fit <- do.call(cbind, lapply(2L:ncol(fdata), function(block_column) {
    
    res_model <- try(mselect(pcrfit(data=fdata, cyc=1, fluo=block_column), verbose=FALSE, do.all=TRUE), silent=TRUE)
    if(res_model %>% class=="try-error") {
        res_model <- NA
    }
    else{
        try(efficiency(res_model, plot=FALSE, type="Cy0"), silent=TRUE)
    }
        }
    )
)
# Assign column names
colnames(res_fit) <- colnames(fdata)[-1]

# Fetch only the Cq values (second derivative maximum) and combine them in a
# data.frame

Cq_SDM <- res_fit[rownames(res_fit)=="cpD2", ] %>% unlist %>% as.data.frame
colnames(Cq_SDM) <- c("Cq")

# Prepare the dilutions and calculated Cq values for further usage in the effcalc
# function from the chipPCR package

dilution <- c(as.factor("ntc"), as.factor("unk"), 10000, 5000, 2500, 1250, 625)
Cq_values <- matrix(Cq_SDM[, "Cq"], nrow=length(dilution), ncol=3, byrow=TRUE)

Inspection and interpretation of the analyzed data

Below, we arbitarily selected a non-template control (negative) and an unknown sample (positive) (two out of 24 amplification curves) for the presentation of the coefficients that were calculated from the raw amplication curve data.

Selected data of calculated values from the fitted models. The values reported in the table were calculated by a chain of functions from the **qpcR** package. In detail, this were the *mselect* function to calculate the optimal multiparametric sigmoid model and the *efficiency* function to calculate the Cq values and other curve parameters. More information about these functions is available from the manual of the **qpcR** package. The samples `A01` (ntc, non-template cntrol) and `A05` (unk, unknown) were arbitrarily selected for presentation. The custom made function assigned a NA (not available) to sample `A01` because no Cq value or other curve parameters could be calculated from the non-template control.
	A01~NTC_RNase P_ntcRNase P~Run001	A05~pop1_RNase P_unknRNase P~Run001
eff	NA	1.01020429599186
resVar	NA	7.27e-06
AICc	NA	-347.935408836544
AIC	NA	-351.435408836544
Rsq	NA	0.999986082849537
Rsq.ad	NA	0.999983552458544
cpD1	NA	32.13
cpD2	NA	28.3
cpE	NA	NA
cpR	NA	NA
cpT	NA	NA
Cy0	NA	24.95
cpCQ	NA	NA
cpMR	NA	NA
fluo	NA	0.669107426735924
init1	NA	0.641678574633849
init2	NA	0.519380971518155
cf	NA	NA

Calculation of the amplication efficiency from calibration data

We now use our calculated Cq values from the dilutions steps (625, 1250, 2500, 5000, 10^{4}) to determine the efficiency from the calibration curve. According to the StepOne software, the amplification efficiency is approx. 93.9%. To confirm these results, the effcalc function was employed to determine the coefficients of the calibration curve.

The following few lines are needed to calculate the amplication efficiency based on the previously calculated Cq values.

# Load the chipPCR package
library(chipPCR)

# Use the effcalc function from the chipPCR package to calculate the amplification
# efficiency and store the results in the object res_efficiency

res_efficiency <- effcalc(dilution[-c(1:3)], Cq_values[-c(1:3), ], logx=TRUE)

# Use the %>% function from the magrittr package to plot the results (res_efficiency) 
# from the effcalc function 

res_efficiency %>% plot(., CI=TRUE, ylab="Cq (SDM)", 
                        main="Second Derivative Maximum Method")

The amplification efficiency estimated with the customized function was 94.7%, which is comparable to the value reported in the stepone_std.rdml file.

# Combine the sample labels and the Cq values as calculate by the Second 
# Derivative Maximum Method (cpD2).

sample_Cq <- data.frame(sample=c("ntc", "unk", 
                                 10000, 5000, 2500, 1250, 625), 
                        Cq_values)

# Print table of all Cq values
# Use the kable function from the knitr package to print a table

knitr::kable(sample_Cq, caption="Cq values as calculate by the Second Derivative 
Maximum Method (cpD2). ntc, non-template control. unk, unknown sample. X1, X2 
and X3 are the Cq values from a triplicate measurement.")

Cq values as calculate by the Second Derivative Maximum Method (cpD2). ntc, non-template control. unk, unknown sample. X1, X2 and X3 are the Cq values from a triplicate measurement.
sample	X1	X2	X3
ntc	NA	NA	NA
unk	28.38	28.30	28.42
10000	27.39	27.38	27.30
5000	26.21	26.20	26.24
2500	27.28	27.32	27.32
1250	28.38	28.41	28.33
625	29.42	29.47	29.54

knitr::kable(res_efficiency, caption="Analysis of the amplification efficiency. 
The table reports the concentration-depentent average Cq values from three 
replicates per dilution step. In addition, the standard deviation (SD) and the 
Coefficient of Variation (RSD [%]) are presented. The results indicate that the 
data basis for the calibration curve is valid.")

Analysis of the amplification efficiency. The table reports the concentration-depentent average Cq values from three replicates per dilution step. In addition, the standard deviation (SD) and the Coefficient of Variation (RSD [%]) are presented. The results indicate that the data basis for the calibration curve is valid.
Concentration	Location (Mean)	Deviation (SD)	Coefficient of Variance (RSD [%])
3.69897	26.21667	0.0208167	0.0007940
3.39794	27.30667	0.0230940	0.0008457
3.09691	28.37333	0.0404145	0.0014244
2.79588	29.47667	0.0602771	0.0020449

The Cq values (28.38, 28.3, 28.42) from the unknown sample unk had an average Cq of 28.37 ± 0.06.