vignettes/usingRDML_with_other_R_packages.Rmd
usingRDML_with_other_R_packages.Rmd
A key benefit of the RMDL package is that it enables further statistical analysis of RDML data. This section describes R packages that link with the RMDL package. It will provide examples on how other packages allow the manipulation of RDML objects.
In this example we will employ the qpcR package (???) to calculate Cq values after the selection of an optimal sigmoidal model, as suggested by the Akaike’s An Information Criterion. The Cq values will then be used to calculate the amplification efficiency from a calibration curve, based on the effcalc
function of the chipPCR package (???).
Note: The data used here serve as example only. Overall, the quality of the measurement is not appropriate for further application in a study. Here, we show that the estimated amplification efficiency from the StepOne system differs from that of the proposed pipeline (RDML \(\rightarrow\) qpcR \(\rightarrow\) chipPCR).
In principle, the following code examples can be combined with report generating toolkits for data analysis pipelines, such as Nozzle (see (???)).
In the RDML vignette, it was demonstrated how to read-in RDML data. In this section we will continue with the built-in RDML example file stepone_std.rdml
. This file was obtained during the measurement of human DNA concentration by a LightCycler 96 instrument (Roche) and the XY-Detect kit (Syntol, Russia). The file is opened as described before:
# Load the RDML package and use its functions to `extract` the required data library(RDML) filename <- system.file("extdata/stepone_std.rdml", package="RDML") raw_data <- RDML$new(filename=filename)
At this stage, all data from the stepone_std.rdml
RDML file are available. For example, the amplification efficiency as provided by the StepOne™ Real-Time PCR System software can be fetched from the package via raw_data$target[["RNase P"]]$amplificationEfficiency
. In another example, we extract the information about the target in this experiment.
raw_data$target #> $`RNase P` #> id: ~ idType #> description: NFQ-MGB #> documentation: [] #> xRef: [] #> type: ~ targetTypeType #> amplificationEfficiencyMethod: #> amplificationEfficiency: 93.91181 #> amplificationEfficiencySE: #> detectionLimit: #> dyeId: ~ idReferencesType #> sequences: ~ sequencesType #> commercialAssay:
To get some information about the run we can use:
raw_data$experiment[["Standard Curve Example"]]$run #> $Run001 #> id: ~ idType #> description: #> documentation: [] #> experimenter: [] #> instrument: Applied Biosystems StepOne™ Instrument #> dataCollectionSoftware: ~ dataCollectionSoftwareType #> backgroundDeterminationMethod: Background Subtraction #> cqDetectionMethod: ~ cqDetectionMethodType #> thermalCyclingConditions: ~ idReferencesType #> pcrFormat: ~ pcrFormatType #> runDate: 2006-11-10T09:24:39.265 #> react: [1, 2, 3, 4, 5, 6, 7, 8, 13, 14, 15, 16, 17, 18, 19, 20, 25, 26, 27, 28, 29, 30, 31, 32]
For convenience, we use the pipe function %>%
from the magrittr package for further analysis. In the following, we fetch the amplification curve data from the RDML file.
raw_data_tab <- raw_data$AsTable( # Custom name pattern 'position~sample~sample.type~target~run.id' name.pattern=paste( react$position, react$sample$id, private$.sample[[react$sample$id]]$type$value, data$tar$id, run$id$id, # run id added to names sep="~")) # Get all fluorescence data and assign them to the object fdata fdata <- as.data.frame(raw_data$GetFData(raw_data_tab, long.table=FALSE))
The plotting of the raw data is an important step to visually inspect the data. In this example, we use the ggplot2 package (???) instead of the default R graphics functions.
# Load the ggplot2 package for plotting library(ggplot2) # Load the reshape2 package to rearrange the data library(reshape2) # Rearrange and plot the raw data fdata_gg <- melt(fdata, id.vars="cyc") ggplot(data=fdata_gg, aes(x=cyc, y=value, color=variable)) + geom_line() + labs(x="Cycle", y="RFU") + theme_light() + theme(legend.position="top", legend.direction="horizontal")
During the next steps we will employ the qpcR package. The function mselect
performs sigmoid model selection by different criteria (e.g., bias-corrected Akaike’s Information Criterion). Note: In most cases a sigmoidal model with seven parameters was selected.
The function efficiency
calculates the qPCR Cq values, amplification efficiency and other important qPCR parameters. In this example, we set the parameter type of the efficiency
function to Cy0
, which will calculate and report the Cy0 value. According to (???), the Cy0 value is the intersection of a tangent on the first derivative maximum with the abscissa. However, for all further analyses we will use the second derivative maximum cycle as Cq value. This supplementary material will not focus on the selection of a certain Cq method. For an objective decision, we would like to guide the reader to the study by (???).
# Use the magrittr package to create pipes library(magrittr) # Write a custom function that calculates the Cq values and other curve parameters library(qpcR) #> Loading required package: MASS #> Loading required package: minpack.lm #> Loading required package: rgl #> Loading required package: robustbase #> Loading required package: Matrix res_fit <- do.call(cbind, lapply(2L:ncol(fdata), function(block_column) { res_model <- try(mselect(pcrfit(data=fdata, cyc=1, fluo=block_column), verbose=FALSE, do.all=TRUE), silent=TRUE) if(res_model %>% class=="try-error") { res_model <- NA } else{ try(efficiency(res_model, plot=FALSE, type="Cy0"), silent=TRUE) } } ) ) # Assign column names colnames(res_fit) <- colnames(fdata)[-1] # Fetch only the Cq values (second derivative maximum) and combine them in a # data.frame Cq_SDM <- res_fit[rownames(res_fit)=="cpD2", ] %>% unlist %>% as.data.frame colnames(Cq_SDM) <- c("Cq") # Prepare the dilutions and calculated Cq values for further usage in the effcalc # function from the chipPCR package dilution <- c(as.factor("ntc"), as.factor("unk"), 10000, 5000, 2500, 1250, 625) Cq_values <- matrix(Cq_SDM[, "Cq"], nrow=length(dilution), ncol=3, byrow=TRUE)
Below, we arbitarily selected a non-template control (negative) and an unknown sample (positive) (two out of 24 amplification curves) for the presentation of the coefficients that were calculated from the raw amplication curve data.
A01~NTC_RNase PntcRNase P~Run001 | A05~pop1_RNase PunknRNase P~Run001 | |
---|---|---|
eff | NA | 1.01020429599186 |
resVar | NA | 7.27e-06 |
AICc | NA | -347.935408836544 |
AIC | NA | -351.435408836544 |
Rsq | NA | 0.999986082849537 |
Rsq.ad | NA | 0.999983552458544 |
cpD1 | NA | 32.13 |
cpD2 | NA | 28.3 |
cpE | NA | NA |
cpR | NA | NA |
cpT | NA | NA |
Cy0 | NA | 24.95 |
cpCQ | NA | NA |
cpMR | NA | NA |
fluo | NA | 0.669107426735924 |
init1 | NA | 0.641678574633849 |
init2 | NA | 0.519380971518155 |
cf | NA | NA |
We now use our calculated Cq values from the dilutions steps (625, 1250, 2500, 5000, 10^{4}) to determine the efficiency from the calibration curve. According to the StepOne software, the amplification efficiency is approx. 93.9%. To confirm these results, the effcalc
function was employed to determine the coefficients of the calibration curve.
The following few lines are needed to calculate the amplication efficiency based on the previously calculated Cq values.
# Load the chipPCR package library(chipPCR) # Use the effcalc function from the chipPCR package to calculate the amplification # efficiency and store the results in the object res_efficiency res_efficiency <- effcalc(dilution[-c(1:3)], Cq_values[-c(1:3), ], logx=TRUE) # Use the %>% function from the magrittr package to plot the results (res_efficiency) # from the effcalc function res_efficiency %>% plot(., CI=TRUE, ylab="Cq (SDM)", main="Second Derivative Maximum Method")
The amplification efficiency estimated with the customized function was 94.7%, which is comparable to the value reported in the stepone_std.rdml
file.
# Combine the sample labels and the Cq values as calculate by the Second # Derivative Maximum Method (cpD2). sample_Cq <- data.frame(sample=c("ntc", "unk", 10000, 5000, 2500, 1250, 625), Cq_values) # Print table of all Cq values # Use the kable function from the knitr package to print a table knitr::kable(sample_Cq, caption="Cq values as calculate by the Second Derivative Maximum Method (cpD2). ntc, non-template control. unk, unknown sample. X1, X2 and X3 are the Cq values from a triplicate measurement.")
sample | X1 | X2 | X3 |
---|---|---|---|
ntc | NA | NA | NA |
unk | 28.38 | 28.30 | 28.42 |
10000 | 27.39 | 27.38 | 27.30 |
5000 | 26.21 | 26.20 | 26.24 |
2500 | 27.28 | 27.32 | 27.32 |
1250 | 28.38 | 28.41 | 28.33 |
625 | 29.42 | 29.47 | 29.54 |
knitr::kable(res_efficiency, caption="Analysis of the amplification efficiency. The table reports the concentration-depentent average Cq values from three replicates per dilution step. In addition, the standard deviation (SD) and the Coefficient of Variation (RSD [%]) are presented. The results indicate that the data basis for the calibration curve is valid.")
Concentration | Location (Mean) | Deviation (SD) | Coefficient of Variance (RSD [%]) |
---|---|---|---|
3.69897 | 26.21667 | 0.0208167 | 0.0007940 |
3.39794 | 27.30667 | 0.0230940 | 0.0008457 |
3.09691 | 28.37333 | 0.0404145 | 0.0014244 |
2.79588 | 29.47667 | 0.0602771 | 0.0020449 |
The Cq values (28.38, 28.3, 28.42) from the unknown sample unk
had an average Cq of 28.37 ± 0.06.