Assign patients to four prostate cancer DNA methylation subtypes

Usage

estimate.subtypes(
  methy.data,
  subtype.model = "RF",
  prop.missing.cutoff = 0.3,
  pamr.impute.using.all.cpgs = TRUE,
  seed = 123
)

Arguments

methy.data

A data.frame with patients as rows (rownames give patient IDs) and CpGs as columns (column names give CpG IDs).
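As a rough illustration of that layout, the sketch below builds a tiny data.frame in base R. The patient IDs and CpG IDs are hypothetical placeholders, and the values are random numbers standing in for methylation measurements; real input would need the CpGs required by the chosen subtype model.

```r
# Hypothetical IDs and random values, for illustrating the layout only
set.seed(1)
methy.data <- data.frame(
  cg00000001 = runif(3),
  cg00000002 = runif(3),
  cg00000003 = runif(3),
  row.names = c("patient.1", "patient.2", "patient.3")
)

# Rows are patients, columns are CpGs
rownames(methy.data)
colnames(methy.data)
```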

subtype.model

Which subtype model to use: 'PAMR' or 'RF' (random forest). Although 'RF' is slower, we recommend it for its higher accuracy and built-in imputation of missing values. Furthermore, if any of the required CpGs are completely missing, you must use 'RF'.

prop.missing.cutoff

The maximum proportion of missing values allowed for each required CpG.
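To make the cutoff concrete, here is a minimal base R sketch of how a per-CpG missing proportion could be computed and compared against it. This is not the package's internal code: the toy data, the CpG IDs, and the strict `>` comparison are assumptions for illustration.

```r
# Toy data with hypothetical CpG IDs; NA marks a missing measurement
methy.data <- data.frame(
  cg00000001 = c(0.12, NA, 0.80, 0.45),
  cg00000002 = c(NA, NA, 0.33, 0.91),
  cg00000003 = c(0.05, 0.61, 0.77, 0.29),
  row.names = paste0("patient.", 1:4)
)

# Proportion of patients with a missing value, per CpG
prop.missing <- colMeans(is.na(methy.data))

# Assumed rule: a CpG has high missingness if it exceeds the cutoff
prop.missing.cutoff <- 0.3
high.missing.cpgs <- names(prop.missing)[prop.missing > prop.missing.cutoff]
high.missing.cpgs
```

In this toy example only cg00000002, which is missing in half of the patients, exceeds the default cutoff of 0.3.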

pamr.impute.using.all.cpgs

If using subtype.model = 'PAMR', whether imputation should use all CpGs in methy.data (TRUE) or only the CpGs required by subtype.model.pamr (FALSE). When TRUE, imputation is slower and uses more memory, but should be more accurate.

seed

Integer seed used for imputation.

Value

  • subtypes: data.frame with the estimated subtypes and sample IDs (rownames of methy.data)

  • validation: output from validate.subtype.model.cpgs to check if methy.data contains the required CpGs and whether any CpGs have high missingness.

Examples

### example CpG data
data('example.data');

subtypes <- estimate.subtypes(example.data);

# estimated subtypes
head(subtypes$subtypes);
#>              subtype
#> TCGA-CH-5739    MS-3
#> TCGA-HC-7079    MS-2

# validation results:
# length(subtypes$validation$required.cpgs)
# length(subtypes$validation$required.cpgs.with.high.missing)
# length(subtypes$validation$missing.cpgs)