Skip to contents

Calculate actual overlap and corresponding p-value using the hypergeometric distribution as well as the expected overlap.

Usage

calculate.overlap.and.pvalue(list1, list2, total.size, lower.tail = TRUE, adjust = FALSE);

Arguments

list1

Vector representing a query list

list2

Vector representing a reference list being compared to

total.size

Numerical value representing the total size of reference sample pool

lower.tail

logical for calculating p-value; if TRUE (default), probability is P[X <= x], otherwise, P[X > x] unless adjust = TRUE. See phyper

adjust

logical for calculations for lower.tail = FALSE; if FALSE (default), P[X > x], otherwise, P[X >= x]

Details

Calculations are based on using list2 as the basis for comparison but if list1 and list2 come from the same sample pool, it doesn't matter how list1 or list2 are designated. Be careful when interpreting results for lower.tail = FALSE because probability is P[X > x] and not P[X >= x].

Value

Returns a three-element vector containing the actual overlap, expected overlap and p-value

Author

Paul C. Boutros and Denise Mak

See also

Examples

# SNP array example 1
# Total number of SNPs on array is 2000
# Identified 56 special SNPs in group1
# Identified 78 special SNPs in group2
# 10 interesting SNPs are common to both groups
# Question: Is 10 greater or less than what we would expect?

list1 <- paste("SNP", 1:56);
list2 <- paste("SNP", 47:124);
total.size <- 2000;

# Expected overlap is 2.2

# SNP array example 2
# Total number of SNPs on array is 2000
# 500 SNPs on array found on chromosome 1 
# Identified 41 special SNPs
# 5 interesting SNPs found on chromosome 1
# Question: Is 5 over or under-represented on chromosome 1 as compared to
# the proportion of chromosome 1 SNPs on the array?

list1 <- paste("SNP", 1:41);
list2 <- paste("SNP", 37:536);
total.size <- 2000;

# Expected overlap is 10.3

# Calculate if over-represented
calculate.overlap.and.pvalue(
  list1 = list1, 
  list2 = list2,
  total.size = total.size,
  lower.tail = FALSE,
  adjust = TRUE
  );
#> Warning: Calculating P[X >= x]
#> [1]  5.0000000 10.2500000  0.9874854
# Note: calculation is for P[X > x]

# Calculate if under-represented
calculate.overlap.and.pvalue(
  list1 = list1, 
  list2 = list2,
  total.size = total.size,
  lower.tail = TRUE,
  adjust = FALSE
  );
#> [1]  5.00000000 10.25000000  0.03507362
# Note: calculation is for P[X <= x]