Segmentation of histograms and distribution fitting

Usage

segment_and_fit(
  histogram_obj,
  optima_threshold = 0,
  optima_flat_endpoints = T,
  histogram_count_threshold = 0,
  eps = 1,
  remove_low_entropy = T,
  min_gap_size = 2,
  min_segment_size = 3,
  seed = NULL,
  max_uniform = FALSE,
  uniform_threshold = 0.75,
  uniform_stepsize = 5,
  uniform_max_sd = 0,
  truncated_models = FALSE,
  metric = c("jaccard", "intersection", "ks", "mse", "chisq"),
  distributions = c("norm", "unif", "gamma", "gamma_flip"),
  consensus_method = c("weighted_majority_vote", "rra"),
  metric_weights = rev(seq(1, 2, length.out = length(metric))),
  distribution_prioritization = distributions
)

Arguments

histogram_obj

a Histogram or HistogramList object

optima_threshold

threshold for local optima, i.e. a point is considered a local optimum only if it differs from its neighbouring optima by more than the permitted threshold, default 0

optima_flat_endpoints

logical; in regions of flat density, whether to return the endpoints (TRUE) or the midpoints (FALSE), default TRUE

histogram_count_threshold

a hard threshold to filter histogram density

eps

numeric (epsilon) hyperparameter used to fine-tune segmentation. See Delon et al., 2005

remove_low_entropy

logical, indicating whether to filter out low entropy regions

min_gap_size

integer, indicating the minimum gap size to be filtered

min_segment_size

integer, indicating the minimum segment size, default 3

seed

numeric seed

max_uniform

logical, whether to find a subsegment maximizing the fit of a uniform distribution

uniform_threshold

numeric, indicating the minimum proportion of the segment that the uniform subsegment must cover, default 0.75

uniform_stepsize

integer, indicating the stepsize (relative to the histogram bins) to take in the search for the uniform subsegment

uniform_max_sd

numeric, the number of standard deviations (of the computed metric distribution) that a subsegment may lie away from the optimal uniform fit; among the subsegments within this tolerance, the one with maximum length is chosen

truncated_models

logical, whether to fit truncated distributions

metric

a subset of mle, jaccard, intersection, ks, mse, chisq indicating metrics to use for fit optimization. Metrics should be ordered in descending priority. The first metric in the vector will be used to return the consensus model for the distribution determined through voting.
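To give a feel for what the histogram-based metrics compare, here is a minimal base R sketch, not the package's implementation. It assumes common textbook definitions of intersection, Jaccard, KS, MSE and chi-squared between a normalized histogram `p` and a fitted density `q` evaluated on the same bins; the package's exact definitions may differ. The moment-matched normal fit and the names `p`, `q` and `fit_metrics` are illustrative assumptions.

```r
# Counts taken from the Examples section; bins are indexed 1..n
counts <- c(0, 0, 1, 2, 3, 2, 1, 2, 3, 4, 5, 3, 1, 0)
bins <- seq_along(counts)
p <- counts / sum(counts)              # observed histogram, normalized to sum to 1

# Illustrative fit: a normal density with the histogram's weighted mean and sd
mu <- sum(bins * p)
sigma <- sqrt(sum((bins - mu)^2 * p))
q <- dnorm(bins, mean = mu, sd = sigma)
q <- q / sum(q)                        # normalized over the same bins

# Assumed definitions: intersection and jaccard are similarities (higher is better),
# ks, mse and chisq are distances (lower is better)
fit_metrics <- c(
  intersection = sum(pmin(p, q)),
  jaccard      = sum(pmin(p, q)) / sum(pmax(p, q)),
  ks           = max(abs(cumsum(p) - cumsum(q))),
  mse          = mean((p - q)^2),
  chisq        = sum((p - q)^2 / q)
)
print(round(fit_metrics, 4))
```

Because some metrics are similarities and others are distances, they cannot be averaged directly, which is one motivation for combining them through rankings rather than raw values.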

distributions

a subset of norm, unif, gamma, and gamma_flip indicating distributions to fit.

consensus_method

one of weighted_majority_vote and rra, the method used to determine the best-fitting distribution

metric_weights

required if consensus_method is weighted_majority_vote. Weights of each metric, to be multiplied by rankings. Weights should be in decreasing order. A higher weight gives the metric a higher priority.
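The default in the usage signature, `rev(seq(1, 2, length.out = length(metric)))`, can be evaluated in base R to see which weight each default metric receives. The snippet below only reproduces that expression; attaching the metric names is an illustrative addition.

```r
# Default metric vector from the usage signature
metric <- c("jaccard", "intersection", "ks", "mse", "chisq")

# Default weights: evenly spaced from 2 (first metric) down to 1 (last metric)
metric_weights <- rev(seq(1, 2, length.out = length(metric)))
names(metric_weights) <- metric        # names added here for readability only
print(metric_weights)
```

With the five default metrics, the weights are 2, 1.75, 1.5, 1.25 and 1, so the first metric listed carries twice the weight of the last.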

distribution_prioritization

if consensus_method is weighted_majority_vote, a list of ranked distributions used to break ties
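The interplay of metric weights, rankings and tie-breaking can be sketched in base R as follows. This is one possible reading of a rank-weighted vote, not the package's code: it assumes each metric ranks the distributions (1 = best), ranks are multiplied by the metric's weight, and the lowest weighted rank sum wins. The rank values, the `scores` and `winner` names, and the lowest-sum rule are all hypothetical.

```r
# Hypothetical per-metric ranks of each distribution (1 = best fit);
# rows are metrics, columns are distributions
ranks <- rbind(
  jaccard = c(norm = 1, unif = 2, gamma = 3),
  mse     = c(norm = 3, unif = 1, gamma = 2)
)
metric_weights <- c(jaccard = 2, mse = 1)   # decreasing order, as documented
distribution_prioritization <- c("unif", "norm", "gamma")

# Multiply each metric's ranks (one row) by that metric's weight, then sum per distribution
scores <- colSums(ranks * metric_weights[rownames(ranks)])

# Assumed rule: the lowest weighted rank sum wins; collect all tied distributions
best <- names(scores)[scores == min(scores)]

# Break ties with the first tied distribution in the prioritization order
winner <- distribution_prioritization[distribution_prioritization %in% best][1]
print(scores)
print(winner)
```

In this toy input, norm and unif both score 5 and gamma scores 8, so the tie is resolved in favour of unif because it comes first in `distribution_prioritization`.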

Value

a HistogramFit object representing the Histogram and results of the fit

Examples

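if (FALSE) {
# Build a Histogram object from a numeric vector
x <- Histogram(c(0, 0, 1, 2, 3, 2, 1, 2, 3, 4, 5, 3, 1, 0))
# Segment the histogram and fit distributions using the default settings
res <- segment_and_fit(x)
}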