Segmentation of histograms and distribution fitting

Usage

segment_and_fit(
  histogram_obj,
  optima_threshold = 0,
  optima_flat_endpoints = TRUE,
  histogram_count_threshold = 0,
  eps = 1,
  remove_low_entropy = TRUE,
  min_gap_size = 2,
  min_segment_size = 3,
  seed = NULL,
  max_uniform = FALSE,
  uniform_threshold = 0.75,
  uniform_stepsize = 5,
  uniform_max_sd = 0,
  truncated_models = FALSE,
  metric = c("jaccard", "intersection", "ks", "mse", "chisq"),
  distributions = c("norm", "unif", "gamma", "gamma_flip"),
  consensus_method = c("weighted_majority_vote", "rra"),
  metric_weights = rev(seq(1, 2, length.out = length(metric))),
  distribution_prioritization = distributions
)

Arguments

histogram_obj: a Histogram or HistogramList object
optima_threshold: threshold for local optima, i.e. a point is considered a local optimum only if it differs from its neighbouring optima by more than this threshold, default 0
optima_flat_endpoints: in regions of flat density, whether to return the endpoints or the midpoints
histogram_count_threshold: a hard threshold to filter histogram density
eps: numeric (epsilon) hyperparameter to fine-tune segmentation. See Delon et al., 2005
remove_low_entropy: logical, indicating whether to filter out low entropy regions
min_gap_size: integer, indicating the minimum gap size to be filtered
min_segment_size: integer, indicating the minimum segment size, default 3
seed: numeric seed
max_uniform: logical, whether to find a subsegment maximizing the fit of a uniform distribution
uniform_threshold: numeric, indicating the minimum proportion of the subsegment
uniform_stepsize: integer, indicating the stepsize (relative to the histogram bins) to take in the search for the uniform subsegment
uniform_max_sd: numeric, the number of standard deviations of the computed metric distribution away from the optimal uniform which has maximum length
truncated_models: logical, whether to fit truncated distributions
metric: a subset of mle, jaccard, intersection, ks, mse, chisq indicating metrics to use for fit optimization. Metrics should be ordered in descending priority. The first metric in the vector will be used to return the consensus model for the distribution determined through voting.
distributions: a subset of norm, unif, gamma, and gamma_flip indicating distributions to fit.
consensus_method: one of weighted_majority_vote and rra, the method used to determine the best-fitting distribution
metric_weights: required if consensus_method is weighted_majority_vote. Weights of each metric, to be multiplied by rankings. Weights should be in decreasing order. A higher weight gives the metric a higher priority.
distribution_prioritization: if consensus_method is weighted_majority_vote, a list of ranked distributions used to break ties
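
As a quick illustration of the default shown in Usage, the expression for metric_weights can be evaluated on its own in base R. The sketch below only reproduces that default expression; attaching the metric names is an addition for readability and is not something the function is known to do.

```r
# Default metric vector, copied from the Usage section
metric <- c("jaccard", "intersection", "ks", "mse", "chisq")

# Default weights: evenly spaced from 2 down to 1, one per metric,
# so the first (highest-priority) metric gets the largest weight
metric_weights <- rev(seq(1, 2, length.out = length(metric)))

# Naming the weights is for display only (an assumption for this sketch)
names(metric_weights) <- metric
print(metric_weights)
```

With five metrics the weights are 2, 1.75, 1.5, 1.25 and 1, which satisfies the requirement that weights be in decreasing order.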

Value

a HistogramFit object representing the Histogram and results of the fit

Examples

if (FALSE) {
x = Histogram(c(0, 0, 1, 2, 3, 2, 1, 2, 3, 4, 5, 3, 1, 0))
res = segment_and_fit(x)
}
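
A second, hedged sketch of a call that overrides some defaults. It uses only argument names listed in Usage; the particular values (two metrics, two distributions) are illustrative choices, not recommendations. As in the example above, the call is wrapped in if (FALSE) so it is not run.

```r
# Illustrative choices: metrics in descending priority, one weight per metric
metrics <- c("jaccard", "ks")
metric_weights <- c(2, 1)        # decreasing, same length as `metrics`
dists <- c("norm", "unif")       # also reused as the tie-breaking order

if (FALSE) {
  x <- Histogram(c(0, 0, 1, 2, 3, 2, 1, 2, 3, 4, 5, 3, 1, 0))
  res <- segment_and_fit(
    x,
    metric = metrics,
    distributions = dists,
    consensus_method = "weighted_majority_vote",
    metric_weights = metric_weights,
    distribution_prioritization = dists
  )
}
```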