Segmentation of histograms and distribution fitting
Usage
segment_and_fit(
histogram_obj,
optima_threshold = 0,
optima_flat_endpoints = T,
histogram_count_threshold = 0,
eps = 1,
remove_low_entropy = T,
min_gap_size = 2,
min_segment_size = 3,
seed = NULL,
max_uniform = FALSE,
uniform_threshold = 0.75,
uniform_stepsize = 5,
uniform_max_sd = 0,
truncated_models = FALSE,
metric = c("jaccard", "intersection", "ks", "mse", "chisq"),
distributions = c("norm", "unif", "gamma", "gamma_flip"),
consensus_method = c("weighted_majority_vote", "rra"),
metric_weights = rev(seq(1, 2, length.out = length(metric))),
distribution_prioritization = distributions
)
Arguments
- histogram_obj
a Histogram or HistogramList object
- optima_threshold
threshold for local optima, i.e. a point is only considered a local optimum if it differs from its neighbouring optima by more than the permitted threshold, default 0
- optima_flat_endpoints
logical, in regions of flat density, whether to return the endpoints (TRUE) or the midpoints (FALSE)
- histogram_count_threshold
a hard threshold to filter histogram density
- eps
numeric (epsilon) hyperparameter to fine-tune segmentation. See Delon et al., 2005
- remove_low_entropy
logical, indicating whether to filter out low entropy regions
- min_gap_size
integer, indicating the minimum gap size to be filtered
- min_segment_size
integer, indicating the minimum segment size, default 3
- seed
numeric seed
- max_uniform
logical, whether to find a subsegment maximizing the fit of a uniform distribution
- uniform_threshold
numeric, indicating the minimum proportion of the segment that the uniform subsegment must cover
- uniform_stepsize
integer, indicating the stepsize (relative to the histogram bins) to take in the search for the uniform subsegment
- uniform_max_sd
numeric, the number of standard deviations of the computed metric distribution away from the optimal uniform which has maximum length
- truncated_models
logical, whether to fit truncated distributions
- metric
a subset of "mle", "jaccard", "intersection", "ks", "mse", "chisq" indicating metrics to use for fit optimization. Metrics should be ordered in descending priority. The first metric in the vector will be used to return the consensus model for the distribution determined through voting.
- distributions
a subset of "norm", "gamma", and "unif" indicating distributions to fit.
- consensus_method
one of "weighted_majority_vote" and "rra", the method used to determine the best model
- metric_weights
required if consensus_method is "weighted_majority_vote". Weights of each metric, to be multiplied by rankings. Weights should be in decreasing order. A higher weight gives the metric a higher priority.
- distribution_prioritization
if consensus_method is "weighted_majority_vote", a list of ranked distributions, used to break ties
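
The interaction between metric, metric_weights and distribution_prioritization can be illustrated with a minimal base R sketch. The first part evaluates the default metric_weights expression from the Usage block. The second part is only an assumed reading of a weighted majority vote (each metric casts a vote worth its weight for its preferred distribution, and ties go to the earliest entry in distribution_prioritization); it is not taken from the package source, and the best_by_metric values are invented for illustration.

```r
# Defaults copied from the Usage block (highest-priority metric first).
metric <- c("jaccard", "intersection", "ks", "mse", "chisq")
distributions <- c("norm", "unif", "gamma", "gamma_flip")
distribution_prioritization <- distributions

# Default weights: evenly spaced from 2 down to 1, one per metric.
metric_weights <- rev(seq(1, 2, length.out = length(metric)))
names(metric_weights) <- metric
print(metric_weights)
# jaccard 2.00, intersection 1.75, ks 1.50, mse 1.25, chisq 1.00

# HYPOTHETICAL input: the distribution each metric ranks best for one segment.
best_by_metric <- c(jaccard = "gamma", intersection = "norm", ks = "norm",
                    mse = "gamma", chisq = "unif")

# ASSUMED voting rule: sum the weights of the metrics that prefer each distribution.
votes <- vapply(
  distributions,
  function(d) sum(metric_weights[best_by_metric == d]),
  numeric(1)
)
print(votes)
# norm 3.25, unif 1.00, gamma 3.25, gamma_flip 0.00

# Break ties with the order given in distribution_prioritization.
tied <- names(votes)[votes == max(votes)]
winner <- distribution_prioritization[distribution_prioritization %in% tied][1]
print(winner)
# "norm"
```

In this made-up case "norm" and "gamma" tie at 3.25, so the prioritization order decides the result. Reordering distribution_prioritization to put "gamma" first would flip the outcome.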