Skip to contents
  1. Description
  2. Installation
  3. Quick start
  4. Resources
  5. Getting help
  6. Citation information
  7. License

Description

HistogramZoo is a generalized framework for histogram segmentation and statistical characterization written in R. Histograms are represented as S3 Histogram or GenomicHistogram objects. For the characterization of genomic data on a linear coordinate system, such as the human reference genome, HistogramZoo provides several convenient data loaders to compute coverage of BED files and load bigWig files, and support both genomic and transcriptomic coordinate systems.

The characteristic function provided by HistogramZoo is segment_and_fit which is a wrapper for a sequence of operations which aim to segment complicated histograms and identify regions of enriched density and fit known statistical distributions to these regions. Using the Fine-to-Coarse segmentation algorithm described in Delon et al., 2006, segmentation occurs iteratively on a set of local optima. The user is provided with the option of filtering low entropy regions, fitting truncated distributions, fine-tuning the endpoints of uniformly-distributed regions and using several goodness-of-fit metrics to improve reported results.

Finally, HistogramZoo provides several convenient functions to summarize fits and visualize data.

overview plot

Installation

Using devtools in R:

library(devtools);
install_github('https://github.com/uclahs-cds/public-R-HistogramZoo'); 

From source: shell script git clone https://github.com/uclahs-cds/public-R-HistogramZoo.git R CMD INSTALL public-R-HistogramZoo

From CRAN (coming soon!)

Quick start

A basic example of applying HistogramZoo to a histogram of Gaussian data is provided below:


library(HistogramZoo);

set.seed(271828);

# Generating Data
gaussian_data <- rnorm(10000, mean = 50, sd = 15);
histogram_data <- observations_to_histogram(gaussian_data, histogram_bin_width=5);

# Segmentation and fitting distributions
results <- segment_and_fit(histogram_data, eps = 1);

# Summarizing results
results_table <- summarize_results(results);

# Plotting
create_coverageplot(results);

Results table

region_id segment_id start end interval_count interval_sizes interval_starts histogram_start histogram_end value metric dist dist_param1 dist_param2 dist_param1_name dist_param2_name
-21-102 1 19 79 1 61, 1, 9 20 0.9554924687 consensus norm 33.03241096 14.88105209 mean sd

Coverage plot coverage plot

Getting help

Looking for guidance or support with HistogramZoo? Look no further.

  • Check out our Discussions page!
  • Submit bugs 🐛, suggest new features 🌸 or see current work 🦾 at our Issues page.

Citation information

You have stumbled upon an unpublished software 🤫 🤫 🤫. We are currently preparing the manuscript for HistogramZoo. Please befriend us to learn more or check back later for updated citation information.

License

Authors: Helen Zhu, Stefan Eng, Paul C. Boutros ()

HistogramZoo is licensed under the GNU General Public License version 3.0. See the file LICENSE.md for the terms of the GNU GPL license.

HistogramZoo is a generalized framework for histogram segmentation and statistical characterization written in R.

Copyright (C) University of California Los Angeles (“Boutros Lab”) All rights reserved.

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.