Cluster analysis of a set of variables. Variables can be quantitative, qualitative or a mixture of both.
Solves multivariate least squares (MLS) problems subject to constraints on the coefficients, e.g., non-negativity, orthogonality, equality, inequality, monotonicity, unimodality, smoothness, etc. Includes flexible functions for solving MLS problems subject to user-specified equality and/or inequality constraints, as well as a wrapper function that implements 24 common constraint options. Also does k-fold or generalized cross-validation to tune constraint options for MLS problems. See ten Berge (1993, ISBN:9789066950832) for an overview of MLS problems, and see Goldfarb and Idnani (1983) <doi:10.1007/BF02591962> for a discussion of the underlying quadratic programming algorithm.
Implements the Bayesian calibration model described in Pratola and Chkrebtii (2018) <DOI:10.5705/ss.202016.0403> for stochastic and deterministic simulators. Additive and multiplicative discrepancy models are currently supported. See <http://www.matthewpratola.com/software> for more information and examples.
Calculate p-values and confidence intervals using cluster-adjusted t-statistics (based on Ibragimov and Muller (2010) <DOI:10.1198/jbes.2009.08046>), pairs cluster bootstrapped t-statistics, and wild cluster bootstrapped t-statistics (the latter two techniques based on Cameron, Gelbach, and Miller (2008) <DOI:10.1162/rest.90.3.414>). Procedures are included for use with GLM, ivreg, plm (pooling or fixed effects), and mlogit models.
Single-objective optimization using the Covariance Matrix Adaptation Evolution Strategy (CMA-ES).
Statistical and biological validation of clustering results. This package implements Dunn Index, Silhouette, Connectivity, Stability, BHI and BSI. Further information can be found in Brock, G. et al. (2008) <doi:10.18637/jss.v025.i04>.
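As a rough illustration of one of the internal validation measures named above, the following sketch computes a Dunn index in its textbook form: the smallest distance between points in different clusters divided by the largest within-cluster diameter. The function name `dunn_index` and the list-of-lists input layout are assumptions made for this example, not the package's interface, and the sketch assumes at least two clusters, one of which has two or more distinct points.

```python
import math
from itertools import combinations


def dunn_index(clusters):
    """Textbook Dunn index for a list of clusters (each a list of points).

    Assumes at least two clusters and at least one cluster containing
    two or more distinct points, so that both extrema are defined.
    """
    # Smallest distance between any two points in different clusters.
    min_separation = min(
        math.dist(a, b)
        for c1, c2 in combinations(clusters, 2)
        for a in c1
        for b in c2
    )
    # Largest distance between any two points in the same cluster.
    max_diameter = max(
        math.dist(a, b)
        for cluster in clusters
        for a, b in combinations(cluster, 2)
    )
    return min_separation / max_diameter


if __name__ == "__main__":
    example = [[(0, 0), (0, 1)], [(5, 0), (5, 1)]]
    print(dunn_index(example))
```

Larger values indicate compact, well-separated clusters. Other definitions of cluster distance and diameter exist, so values may differ between implementations.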
Calculates statistics that help analyze the clustering tendency of given data. The first version implements the Hopkins statistic. See Hopkins and Skellam (1954) <doi:10.1093/oxfordjournals.aob.a083391>.
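The Hopkins statistic can be sketched as follows, using one common formulation: compare nearest-neighbour distances from uniformly placed probe points to the data (u) with nearest-neighbour distances from sampled data points to the rest of the data (w), each raised to the power of the dimension d. The function names, the bounding-box sampling window, and the exponent are assumptions of this example, and the package may use a different variant.

```python
import math
import random


def nearest_distance(point, others):
    """Distance from point to its nearest neighbour in others."""
    return min(math.dist(point, other) for other in others)


def hopkins_statistic(data, m, seed=0):
    """Hopkins statistic H = sum(u^d) / (sum(u^d) + sum(w^d)).

    data: list of equal-length coordinate tuples; m: number of probes.
    Values near 0.5 suggest spatial randomness; values near 1 suggest
    clustering. The sampling window is the bounding box of the data.
    """
    rng = random.Random(seed)
    d = len(data[0])
    lows = [min(p[j] for p in data) for j in range(d)]
    highs = [max(p[j] for p in data) for j in range(d)]

    # w: nearest-neighbour distances for m sampled real points.
    w = []
    for i in rng.sample(range(len(data)), m):
        others = data[:i] + data[i + 1:]
        w.append(nearest_distance(data[i], others) ** d)

    # u: nearest-data-point distances for m uniform probe points.
    u = []
    for _ in range(m):
        probe = tuple(rng.uniform(lo, hi) for lo, hi in zip(lows, highs))
        u.append(nearest_distance(probe, data) ** d)

    return sum(u) / (sum(u) + sum(w))
```

For two tight, well-separated blobs the statistic approaches 1, because probe points usually land far from any data point while real points have very close neighbours.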
Functions to perform k-prototypes partitioning clustering for mixed variable-type data according to Z. Huang (1998): Extensions to the k-Means Algorithm for Clustering Large Data Sets with Categorical Variables, Data Mining and Knowledge Discovery 2, 283-304.
This is a function for validating microarray clusters via reproducibility, based on the paper referenced below.
Nonparametric rank-based tests (rank-sum tests and signed-rank tests) for clustered data, especially useful for clusters having informative cluster size and intra-cluster group size.
We developed the clusterGeneration package to provide functions for generating random clusters, generating random covariance/correlation matrices, calculating a separation index (data and population version) for pairs of clusters or cluster distributions, and drawing 1-D and 2-D projection plots to visualize clusters. The package also contains a function to generate random clusters based on factorial designs with factors such as degree of separation, number of clusters, number of variables, and number of noisy variables.
Integrative context-dependent clustering for heterogeneous biomedical datasets. Identifies local clustering structures in related datasets, and global clusters that exist across the datasets.
A haplotype is a combination of SNPs (single nucleotide polymorphisms) within a QTL (quantitative trait locus). clusterhap groups together all individuals of a population with the same haplotype. Each group contains individuals with the same allele at each SNP, whether or not data are missing. Thus, clusterhap groups individuals that, once imputed, have a non-zero probability of having the same alleles across the entire sequence of SNPs. Moreover, clusterhap calculates this probability from relative frequencies.
Provides functionality for the analysis of clustered data using the cluster bootstrap.
Central limit theorem experiments presented by data frames or plots. Functions include generating theoretical sample space, corresponding probability, and simulated results as well.
Provides several cluster-robust variance estimators (i.e., sandwich estimators) for ordinary and weighted least squares linear regression models, including the bias-reduced linearization estimator introduced by Bell and McCaffrey (2002) <https://www150.statcan.gc.ca/n1/pub/12-001-x/2002002/article/9058-eng.pdf> and developed further by Pustejovsky and Tipton (2017) <DOI:10.1080/07350015.2016.1247004>. The package includes functions for estimating the variance-covariance matrix and for testing single- and multiple-contrast hypotheses based on Wald test statistics. Tests of single regression coefficients use Satterthwaite or saddle-point corrections. Tests of multiple-contrast hypotheses use an approximation to Hotelling's T-squared distribution. Methods are provided for a variety of fitted models, including lm() and mlm objects, glm(), geeglm() (from package 'geepack'), ivreg() (from package 'AER'), ivreg() (from package 'ivreg' when estimated by ordinary least squares), plm() (from package 'plm'), gls() and lme() (from 'nlme'), lmer() (from 'lme4'), robu() (from 'robumeta'), and rma.uni() and rma.mv() (from 'metafor').
A collection of data sets for teaching cluster analysis.
Tests, utilities, and case studies for analyzing significance in clustered binary matched-pair data. The central function clust.bin.pair uses one of several tests to calculate a Chi-square statistic. Implemented are the tests of Eliasziw (1991) <doi:10.1002/sim.4780101211>, Obuchowski (1998) <doi:10.1002/(SICI)1097-0258(19980715)17:13%3C1495::AID-SIM863%3E3.0.CO;2-I>, Durkalski (2003) <doi:10.1002/sim.1438>, and Yang (2010) <doi:10.1002/bimj.201000035>, with McNemar (1947) <doi:10.1007/BF02295996> included for comparison. The utility functions nested.to.contingency and paired.to.contingency convert data between various useful formats. Thyroids and psychiatry are the canonical datasets from Obuchowski (1998) and Petryshen (1989) <doi:10.1016/0165-1781(89)90196-0>, respectively.
Defines the classes and functions used to simulate and to analyze data sets describing copy number variants and, optionally, sequencing mutations in order to detect clonal subsets. See Zucker et al. (2019) <doi:10.1093/bioinformatics/btz057>.
Calculates equations commonly used in clinical pharmacokinetics and clinical pharmacology, such as equations for dose individualization, compartmental pharmacokinetics, drug exposure, anthropometric calculations, clinical chemistry, and conversion of common clinical parameters. Where possible and relevant, it provides multiple published and peer-reviewed equations within the respective R function.
Interface to the Google Cloud Machine Learning Platform <https://cloud.google.com/ml-engine>, which provides cloud tools for training machine learning models.
Provides plots for comparing utilization data of compute systems.
A small subset of Unicode symbols that are useful when building command line applications. They fall back to alternatives on terminals that do not support Unicode. Many symbols were taken from the 'figures' 'npm' package (see <https://github.com/sindresorhus/figures>).
Tools for assessing data quality, performing exploratory analysis, and semi-automatic preprocessing of messy data with change tracking for integral dataset cleaning.
Simple utility functions to read from and write to the Windows, OS X, and X11 clipboards.
Functions for the quality control, homogenization and missing data filling of climatological series, and for obtaining climatological summaries and grids from the results. Also includes functions to display wind-roses, meteograms, Walter & Lieth diagrams, and more.
Functions for calculating clinical significance.
A robust constrained L1 minimization method for estimating a large sparse inverse covariance matrix (aka precision matrix), and recovering its support for building graphical models. The computation uses linear programming. The method was published in TT Cai, W Liu, X Luo (2011) <doi:10.1198/jasa.2011.tm10155>.
Tools to download the climatic data of the Spanish Meteorological Agency (AEMET) directly from R using their API and create scientific graphs (climate charts, trend analysis of climate time series, temperature and precipitation anomalies maps, warming stripes graphics, climatograms, etc.).
Small package to clean the R console and the R environment with the call of just one function.
Climate stability measures are not formalized in the literature and tools for generating stability metrics from existing data are nascent. This package provides tools for calculating climate stability from raster data encapsulating climate change as a series of time slices. The methods follow Owens and Guralnick <doi:10.17161/bi.v14i0.9786> Biodiversity Informatics.
A profile likelihood based method of estimation and inference on the correlation coefficient of bivariate data with different types of censoring and missingness.
Provides an expectation maximization (EM) algorithm to fit a mixture of continuous time Markov models for use with clickstream or other sequence type data. Gallaugher, M.P.B and McNicholas, P.D. (2018) <arXiv:1802.04849>.
Functions to append confidence intervals, prediction intervals, and other quantities of interest to data frames. All appended quantities are for the response variable, after conditioning on the model and covariates. This package has a data frame first syntax that allows for easy piping. Currently supported models include (log-) linear, (log-) linear mixed, generalized linear models, generalized linear mixed models, and accelerated failure time models.
Data cleaning functions for classes logical, factor, numeric, character, currency and Date to make data cleaning fast and easy. Relying on very few dependencies, it provides smart guessing, but with user options to override anything if needed.
A collection of clean 'R Markdown' HTML document templates using classy-looking classless CSS styles. These documents use a minimal set of dependencies but still look great, making them suitable for use as package vignettes or for sharing results via email.
Client for 'CKAN' API (<https://ckan.org/>). Includes interface to 'CKAN' 'APIs' for search, list, show for packages, organizations, and resources. In addition, provides an interface to the 'datastore' API.
Check your R code for some of the most common layout flaws. Many tried to teach us how to write less dreadful code, be it implicitly as B. W. Kernighan and D. M. Ritchie (1988) <ISBN:0-13-110362-8> in 'The C Programming Language' did, be it explicitly as R.C. Martin (2008) <ISBN:0-13-235088-2> in 'Clean Code: A Handbook of Agile Software Craftsmanship' did. So we should check our code for files too long or wide, functions with too many lines, too wide lines, too many arguments or too many levels of nesting. Note: This is not a static code analyzer like pylint or the like. Check out <https://cran.r-project.org/package=lintr> instead.
Functions to work with data frames to prepare data for further analysis. The functions for imputation, encoding, partitioning, and other manipulation can produce log files to keep track of the process.
Get descriptions of images from the Clarifai API. For more information, see <http://clarifai.com>. Clarifai uses a large deep learning cloud to come up with descriptive labels of the things in an image. It also reports how confident it is in each of the labels.
Given $p$-dimensional training data containing $d$ groups (the design space), a classification algorithm (classifier) predicts which group new data belongs to. Generally the input to these algorithms is high dimensional, and the boundaries between groups will be high dimensional and perhaps curvilinear or multi-faceted. This package implements methods for understanding the division of space between the groups.
Implementation of the Wilkinson and Ivany (2002) approach to paleoclimate analysis, applied to isotope data extracted from clams.
Performs 'classical' age-depth modelling of dated sediment deposits - prior to applying more sophisticated techniques such as Bayesian age-depth modelling. Any radiocarbon dated depths are calibrated. Age-depth models are constructed by sampling repeatedly from the dated levels, each time drawing age-depth curves. Model types include linear interpolation, linear or polynomial regression, and a range of splines. See Blaauw (2010) <doi:10.1016/j.quageo.2010.01.002>.
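The repeated-sampling idea described above can be sketched for the linear interpolation case. This is a simplified illustration under stated assumptions, not the package's implementation: each dated level is given a normal age distribution (mean and standard deviation) in place of a calibrated radiocarbon distribution, age reversals are not rejected, and the names `interpolate` and `age_depth_model` are hypothetical.

```python
import random
import statistics


def interpolate(depth, depths, ages):
    """Linearly interpolate an age at depth between dated levels."""
    levels = list(zip(depths, ages))
    for (d0, a0), (d1, a1) in zip(levels, levels[1:]):
        if d0 <= depth <= d1:
            return a0 + (a1 - a0) * (depth - d0) / (d1 - d0)
    raise ValueError("depth outside the dated range")


def age_depth_model(dated_levels, target_depth, iterations=1000, seed=0):
    """Monte Carlo age estimate at target_depth.

    dated_levels: list of (depth, mean_age, sd_age), sorted by strictly
    increasing depth. Normal age errors are an assumption of this sketch.
    Returns (median, lower, upper) with an approximate 95% range.
    """
    rng = random.Random(seed)
    depths = [level[0] for level in dated_levels]
    draws = []
    for _ in range(iterations):
        # Sample one age per dated level, then draw one age-depth curve.
        ages = [rng.gauss(mean, sd) for _, mean, sd in dated_levels]
        draws.append(interpolate(target_depth, depths, ages))
    draws.sort()
    lower = draws[int(0.025 * iterations)]
    upper = draws[int(0.975 * iterations) - 1]
    return statistics.median(draws), lower, upper


if __name__ == "__main__":
    levels = [(0, 0, 10), (100, 1000, 50)]
    print(age_depth_model(levels, target_depth=50))
```

The spread of the sampled curves at a given depth gives the age uncertainty there. Regression or spline models would replace the `interpolate` step while keeping the same sampling loop.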
An implementation of the 'Chrome DevTools Protocol', for controlling a headless Chrome web browser.
Implements 'Markowitz' Critical Line Algorithm ('CLA') for classical mean-variance portfolio optimization, see Markowitz (1952) <doi:10.2307/2975974>. Care has been taken for correctness in light of previous buggy implementations.
Circular Statistics, from "Topics in Circular Statistics" (2001) S. Rao Jammalamadaka and A. SenGupta, World Scientific.
Provide step by step guided tours of 'Shiny' applications.
Detects outliers in circular-circular regression models, modifies the models, and estimates model parameters.
Includes functions for the analysis of circular data using distributions based on Nonnegative Trigonometric Sums (NNTS). The package includes functions for calculation of densities and distributions, for the estimation of parameters, for plotting and more.
Circular layout is an efficient way to visualize huge amounts of information. This package provides an implementation of circular layout generation in R as well as an enhancement of available software. The flexibility of the package is based on the usage of low-level graphics functions such that self-defined high-level graphics can be easily implemented by users for specific purposes. Together with the seamless connection between the powerful computational and visual environment in R, it gives users more convenience and freedom to design figures for better understanding of complex patterns behind multidimensional data. The package is described in Gu et al. 2014 <doi:10.1093/bioinformatics/btu393>.
