This is a package for Non-Negative Linear Models (NNLM). It implements fast sequential coordinate descent algorithms for non-negative linear regression and non-negative matrix factorization (NMF). It supports mean square error and Kullback-Leibler divergence loss. Many other features are also implemented, including missing value imputation, domain knowledge integration, designable W and H matrices and multiple forms of regularization.
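As a rough illustration of what sequential coordinate descent for non-negative least squares looks like, the sketch below updates one coefficient at a time and clips it at zero. It is not the package's implementation; the function name `nnls_coordinate_descent` and the fixed sweep count are assumptions made for this example, and only the mean square error loss is covered.

```python
def nnls_coordinate_descent(A, b, n_sweeps=200):
    """Minimise ||A x - b||^2 subject to x >= 0 by cyclic coordinate descent.

    A is a list of rows, b a list of targets. Hypothetical helper for
    illustration only.
    """
    n_rows, n_cols = len(A), len(A[0])
    x = [0.0] * n_cols
    residual = list(b)  # residual = b - A x, and x starts at zero
    # Squared norm of each column, used as the per-coordinate step scale.
    col_sq = [sum(A[i][j] ** 2 for i in range(n_rows)) for j in range(n_cols)]
    for _ in range(n_sweeps):
        for j in range(n_cols):
            if col_sq[j] == 0.0:
                continue  # an all-zero column cannot change the fit
            # Inner product of column j with the current residual.
            grad = sum(A[i][j] * residual[i] for i in range(n_rows))
            # Exact one-dimensional minimiser, projected onto x_j >= 0.
            new_xj = max(0.0, x[j] + grad / col_sq[j])
            delta = new_xj - x[j]
            if delta != 0.0:
                for i in range(n_rows):
                    residual[i] -= delta * A[i][j]
                x[j] = new_xj
    return x
```

Keeping the residual up to date after each coordinate change is what makes the updates sequential: every coefficient sees the effect of the ones updated before it in the same sweep.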
Provides a work-alike to R's POSIXct class which implements 360- and 365-day calendars in addition to the Gregorian calendar.
A lightweight package that helps you track and visualize the progress of parallel versions of vectorized R functions (mc*apply). Parallelization (mc.cores > 1) works only on *nix (Linux, Unix such as macOS) systems, because Windows lacks the fork() functionality that is essential for mc*apply.
Provides a function for fitting cumulative link, adjacent category, forward and backward continuation ratio, and stereotype ordinal response models when the number of parameters exceeds the sample size, using the generalized monotone incremental forward stagewise method.
High-dimensional datasets that do not exhibit a clear intrinsic clustered structure pose a challenge to conventional clustering algorithms. For this reason, we developed an unsupervised framework that helps scientists to better subgroup their datasets based on visual cues; see Gao S, Mutter S, Casey A, Makinen V-P (2018) Numero: a statistical framework to define multivariable subgroups in complex population-based datasets, Int J Epidemiology, dyy113, <doi:10.1093/ije/dyy113>. The framework includes the necessary functions to construct a self-organizing map of the data, to evaluate the statistical significance of the observed data patterns, and to visualize the results.
Implementation of the two error variance estimation methods in high-dimensional linear models of Yu, Bien (2017) <arXiv:1712.02412>.
Allows distance-based spatial clustering of georeferenced data by implementing the City Clustering Algorithm (CCA). Multiple versions allow clustering for matrix, raster and single coordinates on a plane (Euclidean distance) or on a sphere (great-circle or orthodromic distance).
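The two distance notions mentioned above can be sketched as follows. This is only an illustration of Euclidean versus great-circle distance (here via the haversine formula), not the package's code; the function names and the mean Earth radius of 6371 km are assumptions.

```python
import math

def euclidean_distance(p, q):
    """Straight-line distance between two (x, y) points on a plane."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def great_circle_distance(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Haversine distance in km between two points given in degrees.

    radius_km is an assumed mean Earth radius.
    """
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # Guard against rounding pushing the square root slightly above 1.
    return 2 * radius_km * math.asin(min(1.0, math.sqrt(a)))
```

A distance-based clustering step would then treat two cells or points as neighbours when the chosen distance falls below a threshold.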
Provides a simple R interface to the OPUS Miner algorithm (implemented in C++) for finding the top-k productive, non-redundant itemsets from transaction data. The OPUS Miner algorithm uses the OPUS search algorithm to efficiently discover the key associations in transaction data, in the form of self-sufficient itemsets, using either leverage or lift. See <http://i.giwebb.com/index.php/research/association-discovery/> for more information in relation to the OPUS Miner algorithm.
Flexible optimizer with numerous input specifications for detailed parameterisation. Designed for complex loss functions with state and parameter space constraints. Visualization tools for validation and analysis of the convergence are included.
Wraps the Boost odeint library for integration of differential equations.
Get or set UNIX priority (niceness) of running R process.
Model fitting procedures for regression with network cohesion effects, when a network connecting sample individuals is available in a regression problem. In the future, other commonly used statistical models will be added, such as the Gaussian graphical model.
Functions to patch specials in .dvi files, or entries in .synctex files. Works with "concordance=TRUE" in Sweave or knitr to link sources to previews.
Simple algorithms for circle packing.
A nonparametric test for change-point detection in the dependence between the components of multivariate data, with or without (multiple) changes in the marginal distributions. The full details, justification and examples are published in Rohmer (2016) <doi:10.1016/j.spl.2016.06.026>.
These routines create multiple imputations of missing at random categorical data, and create multiply imputed synthesis of categorical data, with or without structural zeros. Imputations and syntheses are based on Dirichlet process mixtures of multinomial distributions, which is a non-parametric Bayesian modeling approach that allows for flexible joint modeling.
Implements methods for centrality related analyses of networks. While the package includes the possibility to build more than 20 indices, its main focus lies on index-free assessment of centrality via partial rankings obtained by neighborhood-inclusion or positional dominance. These partial rankings can be analyzed with different methods, including probabilistic methods like computing expected node ranks and relative rank probabilities (how likely is it that a node is more central than another?). The methodology is described in depth in the vignettes and in Schoch (2018) <doi:10.1016/j.socnet.2017.12.003>.
This package contains functions to carry out high throughput data analysis and to conduct data set comparisons. Similarity matrices from high throughput phenotypic data containing uninformative (e.g. wild type) or missing data can be calculated to report similarity of response. A suite of graph comparisons using an adjacency or correlation matrix format is included to facilitate quick network analysis.
An efficient unified nonconvex penalized estimation algorithm for Gaussian (linear), binomial Logit (logistic), Poisson, multinomial Logit, and Cox proportional hazard regression models. The unified algorithm is implemented based on the convex concave procedure and the algorithm can be applied to most of the existing nonconvex penalties. The algorithm also supports convex penalty: least absolute shrinkage and selection operator (LASSO). Supported nonconvex penalties include smoothly clipped absolute deviation (SCAD), minimax concave penalty (MCP), truncated LASSO penalty (TLP), clipped LASSO (CLASSO), sparse ridge (SRIDGE), modified bridge (MBRIDGE) and modified log (MLOG). For high-dimensional data (data set with many variables), the algorithm selects relevant variables producing a parsimonious regression model. Kim, D., Lee, S. and Kwon, S. (2018) <arXiv:1811.05061>, Lee, S., Kwon, S. and Kim, Y. (2016) <doi:10.1016/j.csda.2015.08.019>, Kwon, S., Lee, S. and Kim, Y. (2015) <doi:10.1016/j.csda.2015.07.001>. (This research is funded by Julian Virtue Professorship from Center for Applied Research at Pepperdine Graziadio Business School and the National Research Foundation of Korea.)
Implements Python-style zip for R. Is a more flexible version of cbind.
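For readers unfamiliar with the Python behaviour being referred to, here is a short sketch of the built-in `zip`. It shows only Python's semantics; the variable names are made up for this example and nothing here describes the R package's interface.

```python
# zip pairs up elements position by position across its arguments.
pairs = list(zip([1, 2, 3], ["a", "b", "c"]))

# With inputs of unequal length, zip stops at the shortest one.
truncated = list(zip([1, 2, 3], [10, 20]))

# Any number of iterables of any element type can be combined.
triples = list(zip("xy", [True, False], [0.5, 1.5]))
```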
Analyze count time series with excess zeros. Two types of statistical models are supported: Markov regression by Yang et al. (2013) <doi:10.1016/j.stamet.2013.02.001> and state-space models by Yang et al. (2015) <doi:10.1177/1471082X14535530>. They are also known as observation-driven and parameter-driven models respectively in the time series literature. The functions used for Markov regression or observation-driven models can also be used to fit ordinary regression models with independent data under the zero-inflated Poisson (ZIP) or zero-inflated negative binomial (ZINB) assumption. Besides, the package contains some miscellaneous functions to compute density, distribution, quantile, and generate random numbers from ZIP and ZINB distributions.
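To make the zero-inflated Poisson (ZIP) assumption concrete, the sketch below evaluates the standard ZIP probability mass function: a point mass at zero mixed with an ordinary Poisson distribution. The function name `zip_pmf` and its parameter names are assumptions for this example and do not reflect the package's own density function.

```python
import math

def zip_pmf(k, lam, pi_zero):
    """P(Y = k) under a zero-inflated Poisson model.

    lam     -- Poisson mean of the count component
    pi_zero -- probability of a structural (excess) zero
    """
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    if k == 0:
        # Zeros arise either structurally or from the Poisson component.
        return pi_zero + (1.0 - pi_zero) * poisson
    return (1.0 - pi_zero) * poisson
```

Setting `pi_zero` to 0 recovers the ordinary Poisson distribution, which is why the excess-zero probability can be read as the departure from a plain Poisson model.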
This collection of data exploration tools was developed at Yale University for the graphical exploration of complex multivariate data; barcode and gpairs now have their own packages. The new big.read.table() provided here may be useful for large files when only a subset is needed.
Provides an alternative canonical correlation/redundancy analysis function, with associated print, plot, and summary methods. A method for generating helio plots is also included.
Can be used for coloring output in terminals. It was developed for the standard Ubuntu terminal but should be compatible with any terminal using xterm or ANSI escape sequences. If run in Windows, RStudio, or any other platform not supporting such escape sequences, it gracefully passes on any output without modifying it.
Shows the relationship between an independent and dependent variable through Weight of Evidence and Information Value.
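A minimal sketch of how Weight of Evidence (WoE) and Information Value (IV) are commonly computed from binned counts is given below. Sign conventions differ between sources; this example assumes WoE = ln(share of non-events / share of events) per bin, and the function name `woe_and_iv` is hypothetical rather than taken from the package.

```python
import math

def woe_and_iv(non_events, events):
    """Return per-bin WoE values and the total IV.

    non_events, events -- counts per bin, in the same bin order.
    Assumes every bin has at least one event and one non-event,
    since a zero count would make the logarithm undefined.
    """
    total_non = sum(non_events)
    total_ev = sum(events)
    woe = []
    iv = 0.0
    for n, e in zip(non_events, events):
        p_non = n / total_non  # share of all non-events in this bin
        p_ev = e / total_ev    # share of all events in this bin
        w = math.log(p_non / p_ev)
        woe.append(w)
        iv += (p_non - p_ev) * w
    return woe, iv
```

Each term of the IV sum is non-negative, so a variable whose bins all have the same event and non-event shares gets an IV of zero.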
This is a set of minimization tools (maximum likelihood estimation and least squares fitting) to solve examples in Johan Gabrielsson and Dan Weiner's book "Pharmacokinetic and Pharmacodynamic Data Analysis - Concepts and Applications" 5th ed. (ISBN:9198299107). Examples include linear and nonlinear compartmental model, turn-over model, single or multiple dosing bolus/infusion/oral models, allometry, toxicokinetics, reversible metabolism, in-vitro/in-vivo extrapolation, enterohepatic circulation, metabolite modeling, Emax model, inhibitory model, tolerance model, oscillating response model, enantiomer interaction model, effect compartment model, drug-drug interaction model, receptor occupancy model, and rebound phenomena model.
Functions for easily creating interactive web pages using 'R Markdown' that students can use in self-guided learning.
Installs specified versions of R packages hosted on CRAN and provides functions to list available versions and the versions of currently installed packages. These tools can be used to help make R projects and packages more reproducible. 'versions' fits in the narrow gap between the 'devtools' install_version() function and the 'checkpoint' package. devtools::install_version() installs a stated package version from source files stored on the CRAN archives. However, CRAN does not store binary versions of packages, so Windows users need to have RTools installed, and Windows and OSX users get longer installation times. 'checkpoint' uses the Revolution Analytics MRAN server to install packages (from source or binary) as they were available on a given date. It also provides a helpful interface to detect the packages in use in a directory and install all of those packages for a given date. However, 'checkpoint' doesn't provide install.packages-like functionality, which is what 'versions' aims to do by querying MRAN. As MRAN only goes back to 2014-09-17, 'versions' can't install packages archived before this date.
The base 'sets' tools follow the algebraic definition that each element of a set must be unique. Since it's often helpful to compare all elements of two vectors, this toolset treats every element as unique for counting purposes. For ease of use, all functions in vecsets have an argument 'multiple' which, when set to FALSE, reverts them to the base::set tools functionality.
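The difference between set semantics and "every element counts" semantics can be sketched with multisets. The example below uses `collections.Counter` from the Python standard library; the function names are hypothetical and only mimic the idea of a `multiple` switch, not the package's actual functions.

```python
from collections import Counter

def vector_intersect(a, b, multiple=True):
    """Intersection that keeps repeated elements when multiple is True."""
    if not multiple:
        return sorted(set(a) & set(b))  # classic set behaviour
    # Counter & Counter keeps each element min(count_a, count_b) times.
    return sorted((Counter(a) & Counter(b)).elements())

def vector_setdiff(a, b, multiple=True):
    """Difference that removes one copy from a per matching copy in b."""
    if not multiple:
        return sorted(set(a) - set(b))
    # Counter - Counter subtracts counts and drops non-positive results.
    return sorted((Counter(a) - Counter(b)).elements())
```

For example, intersecting `[1, 1, 2, 3]` with `[1, 1, 1, 2]` keeps both copies of 1 in multiset mode, whereas the set version reports each value once.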
Forms a query to submit for US Treasury yield curve data, posting this query to the US Treasury web site's data feed service. By default the download includes yield data for 12 products from January 1, 1990, some of which are NA during this span. The caller can pass parameters to limit the query to a certain year or year and month, but the full download is not especially large. The downloaded data from the service is in XML format. The package's main function transforms that XML data into a numeric data frame with treasury product items (constant maturity yields for 12 kinds of bills, notes, and bonds) as columns and dates as row names. The function returns a list which includes an item for this data frame as well as query-related values for reference and the update date from the service.
Calling zm() with any active plot starts an interactive session to zoom and navigate that plot. The development version, as well as binary releases, can be found at https://github.com/cbarbu/R-package-zoom
Inference procedures accommodate a flexible range of hazard ratio patterns with a two-sample semi-parametric model. This model contains the proportional hazards model and the proportional odds model as sub-models, and accommodates non-proportional hazards situations to the extreme of having crossing hazards and crossing survivor functions. Overall, this package has four major functions: 1) the parameter estimation, namely short-term and long-term hazard ratio parameters; 2) 95 percent and 90 percent point-wise confidence intervals and simultaneous confidence bands for the hazard ratio function; 3) p-value of the adaptive weighted log-rank test; 4) p-values of two lack-of-fit tests for the model. See the included "read_me_first.pdf" for brief instructions. In this version (1.1), there is no need to sort the data before applying this package.
Provides data from the United Nations' World Population Prospects 2019.
Provides data from the United Nations' World Population Prospects 2015.
Implements an automated binning of numeric variables and factors with respect to a dichotomous target variable. Two approaches are provided: an implementation of fine and coarse classing that merges granular classes and levels step by step, and a tree-like approach that iteratively segments the initial bins via binary splits. Both procedures merge or split bins, respectively, based on similar weight of evidence (WOE) values and stop via an information value (IV) based criterion. The package can be used with single variables or an entire data frame. It provides flexible tools for exploring different binning solutions and for deploying them to (new) data.
Provides insight into how the best hand for a poker game changes based on the game dealt, players who stay in until the showdown and wildcards added to the base game. At this time the package does not support player tactics, so draw poker variants are not included.
The German Wikibook "GNU R" introduces R to new users. This package is a collection of functions and data used in the German Wikibook "GNU R".
Builds complex plots, heatmaps in particular, using natural semantics. Bigger plots can be assembled using directives such as 'LeftOf', 'RightOf', 'TopOf', and 'Beneath' and more. Other features include clustering, dendrograms and integration with 'ggplot2' generated grid objects. This package is particularly designed for bioinformaticians to assemble complex plots for publication.
Implementation of Johansen's general formulation of Welch-James's statistic with Approximate Degrees of Freedom, which makes it suitable for testing any linear hypothesis concerning cell means in univariate and multivariate mixed model designs when the data exhibit non-normality and non-homogeneous variance. Some improvements, namely trimmed means and Winsorized variances, and bootstrapping for calculating an empirical critical value, have been added to the classical formulation. The code departs from a previous SAS implementation by L.M. Lix and H.J. Keselman, available at <http://supp.apa.org/psycarticles/supplemental/met_13_2_110/SAS_Program.pdf> and published in Keselman, H.J., Wilcox, R.R., and Lix, L.M. (2003) <DOI:10.1111/1469-8986.00060>.
Provides a convenient interface for constructing plots to visualize the fit of regression models arising from a wide variety of models in R ('lm', 'glm', 'coxph', 'rlm', 'gam', 'locfit', 'lmer', 'randomForest', etc.)
Creates the vertex similarity matrix of an undirected graph based on the method stated by E. A. Leicht, Petter Holme, and M. E. J. Newman in their paper <DOI:10.1103/PhysRevE.73.026120>.
The 'Vega-Lite' 'JavaScript' framework provides a higher-level grammar for visual analysis, akin to 'ggplot' or 'Tableau', that generates complete 'Vega' specifications. Functions exist which enable building a valid 'spec' from scratch or importing a previously created 'spec' file. Functions also exist to export 'spec' files and to generate code which will enable plots to be embedded in properly configured web pages. The default behavior is to generate an 'htmlwidget'.
Abstract descriptions of (yet) unobserved variables.
Tool for easy and efficient discretization of continuous and categorical data. The package calculates the optimal binning of a given explanatory variable with respect to a user-specified target variable. The purpose is to assign a unique Weight-of-Evidence value to each of the calculated binpoints in order to recode the original variable. The package allows users to impose certain restrictions on the functional form of the resulting binning while maximizing the overall information value in the original data. The package is well suited for logistic scoring models where input variables may be subject to restrictions such as linearity by e.g. regulatory authorities. An excellent source describing in detail the development of scorecards and the role of Weight-of-Evidence coding in credit scoring is (Siddiqi 2006, ISBN: 978-0-471-75451-0). The package utilizes the discrete nature of decision trees and Isotonic Regression to accommodate the trade-off between flexible functional forms and maximum information value.
Provides a number of functions to facilitate extracting information in 'YAML' fragments from one or multiple files, optionally structuring the information in a 'data.tree'. 'YAML' (recursive acronym for "YAML ain't Markup Language") is a convention for specifying structured data in a format that is both machine- and human-readable. 'YAML' therefore lends itself well for embedding (meta)data in plain text files, such as Markdown files. This principle is implemented in 'yum' with minimal dependencies (i.e. only the 'yaml' package is required; the 'data.tree' package can be used to enable additional functionality).
A collection of string functions designed for writing compact and expressive R code. 'yasp' (Yet Another String Package) is simple, fast, dependency-free, and written in pure R. The package provides: a coherent set of abbreviations for paste() from package 'base' with a variety of defaults, such as p() for "paste" and pcc() for "paste and collapse with commas"; wrap(), bracket(), and others for wrapping a string in flanking characters; unwrap() for removing pairs of characters (at any position in a string); and sentence() for cleaning whitespace around punctuation and capitalization appropriate for prose sentences.
A parser and a writer for 'WEKA' Attribute-Relation File Format <https://waikato.github.io/weka-wiki/arff_stable/> in pure R, with no dependencies. As opposed to other R implementations, this package can read standard (dense) as well as sparse files, i.e. those where each row contains only nonzero components. Unlike 'RWeka', 'yarr' does not require any 'Java' installation or depend on external software. This implementation is generalized from those in packages 'mldr' and 'mldr.datasets'.
Provides data from the United Nations' World Population Prospects 2017.
Supplies permutation-test alternatives to traditional hypothesis-test procedures such as two-sample tests for means, medians, and standard deviations; correlation tests; tests for homogeneity and independence; and more. Suitable for general audiences, including individual and group users, introductory statistics courses, and more advanced statistics courses that desire an introduction to permutation tests.
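As an illustration of the idea behind a two-sample permutation test for means, the sketch below enumerates every way of relabelling the pooled observations and counts how often the relabelled difference is at least as extreme as the observed one. It is an exact test suitable only for small samples, and the function name `exact_permutation_test_means` is an assumption; the package's own functions and interface may differ.

```python
from itertools import combinations

def exact_permutation_test_means(x, y):
    """Two-sided exact permutation p-value for a difference in means.

    Enumerates all splits of the pooled data into groups of the
    original sizes, so it is only practical for small samples.
    """
    pooled = list(x) + list(y)
    n_x, n_y = len(x), len(y)
    total = sum(pooled)
    observed = abs(sum(x) / n_x - sum(y) / n_y)
    extreme = 0
    count = 0
    for idx in combinations(range(len(pooled)), n_x):
        s_x = sum(pooled[i] for i in idx)
        diff = abs(s_x / n_x - (total - s_x) / n_y)
        # Small tolerance so ties with the observed value are counted.
        if diff >= observed - 1e-12:
            extreme += 1
        count += 1
    return extreme / count
```

With larger samples, full enumeration becomes infeasible and a random subset of relabellings is typically used instead.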
Creates messages containing random facts from the Wikipedia homepage. Intended to keep users interested during long waiting periods.
