r-ordpens
|
public |
Selection and/or smoothing of ordinally scaled independent variables using a group lasso or generalized ridge penalty.
|
2025-09-22 |
r-lsa
|
public |
The basic idea of latent semantic analysis (LSA) is that texts have a higher-order (=latent semantic) structure which, however, is obscured by word usage (e.g. through the use of synonyms or polysemy). By using conceptual indices that are derived statistically via a truncated singular value decomposition (a two-mode factor analysis) over a given document-term matrix, this variability problem can be overcome.
|
2025-09-22 |
r-uptasticsearch
|
public |
'Elasticsearch' is an open-source, distributed, document-based datastore (<https://www.elastic.co/products/elasticsearch>). It provides an 'HTTP' 'API' for querying the database and extracting datasets, but that 'API' was not designed for common data science workflows like pulling large batches of records and normalizing those documents into a data frame that can be used as a training dataset for statistical models. 'uptasticsearch' provides an interface for 'Elasticsearch' that is explicitly designed to make these data science workflows easy and fun.
|
2025-09-22 |
r-gggrid
|
public |
An extension of 'ggplot2' that makes it easy to add raw 'grid' output, such as customised annotations, to a 'ggplot2' plot.
|
2025-09-22 |
changeforest
|
public |
Classifier-based non-parametric change point detection
|
2025-09-22 |
napari-convpaint
|
public |
A plugin for segmentation by pixel classification using convolutional feature extraction
|
2025-09-22 |
r-ddpcr
|
public |
An interface to explore, analyze, and visualize droplet digital PCR (ddPCR) data in R. This is the first non-proprietary software for analyzing two-channel ddPCR data. An interactive tool was also created and is available online to facilitate this analysis for anyone who is not comfortable with using R.
|
2025-09-22 |
r-gfa
|
public |
Factor analysis implementation for multiple data sources, i.e., for groups of variables. The whole data analysis pipeline is provided, including functions and recommendations for data normalization and model definition, as well as missing value prediction and model visualization. The group factor analysis (GFA) model is inferred with Gibbs sampling; it was originally presented by Virtanen et al. (2012) and extended in Klami et al. (2015) <DOI:10.1109/TNNLS.2014.2376974> and Bunte et al. (2016) <DOI:10.1093/bioinformatics/btw207>. For details, see the citation info.
|
2025-09-22 |
r-testit
|
public |
Provides two convenience functions assert() and test_pkg() to facilitate testing R packages.
|
2025-09-22 |
multiaddr
|
public |
Python implementation of jbenet's multiaddr
|
2025-09-22 |
r-logistf
|
public |
Fit a logistic regression model using Firth's bias reduction method, equivalent to penalization of the log-likelihood by the Jeffreys prior. Confidence intervals for regression coefficients can be computed by penalized profile likelihood. Firth's method was proposed as an ideal solution to the problem of separation in logistic regression; see Heinze and Schemper (2002) <doi:10.1002/sim.1047>. If needed, the bias reduction can be turned off so that ordinary maximum likelihood logistic regression is obtained. Two new modifications of Firth's method, FLIC and FLAC, lead to unbiased predictions and are now available in the package as well; see Puhr et al. (2017) <doi:10.1002/sim.7273>.
|
2025-09-22 |
sentencepiece
|
public |
Unsupervised text tokenizer for Neural Network-based text generation.
|
2025-09-22 |
sentencepiece-python
|
public |
Unsupervised text tokenizer for Neural Network-based text generation.
|
2025-09-22 |
libsentencepiece
|
public |
Unsupervised text tokenizer for Neural Network-based text generation.
|
2025-09-22 |
sentencepiece-spm
|
public |
Unsupervised text tokenizer for Neural Network-based text generation.
|
2025-09-22 |
r-ldbounds
|
public |
Computations related to group sequential boundaries. Includes calculation of bounds using the Lan-DeMets alpha spending function approach. Based on the FORTRAN program ld98 implemented by Reboussin et al. (2000) <doi:10.1016/s0197-2456(00)00057-x>.
|
2025-09-22 |
pydoe3
|
public |
Design of experiments for Python
|
2025-09-22 |
qgv
|
public |
Interactive Qt Graphviz display
|
2025-09-22 |
r-weatherdata
|
public |
Functions that help in fetching weather data from websites. Given a location and a date range, these functions help fetch weather data (temperature, pressure etc.) for any weather related analysis.
|
2025-09-22 |
pydiverse-common
|
public |
Common functionality shared between pydiverse libraries
|
2025-09-22 |
r-multimode
|
public |
Different examples and methods for testing (including different proposals described in Ameijeiras-Alonso et al., 2018 <DOI:10.1007/s11749-018-0611-5>) and exploring (including the mode tree, mode forest and SiZer) the number of modes using nonparametric techniques.
|
2025-09-22 |
azure-mgmt-datamigration
|
public |
Microsoft Azure Data Migration Client Library for Python
|
2025-09-22 |
sendgrid
|
public |
SendGrid library for Python
|
2025-09-22 |
r-eikosograms
|
public |
An eikosogram (ancient Greek for probability picture) divides the unit square into rectangular regions whose areas, sides, and widths, represent various probabilities associated with the values of one or more categorical variates. Rectangle areas are joint probabilities, widths are always marginal (though possibly joint margins, i.e. marginal joint distributions of two or more variates), and heights of rectangles are always conditional probabilities. Eikosograms embed the rules of probability and are useful for introducing elementary probability theory, including axioms, marginal, conditional, and joint probabilities, and their relationships (including Bayes theorem as a completely trivial consequence). They are markedly superior to Venn diagrams for this purpose, especially in distinguishing probabilistic independence, mutually exclusive events, coincident events, and associations. They also are useful for identifying and understanding conditional independence structure. As data analysis tools, eikosograms display categorical data in a manner similar to Mosaic plots, especially when only two variates are involved (the only case in which they are essentially identical, though eikosograms purposely disallow spacing between rectangles). Unlike Mosaic plots, eikosograms do not alternate axes as each new categorical variate (beyond two) is introduced. Instead, only one categorical variate, designated the "response", presents on the vertical axis and all others, designated the "conditioning" variates, appear on the horizontal. In this way, conditional probability appears only as height and marginal probabilities as widths. The eikosogram is therefore much better suited to a response model analysis (e.g. logistic model) than is a Mosaic plot. Mosaic plots are better suited to log-linear style modelling as in discrete multivariate analysis. Of course, eikosograms are also suited to discrete multivariate analysis with each variate in turn appearing as the response. 
This makes them better suited than Mosaic plots to discrete graphical models based on conditional independence graphs (i.e. "Bayesian Networks" or "BayesNets"). The eikosogram and its superiority to Venn diagrams in teaching probability are described in W.H. Cherry and R.W. Oldford (2003) <https://math.uwaterloo.ca/~rwoldfor/papers/eikosograms/paper.pdf>; its value in exploring conditional independence structure and its relation to graphical and log-linear models are described in R.W. Oldford (2003) <https://math.uwaterloo.ca/~rwoldfor/papers/eikosograms/independence/paper.pdf>; and a number of problems, puzzles, and paradoxes that are easily explained with eikosograms are given in R.W. Oldford (2003) <https://math.uwaterloo.ca/~rwoldfor/papers/eikosograms/examples/paper.pdf>.
|
2025-09-22 |
r-catencoders
|
public |
Contains some commonly used categorical variable encoders, such as 'LabelEncoder' and 'OneHotEncoder'. Inspired by the encoders implemented in Python's 'sklearn.preprocessing' package (see <http://scikit-learn.org/stable/modules/preprocessing.html>).
|
2025-09-22 |
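For readers unfamiliar with the two encoders named in the r-catencoders entry, the idea can be sketched in a few lines of plain Python. This is only an illustration of label encoding and one-hot encoding in general: the helper names `fit_label_encoder`, `label_encode`, and `one_hot_encode` are hypothetical, and the sorted-order assignment of codes is an assumption, not the documented behaviour of 'r-catencoders' or 'sklearn.preprocessing'.

```python
# Illustrative sketch only: these helpers are hypothetical and are not the
# API of r-catencoders or scikit-learn.

def fit_label_encoder(values):
    """Map each distinct category to an integer code (assumed: sorted order)."""
    return {category: code for code, category in enumerate(sorted(set(values)))}


def label_encode(values, mapping):
    """Replace each category with its integer code."""
    return [mapping[v] for v in values]


def one_hot_encode(values, mapping):
    """Return one 0/1 indicator row per value, with one column per category."""
    width = len(mapping)
    rows = []
    for v in values:
        row = [0] * width
        row[mapping[v]] = 1  # set the column that belongs to this category
        rows.append(row)
    return rows


colors = ["red", "green", "blue", "green"]
mapping = fit_label_encoder(colors)        # blue -> 0, green -> 1, red -> 2
labels = label_encode(colors, mapping)     # one integer per input value
one_hot = one_hot_encode(colors, mapping)  # one indicator row per input value
```

Label encoding keeps a single column but imposes an arbitrary ordering on the categories, whereas one-hot encoding avoids that ordering at the cost of one column per category.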