Search results
3334 results
SSRN
Working paper
Bayesian Gaussian Copula Factor Models for Mixed Data
Gaussian factor models have proven widely useful for parsimoniously characterizing dependence in multivariate data. There is a rich literature on their extension to mixed categorical and continuous variables, using latent Gaussian variables or through generalized latent trait models accommodating measurements in the exponential family. However, when generalizing to non-Gaussian measured variables, the latent variables typically influence both the dependence structure and the form of the marginal distributions, complicating interpretation and introducing artifacts. To address this problem we propose a novel class of Bayesian Gaussian copula factor models which decouple the latent factors from the marginal distributions. A semiparametric specification for the marginals based on the extended rank likelihood yields straightforward implementation and substantial computational gains. We provide new theoretical and empirical justifications for using this likelihood in Bayesian inference. We propose new default priors for the factor loadings and develop efficient parameter-expanded Gibbs sampling for posterior computation. The methods are evaluated through simulations and applied to a dataset in political science. The models in this paper are implemented in the R package bfa.
BASE
An Integrated Purchase Model Using Gaussian Copula
In: Behaviormetrika, Volume 41, Issue 2, pp. 147-167
ISSN: 1349-6964
Differentially Private Release of Datasets using Gaussian Copula
In: Journal of privacy and confidentiality, Volume 10, Issue 2
ISSN: 2575-8527
We propose a generic mechanism to efficiently release differentially private synthetic versions of high-dimensional datasets with high utility. The core technique in our mechanism is the use of copulas, which are functions representing dependencies among random variables with a multivariate distribution. Specifically, we use the Gaussian copula to define dependencies of attributes in the input dataset, whose rows are modelled as samples from an unknown multivariate distribution, and then sample synthetic records through this copula. Despite the inherently numerical nature of Gaussian correlations, we construct a method that is applicable to both numerical and categorical attributes alike. Our mechanism is efficient in that it only takes time proportional to the square of the number of attributes in the dataset. We propose a differentially private way of constructing the Gaussian copula without compromising computational efficiency. Through experiments on three real-world datasets, we show that we can obtain highly accurate answers to the set of all one-way marginal, and two- and three-way positive conjunction queries, with 99% of the query answers having absolute (fractional) error rates between 0.01 and 3%. Furthermore, for a majority of two-way and three-way queries, we outperform independent noise addition through the well-known Laplace mechanism. In terms of computational time, we demonstrate that our mechanism can output synthetic datasets in around 6 minutes 47 seconds on average with an input dataset of about 200 binary attributes and more than 32,000 rows, and in about 2 hours 30 minutes for a much larger dataset of about 700 binary attributes and more than 5 million rows. To further demonstrate scalability, we ran the mechanism on larger (artificial) datasets with 1,000 and 2,000 binary attributes (and 5 million rows), obtaining synthetic outputs in approximately 6 and 19 hours, respectively. These are highly feasible times for synthetic datasets, which are one-off releases.
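The copula step described in this abstract (fit a Gaussian dependence structure, then sample synthetic records through it) can be illustrated with a minimal sketch. It is not the authors' mechanism: it adds no differential-privacy noise, it assumes continuous columns without ties, and the function names `normal_scores`, `fit_gaussian_copula`, and `sample_synthetic` are hypothetical.

```python
import numpy as np
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, used for CDF and inverse CDF


def normal_scores(column):
    """Rank-transform one column to (0, 1), then map to normal scores."""
    n = len(column)
    ranks = np.argsort(np.argsort(column, kind="stable"), kind="stable") + 1
    u = ranks / (n + 1)  # pseudo-observations strictly inside (0, 1)
    return np.array([_STD_NORMAL.inv_cdf(p) for p in u])


def fit_gaussian_copula(data):
    """Estimate the copula correlation matrix from normal scores.

    Assumption: no privacy noise is added here, unlike the paper's mechanism.
    """
    z = np.column_stack([normal_scores(data[:, j]) for j in range(data.shape[1])])
    return np.corrcoef(z, rowvar=False)


def sample_synthetic(data, corr, n_samples, rng):
    """Draw latent Gaussians, then map back through empirical marginals."""
    n, d = data.shape
    z = rng.multivariate_normal(np.zeros(d), corr, size=n_samples)
    u = np.vectorize(_STD_NORMAL.cdf)(z)  # uniforms carrying the dependence
    columns = []
    for j in range(d):
        sorted_col = np.sort(data[:, j])
        # Inverse empirical CDF: pick the order statistic at level u.
        idx = np.minimum((u[:, j] * n).astype(int), n - 1)
        columns.append(sorted_col[idx])
    return np.column_stack(columns)


# Usage on toy data with a skewed second column.
rng = np.random.default_rng(0)
x = rng.normal(size=2000)
y = np.exp(0.8 * x + 0.6 * rng.normal(size=2000))
toy = np.column_stack([x, y])
toy_corr = fit_gaussian_copula(toy)
toy_synth = sample_synthetic(toy, toy_corr, 500, rng)
```

Fitting costs one correlation per pair of columns, which is consistent with the quadratic dependence on the number of attributes mentioned in the abstract. Note that this sketch resamples original values through the empirical marginals, so it offers no privacy protection by itself.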
Matching a correlation coefficient by a Gaussian copula
In: Communications in statistics. Theory and methods, Volume 48, Issue 7, pp. 1728-1747
ISSN: 1532-415X
Semiparametric Estimation for the Additive Inverse Gaussian Frailty Model
In: Communications in statistics. Theory and methods, Volume 41, Issue 12, pp. 2269-2278
ISSN: 1532-415X
A Theoretical Argument Why the t-Copula Explains Credit Risk Contagion Better than the Gaussian Copula
In: Advances in decision sciences, Volume 2010, pp. 1-29
ISSN: 2090-3367
One of the key questions in credit dependence modelling is the specification of the copula function linking the marginals of default variables. Copula functions are important because they allow statistical inference to be decoupled into two parts: inference of the marginals and inference of the dependence. This is particularly important in the area of credit risk, where information on dependence is scant. Whereas the techniques to estimate the parameters of the copula function seem to be fairly well established, the choice of the copula function is still an open problem. We find by simulation that the t-copula naturally arises from a structural model of credit risk, proposed by Cossin and Schellhorn (2007). If revenues are linked by a Gaussian copula, we demonstrate that the t-copula provides a better fit to simulations than does a Gaussian copula. This is done under various specifications of the marginals and various configurations of the network. Beyond its quantitative importance, this result is qualitatively intriguing. Student's t-copulae induce fatter (joint) tails than Gaussian copulae ceteris paribus. On the other hand, observed credit spreads have generally fatter joint tails than the ones implied by the Gaussian distribution. We thus provide a new statistical explanation of why (i) credit spreads have fat joint tails, and (ii) financial crises are amplified by network effects.
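The statement that t-copulae induce fatter joint tails than Gaussian copulae at the same correlation can be checked numerically. The following is a minimal simulation sketch, not the structural credit-risk model of the paper; the function names and the parameter choices (correlation 0.5, 3 degrees of freedom, 0.99 threshold) are illustrative assumptions.

```python
import numpy as np


def gaussian_copula_latent(rho, n, rng):
    """Bivariate normal draws; their ranks follow a Gaussian copula."""
    cov = np.array([[1.0, rho], [rho, 1.0]])
    return rng.multivariate_normal(np.zeros(2), cov, size=n)


def t_copula_latent(rho, nu, n, rng):
    """Bivariate Student-t draws; their ranks follow a t-copula.

    Built as a normal vector divided by a shared sqrt(chi-square / nu).
    """
    z = gaussian_copula_latent(rho, n, rng)
    w = rng.chisquare(nu, size=n) / nu
    return z / np.sqrt(w)[:, None]


def joint_upper_tail_rate(x, q=0.99):
    """Estimate P(second > its q-quantile | first > its q-quantile).

    Copulas are invariant to monotone transforms of the margins, so
    thresholding at marginal quantiles isolates the dependence structure.
    """
    thresholds = np.quantile(x, q, axis=0)
    above = x > thresholds
    return (above[:, 0] & above[:, 1]).sum() / above[:, 0].sum()


rng = np.random.default_rng(42)
gauss_rate = joint_upper_tail_rate(gaussian_copula_latent(0.5, 200_000, rng))
t_rate = joint_upper_tail_rate(t_copula_latent(0.5, 3, 200_000, rng))
print(f"Gaussian copula joint tail rate: {gauss_rate:.3f}")
print(f"t-copula joint tail rate:        {t_rate:.3f}")
```

Under these assumptions the t-copula rate should come out clearly higher, because the shared chi-square mixing variable pushes both coordinates into the tail at the same time.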
Semiparametric quantity regression for complete or Censored data ; Semiparametric copula quantile regression for complete or censored data
When facing multivariate covariates, general semiparametric regression techniques come in handy for proposing flexible models that are not exposed to the curse of dimensionality. In this work a semiparametric copula-based estimator for conditional quantiles is investigated for both complete and right-censored data. In spirit, the methodology extends the recent work of Noh, El Ghouch and Bouezmarni [34] and Noh, El Ghouch and Van Keilegom [35], as the main idea consists in appropriately defining the quantile regression in terms of a multivariate copula and marginal distributions. Prior estimation of the latter and simple plug-in lead to an easily implementable estimator expressed, in both contexts, with or without censoring, as a weighted quantile of the observed response variable. In addition, and contrary to the initial suggestion in the literature, a semiparametric estimation scheme for the multivariate copula density is studied, motivated by the possible shortcomings of a purely parametric approach and driven by the regression context. The resulting quantile regression estimator has the valuable property of being automatically monotonic across quantile levels. Additionally, the copula-based approach allows the analyst to naturally take account of common regression concerns such as interactions between covariates or possible transformations of the latter. From a theoretical perspective, asymptotic normality for both complete and censored data is obtained under classical regularity conditions. Finally, numerical examples as well as a real data application are used to illustrate the validity and finite sample performance of the proposed procedure.
BASE
SSRN
Working paper
Copula Gaussian graphical models with penalized ascent Monte Carlo EM algorithm
In: Statistica Neerlandica: journal of the Netherlands Society for Statistics and Operations Research, Volume 69, Issue 4, pp. 419-441
ISSN: 1467-9574
Typical data that arise from surveys, experiments, and observational studies include continuous and discrete variables. In this article, we study the interdependence among a mixed (continuous, count, ordered categorical, and binary) set of variables via graphical models. We propose an ℓ1‐penalized extended rank likelihood with an ascent Monte Carlo expectation maximization approach for the copula Gaussian graphical models and establish near conditional independence relations and zero elements of a precision matrix. In particular, we focus on high‐dimensional inference where the number of observations is of the same order as, or less than, the number of variables under consideration. To illustrate how to infer networks for mixed variables through conditional independence, we consider two datasets: one in the area of sports and the other concerning breast cancer.
Maximum likelihood estimation of Gaussian copula models for geostatistical count data
In: Communications in statistics. Simulation and computation, Volume 49, Issue 8, pp. 1957-1981
ISSN: 1532-4141
Corrigendum to "A Theoretical Argument Why the t-Copula Explains Credit Risk Contagion Better than the Gaussian Copula"
In: Advances in decision sciences, Volume 2016, p. 1
ISSN: 2090-3367
A transition model for analyzing multivariate longitudinal data using Gaussian copula approach
In: Advances in statistical analysis: AStA, Volume 104, Issue 2, pp. 169-223
ISSN: 1863-818X