# Compressed principal component analysis of non-Gaussian vectors

Abstract: A novel approximate representation of non-Gaussian random vectors is introduced and validated, which can be viewed as a Compressed Principal Component Analysis (CPCA). This representation relies on the eigenvectors of the covariance matrix, obtained as in a Principal Component Analysis (PCA), but expresses the random vector as a linear combination of a random sample of N of these eigenvectors. In this model, the indices of these eigenvectors are independent discrete random variables with probabilities proportional to the corresponding eigenvalues. Moreover, the coefficients of the linear combination are zero-mean, unit-variance random variables. Under these conditions, it is first shown that the covariance matrix of this CPCA matches exactly its PCA counterpart, independently of the value of N. Next, it is also shown that the distribution of the random coefficients can be selected, without loss of generality, to be a symmetric function. Then, to represent the vector of these coefficients, a novel set of symmetric vector-valued multidimensional polynomials of the canonical Gaussian random vector is derived. Interestingly, it is noted that the number of such polynomials grows only slowly with the maximum polynomial order, thereby providing a framework for a compact approximation of the target random vector. The identification of the deterministic parameters of the expansion of the random coefficients on these symmetric vector-valued multidimensional polynomials is addressed next. Finally, an example of application is provided that demonstrates the good matching of the distributions of the elements of the target random vector and its approximation with only a very limited number of parameters.

1. Introduction. The objective of this paper is to propose the Compressed Principal Component Analysis (CPCA), a novel compact parameterized representation of any non-Gaussian second-order random variable X = (X_1, ..., X_n) with values in R^n.
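The covariance-matching property summarized above can be checked numerically. The following sketch is not code from the paper: it assumes standard-normal coefficients (any zero-mean, unit-variance choice works for this check) and uses the scaling sqrt(trace(C)/N), which is one consistent choice that makes the expected covariance of the N-term random-eigenvector combination equal C for any N.

```python
import numpy as np

rng = np.random.default_rng(0)

# An illustrative symmetric positive-definite target covariance C
# (not from the paper): C = Q diag(lambda) Q^T for an orthogonal Q.
n = 4
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
C = Q @ np.diag([4.0, 2.0, 1.0, 0.5]) @ Q.T

# PCA step: eigenvalues (ascending) and eigenvectors (columns) of C.
lam, phi = np.linalg.eigh(C)

# CPCA step: N eigenvector indices drawn with probabilities
# proportional to the eigenvalues, combined with zero-mean
# unit-variance coefficients (standard normal here).
N = 3
trace_C = lam.sum()
p = lam / trace_C

def sample_cpca(num_samples):
    idx = rng.choice(n, size=(num_samples, N), p=p)    # random eigenvector indices J_k
    eta = rng.standard_normal((num_samples, N))        # zero-mean unit-variance coefficients
    # X = sqrt(trace(C)/N) * sum_k eta_k * phi_{J_k}; phi[:, idx] has shape (n, S, N)
    return np.sqrt(trace_C / N) * np.einsum('sk,ask->sa', eta, phi[:, idx])

# Monte Carlo check: the empirical covariance should approach C for any N.
X = sample_cpca(200_000)
C_emp = X.T @ X / len(X)
print("max |C_emp - C| =", float(np.abs(C_emp - C).max()))
```

The key identity is E[X X^T] = (trace(C)/N) * N * sum_i p_i phi_i phi_i^T = sum_i lambda_i phi_i phi_i^T = C, since the coefficients are independent of the indices, zero-mean, and unit-variance.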
This representation would be useful for solving statistical inverse problems related to any stochastic computational model for which there is an uncertain vector-valued system-parameter that is modeled by a random vector X. To explain the benefits of this representation, consider the framework of a classical statistical inverse problem. Let us assume that a parameterized representation of X has been constructed and is written as X = g(z, Ξ), in which Ξ = (Ξ_1, ..., Ξ_N) is the R^N-valued normalized Gaussian random variable (centered and with a covariance matrix that is the identity matrix), the probability distribution of which is denoted by P_Ξ(dξ) on R^N. The parameterization of the representation corresponds to the vector z = (z_1, ..., z_M) of hyperparameters, which belongs to an admissible set that is a subset C_z of R^M. The measurable mapping ξ → g(z, ξ) is defined through the construction of the representation. Consequently, if z is fixed to a given value z_opt, then the probability distribution P_X of X is completely defined as the image of P_Ξ(dξ) under the mapping ξ → g(z_opt, ξ). Let us consider a computational model with an uncertain system-parameter x that is modeled by the random variable X. Let Q be the vector-valued random quantity of interest that is constructed as an obser…
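The pushforward construction just described can be illustrated with a small sketch. The mapping g below is a hypothetical toy transform, not the paper's CPCA-based mapping: it only shows how, once z is fixed to some z_opt, sampling ξ from the canonical Gaussian measure P_Ξ and applying ξ → g(z_opt, ξ) produces samples of X distributed according to P_X.

```python
import numpy as np

rng = np.random.default_rng(1)

def g(z, xi):
    """Toy parameterized representation X = g(z, Xi) (illustrative only).

    z = (scale, shift) plays the role of the hyperparameter vector;
    xi is a sample of the canonical Gaussian random vector in R^2.
    """
    scale, shift = z
    return shift + scale * np.tanh(xi)  # componentwise non-Gaussian transform

# Fixing z = z_opt fully determines P_X as the image of P_Xi under g(z_opt, .).
z_opt = (2.0, 0.5)
xi_samples = rng.standard_normal((100_000, 2))  # Xi ~ N(0, I_2)
x_samples = g(z_opt, xi_samples)                # samples of X under P_X

print("empirical mean of X:", x_samples.mean(axis=0))
```

Since tanh is odd and Ξ is centered, the mean of X equals the shift component of z_opt; in a statistical inverse problem, z_opt would instead be identified from observed data on the quantity of interest Q.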
Document type: Journal articles
Cited literature: 59 references

https://hal-upec-upem.archives-ouvertes.fr/hal-02966143
Contributor: Christian Soize
Submitted on: Tuesday, October 13, 2020 - 6:57:53 PM
Last modification on: Friday, October 16, 2020 - 3:39:40 AM


### Citation

Marc Mignolet, Christian Soize. Compressed principal component analysis of non-Gaussian vectors. SIAM/ASA Journal on Uncertainty Quantification, 2020, 8 (4), pp. 1261-1286. ⟨10.1137/20M1322029⟩. ⟨hal-02966143⟩
