# Compressed principal component analysis of non-Gaussian vectors

Abstract: A novel approximate representation of non-Gaussian random vectors is introduced and validated, which can be viewed as a Compressed Principal Component Analysis (CPCA). This representation relies on the eigenvectors of the covariance matrix, obtained as in a Principal Component Analysis (PCA), but expresses the random vector as a linear combination of a random sample of N of these eigenvectors. In this model, the indices of these eigenvectors are independent discrete random variables with probabilities proportional to the corresponding eigenvalues. Moreover, the coefficients of the linear combination are zero-mean, unit-variance random variables. Under these conditions, it is first shown that the covariance matrix of this CPCA matches exactly its PCA counterpart, independently of the value of N. Next, it is also shown that the distribution of the random coefficients can be selected, without loss of generality, to be a symmetric function. Then, to represent the vector of these coefficients, a novel set of symmetric vector-valued multidimensional polynomials of the canonical Gaussian random vector is derived. Interestingly, it is noted that the number of such polynomials grows only slowly with the maximum polynomial order, thereby providing a framework for a compact approximation of the target random vector. The identification of the deterministic parameters of the expansion of the random coefficients on these symmetric vector-valued multidimensional polynomials is addressed next. Finally, an example of application is provided that demonstrates the good matching of the distributions of the elements of the target random vector and its approximation with only a very limited number of parameters.

1. Introduction. The objective of this paper is to propose the Compressed Principal Component Analysis (CPCA), a novel compact parameterized representation of any non-Gaussian second-order random variable X = (X_1, ..., X_n) with values in R^n.
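The covariance-matching property summarized above can be checked numerically. The following sketch is not code from the paper: it assumes standard-normal coefficients (any zero-mean, unit-variance choice works for this check) and uses the scaling sqrt(trace(C)/N), which is one consistent choice that makes the expected covariance of the N-term random-eigenvector combination equal C for any N.

```python
import numpy as np

rng = np.random.default_rng(0)

# An illustrative symmetric positive-definite target covariance C
# (not from the paper): C = Q diag(lambda) Q^T for an orthogonal Q.
n = 4
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
C = Q @ np.diag([4.0, 2.0, 1.0, 0.5]) @ Q.T

# PCA step: eigenvalues (ascending) and eigenvectors (columns) of C.
lam, phi = np.linalg.eigh(C)

# CPCA step: N eigenvector indices drawn with probabilities
# proportional to the eigenvalues, combined with zero-mean
# unit-variance coefficients (standard normal here).
N = 3
trace_C = lam.sum()
p = lam / trace_C

def sample_cpca(num_samples):
    idx = rng.choice(n, size=(num_samples, N), p=p)    # random eigenvector indices J_k
    eta = rng.standard_normal((num_samples, N))        # zero-mean unit-variance coefficients
    # X = sqrt(trace(C)/N) * sum_k eta_k * phi_{J_k}; phi[:, idx] has shape (n, S, N)
    return np.sqrt(trace_C / N) * np.einsum('sk,ask->sa', eta, phi[:, idx])

# Monte Carlo check: the empirical covariance should approach C for any N.
X = sample_cpca(200_000)
C_emp = X.T @ X / len(X)
print("max |C_emp - C| =", float(np.abs(C_emp - C).max()))
```

The key identity is E[X X^T] = (trace(C)/N) * N * sum_i p_i phi_i phi_i^T = sum_i lambda_i phi_i phi_i^T = C, since the coefficients are independent of the indices, zero-mean, and unit-variance.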
This representation would be useful for solving statistical inverse problems related to any stochastic computational model for which there is an uncertain vector-valued system-parameter that is modeled by a random vector X. To explain the benefits of this representation, consider the framework of a classical statistical inverse problem. Let us assume that a parameterized representation of X has been constructed and is written as X = g(z, Ξ), in which Ξ = (Ξ_1, ..., Ξ_N) is the R^N-valued normalized Gaussian random variable (centered and with a covariance matrix that is the identity matrix), the probability distribution of which is denoted by P_Ξ(dξ) on R^N. The parameterization of the representation corresponds to the vector z = (z_1, ..., z_M) of hyperparameters, which belongs to an admissible set that is a subset C_z of R^M. The measurable mapping ξ → g(z, ξ) is defined through the construction of the representation. Consequently, if z is fixed to a given value z_opt, then the probability distribution P_X of X is completely defined as the image of P_Ξ(dξ) under the mapping ξ → g(z_opt, ξ). Let us consider a computational model with an uncertain system-parameter x that is modeled by the random variable X. Let Q be the vector-valued random quantity of interest that is constructed as an obser…
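The pushforward construction just described can be illustrated with a small sketch. The mapping g below is a hypothetical toy transform, not the paper's CPCA-based mapping: it only shows how, once z is fixed to some z_opt, sampling ξ from the canonical Gaussian measure P_Ξ and applying ξ → g(z_opt, ξ) produces samples of X distributed according to P_X.

```python
import numpy as np

rng = np.random.default_rng(1)

def g(z, xi):
    """Toy parameterized representation X = g(z, Xi) (illustrative only).

    z = (scale, shift) plays the role of the hyperparameter vector;
    xi is a sample of the canonical Gaussian random vector in R^2.
    """
    scale, shift = z
    return shift + scale * np.tanh(xi)  # componentwise non-Gaussian transform

# Fixing z = z_opt fully determines P_X as the image of P_Xi under g(z_opt, .).
z_opt = (2.0, 0.5)
xi_samples = rng.standard_normal((100_000, 2))  # Xi ~ N(0, I_2)
x_samples = g(z_opt, xi_samples)                # samples of X under P_X

print("empirical mean of X:", x_samples.mean(axis=0))
```

Since tanh is odd and Ξ is centered, the mean of X equals the shift component of z_opt; in a statistical inverse problem, z_opt would instead be identified from observed data on the quantity of interest Q.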
Document type: Journal articles
Cited literature: 59 references

https://hal-upec-upem.archives-ouvertes.fr/hal-02966143
Contributor: Christian Soize
Submitted on: Tuesday, October 13, 2020 - 6:57:53 PM
Last modification on: Friday, October 16, 2020 - 3:39:40 AM


### Citation

Marc Mignolet, Christian Soize. Compressed principal component analysis of non-Gaussian vectors. SIAM/ASA Journal on Uncertainty Quantification, 2020, 8 (4), pp. 1261-1286. ⟨10.1137/20M1322029⟩. ⟨hal-02966143⟩
