Fits vine copula based mixture model distributions to continuous data for a given number of components, as described in Sahin and Czado (2021), and uses the results for clustering.
vcmm( data, total_comp, is_cvine = NA, vinestr = NA, trunclevel = 1, mar = NA, bicop = NA, methods = c("kmeans"), threshold = 1e-04, maxit = 10 )
data | A matrix or data frame of observations. Categorical/discrete variables are not (yet) allowed. Rows correspond to observations (i) and columns correspond to variables (p). |
---|---|
total_comp | An integer specifying the number of mixture components (clusters). |
is_cvine | An integer specifying whether the components' vine tree structure is a C-vine before/after the ECM phase of clustering. |
vinestr | A matrix specifying vine tree structures before/after the ECM phase of clustering. The default is automatic selection. RVineMatrixCheck checks for a valid R-vine matrix. |
trunclevel | An integer specifying the truncation level for vine tree structures before the ECM phase of clustering. The default is 1. |
mar | A vector of character strings indicating the parametric univariate marginal distributions to be fitted before/after the ECM phase of clustering. The default is c('cauchy','gamma','llogis','lnorm','logis','norm','snorm','std', 'sstd'). Other distributions not (yet) allowed. |
bicop | A vector of integers denoting the parametric bivariate copula families to be fitted before/after the ECM phase of clustering. The default is c(1,2,3,4,5,6,7,8,10,13,14,16,17,18,20,23,24,26,27,28,30,33,34,36,37,38,40). BiCop describes the available families with their specifications. |
methods | A vector of character strings indicating the initial clustering method(s) used to obtain a partition for model selection before the ECM phase of clustering. The default is "kmeans". |
threshold | A numeric threshold for stopping the ECM phase of clustering. The default is 1e-4. |
maxit | An integer, specifying the maximum number of iterations in the CM-step 2 optimization. The default is 10. |
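
The argument constraints above can be sketched as a small pre-flight check followed by a call that restricts the candidate margins and copula families. This is a minimal sketch, not part of the package: `is_continuous_input` is a hypothetical helper, `my_data` is a placeholder for real data, and the check cannot detect discrete variables that are stored as numbers. The `mar` values are taken from the default list above, and the `bicop` codes 1, 3, 4, 5 are the Gaussian, Clayton, Gumbel and Frank families.

```r
# Hypothetical helper (not part of the package): TRUE when the input is a
# numeric matrix or a data frame whose columns are all numeric.
# Note: integer-coded discrete variables would still pass this check.
is_continuous_input <- function(data) {
  if (is.matrix(data)) {
    return(is.numeric(data))
  }
  if (!is.data.frame(data)) {
    return(FALSE)
  }
  all(vapply(data, is.numeric, logical(1)))
}

# Toy input used only to exercise the check
toy <- data.frame(x1 = c(0.3, 1.7, 2.2), x2 = c(5.1, 4.8, 6.0))
stopifnot(is_continuous_input(toy))

if (FALSE) {  # not run: needs the package and a real data set `my_data`
  fit <- vcmm(
    my_data,
    total_comp = 2,
    mar = c("norm", "gamma", "lnorm"),  # subset of the default margins
    bicop = c(1, 3, 4, 5),              # subset of the default families
    methods = c("kmeans"),
    threshold = 1e-4,
    maxit = 10
  )
}
```

Restricting `mar` and `bicop` shrinks the set of candidate models, which may shorten the fitting time at the cost of flexibility.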
An object of class vcmm_res. It contains the elements cluster, the vector with the classification of observations, and output, a list containing the fitted VCMM.
Use print.vcmm_res() to obtain the log-likelihood, BIC, ICL, number of estimated parameters, initial clustering method used, and total number of ECM iterations for the fitted VCMM. summary.vcmm_res() shows the fitted vine tree structures, univariate marginal distributions, and bivariate copula families with the estimated parameters, as well as the mixture proportions of each component.
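
As a rough illustration of working with the returned classification, the sketch below summarises a label vector in base R. The vectors `cluster` and `truth` are made-up stand-ins (for `fit$cluster` and for known component labels, which are only available for simulated data), so the numbers carry no meaning beyond the example.

```r
# Stand-in for fit$cluster; a real vector would come from vcmm()
cluster <- c(1, 1, 2, 1, 2, 2, 2, 1, 1, 1)
# Hypothetical known labels, e.g. from simulated data
truth <- c(1, 1, 2, 1, 2, 2, 1, 1, 1, 1)

# Cluster sizes and empirical shares of each cluster
sizes <- table(cluster)
shares <- prop.table(sizes)

# Cross-tabulate known labels against the classification
confusion <- table(truth, cluster)

# Cluster labels are arbitrary, so with two components take the better
# of the two possible label matchings
matched <- sum(diag(confusion))
agree <- max(matched, sum(confusion) - matched)
accuracy <- agree / length(cluster)

print(shares)    # 0.6 for cluster 1, 0.4 for cluster 2
print(accuracy)  # 0.9
```

The empirical shares can be set side by side with the fitted mixture proportions in `fit$output$mixture_prob`; the two need not coincide exactly.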
Sahin and Czado (2021), Vine copula mixture models and clustering for non-Gaussian data, Econometrics and Statistics. doi: 10.1016/j.ecosta.2021.08.011
# Simulate 3-dimensional data from a parametric vine copula based mixture model
# with 2 components on the x-scale. Each component has 500 observations.
dims <- 3
obs <- c(500, 500)
RVMs <- list()
RVMs[[1]] <- VineCopula::RVineMatrix(
  Matrix = matrix(c(1, 3, 2, 0, 3, 2, 0, 0, 2), dims, dims),
  family = matrix(c(0, 3, 4, 0, 0, 14, 0, 0, 0), dims, dims),
  par = matrix(c(0, 0.8571429, 2.5, 0, 0, 5, 0, 0, 0), dims, dims),
  par2 = matrix(sample(0, dims * dims, replace = TRUE), dims, dims)
)
RVMs[[2]] <- VineCopula::RVineMatrix(
  Matrix = matrix(c(1, 3, 2, 0, 3, 2, 0, 0, 2), dims, dims),
  family = matrix(c(0, 6, 5, 0, 0, 13, 0, 0, 0), dims, dims),
  par = matrix(c(0, 1.443813, 11.43621, 0, 0, 2, 0, 0, 0), dims, dims),
  par2 = matrix(sample(0, dims * dims, replace = TRUE), dims, dims)
)
margin <- matrix(c('Normal', 'Gamma', 'Lognormal', 'Lognormal', 'Normal', 'Gamma'), 3, 2)
margin_pars <- array(0, dim = c(2, 3, 2))
margin_pars[, 1, 1] <- c(1, 2)
margin_pars[, 1, 2] <- c(1.5, 0.4)
margin_pars[, 2, 1] <- c(1, 0.2)
margin_pars[, 2, 2] <- c(18, 5)
margin_pars[, 3, 1] <- c(0.8, 0.8)
margin_pars[, 3, 2] <- c(1, 0.2)
x_data <- rvcmm(dims, obs, margin, margin_pars, RVMs)

# Example-1: fit a parametric 3-dimensional R-vine copula based mixture model
# with 2 components, using all univariate marginal distributions and bivariate
# copula families allowed, and an initial partition from k-means
if (FALSE) {
  fit <- vcmm(x_data[, 1:3], total_comp = 2)
  print(fit)
  summary(fit)
  table(x_data$comp_id, fit$cluster)

  # Evaluate the density at (X1, X2, X3) = (1, 2, 3) for the fitted vcmm
  # after encoding the fitted vine copulas
  RVMs_fitted <- list()
  RVMs_fitted[[1]] <- VineCopula::RVineMatrix(
    Matrix = fit$output$vine_structure[, , 1],
    family = fit$output$bicop_familyset[, , 1],
    par = fit$output$bicop_param[, , 1],
    par2 = fit$output$bicop_param2[, , 1]
  )
  RVMs_fitted[[2]] <- VineCopula::RVineMatrix(
    Matrix = fit$output$vine_structure[, , 2],
    family = fit$output$bicop_familyset[, , 2],
    par = fit$output$bicop_param[, , 2],
    par2 = fit$output$bicop_param2[, , 2]
  )
  dvcmm(c(1, 2, 3), fit$output$margin, fit$output$marginal_param,
        RVMs_fitted, fit$output$mixture_prob)
}