| Title: | Gene‐based Association Tests of Zero‐inflated Count Phenotype for Rare Variants |
|---|---|
| Description: | Gene‐based association tests for modeling count data with excess zeros and rare variants in a zero-inflated Poisson/zero-inflated negative binomial regression framework. This method was originally described by Fan, Sun, and Li in Genetic Epidemiology 46(1):73-86 <doi:10.1002/gepi.22438>. |
| Authors: | Xiaomin Liu [aut, cre, cph] |
| Maintainer: | Xiaomin Liu <[email protected]> |
| License: | GPL-3 |
| Version: | 0.1.1 |
| Built: | 2026-05-16 06:49:12 UTC |
| Source: | https://github.com/fanx0037/zim4rv |
Cauchy combination test (Cauchy‐p)
This function combines p-values using the Cauchy combination test to test the joint genetic effect.
cauchyp(x)
| Argument | Description |
|---|---|
| x | a numeric vector containing p-values |
a combined p-value indicating the joint effect
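The Cauchy combination test maps each p-value to a standard Cauchy variate and sums them; the sum stays approximately Cauchy even when the p-values are correlated. The R sketch below assumes equal weights and uses a hypothetical helper, cauchy_combine(); it illustrates the technique and is not necessarily how cauchyp() is implemented.

```r
# Minimal sketch of the Cauchy combination test, equal weights by default.
# cauchy_combine() is a hypothetical helper, not part of the package.
cauchy_combine <- function(p, w = rep(1 / length(p), length(p))) {
  stopifnot(all(p > 0 & p < 1), length(w) == length(p))
  # Map each p-value to a standard Cauchy variate and take the weighted sum
  t_stat <- sum(w * tan((0.5 - p) * pi))
  # Under H0, t_stat / sum(w) is approximately standard Cauchy
  0.5 - atan(t_stat / sum(w)) / pi
}

cauchy_combine(c(0.01, 0.20, 0.50))
```

Because the Cauchy distribution is heavy-tailed, a single small p-value dominates the combined result, so the test keeps power when only one component carries the signal.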
A small, artificially generated toy data set containing the genotypes of 200 individuals at 3 variants (rs1, rs2 and rs3), used to illustrate analyses with the package.
data(Ex1_genedata)
An object of class "data.frame"
Family IDs
Individual IDs
Genotype code for rs1
Genotype code for rs2
Genotype code for rs3
This data set was artificially created and modified for the ZIM4rv package.
data(Ex1_genedata)
head(Ex1_genedata)
A small, artificially generated toy data set containing count phenotypes and covariates for 200 individuals, used to illustrate analyses with the package.
data(Ex1_phenodata)
An object of class "data.frame"
Family IDs
Individual IDs
Zero-inflated count phenotypes
Covariate education years
Covariate sex
The first principal component
The second principal component
The third principal component
This data set was artificially created and modified for the ZIM4rv package.
data(Ex1_phenodata)
head(Ex1_phenodata)
A small, artificially generated toy data set containing covariates for 15 individuals, used to illustrate pre-processing with the package.
data(Ex2_covar)
An object of class "data.frame" with separate columns for the IDs and the covariates
This data set was artificially created and modified for the ZIM4rv package.
data(Ex2_covar)
head(Ex2_covar)
A small, artificially generated toy data set containing dosages for 15 individuals, used to illustrate pre-processing with the package.
data(Ex2_dosage)
An object in .dosage file format
This data set was artificially created and modified for the ZIM4rv package.
data(Ex2_dosage)
head(Ex2_dosage)
A small, artificially generated toy data set containing the contents of a PLINK .fam file for 15 individuals, used to illustrate pre-processing with the package.
data(Ex2_fam)
An object in standard PLINK .fam format
This data set was artificially created and modified for the ZIM4rv package.
data(Ex2_fam)
head(Ex2_fam)
A small, artificially generated toy data set containing phenotypes for 15 individuals, used to illustrate pre-processing with the package.
data(Ex2_pheno)
An object of class "data.frame" with separate columns for the IDs and the phenotypes
This data set was artificially created and modified for the ZIM4rv package.
data(Ex2_pheno)
head(Ex2_pheno)
A small, artificially generated toy data set containing 3 genetic regions, used to illustrate pre-processing with the package.
data(Ex2_region)
An object of class "data.frame" listing genetic regions, where each row contains the chromosome, the base-pair positions and the name of a genetic region
This data set was artificially created and modified for the ZIM4rv package.
data(Ex2_region)
head(Ex2_region)
Compute the p-value for the burden test
This function takes a vector of weights, a data frame of rare variants and a matrix of score statistics produced by U_fi_lmd (ZIP model) or U_phi_mu4zinb (ZINB model), and computes the p-value for the burden test.
p_burden_single(wt, G_rare, s)
| Argument | Description |
|---|---|
| wt | a numeric vector containing the weights of all variants |
| G_rare | a data frame containing data of rare variants |
| s | a matrix of the score statistics for each variant from each subject |
the p-value for the burden test
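A burden test collapses the weighted per-variant scores into a single statistic. The sketch below assumes that row i, column j of s is subject i's score contribution for variant j (mean zero under H0), and estimates the variance empirically from those contributions, ignoring any correction for estimated nuisance parameters. burden_p_sketch() is hypothetical and omits the G_rare argument of p_burden_single().

```r
# Hypothetical weighted burden test from per-subject score contributions.
# Assumes s is an n x m matrix (subjects x variants) with mean zero under H0;
# not necessarily what p_burden_single() does.
burden_p_sketch <- function(wt, s) {
  s <- as.matrix(s)
  stopifnot(length(wt) == ncol(s))
  t_i <- as.vector(s %*% wt)  # each subject's weighted burden score
  u <- sum(t_i)               # overall burden score statistic
  v <- sum(t_i^2)             # empirical variance of u under H0
  pchisq(u^2 / v, df = 1, lower.tail = FALSE)  # 1-df chi-square reference
}

set.seed(1)
s <- matrix(rnorm(200 * 3), nrow = 200, ncol = 3)
burden_p_sketch(c(1, 1, 1), s)
```

Rescaling all weights by the same constant leaves the p-value unchanged, because u^2 and v both scale by its square; only the relative weights matter.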
Compute the p-value for the kernel test
This function takes a diagonal matrix of weights, a data frame of rare variants and a matrix of score statistics produced by U_fi_lmd (ZIP model) or U_phi_mu4zinb (ZINB model), and computes the p-value for the kernel test.
p_kernel_single(wt_matrix2, G_rare, s)
| Argument | Description |
|---|---|
| wt_matrix2 | a diagonal matrix containing the squared weights of all variants |
| G_rare | a data frame containing data of rare variants |
| s | a matrix of the score statistics for each variant from each subject |
the p-value for the kernel test (ZIP-k)
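A kernel (SKAT-type) statistic Q = U'WU sums the squared, weighted per-variant scores, and under H0 it follows a weighted mixture of 1-df chi-squares. The sketch below assumes that row i, column j of s is subject i's score contribution for variant j, and approximates the mixture with a Satterthwaite (moment-matched scaled chi-square) distribution. That approximation is a swapped-in choice: p_kernel_single() may use an exact method such as Davies' algorithm. kernel_p_sketch() is hypothetical and omits G_rare.

```r
# Hypothetical kernel test: Q = U' W U with W = diag(w^2) (wt_matrix2).
# Null distribution approximated by a scaled chi-square (Satterthwaite).
kernel_p_sketch <- function(wt_matrix2, s) {
  s <- as.matrix(s)
  u <- colSums(s)                                 # per-variant score U_j
  q <- as.numeric(t(u) %*% wt_matrix2 %*% u)      # kernel statistic Q
  sigma <- crossprod(s)                           # empirical Cov(U) under H0
  w_half <- sqrt(wt_matrix2)                      # valid: W diagonal, entries >= 0
  lam <- eigen(w_half %*% sigma %*% w_half, symmetric = TRUE,
               only.values = TRUE)$values
  lam <- lam[lam > 1e-10]                         # drop numerically zero eigenvalues
  # Match mean and variance of sum(lam_k * chi2_1) with a * chi2_d
  a <- sum(lam^2) / sum(lam)
  d <- sum(lam)^2 / sum(lam^2)
  pchisq(q / a, df = d, lower.tail = FALSE)
}

set.seed(2)
s <- matrix(rnorm(200 * 3), nrow = 200, ncol = 3)
kernel_p_sketch(diag(c(1, 4, 9)), s)
```

With a single variant there is only one eigenvalue, the approximation becomes exact, and the kernel test coincides with the 1-df chi-square burden test.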
Estimation of phi_hat and lambda_hat for ZIP model
This function estimates the two ZIP parameters, phi (the zero-inflation probability) and lambda (the Poisson mean), for each subject under the null hypothesis.
phi_lambda_hat(simud)
| Argument | Description |
|---|---|
| simud | a data frame containing a phenotype named y and covariates |
This function first fits a zero-inflated Poisson regression of the phenotype y on the covariates only to obtain the estimated regression coefficients, and then computes the estimates of phi and lambda.
a list of the two parameter estimates (phi and lambda) for each subject
See also: zeroinfl
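Once the zero-inflated Poisson regression is fitted, phi and lambda follow from the linear predictors through the inverse link functions. The sketch below assumes the default zeroinfl() links (logit for phi, log for lambda); zip_params_sketch() and its argument names are hypothetical. With a fitted pscl zeroinfl object, predict(fit, type = "zero") and predict(fit, type = "count") should return the same two quantities.

```r
# Hypothetical helper: per-subject ZIP parameters from fitted coefficients.
# X_zero / gamma: design matrix and coefficients of the zero-inflation part
# X_count / beta: design matrix and coefficients of the Poisson count part
zip_params_sketch <- function(X_count, beta, X_zero, gamma) {
  phi <- plogis(as.vector(X_zero %*% gamma))    # inverse logit link
  lambda <- exp(as.vector(X_count %*% beta))    # inverse log link
  list(phi = phi, lambda = lambda,
       p_zero = phi + (1 - phi) * exp(-lambda)) # implied P(Y = 0)
}

X <- cbind(1, age = c(50, 60, 70))
zip_params_sketch(X, beta = c(0.2, 0.01), X_zero = X, gamma = c(-1, 0.005))
```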
Estimation of phi_hat, mu_hat and alpha_hat for ZINB model
This function estimates the three ZINB parameters phi, mu and alpha for each subject under the null hypothesis.
phi_mu_hat4zinb(simud)
| Argument | Description |
|---|---|
| simud | a data frame containing a phenotype named y and covariates |
This function first fits a zero-inflated negative binomial regression of the phenotype y on the covariates only to obtain the estimated regression coefficients and inverse dispersion, and then computes the estimates of phi, mu and alpha.
a list of the three parameter estimates (phi, mu and alpha) for each subject
See also: zeroinfl
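The ZINB parameters follow from the fitted coefficients in the same way as for ZIP, plus a dispersion parameter. The sketch below assumes logit and log links and that alpha is the NB2 dispersion, i.e. the reciprocal of the inverse dispersion theta that zeroinfl() reports; under that parameterization P(Y = 0) = phi + (1 - phi)(1 + alpha mu)^(-1/alpha). zinb_params_sketch() is a hypothetical helper.

```r
# Hypothetical helper: per-subject ZINB parameters from fitted coefficients
# and the inverse dispersion theta (assumption: alpha = 1 / theta).
zinb_params_sketch <- function(X_count, beta, X_zero, gamma, theta) {
  phi <- plogis(as.vector(X_zero %*% gamma))   # zero-inflation probability
  mu <- exp(as.vector(X_count %*% beta))       # negative binomial mean
  alpha <- 1 / theta                           # NB2 dispersion
  p_zero <- phi + (1 - phi) * (1 + alpha * mu)^(-1 / alpha)
  list(phi = phi, mu = mu, alpha = alpha, p_zero = p_zero)
}

X <- cbind(1, age = c(50, 60, 70))
zinb_params_sketch(X, beta = c(0.2, 0.01), X_zero = X,
                   gamma = c(-1, 0.005), theta = 2)
```

As theta grows (alpha approaching 0), the negative binomial zero probability approaches exp(-mu) and the model reduces to ZIP.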
Preprocess genotype files in PLINK format
This function converts PLINK-format files into a data frame containing genotype information in the format required for model fitting and testing.
preprocess_genedata(fam_file, dosage_file, region_file, gene_name)
| Argument | Description |
|---|---|
| fam_file | .fam file in PLINK format |
| dosage_file | a dosage file containing the dosage of each variant for all individuals |
| region_file | a file listing genetic regions, where each row contains the chromosome, the base-pair positions and the name of a genetic region |
| gene_name | a character string giving the name of a gene, e.g. "CETP". The name is case-sensitive. |
a data frame containing genotypes for all individuals in the required format for model fitting and testing
data(Ex2_fam)
data(Ex2_dosage)
data(Ex2_region)
preprocess_genedata(Ex2_fam, Ex2_dosage, Ex2_region, "r2")
Preprocess phenotype files in PLINK format
This function converts PLINK-format files into a data frame containing phenotype and covariate information in the format required for model fitting and testing.
preprocess_phenodata(pheno_file, cov_file)
| Argument | Description |
|---|---|
| pheno_file | phenotype file in PLINK format |
| cov_file | covariate file in PLINK format |
a data frame containing the phenotypes and covariates of all individuals in the format required for model fitting and testing
data(Ex2_pheno)
data(Ex2_covar)
preprocess_phenodata(Ex2_pheno, Ex2_covar)
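PLINK phenotype and covariate files both begin with FID and IID columns, so combining them amounts to a join on those two keys. The sketch below assumes a single phenotype in the third column of the phenotype table and uses base R merge(); merge_pheno_covar_sketch() is hypothetical and may differ from what preprocess_phenodata() does, for example in how missing values are coded.

```r
# Hypothetical join of a PLINK-style phenotype table and covariate table.
merge_pheno_covar_sketch <- function(pheno, covar) {
  names(pheno)[1:2] <- c("FID", "IID")
  names(covar)[1:2] <- c("FID", "IID")
  # Keep FID, IID and the single phenotype column; merge() places the key
  # columns first, so the phenotype ends up in the third column
  merge(pheno[, 1:3], covar, by = c("FID", "IID"))
}

pheno <- data.frame(FID = 1:3, IID = 1:3, y = c(0, 2, 0))
covar <- data.frame(FID = c(3, 1, 2), IID = c(3, 1, 2), age = c(50, 60, 70))
merge_pheno_covar_sketch(pheno, covar)
```

Joining on the IDs, rather than binding columns by position, keeps each subject's covariates attached to the right phenotype even when the two files list subjects in different orders.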
Compute Score statistics for ZIP model
This function takes the estimates of phi and lambda produced by phi_lambda_hat and computes the score statistics for the ZIP model under the null hypothesis.
U_fi_lmd(simudata, G_rare)
| Argument | Description |
|---|---|
| simudata | a data frame containing a phenotype named y and covariates |
| G_rare | a data frame containing data of rare variants, with subjects in the same order as in simudata |
a list of two matrices of score statistics for each variant from each subject
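When a genotype enters both linear predictors as an extra covariate, the score contribution of subject i for variant j is G_ij times the derivative of that subject's ZIP log-likelihood with respect to the linear predictor. The sketch below derives these derivatives assuming logit(phi) and log(lambda) links; zip_score_sketch() is hypothetical and takes fitted phi and lambda directly, whereas U_fi_lmd() takes simudata.

```r
# Hypothetical sketch of per-subject, per-variant ZIP score contributions
# under H0, assuming logit(phi) and log(lambda) links with each genotype
# entered as an extra covariate in both linear predictors.
zip_score_sketch <- function(y, phi, lambda, G) {
  G <- as.matrix(G)
  z <- as.numeric(y == 0)                        # indicator of an observed zero
  p0 <- phi + (1 - phi) * exp(-lambda)           # ZIP probability of a zero
  # Derivative of each subject's log-likelihood w.r.t. logit(phi)
  r_phi <- z * phi * (1 - phi) * (1 - exp(-lambda)) / p0 - (1 - z) * phi
  # Derivative of each subject's log-likelihood w.r.t. log(lambda)
  r_lambda <- -z * (1 - phi) * lambda * exp(-lambda) / p0 +
    (1 - z) * (y - lambda)
  # Row i of G is scaled by subject i's derivative (vector recycled by column)
  list(s_phi = G * r_phi, s_lambda = G * r_lambda)
}

set.seed(4)
y <- c(0, 0, 3, 1, 0)
G <- matrix(rbinom(10, 2, 0.1), nrow = 5)
zip_score_sketch(y, phi = rep(0.3, 5), lambda = rep(1.2, 5), G = G)
```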
Compute score statistics for ZINB model
This function takes the estimates of phi, mu and alpha produced by phi_mu_hat4zinb and computes the score statistics for the ZINB model under the null hypothesis.
U_phi_mu4zinb(simudata, G_rare)
| Argument | Description |
|---|---|
| simudata | a data frame containing a phenotype named y and covariates |
| G_rare | a data frame containing data of rare variants, with subjects in the same order as in simudata |
a list of two matrices of score statistics for each variant from each subject
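The ZINB score contributions have the same structure as in the ZIP case, with the Poisson zero probability exp(-lambda) replaced by the NB2 zero probability (1 + alpha mu)^(-1/alpha). The sketch below assumes logit(phi) and log(mu) links, treats alpha as fixed at its null estimate, and enters each genotype as an extra covariate in both linear predictors; zinb_score_sketch() is hypothetical and is not necessarily how U_phi_mu4zinb() is written.

```r
# Hypothetical sketch of per-subject, per-variant ZINB score contributions
# under H0, assuming logit(phi) and log(mu) links and a fixed NB2 dispersion
# alpha, with each genotype entered in both linear predictors.
zinb_score_sketch <- function(y, phi, mu, alpha, G) {
  G <- as.matrix(G)
  z <- as.numeric(y == 0)                        # indicator of an observed zero
  f0 <- (1 + alpha * mu)^(-1 / alpha)            # NB2 probability of a zero
  p0 <- phi + (1 - phi) * f0                     # ZINB probability of a zero
  # Derivative of each subject's log-likelihood w.r.t. logit(phi)
  r_phi <- z * phi * (1 - phi) * (1 - f0) / p0 - (1 - z) * phi
  # Derivative of each subject's log-likelihood w.r.t. log(mu)
  r_mu <- -z * (1 - phi) * mu * f0 / ((1 + alpha * mu) * p0) +
    (1 - z) * (y - mu) / (1 + alpha * mu)
  list(s_phi = G * r_phi, s_mu = G * r_mu)
}

set.seed(5)
y <- c(0, 0, 4, 1, 0)
G <- matrix(rbinom(10, 2, 0.1), nrow = 5)
zinb_score_sketch(y, phi = rep(0.3, 5), mu = rep(1.5, 5), alpha = 0.6, G = G)
```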
Vuong's test
This function performs Vuong's test, a likelihood-ratio-based test for model selection between non-nested models. Here it is used to choose between the zero-inflated Poisson model and the zero-inflated negative binomial model.
vuong_test(phenodata)
| Argument | Description |
|---|---|
| phenodata | a data frame containing family and individual IDs for all subjects, a zero-inflated count phenotype and a set of covariates. Each row represents a different individual. The first two columns are the Family ID (FID) and Individual ID (IID), respectively. The third column must contain exactly one phenotype, which must be zero-inflated count data (non-negative integers), e.g. neuritic plaque counts. Each remaining column represents a different covariate, e.g. age or sex. |
Nothing is returned; the function prints a table of three test statistics with their p-values and exits silently.
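Vuong's statistic standardizes the sum of pointwise log-likelihood differences between two fitted models. The sketch below assumes that the three printed statistics are the raw, AIC-corrected and BIC-corrected versions with one-sided p-values, as in the table printed by pscl's vuong(); vuong_sketch() is hypothetical, ll1 and ll2 are per-observation log-likelihoods, and k1 and k2 are the numbers of estimated parameters in each model.

```r
# Hypothetical sketch of Vuong's test from per-observation log-likelihoods
# ll1 (model 1, e.g. ZINB) and ll2 (model 2, e.g. ZIP); k1 and k2 are the
# numbers of estimated parameters used for the AIC/BIC corrections.
vuong_sketch <- function(ll1, ll2, k1, k2) {
  m <- ll1 - ll2                 # pointwise log-likelihood differences
  n <- length(m)
  denom <- sd(m) * sqrt(n)
  stat <- c(Raw = sum(m) / denom,
            AIC_corrected = (sum(m) - (k1 - k2)) / denom,
            BIC_corrected = (sum(m) - (k1 - k2) * log(n) / 2) / denom)
  # One-sided p-value in the direction of the observed statistic
  data.frame(statistic = stat, p_value = pnorm(-abs(stat)))
}

set.seed(6)
ll2 <- rnorm(100, mean = -2)
ll1 <- ll2 + rnorm(100, mean = 0.5, sd = 0.2)
vuong_sketch(ll1, ll2, k1 = 4, k2 = 3)
```

A large positive statistic favours the first model and a large negative one the second; the corrections penalize the model with more parameters.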
Gene‐based association tests to model zero-inflated count data
This function performs gene‐based association tests between a set of SNPs/genes and zero-inflated count data within a ZIP regression, ZINB regression or two-stage SKAT framework.
zimfrv(phenodata, genedata, genename = "NA", weights = "Equal", missing_cutoff = 0.15, max_maf = 1, model = "zip")
| Argument | Description |
|---|---|
| phenodata | a data frame containing family and individual IDs for all subjects, a zero-inflated count phenotype and a set of covariates. Each row represents a different individual. The first two columns are the Family ID (FID) and Individual ID (IID), respectively. The third column must contain exactly one phenotype, which must be zero-inflated count data (non-negative integers), e.g. neuritic plaque counts. Each remaining column represents a different covariate, e.g. age or sex. |
| genedata | a data frame containing family and individual IDs for all subjects as well as numeric genotype data. Each row represents a different individual. The first two columns are the Family ID (FID) and Individual ID (IID), respectively. Each remaining column represents a separate gene/SNP marker. Genotypes should be coded as 0, 1, 2 and NA for AA, Aa, aa and missing, respectively. The Family ID (FID) and Individual ID (IID) of each row in 'genedata' derived from the PLINK-formatted files should be in the same order as in 'phenodata', and 'genedata' should have the same number of rows as 'phenodata'. |
| genename | a character string giving the name of a gene, e.g. "CETP". The name is case-sensitive. |
| weights | a character string specifying a pre-specified variant weighting scheme (default = "Equal"). "Equal" assigns equal weights (no weighting), "MadsenBrowning" uses the Madsen and Browning (2009) weight and "Beta" uses the Beta weight. |
| missing_cutoff | a cutoff for the missing rate of SNPs (default = 0.15). SNPs with a missing rate higher than the cutoff are excluded from the analysis. |
| max_maf | a cutoff for the maximum minor allele frequency (MAF) (default = 1, i.e. no cutoff). SNPs with MAF greater than the cutoff are excluded from the analysis. |
| model | a character string specifying the zero-inflated count model family (default = "zip"). "zip" denotes the zero-inflated Poisson model, "zinb" the zero-inflated negative binomial model and "skat" the two-stage Sequence Kernel Association Test method. |
a list of 10 items: the gene name, the number of rare variants in the genetic region, the method used for modeling, the individual p-values of the gene‐based association tests (burden and kernel tests for both parameters), and the p-values combined with the Cauchy combination test.
| Component | Description |
|---|---|
| GeneName | the name of the gene |
| No.Var | the number of rare variants in the gene |
| Method | the method used to compute the p-values |
| p.value_pi_burden | burden-test p-value for parameter pi (zero-inflation part) |
| p.value_lambda_burden / p.value_mu_burden | burden-test p-value for parameter lambda (ZIP) or mu (ZINB) |
| p.value_pi_kernel | kernel-test p-value for parameter pi (zero-inflation part) |
| p.value_lambda_kernel / p.value_mu_kernel | kernel-test p-value for parameter lambda (ZIP) or mu (ZINB) |
| p.value_pi_combined | combined p-value for testing parameter pi |
| p.value_lambda_combined / p.value_mu_combined | combined p-value for testing parameter lambda (ZIP) or mu (ZINB) |
| p.value_overall | combined p-value for testing the overall association, using the Cauchy combination test |
Fan, Q., Sun, S., & Li, Y.‐J. (2021). Precisely modeling zero‐inflated count phenotype for rare variants. Genetic Epidemiology, 1–14.
data(Ex1_phenodata)
data(Ex1_genedata)
zimfrv(Ex1_phenodata, Ex1_genedata, weights = "Beta", max_maf = 0.02, model = "zinb")