Researchers at UNIST have revealed through a new study published in Nucleic Acids Research a new method that they claim is capable of identifying potential disease genes in a more accurate and cost-effective way.
The novel statistical algorithm has also been considered as a new promising approach for the identification of candidate disease genes, as it works effectively with less genomic data and takes only a minute or two to get results.
Researchers have presented the novel method and software GSA-SNP2 for pathway enrichment analysis of GWAS P-value data. According to the research team, GSA-SNP2 provides high power, decent type I error control and fast computation by incorporating the random set model and SNP-count adjusted gene score.
Each individual’s genome is a unique combination of DNA sequences that play major roles in determining who we are. This accounts for all individual differences, including susceptibility for disease and diverse phenotypes. Such genetic variation among humans are known as single nucleotide polymorphisms (SNPs). SNPs that correlate with specific diseases could serve as predictive biomarkers to aid the development of new drugs. Through the statistical analysis of GWAS summary data, it is possible to identify the disease-associated SNPs.
Despite the astronomical amounts of money and time invested in the statistical analysis of SNP data, the conventional SNP detection technologies have been unable to identify all possible SNPs. This is because most of the conventional methods for detecting SNPs are designed to strictly control false-positives in the results. Therefore, among tens of thousands of genomics data and hundreds of thousands of SNPs analyzed, the number of markers described within a candidate disease gene often reaches severl tens.
The team aimed to develop an algorithm that improves the statistical predictability while maintaining accurate control of false positives. To do this, they applied the monotone Cubic Spline trend curve to the gene score via the competitive pathway analysis for gene expression data.
In a comparative study using simulated and real GWAS data, GSA-SNP2 exhibited high power and best prioritized gold standard positive pathways compared with six existing enrichment-based methods and two self-contained methods. Based on these results, the difference between pathway analysis approaches was investigated and the effects of the gene correlation structures on the pathway enrichment analysis were also discussed. In addition, GSA-SNP2 is able to visualize protein interaction networks within and across the significant pathways so that the user can prioritize the core subnetworks for further studies.
According to the research team, GSA-SNP2 provides a greatly improved type I error control by using the SNP-count adjusted gene scores, while nevertheless preserving high statistical power. It also provides both local and global protein interaction networks in the associated pathways, and may facilitate integrated pathway and network analysis of GWAS data.
The research team expects that their GSA-SNP2 is able to visualize protein interaction networks within and across the significant pathways so that the user can prioritize the core subnetworks for further studies.