【佳學(xué)基因檢測(cè)】How is GWAS used in genetic testing and gene decoding?
Genome-wide association studies
Genome-wide association (GWA) studies scan the entire genome of a species for associations between up to millions of SNPs and a given trait of interest. Notably, the trait of interest can be virtually any phenotype measured in the population, be it qualitative (e.g. disease status) or quantitative (e.g. height). Essentially, given p SNPs and n samples or individuals, a GWA analysis fits p independent univariate linear models, each based on n samples, using the genotype of each SNP as the predictor of the trait of interest. The significance of association (P-value) in each of the p tests is determined from the coefficient estimate $\hat{\beta}$ of the corresponding SNP (technically speaking, the significance of association is $P(\hat{\beta} \mid H_0\colon \beta = 0)$, i.e. the probability of observing an estimate at least as extreme as $\hat{\beta}$ if the SNP had no effect on the trait). Note that because each test can be run separately from all the others and the tests are so numerous, there is a great computational advantage in setting up a parallelized GWA analysis (as we will do shortly). Because so many tests are performed, the resulting P-values must be adjusted for multiple hypothesis testing, using methods such as the Bonferroni correction (which controls the family-wise error rate) or the Benjamini-Hochberg procedure (which controls the false discovery rate, FDR). GWA studies are now commonplace in the genetics of many different species.
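The per-SNP procedure described above can be sketched in plain Python. This is a minimal, hypothetical sketch, not the R code from the repository: the names `snp_association`, `gwas_scan`, `bonferroni` and `benjamini_hochberg` are illustrative, genotypes are assumed to be coded as 0/1/2 allele dosages, and the P-value comes from a Wald test that approximates the t distribution with a normal distribution. That approximation is reasonable for hundreds of samples (such as the 316 used in this tutorial) but not for tiny toy data; a real analysis would use an exact t test and dedicated GWAS tooling.

```python
import math
from typing import List, Sequence, Tuple


def snp_association(genotypes: Sequence[float], trait: Sequence[float]) -> Tuple[float, float]:
    """Fit trait = alpha + beta * genotype by OLS and test H0: beta = 0.

    Returns (beta_hat, two-sided P-value). Assumption: the P-value uses a
    normal approximation to the t distribution (a Wald test).
    """
    n = len(genotypes)
    if n != len(trait) or n < 3:
        raise ValueError("need matching genotype/trait vectors with n >= 3")
    mean_g = sum(genotypes) / n
    mean_y = sum(trait) / n
    sxx = sum((g - mean_g) ** 2 for g in genotypes)
    if sxx == 0:
        return 0.0, 1.0  # monomorphic SNP: carries no information about beta
    sxy = sum((g - mean_g) * (y - mean_y) for g, y in zip(genotypes, trait))
    beta = sxy / sxx
    alpha = mean_y - beta * mean_g
    rss = sum((y - alpha - beta * g) ** 2 for g, y in zip(genotypes, trait))
    se = math.sqrt(rss / (n - 2) / sxx)  # standard error of beta_hat
    if se == 0:
        return beta, (0.0 if beta != 0 else 1.0)  # perfect fit, no residual noise
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return beta, p


def bonferroni(pvals: Sequence[float]) -> List[float]:
    """Bonferroni correction: multiply by the number of tests, cap at 1."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]


def benjamini_hochberg(pvals: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted P-values (step-up, FDR control)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, keeping a running minimum of p * m / rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def gwas_scan(genotype_matrix: Sequence[Sequence[float]], trait: Sequence[float]):
    """genotype_matrix[j] holds the dosages of SNP j across all n samples."""
    # One independent model per SNP: each iteration touches only its own SNP.
    results = [snp_association(snp, trait) for snp in genotype_matrix]
    pvals = [p for _, p in results]
    return results, bonferroni(pvals), benjamini_hochberg(pvals)


if __name__ == "__main__":
    # Hypothetical toy data: 8 samples, 3 SNPs (not from Saw et al., 2017).
    trait = [4.1, 4.9, 5.2, 6.0, 6.8, 4.0, 5.1, 6.2]
    snps = [
        [0, 1, 1, 2, 2, 0, 1, 2],  # SNP whose dosage tracks the trait
        [1, 2, 0, 0, 2, 1, 0, 1],  # SNP unrelated to the trait
        [0, 0, 0, 0, 0, 0, 0, 0],  # monomorphic SNP
    ]
    results, bonf, bh = gwas_scan(snps, trait)
    for j, ((beta, p), pb, pfdr) in enumerate(zip(results, bonf, bh)):
        print(f"SNP{j}: beta={beta:.3f} p={p:.3g} bonferroni={pb:.3g} BH={pfdr:.3g}")
```

Because each call to `snp_association` depends only on its own SNP, the list of SNPs in `gwas_scan` could be split into chunks and dispatched to separate workers (for example with `concurrent.futures.ProcessPoolExecutor`), which is the parallelization advantage mentioned above. The `benjamini_hochberg` function follows the standard step-up definition, the same one used by R's `p.adjust(method = "BH")`.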
Association mapping vs. linkage mapping
Too often, people confuse association mapping with linkage mapping, also known as quantitative trait locus (QTL) mapping. Although conceptually similar, the two actually work in opposite ways. One of the key differences between them is that association mapping relies on high-density SNP genotyping of unrelated individuals, whereas linkage mapping relies on the segregation of substantially fewer markers in controlled breeding experiments; unsurprisingly, QTL mapping is seldom conducted in humans. Importantly, association mapping gives you point associations in the genome, whereas linkage mapping gives you QTLs, i.e. chromosomal regions.
The present tutorial covers the fundamental aspects to consider when conducting a GWA analysis, from the pre-processing of genotype and phenotype data to the interpretation of results. We will use a mixed population of 316 Chinese, Indian and Malay individuals that was recently characterized using high-throughput SNP-chip genotyping, transcriptomics and lipidomics (Saw et al., 2017). More specifically, we will search for associations between >2.5 million SNP markers and cholesterol levels. Finally, we will explore the vicinity of candidate SNPs using the UCSC Genome Browser in order to gain functional insights. The methodology shown here is largely based on the tutorial outlined in Reed et al., 2015. The R scripts and some of the data can be found in my repository, but you will still need to download the omics data from here. Please follow the instructions in the repo.
(Responsible editor: 佳學(xué)基因)