Authors: Suenori Chiku, Kimio Yoshimura & Teruhiko Yoshida
Principal component analysis (PCA) can be applied in two ways to analyze population substructure in genetic polymorphism data: to an individual covariance matrix, or to a marker covariance matrix. The former approach is already implemented in EIGENSTRAT [1]. The latter is less common because it cannot be applied when the data include missing genotype calls (allele calls). Here, we describe a modification of the Mixture Model (MM) [2] that can handle data with missing allele calls; we call it the compensated mixture model (CMM) protocol. MM applies PCA to a marker covariance matrix and then fits a normal-distribution mixture model.
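The marker-covariance PCA step that MM builds on can be sketched in Python with NumPy as follows. This is a minimal illustration, not the CMM protocol: the compensation for missing calls is not described in this section, so the sketch simply replaces each missing call with the marker's mean (an assumption). The function name `marker_pca_scores` and the 0/1/2 allele-count coding are also hypothetical. The returned per-individual scores are what a normal-distribution mixture model would then be fitted to.

```python
import numpy as np

def marker_pca_scores(genotypes, n_components=2):
    """Hypothetical sketch: PCA on a marker covariance matrix.

    genotypes: array of shape (n_markers, n_individuals) holding allele
    counts (0, 1, 2), with np.nan marking a missing genotype call.
    Returns (scores, variances): scores has shape (n_individuals, n_components).
    """
    g = np.asarray(genotypes, dtype=float)
    # Center each marker (row) on the mean of its observed calls.
    marker_means = np.nanmean(g, axis=1, keepdims=True)
    x = g - marker_means
    # ASSUMPTION: a missing call is set to the marker mean (0 after centering).
    # This is a simple placeholder, not the CMM compensation step.
    x = np.where(np.isnan(x), 0.0, x)
    n_individuals = x.shape[1]
    # Marker covariance matrix, shape (n_markers, n_markers).
    cov = x @ x.T / (n_individuals - 1)
    # eigh handles symmetric matrices and returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1][:n_components]
    axes = eigvecs[:, order]  # principal axes in marker space
    # Project each individual's centered genotype vector onto the axes.
    return x.T @ axes, eigvals[order]
```

Mean imputation biases covariances toward zero for markers with many missing calls, so it should be read only as a placeholder for a proper compensation step.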
The calculation procedures for CMM are as follows:
Results for 5,197 SNPs genotyped in the Chinese and Japanese populations of the HapMap project are shown in Table 1 and Figure 1. The SNPs were selected by the following criteria: physical distance between SNPs greater than 500 kbp, minor allele frequency greater than 3%, and missing genotype call rate less than 5%.
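The three selection criteria can be expressed as a simple filter. The Python sketch below is hypothetical: the `Snp` record, the strict inequalities, and the greedy left-to-right thinning used to enforce the 500 kbp spacing are assumptions, since the section does not say how the spacing rule was applied.

```python
from dataclasses import dataclass

@dataclass
class Snp:
    name: str
    chrom: str
    position: int        # base-pair coordinate
    maf: float           # minor allele frequency (0 to 0.5)
    missing_rate: float  # fraction of individuals without a genotype call

def select_snps(snps, min_gap_bp=500_000, min_maf=0.03, max_missing=0.05):
    """Hypothetical SNP filter: MAF and call-rate thresholds, then greedy thinning."""
    passing = [s for s in snps if s.maf > min_maf and s.missing_rate < max_missing]
    passing.sort(key=lambda s: (s.chrom, s.position))
    selected = []
    last_kept = {}  # chromosome -> position of the last kept SNP
    for s in passing:
        prev = last_kept.get(s.chrom)
        # Keep a SNP only if it lies more than min_gap_bp past the last kept one,
        # so every pair of kept SNPs on a chromosome is more than min_gap_bp apart.
        if prev is None or s.position - prev > min_gap_bp:
            selected.append(s)
            last_kept[s.chrom] = s.position
    return selected
```

Greedy thinning keeps the first qualifying SNP on each chromosome; other orderings (for example, preferring higher MAF) would yield a different but equally valid set.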
This work was supported in Japan by the program for promotion of Fundamental Studies in Health Sciences of the National Institute of Biomedical Innovation (NiBio).
Table 1: Counts of the inferred number of subpopulations, chosen by the Bayesian information criterion (BIC), for the HapMap Chinese and Japanese data on 5,197 SNPs.
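Table 1 tallies how often each number of subpopulations is chosen by BIC. A minimal sketch of that bookkeeping follows, assuming Schwarz's BIC (smaller is better) and full-covariance Gaussian components in d principal-component dimensions; the covariance structure used by MM is not stated here, and the helper names are hypothetical. The log-likelihoods in the usage check are made-up values, not HapMap results.

```python
import math
from collections import Counter

def gmm_free_parameters(k, d):
    """Free parameters of a k-component, d-dimensional full-covariance Gaussian mixture."""
    weights = k - 1                    # mixing proportions sum to 1
    means = k * d
    covariances = k * d * (d + 1) // 2  # symmetric d x d matrix per component
    return weights + means + covariances

def bic(log_likelihood, n_params, n_samples):
    """Schwarz's Bayesian information criterion; smaller is better."""
    return -2.0 * log_likelihood + n_params * math.log(n_samples)

def choose_k(log_likelihoods, d, n_samples):
    """log_likelihoods maps k -> maximized log-likelihood of a k-component fit."""
    scores = {k: bic(ll, gmm_free_parameters(k, d), n_samples)
              for k, ll in log_likelihoods.items()}
    return min(scores, key=scores.get)

def tally(runs, d, n_samples):
    """Count how often each k is chosen across repeated runs."""
    return Counter(choose_k(run, d, n_samples) for run in runs)
```

The penalty term grows with k, so an extra component is chosen only when it raises the log-likelihood by more than half its added parameter count times log(n).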
Figure 1: Bayesian information criterion (BIC) values for the 5,197 SNPs of the HapMap Chinese and Japanese data. The result of 200 iterations of CMM is shown.
Genetic variation in PSCA is associated with susceptibility to diffuse-type gastric cancer, Hiromi Sakamoto, Kimio Yoshimura, Norihisa Saeki, Hitoshi Katai, Tadakazu Shimoda, Yoshihiro Matsuno, Daizo Saito, Haruhiko Sugimura, Fumihiko Tanioka, Shunji Kato, Norio Matsukura, Noriko Matsuda, Tsuneya Nakamura, Ichinosuke Hyodo, Tomohiro Nishina, Wataru Yasui, Hiroshi Hirose, Matsuhiko Hayashi, Emi Toshiro, Sumiko Ohnami, Akihiro Sekine, Yasunori Sato, Hirohiko Totsuka, Masataka Ando, Ryo Takemura, Yoriko Takahashi, Minoru Ohdaira, Kenichi Aoki, Izumi Honmyo, Suenori Chiku, Kazuhiko Aoyagi, Hiroki Sasaki, Shumpei Ohnami, Kazuyoshi Yanagihara, Kyong-Ah Yoon, Myeong-Cherl Kook, Yeon-Su Lee, Sook Ryun Park, Chan Gyoo Kim, Il Ju Choi, Teruhiko Yoshida, Yusuke Nakamura, and Setsuo Hirohashi, Nature Genetics 40(6), 730–740 (18 May 2008). doi:10.1038/ng.152
Suenori Chiku, Mizuho Information & Research Institute, Inc.
Kimio Yoshimura, Keio University School of Medicine
Teruhiko Yoshida, National Cancer Center Research Institute
Source: Protocol Exchange (2008) doi:10.1038/nprot.2008.129. Originally published online 10 July 2008.