Machine Learning for Genetic Studies
Additional plots and information:
Subsets
The full set of data obtained from the GWAS 1 consists of 7 833 SNPs and is henceforth referred to as the “full” set. Of these SNPs, 23 have a p-value below 1e-8 and are referred to as the “tops”; the five SNPs with the lowest p-values form the “top5” set. Two optional feature reduction methods are also implemented. ‘SelectKBest’ keeps the 100 most significant SNPs according to the ANOVA F-score, giving the “selected” set. PCA reduces the features of the full set to 100 “artificial” SNPs (principal components, which no longer correspond to individual SNPs), giving the “reduced” set. (Results from the full set are not yet available.)
Name | Reduction | Selection | Features (count) | Features are SNPs |
---|---|---|---|---|
Full | GWAS | No | 7 833 | Yes |
Tops | GWAS | p-value $< 10^{-8}$ | 23 | Yes |
Top5 | GWAS | min p-value | 5 | Yes |
Reduced | GWAS + PCA | top k | 100 | No |
Selected | GWAS + ANOVA F-score | top k | 100 | Yes |
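
The subsets in the table could be built roughly as in the sketch below. It is an illustration under stated assumptions, not the project's actual code: the function name `build_subsets`, the genotype matrix `X` (samples × SNPs), the binary phenotype `y`, and the per-SNP `p_values` array are all hypothetical, and the data here is small and randomly generated so the example runs on its own. It uses scikit-learn's `SelectKBest` with `f_classif` (the ANOVA F-score) and `PCA`.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif


def build_subsets(X, y, p_values, k=100, p_threshold=1e-8, n_top=5):
    """Return the feature subsets from the table as a dict of arrays.

    X        : (n_samples, n_snps) genotype matrix (hypothetical input)
    y        : (n_samples,) class labels (assumed binary phenotype)
    p_values : (n_snps,) GWAS p-value per SNP (hypothetical input)
    """
    # "Tops": SNPs whose GWAS p-value is below the threshold.
    tops_idx = np.flatnonzero(p_values < p_threshold)
    # "Top5": the n_top SNPs with the smallest p-values.
    top5_idx = np.argsort(p_values)[:n_top]
    # "Selected": k SNPs with the highest ANOVA F-score against y.
    selector = SelectKBest(score_func=f_classif, k=k).fit(X, y)
    # "Reduced": k principal components, no longer individual SNPs.
    pca = PCA(n_components=k).fit(X)
    return {
        "full": X,
        "tops": X[:, tops_idx],
        "top5": X[:, top5_idx],
        "selected": selector.transform(X),
        "reduced": pca.transform(X),
    }


# Synthetic stand-in data: 300 samples, 500 SNPs coded 0/1/2.
rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(300, 500)).astype(float)
y = rng.integers(0, 2, size=300)
p_values = rng.uniform(size=500)
p_values[:23] = 1e-10  # pretend 23 SNPs pass the threshold

subsets = build_subsets(X, y, p_values)
for name, data in subsets.items():
    print(name, data.shape)
```

Note that PCA cannot return more components than the smaller of the sample count and the feature count, so `k=100` only works when both are at least 100. In a real evaluation the selector and the PCA would normally be fitted on the training split only, to avoid leaking test information into the features.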