Genome-wide association studies (GWAS) and quantitative trait loci (QTL) mapping have been successful in identifying trait-associated genetic variants in humans and model organisms. However, for complex traits to which multiple loci contribute, variants with small individual effects may be missed by these methods because testing thousands of single-nucleotide polymorphisms (SNPs) limits statistical power. Here we present a case study applying a machine learning algorithm, ElasticNet regression, as a complement to conventional GWAS and QTL mapping, with the goal of identifying both major loci and loci that contribute only minor effects on their own but collectively drive substantial phenotypic diversity. As a test of the approach, we applied ElasticNet regression to identify loci that control plasticity in the gene regulatory network for C. elegans endoderm development. Our phenotypic analysis revealed large variation in the requirement for SKN-1 in endoderm development: while the laboratory N2 strain shows a partially penetrant loss-of-gut phenotype in skn-1(-) (30% of embryos produce gut), the fraction of embryos producing gut varies widely across 94 wild isotypes, ranging from 0% to ~60%. GWAS using efficient mixed-model analysis (EMMA) identified a single highly significant peak on chromosome IV that accounts for at least some of the variation across these strains. QTL mapping using recombinant inbred lines (RILs) obtained from crosses between N2 and the MY16 isotype (in which only ~2% of embryos make gut in skn-1(-)) identified a significant QTL on chromosome IV, likely corresponding to the locus identified by GWAS, as well as three additional QTL on chromosomes I, II, and X. Applied to the same dataset, ElasticNet not only identified all four regions found by conventional QTL mapping but also uncovered additional loci on chromosomes III and V (R² = 0.50, p = 1.95 × 10⁻⁷), suggesting that this method may provide a more sensitive strategy for identifying genomic variants responsible for complex genetic traits. We are currently testing the novel regions identified by the machine learning approach in near-isogenic lines of different genetic backgrounds to assess their impact on the observed variation in the endoderm gene regulatory network.
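To illustrate how ElasticNet regression can pick out one large-effect locus together with smaller ones, the following is a minimal sketch on simulated data, not the analysis pipeline used in this study. Everything in it is an assumption made for illustration: the marker count, the effect sizes, the penalty settings (`alpha`, `l1_ratio`), the independence of markers (real RIL markers are linked), and the hand-written coordinate-descent solver, which stands in for a library implementation such as scikit-learn's `ElasticNet`.

```python
import random

def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the L1 part of the penalty)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def elastic_net(X, y, alpha=0.05, l1_ratio=0.5, n_iter=200):
    """Coordinate descent minimizing
    (1/2n)*||y - Xw||^2 + alpha*l1_ratio*||w||_1 + 0.5*alpha*(1-l1_ratio)*||w||^2.
    X (rows = lines, columns = markers) and y are assumed mean-centered."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    residual = list(y)  # equals y - Xw while w is all zeros
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # marker does not vary; nothing to estimate
            # Covariance of marker j with the residual that excludes j's own fit.
            rho = sum(X[i][j] * residual[i] for i in range(n)) / n + col_sq[j] * w[j]
            new_w = soft_threshold(rho, alpha * l1_ratio) / (col_sq[j] + alpha * (1 - l1_ratio))
            delta = new_w - w[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= delta * X[i][j]
                w[j] = new_w
    return w

# Simulated data (hypothetical): 200 inbred lines, 30 unlinked biallelic markers
# coded 0/1, one large-effect and two small-effect loci, plus Gaussian noise.
random.seed(0)
n_lines, n_markers = 200, 30
true_effects = {3: 1.0, 10: 0.4, 20: 0.4}
genotypes = [[random.randint(0, 1) for _ in range(n_markers)] for _ in range(n_lines)]
phenotype = [sum(effect * row[j] for j, effect in true_effects.items()) + random.gauss(0, 0.3)
             for row in genotypes]

# Center markers and phenotype so that no intercept term is needed.
col_means = [sum(row[j] for row in genotypes) / n_lines for j in range(n_markers)]
X = [[row[j] - col_means[j] for j in range(n_markers)] for row in genotypes]
y_mean = sum(phenotype) / n_lines
y = [value - y_mean for value in phenotype]

weights = elastic_net(X, y)
selected = [j for j, w in enumerate(weights) if w != 0.0]
for j in selected:
    print(f"marker {j}: coefficient {weights[j]:.3f}")
```

Because all markers enter the model jointly, a small-effect marker is evaluated after the large-effect locus has been accounted for, rather than in a separate single-marker test. In a real analysis the penalty settings would typically be chosen by cross-validation rather than fixed as they are here.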