Table 5 Prediction rule for two classifiers based on one replicate

From: Large-scale risk prediction applied to Genetic Analysis Workshop 17 mini-exome sequence data

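| Feature | Empirical Bayes: Genes | Empirical Bayes: #SNP | Empirical Bayes: MAF | Random forest: Genes | Random forest: #SNP | Random forest: MAF |
|---|---|---|---|---|---|---|
| 1 | Age | | | Age | | |
| 2 | Smoke | | | Smoke | | |
| 3 | GOLGA1 | 1 | <0.01 | OR1L6 | | <0.01 |
| | | 1 | 0.01–0.05 | | 3 | 0.01–0.05 |
| | | 1 | ≥0.05 | | 1 | ≥0.05 |
| 4 | **FLT1** | 25 | <0.01 | VTI1B | 9 | <0.01 |
| | | 7 | 0.01–0.05 | | 1 | 0.01–0.05 |
| | | 3 | ≥0.05 | | 1 | ≥0.05 |
| 5 | NFKBIA | 6 | <0.01 | DENND1A | 19 | <0.01 |
| | | | 0.01–0.05 | | 3 | 0.01–0.05 |
| | | 2 | ≥0.05 | | 4 | ≥0.05 |
| 6 | DGKZ | 17 | <0.01 | C9ORF66 | 4 | <0.01 |
| | | 4 | 0.01–0.05 | | 3 | 0.01–0.05 |
| | | 1 | ≥0.05 | | 4 | ≥0.05 |
| 7 | SMTN | 23 | <0.01 | CECR1 | 8 | <0.01 |
| | | 4 | 0.01–0.05 | | | 0.01–0.05 |
| | | 2 | ≥0.05 | | 4 | ≥0.05 |
| 8 | PAK7 | 1 | 0.30 | MAP3K12 | 14 | <0.01 |
| | | | | | 3 | 0.01–0.05 |
| | | | | | | ≥0.05 |
| 9 | ADAM15 | 22 | <0.01 | SLC20A2 | 24 | <0.01 |
| | | 5 | 0.01–0.05 | | 4 | 0.01–0.05 |
| | | 3 | ≥0.05 | | 1 | ≥0.05 |
| 10 | ADAMTS4 | 33 | <0.01 | ALK | 9 | <0.01 |
| | | 4 | 0.01–0.05 | | 1 | 0.01–0.05 |
| | | 3 | ≥0.05 | | 6 | ≥0.05 |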

Top 10 features by importance from the model incorporating genes and environmental variables (Age and Smoke), based on one replicate, for our proposed method (empirical Bayes) and the random forest method. #SNP, number of SNPs within a specific gene. MAF is reported in three intervals of minor allele frequency: MAF < 0.01, 0.01 ≤ MAF < 0.05, and MAF ≥ 0.05. The boldfaced gene FLT1 is selected by the empirical Bayes method but is not observed using the random forest method.
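
The per-gene #SNP counts in the table are tallies of SNPs falling into each MAF interval. A minimal sketch of that binning is shown below. It assumes the upper interval includes 0.05, as the table's ≥0.05 label indicates. The function names and the example MAF values are hypothetical and are not taken from the paper or its data.

```python
# Interval labels as they appear in the table's MAF columns.
LOW = "<0.01"
MID = "0.01–0.05"
HIGH = "≥0.05"


def maf_bin(maf: float) -> str:
    """Assign a minor allele frequency to one of the three intervals."""
    if not 0.0 <= maf <= 0.5:
        raise ValueError("MAF must lie in [0, 0.5]")
    if maf < 0.01:
        return LOW
    if maf < 0.05:
        return MID
    return HIGH  # assumption: 0.05 itself belongs to the upper interval


def count_snps_by_bin(mafs):
    """Count the SNPs of one gene in each MAF interval (the #SNP column)."""
    counts = {LOW: 0, MID: 0, HIGH: 0}
    for maf in mafs:
        counts[maf_bin(maf)] += 1
    return counts


# Hypothetical MAF values for the SNPs of a single gene.
example_mafs = [0.002, 0.004, 0.03, 0.30]
print(count_snps_by_bin(example_mafs))
```

Each gene then contributes up to three rows, one per interval, which is how a single gene such as FLT1 spans several lines of the table.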