Table 1: Statistics of the metagenomes based on MG-RAST default QC filtering and annotation.

| Sample | UM022 | UM067 | UM037 | UM122 |
|---|---|---|---|---|
| MG-RAST ID | 4543594.3 | 4543596.3 | 4543595.3 | 4543593.3 |
| Pre-QC | | | | |
| Metagenome size (bp) | 3,030,324,503 | 2,611,156,821 | 2,249,873,193 | 2,197,833,062 |
| Total number of reads | 31,731,210 | 27,274,680 | 23,555,696 | 22,968,554 |
| Artificial duplicate reads | 3,052,366 | 3,165,803 | 2,575,726 | 2,865,700 |
| Post-QC | | | | |
| Metagenome size (bp) | 2,453,950,463 | 2,065,160,967 | 1,783,970,426 | 1,709,819,161 |
| Total number of reads | 25,342,659 | 21,287,088 | 18,415,034 | 17,635,818 |
| Mean sequence length (bp) | 96 ± 3 | 97 ± 3 | 96 ± 3 | 96 ± 3 |
| Mean GC content (%) | 44 ± 8 | 44 ± 8 | 45 ± 8 | 44 ± 8 |
| Predicted protein features | 19,180,540 | 15,866,409 | 14,057,605 | 13,310,544 |
| Predicted rRNA features | 4,717,758 | 4,212,007 | 3,680,207 | 3,720,011 |
| Identified protein featuresᵃ | 1,007,289 (5.3%) | 1,014,906 (6.4%) | 950,030 (6.8%) | 1,016,961 (7.4%) |
| Identified rRNA features | 7,575 (0.2%) | 20,627 (0.5%) | 14,166 (0.4%) | 10,590 (0.3%) |
| Identified functional categories | 102,026 | 92,558 | 119,189 | 99,250 |
| Total DRISEE error (%) | 8.412 | 7.492 | 6.837 | 6.227 |
| α-Diversity (species) | 19.761 | 12.142 | 13.682 | 15.790 |


ᵃ Features annotated using the M5NR protein database.
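
The table does not state how the percentages in parentheses are derived. A minimal sketch, assuming they express identified features as a share of the predicted features of the same type, checks this reading against the UM022 column. The helper `percent` and the variable names are hypothetical; the read-retention figure is computed here for illustration and does not appear in the table.

```python
def percent(part: int, whole: int, digits: int = 1) -> float:
    """Return part/whole as a percentage rounded to `digits` decimals."""
    return round(100.0 * part / whole, digits)


# UM022 values copied from Table 1.
pre_qc_reads = 31_731_210
post_qc_reads = 25_342_659
predicted_protein = 19_180_540
identified_protein = 1_007_289
predicted_rrna = 4_717_758
identified_rrna = 7_575

# Assumption: percentage = identified / predicted features of the same type.
protein_pct = percent(identified_protein, predicted_protein)
rrna_pct = percent(identified_rrna, predicted_rrna)

# Share of reads retained after QC (derived here, not reported in the table).
reads_retained_pct = percent(post_qc_reads, pre_qc_reads)

print(f"Identified protein features: {protein_pct}%")  # 5.3%
print(f"Identified rRNA features: {rrna_pct}%")        # 0.2%
print(f"Reads retained after QC: {reads_retained_pct}%")  # 79.9%
```

Under this assumption the UM022 values reproduce the reported 5.3% and 0.2%.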