
Table 1 Summary statistics for scaffolding of sequence and nonsequence based data

From: Enhancing genome assemblies by integrating non-sequence based data

| Run Included | Number of paired-end reads | Total scaffold span | N50 scaffold span | Number of scaffolds | Percentage of original contigs included into scaffolds |
|---|---|---|---|---|---|
| Initial Assembly | --- | --- | 36 KB | 277,711 | 0.0 % |
| Virtual map | 173,294 | 3204 MB | 39 KB | 271,687 | 2.2 % |
| 4kb library | 8,415,542 | 3069 MB | 49 KB | 165,909 | 40.2 % |
| 8kb library | 11,718,457 | 3177 MB | 52 KB | 202,026 | 27.2 % |
| Illumina libraries | 20,133,999 | 2829 MB | 78 KB | 129,290 | 53.4 % |
| All data | 20,407,293 | 2534 MB | 105 KB | 124,099 | 55.2 % |
| Ideal Genome | --- | 2700 MB | --- | 8 | --- |

  1. This table shows the relative contribution of each library to reducing the number of scaffolds and increasing the N50. All data are derived from the Bambus output statistics file. The left column indicates the data set used to enhance the assembly with Bambus. The number of paired-end reads is the total number of paired data points from each data set used to enhance the assembly. The total scaffold span provides an indirect assessment of the genome size based on the assembly and was used by Bambus to calculate the N50. The N50 is the size of the smallest scaffold in the smallest set of the largest scaffolds whose sizes add up to at least 50% of the total scaffold span. The number of scaffolds indicates the number of independently ordered regions in our assembly. The reduction in this number as each library is integrated reflects the ordering of the original contigs into larger scaffolds. The percentage in the last column is the reduction in the number of scaffolds relative to the initial number of contigs (277,711). The bottom row lists the ideal genome size (2.7 GB) and number of contigs (one for each chromosome = 8).
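
The two derived quantities described in the note, the N50 and the percentage reduction in scaffold number, can be illustrated with a minimal Python sketch. This is not the Bambus implementation: the function names `n50` and `percent_reduction` are hypothetical, the toy scaffold lengths are invented for illustration, and the percentage formula is an assumption based on the note's description (reduction relative to the 277,711 initial contigs).

```python
def n50(lengths):
    """Size of the smallest scaffold in the smallest set of largest
    scaffolds whose sizes sum to at least half of the total span."""
    ordered = sorted(lengths, reverse=True)  # largest scaffolds first
    if not ordered:
        raise ValueError("no scaffold lengths given")
    half_span = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_span:
            return length


def percent_reduction(initial_count, final_count):
    """Assumed formula: drop in scaffold number relative to the initial count."""
    return 100.0 * (initial_count - final_count) / initial_count


# Toy lengths (hypothetical): total span 32, so the two largest
# scaffolds (8 + 8 = 16) already cover half of it.
print(n50([2, 2, 2, 3, 3, 4, 8, 8]))                # 8

# Virtual map row: 277,711 initial contigs, 271,687 scaffolds afterwards.
print(round(percent_reduction(277711, 271687), 1))  # 2.2
```

Applying the same formula to the other rows gives values close to the published percentages, with small differences in the last digit for some rows; the table values, not the sketch, should be treated as authoritative.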