
Table 1 Summary statistics for scaffolding with sequence-based and non-sequence-based data

From: Enhancing genome assemblies by integrating non-sequence based data

| Run included | Number of paired-end reads | Total scaffold span (Mb) | N50 scaffold span (kb) | Number of scaffolds | Percentage of original contigs included into scaffolds (%) |
|---|---:|---:|---:|---:|---:|
| Initial assembly | --- | --- | 36 | 277,711 | 0.0 |
| Virtual map | 173,294 | 3,204 | 39 | 271,687 | 2.2 |
| 4 kb library | 8,415,542 | 3,069 | 49 | 165,909 | 40.2 |
| 8 kb library | 11,718,457 | 3,177 | 52 | 202,026 | 27.2 |
| Illumina libraries | 20,133,999 | 2,829 | 78 | 129,290 | 53.4 |
| All data | 20,407,293 | 2,534 | 105 | 124,099 | 55.2 |
| Ideal genome | --- | 2,700 | --- | 8 | --- |
  1. This table shows the relative contribution each data set makes to reducing the number of scaffolds and increasing the N50. All values are taken from the Bambus output statistics file. The left column indicates the data set used to enhance the assembly with Bambus. The number of paired-end reads is the total number of paired data points from each data set used to enhance the assembly. The total scaffold span provides an indirect estimate of genome size based on the assembly and was used by Bambus to calculate the N50. The N50 is the length of the shortest scaffold in the smallest set of longest scaffolds whose combined length reaches 50% of the respective total scaffold span. The number of scaffolds is the number of independently ordered regions in the assembly; its reduction as each data set is integrated reflects the joining and ordering of the original contigs into larger scaffolds. The last column gives the percentage reduction in the number of scaffolds relative to the number of contigs in the initial assembly (277,711). The bottom row lists the ideal genome size (2.7 Gb) and number of contigs (one per chromosome, i.e., 8).
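The N50 definition and the percentage-reduction column can be made concrete with a short Python sketch. This is a generic implementation of the standard N50 definition, not Bambus's own code. The function names `n50` and `percent_reduction` are hypothetical, the toy scaffold lengths are invented for illustration, and stopping as soon as the running total reaches *at least* half the span is an assumed convention for the exact-half case.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of sequence lengths.

    Sketch of the standard definition (not Bambus's code): sort lengths from
    longest to shortest, accumulate them, and return the length at which the
    running total first reaches at least half of the total span.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("need at least one length")
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:  # assumed convention: ">=" rather than ">"
            return length
    raise AssertionError("unreachable: running total always reaches the full sum")


def percent_reduction(initial_count: int, final_count: int) -> float:
    """Percentage reduction in scaffold count relative to the initial contig count."""
    return 100.0 * (initial_count - final_count) / initial_count


if __name__ == "__main__":
    # Hypothetical toy scaffold lengths, not taken from the table.
    toy = [100, 80, 50, 30, 20, 10, 10]
    # Total 300, half 150: 100 (< 150), then 100 + 80 = 180 (>= 150), so N50 = 80.
    print(n50(toy))

    # Scaffold counts from two rows of the table, relative to the initial 277,711 contigs.
    initial = 277_711
    for run, scaffolds in [("Virtual map", 271_687), ("Illumina libraries", 129_290)]:
        print(f"{run}: {percent_reduction(initial, scaffolds):.1f} %")
```

Sorting dominates the cost (O(n log n)), which is negligible for a few hundred thousand scaffolds.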