Enhancing genome assemblies by integrating non-sequence based data

BMC Proceedings

Table 1 Summary statistics for scaffolding with sequence and non-sequence based data

Data included in run	Number of paired-end reads	Total scaffold span	N50 scaffold span	Number of scaffolds	Percentage of original contigs incorporated into scaffolds
Initial Assembly	---	---	36 KB	277,711	0.0 %
Virtual map	173,294	3204 MB	39 KB	271,687	2.2 %
4kb library	8,415,542	3069 MB	49 KB	165,909	40.2 %
8kb library	11,718,457	3177 MB	52 KB	202,026	27.2 %
Illumina libraries	20,133,999	2829 MB	78 KB	129,290	53.4 %
All data	20,407,293	2534 MB	105 KB	124,099	55.2 %
Ideal Genome	---	2700 MB	---	8	---

This table shows the relative contribution of each library to reducing the number of scaffolds and increasing the N50. All data are derived from the Bambus output statistics file. The left column indicates the data set used to enhance the assembly with Bambus. The number of paired-end reads is the total number of paired data points from each data set used to enhance the assembly. The total scaffold span provides an indirect estimate of genome size based on the assembly and was used by Bambus to calculate the N50. The N50 is the size of the smallest scaffold in the smallest set of scaffolds whose sizes add up to at least 50% of the total scaffold span for that run. The number of scaffolds is the number of independently ordered regions in the assembly; the drop in this number as each library is added reflects the ordering and merging of the original contigs into larger scaffolds. The final column gives the percentage reduction in that number relative to the number of contigs in the initial assembly. The bottom row lists the ideal genome size (2.7 GB) and the ideal number of scaffolds (one per chromosome, 8 in total).
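The two derived columns can be illustrated with a minimal Python sketch. It is not the Bambus implementation: the function names `n50` and `percent_reduction` are hypothetical, and the sketch assumes the N50 is computed against the sum of the supplied scaffold lengths, standing in for the total scaffold span reported by Bambus.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of scaffold (or contig) lengths.

    Assumption: the reference total is the sum of the supplied lengths.
    """
    ordered = sorted(lengths, reverse=True)  # largest scaffolds first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first scaffold that pushes the running sum to at least
        # half of the total span is the N50 scaffold.
        if running >= half_total:
            return length
    raise ValueError("n50() requires at least one length")


def percent_reduction(initial_count: int, final_count: int) -> float:
    """Percentage drop in scaffold count relative to the initial contig count."""
    return 100.0 * (initial_count - final_count) / initial_count
```

For example, `n50([8, 8, 4, 3, 3, 2, 2, 2])` returns 8, because the two largest scaffolds already cover half of the total span of 32. Applied to the virtual map row, `percent_reduction(277711, 271687)` gives roughly 2.2, matching the last column of the table.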

ISSN: 1753-6561