SLiMSuite v1.11 sees the introduction of six genome assembly tools:
- DepthCharge = DepthCharge: Genome assembly quality control and misassembly repair. DepthCharge uses the depth of coverage of mapped long reads to charge through a genome assembly and identify coverage “cliffs” that may indicate a misassembly. If appropriate, it will then blast the assembly into fragments at those misassemblies.
- DepthKopy = DepthKopy: Read-depth based copy number estimation. DepthKopy applies the same single-copy read depth estimate as DepthSizer to estimate the copy number of different gene regions, using a slightly modified version of the approach from the Basenji genome paper.
- DepthSizer = DepthSizer: Read-depth based genome size prediction. DepthSizer uses long-read depth profiles and BUSCO single-copy orthologues to predict genome size. It works on the principle that Complete BUSCO genes should represent predominantly single-copy (diploid read depth) regions, along with some poor-quality and/or repeat regions. Assembly artefacts, collapsed repeats and similar features are predicted to deviate from diploid read depth in an inconsistent manner. Therefore, even if less than half of these regions actually have diploid coverage, the modal read depth is expected to represent the actual single-copy read depth.
- GapSpanner = GapSpanner: Genome assembly gap long read support and reassembly tool. GapSpanner uses (or generates) a BAM file of long reads mapped to a genome assembly to assess assembly “gaps” for spanning read support. Optionally, reads spanning each gap can be extracted and re-assembled with Flye. If the new assembly spans the gap, crude gap-filling can be performed. This will be reversed if edits are not subsequently supported by spanning reads mapped onto the updated assembly.
- NUMTFinder = NUMTFinder: Nuclear mitochondrial fragment (NUMT) search tool. NUMTFinder uses a mitochondrial genome to search against a genome assembly and identify putative NUMTs. NUMT fragments are then combined into NUMT blocks based on proximity.
- Taxolotl = Taxolotl: Genome assembly taxonomy summary and assessment tool. Taxolotl combines the MMseqs2 easy-taxonomy with GFF parsing to perform taxonomic analysis of a genome assembly (and any subsets given by taxsubsets=LIST) using an annotated proteome. Taxonomic assignments are mapped onto genes as well as assembly scaffolds and (if assembly=FILE is given) contigs.
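The read-depth principle behind DepthSizer and DepthKopy can be illustrated with a little arithmetic. The sketch below is not the tools' actual code: it assumes the single-copy depth is taken as the simple mode of per-base depths across BUSCO regions, that genome size is estimated as total sequenced bases divided by that depth, and that copy number is a region's mean depth divided by the same value. All function names and numbers are hypothetical.

```python
from collections import Counter


def modal_depth(depths):
    """Return the most frequent per-base read depth (smallest value on ties)."""
    counts = Counter(depths)
    top = max(counts.values())
    return min(d for d, n in counts.items() if n == top)


def estimate_genome_size(total_read_bases, sc_depth):
    """Assumed estimator: total sequenced bases / single-copy read depth."""
    return total_read_bases / sc_depth


def copy_number(region_mean_depth, sc_depth):
    """Assumed estimator: region depth relative to single-copy depth."""
    return region_mean_depth / sc_depth


# Toy per-base depths across BUSCO regions: mostly single copy (about 30X),
# plus a collapsed repeat (about 60X) and some poor-quality positions.
busco_depths = [30, 31, 30, 29, 30, 60, 61, 15, 30, 2]

sc_depth = modal_depth(busco_depths)
genome_size = estimate_genome_size(3_000_000, sc_depth)
cn = copy_number(60, sc_depth)

print(sc_depth, genome_size, cn)
```

Because the deviating positions are scattered across different depths, the mode stays at the single-copy value even though the outliers pull the mean away from it.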
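Combining NUMT fragments into blocks by proximity can be pictured as a simple interval merge. This is a minimal sketch under stated assumptions, not NUMTFinder's implementation: fragments are hypothetical `(sequence, start, end)` tuples, `max_gap` is an illustrative threshold rather than a real NUMTFinder option, and strand and mitochondrial coordinates are ignored.

```python
def merge_fragments(fragments, max_gap):
    """Merge (seqname, start, end) fragments on the same sequence that lie
    within max_gap bp of the previous block; return the list of blocks."""
    blocks = []
    for seq, start, end in sorted(fragments):
        if blocks and blocks[-1][0] == seq and start - blocks[-1][2] <= max_gap:
            # Close enough to the previous block: extend it.
            last = blocks[-1]
            blocks[-1] = (seq, last[1], max(last[2], end))
        else:
            # Different sequence or too far away: start a new block.
            blocks.append((seq, start, end))
    return blocks


# Hypothetical hits of a mitochondrial genome against an assembly.
hits = [
    ("chr1", 100, 200),
    ("chr1", 250, 400),
    ("chr1", 5000, 5100),
    ("chr2", 10, 50),
]
blocks = merge_fragments(hits, max_gap=100)
print(blocks)
```

Here the first two chr1 fragments are 50 bp apart and collapse into one block, while the distant chr1 fragment and the chr2 fragment each remain separate.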
Documentation for these tools can be found in their individual repos. Please note that individual repos may be ahead of the main SLiMSuite repo.
More information can also be found in the corresponding publications:
- Chen SH, Rossetto M, van der Merwe M, Lu-Irving P, Yap JS, Sauquet H, Bourke G, Amos TG, Bragg JG & Edwards RJ (accepted): Chromosome-level de novo genome assembly of Telopea speciosissima (New South Wales waratah) using long-reads, linked-reads and Hi-C. Molecular Ecology Resources. [Mol Ecol Res] [bioRxiv]
- Edwards RJ, Field MA, Ferguson JM, Dudchenko O, Keilwagen K, Rosen BD, Johnson GS, Rice ES, Hillier L, Hammond JM, Towarnicki SG, Omer A, Khan R, Skvortsova K, Bogdanovic O, Zammit RA, Aiden EL, Warren WC & Ballard JWO (2021): Chromosome-length genome assembly and structural variations of the primal Basenji dog (Canis lupus familiaris) genome. BMC Genomics 22:188 [BMC Genomics] [PubMed] [bioRxiv]
- Stuart KC*, Edwards RJ*, Cheng Y, Warren WC, Burt DW, Sherwin WB, Hofmeister NR, Werner SJ, Ball GF, Bateson M, Brandley MC, Buchanan KL, Cassey P, Clayton DF, De Meyer T, Meddle SL & Rollins LA (preprint): Transcript- and annotation-guided genome assembly of the European starling. bioRxiv 2021.04.07.438753; doi: 10.1101/2021.04.07.438753. [*Joint first authors] [bioRxiv]
See also the included release_notes.txt on GitHub for a full list of the Python module updates since v1.9.0.