Thursday, 19 September 2024

SLiMSuite Release v1.12.0 (2024-09-19)


The current SLiMSuite release is v1.12.0 (2024-09-19) and can be downloaded by clicking the button (left).

In addition to the tarball available via the button (left), SLiMSuite is now available as a GitHub repository (right).


See also: Installation and Setup.


The main updates in SLiMSuite v1.12.0 are:

  • Python3. The main tools in use have been updated and checked with Python3. Some older tools might still have bugs.
  • BUSCOMP now has a new mode, phylofas=T, to generate output of compiled and renamed files for BUSCO-based phylogenomics.
  • DepthKopy has undergone several upgrades and will now rate Duplicated BUSCOs (TRUE/FALSE) based on depth, chunk up input for multithreaded processing, and collapse features by depth to estimate total copy number. KAT can now generate the kmer usage in an alternative assembly for comparison.
  • DepthSizer has some miscellaneous bug fixes and the same parallelisation that was added to DepthKopy.
  • Diploidocus output has been updated for ChromSyn compatibility.
  • NUMTFinder has had several coverage and depth bugs fixed.
  • PAFScaff has been updated for rapid BUSCO-based mapping. Added purechrom=T to enable reciprocal PAFScaff runs for SharpClaw.
  • SynBad has received multiple updates to enable BUSCO mapping and update the assembly map output to be compatible with Telociraptor.
  • Telociraptor has received multiple updates to improve generation of tweaked assemblies from assembly maps. It also now features a chromosome sorting and renaming function based on size.
  • DepthCharge has a new minspan=INT minimum spanning bp at end of reads (trims from PAF alignments).
  • ChromSyn has undergone numerous updates and improvements: see the ChromSyn GitHub for details.

NOTE: Several tools are now maintained and updated more regularly in their own GitHub repos.

Wednesday, 12 January 2022

SLiMSuite release v1.11.0 (2022-01-12)


SLiMSuite v1.11 sees the introduction of six genome assembly tools:

  • DepthCharge = Genome assembly quality control and misassembly repair. DepthCharge is an assembly quality control and misassembly repair program. It uses mapped long read depth of coverage to charge through a genome assembly and identify coverage “cliffs” that may indicate a misassembly. If appropriate, it will then blast the assembly into fragments at those misassemblies.
  • DepthKopy = DepthKopy: Read-depth based copy number estimation. DepthKopy applies the same single-copy read depth estimate as DepthSizer to estimate the copy number of different gene regions in a slightly modified version of the approach used in the basenji genome paper.
  • DepthSizer = DepthSizer: Read-depth based genome size prediction. DepthSizer uses long-read depth profiles and BUSCO single-copy orthologues to predict genome size. DepthSizer works on the principle that Complete BUSCO genes should represent predominantly single copy (diploid read depth) regions along with some poor quality and/or repeat regions. Assembly artefacts and collapsed repeats etc. are predicted to deviate from diploid read depth in an inconsistent manner. Therefore, even if less than half the region is actually diploid coverage, the modal read depth is expected to represent the actual single copy read depth.
  • GapSpanner = GapSpanner: Genome assembly gap long read support and reassembly tool. GapSpanner uses (or generates) a BAM file of long reads mapped to a genome assembly to assess assembly “gaps” for spanning read support. Optionally, reads spanning each gap can be extracted and re-assembled with Flye. If the new assembly spans the gap, crude gap-filling can be performed. This will be reversed if edits are not subsequently supported by spanning reads mapped onto the updated assembly.
  • NUMTFinder = NUMTFinder: Nuclear mitochondrial fragment (NUMT) search tool. NUMTFinder uses a mitochondrial genome to search against a genome assembly and identify putative NUMTs. NUMT fragments are then combined into NUMT blocks based on proximity.
  • Taxolotl = Taxolotl: Genome assembly taxonomy summary and assessment tool. Taxolotl combines the MMseqs2 easy-taxonomy with GFF parsing to perform taxonomic analysis of a genome assembly (and any subsets given by taxsubsets=LIST) using an annotated proteome. Taxonomic assignments are mapped onto genes as well as assembly scaffolds and (if assembly=FILE is given) contigs.
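The NUMTFinder step of combining fragments into blocks based on proximity can be pictured as interval merging. The following is only a minimal sketch of that general idea, not NUMTFinder's actual code: the function name merge_into_blocks, the (name, start, end) tuple layout and the max_gap threshold are all hypothetical, chosen for illustration.

```python
from typing import List, Tuple

# Hypothetical fragment layout: (sequence name, start, end), 1-based inclusive.
Fragment = Tuple[str, int, int]


def merge_into_blocks(fragments: List[Fragment], max_gap: int = 1000) -> List[Fragment]:
    """Merge fragments on the same sequence that lie within max_gap bp of each other.

    max_gap is an illustrative value, not a documented NUMTFinder default.
    """
    blocks: List[Fragment] = []
    # Sorting by (name, start, end) puts neighbouring fragments next to each other.
    for name, start, end in sorted(fragments):
        if blocks and blocks[-1][0] == name and start - blocks[-1][2] <= max_gap:
            # Close enough to the previous block: extend it.
            prev_name, prev_start, prev_end = blocks[-1]
            blocks[-1] = (prev_name, prev_start, max(prev_end, end))
        else:
            # Different sequence, or too far away: start a new block.
            blocks.append((name, start, end))
    return blocks


fragments = [
    ("chr1", 100, 500),
    ("chr1", 900, 1200),
    ("chr1", 10000, 10400),
    ("chr2", 50, 300),
]
print(merge_into_blocks(fragments, max_gap=1000))
# → [('chr1', 100, 1200), ('chr1', 10000, 10400), ('chr2', 50, 300)]
```

The first two chr1 fragments are 400 bp apart and merge into one block; the third is 8,800 bp further on and stays separate.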

Documentation for these tools can be found in their individual repos. Please note that individual repos may be ahead of the main SLiMSuite repo.

More information can also be found in the corresponding publications:

See also the included release_notes.txt on GitHub for a full list of the python module updates since v1.9.0.

Monday, 11 October 2021

BUSCOMP v0.13.0 (MetaEuk) release

BUSCOMP v0.13.0 is now on GitHub. This release features updates to parse additional BUSCO v5 outputs, including transcriptome and proteome mode. It has also been updated to be compatible with MetaEuk runs by generating the missing *.fna files where possible.

The citation remains:

Stuart KC, Edwards RJ, Cheng Y, Warren WC, Burt DW, Sherwin WB, Hofmeister NR, Werner SJ, Ball GF, Bateson M, Brandley MC, Buchanan KL, Cassey P, Clayton DF, De Meyer T, Meddle SL, Rollins LA (preprint): Transcript- and annotation-guided genome assembly of the European starling. bioRxiv 2021.04.07.438753; doi: 10.1101/2021.04.07.438753. [*Joint first authors]

DepthSizer v1.4.0 (IndelRatio) release

DepthSizer v1.4.0 has been released on GitHub. DepthSizer is a program to estimate genome size from an assembly, long-read sequencing data, and BUSCO single-copy orthologue predictions.

DepthSizer works on the principle that Complete BUSCO genes should represent predominantly single copy (diploid read depth) regions along with some poor quality and/or repeat regions. Assembly artefacts and collapsed repeats etc. are predicted to deviate from diploid read depth in an inconsistent manner. Therefore, even if less than half the region is actually diploid coverage, the modal read depth is expected to represent the actual single copy read depth.
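As a rough illustration of this principle, the sketch below takes per-base depths from BUSCO regions, uses their mode as the single-copy depth, and divides the total read bases by it. This is a simplified sketch under stated assumptions, not DepthSizer's implementation: the function names are hypothetical, the raw mode stands in for any density-based estimate, and the formula (total read bases divided by single-copy depth) is assumed here as the basic depth-to-size relationship.

```python
from collections import Counter
from typing import Iterable


def modal_depth(depths: Iterable[int]) -> int:
    """Return the most common per-base read depth (ties go to the lower depth)."""
    counts = Counter(depths)
    if not counts:
        raise ValueError("no depth values given")
    return min(counts, key=lambda d: (-counts[d], d))


def estimate_genome_size(total_read_bases: int, single_copy_depth: float) -> float:
    """Assumed relationship: genome size = total read bases / single-copy depth."""
    if single_copy_depth <= 0:
        raise ValueError("single-copy depth must be positive")
    return total_read_bases / single_copy_depth


# Toy per-base depths across BUSCO regions: only 4 of the 10 positions sit at
# the true single-copy depth (30), yet 30 is still the mode.
busco_depths = [30, 31, 30, 29, 30, 60, 61, 15, 30, 31]
depth = modal_depth(busco_depths)
print(depth)  # → 30
print(estimate_genome_size(15_000_000_000, depth))  # → 500000000.0
```

The toy data mirrors the argument above: collapsed repeats (60, 61) and poor-quality regions (15) deviate in different directions, so they do not out-vote the single-copy depth even when it covers less than half of the positions.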

This release features an extensive reworking under the hood, which moves the main calculation into R and smooths the read depth modal density calculation. Some of the older, less accurate, approaches have been dropped in favour of some additional mapping adjustments that aim to frame the upper and lower bounds of genome size.

Current citation:

Chen SH, Rossetto M, van der Merwe M, Lu-Irving P, Yap JS, Sauquet H, Bourke G, Bragg JG & Edwards RJ (preprint): Chromosome-level de novo genome assembly of Telopea speciosissima (New South Wales waratah) using long-reads, linked-reads and Hi-C. bioRxiv 2021.06.02.444084; doi: 10.1101/2021.06.02.444084.

Sunday, 27 December 2020

SLiMSuite release v1.9.1 (2020-12-27)


SLiMSuite release v1.9.1 (2020-12-27) is now on GitHub and Zenodo:

SLiMSuite v1.9 sees the introduction of four genome assembly tools:

  • Diploidocus = Diploid genome assembly analysis toolkit. Includes assembly cleanup (haplotig/artefact removal), genome size prediction and read depth copy number analysis.
  • PAFScaff = Pairwise mApping Format reference-based scaffold anchoring and super-scaffolding. Uses minimap2 to map a genome assembly onto reference chromosomes.
  • SAAGA = Summarise, Annotate & Assess Genome Annotations. Uses a reference proteome to summarise and assess genome annotations.
  • SynBad = Synteny-based scaffolding adjustment tool for comparing two related genome assemblies and identifying putative translocations and inversions between the two that correspond to gap positions. (Development only.)

There have also been significant updates to:

  • BUSCOMP = BUSCO Compiler and Comparison tool. Used for genome assembly completeness estimates that are robust to sequence quality, and for compiling BUSCO results.

Other changes include some initial reformatting for Python3 compatibility. This is ongoing work; please report any odd behaviour.

See the included release_notes.txt for a full list of the python module updates since v1.8.1.

NOTE: At time of posting, the REST servers have not yet been updated with the latest version. This will happen soon.

Monday, 27 May 2019

SLiMSuite release v1.8.1 (2019-05-27)

SLiMSuite release v1.8.1 (2019-05-27) is now on GitHub and Zenodo:

This update has fast-forwarded the SLiMSuite release to v1.8.1 to be consistent with the tools/slimsuite.py wrapper script. A top level SLiMSuite.py file can now be run to access the main tools and functions of the package. The REST servers have also been updated to run this version of the code.

This release of SLiMSuite contains a number of updates related to the REST servers and some new tools, notably the SAMPhaser long-read diploid phasing algorithm and the BUSCOMP BUSCO compiler and comparison tool. See release notes (below) for more details.

SLiMSuite updates

Updates in extras/:

• rje_pydocs: Updated from Version 2.16.7.
→ Version 2.16.8: Updated to parse https.
→ Version 2.16.9: Tweaked docstring parsing.

Updates in libraries/:

• rje: Updated from Version 4.19.0.
→ Version 4.19.1: Added code for catching non-ASCII log filenames.
→ Version 4.20.0: Added quiet mode to log object and output of errors to stderr. Fixed rankList(unique=True)
→ Version 4.21.0: Added hashlib MD5 functions.
→ Version 4.21.1: Fixed bug where silent=T wasn't running silent.

• rje_blast_V2: Updated from Version 2.22.2.
→ Version 2.23.3: Fixed LocalIDCut error for GABLAM and QAssemble stat filtering.

• rje_db: Updated from Version 1.9.0.
→ Version 1.9.1: Updated logging of adding/removing fields: default is now when debugging only.

• rje_disorder: Updated from Version 1.2.0.
→ Version 1.3.0: Switched default behaviour to be md5acc=T.
→ Version 1.4.0: Fixed up disorder=parse and disorder=foldindex.
→ Version 1.5.0: Added iupred2 and anchor2 parsing from URL using accnum. Made default disorder=iushort2.

• rje_genbank: Updated from Version 1.5.3.
→ Version 1.5.4: Added recognition of *.gbff for genbank files.

• rje_obj: Updated from Version 2.2.2.
→ Version 2.3.0: Added quiet mode to object and stderr output.
→ Version 2.4.0: Added vLog() and bugLog() methods.
→ Version 2.4.1: Fixed bug where silent=T wasn't running silent.

• rje_paf: Created/Renamed/moved.
→ Version 0.0.0: Initial Compilation.
→ Version 0.1.0: Initial working version. Compatible with GABLAM v2.30.0 and Snapper v1.7.0.
→ Version 0.2.0: Added endextend=X : Extend minimap2 hits to end of sequence if within X bp [10]
→ Version 0.3.0: Added mapsplice mode for dealing with transcript mapping.
→ Version 0.3.1: Correct PAF splicing bug.
→ Version 0.4.0: Added TmpDir and forking for GABLAM conversion.
→ Version 0.5.0: Added uniquehit=T/F : Option to use *.hitunique.tdt table of unique coverage for GABLAM coverage stats [False]

• rje_ppi: Updated from Version 2.8.1.
→ Version 2.9.0: Added ppiout=FILE : Save pairwise PPI file following processing (if rest=None) [None]

• rje_qsub: Updated from Version 1.9.2.
→ Version 1.9.3: Updates the order of the qsub -S /bin/bash flag.

• rje_rmd: Created/Renamed/moved.
→ Version 0.0.0: Initial Compilation.

• rje_samtools: Updated from Version 1.20.0.
→ Version 1.20.1: Fixed mlen bug. Added catching of unmapped reads in SAM file. Fixed RLen bug. Changed softclip defaults.
→ Version 1.20.2: Fixed readlen coverage bug and acut bug.

• rje_seq: Updated from Version 3.25.0.
→ Version 3.25.1: Fixed -long_seqids retrieval bug.
→ Version 3.25.2: Fixed 9spec filtering bug.

• rje_seqlist: Updated from Version 1.29.0.
→ Version 1.30.0: Updated and improved DNA2Protein.
→ Version 1.31.0: Added genecounter to rename option for use with other programs, e.g. PAGSAT.
→ Version 1.31.1: Fixed edit bug when not in DNA mode.
→ Version 1.32.0: Added genomesize and NG50/LG50 to DNA summarise.
→ Version 1.32.1: Fixed LG50/L50 bug.
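
For readers unfamiliar with the NG50/LG50 statistics mentioned above, the sketch below uses the standard definitions: N50 is the length of the sequence at which the cumulative length of the longest sequences first reaches half the assembly size, L50 is the number of sequences needed to get there, and NG50/LG50 replace the assembly size with a supplied genome size. This is an illustrative sketch only and is not taken from rje_seqlist; the function name nx50 is hypothetical.

```python
from typing import List, Optional, Tuple


def nx50(lengths: List[int], genome_size: Optional[int] = None) -> Tuple[int, int]:
    """Return (N50, L50), or (NG50, LG50) when genome_size is given.

    Returns (0, 0) if the sequences never reach half of the target size.
    """
    total = genome_size if genome_size is not None else sum(lengths)
    target = total / 2
    running = 0
    # Walk from the longest sequence down, accumulating length.
    for count, length in enumerate(sorted(lengths, reverse=True), start=1):
        running += length
        if running >= target:
            return length, count
    return 0, 0


lengths = [80, 70, 50, 40, 30, 20]          # assembly size = 290
print(nx50(lengths))                        # → (70, 2)
print(nx50(lengths, genome_size=400))       # → (50, 3)
print(nx50(lengths, genome_size=1000))      # → (0, 0)
```

With a genome size larger than the assembly, NG50 drops below N50 and LG50 rises above L50, and both become undefined (shown here as zero) once the assembly covers less than half the genome.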

• rje_sequence: Updated from Version 2.6.0.
→ Version 2.7.0: Added shift=X to maskRegion() for 1-L input. Fixed cterminal maskRegion.

• rje_slimcore: Updated from Version 2.9.0.
→ Version 2.10.0: Added seqfilter=T/F : Whether to apply sequence filtering options (goodX, badX etc.) to input [False]
→ Version 2.10.1: Fixed default results file bug.
→ Version 2.10.2: Improved handling and REST output of disorder scores.
→ Version 2.11.0: Modified qregion=X,Y to be 1-L numbering.

• rje_slimlist: Updated from Version 1.7.3.
→ Version 1.7.4: Modified concatenation of SLiMSuite results to use "|" in place of "#" for better compatibility.

• rje_uniprot: Updated from Version 3.25.0.
→ Version 3.25.1: Fixed proteome download bug following Uniprot changes.
→ Version 3.25.2: Fixed Uniprot protein extraction issues by using curl. (May not be a robust fix!)

Updates in tools/:

• buscomp: Created/Renamed/moved.
→ Version 0.0.0: Initial Compilation.
→ Version 0.1.0: Basic working version.
→ Version 0.2.0: Functional version with basic RMarkdown HTML output.
→ Version 0.3.0: Added ratefas=FILELIST: Additional fasta files of assemblies to rate with BUSCOMPSeq (No BUSCO run) [].
→ Version 0.4.0: Implemented forking and tidied up output a little.
→ Version 0.5.0: Updated genome stats and RMarkdown HTML output. Reorganised assembly loading and processing. Added menus.
→ Version 0.5.1: Reorganised code for clearer flow and documentation. Unique and missing BUSCO output added.
→ Version 0.5.2: Dropped paircomp method and added Rmarkdown control methods. Updated Rmarkdown descriptions. Updated log output.
→ Version 0.5.3: Tweaked log output and fixed a few minor bugs.
→ Version 0.5.4: Deleted some excess code and tweaked BUSCO percentage plot outputs.
→ Version 0.5.5: Fixed minlocid bug and cleared up minimap temp directories. Added LnnIDxx to BUSCOMP outputs.
→ Version 0.5.6: Added uniquehit=T/F : Option to use *.hitunique.tdt table of unique coverage for GABLAM coverage stats [False]
→ Version 0.6.0: Added more minimap options, changed defaults and dev generation of a table of changes in ratings from BUSCO to BUSCOMP.
→ Version 0.6.1: Fixed bug that was including Duplicated sequences in the buscomp.fasta file. Added option to exclude from BUSCOMPSeq compilation.
→ Version 0.6.2: Fixed bug introduced that had broken manual group review/editing.
→ Version 0.7.0: Updated the defaults in the light of test analyses. Tweaked Rmd report.
→ Version 0.7.1: Fixed unique group count bug when some genomes are not in a group. Fixed running with non-standard options.
→ Version 0.7.2: Added loadsummary=T/F option to regenerate summaries and fixed bugs running without BUSCO results.

• comparimotif_V3: Updated from Version 3.13.0.
→ Version 3.14.0: Modified memsaver mode to take different input formats.

• gablam: Updated from Version 2.29.0.
→ Version 2.30.0: Added mapper=X : Program to use for mapping files against each other (blast/minimap) [blast]
→ Version 2.30.1: Fixed BLAST LocalIDCut error for GABLAM and QAssemble stat filtering.

• gopher: Updated from Version 3.4.3.
→ Version 3.5.0: Added separate outputs for trees with different alignment programs.
→ Version 3.5.1: Added capacity to run DNA GOPHER with tblastx. (Not tested!)
→ Version 3.5.2: Added acc=LIST as alias for uniprotid=LIST and updated docstring for REST to make it clear that rest=X needed.

• haqesac: Updated from Version 1.12.0.
→ Version 1.13.0: Modified qregion=X,Y to be 1-L numbering.

• pagsat: Updated from Version 2.4.0.
→ Version 2.5.0: Reduced the executed code when mapfas=T assessment=F. (Recommended first run.) Added renaming.
→ Version 2.5.1: Added recognition of *.gbff for genbank files.
→ Version 2.6.0: Added mapper=X : Program to use for mapping files against each other (blast/minimap) [blast]
→ Version 2.6.1: Switch failure to find key report files to a long warning, not program exit.
→ Version 2.6.2: Fixed bugs with mapper=minimap mode and started adding more internal documentation.
→ Version 2.6.3: Fixed default behaviour to run report=T mode.
→ Version 2.6.4: Fixed summary table merge bug.
→ Version 2.6.5: Fixed compile path bug.
→ Version 2.6.6: Fixed BLAST LocalIDCut error for GABLAM and QAssemble stat filtering.
→ Version 2.6.7: Generalised compile path bug fix.
→ Version 2.6.8: Added ChromXcov fields to PAGSAT Compare.

• pingu_V4: Updated from Version 4.9.0.
→ Version 4.9.1: Fixed Pairwise parsing and filtering for more flexibility of input. Fixed fasid=X bug and ppiseqfile names.
→ Version 4.10.0: Added hubfield and spokefield options for parsing hublist.

• qslimfinder: Updated from Version 2.2.0.
→ Version 2.3.0: Modified qregion=X,Y to be 1-L numbering.

• samphaser: Created/Renamed/moved.
→ Version 0.0.0: Initial Compilation.
→ Version 0.1.0: Updated SAMPhaser to be more memory efficient.
→ Version 0.2.0: Added reading of sequence and generation of SNP-altered haplotype blocks.
→ Version 0.2.1: Fixed bug in which zero-phasing sequences were being excluded from blocks output.
→ Version 0.3.0: Made a new unzip process.
→ Version 0.4.0: Added RGraphics for unzip.
→ Version 0.4.1: Fixed MeanX bug in devUnzip.
→ Version 0.4.2: Made phaseindels=F by default: mononucleotide indel errors will probably add phasing noise. Fixed basefile R bug.
→ Version 0.4.3: Fixed bug introduced by adding depthplot code. Fixed phaseindels bug. (Wasn't working!)
→ Version 0.4.4: Modified mincut=X to adjust for samtools V1.12.0.
→ Version 0.4.5: Updated for modified RJE_SAMTools output.
→ Version 0.4.6: splitzero=X : Whether to split haplotigs at zero-coverage regions of X+ bp (-1 = no split) [100]
→ Version 0.5.0: snptable=T/F : Output filtered alleles to SNP Table [False]
→ Version 0.6.0: Converted haplotig naming to be consistent for PAGSAT generation. Updated for rje_samtools v1.21.1.
→ Version 0.7.0: Added skiploci=LIST and phaseloci=LIST : Optional list of loci to skip phasing []
→ Version 0.8.0: poordepth=T/F : Whether to include reads with poor track probability in haplotig depth plots (random track) [False]

• seqmapper: Updated from Version 2.2.0.
→ Version 2.3.0: Added GABLAM-free method.

• seqsuite: Updated from Version 1.19.1.
→ Version 1.20.0: Added rje_paf.PAF.
→ Version 1.21.0: Added NG50 and LG50 to batch summarise.
→ Version 1.22.0: Added BUSCOMP to programs.
→ Version 1.23.0: Added rje_ppi.PPI to programs.

• slimbench: Updated from Version 2.18.2.
→ Version 2.18.3: Added better handling of motifs without TP occurrences for OccBench. Added minocctp=INT.
→ Version 2.18.4: Fixed ELMBench rating bug.
→ Version 2.18.5: Fixed Balanced=F bug.
→ Version 2.19.0: Implemented dataset=LIST: List of headers to split dataset into. If blank, will use datatype defaults. []

• slimfarmer: Updated from Version 1.9.0.
→ Version 1.10.0: Added appending contents of jobini file to slimsuite=F farm commands.

• slimfinder: Updated from Version 5.3.4.
→ Version 5.3.5: Fixed slimcheck and advanced stats models bug.
→ Version 5.4.0: Modified qregion=X,Y to be 1-L numbering.

• slimparser: Updated from Version 0.5.0.
→ Version 0.5.1: Minor docs and bug fixes.
→ Version 0.6.0: Improved functionality as a replacement for pureapi with rest=jobid and rest=check functions.

• slimsuite: Updated from Version 1.7.1.
→ Version 1.8.0: Added BUSCOMP and basic test function.
→ Version 1.8.1: Updated documentation and added IUPred2. General tidy up and new example data for protocols paper.

• smrtscape: Updated from Version 2.2.2.
→ Version 2.2.3: Fixed bug where SMRT subreads are not returned by seqlist in correct order. Fixed RQ=0 bug.

• snapper: Updated from Version 1.6.1.
→ Version 1.7.0: Added mapper=minimap setting, compatible with GABLAM v2.30.0 and rje_paf v0.1.0.


© RJ Edwards 2019. Last modified 27 May 2019.

Monday, 2 July 2018

SLiMSuite Downloads

UPDATE: Please see the Downloads page for the most recent release.



The current SLiMSuite release is v1.4.0 (2018-07-02) and can be downloaded by clicking the button (left).

In addition to the tarball available via the links above, SLiMSuite is available as a GitHub repository (right).


See also: Installation and Setup.

Previous Releases