Wednesday 17 June 2015

Sequence names and species codes for GOPHER

GOPHER (and any tool using orthologue alignments produced by GOPHER) needs sequence names to be formatted in a particular way so that the species information can be correctly parsed. This “SLiMSuite fasta” format is the only sequence format fully supported by SLiMSuite. If you are getting an unexpected error, sequence formatting and naming are among the first things to check. The format should not break any other programs that I know about.

This format is:

>Gene_SPCODE__AccNum [Description]
SEQUENCE


  • Gene is not used for anything and is purely for easy visual identification.
  • SPCODE is the species code. Where possible, Uniprot species mnemonics should be used, but any short code can be used as long as (a) it contains uppercase letters and numbers only (no symbols), and (b) it is used consistently within a species/database (i.e. you can make it up as long as all sequences from the same species use the same code).
  • AccNum is the accession number, which is what is used as the unique sequence identifier.
  • Description is optional and can contain any other text.
  • SEQUENCE can be on one or more lines and contain spaces. However, it is best to have a single SEQUENCE line with no whitespace. (Some programs may enforce this.)
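
As a rough illustration of the naming rules above, here is a minimal Python sketch that splits a header into its parts and joins multi-line sequences. It is not SLiMSuite code: the names NAME_RE, parse_name and read_fasta are made up for this example, and the regular expression is just my reading of the format described here.

```python
import re

# Hypothetical pattern for ">Gene_SPCODE__AccNum [Description]".
# SPCODE is restricted to uppercase letters and numbers, as described above.
NAME_RE = re.compile(
    r"^>?(?P<gene>\S+)_(?P<spcode>[A-Z0-9]+)__(?P<accnum>\S+)(?:\s+(?P<desc>.*))?$"
)


def parse_name(header):
    """Return a dict of gene, spcode, accnum and desc, or None if the name does not fit."""
    match = NAME_RE.match(header.strip())
    if match is None:
        return None
    parts = match.groupdict()
    parts["desc"] = parts["desc"] or ""  # Description is optional
    return parts


def read_fasta(lines):
    """Yield (header, sequence) pairs, removing whitespace and line breaks from sequences."""
    header, chunks = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line, []
        elif header is not None:
            chunks.append("".join(line.split()))
    if header is not None:
        yield header, "".join(chunks)


if __name__ == "__main__":
    example = [">LIG_HUMAN__P12345 Example protein\n", "MKT LLV\n", "AAG\n"]
    for name, seq in read_fasta(example):
        print(parse_name(name), seq)
```

A name that returns None from parse_name (for example, one with no double underscore before the accession number) is a likely candidate for the kind of formatting problem mentioned above.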

SeqSuite can be used to rename and reformat sequences, using the seq and seqlist programs.

Uniprot downloads should be automatically recognised and converted where needed.
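
To give a feel for what such a conversion involves, the sketch below rewrites a standard Uniprot fasta header (>sp|Accession|EntryName Description) into the format above. This is only an illustration under my own assumptions, not the conversion code used by SLiMSuite: UNIPROT_RE and uniprot_to_slimsuite are hypothetical names, and I am assuming the Uniprot entry name (e.g. P53_HUMAN) is used directly as Gene_SPCODE.

```python
import re

# Standard Uniprot fasta header: >sp|ACCESSION|ENTRY_NAME Description ...
# (sp = Swiss-Prot, tr = TrEMBL)
UNIPROT_RE = re.compile(
    r"^>(?:sp|tr)\|(?P<acc>[^|]+)\|(?P<entry>\S+)(?:\s+(?P<desc>.*))?$"
)


def uniprot_to_slimsuite(header):
    """Convert a Uniprot fasta header to >Gene_SPCODE__AccNum [Description].

    Headers that do not look like Uniprot headers are returned unchanged.
    """
    header = header.strip()
    match = UNIPROT_RE.match(header)
    if match is None:
        return header
    # Assumption: the entry name (GENE_SPECIES) already has the Gene_SPCODE form.
    name = ">%s__%s" % (match.group("entry"), match.group("acc"))
    desc = match.group("desc")
    return name + " " + desc if desc else name


if __name__ == "__main__":
    print(uniprot_to_slimsuite(">sp|P04637|P53_HUMAN Cellular tumor antigen p53 OS=Homo sapiens"))
```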

Monday 1 June 2015

SLiMSuite release 2015-06-01 now available

A new download of SLiMSuite (release 2015-06-01) is now available. This is the first release in the new git repository. A tarball, slimsuite.2015-06-01.tgz, containing the same code is also available. Once unpacked, it should be possible to pull down additional updates with git. (This release corresponds to the UCD svn repo r895.)

The major change since the last release is a general tidying of the repository in preparation for going on GitHub and tidying documentation for the new online help via the SLiMSuite REST Server.


To try out the new documentation for a given program, replace sitemap in the box and click View Documentation. Leaving sitemap in the box will list all modules, which can then be clicked on.

The old PDF Manuals are still included in the release and can be accessed from the EdwardsLab Software page. These will be updated eventually but the focus is currently on getting module docstrings and the online help up-to-date. As ever, please get in touch if you have any questions.

This release also sees the addition of a new tool, SLiMParser, for running/parsing the new REST servers. SLiMMaker has also undergone some improvements and now features: (1) basic peptide alignment prior to motif generation; (2) extension of degenerate sites using an “equivalence” list of similar amino acids.
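
To illustrate the general idea behind the “equivalence” list, here is a small Python sketch. It is not the SLiMMaker implementation: the function name extend_site is made up, and the rule used (widen a degenerate site to the smallest equivalence group that contains every observed amino acid) is only one plausible reading of the feature. The groups are the equiv=LIST defaults given in the slimmaker 1.6.0 entry in the update list.

```python
# Default equivalence groups as listed for slimmaker Version 1.6.0.
EQUIV = ["AGS", "ILMVF", "FYW", "FYH", "KRH", "DE", "ST"]


def extend_site(observed, equiv=EQUIV):
    """Return the amino acids for one motif position as a sorted string.

    Assumed rule (for illustration only): if two or more different amino acids
    are observed and they all fall within an equivalence group, the site is
    extended to the smallest such group. Otherwise it is left as observed.
    """
    observed = set(observed)
    if len(observed) < 2:
        return "".join(sorted(observed))  # fixed positions are not extended
    candidates = [group for group in equiv if observed <= set(group)]
    if not candidates:
        return "".join(sorted(observed))  # no single group covers the site
    best = min(candidates, key=len)
    return "".join(sorted(best))


if __name__ == "__main__":
    for site in ("IL", "KR", "ST", "AW", "P"):
        print(site, "->", "[%s]" % extend_site(site))
```

Under this rule, a site where only I and L were seen would be widened to the whole ILMVF group, whereas a site with A and W would be left alone because no single group contains both.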

A full list of updates is given below.

Updates since previous release

Updates in tools/:

• gablam: Updated from Version 2.16.1.
→ Version 2.17.0: Added localalnfas=T/F : Whether to output local alignments to *.local.fas fasta file (if local=T) [False]
→ Version 2.17.1: Fixed bug where query and hit lengths were not being output for fullblast.
→ Version 2.18.0: Added blaste filtering to be applied to existing BLAST results.
→ Version 2.19.0: Added maxall=X limits to all-by-all analyses. Added qassemble=T.
→ Version 2.19.1: Fixed handling of basefile and results generation for blastres=FILE.
→ Version 2.19.2: Modified output to be in rank order.

• gopher: Updated from Version 3.4.
→ Version 3.4.1: Fixed stripXGap issue. (Why was this being implemented anyway?). Added REST output.

• haqesac: Updated from Version 1.10.
→ Version 1.10.1: Tweaked QryVar interactivity.
→ Version 1.10.2: Corrected typos and disabled buggy post-HAQESAC data reduction.

• multihaq: Updated from Version 1.2.
→ Version 1.2.1: Updated documentation to include the HAQESAC reference.
→ Version 1.2.2: Switched default to keepblast=T. Added forking blasta=X command to BLAST.

• peptcluster: Updated from Version 1.4.
→ Version 1.5.0: Added peptalign=T/F/X function for aligning peptides using regex or minimal gap addition. Added REST.
→ Version 1.5.1: Updated REST output. Removed peptide redundancy.

• pingu_V4: Updated from Version 4.3.
→ Version 4.4.0: Converted ppicompile=T to ppicompile=LIST.
→ Version 4.5.0: Added hublist=LIST : List of hub genes to restrict pairwise PPI to, and pairwise parsing.

• qslimfinder: Updated from Version 2.0.
→ Version 2.1.0: Added PTMData and PTMList options.

• seqsuite: Updated from Version 1.4.0.
→ Version 1.5.0: Added extatic.ExTATIC and revert.REVERT. NOTE: Dev only.
→ Version 1.5.1: Added 'seq' as alias for 'rje_seq' - want to avoid rje_ prefix requirements.
→ Version 1.6.0: Added mitab and rje_mitab for MITAB parsing.
→ Version 1.6.1: Added extra error messages.
→ Version 1.7.0: Added pingu_V4.PINGU.
→ Version 1.8.0: Added rje_pacbio.PacBio.

• slimbench: Updated from Version 2.8.0.
→ Version 2.8.1: Removed use of Protein name for ELM Uniprot entries due to problems mapping old IDs.
→ Version 2.9.0: Added SLiMMaker ELM reduction table and output.
→ Version 2.9.1: Enabled download only with generate=F benchmark=F.
→ Version 2.10.0: Add generation of table mapping PPIBench dataset generation.

• slimfarmer: Updated from Version 1.4.1.
→ Version 1.4.2: Fixed log transfer issues due to new #VIO line. Better handling of crashed runs.

• slimfinder: Updated from Version 5.1.
→ Version 5.1.1: Modified alphabet handling and fixed musthave bug.
→ Version 5.2.0: Added PTMList and PTMData modes (dev only).

• slimmaker: Updated from Version 1.2.0.
→ Version 1.3.0: Added varlength option to identify gaps in aligned peptides and generate variable length motif.
→ Version 1.3.1: Fixed varlength option to work with end of peptide gaps. (Gaps ignored completely - should not be there!)
→ Version 1.4.0: Add iteration REST output.
→ Version 1.4.1: Add unmatched peptides REST output.
→ Version 1.4.2: Fixed bug with variable length wildcards at start of sequence.
→ Version 1.5.0: Added peptalign=X functionality, using PeptCluster peptide alignment.
→ Version 1.6.0: Added equiv=LIST : List (or file) of TEIRESIAS-style ambiguities to use [AGS,ILMVF,FYW,FYH,KRH,DE,ST]
→ Version 1.6.1: Fixed peptide case bug.

• slimparser: Created/Renamed/moved.
→ Version 0.0.0: Initial Compilation.
→ Version 0.0.1: Fixed RestKeys bug.
→ Version 0.1.0: Added retrieval and parsing of existing server job. Added password.
→ Version 0.2.0: Added API access to REST server if restin is REST call (i.e. starts with http:)
→ Version 0.2.1: Added PureAPI output of API REST call returned text.
→ Version 0.3.0: Added parsing of input files to give to rest calls.
→ Version 0.3.1: Fixed issue that had broken REST server full output.

• slimprob: Updated from Version 2.2.0.
→ Version 2.2.1: Updated REST output.
→ Version 2.2.2: Modified input to allow motif=X in addition to motifs=X.
→ Version 2.2.3: Tweaked basefile setting and citation.

• slimsuite: Updated from Version 1.3.0.
→ Version 1.4.0: Added RLC and Disorder progs to call SLiMCore. Added CompariMotif.
→ Version 1.5.0: Added peptcluster and peptalign calls.

Updates in extras/:

• file_monster: Created/Renamed/moved.
→ Version 0.0: Initial Compilation.
→ Version 1.0: Initial Working version
→ Version 1.1: Broadened away from strict extension-based scavenging to whole file names with wildcards
→ Version 1.2: Added DirSum function and updated FileMonster slightly.
→ Version 1.3: Added redundant file cleanup
→ Version 1.4: Added skiplist and purgelist
→ Version 1.5: Added rename function (to replace Perl module)
→ Version 1.6: Minor bug fix.
→ Version 2.0: Major reworking with new object making use of rje_db tables etc. Old functions to be ported with time.
→ Version 2.1: Added dirsum function.
→ Version 2.2: Added fixendings=FILELIST to convert Mac \r into UNIX \n

• prodigis: Created/Renamed/moved.
→ Version 0.0: Initial Compilation.
→ Version 0.1: Added probability calculations based on hydrophobicity, serine and cysteine.
→ Version 0.2: Added cysteine count and cysteine weighting.

• rje_glossary: Created/Renamed/moved.
→ Version 0.0: Initial Compilation.
→ Version 1.0: Working version, including text setup for webserver.
→ Version 1.1: Added href=T option to add external hyperlinks for and [text] in descriptions [True]
→ Version 1.2: Added recognition of _italics_ markup.
→ Version 1.3: Fixed minor italicising bug.
→ Version 1.4: Added keeporder=T/F to maintain input order (e.g. for MapTime).

• rje_itunes: Created/Renamed/moved.
→ Version 0.0: Initial Compilation.
→ Version 0.1: Added Plays/Track, default Album Artist and topHTML() method.

• rje_phos: Created/Renamed/moved.
→ Version 0.0: Initial Compilation. Basic pELM parsing done.
→ Version 0.1: Added phosBLAST method.

• rje_pydocs: Updated from Version 2.14.0.
→ Version 2.15.0: Added parsing and generation of "pages" for new rest server docs functions.
→ Version 2.15.1: Tweaked formatting of outfmt and docstring documentation.
→ Version 2.15.2: Tweaked formatting of docstring documentation.
→ Version 2.15.3: Fixed URL formatting of docstring documentation.
→ Version 2.16.0: Added Webserver tab to doc parsing from settings/*.form.
→ Version 2.16.1: Added parsing of imports within a try/except block. (Cannot be on same line as try: or except:)
→ Version 2.16.2: Tweaked makePages() output.

• rje_seqplot: Created/Renamed/moved.
→ Version 0.0: Initial Compilation.

• rje_ssds: Created/Renamed/moved.
→ Version 0.0: Initial Compilation.

• rje_yeast: Created/Renamed/moved.
→ Version 0.0: Initial Compilation.

• wormpump: Created/Renamed/moved.
→ Version 0.0: Initial Compilation.

Updates in libraries/:

• rje: Updated from Version 4.13.1.
→ Version 4.13.2: Removed excess REST HTML methods.
→ Version 4.13.3: Added uselower=False to dataDict() method.
→ Version 4.13.4: Added maxrep=X to listCombos() method.
→ Version 4.14.0: Added listToDict() method.
→ Version 4.15.1: Fixed matchExp method to be able to handle multilines. (Shame re.DOTALL doesn't work!)

• rje_blast_V2: Updated from Version 2.7.
→ Version 2.7.1: Added capacity to keep alignments following GABLAM calculations.
→ Version 2.7.2: Fixed bug with hitToSeq fasta output for rje_seqlist.SeqList objects.
→ Version 2.8.0: A more significant BLAST e-value setting will filter read results.
→ Version 2.9.0: Added qassemble=T/F : Whether to fully assemble query stats from all hits [False].
→ Version 2.9.1: Updated default BLAST and BLAST+ paths to '' for added modules.

• rje_db: Updated from Version 1.7.1.
→ Version 1.7.2: Fixed numerical join issue during Table.compress().
→ Version 1.7.3: Added lower case enforcement of headers for reading tables from file.
→ Version 1.7.4: Added optional restricted Field set for output.
→ Version 1.7.5: Added more error messages and tableNames() method.

• rje_ensembl: Updated from Version 2.14.
→ Version 2.15.0: Added capacity to download/process a section of Ensembl with speclist=LIST.
→ Version 2.15.1: Improved error handling for too many FTP connections: still need to fix problem!
→ Version 2.15.2: Trying to improve speed of Uniprot parsing for EnsLoci.

• rje_genbank: Updated from Version 1.2.2.
→ Version 1.3.0: Added split viral output.
→ Version 1.3.1: Fixed bug in split viral output.

• rje_html: Updated from Version 0.1.
→ Version 0.2.0: Added delimited text to HTML table conversion.
→ Version 0.2.1: Updated default CSS to

• rje_mitab: Created/Renamed/moved.
→ Version 0.0.0: Initial Compilation.
→ Version 0.1.0: Added complex=LIST : Complex identifier prefixes to expand from mapped PPI [complex]
→ Version 0.1.1: Fixed Evidence/IType parsing bug for BioGrid/Intact.
→ Version 0.2.0: Added splicevar=T/F option.

• rje_obj: Updated from Version 2.1.0.
→ Version 2.1.1: Removed excess REST HTML methods.
→ Version 2.1.2: Tweaked glist cmdRead warnings.

• rje_qsub: Updated from Version 1.6.1.
→ Version 1.6.2: Updated module list: blast+/2.2.30,clustalw,clustalo,fsa,mafft,muscle,pagan,R/3.1.1

• rje_scoring: Updated from Version -.

• rje_seq: Updated from Version 3.21.0.
→ Version 3.22.0: Added loading sequences from provided sequence files contents directly, bypassing file reading.
→ Version 3.22.1: Fixed problem if seqin is blank, triggering odd Uniprot download.
→ Version 3.23.0: Add speclist to reformat options.

• rje_seqlist: Updated from Version 1.10.0.
→ Version 1.11.0: Added more dna2prot reformatting options.

• rje_slim: Updated from Version 1.9.
→ Version 1.10.0: Added varlength option to makeSlim() method.
→ Version 1.10.1: Fixed varlength and terminal position compatibility.
→ Version 1.10.2: Fixed issue of [] returns.
→ Version 1.10.3: Fixed makeSlim bug with variable length wildcards at start of sequence.
→ Version 1.11.0: Added splitMotif() function.
→ Version 1.12.0: Added equiv to makeSlim() function.

• rje_slimcore: Updated from Version 2.6.1.
→ Version 2.7.0: Updating MegaSLiM function to work with REST server. Allow megaslim=seqin. Added iuscoredir=PATH and protscores=T/F.
→ Version 2.7.1: Modified iuscoredir=PATH and protscores=T/F to work without megaslim. Fixed UPC/SLiMdb issue for GOPHER.
→ Version 2.7.2: Fixed iuscoredir=PATH to stop raising errors when file not previously made.
→ Version 2.7.3: Fixed serverend message error.

• rje_slimhtml: Created/Renamed/moved.
→ Version 0.0: Initial Compilation.
→ Version 0.3: Added code for making Random Dataset pages
→ Version 0.4: Updated UPC pages and added additional front pages.
→ Version 0.5: Split front page into front and full. Added GO tabs/pages.
→ Version 0.6: Added XGMML output.
→ Version 0.7: Modified output for HumSF10 and HAPPI analysis.
→ Version 0.8: Added SVG output. Integrated better with HAPPI code.
→ Version 0.9: Added SLiM Descriptions.

• rje_slimlist: Updated from Version 1.6.
→ Version 1.7.0: Added direct feeding of motif file content for loading (for REST servers).
→ Version 1.7.1: Modified input to allow motif=X in addition to motifs=X.
→ Version 1.7.2: Fixed bug that could not accept variable length motifs from commandline. Improved error message.

• rje_specificity: Updated from Version -.

• rje_tree: Updated from Version 2.11.0.
→ Version 2.11.1: Tweaked QryVar interactivity.
→ Version 2.11.2: Updated tree paths.

• rje_tree_group: Updated from Version -.

• rje_uniprot: Updated from Version 3.20.3.
→ Version 3.20.4: Fixed bug introduced by REST access modifications.
→ Version 3.20.5: Improved handling of downloads for uniprot IDs that have been updated (i.e. no direct mapping).
→ Version 3.20.6: Improved handling of zero accession numbers for extraction.
→ Version 3.20.7: Fixed uniformat default error.
→ Version 3.21.0: Added uparse=LIST option to try and accelerate parsing of large datasets for limited information.
→ Version 3.21.1: FullText is no longer stored in Uniprot object. Will need special handling if required.
→ Version 3.21.2: Fixed single uniprot extraction bug.
→ Version 3.21.3: Added REST datout to proteomes extraction.

• rje_xref: Updated from Version 1.3.0.
→ Version 1.3.1: Fixed xref list bug.
→ Version 1.4.0: Added optional Mapping dictionary for speeding up recurring mapping (should avoid if memsaver=F).
→ Version 1.5.0: Added stripvar=CDICT removal of variants using Field:Char list, e.g. Uniprot:-,GenPept:. []
→ Version 1.6.0: Added mapxref=LIST List of identifiers to map to KeyIDs using mapfields []

• rje_zen: Updated from Version 1.3.0.
→ Version 1.3.1: Added some more words.