Friday, 21 December 2012

New SLiMSuite, SeqSuite and RJESuite downloads available

Just in time for Christmas, new releases of all the downloads are available at the Edwards Lab software page. Documentation is still lagging behind but will hopefully catch up (along with a bit of an overhaul of this blog). Questions welcome in the meantime.

In addition to QSLiMFinder 1.4, the biggest change this release is probably the upgrade of GOPHER. Version 3.x features improved organisation of output files for queries from different species in addition to a capacity to have several different multiple alignment programs run on the same orthologue sets. See the website for more info.

Updates since last release:

• gopher: Created.
→ Version 3.0: See archived GOPHER 1.9 and gopher_V2 2.9 for history and obsolete options.
→ Version 3.0: Added organise=T/F and gopherdir=PATH for improved file organisation. Tightened savespace.
→ Version 3.0: Added compfilter=T/F for improved complexity filter and composition statistics control for *initial* BLAST.
→ Version 3.0: Changed default tree extension to *.nwk for compatibility with MEGA. Deleted _phosAlign() method.
→ Version 3.0: Added orthology ID option and alignment program to customise output further.
→ Version 3.1: Added full reciprocal best hit method. (fullrbh=T/F)

• gopher_V2: Updated from Version 2.8.
→ Version 2.9: Deleted oldStigg() method. Added simple Reciprocal Best Hit orthology prediction.

• qslimfinder: Updated from Version 1.2.
→ Version 1.3: Updated the output for Max/Min filtering and the pickup options.
→ Version 1.4: Added additional dictionary and list to store Query dimers and SLiMs for motif space calculations.
→ Version 1.4: Added qexact=T/F option for calculating Exact Query motif space (True) or estimating from dimers (False).

• slimfinder: Updated from Version 4.2.
→ Version 4.3: Updated the output for Max/Min filtering and the pickup options. Removed TempMaxSetting.
→ Version 4.4: Modified to work with GOPHER V3.0.

• rje: Updated from Version 4.3.
→ Version 4.4: Added lineFromIndex(target,file,re_index='^(\S+)\s',sortunique=False,xreplace=True).

• rje_seq: Updated from Version 3.13.
→ Version 3.14: Added CLUSTAL Omega alignment program ['clustalo']
→ Version 3.15: Added PAGAN alignment program ['pagan'] and (hopefully) fixed minor Windows fastacmd bug.

• rje_sequence: Updated from Version 2.1.
→ Version 2.2: Added more yeast species.

• rje_slimcalc: Updated from Version 0.4.
→ Version 0.5: Altered to use GOPHER V3 and handle nested alignment directories.

• rje_slimlist: Updated from Version 1.0.
→ Version 1.1: Modified to work with GOPHER V3.0 for alignments.

Wednesday, 28 November 2012

QSLiMFinder 1.4: quicker and more efficient - available on request

The on-going benchmarking of QSLiMFinder has thrown up a couple of discoveries to date. The first is that, reassuringly, it appears to work. (More on this another time.) The second is that it is slow. Or, at least, it was slow.

Thankfully, the cause of its surprisingly slow performance (compared to SLiMFinder) has been tracked down and fixed. At the same time, a (related) potential memory issue with large query sequences has also been sorted out.

The underlying problem is unlikely to have had a large effect on the SLiM prediction itself, although this is currently under investigation. The last release of SLiMSuite was only last week and, as QSLiMFinder is not officially published and released yet, I will not be compiling a new download immediately to take advantage of the improvements. The revised code is available on request if anyone is using QSLiMFinder.

Saturday, 24 November 2012

New SLiMSuite, SeqSuite and RJESuite releases are now available

New releases of SLiMSuite, SeqSuite and RJESuite are now available from the Edwards Lab software page.

Please note that the documentation (particularly the manuals) is still lagging a bit behind, so do report anything that does not make sense. The default settings also need to be verified as there is a chance that some of these may have inadvertently changed over the years. (The same core code is now used for the webservers, which often have different defaults.) Checking these along with updating and checking the servers themselves are ongoing priorities.

A full list of updated modules is given below. As well as SLiMMaker now handling end of sequence characters, the biggest changes this release are updates to CompariMotif, which now (3.7) outputs unmatched input motifs and (3.8) better handles partially overlapping ambiguous positions (e.g. [AGS] and [ST]). The motivation behind both these changes is the ongoing benchmarking (and preparation for publication) of QSLiMFinder and the creation of SLiMBench for benchmarking motif prediction methods. A QSLiMFinder section has been added to the SLiMFinder Manual (section 5.4). SLiMBench is still a work in progress and will be documented in a later release.
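To make "partially overlapping" concrete: [AGS] and [ST] share S, but neither set contains the other, unlike [KR] vs K. The minimal Python sketch below illustrates that distinction and an on/off switch in the spirit of overlaps=T/F. It is not CompariMotif's code or scoring (the real 3.8 scoring uses the IC of the ambiguities). The function names and relation labels are hypothetical, and only simple bracketed classes are handled; negated classes such as [^P] are out of scope.

```python
def position_set(element):
    """Turn a simple motif position such as "[AGS]" or "K" into a set of residues."""
    return set(element.strip("[]"))


def classify_positions(a, b):
    """Describe how two motif positions relate (labels are illustrative, not CompariMotif's)."""
    sa, sb = position_set(a), position_set(b)
    if sa == sb:
        return "identical"
    if sa < sb or sb < sa:
        return "subset"    # one position is a more degenerate version of the other
    if sa & sb:
        return "overlap"   # partial overlap, e.g. [AGS] vs [ST]
    return "mismatch"


def positions_match(a, b, overlaps=True):
    """Count a partial overlap as a match only when overlaps is True (cf. overlaps=T/F)."""
    relation = classify_positions(a, b)
    if relation == "overlap":
        return overlaps
    return relation != "mismatch"


if __name__ == "__main__":
    for pair in [("[AGS]", "[ST]"), ("[KR]", "[HK]"), ("[KR]", "K"), ("D", "E")]:
        print(pair, classify_positions(*pair), positions_match(*pair, overlaps=False))
```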

Updates since last release:

• comparimotif_V3: Updated from Version 3.6.
→ Version 3.7: Added coreIC and output of unmatched motifs.
→ Version 3.8: Added overlaps=T/F : Whether to include overlapping ambiguities (e.g. [KR] vs [HK]) as match [True]
→ Version 3.8: Changed scoring of overlapping ambiguities - uses IC of all possible ambiguities. Added "Ugly" match type.

• slimbench: Created.
→ Version 0.0: Initial Compilation.
→ Version 0.1: Functional version with benchmarking dataset generation.
→ Version 1.0: Consolidation of "working" version with additional basic benchmarking analysis.
→ Version 1.1: Added simulated dataset construction and benchmarking.
→ Version 1.2: Added MinIC filtering to benchmark assessment. Sorted beginning/end of line for reduced ELMs.
→ Version 1.3: Made SimCount a list rather than Integer. Sorted CompariMotif assessment issue.
→ Version 1.4: Added ICCut and SLiMLenCut as lists and output columns.
→ Version 1.5: Added Summary Results output table. Removed PropRes.

• slimmaker: Updated from Version 1.0.
→ Version 1.1: Modified to work with end of line characters.

• slimsearch: Updated from Version 1.5.
→ Version 1.6: Minor tweaks to Log output. Add option for UPC number in occ output.

• rje: Updated from Version 4.1.
→ Version 4.2: Modified INI reading across the board to look in ../settings/ and look for defaults.ini as well as rje.ini.
→ Version 4.2: Enabled handling of -ini FILE in addition to ini=FILE.
→ Version 4.3: Added ilist and nlist types to cmdRead for objects. (Lists of integers and floats). Add ratio() function.

• rje_blast: Updated from Version 1.13.
→ Version 1.14: Added blast.checkProg(qtype,stype) to check whether blastp setting matches sequence formats.

• rje_db: Created.
→ Version 0.0: Initial Compilation.
→ Version 0.1: Added merge tables option.
→ Version 0.2: Miscellaneous updates to various methods.
→ Version 0.3: Minor doc tweaks and added keepFields().

• rje_seq: Updated from Version 3.12.
→ Version 3.13: Updated sequence type checking for use with GABLAM 2.10.

• rje_seqlist: Created.
→ Version 0.0: Initial Compilation. Based on rje_seq 3.10.
→ Version 0.1: Added basic species filtering and sequence output.
→ Version 0.2: Added upper case filtering.
→ Version 0.3: Added accnum filtering and sequence renaming.
→ Version 0.4: Added sequence redundancy filtering.
→ Version 0.5: Added newgene=X for sequence renaming (newgene_spcode__newaccXXX). NewAcc no longer fixed Upper Case.
→ Version 1.0: Upgraded to "ready" Version 1.0. Added concatenate=T and split=X options for sequence concatenation.
→ Version 1.0: Added reading of sequence type from rje_seq.py and mixed=T/F.
→ Version 1.1: Added shortName() and modified SeqDict.

• rje_sequence: Updated from Version 2.0.
→ Version 2.1: Added re_unirefprot = re.compile('^([A-Za-z0-9\-]+)\s+([A-Za-z0-9]+)_([A-Za-z0-9]+)\s+')

• rje_slim: Updated from Version 1.5.
→ Version 1.6: Fixed splitting bug introduced by lower case motifs.

• rje_slimcore: Updated from Version 1.8.
→ Version 1.9: Minor modifications to Log output. Updated motifSeq() function to output unmasked sequences.

• rje_slimlist: Updated from Version 0.6.
→ Version 1.0: Functional module with lower case motif splitting fixed and ? -> .{0,1} replacement.

• rje_zen: Updated from Version 1.0.
→ Version 1.1: Added a few more words here and there.

Saturday, 17 November 2012

Using SLiMFinder to discover "local motifs" in protein sequences

The makers of the highly successful MEME Suite have another tool out:
DLocalMotif: A discriminative approach for discovering local motifs in protein sequences
I've not had a chance to go over it in detail but it looks like it could be pretty useful, especially for subcellular targeting motifs. There is one thing that rankles me slightly, though. They define a "local motif" as
"patterns in DNA or protein sequences that occur in a short sequence interval relative to a sequence anchor or landmark."
They then go on to say:
"We believe that DLocalMotif is the only tool for discovering local motifs in protein sequences."
This is just a quick post to point out that SLiMFinder will happily find "local motifs" in protein sequences using the start and end of the sequence as an anchor or landmark. I think it is more limited than DLocalMotif as it is restricted to SLiMs that are very proximal to the sequence termini but it features the usual SLiMChance probability calculations and corrections for evolutionary relationships. (Even without restricting to searches relative to anchor points, SLiMFinder is very successful at finding the KDEL motif and C-terminal PDZ ligand motifs.) The max distance from the termini can be set by maxwild=X up to a limit of 9aa.

If you want to restrict yourself to just N- or C-terminal motifs, use the musthave=LIST option:
  • musthave="^" for N-terminal motifs.
  • musthave="$" for C-terminal motifs.
  • musthave="^,$" for both.
  • If you want to anchor the motifs internally, this can be done too with a bit of imagination. Just insert a non-standard amino acid character (e.g. Z) at the anchor position, set the expanded alphabet using alphabet=LIST and then force the motif to have the new symbol using musthave=X, e.g.:
    alphabet="A,C,D,E,F,G,H,I,K,L,M,N,P,Q,R,S,T,V,W,Y,Z" musthave=Z
    I must confess that I have never tried this but it should work and I am happy to help iron out any wrinkles.

    You can also use position-specific or case masking to restrict motif analysis to certain regions of input proteins. This is probably even better than simply constraining the motif location, as it will reduce the sequence search space rather than the motif search space.

    (BTW, SLiMFinder also has an experimental feature for using a negative dataset (negatives=FILE), if anyone wants to try it out.)
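The internal-anchor trick above needs the input sequences edited before running SLiMFinder. Below is a minimal Python sketch of that preprocessing step. It assumes you already know the 0-based anchor position in each protein, and the helper names are hypothetical (not part of SLiMSuite). The only SLiMFinder options involved are the alphabet=LIST and musthave=X settings quoted above. The check for an existing Z is there because Z is also the IUPAC code for Glu/Gln (Glx), so it could already appear in some sequences.

```python
ANCHOR = "Z"
# Standard 20 amino acids plus the anchor symbol, as in the alphabet= example above.
ALPHABET = list("ACDEFGHIKLMNPQRSTVWY") + [ANCHOR]


def insert_anchor(sequence, position, anchor=ANCHOR):
    """Insert the anchor symbol before 0-based index `position` of `sequence`."""
    if anchor in sequence:
        raise ValueError(f"{anchor} already occurs in the sequence; pick another symbol")
    if not 0 <= position <= len(sequence):
        raise ValueError("anchor position lies outside the sequence")
    return sequence[:position] + anchor + sequence[position:]


def anchored_fasta(records, anchor=ANCHOR):
    """Build FASTA text from (name, sequence, anchor_position) tuples (hypothetical helper)."""
    lines = []
    for name, sequence, position in records:
        lines.append(f">{name}")
        lines.append(insert_anchor(sequence, position, anchor))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(anchored_fasta([("prot1", "MKDELAAA", 4)]), end="")
    print("alphabet=" + ",".join(ALPHABET), "musthave=" + ANCHOR)
```

The resulting FASTA file would then be the SLiMFinder input, with the printed alphabet= and musthave= values on the command line.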

    Saturday, 27 October 2012

    SLiMMaker now handles end of sequence characters

    SLiMMaker is a simple tool for generating regular expression motifs from aligned peptide sequences. It was originally made for making new SLiM definitions based on a set of aligned occurrences and therefore the ends of these peptides are typically not the actual ends of sequences. Sometimes, however, such as in the KDEL ER retrieval motif, they are at (or near) the end (or beginning) of a protein and you might want that taken into consideration when generating a motif.

    The SLiMMaker website will now accept beginning (^) and end ($) of sequence characters. The peptides still need to be aligned, so if using them you should fill in any non-conforming peptides with an X. If the peptides are of different lengths and the end-of-sequence character ends up appearing within an ambiguous position with regular amino acids (e.g. [$AGS]), SLiMMaker will truncate the regular expression at that point (not including that position), as otherwise it just gets too messy!
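One reason a mixed position such as [$AGS] is messy: inside a character class, "$" is just a literal character in Python's re module (and most regex engines), so it cannot mean "end of sequence or A/G/S". Below is a minimal Python sketch of the truncation rule described above. It is not SLiMMaker's code. It assumes the motif has already been built as a list of one regex element per aligned column, and it covers only the "$" case described here. The function name is hypothetical.

```python
import re


def join_with_terminus_check(elements):
    """Join per-column regex elements, truncating at the first ambiguous column containing "$".

    elements: one regex element per aligned column, e.g. ["K", "D", "E", "L", "[$AGS]"].
    The mixed column is dropped, along with everything after it.
    """
    for i, element in enumerate(elements):
        if element.startswith("[") and "$" in element:
            return "".join(elements[:i])
    return "".join(elements)


if __name__ == "__main__":
    print(join_with_terminus_check(["K", "D", "E", "L", "[$AGS]"]))  # KDEL
    print(join_with_terminus_check(["K", "D", "E", "L", "$"]))       # KDEL$
    # "$" inside [...] is literal, so this does not match the C-terminal KDEL:
    print(re.search(r"KDEL[$A]", "MKDEL"))                          # None
```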

    This upgrade will appear in the next distributions of SLiMSuite and SeqSuite. As always, if it is not clear then just try it out with some test sequences. (And report any odd behaviour.)

    Wednesday, 26 September 2012

    SLiMPrints paper published

    The SLiMPrints paper is now available online at Nucleic Acids Research (doi: 10.1093/nar/gks854; PMID: 22977176).

    SLiMPrints: conservation-based discovery of functional motif fingerprints in intrinsically disordered protein regions

    Davey NE, Cowan JL, Shields DC, Gibson TJ, Coldwell MJ, Edwards RJ.
    Large portions of higher eukaryotic proteomes are intrinsically disordered, and abundant evidence suggests that these unstructured regions of proteins are rich in regulatory interaction interfaces. A major class of disordered interaction interfaces are the compact and degenerate modules known as short linear motifs (SLiMs). As a result of the difficulties associated with the experimental identification and validation of SLiMs, our understanding of these modules is limited, advocating the use of computational methods to focus experimental discovery. This article evaluates the use of evolutionary conservation as a discriminatory technique for motif discovery. A statistical framework is introduced to assess the significance of relatively conserved residues, quantifying the likelihood a residue will have a particular level of conservation given the conservation of the surrounding residues. The framework is expanded to assess the significance of groupings of conserved residues, a metric that forms the basis of SLiMPrints (short linear motif fingerprints), a de novo motif discovery tool. SLiMPrints identifies relatively overconstrained proximal groupings of residues within intrinsically disordered regions, indicative of putatively functional motifs. Finally, the human proteome is analysed to create a set of highly conserved putative motif instances, including a novel site on translation initiation factor eIF2A that may regulate translation through binding of eIF4E.
    Server available at bioware.ucd.ie.

    Tuesday, 21 August 2012

    CompariMotif V3.6

    Both the webserver and SLiMSuite download have now been updated to CompariMotif Version 3.6.

    One of the main changes over earlier versions of CompariMotif is that ELM Class exports can now be used directly as input. Any tab-delimited file (*.tdt or *.tsv) with a "Regex" field header will be recognised and motifs extracted using the ELMIdentifier, Regex and Description fields.
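For anyone wanting to pre-screen their own tables, the sketch below reads a tab-delimited file with a "Regex" header and pulls out the ELMIdentifier, Regex and Description fields with Python's csv module. It is an illustration of the recognition rule described above, not CompariMotif's input code. The function name is hypothetical, and the example row is made up.

```python
import csv
import io


def read_motif_table(handle):
    """Return (name, regex, description) tuples from a tab-delimited table with a "Regex" column."""
    reader = csv.DictReader(handle, delimiter="\t")
    if not reader.fieldnames or "Regex" not in reader.fieldnames:
        raise ValueError("No 'Regex' column: not recognised as a motif table")
    motifs = []
    for row in reader:
        regex = (row["Regex"] or "").strip()
        if regex:  # skip rows without a pattern
            motifs.append((row.get("ELMIdentifier") or "", regex, row.get("Description") or ""))
    return motifs


if __name__ == "__main__":
    # Illustrative table; the identifier and description are made up.
    example = io.StringIO(
        "ELMIdentifier\tRegex\tDescription\n"
        "LIG_EXAMPLE_1\t[KR]DEL$\tHypothetical C-terminal motif\n"
    )
    print(read_motif_table(example))
```

Quoted fields (as in some exports) are handled by csv's default quoting rules.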

    To handle some of the more complex motifs now in ELM, the motif splitting function has also been expanded and strengthened. (The caveat is the small bug in the current download which will be fixed in the next release.) In addition to either/or regex elements, variable length non-wildcards are now recognised and will be reformatted for CompariMotif searching.

    For example:
    Motif [IL]{1,2}[^P].(RG|K)
    will be split to form:
    Motif_a [IL][^P].RG
    Motif_b [IL][^P].K
    Motif_c [IL][IL][^P].RG
    Motif_d [IL][IL][^P].K
    The descriptions will also be appended with "Version 1", "Version 2" etc.
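The splitting can be thought of as taking the Cartesian product of the options at each motif position. The Python sketch below reproduces the example above under a few stated assumptions: it handles bracketed classes, either/or groups, single characters and {m} / {m,n} repeats only. Wildcard repeats such as .{0,2} are left alone, and a repeated either/or group repeats the same alternative. It is a hypothetical illustration, not the rje_slimlist implementation, and the _a, _b, ... suffixes only cover up to 26 variants.

```python
import itertools
import re

# One motif element: a [class], an (either|or) group or a single character,
# optionally followed by a {m} or {m,n} repeat.
TOKEN = re.compile(r"(\[[^\]]+\]|\([^)]+\)|[^\[\(\{])(\{(\d+)(?:,(\d+))?\})?")


def element_options(motif):
    """Return, for each motif element, the list of fixed-length strings it can expand to."""
    options, pos = [], 0
    while pos < len(motif):
        m = TOKEN.match(motif, pos)
        if not m:
            raise ValueError(f"Cannot parse motif at position {pos}: {motif!r}")
        element, repeat, low, high = m.groups()
        pos = m.end()
        choices = element[1:-1].split("|") if element.startswith("(") else [element]
        if repeat and element != ".":
            lo = int(low)
            hi = int(high) if high is not None else lo
            # Variable-length non-wildcard: [IL]{1,2} -> ["[IL]", "[IL][IL]"]
            choices = [c * n for n in range(lo, hi + 1) for c in choices]
        elif repeat:
            choices = [c + repeat for c in choices]  # keep wildcard repeats, e.g. .{0,2}
        options.append(choices)
    return options


def split_motif(name, motif):
    """Expand a complex motif into (name, regex) variants named name_a, name_b, ..."""
    variants = ["".join(parts) for parts in itertools.product(*element_options(motif))]
    if len(variants) == 1:
        return [(name, variants[0])]
    return [(f"{name}_{chr(ord('a') + i)}", v) for i, v in enumerate(variants)]


if __name__ == "__main__":
    for variant in split_motif("Motif", "[IL]{1,2}[^P].(RG|K)"):
        print(*variant)
```

Run on the example motif, this prints the same four variants, in the same order, as listed above.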

    The final minor update replaces ? in motif Regex patterns with .{0,1}. As it uses the same modules for motif input, these changes will also affect SLiMSearch input but not (yet) be taken into consideration for the SLiMChance statistics.

    Note. The 2008 version of ELM on the bioware server was manually split and therefore the suffixes added by CompariMotif for later versions of ELM will not necessarily match up.

    Monday, 20 August 2012

    Minor bug in CompariMotif 3.6 download

    The recent fixing up of the server has highlighted a couple of bugs that have crept in during the upgrading of CompariMotif to handle complex motifs better. (More on this in a later post.) The problem is restricted to lower case motifs, so stick to upper case and all will be well. It will be fixed in the next release. A replacement rje_slimlist.py file is available on request if anyone needs to fix it urgently. (This also features replacement of "?" characters with ".{0,1}" so these motifs are no longer rejected.)

    Saturday, 18 August 2012

    Bioware servers back up

    Following the recent move-induced technical issues, the Bioware webservers are now back online, with the exception of SLiMSearch 1.0. SLiMSearch2 is also operating with reduced function until another upgrade can be performed in the near future. CompariMotif has been upgraded to Version 3.6 in line with the recent package updates.

    A few of the links to help pages etc. might still be broken, so please report anything that is not quite behaving as expected.

    Friday, 17 August 2012

    New software downloads available

    SLiMSuite and SeqSuite downloads have now been updated and are available from my homepage.

    The only real change of note since the last release is that CompariMotif can now recognise ELM Class downloads and the motif splitting has been updated to cope with variable-length non-wildcard repeats (e.g. R{0,1}).

    The CompariMotif webserver is undergoing some final checks and should be back up shortly, with SLiMFinder and SLiMSearch to follow.

    Saturday, 11 August 2012

    SLiMSuite servers temporarily down

    Following some technical issues that have arisen due to some quite major behind-the-scenes re-organisation, the Bioware webserver implementations of SLiMSuite programs have been temporarily taken off-line.

    The plan is to give them a bit of attention, iron out any wrinkles that have developed, and then hopefully get them back online over the next week or so. The servers affected are:
    • CompariMotif
    • GOPHER
    • SLiMDisc (now part of SLiMFinder)
    • SLiMFinder
    • SLiMSearch
    • SLiMSearch 2.0
    These programs (except SLiMSearch 2.0) are available for download. If you have a particular need for any of these servers, please contact me and I will accelerate its re-appearance.

    As each server becomes available, it will be posted here.

    Friday, 10 August 2012

    Bioware servers behaving erratically

    The SLiMSuite servers housed at bioware.ucd.ie are currently behaving a bit strangely. This is under investigation and we hope to have it sorted out soon. If you spot any odd or unexpected behaviour with the servers, please let us know.

    Friday, 27 July 2012

    New software downloads now available

    Updated versions of all packages are now available for download. Unfortunately, due to limited time availability, the manuals are getting a little out of date with respect to all the available functions but the readme pages contain all the latest options and defaults. Please contact me if you find any bugs and/or want specific documentation improved. It is on the (long) list of things to try and get done over the summer!

    Note that in a slight modification of previous releases, compressed downloads now contain the creation date in the name (e.g. rjesuite.2012-07-26.tar.gz) and will be archived.

    Tuesday, 19 June 2012

    "Iterative" SLiMMaker function added

    The SLiMMaker website (and download once the new release is put up) now has an "iterate" function that will produce both a motif and a set of sequences, all of which match that motif. Basically, the input sequences matching the motif produced by SLiMMaker keep getting put back through SLiMMaker using the same settings until the motif produced matches all of the input (or there is no motif produced). Obviously, if the first SLiM produced matches all of the input, this mode will behave just like the original.
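The iterate loop is easy to express independently of how each motif is built. The Python sketch below takes the motif-building step as a parameter, a hypothetical make_motif(peptides) callable that returns a regex or None, so that it stays self-contained. Two details are assumptions rather than documented SLiMMaker behaviour: motifs are matched with re.search, and (None, []) is returned when no motif is produced. Termination is guaranteed because every pass that does not match everything strictly shrinks the input.

```python
import re


def iterate_slimmaker(peptides, make_motif):
    """Re-run motif building on the matching peptides until the motif matches all of them.

    make_motif: hypothetical callable mapping a list of aligned peptides to a regex (or None).
    Returns (motif, matching_peptides), or (None, []) if no motif is produced.
    """
    seqs = list(peptides)
    while seqs:
        motif = make_motif(seqs)
        if not motif:
            return None, []
        matching = [s for s in seqs if re.search(motif, s)]
        if len(matching) == len(seqs):
            return motif, seqs  # converged: every remaining peptide matches
        seqs = matching         # strictly fewer peptides, so the loop terminates
    return None, []
```

If the first motif already matches every peptide, the loop returns after one pass, which is the "behaves just like the original" case.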

    At some point, I will add some more documentation, including some examples.

    Tuesday, 15 May 2012

    Bioinformatics Postdoc Position available!

    A two-year BBSRC-funded postdoc position is now available to work in the Edwards lab developing and applying QSLiMFinder. Informal enquiries are encouraged. You can apply or get further details here. The blurb:
    You are invited to apply for the post of Research Fellow to work closely with Dr Richard Edwards on a BBSRC-funded project to develop and apply computational tools for the prediction of protein motifs that mediate protein-protein interactions.

    Many protein-protein interactions are mediated by Short Linear Motifs (SLiMs): short stretches of proteins (5-15 amino acids long), of which only a few positions are critical to function. These motifs are vital for biological processes of fundamental importance, such as signalling pathways and targeting proteins to the correct part of a cell.

    This position represents an exciting opportunity to join one of the early pioneers in the growing field of SLiM prediction. The primary objective of this project is to integrate a number of leading computational techniques to predict novel SLiMs and, in so doing, add crucial detail to protein-protein interaction networks. This will generate a valuable resource of potential SLiMs, including defined occurrences and interactions.

    The project will use a number of computational and sequence analysis techniques. Basic programming skills are essential. Experience with database design, HPC and web programming are desirable. You will be required to develop a thorough knowledge of SLiM-mediated protein-protein interactions and should therefore be comfortable with biological literature, biochemistry, molecular evolution and structural biology.

    A background in either computer science or biology, with a PhD in a relevant subject area, is essential. Previous research experience (PhD or Postdoctoral) in computational biology is highly desirable. Candidates with a computer science background must demonstrate an interest and aptitude for molecular biology. Similarly, candidates with a biology background must demonstrate an interest and aptitude for computer programming.

    You should be an enthusiastic researcher, a good team-worker and an excellent communicator. Project management skills and independent research experience are desirable.

    The position is full-time and available immediately for a period of up to two years.

    The closing date for this position is 15 June 2012. Please apply online through www.jobs.soton.ac.uk or alternatively telephone 023 8059 2750 for an application form. Please quote reference number 119512BJ on all correspondence. In addition to submitting your CV, please enclose a personal statement highlighting your research interests and experience, as outlined in the accompanying Further Particulars. Please note that the project is 100% computational.

    Wednesday, 9 May 2012

    SLiMSuite servers and programs

    An emerging field of biology concerns the role of intrinsically disordered regions in protein function and, specifically, protein-protein interactions (PPI) [1-2]. Of particular interest are Short, Linear Motifs (SLiMs), which play a vital role in disorder-mediated PPI, acting as ligands for molecular signalling, post-translational modifications and subcellular targeting [3]. SLiMs have extremely compact protein interaction interfaces, generally encoded by fewer than 4 major affinity-/specificity-determining residues within a stretch of 2-10 residues [4]. Their small size enables high functional density and evolutionary plasticity, which is frequently exploited by rapidly evolving pathogens that use them to hijack cellular processes [5]. These same features also make experimental discovery a challenge and considerable attention has therefore been given to computational methods for SLiM prediction and analysis [6].

    A number of these tools have been developed by the Edwards and Shields labs [7-11] and made available as part of the SLiMSuite package and online as webservers (http://bioware.ucd.ie) [9-10,12-14], with two new tools, SLiMPrints and QSLiMFinder, currently in preparation for submission, and SLiMMaker to be added soon. The main tools that form the SLiMSuite package/servers are as follows:
    • SLiMFinder [8,13]: de novo SLiM prediction based on a statistical model of over-represented motifs in unrelated proteins.
    • SLiMDisc [7,12]: de novo SLiM prediction based on heuristic ranking of over-represented motifs in unrelated proteins.
    • SLiMPred [11]: de novo SLiM/MoRF prediction in single proteins based on machine learning of motif attributes.
    • SLiMSearch [10]: biological context (disorder & conservation) for searches of pre-defined motifs with under- and over-representation statistics, correcting for evolutionary relationships.
    • SLiMSearch 2.0 [14]: biological context (disorder & conservation) and ranking for proteome-wide searches of pre-defined motifs.
    • SLiMPrints (in prep.): de novo SLiM/MoRF prediction in single proteins from statistical clustering of conserved disordered residues.
    • QSLiMFinder (server coming soon): Query-based variant of SLiMFinder with increased sensitivity and specificity.
    • CompariMotif [9]: Motif-motif comparison tool.
    • SLiMMaker (coming soon): Simple tool for converting aligned peptides or SLiM occurrences into a regular expression motif.
    • GOPHER [12]: Automated orthologue prediction and alignment algorithm. Used for conservation-based masking (SLiMFinder/SLiMSearch) and prediction (SLiMPrints).
    • GABLAM [7] (server coming soon): BLAST-based protein similarity scoring and clustering. Used for SLiMFinder and SLiMSearch adjustments for evolutionary relationships.
    Personnel (and funding applications) permitting, a number of improvements for these resources are planned, including updates to the underlying databases for proteome-wide predictions (SLiMSearch 1.0 & 2.0), conservation analyses (SLiMSearch 1.0 & 2.0, SLiMPrints, GOPHER) and SLiM comparisons (CompariMotif). We also intend to improve the integration of different tools, allowing seamless continuation of analyses. Motif predictions ((Q)SLiMFinder/SLiMPrints/SLiMPred) will be able to be searched directly against known motifs (CompariMotif) or proteomes (SLiMSearch); GOPHER alignments will be accessible for SLiMPrints analyses and even SLiMSearch/(Q)SLiMFinder input; outputs of motif occurrences ((Q)SLiMFinder/SLiMSearch) can be used to redefine motifs using SLiMMaker etc. If you have any other suggestions for improvements, please let us know.


    References:
    [1] Tompa P (2011) Unstructural biology coming of age. Curr Opin Struct Biol 21:419.
    [2] Babu MM et al. (2011) Intrinsically disordered proteins: regulation and disease. Curr Opin Struct Biol 21:432.
    [3] Diella F et al. (2008) Understanding eukaryotic linear motifs and their role in cell signaling and regulation. Front Biosci 13:6580.
    [4] Davey NE et al. (2012) Attributes of short linear motifs. Mol Biosyst 8:268.
    [5] Davey NE, Trave G & Gibson TJ (2011) How viruses hijack cell regulation. Trends Biochem Sci 36:159.
    [6] Davey NE, Edwards RJ & Shields DC (2010) Computational identification and analysis of protein short linear motifs. Front Biosci 15:801.
    [7] Davey NE, Shields DC & Edwards RJ (2006) SLiMDisc: short, linear motif discovery, correcting for common evolutionary descent. Nucleic Acids Res 34:3546.
    [8] Edwards RJ, Davey NE & Shields DC (2007) SLiMFinder: A probabilistic method for identifying over-represented, convergently evolved, short linear motifs in proteins. PLoS ONE 2:e967.
    [9] Edwards RJ, Davey NE & Shields DC (2008) CompariMotif: Quick and easy comparisons of sequence motifs. Bioinformatics 24:1307.
    [10] Davey NE et al. (2010) SLiMSearch: a webserver for finding novel occurrences of short linear motifs in proteins, incorporating sequence context. Lecture Notes in Bioinformatics 6282:50.
    [11] Mooney C et al. (2012) Prediction of short linear protein binding regions. J Mol Biol 415:193.
    [12] Davey NE, Edwards RJ & Shields DC (2007) The SLiMDisc server: short, linear motif discovery in proteins. Nucleic Acids Res 35:W455.
    [13] Davey NE et al. (2010) SLiMFinder: a web server to find novel, significantly over-represented, short protein motifs. Nucleic Acids Res 38:W534.
    [14] Davey NE et al. (2011) SLiMSearch 2.0: biological context for short linear motifs in proteins. Nucleic Acids Res 39:W56.

    Sunday, 29 April 2012

    SLiMMaker: regular expressions from aligned peptide sequences

    SLiMMaker has a fairly simple function of reading in a set of sequences and generating a regular expression motif from them. It is designed with protein sequences in mind but should work for DNA sequences too. Input sequences can be in fasta format or just plain text (with no sequence headers) and should be aligned already. Gapped positions will be ignored (treated as Xs) and variable length wildcards are not returned.
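The two accepted input styles can be normalised into one list of aligned peptides before any motif building. The Python sketch below makes several assumptions that the post does not state: plain-text input has one peptide per line, input is upper-cased, and gaps ("-") are rewritten as X. It also enforces equal lengths as a stand-in for "should be aligned already". The function name is hypothetical and this is not SLiMMaker's own parser.

```python
def read_peptides(text):
    """Parse FASTA or plain one-peptide-per-line text into aligned, upper-case peptides."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if any(line.startswith(">") for line in lines):
        peptides, current = [], []
        for line in lines:
            if line.startswith(">"):       # new FASTA record: store the previous sequence
                if current:
                    peptides.append("".join(current))
                current = []
            else:
                current.append(line)
        if current:
            peptides.append("".join(current))
    else:
        peptides = lines                   # assumption: one peptide per line, no headers
    # Gapped positions are treated as X.
    peptides = [p.upper().replace("-", "X") for p in peptides]
    if len({len(p) for p in peptides}) > 1:
        raise ValueError("Input peptides must be aligned (all the same length)")
    return peptides
```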

    SLiMMaker considers each column of the input in turn and compresses it into a regular expression element according to some simple rules, screening out rare amino acids and converting particularly degenerate positions into wildcards. Each amino acid in the column that occurs at least X times (as defined by minseq=X) is considered for the regular expression definition for that position. The full set of amino acids meeting this criterion is then assessed for whether to keep it as a defined position, or convert into a wildcard.

    First, if the number of different amino acids meeting this criterion is zero or above a second threshold (maxaa=X), the position is defined as a wildcard. Second, the proportion of input sequences matching the amino acid set is compared to a minimum frequency criterion (minfreq=X). Failing to meet this minimum frequency will again result in a wildcard. Otherwise, the amino acid set is added to the SLiM definition as either a fixed position (if only one amino acid met the minseq criterion) or as a degenerate position. Finally, leading and trailing wildcards are removed.

    By default, each defined position in a motif will contain amino acids that (a) occur in at least three sequences each, (b) have a combined frequency of >=75%, and (c) have 5 or fewer different amino acids (that occur in 3+ sequences).
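The column rules above translate almost line for line into code. The Python sketch below applies them with the default settings (minseq=3, minfreq=0.75, maxaa=5). Some details are assumptions rather than documented SLiMMaker behaviour: gaps and X are never counted, the frequency denominator is the total number of input peptides, degenerate positions are written as alphabetically sorted classes, and each wildcard is a single ".". The function name is hypothetical, and the sketch is not SLiMMaker's implementation.

```python
from collections import Counter


def slimmaker_sketch(peptides, minseq=3, minfreq=0.75, maxaa=5):
    """Build a regex from aligned peptides using the per-column rules described above."""
    if not peptides:
        return ""
    width = len(peptides[0])
    if any(len(p) != width for p in peptides):
        raise ValueError("peptides must be aligned (equal length)")
    elements = []
    for col in range(width):
        column = [p[col].upper() for p in peptides]
        counts = Counter(aa for aa in column if aa not in "-X")     # gaps/X are ignored
        aas = sorted(aa for aa, n in counts.items() if n >= minseq)  # minseq=X criterion
        if not aas or len(aas) > maxaa:                              # maxaa=X criterion
            elements.append(".")
            continue
        freq = sum(counts[aa] for aa in aas) / len(peptides)
        if freq < minfreq:                                           # minfreq=X criterion
            elements.append(".")
        elif len(aas) == 1:
            elements.append(aas[0])                                  # fixed position
        else:
            elements.append("[" + "".join(aas) + "]")                # degenerate position
    # Finally, remove leading and trailing wildcards.
    while elements and elements[0] == ".":
        elements.pop(0)
    while elements and elements[-1] == ".":
        elements.pop()
    return "".join(elements)


if __name__ == "__main__":
    print(slimmaker_sketch(["KDEL", "KDEL", "HDEL", "KDEL", "RDEL"]))  # DEL
```

In the example, K occurs in three of five peptides, so it passes minseq but only reaches 60% frequency and the first column becomes a wildcard, which is then trimmed.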

    Note. The final motif only contains defined positions that match a given frequency of the input (75% by default). Because positions are considered independently, however, the final motif might occur in fewer than 75% of the input sequences. Results will indicate the coverage of the input data but SLiMSearch can be used to check the occurrence stats more thoroughly.

    Citation: SLiMMaker is part of the ongoing benchmarking of QSLiMFinder, which should be submitted for publication soon. In the meantime, please cite the SLiMMaker URL: http://bioware.soton.ac.uk/slimmaker.html.

    Availability: SLiMMaker is available on request and will shortly be part of the SLiMSuite package.


    Tuesday, 17 April 2012

    New software downloads now available

    Updated software packages for SeqSuite, SLiMSuite and RJESuite are now available from the Edwards Lab software page. These downloads incorporate a variety of miscellaneous bug fixes and minor updates.

    Monday, 9 January 2012

    Websites back up

    After some scheduled maintenance work that unfortunately took down the SeqSuite and SLiMSuite download and documentation pages, the website is now back up and running again.