Saturday, 17 November 2012

Using SLiMFinder to discover "local motifs" in protein sequences

The makers of the highly successful MEME Suite have another tool out:
DLocalMotif: A discriminative approach for discovering local motifs in protein sequences
I've not had a chance to go over it in detail but it looks like it could be pretty useful, especially for subcellular targeting motifs. There is one thing that rankles me slightly, though. They define a "local motif" as
"patterns in DNA or protein sequences that occur in a short sequence interval relative to a sequence anchor or landmark."
They then go on to say:
"We believe that DLocalMotif is the only tool for discovering local motifs in protein sequences."
This is just a quick post to point out that SLiMFinder will happily find "local motifs" in protein sequences using the start and end of the sequence as an anchor or landmark. I think it is more limited than DLocalMotif, as it is restricted to SLiMs that are very close to the sequence termini, but it features the usual SLiMChance probability calculations and corrections for evolutionary relationships. (Even without restricting searches to anchor points, SLiMFinder is very successful at finding the KDEL motif and C-terminal PDZ ligand motifs.) The maximum distance from the termini can be set with maxwild=X, up to a limit of 9 aa.
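To make the idea of a terminus acting as an anchor concrete, here is a small Python sketch using ordinary regular expressions, where "$" marks the C-terminus and a bounded wildcard run controls how far from the end the motif may sit. This is not SLiMFinder itself and says nothing about how it searches; the sequences are made-up toy strings and the names (sequences, hits) are purely illustrative.

```python
import re

# Made-up toy sequences for illustration only (not real proteins).
sequences = {
    "protA": "MSTLLAGKDEL",   # KDEL exactly at the C-terminus
    "protB": "MKDELSTAGGA",   # KDEL near the N-terminus only
    "protC": "MAAGSSKDELA",   # KDEL one residue away from the C-terminus
}

# "$" anchors the pattern to the end of the sequence.
c_term_strict = re.compile(r"KDEL$")
# ".{0,1}" allows at most one wildcard position between the motif and the end.
c_term_near = re.compile(r"KDEL.{0,1}$")


def hits(pattern, seqs):
    """Return the sorted names of sequences in which the pattern occurs."""
    return sorted(name for name, seq in seqs.items() if pattern.search(seq))


strict_hits = hits(c_term_strict, sequences)
near_hits = hits(c_term_near, sequences)
print(strict_hits)
print(near_hits)
```

Widening the wildcard run lets the motif drift further from the terminus, which is the same trade-off the maximum-distance setting controls.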

If you want to restrict yourself to just N- or C-terminal motifs, use the musthave=LIST option:
  • musthave="^" for N-terminal motifs.
  • musthave="$" for C-terminal motifs.
  • musthave="^,$" for both.

If you want to anchor the motifs internally, this can be done too with a bit of imagination. Just insert a non-standard amino acid character (e.g. Z) at the anchor position in each sequence, set the expanded alphabet using alphabet=LIST and then force the motif to contain the new symbol using musthave=X, e.g.:
    alphabet="A,C,D,E,F,G,H,I,K,L,M,N,P,Q,R,S,T,V,W,Y,Z" musthave=Z
I must confess that I have never tried this but it should work and I am happy to help iron out any wrinkles.
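The sequence-editing step of the internal anchor trick could be sketched in Python roughly as follows. This only prepares the input: the function name insert_anchor, the example sequence and the anchor position are all hypothetical, and it is untested against SLiMFinder itself.

```python
# The 20 standard amino acids, matching the alphabet listed above minus "Z".
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")


def insert_anchor(seq, position, symbol="Z"):
    """Return seq with `symbol` inserted before 0-based index `position`.

    Hypothetical helper: the anchor must not be a standard residue and must
    not already occur in the sequence, or it would no longer be unique.
    """
    if symbol in STANDARD_AA:
        raise ValueError("anchor symbol must not be a standard amino acid")
    if symbol in seq:
        raise ValueError("anchor symbol already present in sequence")
    if not 0 <= position <= len(seq):
        raise ValueError("anchor position outside sequence")
    return seq[:position] + symbol + seq[position:]


# Made-up sequence; the anchor goes immediately before index 7.
anchored = insert_anchor("MSTLLAGKDELRRA", 7)
print(anchored)

# Expanded alphabet string in the comma-separated form used for alphabet=LIST.
alphabet = ",".join(sorted(STANDARD_AA | {"Z"}))
print(alphabet)
```

The check for an existing Z matters because some sequence files use Z as an ambiguity code, in which case a different unused character would be a safer anchor.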

You can also use position-specific or case masking to restrict motif analysis to certain regions of input proteins. This is probably even better than simply constraining the motif location, as it will reduce the sequence search space rather than the motif search space.
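As a rough illustration of preparing sequences for case masking, the sketch below keeps a window of interest in upper case and lower-cases everything else. It assumes lower-case residues are the ones to be masked; which case is masked, and how masking is switched on, should be checked against the SLiMFinder documentation. The function names and the example sequence are hypothetical.

```python
def mask_outside_window(seq, start, end):
    """Lower-case everything outside seq[start:end]; keep the window upper-case."""
    seq = seq.upper()
    return seq[:start].lower() + seq[start:end] + seq[end:].lower()


def keep_c_terminus(seq, n):
    """Keep only the last n residues in upper case (assumed unmasked)."""
    start = max(0, len(seq) - n)
    return mask_outside_window(seq, start, len(seq))


# Made-up sequence: only the final six residues stay in upper case.
masked = keep_c_terminus("MSTLLAGKDELRRA", 6)
print(masked)
```

Because the masked residues are removed from the search rather than merely excluded from the final motif, this narrows the sequence search space in the way described above.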

(BTW, SLiMFinder also has an experimental feature for using a negative dataset (negatives=FILE), if anyone wants to try it out.)
