SLiMMaker considers each column of the input in turn and compresses it into a regular expression element according to some simple rules, screening out rare amino acids and converting particularly degenerate positions into wildcards. Each amino acid in the column that occurs at least X times (as defined by minseq=X) is considered for the regular expression definition for that position. The full set of amino acids meeting this criterion is then assessed for whether to keep it as a defined position, or convert into a wildcard.
First, if the number of different amino acids meeting this criterion is zero or above a second threshold (maxaa=X), the position is defined as a wildcard. Second, the proportion of input sequences matching the amino acid set is compared to a minimum frequency criterion (minfreq=X). Failing to meet this minimum frequency will again result in a wildcard. Otherwise, the amino acid set is added to the SLiM definition as either a fixed position (if only one amino acid met the minseq criterion) or as a degenerate position. Finally, leading and trailing wildcards are removed.
By default, each defined position in a motif will contain amino acids that (a) occur in at least three sequences each, (b) have a combined frequency of >=75%, and (c) have 5 or fewer different amino acids (that occur in 3+ sequences).
Note. The final motif only contains defined positions that match a given frequency of the input (75% by default). Because positions are considered independently, however, the final motif might occur in fewer than 75% of the input sequences. Results will indicate the coverage of the input data but SLiMSearch can be used to check the occurrence stats more thoroughly.
Citation: SLiMMaker is part of the ongoing benchmarking of QSLiMFinder, which should be submitted for publication soon. In the meantime, please cite the SLiMMaker URL: http://bioware.soton.ac.uk/slimmaker.html.
Availability: SLiMMaker is available on request and will shortly be part of the SLiMSuite package.