A program to search for a protein whose phylogenetic profile best matches a given pair of genome lists
The Protfile program searches for proteins that match a given phylogenetic profile, which is defined by two lists of genomes. The algorithm finds the protein that best matches these lists: a protein whose homologs are present in every genome of the first list (the plus list), while its best homolog in each genome of the second list (the minus list) is less similar to it than its best homolog in any genome of the plus list. More precisely, the algorithm reports several of the best (suboptimal) proteins that satisfy this condition.
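The selection condition described above can be illustrated with a minimal Python sketch. It is not the program's actual code: the input layout (`best_hit`), the function name `rank_by_profile`, the treatment of a missing homolog as similarity 0, and the ranking by the gap between the weakest plus-list hit and the strongest minus-list hit are all assumptions made for illustration only.

```python
from typing import Dict, List, Tuple

# Hypothetical input: best_hit[protein][genome] is the similarity score of the
# best homolog of `protein` in `genome`; a missing genome means no homolog.
BestHits = Dict[str, Dict[str, float]]


def rank_by_profile(
    best_hit: BestHits, plus: List[str], minus: List[str], top: int = 3
) -> List[Tuple[str, float]]:
    """Return up to `top` proteins matching the plus/minus profile, best first."""
    if not plus:
        raise ValueError("the plus list must contain at least one genome")
    ranked = []
    for protein, hits in best_hit.items():
        # Condition 1: a homolog must be present in every plus-list genome.
        if any(genome not in hits for genome in plus):
            continue
        weakest_plus = min(hits[genome] for genome in plus)
        # Assumption: a minus-list genome without a homolog scores 0.
        strongest_minus = max((hits.get(g, 0.0) for g in minus), default=0.0)
        # Condition 2: every minus-list hit is less similar than any plus-list hit.
        if strongest_minus >= weakest_plus:
            continue
        # Assumed ranking score: the gap between the two groups.
        ranked.append((protein, weakest_plus - strongest_minus))
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked[:top]


# Toy data, invented for this example.
example = {
    "protA": {"g1": 90.0, "g2": 80.0, "g3": 30.0},
    "protB": {"g1": 70.0, "g2": 60.0, "g3": 65.0},  # minus-list hit too strong
    "protC": {"g1": 50.0, "g3": 10.0},              # no homolog in g2
    "protD": {"g1": 60.0, "g2": 55.0},
}
print(rank_by_profile(example, plus=["g1", "g2"], minus=["g3"]))
# prints [('protD', 55.0), ('protA', 50.0)]
```

Returning a short ranked list rather than a single winner mirrors the statement that several suboptimal proteins are reported; the real program may score and order candidates differently.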
In practice, this approach can be used to find regulatory proteins from their potential DNA- or RNA-binding sites: the plus list consists of genomes that contain at least one regulatory site of the type in question, and the minus list consists of genomes that contain no such site. Another application is the search for proteins responsible for characteristic features of an organism (presence or absence of a flagellum, photosystems, etc.).
A detailed description of the program is given in its documentation. The program is implemented as a CLI script in PHP with a binary computing kernel compiled for 32-bit Windows. The source code can also be built on UNIX/Linux operating systems.
- Program documentation (in Russian)
- Download executable (Win32)
- Download an example dataset (1.5 MB)