A program for finding the protein whose phylogenetic profile best matches a given pair of genome lists

The Protfile program searches for proteins that fit a given phylogenetic profile, which is defined by two lists of genomes. The algorithm finds the protein that best matches these lists: its homologs are present in every genome of the first list (the plus list), while its best homolog in each genome of the second list (the minus list) is less similar to it than the best homolog in any genome of the plus list. More precisely, the algorithm searches not for a single protein but for several of the best (suboptimal) proteins that satisfy this condition.
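The selection condition described above can be illustrated with a minimal Python sketch. It is not the Protfile implementation: the input layout (a table of best-homolog similarity scores per protein and genome), the function names profile_margin and rank_proteins, the treatment of a missing minus-list homolog as score 0, and the ranking by the gap between the weakest plus-list score and the strongest minus-list score are all assumptions made for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical input layout: best_hit[genome] is the similarity score of the
# best homolog of one query protein in that genome; a missing key means that
# no homolog was found there.
BestHits = Dict[str, float]


def profile_margin(best_hit: BestHits,
                   plus: List[str],
                   minus: List[str]) -> Optional[float]:
    """Return the score gap between the plus and minus lists, or None
    if the protein does not satisfy the profile condition."""
    if not plus:
        raise ValueError("the plus list must contain at least one genome")
    # A homolog must be present in every genome of the plus list.
    if any(genome not in best_hit for genome in plus):
        return None
    weakest_plus = min(best_hit[genome] for genome in plus)
    # Assumption: a minus-list genome without any homolog counts as score 0.
    strongest_minus = max((best_hit.get(genome, 0.0) for genome in minus),
                          default=0.0)
    # Every minus-list homolog must be less similar than every plus-list one.
    if strongest_minus >= weakest_plus:
        return None
    return weakest_plus - strongest_minus


def rank_proteins(scores: Dict[str, BestHits],
                  plus: List[str],
                  minus: List[str],
                  top: int = 3) -> List[Tuple[str, float]]:
    """Return up to `top` proteins that satisfy the condition, largest gap first."""
    ranked = []
    for protein, best_hit in scores.items():
        margin = profile_margin(best_hit, plus, minus)
        if margin is not None:
            ranked.append((protein, margin))
    # Sort by descending gap; ties are broken by protein name.
    ranked.sort(key=lambda item: (-item[1], item[0]))
    return ranked[:top]


if __name__ == "__main__":
    example = {
        "P1": {"A": 90, "B": 80, "X": 30},  # satisfies the condition
        "P2": {"A": 90, "X": 10},           # no homolog in plus genome B
        "P3": {"A": 70, "B": 60, "X": 65},  # minus homolog is too similar
        "P4": {"A": 60, "B": 40},           # no homolog in the minus genome
    }
    print(rank_proteins(example, plus=["A", "B"], minus=["X"]))
```

Returning a short ranked list rather than a single winner mirrors the statement that several suboptimal proteins are reported; the actual scoring function used by the program may differ.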

In practice, this approach can be used to find regulatory proteins from their potential DNA or RNA binding sites: the plus list consists of genomes containing at least one regulatory site of the type in question, and the minus list consists of genomes that contain no such site. Another application is the search for proteins associated with characteristic features of an organism (presence or absence of a flagellum, photosystems, etc.).

A detailed description of the program is given in its documentation. The program is implemented as a command-line PHP script with a binary computational kernel compiled for 32-bit Windows. The source code can also be built on UNIX/Linux operating systems.