Lab.6 IITP RAS
26/04/24
11:06:26

Laboratory of Mathematical Methods and Models in Bioinformatics,
Institute for Information Transmission Problems,
Russian Academy of Sciences


Program for the phylogenetic study of the joint evolution of species and genes

The program Embed3GL is designed to solve four phylogenetic problems using original algorithms [1-4] of polynomial (cubic) time complexity. The first three problems share the following input data:

- a rooted species tree, initially binary, to which additional nodes are then added to divide the tree into time slices so that all leaves (extant species) lie in the same (deepest) slice. The number of additional nodes on an edge is specified as the "length" of that edge: if the length equals 1 or is omitted, no nodes are added; otherwise, a length of 2, for example, means that one node is added on that edge, and so on. The tree must contain an outgroup, a "species" named Out. A separate program (see the link below) can be used to time-slice the species tree and insert the outgroup.

- a set of rooted gene trees, which, in the current version, must not contain polytomous nodes.
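The edge-length convention for the species tree can be illustrated with a short Python sketch. It is not taken from Embed3GL or its time-slicing program and does not read the input format described in the manual. The tree is assumed, hypothetically, to be held as a dictionary that maps each node name to a list of (child, length) pairs, and the names given to inserted nodes (`_slice1`, `_slice2`, ...) are invented for illustration. An edge of length L is replaced by a chain of L unit edges through L - 1 new unary nodes, after which every leaf should lie at the same depth.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: node name -> list of (child name, edge length).
# A length of 1 adds no nodes (an omitted length would be passed in as 1).
Tree = Dict[str, List[Tuple[str, int]]]


def subdivide(tree: Tree, root: str) -> Tree:
    """Replace each edge of length L >= 2 by a chain of L unit edges
    through L - 1 new unary nodes."""
    out: Tree = {}
    counter = 0
    stack = [root]
    while stack:
        node = stack.pop()
        out.setdefault(node, [])
        for child, length in tree.get(node, []):
            prev = node
            for _ in range(max(length, 1) - 1):
                counter += 1
                extra = f"_slice{counter}"  # hypothetical name for an added node
                out[prev].append((extra, 1))
                out[extra] = []
                prev = extra
            out[prev].append((child, 1))
            stack.append(child)
    return out


def leaf_depths(tree: Tree, root: str) -> Dict[str, int]:
    """Number of edges from the root to every leaf."""
    depths: Dict[str, int] = {}
    stack = [(root, 0)]
    while stack:
        node, d = stack.pop()
        children = tree.get(node, [])
        if not children:
            depths[node] = d
        for child, _ in children:
            stack.append((child, d + 1))
    return depths


if __name__ == "__main__":
    # Out hangs off the root by an edge of length 2, so one node is added on it.
    species = {"root": [("Out", 2), ("X", 1)], "X": [("A", 1), ("B", 1)]}
    sliced = subdivide(species, "root")
    print(sorted(set(leaf_depths(sliced, "root").values())))  # prints [2]
```

A single depth value for all leaves corresponds to the requirement that extant species occupy the same (deepest) slice.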

The structure of the input and output data is described in detail in the user's manual.

Problem 1 is to compute, for each gene tree, the cost of embedding that tree into the species tree. The cost is reported for each gene tree in the input set, as well as the total over the set. A side effect of solving Problem 1 is the binarization (binary resolution) of any gene tree that contains polytomous nodes (not implemented in the current version).
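Embed3GL's own cost and its cubic-time algorithm [1-4] are not reproduced here. As a simpler point of reference only, the Python sketch below computes the classical duplication and loss counts of a binary gene tree under LCA mapping. This is a different and much more restricted reconciliation model that, for example, ignores horizontal transfers and time slices, and it should be applied to a binary species tree without the added slice nodes. The data layout (species tree as a child-to-parent dictionary, gene tree as a dictionary from each internal node to its two children, plus a leaf-to-species map) and all function names are hypothetical.

```python
from typing import Dict, Optional, Tuple

# Hypothetical layouts (not Embed3GL's input format):
Parent = Dict[str, Optional[str]]      # species node -> parent; root -> None
GeneTree = Dict[str, Tuple[str, str]]  # internal gene node -> (left, right)


def depth(parent: Parent, s: str) -> int:
    """Number of edges between species node s and the root."""
    d = 0
    while parent[s] is not None:
        s = parent[s]
        d += 1
    return d


def lca(parent: Parent, a: str, b: str) -> str:
    """Lowest common ancestor of species nodes a and b."""
    seen = set()
    node: Optional[str] = a
    while node is not None:
        seen.add(node)
        node = parent[node]
    while b not in seen:
        b = parent[b]
    return b


def dup_loss_cost(parent: Parent, gene: GeneTree, gene_root: str,
                  leaf_species: Dict[str, str]) -> Tuple[int, int]:
    """Classical LCA-mapping duplication and loss counts for a binary gene
    tree against a binary species tree (no transfers, no time slices)."""
    dups = losses = 0

    def visit(g: str) -> str:
        nonlocal dups, losses
        if g not in gene:                 # gene leaf: map to its species
            return leaf_species[g]
        left, right = gene[g]
        ml, mr = visit(left), visit(right)
        m = lca(parent, ml, mr)           # species node this gene node maps to
        dm = depth(parent, m)
        if m in (ml, mr):                 # duplication node
            dups += 1
            losses += (depth(parent, ml) - dm) + (depth(parent, mr) - dm)
        else:                             # speciation node
            losses += (depth(parent, ml) - dm - 1) + (depth(parent, mr) - dm - 1)
        return m

    visit(gene_root)
    return dups, losses


if __name__ == "__main__":
    species = {"R": None, "AB": "R", "C": "R", "A": "AB", "B": "AB"}
    gene = {"g1": ("g2", "c1"), "g2": ("a1", "a2")}
    print(dup_loss_cost(species, gene, "g1", {"a1": "A", "a2": "A", "c1": "C"}))
    # prints (1, 1): one duplication in A, one loss in the B lineage
```

A weighted sum of the two counts gives a single cost per gene tree, and summing over the input set gives a total, mirroring the per-tree and total values reported for Problem 1, though under a different model.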

Problem 2 is solved using the intermediate data obtained while solving Problem 1. For each gene tree, it builds the evolutionary scenario of embedding that tree into the species tree. The scenario takes the form of a tree of evolutionary events that contains both unary and binary edges.

Problem 3 can be solved after the input gene trees have been binary resolved. This problem requires additional user-specified data, namely:
(1) I-type - a fixed set of types of evolutionary events (e.g. loss, gain, duplication, transfer); and
(2) T-type - a set of gene tree nodes all of whose descendant leaves are specially labeled in one or more gene trees (e.g. "a set of ancestors of ribosomal genes").
The output of Problem 3 consists of two functions in tabular form: f(I,x), the average number of type-I events in the tube (i.e., edge) x of the species tree; and g(I,T), the average number of type-I events occurring on edges of type T.
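To make the shape of the two output tables concrete, here is a Python sketch that tabulates f(I,x) and g(I,T) from already computed event lists. It assumes, hypothetically, that each scenario is a list of (event type, species edge, gene node) records, that an event at a gene node belongs to the gene tree edge above that node, that gene node names are unique across all gene trees, and that averages are taken over the number of scenarios. Embed3GL's actual averaging and table layout may differ.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical event record: (event_type, species_edge, gene_node).
Event = Tuple[str, str, str]
Table = Dict[Tuple[str, str], float]


def f_table(scenarios: List[List[Event]]) -> Table:
    """f(I, x): average number of type-I events on species tree edge x."""
    if not scenarios:
        return {}
    counts: Counter = Counter()
    for events in scenarios:
        for event_type, species_edge, _ in events:
            counts[(event_type, species_edge)] += 1
    n = len(scenarios)
    return {key: c / n for key, c in counts.items()}


def g_table(scenarios: List[List[Event]], node_type: Dict[str, str]) -> Table:
    """g(I, T): average number of type-I events on gene tree edges leading
    to nodes of type T; node_type maps a gene node to its T-type label."""
    if not scenarios:
        return {}
    counts: Counter = Counter()
    for events in scenarios:
        for event_type, _, gene_node in events:
            t = node_type.get(gene_node)
            if t is not None:             # skip nodes without a T-type label
                counts[(event_type, t)] += 1
    n = len(scenarios)
    return {key: c / n for key, c in counts.items()}


if __name__ == "__main__":
    scenarios = [[("loss", "AB", "g1"), ("duplication", "A", "g2")],
                 [("loss", "AB", "h1")]]
    print(f_table(scenarios))
    # prints {('loss', 'AB'): 1.0, ('duplication', 'A'): 0.5}
    print(g_table(scenarios, {"g2": "ribosomal"}))
    # prints {('duplication', 'ribosomal'): 0.5}
```

Keying both tables by a pair (event type, edge or label) keeps them sparse: combinations that never occur are simply absent and can be read as zero.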

As of the current version, Embed3GL also solves Problem 4: building a supertree that amalgamates given binary trees. Instead of gene trees, Embed3GL here uses a set of basis trees built with the program Basis3GL. This supertree method is more precise than the algorithm implemented in Super3GL, but it is significantly slower. We therefore recommend running Embed3GL on a high-performance cluster when solving Problem 4.

Embed3GL is written in C/C++ and has a command-line interface. It supports parallelization when an MPI 1.2 (or later) environment is available. The program is portable and can be compiled for 32- and 64-bit Windows, Linux, Unix, and MacOS.

Windows executables (32/64-bit, non-MPI and MPI versions) and the source code for Linux can be downloaded free of charge from the links below. The source code is available under the GNU General Public License (GPL) version 3.

Downloadable files

File                                                   Version without MPI   Version for MPICH2 1.4.1p
Embed3GL executables for Windows 32-bit                1.1.7                 1.1.7
Embed3GL executables for Windows 64-bit                1.1.7                 1.1.7
Embed3GL user's manual (PDF)                           embed3gl_en
Embed3GL source code for Linux (GNU GPL v3)            1.1.7
Windows executable for time-slicing a species tree     time_slices

References

1. Lyubetsky,V.A., Rubanov,L.I., Rusin,L.Yu. and Gorbunov,K.Yu. (2012) Cubic time algorithms of amalgamating gene trees and building evolutionary scenarios. Biology Direct, 7, 48.

2. Gorbunov,K.Yu. and Lyubetsky,V.A. (2009) Reconstructing the evolution of genes along the species tree. Mol. Biol. (Mosk.), 43(5), 881-893.

3. Gorbunov,K.Yu. and Lyubetsky,V.A. (2010) An algorithm of reconciliation of gene and species trees and inferring gene duplications, losses and horizontal transfers. Information Processes, 10(2), 140-144 (in Russian).

4. Gorbunov,K.Yu. and Lyubetsky,V.A. (2011) The tree nearest on average to a given set of trees. Problems of Information Transmission, 47(3), 274-288.
