Laboratory of Mathematical Methods and Models in Bioinformatics,
Institute for Information Transmission Problems,
Russian Academy of Sciences

Identification of highly conserved elements (HCEs) in a set of genomes

The iHCE software package implements the method described in [1]. It identifies HCEs in a set of relatively well-assembled complete genomes. The programs have been evaluated on nuclear genomes of the superphylum Alveolata [1] as well as on mitochondrial genomes of ciliates (the phylum Ciliophora) [2] and monocotyledonous plants. The package consists of the following programs, which are intended for MPI-enabled supercomputers:

  • The program PairHits finds all pairs of approximately matching words in two sequences from different genomes, thus generating the edges of the source graph. This is the first stage of the method set forth in [1].
  • The program BldGraph compacts the source graph, converting it into the initial multipartite graph (each part corresponds to a genome). This is the second stage of the method set forth in [1].
  • The program FinDense transforms the initial graph into the final one and identifies m-dense subgraphs (clusters) in the final graph. Each cluster is a connected component composed of vertices (i.e., similar words) that belong to at least m parts and are connected by edges of the greatest total weight. This is the third stage of the method set forth in [1].
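The first and third stages can be illustrated with a minimal sketch. It is not the code of PairHits or FinDense. It assumes that "approximately matching" means a bounded number of mismatches between fixed-length words, and it reduces clustering to plain connected components that span at least m genomes, ignoring edge weights and the graph transformation described in [1]. The names Hit, Edge, pairHits, findRoot and componentsSpanning are hypothetical.

```cpp
#include <cstddef>
#include <map>
#include <numeric>
#include <set>
#include <string>
#include <vector>

// Hypothetical source-graph edge: start positions of two similar words.
struct Hit { std::size_t posA; std::size_t posB; int mismatches; };

// Brute-force stage-1 sketch: every pair of length-k words, one from each
// sequence, that differ in at most maxMismatch positions (an assumption;
// the actual similarity criterion is defined in [1]).
std::vector<Hit> pairHits(const std::string& a, const std::string& b,
                          std::size_t k, int maxMismatch) {
    std::vector<Hit> hits;
    if (k == 0 || a.size() < k || b.size() < k) return hits;
    for (std::size_t i = 0; i + k <= a.size(); ++i) {
        for (std::size_t j = 0; j + k <= b.size(); ++j) {
            int mm = 0;
            // Stop comparing as soon as the mismatch budget is exceeded.
            for (std::size_t t = 0; t < k && mm <= maxMismatch; ++t)
                if (a[i + t] != b[j + t]) ++mm;
            if (mm <= maxMismatch) hits.push_back({i, j, mm});
        }
    }
    return hits;
}

// Hypothetical unweighted edge between two vertices (words).
struct Edge { std::size_t u; std::size_t v; };

// Union-find root lookup with path halving.
std::size_t findRoot(std::vector<std::size_t>& parent, std::size_t x) {
    while (parent[x] != x) {
        parent[x] = parent[parent[x]];
        x = parent[x];
    }
    return x;
}

// Simplified stage-3 sketch: connected components whose vertices come from
// at least m distinct genomes. genomeOf[v] is the genome (part) of vertex v;
// edge endpoints are assumed to be valid vertex indices.
std::vector<std::vector<std::size_t>> componentsSpanning(
        const std::vector<int>& genomeOf,
        const std::vector<Edge>& edges, std::size_t m) {
    const std::size_t n = genomeOf.size();
    std::vector<std::size_t> parent(n);
    std::iota(parent.begin(), parent.end(), std::size_t{0});
    for (const Edge& e : edges) {
        const std::size_t ru = findRoot(parent, e.u);
        const std::size_t rv = findRoot(parent, e.v);
        parent[ru] = rv;  // merge the two components
    }
    std::map<std::size_t, std::vector<std::size_t>> byRoot;
    for (std::size_t v = 0; v < n; ++v)
        byRoot[findRoot(parent, v)].push_back(v);
    std::vector<std::vector<std::size_t>> result;
    for (const auto& entry : byRoot) {
        std::set<int> genomes;
        for (std::size_t v : entry.second) genomes.insert(genomeOf[v]);
        if (genomes.size() >= m) result.push_back(entry.second);
    }
    return result;
}
```

The quadratic word comparison is only practical for short sequences. It is meant to show what an edge of the source graph represents, not how to compute such edges at genome scale.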

These programs are designed for Big Data processing and run only on 64-bit CPUs and operating systems. The stages of the algorithm differ in computational complexity and scalability, which is why the package is split into separate programs. To reduce file sizes and speed up computation, the programs often use specific data formats with almost no validation. The user is fully responsible for preparing and interpreting files in these formats correctly. For example, the user might store the source data in a database in any desired format and write a database application or script that exports source files in the required format. This is the approach we used, but we do not describe it in detail.

All programs are written in C++ and have a command-line interface for specifying the most important parameters. Settings made on the command line have the highest priority. All adjustable parameters can be set in the configuration file, which is required; the programs take from it every parameter that is not overridden on the command line. If a parameter is specified neither on the command line nor in the configuration file, the program uses a default value, although not every parameter has one. A template configuration file is provided in the downloadable examples below. Running a program with the argument -? or --help displays short help on the command-line options.
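The precedence rule above (command line first, then the configuration file, then a built-in default if one exists) can be sketched as follows. This is an illustration only, not the parsing code of the package. The type Settings, the function resolveParam and the parameter names in the usage note are hypothetical.

```cpp
#include <initializer_list>
#include <map>
#include <string>

// Hypothetical container: parameter name -> value as text.
using Settings = std::map<std::string, std::string>;

// Looks a parameter up in priority order: command line, configuration file,
// built-in defaults. Returns false when none of the three sources defines it
// (the text notes that not every parameter has a default).
bool resolveParam(const std::string& name,
                  const Settings& cmdLine,
                  const Settings& configFile,
                  const Settings& defaults,
                  std::string& value) {
    for (const Settings* source : {&cmdLine, &configFile, &defaults}) {
        const auto it = source->find(name);
        if (it != source->end()) {
            value = it->second;
            return true;
        }
    }
    return false;
}
```

For example, with a hypothetical parameter "threads" set to 8 on the command line and to 4 in the configuration file, resolveParam yields 8. A parameter present only in the defaults yields its default value, and a parameter absent from all three sources makes the function return false.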

Windows 64-bit executables (variants with and without MPI) and source code for Linux can be downloaded through the links below. The source code is compatible with most implementations of MPI v.1.2 and above; it is provided under the GNU General Public License (GPL) v.3.
The MPI-enabled executables assume that MPICH2 64-bit v.1.4.1p1 (the last version developed for Windows) is installed on the system. The user can download a complete release from the developer's site or use the link below to get only the 64-bit installer.
An alternative variant, supported by a separate set of executables, requires Microsoft MPI v.7.1 64-bit. The redistributable setup file is available through the link below.

Downloadable files

                                          Variant without MPI         Variant with MPICH2 1.4.1p1   Variant with Microsoft MPI 7.1
PairHits executable for Windows 64-bit    pairhits64nompi-1.12.zip    pairhits64-1.12.zip           pairhits64ms-1.12.zip
BldGraph executable for Windows 64-bit    bldgraph64nompi-2.16.zip    bldgraph64-2.16.zip           bldgraph64ms-2.16.zip
FinDense executable for Windows 64-bit    findense64nompi-1.6.zip     findense64-1.6.zip            findense64ms-1.6.zip

Test example for Windows: ihce-wintest-4.34.zip
MPICH2 1.4.1p1 installable file for Windows 64-bit: mpich2-1.4.1p1-win-x86-64.msi
Microsoft MPI 7.1 redistributable for Windows 64-bit: MSMpiSetup.exe
iHCE v.4.34 source codes and test example for Linux (GNU GPL v.3): ihce-src-4.34.tgz

References

[1] Rubanov L.I., Seliverstov A.V., Zverkov O.A. and Lyubetsky V.A. Method for identification of highly conserved elements and evolutionary analysis of superphylum Alveolata. (2016) BMC Bioinformatics 17:385. Open Access

[2] Application of our HCE identification method for a study of the chromosomal structure evolution of mitochondrial genome of protists from the phylum Ciliophora. (In Russian)
