Identification of highly conserved elements (HCEs) in a set of genomes
The iHCE software package implements the method described in [1]. It identifies HCEs in a set of
relatively well-assembled complete genomes. The programs have been evaluated on nuclear genomes of the
superphylum Alveolata [1] as well as on mitochondrial genomes of ciliates (the
phylum Ciliophora) [2] and monocotyledonous plants. The package consists of the
following three programs, intended for an MPI-enabled supercomputer, which correspond to the three stages
of the method set forth in [1].
- The PairHits program finds all pairs of approximately matching words in two sequences from different genomes, thus generating edges of the source graph.
- The BldGraph program accomplishes a compaction of the source graph, converting it into the initial multipartite graph (each part corresponds to a genome).
- The FinDense program transforms the initial graph into the final one and identifies m-dense subgraphs (clusters) in the latter graph. Each cluster is a connected component composed of vertices (i.e., similar words) that belong to at least m parts and are connected by edges of the greatest total weight.
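The cluster criterion stated for FinDense can be illustrated with a minimal sketch. It is not the FinDense implementation: it only collects connected components with a union-find structure and keeps those whose vertices come from at least m genomes, and it ignores edge weights entirely. The names Vertex, DisjointSet, and componentsSpanningParts are hypothetical and do not come from the package.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <numeric>
#include <set>
#include <utility>
#include <vector>

// Hypothetical vertex record: a word occurrence tagged with the index of the
// genome (graph part) it comes from.
struct Vertex {
    int genome;
};

// Disjoint-set (union-find) structure used to collect connected components.
class DisjointSet {
public:
    explicit DisjointSet(std::size_t n) : parent_(n) {
        std::iota(parent_.begin(), parent_.end(), std::size_t{0});
    }
    std::size_t find(std::size_t v) {
        while (parent_[v] != v) {
            parent_[v] = parent_[parent_[v]];  // path halving
            v = parent_[v];
        }
        return v;
    }
    void unite(std::size_t a, std::size_t b) { parent_[find(a)] = find(b); }

private:
    std::vector<std::size_t> parent_;
};

// Returns the connected components whose vertices span at least m genomes.
// Simplification (assumption): edge weights are not considered here.
std::vector<std::vector<std::size_t>> componentsSpanningParts(
    const std::vector<Vertex>& vertices,
    const std::vector<std::pair<std::size_t, std::size_t>>& edges,
    std::size_t m) {
    DisjointSet dsu(vertices.size());
    for (const auto& e : edges) {
        assert(e.first < vertices.size() && e.second < vertices.size());
        dsu.unite(e.first, e.second);
    }

    // Group vertex indices by the representative of their component.
    std::map<std::size_t, std::vector<std::size_t>> byRoot;
    for (std::size_t v = 0; v < vertices.size(); ++v) {
        byRoot[dsu.find(v)].push_back(v);
    }

    // Keep only components that touch at least m distinct genomes.
    std::vector<std::vector<std::size_t>> clusters;
    for (const auto& entry : byRoot) {
        std::set<int> genomes;
        for (std::size_t v : entry.second) genomes.insert(vertices[v].genome);
        if (genomes.size() >= m) clusters.push_back(entry.second);
    }
    return clusters;
}
```

For example, with six vertices from three genomes and edges 0-1, 1-2, and 3-4, calling componentsSpanningParts with m = 3 keeps only the component {0, 1, 2}, while m = 2 also keeps {3, 4}.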
These programs are designed to process large data sets and are intended only for 64-bit CPUs and operating systems. Different stages of the algorithm have different computational complexity and scalability, which is why the package has been split into separate programs. To reduce file sizes and speed up computation, the programs often use specific data formats with almost no validity checks. The user is fully responsible for correctly preparing and interpreting files in these formats. For example, the user might create a database to store the source data in any desired format and develop a database application or script that exports source files in the required format. This is the approach we used, but it is not described in detail here.
All programs are written in C++ and have a command-line interface for specifying the most important
parameters. Settings made on the command line have the highest priority. All adjustable parameters
can be set in the configuration file, which is required and is used by these programs for all
parameters except those overridden on the command line. If a parameter is specified
neither on the command line nor in the configuration file, the program uses a default value,
although not every parameter has one. A template configuration file is provided in the downloadable examples
below. Running a program with the argument -? or --help displays short help on the command-line options.
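The precedence described above (command line, then configuration file, then a built-in default that may be absent) can be sketched as follows. This is an illustration only, not the package's code: the function resolveParameter, the Settings alias, and the parameter names used in the usage note are hypothetical.

```cpp
#include <cassert>
#include <initializer_list>
#include <map>
#include <optional>
#include <string>

// Hypothetical container: parameter name -> value as text.
using Settings = std::map<std::string, std::string>;

// Looks up one parameter in priority order: command line first, then the
// configuration file, then the built-in defaults. Returns std::nullopt when
// no source provides a value (not every parameter has a default).
std::optional<std::string> resolveParameter(const std::string& name,
                                            const Settings& commandLine,
                                            const Settings& configFile,
                                            const Settings& defaults) {
    assert(!name.empty());
    for (const Settings* source : {&commandLine, &configFile, &defaults}) {
        auto it = source->find(name);
        if (it != source->end()) return it->second;
    }
    return std::nullopt;  // the caller decides how to report a missing value
}
```

With a hypothetical parameter "threads" set to 8 on the command line and to 4 in the configuration file, resolveParameter returns 8. A parameter present only in the defaults resolves to its default, and one absent from all three sources yields an empty optional.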
The Windows 64-bit executables (variants with and without MPI) and the source code for Linux
can be downloaded through the links below. The source code is compatible with most implementations
of MPI v.1.2 and above; it is provided under the GNU General Public License (GPL) v.3.
The MPI-enabled executables assume that MPICH2 64 bit v.1.4.1p1 (the last version developed for
Windows) has been installed in the system. The user can download a complete release from the
developer's site
or use the link below to get just the 64 bit installable file.
An alternative variant, which is supported by a separate set of executables, requires installing
Microsoft MPI v.7.1 64 bit. The redistributable setup file is available through the link below.
Downloadable files
| | Variant without MPI | Variant with MPICH2 1.4.1p1 | Variant with Microsoft MPI 7.1 |
|---|---|---|---|
| PairHits executable for Windows 64-bit | pairhits64nompi-1.12.zip | pairhits64-1.12.zip | pairhits64ms-1.12.zip |
| BldGraph executable for Windows 64-bit | bldgraph64nompi-2.16.zip | bldgraph64-2.16.zip | bldgraph64ms-2.16.zip |
| FinDense executable for Windows 64-bit | findense64nompi-1.6.zip | findense64-1.6.zip | findense64ms-1.6.zip |
| Test example for Windows | ihce-wintest-4.34.zip | | |
| MPICH2 1.4.1p1 installable file for Windows 64-bit | mpich2-1.4.1p1-win-x86-64.msi | | |
| Microsoft MPI 7.1 redistributable for Windows 64-bit | MSMpiSetup.exe | | |
| iHCE v.4.34 source code and test example for Linux (GNU GPL v.3) | ihce-src-4.34.tgz | | |
References
1. L.I. Rubanov, A.V. Seliverstov, O.A. Zverkov, V.A. Lyubetsky. A method for identification of highly conserved elements and evolutionary analysis of superphylum Alveolata. BMC Bioinformatics, 2016, Vol. 17, Art. 385. DOI: 10.1186/s12859-016-1257-5
2. R.A. Gershgorin, K.Yu. Gorbunov, O.A. Zverkov, L.I. Rubanov, A.V. Seliverstov, V.A. Lyubetsky. Highly conserved elements and chromosome structure evolution in mitochondrial genomes in ciliates. Life, 2017, Vol. 7, Iss. 1, Art. 9. DOI: 10.3390/life7010009