Laboratory of Mathematical Methods and Models in Bioinformatics,
Institute for Information Transmission Problems,
Russian Academy of Sciences
Laboratory 6, IITP RAS
Key Achievements
Using a supercomputer model of chordate evolution that we developed, we predicted that the
gene ENSXETG00000033176 is responsible for the loss of regenerative potential and for the
development of the telencephalon in warm-blooded vertebrates compared to cold-blooded ones.
Experimental studies of this gene at the IBCh RAS laboratory confirmed the prediction.
Applying the same model to rodents and primates, we identified genes that are absent from
long-lived species but present in short-lived ones. We hypothesize that these genes are
associated with lifespan; the hypothesis is consistent with the physiological, anatomical,
cancer-resistance, and other characteristics of these species.
Through big data analysis, we statistically demonstrated that bacterial, archaeal, and
algal plastid adaptation to environmental conditions (particularly temperature) correlates
with specific changes in intergenic distances. For instance, high temperatures correspond
to relatively small intergenic distances, while low temperatures correspond to larger ones.
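As a minimal illustration of the quantity discussed above, the sketch below computes
intergenic distances from gene coordinates. The coordinates are invented toy data, the
function name intergenic_distances is hypothetical, and the statistical procedure actually
used in the study is not described here, so this shows only how the distances themselves
could be measured.

```python
from statistics import mean


def intergenic_distances(genes):
    """Return the gaps (in bp) between consecutive genes on one linear replicon.

    `genes` is a list of (start, end) pairs, 1-based and inclusive.
    Overlapping or adjacent genes contribute no gap.
    """
    ordered = sorted(genes)
    gaps = []
    prev_end = ordered[0][1]
    for start, end in ordered[1:]:
        gap = start - prev_end - 1
        if gap > 0:
            gaps.append(gap)
        prev_end = max(prev_end, end)
    return gaps


# Toy annotations (assumed, not real genomes): a "hot" and a "cold" organism.
thermophile = [(1, 900), (921, 1800), (1811, 2700)]
psychrophile = [(1, 900), (1101, 2000), (2301, 3200)]

print(mean(intergenic_distances(thermophile)))
print(mean(intergenic_distances(psychrophile)))
```

In a real analysis the per-genome mean (or median) distance would be compared against
habitat temperature across many genomes; that step is omitted here.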
We clustered the proteins encoded in rhodophyte plastids, which refined the classification
within protein families and revealed genes specific to red algal plastids.
We predicted transcription regulation of the moeB gene by transcription factor Ycf28,
encoded in red algal plastids.
We developed an efficient algorithm and software implementation for predicting ancestral
chromosomal structures and their evolutionary scenarios. The program was used to reconstruct
the evolutionary pathways of chromosomal structures in plastids, mitochondria, and
eukaryotic nuclei.
We classified and discovered new types of attenuator regulation at transcription and
translation initiation levels, particularly in actinobacteria and alpha-proteobacteria.
We were the first to predict T-box-mediated regulation of translation initiation, and we
identified a novel attenuator type characterized by short distances (10-13 bp) between the
structural and leader genes, which we hypothesize is related to ribosome reinitiation during
leader peptide translation. We conducted large-scale screening of all regulatory systems in
actinobacteria and alpha-proteobacteria, predicting regulation of genes encoding proteins
with PF00480 or PF14340 domains and hypothesizing their important role in sulfur metabolism.
We advanced the attenuator regulation model by incorporating DNA/RNA secondary structure
dynamics, RNA triplexes, bacterial habitat temperature, and G-quadruplexes. The model
was implemented as an efficient parallel computing program and validated experimentally.
We predicted co-regulation of all genes encoded in Toxoplasma gondii apicoplasts.
Comparative analysis of the N-terminal extensions of nuclear-encoded, apicoplast-targeted
proteins revealed that the N-termini in T. gondii are on average 1.5 times longer than those
in Neospora caninum and twice as long as those in Plasmodium falciparum. We hypothesized
that coccidian apicoplast activity is regulated through post-translational modification of
the excessively long N-termini of apicoplast-targeted proteins, and that this mechanism
plays a key role in coccidian reactivation via apicoplast reactivation. This hypothesis was
recently confirmed experimentally.
We analyzed the complete mitochondrial genome of the orthonectid Intoshia linei and placed
it within the annelid crown group. The position of orthonectids within annelids is further
supported by synapomorphies shared between I. linei and annelid crown groups in the
mitochondrial proteins cox1, cytb, nad6, and atp6, and by the trnN-cox2 gene order.
Orthonectids were previously considered a separate phylum. We substantiated our hypothesis
that orthonectids are deviant annelids and studied the phylogenetic position of Dicyema sp.
We developed a mathematical model and efficient algorithm for reconstructing joint evolution
of genomic elements and species considering diverse evolutionary events. The model comprises
three modules: reconciliation of gene/protein and species phylogenetic trees, reconstruction
of chromosomal rearrangement evolution, and joint scenarios of regulatory system, gene, and
species evolution. The model was implemented as a supercomputing program.
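The first module, tree reconciliation, can be illustrated with the classic
lowest-common-ancestor (LCA) mapping, which infers gene duplications by mapping each gene
tree node to a species tree node. This is a textbook sketch only and is not the laboratory's
model, which considers more diverse events. Trees are nested tuples, gene tree leaves are
labeled by the species they come from, and all function names are hypothetical.

```python
def leaf_set(tree):
    """Return the set of leaf labels of a nested-tuple binary tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    left, right = tree
    return leaf_set(left) | leaf_set(right)


def lca(species_tree, species):
    """Return the smallest subtree of species_tree containing all `species`."""
    node = species_tree
    while not isinstance(node, str):
        left, right = node
        if species <= leaf_set(left):
            node = left
        elif species <= leaf_set(right):
            node = right
        else:
            break
    return node


def count_duplications(gene_tree, species_tree):
    """Return (species-tree node the gene-tree root maps to, duplication count).

    A gene-tree node is called a duplication when it maps to the same
    species-tree node as at least one of its children.
    """
    if isinstance(gene_tree, str):
        return lca(species_tree, frozenset([gene_tree])), 0
    left, right = gene_tree
    m_left, d_left = count_duplications(left, species_tree)
    m_right, d_right = count_duplications(right, species_tree)
    mapped = lca(species_tree, leaf_set(gene_tree))
    is_dup = mapped == m_left or mapped == m_right
    return mapped, d_left + d_right + int(is_dup)


species_tree = (("A", "B"), "C")
congruent = (("A", "B"), "C")                   # same shape as the species tree
with_paralogs = ((("A", "B"), "C"), ("A", "B"))  # extra A/B copies

print(count_duplications(congruent, species_tree)[1])
print(count_duplications(with_paralogs, species_tree)[1])
```

Comparing subtrees with == is safe here because species tree leaves are unique, so two
distinct nodes never have identical structure.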
We constructed evolutionary trees of eubacteria, plastids, and cyanobacteria (the latter
aligned with species trees, enabling identification of species groups with plastids of common
origin), trees of mitochondrial chromosomal structures in sporozoans and rhodophyte plastids,
and seed plant plastid trees with minimal polytomy.
We predicted the NtcA and NtcB regulons in cyanobacteria and studied direct repeat
insertions in the microevolution of seed plant mitochondria and plastids.
We developed a computational model of RNA polymerase competition in plastids and mitochondria,
validated by known experiments on RNA polymerase competition in barley and Arabidopsis
plastids. We predicted transcriptional regulation of plastid genes involved in sulfate
transport in Viridiplantae. Applying the model to chordates, we proposed mechanisms for
MELAS syndrome and thyroid hormone deficiency effects on phenotype.
We found an efficient algorithm that reduces the NP-complete problem of partitioning a
weighted set into two subsets of equal sum to the problem of finding a special point on a
hypersurface defined by a low-rank cubic form. This yields an efficient heuristic for the
original problem.
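For readers unfamiliar with the partition problem itself, the sketch below states it in
code and solves small instances with the standard pseudo-polynomial subset-sum table. This
is a baseline for comparison and not the cubic-form reduction described above, whose
construction is not given here. It assumes positive integer weights, and the function name
equal_sum_partition is hypothetical.

```python
def equal_sum_partition(weights):
    """Split positive integer weights into two lists of equal sum, or return None."""
    total = sum(weights)
    if total % 2:
        return None
    target = total // 2
    # reachable[s] holds one list of indices whose weights sum to s.
    reachable = {0: []}
    for i, w in enumerate(weights):
        # Iterate over a snapshot so each weight is used at most once per subset.
        for s, idxs in list(reachable.items()):
            t = s + w
            if t <= target and t not in reachable:
                reachable[t] = idxs + [i]
    if target not in reachable:
        return None
    chosen = set(reachable[target])
    first = [weights[i] for i in sorted(chosen)]
    second = [w for i, w in enumerate(weights) if i not in chosen]
    return first, second


print(equal_sum_partition([3, 1, 1, 2, 2, 1]))
print(equal_sum_partition([1, 2, 5]))
```

The table has up to sum(weights)/2 entries, so the running time grows with the magnitude of
the weights. That dependence is why the problem remains NP-complete in general and why
heuristics are of interest.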
We identified a pair of parallel hyperplanes in 30-dimensional space, each uniquely
determined by the unit cube vertices lying on it, such that no cube vertices but some
integer points lie strictly between them. We also found a triple of such hyperplanes in
37-dimensional space.
We obtained recursive solutions to several problems of descriptive set theory, including an
explicit description of a simple uncountable set of numbers that contains no number that
can be explicitly described or effectively defined.
We developed an efficient probabilistic algorithm that reduces a system of linear equations
to a single linear equation with relatively small integer coefficients and the same Boolean
solutions as the original system, provided that the first equation has few redundant
Boolean solutions, that is, solutions that do not satisfy the entire system.
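The general idea of collapsing a system into one equation with the same Boolean solutions
can be sketched as follows. The sketch combines the equations with random positive integer
multipliers and checks the result by brute force over all 0/1 vectors. It is an assumed
illustration, not the laboratory's algorithm: it does not use the condition on the first
equation, and the brute-force check is exponential and suitable only for toy inputs. All
function names and the multiplier bound are hypothetical.

```python
import itertools
import random


def boolean_solutions(rows, rhs, n):
    """Return all 0/1 vectors x of length n with rows[i] . x == rhs[i] for every i."""
    return {
        x
        for x in itertools.product((0, 1), repeat=n)
        if all(sum(a * v for a, v in zip(row, x)) == b for row, b in zip(rows, rhs))
    }


def combine(rows, rhs, multipliers):
    """Return the integer linear combination of the equations as (coeffs, const)."""
    n = len(rows[0])
    coeffs = [sum(m * row[j] for m, row in zip(multipliers, rows)) for j in range(n)]
    const = sum(m * b for m, b in zip(multipliers, rhs))
    return coeffs, const


def reduce_to_single_equation(rows, rhs, bound=50, seed=0, max_tries=100):
    """Draw random multipliers until the combined equation has exactly the
    Boolean solutions of the system (verified exhaustively)."""
    rng = random.Random(seed)
    n = len(rows[0])
    target = boolean_solutions(rows, rhs, n)
    for _ in range(max_tries):
        multipliers = [rng.randint(1, bound) for _ in rows]
        coeffs, const = combine(rows, rhs, multipliers)
        if boolean_solutions([coeffs], [const], n) == target:
            return coeffs, const
    raise RuntimeError("no suitable multipliers found")


# Toy system: x1 + x2 + x3 = 2 and x1 - x2 = 0.
rows = [[1, 1, 1], [1, -1, 0]]
rhs = [2, 0]
coeffs, const = reduce_to_single_equation(rows, rhs)
print(coeffs, const)
print(boolean_solutions([coeffs], [const], 3))
```

Every Boolean solution of the system satisfies any linear combination of its equations, so
only spurious extra solutions need to be ruled out. Random multipliers make such
coincidences unlikely, which is the probabilistic element of this sketch.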