
Laboratory of Mathematical Methods and Models in Bioinformatics,
Institute for Information Transmission Problems,
Russian Academy of Sciences


Finding a Multi-Box Regulatory Signal in a Set of Unaligned Sequences

V. Lyubetsky, L. Rubanov

Abstract

In the "Program description" we present an original fast algorithm for local multiple alignment of sequences. Such an alignment usually yields several candidate sites in each source sequence, and these sites correspond to different local alignments of approximately the same quality. "An application example" and the TwoBox distribution provide several examples of local alignments found by our algorithm. The optimal local alignment is usually found by computing a quality score defined as the sum of pairwise likenesses over all pairs of constituent sites. Our algorithm instead computes this sum only for some of the constituent sites, selected by a special random-choice procedure.
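To make the contrast concrete, here is a minimal C sketch of the two ways of scoring a system of sites: the exact sum over all pairs, and an estimate computed from only a few randomly chosen pairs. The likeness measure (number of identical positions), the convention `pos[k] = -1` for a sequence without a site, and uniform random sampling of pairs are all assumptions made for illustration; the authors' actual random-choice procedure is given in the "Program description" and is not reproduced here.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical likeness of two equal-length sites: the number of
   positions at which they carry the same letter. */
static int likeness(const char *a, const char *b, int len)
{
    int score = 0;
    for (int i = 0; i < len; i++)
        if (a[i] == b[i])
            score++;
    return score;
}

/* Exact quality: sum of likenesses over all pairs of chosen sites.
   pos[k] is the start of the site in sequence k, or -1 when the
   sequence is rejected and contributes no site. */
long full_quality(const char **seq, const int *pos, int n, int len)
{
    long q = 0;
    for (int i = 0; i < n; i++) {
        if (pos[i] < 0)
            continue;
        for (int j = i + 1; j < n; j++)
            if (pos[j] >= 0)
                q += likeness(seq[i] + pos[i], seq[j] + pos[j], len);
    }
    return q;
}

/* Sampled quality: likenesses are computed for only m randomly chosen
   pairs of sites, and their mean is scaled up to the total number of
   pairs. Uniform pair sampling is an assumption, not the authors' rule. */
double sampled_quality(const char **seq, const int *pos, int n, int len, int m)
{
    assert(m > 0);
    int *act = malloc(sizeof *act * (size_t)n); /* sequences that have a site */
    if (!act)
        return 0.0;
    int k = 0;
    for (int i = 0; i < n; i++)
        if (pos[i] >= 0)
            act[k++] = i;
    if (k < 2) {
        free(act);
        return 0.0;
    }
    long sum = 0;
    for (int s = 0; s < m; s++) {
        int a = rand() % k;
        int b = rand() % (k - 1);
        if (b >= a)
            b++; /* distinct partner, uniform over the remaining sites */
        int i = act[a], j = act[b];
        sum += likeness(seq[i] + pos[i], seq[j] + pos[j], len);
    }
    free(act);
    double pairs = (double)k * (k - 1) / 2.0;
    return pairs * (double)sum / m;
}
```

With k sites, the exact sum needs k(k-1)/2 likeness computations per evaluation, while the sampled version needs only m, at the price of a noisy estimate.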

An application example

Program description (PDF)

TwoBox distribution

MPICH2 V1.2 libraries & executables


———

General information

The program TwoBox V3.17 is designed to find a system of similar sites in a given set of sequences. One site (or none) is chosen from each sequence, and the chosen sites should be as similar to each other as possible. The program first tries to find a site in every sequence, but it may reject a few sequences if the resulting system is better in terms of the functionals used.

The sites sought can consist of a single box or of several boxes separated by linkers whose length is either fixed or varies within a given interval. The box lengths and linker intervals may be specified independently of each other. The current version 3.17 supports only sites of one or two boxes, although the same algorithm can be generalized to a greater number of boxes per site.
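The following hypothetical C sketch enumerates every placement of a two-box site in one sequence, with fixed box lengths and a linker length ranging over [lmin, lmax]; setting lmin equal to lmax gives a fixed linker. The names `TwoBoxSite` and `enumerate_two_box` are illustrative and are not part of TwoBox.

```c
#include <assert.h>

/* A candidate two-box site: the first box starts at `start`, and the
   second box starts after a linker of `linker` positions. */
typedef struct {
    int start;
    int linker;
} TwoBoxSite;

/* Enumerates every placement of a two-box site in a sequence of length
   seq_len: box lengths box1 and box2 are fixed, the linker length varies
   over [lmin, lmax]. Writes up to cap sites into out and returns the
   total number of placements. */
int enumerate_two_box(int seq_len, int box1, int box2, int lmin, int lmax,
                      TwoBoxSite *out, int cap)
{
    assert(box1 > 0 && box2 > 0 && 0 <= lmin && lmin <= lmax);
    int count = 0;
    for (int start = 0; start + box1 + lmin + box2 <= seq_len; start++)
        for (int l = lmin; l <= lmax && start + box1 + l + box2 <= seq_len; l++) {
            if (count < cap)
                out[count] = (TwoBoxSite){ start, l };
            count++;
        }
    return count;
}
```

For seq_len = 20, two 6-letter boxes and a linker of 2 to 4 positions, the function returns 18 placements. The linker range adds a second dimension to the enumeration, so the number of candidates per sequence grows roughly in proportion to lmax - lmin + 1.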

In addition, the program allows the user to search for a signal when conserved positions within a box are known in advance. Such data may be supplied as a 'motif' for one or more boxes. Detailed information about the input data and program arguments can be found in the program documentation.
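As an illustration of how a motif can constrain a box, the C sketch below uses a hypothetical notation that is not TwoBox's actual motif format (that format is defined in the program documentation): one character per box position, where an upper-case letter marks a conserved position that must carry that letter and '.' marks an unconstrained position. The function name `box_fits_motif` and the mismatch tolerance are likewise assumptions.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical motif check: `box` points to the start of a candidate box
   (it may lie inside a longer sequence), `motif` has one character per
   box position. Returns 1 if the box violates at most max_mismatch
   conserved positions, 0 otherwise. */
int box_fits_motif(const char *box, const char *motif, int max_mismatch)
{
    assert(strlen(box) >= strlen(motif));
    int mismatches = 0;
    for (size_t i = 0; motif[i] != '\0'; i++)
        if (motif[i] != '.' && box[i] != motif[i])
            mismatches++;
    return mismatches <= max_mismatch;
}
```

In a search, such a check would act as a filter: candidate boxes that do not fit the motif are discarded before their likeness to other sites is ever computed.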

TwoBox extends a previously developed algorithm for finding one-box regulatory signals [1-3] by global optimization of a predefined functional of signal quality. The result is a quasi-optimal solution: the one with the greatest functional value reached during the search, which is limited by internal criteria and/or by run time and/or by the number of algorithm steps.
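The interplay of the three stopping limits can be sketched as a search loop that keeps the greatest functional value reached and stops as soon as any enabled limit is hit. This is a minimal C sketch under stated assumptions: random single-site moves with hill-climbing acceptance stand in for the authors' optimization procedure, and "no improvement during `max_stall` steps" stands in for the unspecified internal criteria. `Limits`, `search` and `toy_functional` are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>
#include <time.h>

typedef double (*Functional)(const int *pos, int n, void *ctx);

/* Stopping limits; a zero field disables that limit. */
typedef struct {
    long max_steps;     /* cap on the number of algorithm steps */
    double max_seconds; /* cap on CPU time */
    long max_stall;     /* assumed internal criterion: steps without improvement */
} Limits;

/* Random local search over site positions: each step moves the site in one
   random sequence to a random position (npos[k] positions are possible in
   sequence k) and keeps the move unless the functional decreases. On return,
   pos holds the best system reached and its functional value is returned. */
double search(Functional f, void *ctx, int *pos, const int *npos, int n,
              Limits lim)
{
    assert(n > 0);
    assert(lim.max_steps > 0 || lim.max_stall > 0 || lim.max_seconds > 0);
    clock_t t0 = clock();
    double best = f(pos, n, ctx);
    long stall = 0;
    for (long step = 0;; step++) {
        if (lim.max_steps > 0 && step >= lim.max_steps) break;
        if (lim.max_stall > 0 && stall >= lim.max_stall) break;
        if (lim.max_seconds > 0 &&
            (double)(clock() - t0) / CLOCKS_PER_SEC >= lim.max_seconds) break;
        int k = rand() % n;        /* pick a sequence */
        int old = pos[k];
        pos[k] = rand() % npos[k]; /* try another site position in it */
        double v = f(pos, n, ctx);
        if (v > best) {
            best = v;
            stall = 0;
        } else {
            if (v < best)
                pos[k] = old;      /* reject a worsening move */
            stall++;
        }
    }
    return best;
}

/* Toy functional for trying the loop: maximal (zero) when every
   position equals *(int *)ctx. */
static double toy_functional(const int *pos, int n, void *ctx)
{
    int target = *(const int *)ctx;
    double s = 0.0;
    for (int i = 0; i < n; i++)
        s -= abs(pos[i] - target);
    return s;
}
```

Because the loop can stop on a limit rather than on a proof of optimality, the returned system is quasi-optimal in the sense described above: the best one met, not necessarily the global optimum.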

Because the computational complexity of the algorithm grows with the number of boxes, TwoBox was designed from the outset for a parallel cluster with inter-processor communication via MPI. Any number of processors can be used; the program can occupy all available CPUs, and the calculation time decreases by a factor of approximately s-1, where s is the number of CPUs. At least two logical processors are required, so the program can run on a typical dual-core PC.
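The stated speedup can be turned into a back-of-the-envelope estimate with a small C helper. One plausible reading of the (s-1)-fold figure, assumed here and not stated in the text, is a master-worker scheme in which one process coordinates and s-1 processes compute; this would also explain why at least two logical processors are needed. `estimated_time` and `worker_share` are hypothetical names.

```c
#include <assert.h>

/* Estimated run time on s processors, given t_one, the run time with a
   single computing process (s = 2 under the assumed master-worker reading).
   Applies the approximately (s - 1)-fold reduction. */
double estimated_time(double t_one, int s)
{
    assert(s >= 2); /* at least two logical processors are required */
    return t_one / (s - 1);
}

/* Fraction of processors doing search work under the same assumption. */
double worker_share(int s)
{
    assert(s >= 2);
    return (s - 1.0) / s;
}
```

For example, a job taking 120 minutes on two processors would be expected to take about 40 minutes on four; the worker share (s-1)/s approaches 1 as the cluster grows, so the coordinating process matters mainly on small machines.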

The program is provided as an executable for the x86 architecture. It is intended for a cluster of several Windows PCs interconnected by a TCP/IP LAN. The MPI environment must be set up with the freely available MPICH2 v1.2 software (Argonne National Laboratory); this product (or a later version) must be installed on every computer in the cluster. (If installing MPICH2 on a computer is undesirable, the user can copy the MPICH2 libraries into the program folder; however, we cannot guarantee that the program will operate correctly in that case.) See the program documentation for further details.

TwoBox has a command-line interface and must be run from an operating system command shell. The program is written in C and compiled with Microsoft Visual Studio 2005 Service Pack 1; the executable is named twobox.exe. The target CPU is 32-bit Intel. The operating systems tested were Microsoft Windows XP Service Pack 3 and Microsoft Windows Server 2003 Service Pack 2. TwoBox can be used on other processors and/or operating systems, but this may require recompiling the program or additional testing.

Software developer: Dr. L.I. Rubanov, leading scientist, IITP RAS (Kharkevich Institute)
E-mail: rubanov@iitp.ru

References

  1. L.V. Danilova, K.U. Gorbunov, M.S. Gelfand, V.A. Lyubetsky. Algorithm of regulatory signal recognition in DNA sequences (2) // Molecular Biology, 2001, V. 35, No. 6, p. 841-848.
  2. S.N. Istomina, L.I. Rubanov. Parallel algorithm of regulatory signal search in bacterial genomes // Information Processes, 2002, V. 2, No. 1, p. 85-90 (in Russian).
    http://www.jip.ru/2002/Isto.pdf
  3. L.V. Danilova, V.A. Lyubetsky, M.S. Gelfand. An algorithm for identification of regulatory signals in unaligned DNA sequences, its testing and parallel implementation // In Silico Biology, 2003, V. 3, No. 1-2, p. 33-47.
    http://www.bioinfo.de/isb/2003/03/0004/