Model of RNA-related regulation in bacteria

The definition of the model, its implementation, and its use are described in [1–3, 5, 6]. The model uses the Monte Carlo method with a large number of repetitions, so it is designed as a command-line application for batch processing in Windows and Linux environments. This website lets you run the model directly on the server; it is intended to introduce the model and to support preliminary tests. Due to server constraints, the online version is somewhat limited; the downloadable version of the program has no such limitations.

The model has many parameters and options that control its operating mode, performance, and output. Some of these parameters are available through the interface form, which opens when you click “Run model online”. The controls provided are listed below, with explanations and references to the equation numbers in [5]. The program distribution includes a detailed description of the complete set of parameters available in the local version.

Description of the interface form

The whole form consists of the following three sections separated by horizontal rules.

The top section of the form: Problem definition

Examples: For convenient testing, this is a list of leader regions for several operons in certain bacteria. Each sequence starts at the start codon of the leader peptide gene and ends with a U-run. Wild-type genomes are included, as well as two E. coli mutants obtained by substituting A for G at positions 75 and 132 from the start of transcription. (These mutants and the experimental data are described in [4].) Select an example and click the “Select organism” button; the selected sequence will appear in the “Sequence” field, and the other parameters will be set to their default values.
Sequence: This field lets the user enter or paste a sequence, or introduce mutations manually. The sequence may contain only the letters G, C, A, T, and U; no other characters are allowed. The sequence is not case-sensitive, and T and U are treated as equivalent.
Amino acid(s): This list selects the amino acid on whose concentration the expression of the gene depends. Multiple amino acids can be selected by holding the Ctrl key.
Concentration: For the selected amino acid, enter its relative concentration, a value in the range [0, 1]. Here and below, a period is used as the decimal separator. Exponential notation is also allowed, e.g. 2.0e-3 = 0.002. Each model run uses the single concentration value specified here. Therefore, to obtain the dependence on concentration, you need to run the model several times or use the downloadable version, which includes automation scripts.
Exclude positions: Here the user may specify a comma-separated list of up to 6 segments of the sequence that cannot participate in complementary bonding, e.g. because of a bound ligand. Each segment is encoded by two numbers: the first is the start position in the sequence and the second is the length of the segment. The field may contain only digits and commas; spaces and other characters are not allowed.
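
The input rules for the “Sequence”, “Concentration”, and “Exclude positions” fields can be checked locally before submitting the form. The following Python sketch is illustrative only and is not the server's own validation code: the function names are hypothetical, and it assumes that the exclusion list is written as a flat series "start,length,start,length,...". It makes no assumption about whether positions are counted from 0 or from 1.

```python
import re

MAX_SEGMENTS = 6  # limit stated for the "Exclude positions" field


def normalize_sequence(raw: str) -> str:
    """Upper-case the sequence and map T to U; reject any other character."""
    seq = raw.upper().replace("T", "U")
    if not re.fullmatch(r"[GCAU]+", seq):
        raise ValueError("sequence may contain only G, C, A, T, U")
    return seq


def parse_concentration(raw: str) -> float:
    """Parse a relative concentration in [0, 1]; forms like '2.0e-3' work."""
    value = float(raw)  # float() expects a period as the decimal separator
    if not 0.0 <= value <= 1.0:
        raise ValueError("concentration must lie in [0, 1]")
    return value


def parse_exclusions(raw: str) -> list[tuple[int, int]]:
    """Parse 'start,length,start,length,...' into (start, length) pairs."""
    if raw == "":
        return []
    # Only digits separated by single commas: no spaces, signs, or letters.
    if not re.fullmatch(r"[0-9]+(,[0-9]+)*", raw):
        raise ValueError("only digits and commas are allowed")
    numbers = [int(n) for n in raw.split(",")]
    if len(numbers) % 2 != 0:
        raise ValueError("each segment needs a start and a length")
    pairs = list(zip(numbers[0::2], numbers[1::2]))
    if len(pairs) > MAX_SEGMENTS:
        raise ValueError(f"at most {MAX_SEGMENTS} segments are allowed")
    return pairs
```

For example, parse_exclusions("10,5,40,3") yields the two segments (10, 5) and (40, 3), while "10, 5" is rejected because of the space.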

The middle section of the form: Model parameters

The “Energy calculation” row:

α, l_max: These parameters apply a correction to the bonding energy of the microstate according to Eq. 26. By default, α=0 and the correction is not applied.
B_loop, B_bulge, B_intloop, C: Parameters B and C for computing loop energy (see Eq. 2) for the terminal loop (hairpin loop), bulge, and internal loop (two-sided bulge) cases, respectively. The value of C is the same in all cases.
T°: Thermodynamic temperature of the environment.

The “Transitions” row:

Formula: Choice of equation for the transition rate between macrostates: symmetric (Eq. 6) or asymmetric (Eqs. 4, 5). Options marked “+q” apply an additional weighting factor to the calculated transition rate; see [5, sec. 4.1]. The weighting factor decreases exponentially with the occurrence frequency of the target macrostate. This helps reduce oscillatory looping and thus improves model performance.
MAX #: A limit on the number of slow transitions between macrostates. Once this threshold is reached, a ribosome or polymerase move becomes possible instead of further transitions.
κ: The slow closure constant, which represents the “viscosity” of the cytoplasm.
λ_pol: The rate constant for polymerase transition to the next nucleotide (in the absence of slowdown caused by the secondary structure).
λ_rib: The rate constant for ribosome transition to the next codon (from a non-regulatory codon).
λ_ura: The rate constant for polymerase slip-off at a U-rich region.
q: A constant that determines how fast the weighting factor mentioned under “Formula” above decreases. Used only if an equation marked “+q” is chosen.
S_pol: Initial position after which the 5′-edge of the polymerase starts.
S_rib: Initial position of the P-region of the ribosome.

The “Polymerase slowdown” row:

β: An additional coefficient in the argument of the tangent in Eq. 16. Equals 1 by default, i.e. it has no effect. A zero value is also allowed.
L₁, p₀, r₀, δ: These parameters are used in Eq. 14, which defines the polymerase slowdown force caused by a hairpin in the RNA secondary structure.
Hairpin: Choice of algorithm (Eq. 20 or Eq. 21): either combine the slowdown from all existing hairpins (SUM) or use only the single hairpin with the maximum effect on the polymerase (MAX).
MAX_bulge, MAX_intloop: Threshold sizes of a bulge and an internal loop, respectively, that are ignored when determining a hairpin stem.

The “Limits” row:

Helix: Minimum shoulder length of a helix.
Hypohelix: Minimum shoulder length of a hypohelix.
MIN_loop, MAX_loop: Minimum and maximum length of a helix loop.
L_pol: “Size” of the polymerase from the exit point of the RNA strand to the point of transcription.
L_rib: “Size” of the ribosome from the P-region to the 3′-edge.
U_minlen: Minimum length of a U-rich segment, or minimum number of U letters in a U-rich segment, depending on the U_fraction value.
U_fraction: Minimum fraction of U letters in a U-rich segment (if the number is between 0 and 1), or maximum gap of non-U letters between successive U's within a U-rich segment (if the number is equal to or greater than 1).
U_{last #}: Number of U-rich segments taken into account, counting from the 3′-edge of the sequence. By default it equals 1, so only the very last segment is used and the others are ignored.
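
The two meanings of U_minlen and U_fraction can be illustrated with a small Python sketch. This is only one reading of the descriptions above, not the program's actual algorithm: the function name is hypothetical, the check is applied to a candidate segment that has already been chosen, and it is an assumption that values below 1 select the fraction mode and that gaps are measured only between successive U's inside the segment.

```python
def is_u_rich(segment: str, u_minlen: int, u_fraction: float) -> bool:
    """Check one candidate segment against U_minlen and U_fraction."""
    seg = segment.upper().replace("T", "U")  # T and U are equivalent
    u_count = seg.count("U")

    if u_fraction < 1:
        # Fraction mode: U_minlen is the minimum segment length and
        # U_fraction is the minimum share of U letters in the segment.
        if len(seg) == 0 or len(seg) < u_minlen:
            return False
        return u_count / len(seg) >= u_fraction

    # Gap mode: U_minlen is the minimum number of U letters and
    # U_fraction is the largest allowed run of non-U letters between U's.
    if u_count < u_minlen:
        return False
    positions = [i for i, ch in enumerate(seg) if ch == "U"]
    gaps = [b - a - 1 for a, b in zip(positions, positions[1:])]
    return all(gap <= u_fraction for gap in gaps)
```

Under this reading, the segment UUACUUU passes with U_minlen=5 and U_fraction=2 (five U's, largest gap of two letters) but fails with U_fraction=1.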

The bottom section of the form: Run parameters

Repetition

This field sets the desired number of repeated model runs used to estimate the probability of premature termination. The value is limited to 100 in the online version and takes effect only in the “std” and “more” output modes (see “Output” below). In the “most” output mode, only a single run is performed. In the mode for finding stable RNA secondary structures (see “Equilibrium” below), the user may instead enter the desired number of such structures (20 by default).

Seed

If this field is not set, the model's random number generator is initialized from the timer value. To obtain reproducible results, enter a number here; it will be used instead of the timer value.
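
The interplay of “Repetition” and “Seed” can be illustrated with a generic Monte Carlo sketch in Python. It is not the model itself: toy_run is a hypothetical stand-in for one model trajectory (it terminates with a fixed probability of 0.3), and all names are assumptions made for the example. The point is only that the termination probability is estimated as the fraction of terminated runs, and that a fixed seed makes the estimate reproducible.

```python
import random
import time
from typing import Callable, Optional


def estimate_termination_probability(
    run_once: Callable[[random.Random], bool],
    repetitions: int,
    seed: Optional[int] = None,
) -> float:
    """Return the fraction of runs that ended in termination."""
    if repetitions < 1:
        raise ValueError("repetitions must be at least 1")
    # Without an explicit seed, fall back to a timer-derived value.
    rng = random.Random(seed if seed is not None else time.time_ns())
    terminated = sum(run_once(rng) for _ in range(repetitions))
    return terminated / repetitions


def toy_run(rng: random.Random) -> bool:
    """Hypothetical stand-in for one trajectory: terminates with p = 0.3."""
    return rng.random() < 0.3


if __name__ == "__main__":
    first = estimate_termination_probability(toy_run, 100, seed=12345)
    second = estimate_termination_probability(toy_run, 100, seed=12345)
    print(first == second)  # same seed, same sequence of random numbers
```

With only 100 repetitions the estimate is coarse (its resolution is 0.01), which is one reason to use the downloadable version for larger runs.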

Lic.

The field is reserved for service purposes.

Clear

Click this button to discard changes to the parameters and restore the initial state of the form.

Equilibrium

Clicking this button runs the model in a special mode with a motionless ribosome and polymerase. This can help in finding stable RNA secondary structures.
Attention! You cannot stop or otherwise affect the process after clicking the button.

Run model

Clicking this button will run the model in normal mode.
Attention! You cannot alter parameters or otherwise affect the process after clicking the button.

Output

Three output modes are available:

std: Standard output mode, in which each run of the model produces one output line containing the outcome of modeling (i.e., termination or anti-termination) and some additional characteristics of that run.
more: Extended output mode, in which an additional page is generated that contains the final macrostate and outcome for each modeling trajectory.
most: Maximum output mode for a single run of the model. The “Repetition” field is ignored if the user chooses this mode, and this mode is assumed if the user sets the repetition number to 1. Each move of the ribosome or polymerase generates one output line. In addition, another page is generated with a graphical representation of the model trajectory in terms of the original sequence, where the current window and hypohelices are color-coded. To make the representation clearer, possible loops of any length are shown only once. Nevertheless, the file can contain many lines, so the online version outputs only the beginning and end of the trajectory.

After you click the “Run model” or “Equilibrium” button, the interface form disappears and (provided that the parameters are correct) the message “Your request has been accepted with ID: xxxxx” appears, along with a “view results” link to the future result page. Processing time varies widely depending on the request and the server load. On completion, the message “Request successfully completed” is displayed, and you can follow the “view results” link. Alternatively, you may save the link and close the window rather than wait for completion. Later you can follow the link to check whether processing has completed and to get complete or partial results.

The “view results” link opens a new browser window showing the output text file, whose content depends on the previously selected parameters and output mode. The “Delete result” buttons at the top and bottom of the page let you remove all data and result files related to your request. In the “more” and “most” output modes, a link is also provided to view a graphical presentation of the final states or of the model trajectory in a separate window. It is highly recommended to save both pages locally and then click the “Delete result” button. Otherwise, the server may delete the results at any time without notice, and it will be impossible to save the data.

To report a bug or request more information on the model, please use the contact email address.

References

1. V.A. Lyubetsky, L.I. Rubanov, A.V. Seliverstov, S.A. Pirogov. Model of gene expression regulation in bacteria via formation of RNA secondary structures. Molecular Biology, 2006, Vol. 40, No. 3, P. 440–453. DOI: 10.1134/S0026893306030113
2. V.A. Lyubetsky, K.Yu. Gorbunov, S.A. Pirogov, L.I. Rubanov, A.V. Seliverstov. Algorithm and results for model of gene expression regulation in bacteria based on formation of RNA secondary structures. Information Processes, 2005, Vol. 5, No. 5, P. 337–366. (In Russian)
3. V.A. Lyubetsky, A.V. Seliverstov. Computation of effectiveness of tryptophan biosynthesis regulation in bacteria based on model of classic attenuation. Information Processes, 2006, Vol. 6, No. 1, P. 55–57. (In Russian)
4. A. Das, I.P. Crawford, C. Yanofsky. Regulation of tryptophan operon expression by attenuation in cell-free extracts of Escherichia coli. Journal of Biological Chemistry, 1982, Vol. 257, Iss. 15, P. 8795–8798. DOI: 10.1016/S0021-9258(18)34200-5
5. V.A. Lyubetsky, S.A. Pirogov, L.I. Rubanov, A.V. Seliverstov. Modeling classic attenuation regulation of gene expression in bacteria. Journal of Bioinformatics and Computational Biology, 2007, Vol. 5, Iss. 1, P. 155–180. DOI: 10.1142/S0219720007002576
6. L.I. Rubanov, V.A. Lyubetsky. RNAmodel web server: modeling classic attenuation in bacteria. In Silico Biology, 2007, Vol. 7, No. 3, P. 285–308.