We develop a dynamic programming algorithm to solve the problem. Before describing the method, we define several alignment variants that the algorithm uses. Let *S*[1…*m*] be the query sequence with known structure *M* and *T*[1…*n*] be the target sequence with unknown structure.

**Definition 4** Optimal prefix-global structural alignment *between S*[1…*m*] *and T*[1…*n*] *is to find a prefix S*[1…*y*] *where* 0 ≤ *y* ≤ *m* *(i.e. the prefix is empty when y* = 0*) such that the score of the optimal global structural alignment between the prefix S*[1…*y*] *and T*[1…*n*] *is maximum*.

**Definition 5** Optimal suffix-global structural alignment *between S*[1…*m*] *and T*[1…*n*] *is to find a suffix S*[*x*…*m*] *where* 1 ≤ *x* ≤ *m* + 1 (*i.e. the suffix is empty when x* = *m* + 1) *such that the score of the optimal global structural alignment between the suffix S*[*x*…*m*] *and T*[1…*n*] *is maximum*.

**Definition 6** Optimal semi-global structural alignment *between S*[1…*m*] *and T*[1…*n*] *is to find a substring S*[*x*…*y*] *where* 1 ≤ *x*, *y* ≤ *m such that the score of the optimal global structural alignment between the substring S*[*x*…*y*] *and T*[1…*n*] *is maximum*.

Let the affine gap model be *h* + *sL*, where *h* is the gap opening penalty, *s* is the gap extension penalty, and *L* is the length of the gap. Our method consists of two steps. In the first step, we compute the optimal semi-global structural alignment between *S* and every substring of *T*. In the second step, we obtain the optimal local structural alignment between *S* and *T* from the results of the first step. Define *A*(*p*, *q*, *e*, *f*) to be the score of the optimal *semi-global* structural alignment between *S*[*p*…*q*] and *T*[*e*…*f*]. The score of the optimal *local* structural alignment between *S* and *T* is then max_{*e* ≤ *f*+1} *A*(1, *m*, *e*, *f*). We first show how to compute *A*, and then show how the structure of *S* guides the computation of *A* so that not all combinations of *p* and *q* need to be considered.
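The second step can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation: `a_full` is a hypothetical callable standing in for the table entries *A*(1, *m*, *e*, *f*), and the index bounds (1-based, with *f* = *e* – 1 denoting an empty substring of *T*) are an assumption read off the condition *e* ≤ *f* + 1.

```python
from typing import Callable


def gap_penalty(h: float, s: float, length: int) -> float:
    """Affine gap cost h + s*L for a gap of L >= 1 spaces; 0 when there is no gap."""
    return 0.0 if length == 0 else h + s * length


def best_local_score(a_full: Callable[[int, int], float], n: int) -> float:
    """Step 2: maximise A(1, m, e, f) over all e <= f + 1.

    a_full(e, f) is a hypothetical stand-in for the entry A(1, m, e, f).
    Indices are 1-based; f = e - 1 encodes the empty substring of T (assumed).
    """
    best = float("-inf")
    for e in range(1, n + 2):
        for f in range(e - 1, n + 1):
            best = max(best, a_full(e, f))
    return best
```

The double loop visits *O*(*n*^{2}) pairs (*e*, *f*), which matches the cost of the final maximisation stated in the time complexity analysis.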

When considering any substring *S*′ = *S*[*x*′…*y*′] of *S*[*x*…*y*], there are four possible cases: (1) *S*′ is equal to *S*[*x*…*y*] (i.e. *x*′ = *x*, *y*′ = *y*); (2) *S*′ is a proper prefix of *S*[*x*…*y*] (i.e. *x*′ = *x*, *y*′ < *y*); (3) *S*′ is a proper suffix of *S*[*x*…*y*] (i.e. *x*′ > *x*, *y*′ = *y*); (4) *S*′ is a substring of *S*[*x* + 1…*y* – 1] (i.e. *x*′ > *x*, *y*′ < *y*). Therefore, we can consider each case one by one when computing the value of *A*.

Define *A*_{1}(*p*, *q*, *e*, *f*) to be the score of the optimal *global* structural alignment between *S*[*p*…*q*] and *T*[*e*…*f*]. Define *A*_{2}(*p*, *q*, *e*, *f*) to be the score of the optimal *prefix-global* structural alignment between *S*[*p*…*q* – 1] and *T*[*e*…*f*]. Define *A*_{3}(*p*, *q*, *e*, *f*) to be the score of the optimal *suffix-global* structural alignment between *S*[*p* + 1…*q*] and *T*[*e*…*f*]. Define *A*_{4}(*p*, *q*, *e*, *f*) to be the score of the optimal *semi-global* structural alignment between *S*[*p* + 1…*q* – 1] and *T*[*e*…*f*].

The value of *A*(*p*, *q*, *e*, *f*) can be computed recursively as the maximum over four cases: (1) *S*′ = *S*[*p*…*q*] (i.e. *A*_{1}(*p*, *q*, *e*, *f*)); (2) *S*′ is a proper prefix of *S*[*p*…*q*] (i.e. *A*_{2}(*p*, *q*, *e*, *f*)); (3) *S*′ is a proper suffix of *S*[*p*…*q*] (i.e. *A*_{3}(*p*, *q*, *e*, *f*)); (4) *S*′ is a substring of *S*[*p* + 1…*q* – 1] (i.e. *A*_{4}(*p*, *q*, *e*, *f*)). Lemma 1 summarizes these cases.
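The displayed formula of Lemma 1 does not appear in this text. Based on the four cases just listed, and by analogy with the form of Lemma 4, it presumably reads as follows; this is a reconstruction under that assumption, not the authors' wording.

```latex
A(p,q,e,f) = \max\bigl\{\, A_1(p,q,e,f),\; A_2(p,q,e,f),\; A_3(p,q,e,f),\; A_4(p,q,e,f) \,\bigr\}
```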

The following subsections describe how to compute *A*_{1},*A*_{2},*A*_{3},*A*_{4}.

### Calculation of *A*_{1}

When considering the optimal global structural alignment (with affine gap model) between *S*[*p*…*q*] and *T*[*e*…*f*], there are nine possible cases: (1) *S*[*p*] is aligned with *T*[*e*] and *S*[*q*] with *T*[*f*]; (2) *S*[*p*] with *T*[*e*] and *S*[*q*] with space; (3) *S*[*p*] with *T*[*e*] and *T*[*f*] with space; (4) *S*[*p*] with space and *S*[*q*] with *T*[*f*]; (5) *S*[*p*] with space and *S*[*q*] with space; (6) *S*[*p*] with space and *T*[*f*] with space; (7) *T*[*e*] with space and *S*[*q*] with *T*[*f*]; (8) *T*[*e*] with space and *S*[*q*] with space; (9) *T*[*e*] with space and *T*[*f*] with space. Hence, we can consider each case one by one when computing the value of *A*_{1}.

Define *A*_{1x}(*p*, *q*, *e*, *f*), where 1 ≤ *x* ≤ 9, to be the score of the optimal *global* structural alignment between *S*[*p*…*q*] and *T*[*e*…*f*] in which case *x* above holds (e.g. if *x* = 1, then *S*[*p*] is aligned with *T*[*e*] and *S*[*q*] with *T*[*f*]).

The value of *A*_{1}(*p*, *q*, *e*, *f*) can be computed recursively and it is the maximum value of nine cases. Lemma 2 summarizes these cases.
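The nine cases are the Cartesian product of three possibilities at the left end of the alignment and three at the right end. The sketch below enumerates them in the order used above; the string labels and the helper `cases_where` are illustrative names, not part of the original method.

```python
from itertools import product

# What happens at the left end and at the right end of the alignment.
LEFT = ("S[p]~T[e]", "S[p]~space", "T[e]~space")
RIGHT = ("S[q]~T[f]", "S[q]~space", "T[f]~space")

# Case numbers 1..9 in the order listed in the text (left end varies slowest).
CASES = {i + 1: pair for i, pair in enumerate(product(LEFT, RIGHT))}


def cases_where(left=None, right=None):
    """Return the case numbers matching the given end condition(s)."""
    return [
        x
        for x, (l, r) in CASES.items()
        if (left is None or l == left) and (right is None or r == right)
    ]
```

For instance, `cases_where(right="S[q]~space")` selects cases 2, 5 and 8, and `cases_where(left="S[p]~T[e]")` selects cases 1, 2 and 3, which are the groups of cases the *A*_{12} calculation refers to.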

We will describe the calculation of *A*_{12}. A similar technique applies to the others (i.e. *A*_{11}, *A*_{13}, … , *A*_{19}).

#### Calculation of A_{12}

*A*_{12}(*p*, *q*, *e*, *f*) is the score of the optimal global structural alignment between *S*[*p*…*q*] and *T*[*e*…*f*] that aligns *S*[*p*] with *T*[*e*] and *S*[*q*] with space. There are three situations, which we consider one by one. Note that under the affine gap model, the penalty for the first space in a gap (*h* + *s*) differs from the penalty for each subsequent space in the gap (*s*).

Situation I: (*p*, *q*) is a base pair. We align the base pair by aligning *S*[*p*] with *T*[*e*] and *S*[*q*] with space. Consider the alignment between *S*[*p* + 1…*q* – 1] and *T*[*e* + 1…*f*]: if *S*[*q* – 1] is aligned with space (i.e. case 2, case 5 or case 8), the space opposite *S*[*q*] extends an existing gap and a penalty of *s* applies. Otherwise (i.e. in the other six cases), it opens a new gap and a penalty of *h* + *s* applies.

Situation II: there exists *q*′ with *p* < *q*′ < *q* such that (*p*, *q*′) is a base pair. We need to find *k* ∈ [*e* – 1, *f*] that maximizes the sum of the alignment score between *S*[*p*…*q*′] and *T*[*e*…*k*] and the alignment score between *S*[*q*′ + 1…*q*] and *T*[*k* + 1…*f*]. Since *S*[*p*] is aligned with *T*[*e*] and *S*[*q*] with space, the alignment between *S*[*p*…*q*′] and *T*[*e*…*k*] must satisfy case 1, case 2 or case 3 (i.e. *S*[*p*] is aligned with *T*[*e*]). Similarly, the alignment between *S*[*q*′ + 1…*q*] and *T*[*k* + 1…*f*] must satisfy case 2, case 5 or case 8 (i.e. *S*[*q*] is aligned with space).

Situation III: *p* does not form a base pair with any base *q*′ ∈ [*p*, *q*]. We align base *S*[*p*] with *T*[*e*]. The alignment between *S*[*p* + 1…*q*] and *T*[*e* + 1…*f*] must then satisfy case 2, case 5 or case 8 (i.e. *S*[*q*] is aligned with space). Lemma 3 summarizes these situations.
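Two ingredients of the situations above can be sketched in code: the choice between penalties *s* and *h* + *s* in Situation I, and the split over *k* in Situation II. This is a hedged illustration only. The callable `a1x(x, p, q, e, f)` is a hypothetical accessor for the table *A*_{1x}; it is assumed to return negative infinity for infeasible entries (for example, when *T*[*e*…*k*] is empty but case 1, 2 or 3 is requested). The sketch follows the description literally and adds no adjustment at the split point, so it should not be read as the statement of Lemma 3.

```python
import math
from typing import Callable

# Hypothetical accessor: a1x(x, p, q, e, f) -> score of case x (1..9).
CaseScore = Callable[[int, int, int, int, int], float]

S_P_WITH_T_E = (1, 2, 3)    # cases in which S[p] is aligned with T[e]
S_Q_WITH_SPACE = (2, 5, 8)  # cases in which S[q] is aligned with space


def closing_space_penalty(inner_case: int, h: float, s: float) -> float:
    """Situation I: cost of the space opposite S[q].

    If the inner alignment ends with S[q-1] against a space (cases 2, 5, 8),
    the gap is extended (s); otherwise a new gap is opened (h + s).
    """
    return s if inner_case in S_Q_WITH_SPACE else h + s


def situation_two(a1x: CaseScore, p: int, q_prime: int, q: int,
                  e: int, f: int) -> float:
    """Situation II: best split point k in [e-1, f] for base pair (p, q')."""
    best = -math.inf
    for k in range(e - 1, f + 1):
        left = max(a1x(x, p, q_prime, e, k) for x in S_P_WITH_T_E)
        right = max(a1x(x, q_prime + 1, q, k + 1, f) for x in S_Q_WITH_SPACE)
        best = max(best, left + right)
    return best
```

The loop over *k* is the source of the *O*(*n*) work per table entry that appears in the time complexity analysis.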

### Calculation of *A*_{2}

When considering the optimal prefix-global structural alignment (with affine gap model) between *S*[*p*…*q*] and *T*[*e*…*f*], there are four possible cases: (1) *S*[*p*] is aligned with *T*[*e*]; (2) *S*[*p*] with space; (3) *T*[*f*] with space; and (4) an empty string of *S* with *T*.

Define *A*_{2x}(*p*, *q*, *e*, *f*), where 1 ≤ *x* ≤ 3, to be the score of the optimal *prefix-global* structural alignment between *S*[*p*…*q*] and *T*[*e*…*f*] in which case *x* above holds (e.g. if *x* = 1, then *S*[*p*] is aligned with *T*[*e*]). We do not need to define a function for case 4, because its score is simply –*h* – *s*(*f* – *e* + 1): all *f* – *e* + 1 characters of *T*[*e*…*f*] are aligned with spaces in a single gap. The value of *A*_{2}(*p*, *q*, *e*, *f*) can be computed recursively as the maximum over the four cases. Lemma 4 summarizes these cases.

**Lemma 4**

*A*_{2}(*p*, *q*, *e*, *f*) = max{*A*_{21}(*p*, *q*, *e*, *f*), *A*_{22}(*p*, *q*, *e*, *f*), *A*_{23}(*p*, *q*, *e*, *f*), – *h* – *s*(*f* – *e* + 1)}
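As a small worked illustration of Lemma 4, the sketch below evaluates the closed-form score of case 4 and takes the four-way maximum. The function names are illustrative, and the three sub-scores are passed in as plain numbers rather than looked up in tables.

```python
def empty_prefix_score(h: float, s: float, e: int, f: int) -> float:
    """Case 4: the empty prefix of S against T[e..f], i.e. one gap of length f - e + 1."""
    return -h - s * (f - e + 1)


def a2(a21: float, a22: float, a23: float,
       h: float, s: float, e: int, f: int) -> float:
    """Lemma 4: maximum of the three sub-cases and the closed-form case 4."""
    return max(a21, a22, a23, empty_prefix_score(h, s, e, f))
```

With *h* = 2, *s* = 1, *e* = 3 and *f* = 5, case 4 scores –2 – 1·3 = –5, so it wins whenever all three sub-cases score below –5.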

We will describe the calculation of *A*_{22}. A similar technique applies to *A*_{21} and *A*_{23}.

#### Calculation of A_{22}

The computation of *A*_{22} is described below and stated in Lemma 5.

*A*_{22}(*p*, *q*, *e*, *f*) is the score of the optimal prefix-global structural alignment between *S*[*p*…*q* – 1] and *T*[*e*…*f*], where *S*[*p*] is aligned with space. As with *A*_{12}, there are three situations.

Situation I: (*p*, *q*) is a base pair. We align *S*[*p*] of the base pair with space. Since a prefix of *S*[*p*…*q* – 1] is considered, there are two possibilities: a prefix of *S*[*p* + 1…*q* – 1] is aligned with *T*[*e*…*f*] (i.e. semi-global alignment), or the whole sequence *S*[*p* + 1…*q* – 1] is aligned with *T*[*e*…*f*] (i.e. global alignment).

Situation II: there exists *q*′ with *p* < *q*′ < *q* such that (*p*, *q*′) is a base pair. We need to find *k* ∈ [*e* – 1, *f*] that maximizes the sum of the alignment score between *S*[*p*…*q*′] and *T*[*e*…*k*] and the alignment score between *S*[*q*′ + 1…*q*] and *T*[*k* + 1…*f*]. Since a prefix of *S*[*p*…*q* – 1] is considered, there are two possibilities: (1) the whole sequence *S*[*p*…*q*′] is aligned with *T*[*e*…*k*] (i.e. global alignment) and a prefix of *S*[*q*′ + 1…*q*] is aligned with *T*[*k* + 1…*f*] (i.e. semi-global); (2) only a prefix of *S*[*p*…*q*′] is aligned with *T*[*e*…*k*] (i.e. semi-global).

Situation III: *p* does not form a base pair with any base *q*′ ∈ [*p*, *q*]. We align base *S*[*p*] with space.

For each possibility in Situations I and III, there are two conditions: if *S*[*p* + 1] is aligned with *T*[*e*], or *T*[*e*] is aligned with space, the penalty *h* + *s* applies; otherwise, if *S*[*p* + 1] is aligned with space, the penalty *s* applies. Lemma 5 summarizes these cases.

The calculations for *A*_{3} and *A*_{4} are similar. In the following subsection, we will describe the time complexity of the algorithm.

### Time complexity

To fill the dynamic programming tables, not all entries for all possible subranges of *S* need to be filled. By the design of the dynamic program, there are three cases:

Case 1: if (*p*, *q*) ∈ *M*_{*p*,*q*}, then all the entries for *S*[*p*, *q*] of all tables (i.e. *A*, *A*_{1}, *A*_{2}, *A*_{3}, *A*_{4}, *A*_{11}, …, etc.) can be computed from the entries for *S*[*p* + 1, *q* – 1].

Case 2: if ∃*q*′ < *q* s.t. (*p*, *q*′) ∈ *M*_{*p*,*q*}, then all the entries for *S*[*p*, *q*] of all tables can be computed from the entries for *S*[*p*, *q*′] and *S*[*q*′ + 1, *q*].

Case 3: if ∄*q*′ s.t. (*p*, *q*′) ∈ *M*_{*p*,*q*}, then all the entries for *S*[*p*, *q*] of all tables can be computed from the entries for *S*[*p* + 1, *q*].

Therefore, we define a function *ζ*(*p*, *q*) that determines the set of subregions of *S* for which we need to fill the corresponding entries in all the tables.
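The displayed definition of *ζ* is not reproduced in this text. A reconstruction consistent with the three cases, taking Case 1 to recurse on the enclosed region *S*[*p* + 1, *q* – 1] as in Situation I of the *A*_{12} calculation, would be the following; it is an assumption, not the authors' exact formula.

```latex
\zeta(p,q) =
\begin{cases}
\{(p+1,\, q-1)\} & \text{if } (p,q) \in M_{p,q},\\
\{(p,\, q'),\ (q'+1,\, q)\} & \text{if } \exists\, q' < q \text{ s.t. } (p,q') \in M_{p,q},\\
\{(p+1,\, q)\} & \text{otherwise.}
\end{cases}
```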

We only need to fill in the entries of the tables for those (*p*, *q*) that can be obtained from (1, *m*) by applying *ζ* repeatedly. Each application of *ζ* outputs subregions whose total size is no greater than the size of the input region, and each output subregion is strictly smaller than the input region. Therefore, there are only *O*(*m*) such (*p*, *q*) pairs in total. There are *O*(*n*^{2}) different (*e*, *f*) pairs, and each entry takes *O*(*n*) time because of the range *e* – 1 ≤ *k* ≤ *f* considered in the case that ∃*q*′ < *q* s.t. (*p*, *q*′) ∈ *M*_{*p*,*q*}. After computing *A*(1, *m*, *e*, *f*) for all 1 ≤ *e*, *f* ≤ *n*, the final answer (i.e. max_{*e* ≤ *f*+1}{*A*(1, *m*, *e*, *f*)}) can be computed in *O*(*n*^{2}) time. Therefore, the total time complexity is *O*(*mn*^{3}) + *O*(*n*^{2}) = *O*(*mn*^{3}).
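The counting argument can be checked on small inputs with the sketch below. It assumes a form of *ζ* inferred from Cases 1 to 3 (Case 1 recursing on the enclosed region *S*[*p* + 1, *q* – 1]), 1-based inclusive indices, and a non-crossing set of base pairs given as (left, right) tuples; the names `zeta` and `reachable_regions` are illustrative, and empty regions are left out of the count.

```python
from typing import Dict, Iterable, List, Set, Tuple

Region = Tuple[int, int]


def zeta(p: int, q: int, partner: Dict[int, int]) -> List[Region]:
    """Subregions whose entries are needed to fill the entries for S[p..q]."""
    if p > q:
        return []
    qp = partner.get(p)
    if qp is not None and p < qp <= q:
        if qp == q:
            return [(p + 1, q - 1)]        # Case 1: (p, q) is a base pair
        return [(p, qp), (qp + 1, q)]      # Case 2: p pairs with q' < q
    return [(p + 1, q)]                    # Case 3: p is unpaired in [p, q]


def reachable_regions(m: int, pairs: Iterable[Region]) -> Set[Region]:
    """All non-empty regions obtainable from (1, m) by applying zeta repeatedly."""
    partner = {i: j for i, j in pairs}     # left end -> right end
    seen: Set[Region] = set()
    stack = [(1, m)]
    while stack:
        p, q = stack.pop()
        if p > q or (p, q) in seen:
            continue
        seen.add((p, q))
        stack.extend(zeta(p, q, partner))
    return seen
```

For *m* = 6 with base pairs (1, 6) and (2, 4), the reachable regions are (1, 6), (2, 5), (2, 4), (5, 5) and (3, 3): five regions rather than the 21 possible subranges of *S*.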

**Theorem 1** *For any sequence S*[1…*m*] *with regular structure and any sequence T*[1…*n*] *with unknown structure, the optimal local alignment score between S*[1…*m*] *and T*[1…*n*] *can be computed in O*(*mn*^{3}) *time*.