Budget Amount
¥1,600,000 (Direct Cost: ¥1,600,000)
Fiscal Year 2004: ¥500,000 (Direct Cost: ¥500,000)
Fiscal Year 2003: ¥500,000 (Direct Cost: ¥500,000)
Fiscal Year 2002: ¥600,000 (Direct Cost: ¥600,000)
Research Abstract
In this research project, we studied the effectiveness of a hybrid approach to computationally hard problems. The approach combines three basic techniques for such problems: approximation, randomization, and parallelization. Previously, few algorithms had been based on such a hybrid approach. Our main purpose was to use it to solve computationally hard problems that had not been solved so far, which may also lead to new techniques for designing efficient algorithms for hard problems. We focused on three computationally hard problems arising in computational biology.

The first is the protein NMR peak assignment problem, which is crucial to automating the assignment of a group of experimentally obtained "spin systems" to a protein's amino acid sequence. We formulated this problem as an interval scheduling problem (ISP): a protein sequence P of amino acids is viewed as a discrete time interval I (the amino acids of P correspond one-to-one to the time units of I); each subset S of spin systems known to originate from consecutive amino acids of P is viewed as a "job" j_S; the preference for assigning S to a subsequence Q of consecutive amino acids of P is viewed as the profit of executing j_S in the subinterval of I corresponding to Q; and the goal is to maximize the total profit of executing the jobs on a single machine during I. We showed that the ISP is Max SNP-hard, even if each job takes either one or two consecutive time units, and designed an efficient 2-approximation algorithm for it. However, our experiments show that the 2-approximation algorithm does not produce satisfactory assignments in practice. The reason is that, in real protein NMR peak assignment, each job j_S usually requires at most 10 consecutive time units, and the jobs requiring only one or two consecutive time units are typically the most difficult to assign (schedule). We therefore designed several efficient heuristics for the problem, some of which run on PC clusters in a short parallel running time. Our experiments show that these heuristics work very well in practice.

The second problem we considered is the following: given a set of species and their similarity data, reconstruct a phylogeny (also called an evolutionary tree) in which species are close if and only if they are highly similar. Assume that the similarity data are represented as a graph G = (V, E), where each vertex represents a species and two vertices are adjacent if they represent highly similar species. Phylogeny reconstruction can then be abstracted as a graph-theoretic problem called the phylogenetic k-th root problem (PR_k), where k is a predetermined proximity threshold. We showed that the problem can be solved in linear time if the input data contain no errors and the phylogeny to be constructed has bounded degree.
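The definition of PR_k can be made concrete with a small checker. The Python sketch below does not reconstruct a phylogeny, and it is not the linear-time algorithm mentioned above; it only verifies whether a given candidate tree T is a phylogenetic k-th root of G. It assumes the usual definition of a phylogenetic root, which the abstract does not spell out: the leaves of T are exactly the species in V, every internal node of T has degree at least 3, and two species are adjacent in G exactly when their distance in T is at most k. The function names and the optional `max_degree` parameter are hypothetical, chosen for illustration.

```python
from collections import deque
from itertools import combinations


def bfs_distances(tree, source):
    """Hop distances from `source` to every node reachable in `tree`."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in tree[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def is_phylogenetic_kth_root(tree, species, graph_edges, k, max_degree=None):
    """Return True if `tree` is a phylogenetic k-th root of G = (species, graph_edges).

    Assumed definition (not stated in the abstract): leaves of T == V,
    internal nodes have degree >= 3, and uv in E  <=>  d_T(u, v) <= k.

    tree        -- dict mapping each tree node to the set of its neighbours
                   (undirected, so adjacency must be symmetric)
    species     -- iterable of the vertices V of G
    graph_edges -- iterable of pairs (u, v) of highly similar species
    k           -- proximity threshold
    max_degree  -- optional degree bound (hypothetical, for the bounded-degree case)
    """
    species = set(species)
    nodes = set(tree)
    n_edges = sum(len(nbrs) for nbrs in tree.values()) // 2
    # T must be a tree: connected, with exactly |nodes| - 1 edges.
    if not nodes or n_edges != len(nodes) - 1:
        return False
    if len(bfs_distances(tree, next(iter(nodes)))) != len(nodes):
        return False
    # The leaves (degree-1 nodes) of T are exactly the species.
    if {v for v in nodes if len(tree[v]) == 1} != species:
        return False
    for v in nodes:
        # Internal nodes must have degree at least 3.
        if v not in species and len(tree[v]) < 3:
            return False
        if max_degree is not None and len(tree[v]) > max_degree:
            return False
    edges = {frozenset(e) for e in graph_edges}
    # One BFS per species gives all species-to-node distances.
    dist = {u: bfs_distances(tree, u) for u in species}
    # u and v must be adjacent in G exactly when d_T(u, v) <= k.
    return all(
        (frozenset((u, v)) in edges) == (dist[u][v] <= k)
        for u, v in combinations(species, 2)
    )
```

For example, a star with center `x` and leaves `a`, `b`, `c` has every leaf-to-leaf distance equal to 2, so under the assumed definition it is a phylogenetic 2nd root of the triangle on {a, b, c} but not a phylogenetic 1st root of it. Because the checker runs one BFS per species, it suits testing candidate trees rather than reconstructing them.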
We also showed that the problem is NP-hard if the input data contain errors, regardless of whether the phylogeny to be constructed has bounded degree.

The third problem we considered is DNA sequence alignment with inversions and reversals. Although inversions and reversals occur in practice, they had not previously been given serious consideration in sequence alignment; the only previously known algorithm runs in O(n^2 m^2) time and uses O(n^2 m^2) space, where n and m are the lengths of the two input sequences. We designed a space-efficient algorithm for this problem that uses only O(nm) space while keeping the same time complexity. Our algorithm makes it possible to align a pair of DNA sequences of length up to 10,000 on an ordinary desktop computer.
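The practical effect of reducing the space bound from O(n^2 m^2) to O(nm) can be checked with a rough calculation for two sequences of length 10,000. Big-O bounds hide constant factors, so the figures below assume, purely for illustration, one 4-byte score per dynamic-programming cell:

```latex
\begin{aligned}
n = m = 10^4:\quad & n^2 m^2 = 10^{16}\ \text{cells} \approx 4 \times 10^{16}\ \text{bytes} = 40\ \text{PB},\\
                   & n m = 10^{8}\ \text{cells} \approx 4 \times 10^{8}\ \text{bytes} = 400\ \text{MB}.
\end{aligned}
```

Under this assumption, an O(nm) table is on the order of hundreds of megabytes, within reach of a desktop machine, whereas a table of n^2 m^2 cells would need tens of petabytes.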