
Showing papers by "Ming-Yang Kao published in 2001"


Journal ArticleDOI
Ting Chen1, Ming-Yang Kao, Matthew Tepel1, John Rush1, George M. Church1 
TL;DR: In this paper, the authors proposed a dynamic programming-based method to reconstruct the peptide sequence from given tandem mass spectral data of k ions by implicitly transforming the spectral data into an NC-spectrum graph G=(V,E).
Abstract: Tandem mass spectrometry fragments a large number of molecules of the same peptide sequence into charged molecules of prefix and suffix peptide subsequences and then measures the mass/charge ratios of these ions. The de novo peptide sequencing problem is to reconstruct the peptide sequence from given tandem mass spectral data of k ions. By implicitly transforming the spectral data into an NC-spectrum graph G=(V,E) where |V| = 2k + 2, we can solve this problem in O(|V||E|) time and O(|V|^2) space using dynamic programming. For an ideal noise-free spectrum with only b- and y-ions, we improve the algorithm to O(|V| + |E|) time and O(|V|) space. Our approach can further be used to discover a modified amino acid in O(|V||E|) time. The algorithms have been implemented and tested on experimental data.
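The ideal noise-free case has a simple core: when every prefix (b-ion style) mass is present and exact, consecutive mass differences spell out the residues directly. The sketch below illustrates only this trivial special case with a small table of nominal integer residue masses; it is not the paper's NC-spectrum graph algorithm, which handles mixed prefix/suffix ions, noise, and modified amino acids.

```python
# Decode a peptide from an ideal, noise-free ladder of prefix (b-ion style)
# masses: consecutive mass differences identify the residues. Illustrative
# special case only; the NC-spectrum graph construction in the paper is far
# more general.

# Nominal (integer) residue masses for a few amino acids.
RESIDUE_MASS = {57: "G", 71: "A", 87: "S", 97: "P", 99: "V"}

def decode_prefix_ladder(prefix_masses):
    """Return the residue string spelled by the sorted prefix masses,
    or None if some mass difference matches no known residue."""
    peptide = []
    prev = 0
    for m in sorted(prefix_masses):
        residue = RESIDUE_MASS.get(m - prev)
        if residue is None:
            return None
        peptide.append(residue)
        prev = m
    return "".join(peptide)
```

For example, decode_prefix_ladder([57, 128, 215]) returns "GAS", since the successive differences 57, 71, and 87 are the nominal residue masses of glycine, alanine, and serine.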

224 citations


Journal ArticleDOI
TL;DR: An algorithm is presented for comparing trees that are labeled in an arbitrary manner; it is faster than the previous algorithms, and an efficient algorithm is also obtained for a new matching problem called the hierarchical bipartite matching problem.

55 citations


Journal ArticleDOI
04 Mar 2001
TL;DR: It is proved that allowing pseudoknots makes it NP-hard to maximize the number of stacking pairs in a planar secondary structure.
Abstract: In this paper we investigate the computational problem of predicting RNA secondary structures that allow any kind of pseudoknots. The general belief is that allowing pseudoknots makes the problem very difficult. Existing polynomial-time algorithms, which aim at structures that optimize some energy functions, can only handle certain types of pseudoknots. In this paper we initiate the study of approximation algorithms for handling all kinds of pseudoknots. We focus on predicting RNA secondary structures with a maximum number of stacking pairs and obtain two approximation algorithms with worst-case approximation ratios of 1/2 and 1/3 for planar and general secondary structures, respectively. Furthermore, we prove that allowing pseudoknots makes the problem of maximizing the number of stacking pairs in a planar secondary structure NP-hard. This result should be contrasted with the recent NP-hardness results on pseudoknots, which are based on optimizing some peculiar energy functions.
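The objective being maximized is easy to state: a stacking pair is formed by two base pairs (i, j) and (i+1, j-1) that sit directly on top of each other in the structure. Counting stacking pairs in a given structure is then a one-liner, as the sketch below shows; the structure representation (a set of index pairs, with crossing pairs permitted so pseudoknots are allowed) is an assumption for illustration, not the paper's approximation algorithms themselves.

```python
# Count stacking pairs in an RNA secondary structure given as a collection
# of base pairs (i, j) over 0-indexed sequence positions. Two base pairs
# (i, j) and (i+1, j-1) form one stacking pair. Crossing (pseudoknotted)
# pairs are allowed: no nesting is assumed.

def count_stacking_pairs(pairs):
    pair_set = {(min(i, j), max(i, j)) for i, j in pairs}
    return sum(1 for (i, j) in pair_set if (i + 1, j - 1) in pair_set)
```

For instance, a hairpin stem with base pairs (0,9), (1,8), (2,7) contains two stacking pairs.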

55 citations


Posted Content
TL;DR: In this article, the authors presented an algorithm for comparing trees that are labeled in an arbitrary manner, which is faster than the previous algorithms and is at the core of their maximum agreement subtree algorithm.
Abstract: A widely used method for determining the similarity of two labeled trees is to compute a maximum agreement subtree of the two trees. Previous work on this similarity measure is only concerned with the comparison of labeled trees of two special kinds, namely, uniformly labeled trees (i.e., trees with all their nodes labeled by the same symbol) and evolutionary trees (i.e., leaf-labeled trees with distinct symbols for distinct leaves). This paper presents an algorithm for comparing trees that are labeled in an arbitrary manner. In addition to this generality, this algorithm is faster than the previous algorithms. Another contribution of this paper is on maximum weight bipartite matchings. We show how to speed up the best known matching algorithms when the input graphs are node-unbalanced or weight-unbalanced. Based on these enhancements, we obtain an efficient algorithm for a new matching problem called the hierarchical bipartite matching problem, which is at the core of our maximum agreement subtree algorithm.

51 citations


Posted Content
TL;DR: These examples are novel applications of small cycle separators of planar graphs and settle several problems that have been open since Tutte's census series was published in the 1960s.
Abstract: We propose a fast methodology for encoding graphs with information-theoretically minimum numbers of bits. Specifically, a graph with property pi is called a pi-graph. If pi satisfies certain properties, then an n-node m-edge pi-graph G can be encoded by a binary string X such that (1) G and X can be obtained from each other in O(n log n) time, and (2) X has at most beta(n)+o(beta(n)) bits for any continuous super-additive function beta(n) so that there are at most 2^{beta(n)+o(beta(n))} distinct n-node pi-graphs. The methodology is applicable to general classes of graphs; this paper focuses on planar graphs. Examples of such pi include all conjunctions over the following groups of properties: (1) G is a planar graph or a plane graph; (2) G is directed or undirected; (3) G is triangulated, triconnected, biconnected, merely connected, or not required to be connected; (4) the nodes of G are labeled with labels from {1, ..., ell_1} for ell_1 <= n; (5) the edges of G are labeled with labels from {1, ..., ell_2} for ell_2 <= m; and (6) each node (respectively, edge) of G has at most ell_3 = O(1) self-loops (respectively, ell_4 = O(1) multiple edges). Moreover, ell_3 and ell_4 are not required to be O(1) for the cases of pi being a plane triangulation. These examples are novel applications of small cycle separators of planar graphs and are the only nontrivial classes of graphs, other than rooted trees, with known polynomial-time information-theoretically optimal coding schemes.

24 citations


Posted Content
Ting Chen1, Ming-Yang Kao, Matthew Tepel1, John Rush1, George M. Church1 
TL;DR: In this paper, the authors proposed to transform the spectral data into an NC-spectrum graph and solve the de novo peptide sequencing problem in O(|V|+|E|) time and O(|V|) space using dynamic programming.
Abstract: Tandem mass spectrometry fragments a large number of molecules of the same peptide sequence into charged prefix and suffix subsequences, and then measures the mass/charge ratios of these ions. The de novo peptide sequencing problem is to reconstruct the peptide sequence from given tandem mass spectral data of k ions. By implicitly transforming the spectral data into an NC-spectrum graph G=(V,E) where |V|=2k+2, we can solve this problem in O(|V|+|E|) time and O(|V|) space using dynamic programming. Our approach can further be used to discover a modified amino acid in O(|V||E|) time and to analyze data with other types of noise in O(|V||E|) time. Our algorithms have been implemented and tested on actual experimental data.

15 citations


Posted Content
TL;DR: In this article, the authors consider a sequence of arbitrary distinct values arriving in random order, and devise strategies for selecting low values followed by high values in such a way as to maximize the expected gain in rank from low values to high values.
Abstract: In this paper we examine problems motivated by on-line financial problems and stochastic games. In particular, we consider a sequence of entirely arbitrary distinct values that arrive in random order, and we devise strategies for selecting low values followed by high values in such a way as to maximize the expected gain in rank from low values to high values. First, we consider a scenario in which only one low value and one high value may be selected. We give an optimal on-line algorithm for this scenario, and analyze it to show that, surprisingly, the expected gain is n − O(1), and so differs from the best possible off-line gain by only a constant additive term (which is, in fact, fairly small: at most 15). In a second scenario, we allow multiple nonoverlapping low/high selections, where the total gain for our algorithm is the sum of the individual pair gains. We also give an optimal on-line algorithm for this problem, where the expected gain is n^2/8 − Θ(n log n). An analysis shows that the optimal expected off-line gain is n^2/6 + Θ(1), so the performance of our on-line algorithm is within a factor of 3/4 of the best off-line strategy.
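The off-line benchmark in the single-pair scenario is concrete: with all n ranks known in hindsight, the best gain is the maximum of rank[j] − rank[i] over positions i < j, and its expectation is close to n. The Monte Carlo sketch below only estimates this off-line benchmark on random permutations; it is not the paper's on-line strategy, and the trial count and seed are arbitrary choices.

```python
import random

# Estimate the *off-line* optimal single-pair gain: with all n ranks known
# in hindsight, the best gain is max over i < j of ranks[j] - ranks[i].
# Illustrative benchmark only, not the paper's on-line algorithm.

def best_offline_gain(ranks):
    best, min_so_far = 0, ranks[0]
    for r in ranks[1:]:
        best = max(best, r - min_so_far)
        min_so_far = min(min_so_far, r)
    return best

def avg_offline_gain(n, trials=2000, seed=0):
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        ranks = list(range(1, n + 1))
        rng.shuffle(ranks)
        total += best_offline_gain(ranks)
    return total / trials
```

For n = 100 the estimate comes out close to n − 1 = 99, consistent with an expected off-line gain of n minus a small constant.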

13 citations


Posted Content
TL;DR: This paper focuses on predicting RNA secondary structures with a maximum number of stacking pairs and obtains two approximation algorithms with worst-case approximation ratios of 1/2 and 1/3 for planar and general secondary structures, respectively.
Abstract: The paper investigates the computational problem of predicting RNA secondary structures. The general belief is that allowing pseudoknots makes the problem hard. Existing polynomial-time algorithms are heuristic algorithms with no performance guarantee and can only handle limited types of pseudoknots. In this paper we initiate the study of predicting RNA secondary structures with a maximum number of stacking pairs while allowing arbitrary pseudoknots. We obtain two approximation algorithms with worst-case approximation ratios of 1/2 and 1/3 for planar and general secondary structures, respectively. For an RNA sequence of n bases, the approximation algorithm for planar secondary structures runs in O(n^3) time while that for the general case runs in linear time. Furthermore, we prove that allowing pseudoknots makes it NP-hard to maximize the number of stacking pairs in a planar secondary structure. This result is in contrast with the recent NP-hardness results on pseudoknots, which are based on optimizing some general and complicated energy functions.

13 citations


Posted Content
TL;DR: In this article, a mathematical model of DNA self-assembly using 2D tiles to form 3D nanostructures is proposed; it is a more precise superset of the Rothemund-Winfree Tile Assembly Model that facilitates building scalable 3D molecules.
Abstract: We propose a mathematical model of DNA self-assembly using 2D tiles to form 3D nanostructures. This is the first work to combine studies in self-assembly and nanotechnology in 3D, just as Rothemund and Winfree did in the 2D case. Our model is a more precise superset of their Tile Assembly Model that facilitates building scalable 3D molecules. Under our model, we present algorithms to build a hollow cube, which is intuitively one of the simplest 3D structures to construct. We also introduce five basic measures of complexity to analyze these algorithms. Our model and algorithmic techniques are applicable to more complex 2D and 3D nanostructures.

11 citations


Posted Content
Ming-Yang Kao1
TL;DR: In this paper, the authors report a master theorem on tight asymptotic solutions to divide-and-conquer recurrences with more than one recursive term: for example, T(n) = 1/4 T(n/16) + 1/3 T(3n/5) + 4 T(n/100) + 10 T(n/300) + n^2.
Abstract: This short note reports a master theorem on tight asymptotic solutions to divide-and-conquer recurrences with more than one recursive term: for example, T(n) = 1/4 T(n/16) + 1/3 T(3n/5) + 4 T(n/100) + 10 T(n/300) + n^2.
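For recurrences of this shape, T(n) = Σᵢ aᵢ T(rᵢ n) + f(n), the tight bound is governed by the exponent p solving the characteristic equation Σᵢ aᵢ rᵢ^p = 1 (this is the standard Akra-Bazzi characteristic equation, used here for illustration; it is not necessarily the note's exact formulation). The sketch below solves it by bisection for the example recurrence; since the resulting p is below 2, the driving term n^2 dominates and T(n) = Θ(n^2).

```python
# Solve the characteristic equation sum_i a_i * r_i**p = 1 by bisection for
# the example recurrence T(n) = 1/4 T(n/16) + 1/3 T(3n/5) + 4 T(n/100)
#                              + 10 T(n/300) + n^2.
# (a_i, r_i) pairs: coefficient and shrink factor of each recursive term.
TERMS = [(1 / 4, 1 / 16), (1 / 3, 3 / 5), (4.0, 1 / 100), (10.0, 1 / 300)]

def char_sum(p):
    return sum(a * r ** p for a, r in TERMS)

def solve_exponent(lo=0.0, hi=8.0, iters=100):
    # char_sum is strictly decreasing in p (every r < 1), so bisection works:
    # char_sum(0) > 1 and char_sum(8) < 1 bracket the root.
    for _ in range(iters):
        mid = (lo + hi) / 2
        if char_sum(mid) > 1.0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

Running solve_exponent() gives p ≈ 0.57, well below the exponent 2 of the additive term, so the recurrence is driven by n^2.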

10 citations


Posted Content
TL;DR: In this paper, the authors give an optimal linear-time algorithm for testing whether there exist nontrivial analytic invariants in terms of a given set of suppressed cells.
Abstract: To protect sensitive information in a cross tabulated table, it is a common practice to suppress some of the cells in the table. An analytic invariant is a power series in terms of the suppressed cells that has a unique feasible value and a convergence radius equal to +∞. Intuitively, the information contained in an invariant is not protected even though the values of the suppressed cells are not disclosed. This paper gives an optimal linear-time algorithm for testing whether there exist nontrivial analytic invariants in terms of a given set of suppressed cells. This paper also presents NP-completeness results and an almost linear-time algorithm for the problem of suppressing the minimum number of cells in addition to the sensitive ones so that the resulting table does not leak analytic-invariant information about a given set of suppressed cells.

Posted Content
TL;DR: An optimal deterministic hybrid algorithm and an efficient randomized hybrid algorithm are constructed, which resolves an open question on searching with multiple robots posed by Baeza-Yates, Culberson, and Rawlins.
Abstract: We study on-line strategies for solving problems with hybrid algorithms. There is a problem Q and w basic algorithms for solving Q. For some lambda <= w, we have a computer with lambda disjoint memory areas, each of which can be used to run a basic algorithm and store its intermediate results. In the worst case, only one basic algorithm can solve Q in finite time, and all the other basic algorithms run forever without solving Q. To solve Q with a hybrid algorithm constructed from the basic algorithms, we run a basic algorithm for some time, then switch to another, and continue this process until Q is solved. The goal is to solve Q in the least amount of time. Using competitive ratios to measure the efficiency of a hybrid algorithm, we construct an optimal deterministic hybrid algorithm and an efficient randomized hybrid algorithm. This resolves an open question on searching with multiple robots posed by Baeza-Yates, Culberson and Rawlins. We also prove that our randomized algorithm is optimal for lambda = 1, settling a conjecture of Kao, Reif and Tate.
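The cost structure of the lambda = 1 case can be simulated directly: with a single memory area, switching away from a basic algorithm discards its progress, so the one algorithm that can solve Q finishes only when a single contiguous slice given to it reaches its (unknown) finishing time T. The sketch below runs a plain round-robin schedule with geometrically growing slices, a classic dovetailing scheme; it is illustrative only and is not the paper's optimal construction, and the growth factor c is an arbitrary choice.

```python
# Simulate a simple hybrid schedule with one memory area (lambda = 1):
# each switch restarts the evicted algorithm, so the lucky algorithm
# solves Q only when one of its slices has length >= T. Round-robin with
# slices growing by a factor c per round is a generic dovetailing scheme,
# not the paper's optimal hybrid algorithm.

def total_cost(w, solver, T, c=2.0):
    """Total time spent until algorithm `solver` (0-based index among the
    w basic algorithms) receives a contiguous slice of length >= T."""
    spent, slice_len, k = 0.0, 1.0, 0
    while True:
        if k % w == solver and slice_len >= T:
            return spent + T      # the solver finishes inside this slice
        spent += slice_len
        k += 1
        if k % w == 0:
            slice_len *= c        # grow the slices once per full round
```

For example, with w = 2 algorithms and finishing time T = 3, the schedule spends slices of lengths 1, 1, 2, 2 before the solver's slice of length 4 suffices, for a total cost of 6 + 3 = 9.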

Posted Content
TL;DR: A new approach to the double digest problem, called the enhanced double digest problem, is presented; although this new problem is also NP-hard, it can be solved in linear time in certain theoretically interesting cases.
Abstract: The double digest problem is a common NP-hard approach to constructing physical maps of DNA sequences. This paper presents a new approach called the enhanced double digest problem. Although this new problem is also NP-hard, it can be solved in linear time in certain theoretically interesting cases.
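In the classic double digest setting, a candidate physical map places the cut sites of enzymes A and B on a segment [0, L], and the combined digest's fragments are the gaps between consecutive sites in the union of the two site sets; verifying a candidate map then amounts to comparing multisets of fragment lengths. The sketch below illustrates that classic verification step only, not the enhanced variant introduced by the paper.

```python
from collections import Counter

# Verify a candidate physical map against three observed digests: the
# fragments of each single digest and of the combined (double) digest are
# the gaps between consecutive cut sites, compared as multisets.
# Illustrates the classic double digest setting, not the enhanced variant.

def fragments(cut_sites, length):
    sites = sorted(set(cut_sites) | {0, length})
    return Counter(b - a for a, b in zip(sites, sites[1:]))

def consistent(map_a, map_b, length, digest_a, digest_b, digest_ab):
    """Check a candidate placement of cut sites against all three digests."""
    return (fragments(map_a, length) == Counter(digest_a)
            and fragments(map_b, length) == Counter(digest_b)
            and fragments(set(map_a) | set(map_b), length) == Counter(digest_ab))
```

For example, with enzyme A cutting at 3 and enzyme B cutting at 5 on a segment of length 8, the double digest yields fragments of lengths 3, 2, and 3.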

Posted Content
TL;DR: In this article, the authors give three sets of coding schemes for planar graphs, all of which take O(m+n) time for encoding and decoding; for simple triconnected or triangulated graphs, the bit counts drop to 2m+2n + o(n) and 2m+n + o(n), respectively.
Abstract: Let G be a plane graph of n nodes, m edges, f faces, and no self-loop. G need not be connected or simple (i.e., free of multiple edges). We give three sets of coding schemes for G which all take O(m+n) time for encoding and decoding. Our schemes employ new properties of canonical orderings for planar graphs and new techniques of processing strings of multiple types of parentheses. For applications that need to determine in O(1) time the adjacency of two nodes and the degree of a node, we use 2m+(5+1/k)n + o(m+n) bits for any constant k > 0 while the best previous bound by Munro and Raman is 2m+8n + o(m+n). If G is triconnected or triangulated, our bit count decreases to 2m+3n + o(m+n) or 2m+2n + o(m+n), respectively. If G is simple, our bit count is (5/3)m+(5+1/k)n + o(n) for any constant k > 0. Thus, if a simple G is also triconnected or triangulated, then 2m+2n + o(n) or 2m+n + o(n) bits suffice, respectively. If only adjacency queries are supported, the bit counts for a general G and a simple G become 2m+(14/3)n + o(m+n) and (4/3)m+5n + o(n), respectively. If we only need to reconstruct G from its code, a simple and triconnected G uses roughly 2.38m + O(1) bits while the best previous bound by He, Kao, and Lu is 2.84m.

Book ChapterDOI
28 Aug 2001
TL;DR: In this article, the authors developed three polynomial-time techniques for pricing European Asian options with provably small errors, where the stock prices follow binomial trees or trees of higher degree.
Abstract: This paper develops three polynomial-time techniques for pricing European Asian options with provably small errors, where the stock prices follow binomial trees or trees of higher degree. The first technique is the first known Monte Carlo algorithm with analytical error bounds suitable for pricing single-stock options with meaningful confidence and speed. The second technique is a general recursive bucketing-based scheme that enables robust trade-offs between accuracy and run-time. The third technique combines the Fast Fourier Transform with bucketing-based schemes for pricing basket options. This technique is extremely fast, polynomial in the number of days and stocks, and does not add any errors to those already incurred in the companion bucketing scheme.
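The difficulty these techniques address is that an Asian option's payoff depends on the average stock price along a path, so exact pricing on an n-step binomial tree naively requires enumerating all 2^n paths. The brute-force sketch below makes that baseline concrete for tiny trees; the up/down factors and risk-neutral probability are illustrative parameters, discounting is omitted, and the paper's bucketing and Monte Carlo schemes are precisely what replace this exponential enumeration.

```python
from itertools import product

# Exact price of a European Asian call on an n-step binomial tree by brute
# force over all 2^n paths. Exponential in n -- this is the baseline that
# the paper's polynomial-time schemes approximate. Discounting is omitted
# and the parameters are illustrative.

def asian_call_binomial(s0, strike, n, u, d, p):
    """s0: initial price; u, d: up/down factors; p: up-probability."""
    price = 0.0
    for moves in product((u, d), repeat=n):
        s, path_sum, prob = s0, 0.0, 1.0
        for m in moves:
            s *= m
            path_sum += s
            prob *= p if m == u else (1 - p)
        avg = path_sum / n
        price += prob * max(avg - strike, 0.0)
    return price
```

With u = 1.1, d = 0.9, and p = 0.5 the expected per-step multiplier is 1, so a strike of 0 prices the option at exactly the initial price s0.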

Journal ArticleDOI
TL;DR: The problem of designing proxies (or portfolios) for various stock market indices based on historical data is studied and it is shown that the problem is NP-hard, and hence most likely intractable.
Abstract: In this paper, we study the problem of designing proxies (or portfolios) for various stock market indices based on historical data. We use four different methods for computing market indices, all of which are formulae used in actual stock market analysis. For each index, we consider three criteria for designing the proxy: the proxy must either track the market index, outperform the market index, or perform within a margin of error of the index while maintaining a low volatility. In eleven of the twelve cases (all combinations of four indices with three criteria except the problem of sacrificing return for less volatility using the price-relative index) we show that the problem is NP-hard, and hence most likely intractable.

Proceedings ArticleDOI
09 Jan 2001
TL;DR: A simple agent-based model for a stock market where the agents are traders equipped with simple trading strategies, and their trades together determine the stock prices is developed.
Abstract: This paper initiates a study into the century-old issue of market predictability from the perspective of computational complexity. We develop a simple agent-based model for a stock market where the agents are traders equipped with simple trading strategies, and their trades together determine the stock prices. Computer simulations show that a basic case of this model is already capable of generating price graphs which are visually similar to the recent price movements of high tech stocks. In the general model, we prove that if there are a large number of traders but they employ a relatively small number of strategies, then there is a polynomial-time algorithm for predicting future price movements with high accuracy. On the other hand, if the number of strategies is large, market prediction becomes complete in two new computational complexity classes CPP and BCPP, where P^{NP[O(log n)]} ⊆ BCPP ⊆ CPP = PP. These computational completeness results open up a novel possibility that the price graph of an actual stock could be sufficiently deterministic for various prediction goals but appear random to all polynomial-time prediction algorithms.

Posted Content
TL;DR: This paper studies position-randomized auctions, which form a special class of multiple-object auctions where a bidding algorithm consists of an initial bid sequence and an algorithm for randomly permuting the sequence, and gives an optimal bidding algorithm for the disadvantaged bidder.
Abstract: In a multiple-object auction, every bidder tries to win as many objects as possible with a bidding algorithm. This paper studies position-randomized auctions, which form a special class of multiple-object auctions where a bidding algorithm consists of an initial bid sequence and an algorithm for randomly permuting the sequence. We are especially concerned with situations where some bidders know the bidding algorithms of others. For the case of only two bidders, we give an optimal bidding algorithm for the disadvantaged bidder. Our result generalizes previous work by allowing the bidders to have unequal budgets. One might naturally anticipate that the optimal expected numbers of objects won by the bidders would be proportional to their budgets. Surprisingly, this is not true. Our new algorithm runs in optimal O(n) time in a straightforward manner. The case with more than two bidders is open.

Posted Content
TL;DR: In this paper, a linear-time algorithm for the problem of adding the smallest number of edges to make a bipartite graph componentwise biconnected while preserving its bipartiteness is presented.
Abstract: A graph is componentwise biconnected if every connected component either is an isolated vertex or is biconnected. We present a linear-time algorithm for the problem of adding the smallest number of edges to make a bipartite graph componentwise biconnected while preserving its bipartiteness. This algorithm has immediate applications for protecting sensitive information in statistical tables.
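Whether a graph already has this property can be checked with a standard depth-first search for articulation points: a component is biconnected exactly when it has no articulation point. The sketch below tests the property on small graphs (recursive DFS, and it treats a single edge as biconnected, which is an assumption); the paper's contribution is the harder task of *augmenting* a bipartite graph to reach this property with the fewest edges in linear time.

```python
from collections import defaultdict

# Check whether an undirected graph is componentwise biconnected: every
# connected component is an isolated vertex or contains no articulation
# point. Standard lowpoint DFS; sketch for small graphs only.

def is_componentwise_biconnected(vertices, edges):
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:                     # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)
    disc, low, timer = {}, {}, [0]

    def dfs(u, parent):
        # Returns True if an articulation point is found in u's subtree.
        disc[u] = low[u] = timer[0]
        timer[0] += 1
        children, found = 0, False
        for w in adj[u]:
            if w == parent:
                continue
            if w in disc:
                low[u] = min(low[u], disc[w])   # back edge
            else:
                children += 1
                found |= dfs(w, u)
                low[u] = min(low[u], low[w])
                if parent is not None and low[w] >= disc[u]:
                    found = True   # u separates w's subtree from the rest
        if parent is None and children > 1:
            found = True           # DFS root with >1 child is articulation
        return found

    for v in vertices:
        if v not in disc and adj[v] and dfs(v, None):
            return False
    return True
```

A path on three vertices fails (the middle vertex is an articulation point), while a triangle, with or without extra isolated vertices, passes.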

Posted Content
TL;DR: This paper develops three polynomial-time techniques for pricing European Asian options with provably small errors, where the stock prices follow binomial trees or trees of higher-degree.
Abstract: This paper develops three polynomial-time pricing techniques for European Asian options with provably small errors, where the stock prices follow binomial trees or trees of higher degree. The first technique is the first known Monte Carlo algorithm with analytical error bounds suitable for pricing single-stock options with meaningful confidence and speed. The second technique is a general recursive bucketing-based scheme that can use the Aingworth-Motwani-Oldham aggregation algorithm, Monte Carlo simulation, and possibly others as the base-case subroutine. This scheme enables robust trade-offs between accuracy and time over subtrees of different sizes. For long-term options or high-frequency price averaging, it can price single-stock options with smaller errors in less time than the base-case algorithms themselves. The third technique combines the Fast Fourier Transform with bucketing-based schemes for pricing basket options. This technique takes polynomial time in the number of days and the number of stocks, and does not add any errors to those already incurred in the companion bucketing scheme. This technique assumes that the price of each underlying stock moves independently.

Posted Content
TL;DR: A toolbox of combinatorial techniques for protein landscape analysis in the Grand Canonical model of Sun, Brem, Chan, and Dill is developed, based on linear programming, network flow, and a linear-size representation of all minimum cuts of a network.
Abstract: In modern biology, one of the most important research problems is to understand how protein sequences fold into their native 3D structures. To investigate this problem at a high level, one wishes to analyze the protein landscapes, i.e., the structures of the space of all protein sequences and their native 3D structures. Perhaps the most basic computational problem at this level is to take a target 3D structure as input and design a fittest protein sequence with respect to one or more fitness functions of the target 3D structure. We develop a toolbox of combinatorial techniques for protein landscape analysis in the Grand Canonical model of Sun, Brem, Chan, and Dill. The toolbox is based on linear programming, network flow, and a linear-size representation of all minimum cuts of a network. It not only substantially expands the network flow technique for protein sequence design in Kleinberg's seminal work but also is applicable to a considerably broader collection of computational problems than those considered by Kleinberg. We have used this toolbox to obtain a number of efficient algorithms and hardness results. We have further used the algorithms to analyze 3D structures drawn from the Protein Data Bank and have discovered some novel relationships between such native 3D structures and the Grand Canonical model.

Book ChapterDOI
19 Dec 2001
TL;DR: A toolbox of combinatorial techniques for protein landscape analysis in the Grand Canonical model of Sun, Brem, Chan, and Dill is developed, based on linear programming, network flow, and a linear-size representation of all minimum cuts of a network.
Abstract: In modern biology, one of the most important research problems is to understand how protein sequences fold into their native 3D structures. To investigate this problem at a high level, one wishes to analyze the protein landscapes, i.e., the structures of the space of all protein sequences and their native 3D structures. Perhaps the most basic computational problem at this level is to take a target 3D structure as input and design a fittest protein sequence with respect to one or more fitness functions of the target 3D structure. We develop a toolbox of combinatorial techniques for protein landscape analysis in the Grand Canonical model of Sun, Brem, Chan, and Dill. The toolbox is based on linear programming, network flow, and a linear-size representation of all minimum cuts of a network. It not only substantially expands the network flow technique for protein sequence design in Kleinberg's seminal work but also is applicable to a considerably broader collection of computational problems than those considered by Kleinberg. We have used this toolbox to obtain a number of efficient algorithms and hardness results. We have further used the algorithms to analyze 3D structures drawn from the Protein Data Bank and have discovered some novel relationships between such native 3D structures and the Grand Canonical model.

Posted Content
TL;DR: This work gives an algorithm to determine the largest possible number of leaves in any agreement subtree of two trees T1 and T2 with n leaves each; if the maximum degree d of these trees is bounded by a constant, the algorithm runs in O(n log^2 n) time, which is within a log n factor of optimal.
Abstract: An evolutionary tree is a rooted tree where each internal vertex has at least two children and where the leaves are labeled with distinct symbols representing species. Evolutionary trees are useful for modeling the evolutionary history of species. An agreement subtree of two evolutionary trees is an evolutionary tree which is also a topological subtree of the two given trees. We give an algorithm to determine the largest possible number of leaves in any agreement subtree of two trees T_1 and T_2 with n leaves each. If the maximum degree d of these trees is bounded by a constant, the time complexity is O(n log^2(n)) and is within a log(n) factor of optimal. For general d, this algorithm runs in O(n d^2 log(d) log^2(n)) time or alternatively in O(n d sqrt(d) log^3(n)) time.

Journal ArticleDOI
TL;DR: This work studies how to efficiently evaluate prefix sums on positive floating-point numbers so that the worst-case roundoff error of each sum is minimized, and provides experimental comparisons of all the algorithms studied on randomly and uniformly generated inputs.
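The roundoff phenomenon at issue is easy to exhibit: a naive left-to-right running sum silently drops low-order bits, while a compensated (Kahan) running sum carries the lost part forward. The sketch below only illustrates this effect; it is not the paper's error-minimizing prefix-sum algorithm.

```python
# Roundoff in prefix sums: the naive left-to-right running sum can lose
# low-order bits, while a compensated (Kahan) running sum tracks the lost
# part. Illustrates the phenomenon the paper studies; not its algorithm.

def prefix_sums_naive(xs):
    out, s = [], 0.0
    for x in xs:
        s += x
        out.append(s)
    return out

def prefix_sums_kahan(xs):
    out, s, c = [], 0.0, 0.0
    for x in xs:
        y = x - c            # re-inject the previously lost low-order part
        t = s + y
        c = (t - s) - y      # newly lost low-order part
        s = t
        out.append(s)
    return out
```

On the input [1e16] + [1.0]*1000, every naive partial sum stays stuck at 1e16 (each added 1.0 is below the rounding granularity at that magnitude), while the compensated prefix sums recover the full 1e16 + 1000.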

Posted Content
TL;DR: In this paper, the authors investigated four levels of data security of a two-dimensional table concerning the effectiveness of suppressing some of the cells in the table and presented efficient algorithms and NP-completeness results for testing and achieving these four levels.
Abstract: To protect sensitive information in a cross tabulated table, it is a common practice to suppress some of the cells in the table. This paper investigates four levels of data security of a two-dimensional table concerning the effectiveness of this practice. These four levels of data security protect the information contained in, respectively, individual cells, individual rows and columns, several rows or columns as a whole, and a table as a whole. The paper presents efficient algorithms and NP-completeness results for testing and achieving these four levels of data security. All these complexity results are obtained by means of fundamental equivalences between the four levels of data security of a table and four types of connectivity of a graph constructed from that table.

Posted Content
TL;DR: In this paper, the authors encode a triconnected planar graph with n vertices, m edges, and f faces using at most 2.835m bits, improving on the best previous bound of 3m bits.
Abstract: Let G be an embedded planar undirected graph that has n vertices, m edges, and f faces but has no self-loop or multiple edge. If G is triangulated, we can encode it using (4/3)m − 1 bits, improving on the best previous bound of about 1.53m bits. In case exponential time is acceptable, roughly 1.08m bits have been known to suffice. If G is triconnected, we use at most (2.5 + 2 log 3) min{n, f} − 7 bits, which is at most 2.835m bits and smaller than the best previous bound of 3m bits. Both of our schemes take O(n) time for encoding and decoding.

Posted Content
TL;DR: It is shown that this problem is NP-complete in general and solvable in O(I log I) time for the special case in which, for each input family C_i, each set in C_i induces a connected subgraph of the input graph G.
Abstract: Given a planar graph G and a sequence C_1,...,C_q, where each C_i is a family of vertex subsets of G, we wish to find a plane embedding of G, if any exists, such that for each i in {1,...,q}, there is a face F_i in the embedding whose boundary contains at least one vertex from each set in C_i. This problem has applications to the recovery of topological information from geographical data and the design of constrained layouts in VLSI. Let I be the input size, i.e., the total number of vertices and edges in G and the families C_i, counting multiplicity. We show that this problem is NP-complete in general. We also show that it is solvable in O(I log I) time for the special case where for each input family C_i, each set in C_i induces a connected subgraph of the input graph G. Note that the classical problem of simply finding a planar embedding is a further special case of this case with q=0. Therefore, the processing of the additional constraints C_1,...,C_q only incurs a logarithmic factor of overhead.

Posted Content
TL;DR: In this article, the authors present an algorithm for computing a maximum agreement subtree of two unrooted evolutionary trees in O(n^{1.5} log n) time for trees with unbounded degrees.
Abstract: We present an algorithm for computing a maximum agreement subtree of two unrooted evolutionary trees. It takes O(n^{1.5} log n) time for trees with unbounded degrees, matching the best known time complexity for the rooted case. Our algorithm allows the input trees to be mixed trees, i.e., trees that may contain directed and undirected edges at the same time. Our algorithm adopts a recursive strategy exploiting a technique called label compression. The backbone of this technique is an algorithm that computes the maximum weight matchings over many subgraphs of a bipartite graph as fast as it takes to compute a single matching.

Posted Content
TL;DR: In this paper, the problem of finding an optimal portfolio for aggressive or risk-averse investors is solved using an algorithm based on a fast greedy solution to a maximum flow problem, where the probability distribution of the return values of the stocks considered by the investor are assumed to be known, while the joint distribution is unknown.
Abstract: This work initiates research into the problem of determining an optimal investment strategy for investors with different attitudes towards the trade-offs of risk and profit. The probability distributions of the return values of the stocks considered by the investor are assumed to be known, while the joint distribution is unknown. The problem is to find the best investment strategy that minimizes the probability of losing a certain percentage of the invested capital, based on the different attitudes of the investors towards future outcomes of the stock market. For portfolios made up of two stocks, this work shows how to solve exactly and quickly the problem of finding an optimal portfolio for aggressive or risk-averse investors, using an algorithm based on a fast greedy solution to a maximum flow problem. However, an investor looking for an average-case guarantee (and thus neither aggressive nor risk-averse) must deal with a more difficult problem. In particular, it is #P-complete to compute the distribution function associated with the average-case bound. On the positive side, approximate answers can be computed using random sampling techniques similar to those for high-dimensional volume estimation. When k > 2 stocks are considered, it is proved that a simple solution based on the same flow concepts as the two-stock algorithm would imply P = NP, and so is highly unlikely. This work gives approximation algorithms for this case as well as exact algorithms for some important special cases.