Showing papers by "Zhiyuan Chen" published in 2007


Proceedings Article
11 Jun 2007
TL;DR: A two-step solution that addresses the diversity of user preferences in the categorization approach, using a navigational tree built by a cost-based algorithm that considers the cost of visiting both intermediate nodes and leaf nodes in the tree.
Abstract: Database queries are often exploratory, and users often find that their queries return too many answers, many of them irrelevant. Existing work either categorizes or ranks the results to help users locate interesting results. The success of both approaches depends on the utilization of user preferences. However, most existing work assumes that all users have the same preferences, whereas in real life different users often have different preferences. This paper proposes a two-step solution to address the diversity of user preferences for the categorization approach. The proposed solution does not require explicit user involvement. The first step analyzes the query history of all users in the system offline and generates a set of clusters over the data, each corresponding to one type of user preference. When a user asks a query, the second step presents the user with a navigational tree over the clusters generated in the first step, so that the user can easily select the subset of clusters matching his needs. The user can then browse, rank, or categorize the results in the selected clusters. The navigational tree is automatically constructed using a cost-based algorithm that considers the cost of visiting both intermediate nodes and leaf nodes in the tree. An empirical study demonstrates the benefits of our approach.

100 citations
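The abstract above says the navigational tree is built by a cost-based algorithm that charges for visiting both intermediate and leaf nodes, but it does not give the cost function. The Python sketch below assumes a hypothetical toy model, not the paper's: at each intermediate node the user reads every child label at `label_cost` apiece, browsing a leaf cluster costs one unit per tuple, and `prob` is an assumed probability that a user wants that cluster. It only evaluates a given tree; it does not construct one.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """A node in a hypothetical navigational tree over result clusters."""
    name: str
    children: list[Node] = field(default_factory=list)
    size: int = 0      # tuples in this cluster (meaningful for leaves only)
    prob: float = 0.0  # assumed probability that a user wants this leaf


def expected_cost(node: Node, label_cost: float = 1.0, path_cost: float = 0.0) -> float:
    """Expected navigation cost of the subtree rooted at `node` (toy model, an assumption)."""
    if not node.children:
        # Leaf: cost of the path so far plus browsing every tuple in the cluster.
        return node.prob * (path_cost + node.size)
    # Intermediate node: the user reads every child label before choosing one.
    cost_here = path_cost + label_cost * len(node.children)
    return sum(expected_cost(child, label_cost, cost_here) for child in node.children)


# Eight equally likely clusters of 10 tuples each.
leaves = [Node(f"c{i}", size=10, prob=0.125) for i in range(8)]
flat = Node("root", children=leaves)
nested = Node("root", children=[Node("g0", children=leaves[:4]),
                                Node("g1", children=leaves[4:])])
print(expected_cost(flat))    # 18.0
print(expected_cost(nested))  # 16.0
```

Under these assumptions, grouping the eight clusters under two intermediate nodes lowers the expected cost from 18 to 16, because the user reads 2 + 4 labels instead of 8. A construction algorithm would search over such groupings; how the paper does this is not stated in the abstract.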


Journal Article
01 Feb 2007
TL;DR: This paper presents a framework defining a family of index structures that includes most existing XML path indices and proposes two novel index structures in this family, with different space-time tradeoffs, that are effective for the evaluation of XML branching path expressions (i.e., twigs) with value conditions.
Abstract: Various index structures have been proposed to speed up the evaluation of XML path expressions. However, existing XML path indices suffer from at least one of three limitations: they focus only on indexing the structure (relying on a separate index for node content), they are useful only for simple path expressions such as root-to-leaf paths, or they cannot be tightly integrated with a relational query processor. Moreover, there is no unified framework to compare these index structures. In this paper, we present a framework defining a family of index structures that includes most existing XML path indices. We also propose two novel index structures in this family, with different space-time tradeoffs, that are effective for the evaluation of XML branching path expressions (i.e., twigs) with value conditions. We also show how this family of index structures can be implemented using the access methods of the underlying relational database system. Finally, we present an experimental evaluation that shows the performance tradeoff between index space and matching time. The experimental results show that, for evaluating twig queries, our novel indices achieve orders-of-magnitude performance improvements over previously proposed XML path indices that can be tightly integrated with a relational query processor, albeit at a higher space cost.

41 citations
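The abstract above does not describe the index layouts. As a rough illustration of the general idea of a path index that also covers node values, the Python sketch below assumes a single in-memory dictionary keyed by (root-to-node tag path, text value), where each entry lists the ids of the nodes along that root-to-node path. A twig whose branches carry value conditions is then answered by intersecting the ids found at the branching node. `build_index` and `twig_match` are hypothetical names; the paper's structures are implemented with the access methods of a relational database, not a Python dictionary.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def build_index(xml_text):
    """Map (root-to-node tag path, stripped text) -> list of root-to-node id tuples."""
    index = defaultdict(list)
    next_id = 0

    def visit(elem, tags, ids):
        nonlocal next_id
        tags = tags + (elem.tag,)
        ids = ids + (next_id,)  # ids are assigned in document (pre-)order
        next_id += 1
        index[(tags, (elem.text or "").strip())].append(ids)
        for child in elem:
            visit(child, tags, ids)

    visit(ET.fromstring(xml_text), (), ())
    return index


def twig_match(index, branch_path, conditions):
    """Ids of nodes at `branch_path` having a child (tag, value) for every condition."""
    depth = len(branch_path) - 1  # position of the branching node in each id tuple
    result = None
    for tag, value in conditions:
        hits = {ids[depth] for ids in index.get((branch_path + (tag,), value), [])}
        result = hits if result is None else result & hits
    return result if result is not None else set()


doc = """<lib>
  <book><author>Chen</author><year>2007</year></book>
  <book><author>Chen</author><year>2006</year></book>
  <book><author>Smith</author><year>2007</year></book>
</lib>"""
idx = build_index(doc)
# Twig: /lib/book[author='Chen'][year='2007']
print(twig_match(idx, ("lib", "book"), [("author", "Chen"), ("year", "2007")]))  # {1}
```

Storing the full id path in every entry lets one lookup per branch return the branching ancestor directly, at the price of extra space. This only loosely echoes the space-time tradeoff the abstract mentions and is not the paper's design.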


Journal Article
TL;DR: This paper outlines techniques to integrate numerous water quality monitoring data sources, to resolve data disparities, and to retrieve data using semantic relationships among data sources, taking advantage of customized user profiles to enhance the quantity and quality of information available for water quality management.

35 citations


Journal Article
TL;DR: A new metadata approach to elicit semantic information from environmental data is described and semantic-based techniques to assist users in integrating, navigating, and mining multiple environmental data sources are implemented.
Abstract: Environmental research and knowledge discovery both require extensive use of data stored in various sources and created in different ways for diverse purposes. We describe a new metadata approach to elicit semantic information from environmental data and implement semantics-based techniques to assist users in integrating, navigating, and mining multiple environmental data sources. Our system contains specifications of various environmental data sources and the relationships formed among them. User requests are augmented with semantically related data sources and automatically presented as a visual semantic network. In addition, we present a methodology for data navigation and pattern discovery using multi-resolution browsing and data mining. The data semantics are captured and utilized in terms of their patterns and trends at multiple levels of resolution. We demonstrate the efficacy of our methodology through experimental results.

27 citations
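The abstract above mentions browsing patterns and trends at multiple levels of resolution without saying how the levels are built. One common way to build such levels, assumed here rather than taken from the paper, is to average adjacent pairs of readings repeatedly, as in the approximation part of a Haar wavelet transform. `resolution_levels` is a hypothetical helper, and the readings are made-up values.

```python
def resolution_levels(series):
    """Successively coarser views of `series`, built by averaging adjacent pairs.

    Level 0 is the raw data; each later level has about half as many points.
    An unpaired trailing value is carried up unchanged (an arbitrary choice).
    """
    levels = [list(series)]
    while len(levels[-1]) > 1:
        current = levels[-1]
        coarser = [(current[i] + current[i + 1]) / 2
                   for i in range(0, len(current) - 1, 2)]
        if len(current) % 2:
            coarser.append(current[-1])
        levels.append(coarser)
    return levels


# Made-up readings, e.g. one value per month at a hypothetical monitoring station.
readings = [2, 4, 6, 8, 10, 12, 14, 16]
for level, values in enumerate(resolution_levels(readings)):
    print(level, values)
```

A user could start at the coarsest level to see the overall trend and drill down to finer levels only where the coarse view looks interesting.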


01 Jan 2007
Abstract: In this paper we identify a major area of research as a topic for next-generation data mining. The last decade of research on privacy preserving data mining has produced numerous algorithms. However, most of this research has not been applied in any particular application context, so it is unclear whether the current algorithms are directly applicable to any particular problem. We identify a significant application context that requires not only privacy protection but also sophisticated data analysis. That context is supply chain management, arguably one of the most important research areas in production and operations management and one of enormous practical relevance. We examine supply chain management and identify research challenges and opportunities for next-generation privacy preserving data mining.

Note 1: Research supported in part by NSF grant IIS-IPS 0713345 to Zhiyuan Chen and Aryya Gangopadhyay.
Note 2: Corresponding author. Authors are listed in alphabetical order. Authors’ address: Department of Information Systems, University of Maryland Baltimore County (UMBC), 1000 Hilltop Circle, Baltimore, MD 21250.

1. New frontiers of privacy preserving data mining

The area of privacy preserving data mining (PPDM) started with the seminal paper by [1] and has prompted numerous research efforts since then. Although many fundamental questions remain unanswered, the area has matured to the point where it must establish its relevance to larger societal needs, including those of businesses and industries. Most existing methods may need to be extended or modified before they are directly applicable to real-world settings. In this paper we describe one such setting: supply chain management (SCM), which encompasses the entire business processes of all industries that deal with products involving multiple interrelated trading partners. Although data mining is not among the everyday tools applied in SCM, statistical data analysis for demand forecasting is an essential part of production planning, inventory management, and order processing in most industries.

Supply chain management covers a multitude of tasks, ranging from the procurement of materials, to the transformation of these materials into intermediate and finished products, to the distribution of the finished products to customers. The objective is to manage and control the material and information flow along the whole supply chain so that the right products are delivered in the right quantities, at the right places, at the right time, and at minimal cost. Demand forecasting is an important task in managing and optimizing supply chains, and it has a huge impact on a firm’s profitability in an uncertain business environment. Because the party closest to the market has the most information, information asymmetry exists across the supply chain. A phenomenon well known to researchers and practitioners in operations management for many years is the “bullwhip” effect [2]: the amplification of demand fluctuations from downstream to upstream trading partners, caused by multi-point forecasting at each echelon of a supply chain. The bullwhip effect has caused supply chains in the retail industry as a whole, and in textile retail in particular, to lose billions of dollars every year in lost revenues and inventory costs. Lack of information sharing has been identified as one of the major causes of SCM inefficiency, and many organizations have realized that sharing information with other supply chain partners can significantly reduce costs. Collaborative planning, forecasting and replenishment (CPFR) is a relatively new approach that aims to achieve accurate demand forecasts and improve supply chain operations by sharing demand-relevant information between trading partners in the supply chain.
The key information includes point-of-sale (POS) data, planned sales promotions, and inventory adjustments that upstream partners would not know about unless they were shared. With enhanced visibility into replenishment planning beyond the usual order cycle, demand forecast accuracy can be greatly improved. Reducing forecast error across the supply chain improves operational efficiency among the supply chain partners and therefore yields mutual benefits. Although CPFR is conceptually attractive, a major challenge is the trading partners’ unwillingness to share detailed information, based on the perception that other parties could unfairly exploit it for their own benefit. Private information is normally viewed as a source of competitive advantage and is not freely shared among supply chain entities without a proper incentive mechanism. Because firms are unwilling to disclose proprietary demand information, credible information sharing is always viewed as a major obstacle to effective supply chain management. Firms have recognized the need to hide sensitive information before sharing databases.

The rest of the paper is organized as follows. The next section introduces supply chain management and the bullwhip effect, which motivates the need for privacy protection methods in supply chain management. In Section 3 we discuss the challenges and future research topics in applying privacy preserving data mining methods to SCM.

2. What are SCM and the “bullwhip effect”?

In a typical supply chain, there are five types of entities: raw material providers (i.e., suppliers), manufacturers, distributors, retailers, and customers. The raw material providers initiate the supply chain by drawing natural resources from the earth. The manufacturers then transform those resources into semi-finished or finished goods through conversion, manufacture, or assembly.
The products then pass through the necessary distribution channels, often including warehousing. After storage and delivery, the goods arrive at retail outlets, and the cycle ends with consumption and recycling by the consumer.

As shown in Figure 1, a traditional supply chain has three distinct dimensions [3]: the physical distribution of tangible (“hard”) goods through inbound and outbound logistics systems, the exchange of currency or payment, and the exchange of information among the various economic players. As raw material flows downstream from suppliers to manufacturers, it is transformed into more functional and integrated products with higher economic value. Further downstream, it flows through distribution channels to retail outlets and finally reaches the consumer. Information can flow upstream from retail outlets to trading partners in the form of market forecasts and orders, and downstream from suppliers and manufacturers in the form of order status and shipment information. These information flows directly affect the production scheduling, inventory control, and delivery plans of individual supply chain members. To meet consumer demand, a large number of suppliers and manufacturers must work together to manage the flow of material and information. Without proper streamlining of the information and material flows in such a complex supply chain, billions of dollars can be lost to stockouts, defects, markdowns, and inventory costs.

While the above sequence of business processes describes a supply chain, supply chain management refers to the planning, design, and control of the flow of information and materials along the supply chain in order to meet customer requirements efficiently. In traditional supply chain management, distributors play an important role by providing a shipment consolidation/integration function.
Distributors collect orders from retailers, fill them from their own warehouse inventory, and order products from manufacturers. Because out-of-stock merchandise results in lost sales and possibly lost customers, distributors must be able to meet retail demand quickly from inventory on hand. Distributors therefore have to maintain large warehouse inventories as a buffer against demand uncertainty and possible delivery delays by manufacturers. Accurate forecasts of both the retailers’ orders and end-consumer demand affect how efficiently distributors can manage this inventory.

Distributors usually adopt a periodic review inventory policy: when the inventory level falls below a specified amount, they order products from manufacturers. Distributors place these orders based on two important criteria, the retail demand and the wholesale price, so they must forecast both future demand and future manufacturer pricing. Distributors generally order in full truckload quantities to minimize shipping costs, and each purchase transaction also carries a processing cost. These factors produce orders in large batches that do not reflect real demand. In addition, to take advantage of trade promotions (i.e., wholesale price discounts) that manufacturers offer for short periods, strategic distributors tend to order quantities that deviate from actual demand. This distorted demand information is a problem for manufacturers, because it leads to uneven production schedules and unnecessary inventory costs. It can cause one of the biggest problems in traditional supply chain management, the “bullwhip effect” [2], in which fluctuations in order information are amplified from downstream to upstream in the supply chain.
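The amplification described above can be reproduced in a small simulation. The Python sketch below models only one source of the bullwhip effect, the multi-point forecasting at each echelon mentioned earlier; batching and promotions are left out. It assumes, without this coming from the paper, i.i.d. normal consumer demand (mean 100, standard deviation 10), a moving-average forecast over `window` periods, and an order-up-to level equal to `(lead_time + 1)` times the forecast. Each echelon treats the orders it receives as its own demand. `stage_orders` and `simulate` are hypothetical names.

```python
import random
import statistics


def stage_orders(demand, lead_time=2, window=4):
    """Orders one echelon places upstream, given the demand it observes.

    Toy assumptions (not from the paper): a moving-average forecast over
    `window` periods and an order-up-to level S_t = (lead_time + 1) * forecast,
    so each order is q_t = D_t + (S_t - S_{t-1}). Negative orders (returns)
    are allowed to keep the model linear.
    """
    orders = []
    prev_level = None
    for t in range(window - 1, len(demand)):
        forecast = sum(demand[t - window + 1 : t + 1]) / window
        level = (lead_time + 1) * forecast
        if prev_level is not None:
            orders.append(demand[t] + level - prev_level)
        prev_level = level
    return orders


def simulate(periods=5000, echelons=3, seed=42):
    """Variance of the order stream at each echelon, starting with consumer demand."""
    rng = random.Random(seed)
    # Hypothetical i.i.d. consumer demand: mean 100, standard deviation 10.
    stream = [rng.gauss(100, 10) for _ in range(periods)]
    variances = [statistics.variance(stream)]
    for _ in range(echelons):
        stream = stage_orders(stream)
        variances.append(statistics.variance(stream))
    return variances


names = ["consumer demand", "retailer orders", "distributor orders", "manufacturer orders"]
for name, var in zip(names, simulate()):
    print(f"{name:>20}: variance {var:8.1f}")
```

With a 4-period window and a 2-period lead time, each order is q_t = 1.75 D_t - 0.75 D_{t-4}, so for i.i.d. demand the retailer's order variance is 1.75^2 + 0.75^2 = 3.625 times the demand variance. The printed variances grow at every echelon even though consumer demand is stationary.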
Sharing POS data, exchanging inventory status information, coordinating orders, and simplifying pricing schemes can help mitigate the bullwhip effect. However, it remains a challenging question why the downstream players in the supply chain would provide upstream partners wi

4 citations