Author

Debprakash Patnaik

Other affiliations: Indian Institute of Science, General Motors, Amazon.com
Bio: Debprakash Patnaik is an academic researcher from Virginia Tech. The author has contributed to research in topics: Spike train & Dynamic Bayesian network. The author has an h-index of 13 and has co-authored 38 publications receiving 595 citations. Previous affiliations of Debprakash Patnaik include Indian Institute of Science & General Motors.

Papers
Proceedings ArticleDOI
28 Jun 2009
TL;DR: A temporal data mining solution to model and optimize performance of data center chillers, a key component of the cooling infrastructure, and has the ability to intersperse "don't care" transitions in continuous-valued time series data.
Abstract: Motivation: Data centers are a critical component of modern IT infrastructure but are also among the worst environmental offenders through their increasing energy usage and the resulting large carbon footprints. Efficient management of data centers, including power management, networking, and cooling infrastructure, is hence crucial to sustainability. In the absence of a 'first-principles' approach to manage these complex components and their interactions, data-driven approaches have become attractive and tenable. Results: We present a temporal data mining solution to model and optimize performance of data center chillers, a key component of the cooling infrastructure. It helps bridge raw, numeric, time-series information from sensor streams toward higher level characterizations of chiller behavior, suitable for a data center engineer. To aid in this transduction, temporal data streams are first encoded into a symbolic representation, next run-length encoded segments are mined to form frequent motifs in time series, and finally these metrics are evaluated by their contributions to sustainability. A key innovation in our application is the ability to intersperse "don't care" transitions (e.g., transients) in continuous-valued time series data, an advantage we inherit by the application of frequent episode mining to symbolized representations of numeric time series. Our approach provides both qualitative and quantitative characterizations of the sensor streams to the data center engineer, to aid him in tuning chiller operating characteristics. This system is currently being prototyped for a data center managed by HP and experimental results from this application reveal the promise of our approach.

79 citations
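
The first stages named in the abstract above (symbolic encoding, run-length encoding, and counting recurring symbol patterns) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the fixed breakpoints, and the n-gram counting used as a stand-in for motif mining are all assumptions made for illustration.

```python
from collections import Counter
from itertools import groupby


def symbolize(values, breakpoints, alphabet="abcdefgh"):
    """Map each numeric reading to a symbol according to the bin it falls in.

    `breakpoints` must be sorted and need len(breakpoints) + 1 symbols.
    """
    return "".join(alphabet[sum(v >= b for b in breakpoints)] for v in values)


def run_length_encode(symbols):
    """Collapse consecutive repeats into (symbol, run_length) pairs."""
    return [(s, len(list(group))) for s, group in groupby(symbols)]


def frequent_ngrams(runs, n, min_count):
    """Count length-n sequences of run symbols and keep the frequent ones.

    A simple stand-in for motif mining over run-length encoded segments.
    """
    symbols = [s for s, _ in runs]
    counts = Counter(
        tuple(symbols[i:i + n]) for i in range(len(symbols) - n + 1)
    )
    return {gram: c for gram, c in counts.items() if c >= min_count}


# Hypothetical sensor readings and thresholds.
readings = [10, 11, 12, 30, 31, 55, 54, 12]
encoded = symbolize(readings, breakpoints=[20, 50])
runs = run_length_encode(encoded)
```

Working on run symbols rather than raw samples means that how long the signal stays in a state does not affect which transitions are counted, which is the property that makes the symbolized stream convenient for pattern mining.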

Proceedings ArticleDOI
21 Aug 2011
TL;DR: An EMR mining system called EMRView is demonstrated that enables exploration of the precedence relationships to quickly identify and visualize partial order information encoded in key classes of patients.
Abstract: The standardization and wider use of electronic medical records (EMR) creates opportunities for better understanding patterns of illness and care within and across medical systems. Our interest is in the temporal history of event codes embedded in patients' records, specifically investigating frequently occurring sequences of event codes across patients. In studying data from more than 1.6 million patient histories at the University of Michigan Health system we quickly realized that frequent sequences, while providing one level of data reduction, still constitute a serious analytical challenge as many involve alternate serializations of the same sets of codes. To further analyze these sequences, we designed an approach where a partial order is mined from frequent sequences of codes. We demonstrate an EMR mining system called EMRView that enables exploration of the precedence relationships to quickly identify and visualize partial order information encoded in key classes of patients. We demonstrate some important nuggets learned through our approach and also outline key challenges for future research based on our experiences.

74 citations
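
The idea of reducing alternate serializations of the same codes to a partial order, described in the abstract above, can be sketched as follows. This is only an illustration under simplifying assumptions, not the EMRView algorithm: it assumes every sequence contains the same codes exactly once, and the function name and example codes are hypothetical.

```python
from itertools import combinations


def common_precedence(sequences):
    """Return the pairs (a, b) such that a occurs before b in every sequence.

    Pairs that appear in both orders across the sequences are left
    unordered, which is what makes the result a partial order.
    """
    common = None
    for seq in sequences:
        # combinations() preserves positional order: (earlier, later).
        pairs = set(combinations(seq, 2))
        common = pairs if common is None else common & pairs
    return common or set()


# Two hypothetical serializations of the same four event codes.
serializations = [
    ("A", "B", "C", "D"),
    ("A", "C", "B", "D"),
]
order = common_precedence(serializations)
```

In this example A always comes first and D always comes last, while B and C stay unordered relative to each other because the two sequences disagree on them.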

Journal ArticleDOI
TL;DR: It is shown that the frequent episode mining methods from the field of temporal data mining can be very useful in this context and it is demonstrated that these methods are useful for unearthing patterns of neuronal network connectivity.
Abstract: Understanding the functioning of a neural system in terms of its underlying circuitry is an important problem in neuroscience. Recent developments in electrophysiology and imaging allow one to simultaneously record activities of hundreds of neurons. Inferring the underlying neuronal connectivity patterns from such multi-neuronal spike train data streams is a challenging statistical and computational problem. This task involves finding significant temporal patterns from vast amounts of symbolic time series data. In this paper we show that the frequent episode mining methods from the field of temporal data mining can be very useful in this context. In the frequent episode discovery framework, the data is viewed as a sequence of events, each of which is characterized by an event type and its time of occurrence and episodes are certain types of temporal patterns in such data. Here we show that, using the set of discovered frequent episodes from multi-neuronal data, one can infer different types of connectivity patterns in the neural system that generated it. For this purpose, we introduce the notion of mining for frequent episodes under certain temporal constraints; the structure of these temporal constraints is motivated by the application. We present algorithms for discovering serial and parallel episodes under these temporal constraints. Through extensive simulation studies we demonstrate that these methods are useful for unearthing patterns of neuronal network connectivity.

70 citations
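
As an illustration of counting a serial episode under a temporal constraint, as the abstract above describes, here is a minimal greedy sketch. It is not the paper's algorithm. It assumes events arrive sorted by strictly increasing time, uses a single hypothetical `max_gap` bound on the delay between consecutive events of the episode, and counts non-overlapped occurrences.

```python
def count_serial_episode(events, episode, max_gap):
    """Greedily count non-overlapped occurrences of a serial episode.

    events  : list of (event_type, time), sorted by strictly increasing time
    episode : tuple of event types that must occur in this order
    max_gap : maximum allowed delay between consecutive episode events
    """
    # latest[k] is the most recent time at which the prefix episode[:k+1]
    # was completed while respecting the gap constraint.
    latest = [None] * len(episode)
    count = 0
    for event_type, t in events:
        # Walk positions from last to first so that a single event cannot
        # advance two positions when the episode repeats an event type.
        for k in range(len(episode) - 1, -1, -1):
            if episode[k] != event_type:
                continue
            if k == 0:
                latest[0] = t
            elif latest[k - 1] is not None and 0 < t - latest[k - 1] <= max_gap:
                latest[k] = t
        if latest[-1] is not None:
            count += 1
            latest = [None] * len(episode)  # start over: no overlaps
    return count


# Hypothetical spike events: (neuron label, spike time).
spikes = [
    ("A", 1), ("B", 3), ("C", 4),
    ("A", 10), ("B", 20), ("C", 21),
    ("A", 30), ("B", 32), ("C", 35),
]
n_occurrences = count_serial_episode(spikes, ("A", "B", "C"), max_gap=5)
```

With `max_gap=5` the middle triple is rejected because B follows A by 10 time units, so tightening or loosening the constraint changes which firing sequences count as evidence of a connection.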

Journal ArticleDOI
01 Jan 2012
TL;DR: Four novel visual analytics methods are introduced to interactively examine motifs and gain new insights into the recurring patterns to analyze system operations, and both power consumption and server utilization in data centers are predicted.
Abstract: The detection of frequently occurring patterns, also called motifs, in data streams has been recognized as an important task. To find these motifs, we use an advanced event encoding and pattern discovery algorithm. As a large time series can contain hundreds of motifs, there is a need to support interactive analysis and exploration. In addition, for certain applications, such as data center resource management, service managers want to be able to predict the next day's power consumption from the previous months' data. For this purpose, we introduce four novel visual analytics methods: (i) motif layout - using colored rectangles for visualizing the occurrences and hierarchical relationships of motifs; (ii) motif distortion - enlarging or shrinking motifs for visualizing them more clearly; (iii) motif merging - combining a number of identical adjacent motif instances to simplify the display; and (iv) pattern preserving prediction - using a pattern-preserving smoothing and prediction algorithm to provide a reliable prediction for seasonal data. We have applied these methods to three real-world datasets: data center chilling utilization, oil well production, and system resource utilization. The results enable service managers to interactively examine motifs and gain new insights into the recurring patterns to analyze system operations. Using the above methods, we have also predicted both power consumption and server utilization in data centers with an accuracy of 70-80%.

50 citations

Patent
13 Jan 2010
TL;DR: In this article, a vehicle fault diagnosis and prognosis system includes a computing platform configured to receive a classifier from a remote server, the computing platform tangibly embodying computer-executable instructions for evaluating data sequences received from a vehicle control network and applying the classifier to the data sequences.
Abstract: A vehicle fault diagnosis and prognosis system includes a computing platform configured to receive a classifier from a remote server, the computing platform tangibly embodying computer-executable instructions for evaluating data sequences received from a vehicle control network and applying the classifier to the data sequences, wherein the classifier is configured to determine if the data sequences define a pattern that is associated with a particular fault.

47 citations


Cited by
Journal ArticleDOI
TL;DR: A notice of The Analysis of Time Series: An Introduction, 4th edn., by C. Chatfield (Chapman and Hall, London, 1989; ISBN 0 412 31820 2).
Abstract: The Analysis of Time Series: An Introduction, 4th edn. By C. Chatfield. ISBN 0 412 31820 2. Chapman and Hall, London, 1989. 242 pp. £13.50.

1,583 citations

Journal Article
TL;DR: In this paper, an archaeal light-driven chloride pump (NpHR) was developed for temporally precise optical inhibition of neural activity, allowing either knockout of single action potentials, or sustained blockade of spiking.
Abstract: Our understanding of the cellular implementation of systems-level neural processes like action, thought and emotion has been limited by the availability of tools to interrogate specific classes of neural cells within intact, living brain tissue. Here we identify and develop an archaeal light-driven chloride pump (NpHR) from Natronomonas pharaonis for temporally precise optical inhibition of neural activity. NpHR allows either knockout of single action potentials, or sustained blockade of spiking. NpHR is compatible with ChR2, the previous optical excitation technology we have described, in that the two opposing probes operate at similar light powers but with well-separated action spectra. NpHR, like ChR2, functions in mammals without exogenous cofactors, and the two probes can be integrated with calcium imaging in mammalian brain tissue for bidirectional optical modulation and readout of neural activity. Likewise, NpHR and ChR2 can be targeted together to Caenorhabditis elegans muscle and cholinergic motor neurons to control locomotion bidirectionally. NpHR and ChR2 form a complete system for multimodal, high-speed, genetically targeted, all-optical interrogation of living neural circuits.

1,520 citations

Proceedings ArticleDOI
01 Dec 2016
TL;DR: A novel scalable algorithm for time series subsequence all-pairs-similarity-search that computes the answer to the time series motif and time series discord problem as a side-effect, and incidentally provides the fastest known algorithm for both these extensively-studied problems.
Abstract: The all-pairs-similarity-search (or similarity join) problem has been extensively studied for text and a handful of other datatypes. However, surprisingly little progress has been made on similarity joins for time series subsequences. The lack of progress probably stems from the daunting nature of the problem. For even modest sized datasets the obvious nested-loop algorithm can take months, and the typical speed-up techniques in this domain (i.e., indexing, lower-bounding, triangular-inequality pruning and early abandoning) at best produce one or two orders of magnitude speedup. In this work we introduce a novel scalable algorithm for time series subsequence all-pairs-similarity-search. For exceptionally large datasets, the algorithm can be trivially cast as an anytime algorithm and produce high-quality approximate solutions in reasonable time. The exact similarity join algorithm computes the answer to the time series motif and time series discord problem as a side-effect, and our algorithm incidentally provides the fastest known algorithm for both these extensively-studied problems. We demonstrate the utility of our ideas for two time series data mining problems, including motif discovery and novelty discovery.

452 citations
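
For orientation, the "obvious nested-loop algorithm" that the abstract above calls too slow for real datasets can be written in a few lines. The sketch below is that naive baseline, not the scalable algorithm the paper introduces. The z-normalized Euclidean distance and the exclusion zone of half the window length are assumptions chosen for illustration.

```python
import math


def znorm(window):
    """Z-normalize a window; a constant window maps to all zeros."""
    mean = sum(window) / len(window)
    sd = math.sqrt(sum((v - mean) ** 2 for v in window) / len(window))
    if sd == 0:
        return [0.0] * len(window)
    return [(v - mean) / sd for v in window]


def naive_similarity_join(series, m):
    """For every length-m subsequence, find its nearest non-trivial neighbor.

    Returns (profile, index): profile[i] is the distance from subsequence i
    to its nearest neighbor, index[i] is where that neighbor starts.
    Runs in O(n^2 * m) time, so it is only usable on small inputs.
    """
    n = len(series) - m + 1
    subs = [znorm(series[i:i + m]) for i in range(n)]
    excl = max(1, m // 2)  # skip near-identical overlapping windows
    profile = [math.inf] * n
    index = [-1] * n
    for i in range(n):
        for j in range(n):
            if abs(i - j) <= excl:
                continue
            d = math.dist(subs[i], subs[j])
            if d < profile[i]:
                profile[i], index[i] = d, j
    return profile, index


# Toy series in which the shape at position 0 repeats at position 10.
series = [0, 1, 3, 1, 0, 7, 2, 9, 4, 6, 0, 1, 3, 1, 0]
profile, index = naive_similarity_join(series, m=5)
motif_start = min(range(len(profile)), key=profile.__getitem__)
discord_start = max(range(len(profile)), key=profile.__getitem__)
```

The smallest value in `profile` marks a motif pair and the largest marks a discord, which is the sense in which the join yields both answers as a side effect.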

Journal ArticleDOI
TL;DR: This paper reviews the theoretical and experimental data-modeling literature, in large-scale data-intensive fields, and introduces new algorithmic approaches with the least memory requirements and processing to minimize computational cost, while maintaining/improving its predictive/classification accuracy and stability.

447 citations

Journal ArticleDOI
TL;DR: This paper presents a series of user-driven data simplifications that allow researchers to pare event records down to their core elements, and presents a novel metric for measuring visual complexity, and a language for codifying disjoint strategies into an overarching simplification framework.
Abstract: Electronic Health Records (EHRs) have emerged as a cost-effective data source for conducting medical research. The difficulty in using EHRs for research purposes, however, is that both patient selection and record analysis must be conducted across very large, and typically very noisy datasets. Our previous work introduced EventFlow, a visualization tool that transforms an entire dataset of temporal event records into an aggregated display, allowing researchers to analyze population-level patterns and trends. As datasets become larger and more varied, however, it becomes increasingly difficult to provide a succinct, summarizing display. This paper presents a series of user-driven data simplifications that allow researchers to pare event records down to their core elements. Furthermore, we present a novel metric for measuring visual complexity, and a language for codifying disjoint strategies into an overarching simplification framework. These simplifications were used by real-world researchers to gain new and valuable insights from initially overwhelming datasets.

262 citations