Institution
University of Paderborn
Education • Paderborn, Nordrhein-Westfalen, Germany
About: University of Paderborn is an education organization based in Paderborn, Nordrhein-Westfalen, Germany. It is known for research contributions in the topics Control reconfiguration and Software. The organization has 6684 authors who have published 16929 publications receiving 323154 citations.
Papers
TL;DR: This article presents a discussion on eight open challenges for data stream mining, which cover the full cycle of knowledge discovery and involve such problems as protecting data privacy, dealing with legacy systems, handling incomplete and delayed information, analysis of complex data, and evaluation of stream mining algorithms.
Abstract: Every day, huge volumes of sensory, transactional, and web data are continuously generated as streams, which need to be analyzed online as they arrive. Streaming data can be considered as one of the main sources of what is called big data. While predictive modeling for data streams and big data has received a lot of attention over the last decade, many research approaches are typically designed for well-behaved controlled problem settings, overlooking important challenges imposed by real-world applications. This article presents a discussion on eight open challenges for data stream mining. Our goal is to identify gaps between current research and meaningful applications, highlight open problems, and define new application-relevant research directions for data stream mining. The identified challenges cover the full cycle of knowledge discovery and involve such problems as protecting data privacy, dealing with legacy systems, handling incomplete and delayed information, analysis of complex data, and evaluation of stream mining algorithms. The resulting analysis is illustrated by practical applications and provides general suggestions concerning lines of future research in data stream mining.
260 citations
16 Jan 2010 • TL;DR: A new k-means clustering algorithm for data streams of points from a Euclidean space that provides a good alternative to BIRCH and StreamLS, in particular if the number of cluster centers is large.
Abstract: We develop a new k-means clustering algorithm for data streams, which we call StreamKM++. Our algorithm computes a small weighted sample of the data stream and solves the problem on the sample using the k-means++ algorithm [1]. To compute the small sample, we propose two new techniques. First, we use a non-uniform sampling approach similar to the k-means++ seeding procedure to obtain small coresets from the data stream. This construction is rather easy to implement and, unlike other coreset constructions, its running time has only a low dependency on the dimensionality of the data. Second, we propose a new data structure which we call a coreset tree. The use of these coreset trees significantly speeds up the time necessary for the non-uniform sampling during our coreset construction.
We compare our algorithm experimentally with two well-known streaming implementations (BIRCH [16] and StreamLS [4, 9]). In terms of quality (sum of squared errors), our algorithm is comparable with StreamLS and significantly better than BIRCH (up to a factor of 2). In terms of running time, our algorithm is slower than BIRCH. Comparing the running time with StreamLS, it turns out that our algorithm scales much better with increasing number of centers. We conclude that, if the first priority is the quality of the clustering, then our algorithm provides a good alternative to BIRCH and StreamLS, in particular, if the number of cluster centers is large.
We also give a theoretical justification of our approach by proving that our sample set is a small coreset in low dimensional spaces.
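The non-uniform sampling idea in this abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it omits the coreset tree and any stream handling, works on an in-memory list, and the function names `sq_dist` and `d2_weighted_sample` are hypothetical. It only shows the general pattern of drawing representatives with probability proportional to squared distance (as in k-means++ seeding) and weighting each representative by the number of points nearest to it.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def d2_weighted_sample(points, m, seed=0):
    """Illustrative sketch: pick up to m representatives by D^2 sampling.

    Returns (representatives, weights), where weights[j] counts the input
    points whose nearest representative is representatives[j].
    """
    rng = random.Random(seed)
    reps = [rng.choice(points)]  # first representative: uniform at random
    # d2[i] = squared distance from points[i] to its nearest representative
    d2 = [sq_dist(p, reps[0]) for p in points]
    while len(reps) < m:
        if sum(d2) == 0:  # every point coincides with a representative
            break
        # Draw the next representative with probability proportional to d2.
        idx = rng.choices(range(len(points)), weights=d2, k=1)[0]
        reps.append(points[idx])
        d2 = [min(old, sq_dist(p, points[idx])) for old, p in zip(d2, points)]
    # Weight each representative by the size of its nearest-neighbour cell.
    weights = [0] * len(reps)
    for p in points:
        nearest = min(range(len(reps)), key=lambda j: sq_dist(p, reps[j]))
        weights[nearest] += 1
    return reps, weights
```

A weighted sample of this kind can then be handed to a weighted k-means solver in place of the full data set, which is the role the abstract assigns to the small sample.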
257 citations
TL;DR: In this article, quantum teleportation of the state of a qubit encoded in the polarization state was demonstrated from a telecom-wavelength photon to a solid-state quantum memory via 24.8 km of optical fibre.
Abstract: Quantum teleportation of the state of a qubit encoded in the polarization state is demonstrated from a telecom-wavelength photon to a solid-state quantum memory via 24.8 km of optical fibre. It is the longest distance ever reached in a teleportation experiment involving a quantum memory.
256 citations
TL;DR: This paper discusses the multi-depot, multi-vehicle-type bus scheduling problem (MDVSP), involving multiple depots for vehicles and different vehicle types for timetabled trips, and uses time–space-based instead of connection-based networks for MDVSP modeling.
256 citations
TL;DR: In this article, a preliminary phytolith classification scheme was used in soil phytolith counting procedures to produce typical opal phytolith spectra for comparison, but the method was not suitable to describe and characterize rain forest and grassland vegetation.
255 citations
Authors
Name | H-index | Papers | Citations |
---|---|---|---|
Martin Karplus | 163 | 831 | 138492 |
Marco Dorigo | 105 | 657 | 91418 |
Robert W. Boyd | 98 | 1161 | 37321 |
Thomas Heine | 84 | 423 | 24210 |
Satoru Miyano | 84 | 811 | 38723 |
Wen-Xiu Ma | 83 | 420 | 20702 |
Jörg Neugebauer | 81 | 491 | 30909 |
Thomas Lengauer | 80 | 477 | 34430 |
Gotthard Seifert | 80 | 445 | 26136 |
Reshef Tenne | 74 | 529 | 24717 |
Tim Meyer | 74 | 548 | 24784 |
Qiang Cui | 71 | 292 | 20655 |
Thomas Frauenheim | 70 | 451 | 17887 |
Walter Richtering | 67 | 332 | 14866 |
Marcus Elstner | 67 | 209 | 18960 |