Author

Ulrich Meyer

Other affiliations: Max Planck Society
Bio: Ulrich Meyer is an academic researcher from Goethe University Frankfurt. The author has contributed to research in topics: Shortest path problem & Time complexity. The author has an h-index of 27 and has co-authored 137 publications receiving 3036 citations. Previous affiliations of Ulrich Meyer include Max Planck Society.


Papers
Posted Content
TL;DR: EM-LFR is presented, the first external memory algorithm able to generate massive complex networks following the LFR benchmark. Applying clustering algorithms to generated instances gives evidence that EM-LFR and the original implementation yield graphs with matching properties.
Abstract: LFR is a popular benchmark graph generator used to evaluate community detection algorithms. We present EM-LFR, the first external memory algorithm able to generate massive complex networks following the LFR benchmark. Its most expensive component is the generation of random graphs with prescribed degree sequences which can be divided into two steps: the graphs are first materialized deterministically using the Havel-Hakimi algorithm, and then randomized. Our main contributions are EM-HH and EM-ES, two I/O-efficient external memory algorithms for these two steps. We also propose EM-CM/ES, an alternative sampling scheme using the Configuration Model and rewiring steps to obtain a random simple graph. In an experimental evaluation we demonstrate their performance; our implementation is able to handle graphs with more than 37 billion edges on a single machine, is competitive with a massive parallel distributed algorithm, and is faster than a state-of-the-art internal memory implementation even on instances fitting in main memory. EM-LFR's implementation is capable of generating large graph instances orders of magnitude faster than the original implementation. We give evidence that both implementations yield graphs with matching properties by applying clustering algorithms to generated instances. Similarly, we analyse the evolution of graph properties as EM-ES is executed on networks obtained with EM-CM/ES and find that the alternative approach can accelerate the sampling process.
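The two steps named in the abstract above (deterministic materialization with Havel-Hakimi, then randomization) can be sketched in a few lines of in-memory Python. This is only an illustration of the classical textbook techniques, not the I/O-efficient EM-HH or EM-ES algorithms of the paper; the function names `havel_hakimi` and `edge_switching`, the swap count, and the seed parameter are assumptions made for this example.

```python
import random


def havel_hakimi(degrees):
    """Return an edge list of a simple graph realizing `degrees`,
    or None if the sequence is not graphical."""
    # Residual degree per node, kept as [residual, node] pairs.
    remaining = [[d, v] for v, d in enumerate(degrees)]
    edges = []
    while True:
        remaining.sort(reverse=True)       # largest residual degree first
        d, v = remaining[0]
        if d == 0:
            return edges                   # every degree is satisfied
        if d > len(remaining) - 1:
            return None                    # not enough partners for v
        remaining[0][0] = 0
        for i in range(1, d + 1):          # connect v to the d next-largest nodes
            if remaining[i][0] == 0:
                return None                # partner has no free stub left
            remaining[i][0] -= 1
            edges.append((v, remaining[i][1]))


def edge_switching(edges, num_swaps, seed=0):
    """Randomize a simple graph with degree-preserving edge swaps.
    Swaps that would create a self-loop or a multi-edge are rejected."""
    rng = random.Random(seed)
    edges = [tuple(sorted(e)) for e in edges]
    if len(edges) < 2:
        return edges
    present = set(edges)
    for _ in range(num_swaps):
        i, j = rng.randrange(len(edges)), rng.randrange(len(edges))
        (a, b), (c, d) = edges[i], edges[j]
        if rng.random() < 0.5:             # pick one of the two swap orientations
            c, d = d, c
        new1, new2 = tuple(sorted((a, d))), tuple(sorted((c, b)))
        if a == d or c == b or new1 == new2 or new1 in present or new2 in present:
            continue                       # rejected: graph would not stay simple
        present -= {edges[i], edges[j]}
        present |= {new1, new2}
        edges[i], edges[j] = new1, new2
    return edges
```

A call such as `edge_switching(havel_hakimi([3, 3, 2, 2, 2]), num_swaps=100)` yields a randomized simple graph with the same degree sequence. How many swaps are needed for a good sample is a separate question that this sketch does not address.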

3 citations

Journal ArticleDOI
TL;DR: For this special issue on “Algorithms for Big Data”, German researchers who conduct research on theoretical boundaries of big data as well as the realization of end-to-end data processing systems were invited to contribute.
Abstract: The vast amount of existing data in various fields of industry, such as health, finance, and automotive, and its fast growth through social networks, sensors, and smart devices make continuous research on the impact, opportunities, and boundaries of Big Data necessary and inevitable. At the same time, distributed processing systems, such as Hadoop, Flink, and Spark, allow engineers to create data processing software that can handle large volumes of data and fast-paced streams. In order to achieve the best possible speedups and scalability, however, new algorithmic insights and their efficient implementation are crucial, too. Furthermore, current research still tries to overcome challenging dimensions, such as variety and veracity of data. Data privacy is also becoming more important by the day. In Germany, several Big Data projects and initiatives try to tackle Big Data problems in a focused manner. For example, the priority programme DFG-SPP 1736 on Algorithms for Big Data has been funding various projects targeting technological challenges, fundamental algorithmic techniques, and applications. The Federal Ministry of Education and Research (BMBF) is expanding its funding for Big Data research from two competence centers for Big Data, the Berlin Big Data Center (BBDC) and the Competence Center on Scalable Data Solutions and Services (ScaDS) Dresden/Leipzig, to several AI competence centers throughout Germany, now also in Tübingen, Darmstadt, and Munich. For this special issue we have invited contributions from German researchers who conduct research on theoretical boundaries of big data as well as the realization of end-to-end data processing systems. After careful reviewing by several experts and revision of the papers, we have finally accepted the following seven contributions for this special issue on “Algorithms for Big Data”. – “Dictionary learning for transcriptomics data reveals type-specific gene modules in a multi-class setting”

3 citations

Journal ArticleDOI
TL;DR: This paper updates a previous survey on results for large-scale graph generation obtained within the DFG priority programme SPP 1736 (Algorithms for Big Data), broadening the scope to include recently published results.
Abstract: The selection of input data is a crucial step in virtually every empirical study. Experimental campaigns in algorithm engineering, experimental algorithmics, network analysis, and many other fields often require suited network data. In this context, synthetic graphs play an important role, as data sets of observed networks are typically scarce, biased, not sufficiently understood, and may pose logistic and legal challenges. Just like processing huge graphs becomes challenging in the big data setting, new algorithmic approaches are necessary to generate such massive instances efficiently. Here, we update our previous survey [35] on results for large-scale graph generation obtained within the DFG priority programme SPP 1736 (Algorithms for Big Data); to this end, we broaden the scope and include recently published results.

3 citations

01 Jan 1994
TL;DR: In this paper, the authors considered the permutation routing problem on two-dimensional $n \times n$ meshes and obtained a near-optimal result: $T = 2 \cdot n + \mathcal{O}(1)$ with $Q = 2$.
Abstract: We consider the permutation routing problem on two-dimensional $n \times n$ meshes. To be practical, a routing algorithm is required to ensure very small queue sizes $Q$, and very low running time $T$, not only asymptotically but particularly also for the practically important $n$ up to 1000. With a technique inspired by a scheme of Kaklamanis/Krizanc/Rao, we obtain a near-optimal result: $T = 2 \cdot n + \mathcal{O}(1)$ with $Q = 2$. Although $Q$ is very attractive now, the lower order terms in $T$ make this algorithm highly impractical. Therefore we present simple schemes which are asymptotically slower, but have $T \simeq 3 \cdot n$ for all $n$ and $Q$ between 2 and 8.

3 citations

Book ChapterDOI
05 Jan 2015
TL;DR: This paper focuses on engineering an I/O-efficient distance oracle for large graphs that model real-world interactions, and creates small oracle labels that can be kept in internal memory for rather large graphs but also efficiently handles the case when both the graph and these labels have to reside on external storage.
Abstract: Computing shortest path distance is a fundamental primitive in many graph applications. On graphs that do not fit in the main memory of the computing device, computing such distances requires hours to months even with the best I/O-efficient shortest path implementations. For applications requiring many such shortest path distances, one would ideally like to preprocess the input graph into a space-efficient data structure I/O-efficiently, such that the distance queries can be answered with a small additive distortion using only O(1) I/Os. Furthermore, in a batch setting, one would like to answer O(n) such distance queries in O(n/B) I/Os. In this paper, we focus on engineering an I/O-efficient distance oracle for large graphs that model real-world interactions. Our engineered oracle (i) preprocesses graphs with multi-billion edges in less than an hour using a single core of a typical PC, (ii) answers online shortest path queries in milliseconds using an SSD, (iii) answers batched shortest path queries using HDDs with an average time per query of a few microseconds, (iv) results in a highly accurate shortest path estimate and (v) uses space linear in the number of nodes. Our implementation creates small oracle labels (i.e., they can still be kept in internal memory for rather large graphs) but also efficiently handles the case when both the graph and these labels have to reside on external storage. Dynamic settings where new edges are continuously inserted into the graph are efficiently supported, too.

3 citations


Cited by
Journal ArticleDOI
TL;DR: Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis.
Abstract: Machine Learning is the study of methods for programming computers to learn. Computers are applied to a wide range of tasks, and for most of these it is relatively easy for programmers to design and implement the necessary software. However, there are many tasks for which this is difficult or impossible. These can be divided into four general categories. First, there are problems for which there exist no human experts. For example, in modern automated manufacturing facilities, there is a need to predict machine failures before they occur by analyzing sensor readings. Because the machines are new, there are no human experts who can be interviewed by a programmer to provide the knowledge necessary to build a computer system. A machine learning system can study recorded data and subsequent machine failures and learn prediction rules. Second, there are problems where human experts exist, but where they are unable to explain their expertise. This is the case in many perceptual tasks, such as speech recognition, hand-writing recognition, and natural language understanding. Virtually all humans exhibit expert-level abilities on these tasks, but none of them can describe the detailed steps that they follow as they perform them. Fortunately, humans can provide machines with examples of the inputs and correct outputs for these tasks, so machine learning algorithms can learn to map the inputs to the outputs. Third, there are problems where phenomena are changing rapidly. In finance, for example, people would like to predict the future behavior of the stock market, of consumer purchases, or of exchange rates. These behaviors change frequently, so that even if a programmer could construct a good predictive computer program, it would need to be rewritten frequently. A learning program can relieve the programmer of this burden by constantly modifying and tuning a set of learned prediction rules. 
Fourth, there are applications that need to be customized for each computer user separately. Consider, for example, a program to filter unwanted electronic mail messages. Different users will need different filters. It is unreasonable to expect each user to program his or her own rules, and it is infeasible to provide every user with a software engineer to keep the rules up-to-date. A machine learning system can learn which mail messages the user rejects and maintain the filtering rules automatically. Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis. Statistics focuses on understanding the phenomena that have generated the data, often with the goal of testing different hypotheses about those phenomena. Data mining seeks to find patterns in the data that are understandable by people. Psychological studies of human learning aspire to understand the mechanisms underlying the various learning behaviors exhibited by people (concept learning, skill acquisition, strategy change, etc.).

13,246 citations

Proceedings ArticleDOI
22 Jan 2006
TL;DR: Some of the major results in random graphs and some of the more challenging open problems are reviewed, including those related to the WWW.
Abstract: We will review some of the major results in random graphs and some of the more challenging open problems. We will cover algorithmic and structural questions. We will touch on newer models, including those related to the WWW.

7,116 citations

Proceedings ArticleDOI
06 Jun 2010
TL;DR: A model for processing large graphs that has been designed for efficient, scalable and fault-tolerant implementation on clusters of thousands of commodity computers, and its implied synchronicity makes reasoning about programs easier.
Abstract: Many practical computing problems concern large graphs. Standard examples include the Web graph and various social networks. The scale of these graphs - in some cases billions of vertices, trillions of edges - poses challenges to their efficient processing. In this paper we present a computational model suitable for this task. Programs are expressed as a sequence of iterations, in each of which a vertex can receive messages sent in the previous iteration, send messages to other vertices, and modify its own state and that of its outgoing edges or mutate graph topology. This vertex-centric approach is flexible enough to express a broad set of algorithms. The model has been designed for efficient, scalable and fault-tolerant implementation on clusters of thousands of commodity computers, and its implied synchronicity makes reasoning about programs easier. Distribution-related details are hidden behind an abstract API. The result is a framework for processing large graphs that is expressive and easy to program.
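The iteration scheme described in the abstract above, where a vertex reads the messages sent in the previous iteration, updates its state, and sends new messages, can be illustrated with a small single-machine sketch. This is not the framework's actual API: the function name `propagate_max`, the dictionary-based graph representation, and the choice of maximum-value propagation as the example program are all assumptions made for illustration, and distribution and fault tolerance are ignored entirely.

```python
def propagate_max(graph, values, max_supersteps=100):
    """Vertex-centric sketch: each vertex ends up holding the largest value
    that can reach it along directed edges.

    graph:  dict mapping each vertex to a list of out-neighbours
            (every neighbour must itself be a key of the dict)
    values: dict mapping each vertex to its initial value
    """
    values = dict(values)
    inbox = {v: [] for v in graph}          # messages sent in the previous iteration
    for superstep in range(max_supersteps):
        outbox = {v: [] for v in graph}     # messages delivered in the next iteration
        any_sent = False
        for v, neighbours in graph.items():
            # The per-vertex step: read messages, update own state.
            best = max([values[v]] + inbox[v])
            changed = best > values[v]
            values[v] = best
            # Send only in the first iteration or when the state changed.
            if superstep == 0 or changed:
                for w in neighbours:
                    outbox[w].append(best)
                    any_sent = True
        if not any_sent:
            break                           # no messages in flight: computation halts
        inbox = outbox                      # synchronous barrier between iterations
    return values
```

Because messages written in one iteration only become visible in the next, the order in which vertices are processed within an iteration does not affect the result, which is the property the abstract refers to when it says synchronicity makes programs easier to reason about.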

3,840 citations

Journal ArticleDOI
TL;DR: It is shown that the full set of hydromagnetic equations admit five more integrals, besides the energy integral, if dissipative processes are absent, which made it possible to formulate a variational principle for the force-free magnetic fields.
Abstract: where A represents the magnetic vector potential, is an integral of the hydromagnetic equations. This integral made it possible to formulate a variational principle for the force-free magnetic fields. The integral expresses the fact that motions cannot transform a given field in an entirely arbitrary different field, if the conductivity of the medium is considered infinite. In this paper we shall show that the full set of hydromagnetic equations admit five more integrals, besides the energy integral, if dissipative processes are absent. These integrals, as we shall presently verify, are $I_2 = \int \mathbf{H} \cdot \mathbf{v} \, dV$, (2)

1,858 citations

Book
02 Jan 1991

1,377 citations