Author

Max Jakob

Other affiliations: Saarland University
Bio: Max Jakob is an academic researcher from Free University of Berlin. The author has contributed to research on the topics of Ontology (information science) and Linked Data. The author has an h-index of 6 and has co-authored 9 publications receiving 3,940 citations. Previous affiliations of Max Jakob include Saarland University.

Papers
Journal ArticleDOI
TL;DR: An overview of the DBpedia community project is given, including its architecture, technical implementation, maintenance, internationalisation, usage statistics and applications; the several hundred data sets that link to DBpedia make it one of the central interlinking hubs in the Linked Open Data (LOD) cloud.
Abstract: The DBpedia community project extracts structured, multilingual knowledge from Wikipedia and makes it freely available on the Web using Semantic Web and Linked Data technologies. The project extracts knowledge from 111 different language editions of Wikipedia. The largest DBpedia knowledge base which is extracted from the English edition of Wikipedia consists of over 400 million facts that describe 3.7 million things. The DBpedia knowledge bases that are extracted from the other 110 Wikipedia editions together consist of 1.46 billion facts and describe 10 million additional things. The DBpedia project maps Wikipedia infoboxes from 27 different language editions to a single shared ontology consisting of 320 classes and 1,650 properties. The mappings are created via a world-wide crowd-sourcing effort and enable knowledge from the different Wikipedia editions to be combined. The project publishes releases of all DBpedia knowledge bases for download and provides SPARQL query access to 14 out of the 111 language editions via a global network of local DBpedia chapters. In addition to the regular releases, the project maintains a live knowledge base which is updated whenever a page in Wikipedia changes. DBpedia sets 27 million RDF links pointing into over 30 external data sources and thus enables data from these sources to be used together with DBpedia data. Several hundred data sets on the Web publish RDF links pointing to DBpedia themselves and make DBpedia one of the central interlinking hubs in the Linked Open Data (LOD) cloud. In this system report, we give an overview of the DBpedia community project, including its architecture, technical implementation, maintenance, internationalisation, usage statistics and applications.
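
As a concrete illustration of the SPARQL access described above, the sketch below queries the public endpoint at https://dbpedia.org/sparql over plain HTTP. The endpoint URL and response handling reflect the publicly documented service; the query itself is an illustrative example, not one taken from the paper.

```python
# Minimal sketch: query the public DBpedia SPARQL endpoint for the English
# abstract of a resource. The specific query is illustrative only.
import requests

query = """
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT ?abstract WHERE {
  <http://dbpedia.org/resource/Berlin> dbo:abstract ?abstract .
  FILTER (lang(?abstract) = "en")
}
"""

resp = requests.get(
    "https://dbpedia.org/sparql",
    params={"query": query, "format": "application/sparql-results+json"},
    timeout=30,
)
resp.raise_for_status()
for row in resp.json()["results"]["bindings"]:
    # Each binding maps variable names to typed values.
    print(row["abstract"]["value"][:120], "...")
```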

2,856 citations

Proceedings ArticleDOI
07 Sep 2011
TL;DR: DBpedia Spotlight, a system for automatically annotating text documents with DBpedia URIs, is developed, and results are evaluated in light of three baselines and six publicly available annotation systems, demonstrating the competitiveness of the system.
Abstract: Interlinking text documents with Linked Open Data enables the Web of Data to be used as background knowledge within document-oriented applications such as search and faceted browsing. As a step towards interconnecting the Web of Documents with the Web of Data, we developed DBpedia Spotlight, a system for automatically annotating text documents with DBpedia URIs. DBpedia Spotlight allows users to configure the annotations to their specific needs through the DBpedia Ontology and quality measures such as prominence, topical pertinence, contextual ambiguity and disambiguation confidence. We compare our approach with the state of the art in disambiguation, and evaluate our results in light of three baselines and six publicly available annotation systems, demonstrating the competitiveness of our system. DBpedia Spotlight is shared as open source and deployed as a Web Service freely available for public use.
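
The annotation service can be exercised over HTTP. The sketch below assumes the community demo instance at api.dbpedia-spotlight.org and the JSON field names of its REST response; both are assumptions about the current public deployment rather than details taken from the paper.

```python
# Hedged sketch: annotate a short text with DBpedia URIs via the public
# DBpedia Spotlight demo service (assumed endpoint and response fields).
import requests

SPOTLIGHT_URL = "https://api.dbpedia-spotlight.org/en/annotate"  # assumed demo instance

resp = requests.get(
    SPOTLIGHT_URL,
    params={"text": "Berlin is the capital of Germany.", "confidence": 0.5},
    headers={"Accept": "application/json"},
    timeout=30,
)
resp.raise_for_status()
for res in resp.json().get("Resources", []):
    # Each resource carries the matched surface form and the linked DBpedia URI.
    print(res["@surfaceForm"], "->", res["@URI"])
```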

1,228 citations

Proceedings ArticleDOI
04 Sep 2013
TL;DR: This paper discusses some implementation and data processing challenges encountered while developing a new multilingual version of DBpedia Spotlight that is faster, more accurate and easier to configure, and compares the solution to the previous system.
Abstract: There has recently been an increased interest in named entity recognition and disambiguation systems at major conferences such as WWW, SIGIR, ACL, KDD, etc. However, most work has focused on algorithms and evaluations, leaving little space for implementation details. In this paper, we discuss some implementation and data processing challenges we encountered while developing a new multilingual version of DBpedia Spotlight that is faster, more accurate and easier to configure. We compare our solution to the previous system, considering time performance, space requirements and accuracy in the context of the Dutch and English languages. Additionally, we report results for 9 additional languages among the largest Wikipedias. Finally, we present challenges and experiences to foment the discussion with other developers interested in recognition and disambiguation of entities in natural language text.

529 citations

Proceedings Article
01 May 2012
TL;DR: This paper describes the general DBpedia knowledge base as well as the DBpedia data sets that specifically aim at supporting computational linguistics tasks, including Entity Linking, Word Sense Disambiguation, Question Answering, Slot Filling and Relationship Extraction.
Abstract: The DBpedia project extracts structured information from Wikipedia editions in 97 different languages and combines this information into a large multi-lingual knowledge base covering many specific domains and general world knowledge. The knowledge base contains textual descriptions (titles and abstracts) of concepts in up to 97 languages. It also contains structured knowledge that has been extracted from the infobox systems of Wikipedias in 15 different languages and is mapped onto a single consistent ontology by a community effort. The knowledge base can be queried using the SPARQL query language and all its data sets are freely available for download. In this paper, we describe the general DBpedia knowledge base as well as the DBpedia data sets that specifically aim at supporting computational linguistics tasks. These tasks include Entity Linking, Word Sense Disambiguation, Question Answering, Slot Filling and Relationship Extraction. These use cases are outlined, pointing at the added value that the structured data of DBpedia provides.

167 citations

Proceedings ArticleDOI
26 Jun 2011
TL;DR: This paper addresses the problem of enriching ontology instances with candidate images retrieved from existing Web search engines. The approach taps into the Wikipedia corpus to gather context information for DBpedia instances and takes advantage of image tagging information, when available, to calculate semantic relatedness between instances and candidate images.
Abstract: Enriching knowledge bases with multimedia information makes it possible to complement textual descriptions with visual and audio information. Such complementary information can help users to understand the meaning of assertions, and in general improve the user experience with the knowledge base. In this paper we address the problem of how to enrich ontology instances with candidate images retrieved from existing Web search engines. DBpedia has evolved into a major hub in the Linked Data cloud, interconnecting millions of entities organized under a consistent ontology. Our approach taps into the Wikipedia corpus to gather context information for DBpedia instances and takes advantage of image tagging information when this is available to calculate semantic relatedness between instances and candidate images. We performed experiments with focus on the particularly challenging problem of highly ambiguous names. Both methods presented in this work outperformed the baseline. Our best method leveraged context words from Wikipedia, tags from Flickr and type information from DBpedia to achieve an average precision of 80%.
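
The abstract does not spell out the relatedness computation, so the following is only a hypothetical sketch of the general idea: score each candidate image by comparing Wikipedia context words for the instance with the image's tags, using cosine similarity over word counts. All names and data are illustrative, not the authors' method.

```python
# Hypothetical sketch: rank candidate images for a DBpedia instance by the
# cosine similarity between Wikipedia context words and image tags (toy data).
from collections import Counter
import math

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Context words gathered from the instance's Wikipedia article (illustrative).
context = Counter("berlin germany capital city brandenburg gate reichstag".split())
# Tags attached to two candidate images (illustrative).
candidates = {
    "img_1.jpg": Counter("berlin brandenburg gate night".split()),
    "img_2.jpg": Counter("berlin new hampshire mill river".split()),
}
for name, tags in sorted(candidates.items(), key=lambda kv: -cosine(context, kv[1])):
    print(f"{name}: {cosine(context, tags):.3f}")
```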

12 citations


Cited by
Proceedings Article
07 Dec 2015
TL;DR: In this paper, the use of character-level convolutional networks (ConvNets) for text classification has been explored and compared with traditional models such as bag of words, n-grams and their TFIDF variants.
Abstract: This article offers an empirical exploration on the use of character-level convolutional networks (ConvNets) for text classification. We constructed several large-scale datasets to show that character-level convolutional networks could achieve state-of-the-art or competitive results. Comparisons are offered against traditional models such as bag of words, n-grams and their TFIDF variants, and deep learning models such as word-based ConvNets and recurrent neural networks.
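
To make the model family concrete, here is a minimal character-level ConvNet sketch in PyTorch, loosely in the spirit of the paper; the alphabet size, sequence length, and layer sizes are illustrative assumptions rather than the paper's exact configuration.

```python
# Minimal character-level ConvNet for text classification (illustrative sizes).
import torch
import torch.nn as nn

class CharCNN(nn.Module):
    def __init__(self, n_chars=70, seq_len=1014, n_classes=4):
        super().__init__()
        self.convs = nn.Sequential(
            nn.Conv1d(n_chars, 256, kernel_size=7), nn.ReLU(), nn.MaxPool1d(3),
            nn.Conv1d(256, 256, kernel_size=7), nn.ReLU(), nn.MaxPool1d(3),
            nn.Conv1d(256, 256, kernel_size=3), nn.ReLU(),
            nn.Conv1d(256, 256, kernel_size=3), nn.ReLU(), nn.MaxPool1d(3),
        )
        # The flattened size depends on seq_len; compute it with a dummy pass.
        with torch.no_grad():
            flat = self.convs(torch.zeros(1, n_chars, seq_len)).numel()
        self.classifier = nn.Sequential(
            nn.Flatten(), nn.Linear(flat, 1024), nn.ReLU(), nn.Dropout(0.5),
            nn.Linear(1024, n_classes),
        )

    def forward(self, x):  # x: (batch, n_chars, seq_len) one-hot characters
        return self.classifier(self.convs(x))

model = CharCNN()
logits = model(torch.zeros(8, 70, 1014))  # dummy batch of one-hot characters
print(logits.shape)                       # torch.Size([8, 4])
```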

3,052 citations

Journal ArticleDOI
TL;DR: An overview of the DBpedia community project is given, including its architecture, technical implementation, maintenance, internationalisation, usage statistics and applications; the several hundred data sets that link to DBpedia make it one of the central interlinking hubs in the Linked Open Data (LOD) cloud.
Abstract: The DBpedia community project extracts structured, multilingual knowledge from Wikipedia and makes it freely available on the Web using Semantic Web and Linked Data technologies. The project extracts knowledge from 111 different language editions of Wikipedia. The largest DBpedia knowledge base which is extracted from the English edition of Wikipedia consists of over 400 million facts that describe 3.7 million things. The DBpedia knowledge bases that are extracted from the other 110 Wikipedia editions together consist of 1.46 billion facts and describe 10 million additional things. The DBpedia project maps Wikipedia infoboxes from 27 different language editions to a single shared ontology consisting of 320 classes and 1,650 properties. The mappings are created via a world-wide crowd-sourcing effort and enable knowledge from the different Wikipedia editions to be combined. The project publishes releases of all DBpedia knowledge bases for download and provides SPARQL query access to 14 out of the 111 language editions via a global network of local DBpedia chapters. In addition to the regular releases, the project maintains a live knowledge base which is updated whenever a page in Wikipedia changes. DBpedia sets 27 million RDF links pointing into over 30 external data sources and thus enables data from these sources to be used together with DBpedia data. Several hundred data sets on the Web publish RDF links pointing to DBpedia themselves and make DBpedia one of the central interlinking hubs in the Linked Open Data (LOD) cloud. In this system report, we give an overview of the DBpedia community project, including its architecture, technical implementation, maintenance, internationalisation, usage statistics and applications.

2,856 citations

Posted Content
TL;DR: This article constructed several large-scale datasets to show that character-level convolutional networks could achieve state-of-the-art or competitive results in text classification.
Abstract: This article offers an empirical exploration on the use of character-level convolutional networks (ConvNets) for text classification. We constructed several large-scale datasets to show that character-level convolutional networks could achieve state-of-the-art or competitive results. Comparisons are offered against traditional models such as bag of words, n-grams and their TFIDF variants, and deep learning models such as word-based ConvNets and recurrent neural networks.

1,963 citations

Journal ArticleDOI
TL;DR: This article provides a systematic review of existing knowledge graph embedding techniques, covering not only the state of the art but also the latest trends, organized by the type of information used in the embedding task.
Abstract: Knowledge graph (KG) embedding is to embed components of a KG including entities and relations into continuous vector spaces, so as to simplify the manipulation while preserving the inherent structure of the KG. It can benefit a variety of downstream tasks such as KG completion and relation extraction, and hence has quickly gained massive attention. In this article, we provide a systematic review of existing techniques, including not only the state of the art but also the latest trends. Particularly, we make the review based on the type of information used in the embedding task. Techniques that conduct embedding using only facts observed in the KG are first introduced. We describe the overall framework, specific model design, typical training procedures, as well as pros and cons of such techniques. After that, we discuss techniques that further incorporate additional information besides facts. We focus specifically on the use of entity types, relation paths, textual descriptions, and logical rules. Finally, we briefly introduce how KG embedding can be applied to and benefit a wide variety of downstream tasks such as KG completion, relation extraction, question answering, and so forth.
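
As a concrete example of the translational family such reviews cover, the sketch below scores a triple (h, r, t) by how close h + r lands to t, a TransE-style score. The embeddings are random toy vectors and the dimensions are illustrative assumptions.

```python
# Illustrative TransE-style scoring of knowledge graph triples (toy vectors).
import numpy as np

rng = np.random.default_rng(0)
dim = 50
entity_emb = {e: rng.normal(size=dim) for e in ["Berlin", "Germany", "Paris"]}
relation_emb = {"capitalOf": rng.normal(size=dim)}

def transe_score(h, r, t):
    """Higher (less negative) means the triple is judged more plausible."""
    return -np.linalg.norm(entity_emb[h] + relation_emb[r] - entity_emb[t])

# With trained embeddings, the true triple should outscore a corrupted one.
print(transe_score("Berlin", "capitalOf", "Germany"))
print(transe_score("Paris", "capitalOf", "Germany"))
```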

1,905 citations

Journal ArticleDOI
TL;DR: This article provides a survey of knowledge graph refinement approaches, with a dual look at both the methods proposed and the evaluation methodologies used.
Abstract: In recent years, different Web knowledge graphs, both free and commercial, have been created. While Google coined the term "Knowledge Graph" in 2012, there are also a few openly available knowledge graphs, with DBpedia, YAGO, and Freebase being among the most prominent ones. Those graphs are often constructed from semi-structured knowledge, such as Wikipedia, or harvested from the web with a combination of statistical and linguistic methods. The result is large-scale knowledge graphs that try to make a good trade-off between completeness and correctness. In order to further increase the utility of such knowledge graphs, various refinement methods have been proposed, which try to infer and add missing knowledge to the graph, or identify erroneous pieces of information. In this article, we provide a survey of such knowledge graph refinement approaches, with a dual look at both the methods being proposed as well as the evaluation methodologies used.

915 citations