Open Access

Multilingual Content Extraction Extended with Background Knowledge for Military Intelligence

TLDR
The paper explains the combined deep and shallow parsing approach based on Head-driven Phrase Structure Grammars, introduces the inference process, and shows how background knowledge is integrated into the logical inferences to increase the extent, quality, and accuracy of the content extraction.
Abstract
Written information for military purposes is available in abundance, and documents are written in many languages. The question is how the content extraction of these documents can be automated. One possible approach is based on shallow parsing (information extraction) with an application-specific combination of analysis results. One example of this is the ZENON research system, which performs a partial content analysis of some English, Dari, and Tajik texts. Another principal approach to content extraction combines deep and shallow parsing with logical inferences on the analysis results. In the project "Multilingual content analysis with semantic inference on military relevant texts" (mIE), we followed the second approach. In this paper, we present the results of the mIE project. First, we briefly contrast the ZENON project with the mIE project. In the main part of the paper, the mIE project is presented. After explaining the combined deep and shallow parsing approach with Head-driven Phrase Structure Grammars, we introduce the inference process. Then we show how background knowledge (WordNet, YAGO) is integrated into the logical inferences to increase the extent, quality, and accuracy of the content extraction. The prototype is also presented. The presentation includes briefing charts.
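The abstract does not spell out the inference rules themselves. As a rough illustration of how lexical background knowledge can widen what a query finds, the following is a minimal sketch under strong assumptions: a small hand-written hypernym table (`HYPERNYMS`) stands in for WordNet, and simple `(relation, subject, object)` triples stand in for the mIE semantic representations. All names, entities, and data here are hypothetical and are not taken from the mIE system.

```python
from collections import deque

# HYPOTHETICAL toy taxonomy standing in for WordNet hypernym links
# (concept -> list of direct hypernyms). Not real WordNet data.
HYPERNYMS: dict[str, list[str]] = {
    "tank": ["armored_vehicle"],
    "armored_vehicle": ["military_vehicle"],
    "military_vehicle": ["vehicle"],
    "truck": ["motor_vehicle"],
    "motor_vehicle": ["vehicle"],
}


def ancestors(concept: str) -> set[str]:
    """Return all transitive hypernyms of a concept (breadth-first search)."""
    seen: set[str] = set()
    queue = deque(HYPERNYMS.get(concept, []))
    while queue:
        parent = queue.popleft()
        if parent not in seen:
            seen.add(parent)
            queue.extend(HYPERNYMS.get(parent, []))
    return seen


def saturate(facts: set[tuple[str, str, str]]) -> set[tuple[str, str, str]]:
    """Derive ("isa", x, d) for every hypernym d of each asserted ("isa", x, c)."""
    derived = set(facts)
    for rel, subj, concept in facts:
        if rel == "isa":
            for parent in ancestors(concept):
                derived.add(("isa", subj, parent))
    return derived


def query(facts: set[tuple[str, str, str]], relation: str, obj: str) -> list[str]:
    """Return the sorted subjects s such that (relation, s, obj) holds."""
    return sorted(s for r, s, o in facts if r == relation and o == obj)


# HYPOTHETICAL facts, as a parser might extract them from text.
extracted = {
    ("isa", "e1", "tank"),
    ("isa", "e2", "truck"),
    ("located_in", "e1", "kabul"),
}
kb = saturate(extracted)
print(query(kb, "isa", "military_vehicle"))  # ['e1']
print(query(kb, "isa", "vehicle"))           # ['e1', 'e2']
```

Without the background knowledge, a query for military vehicles would return nothing, because the text only mentions a "tank". Because `ancestors` already computes the full transitive closure, a single pass over the asserted facts is enough in this toy setting. A real system would draw the hypernym links from WordNet or YAGO and would use a proper inference engine rather than this one-rule expansion.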



Citations
Proceedings Article

NLP as an essential ingredient of effective OSINT frameworks

TL;DR: This work has conceptualized an analysis framework with a strong focus on various techniques of natural language processing to aggregate, manipulate, and analyze intelligence information.
Proceedings Article

Automatic exploitation of multilingual information for military intelligence purposes

TL;DR: It is argued that multilingual NLP technology can strongly support military operations.
References
Journal Article

WordNet : an electronic lexical database

Christiane Fellbaum, 01 Sep 2000
TL;DR: The book covers the lexical database (including nouns in WordNet), a semantic network of English verbs, and applications of WordNet such as building semantic concordances; contributors include Katherine J. Miller.
Report

Building a large annotated corpus of English: the penn treebank

TL;DR: As a result of this grant, the researchers have published on CD-ROM a corpus of over 4 million words of running text annotated with part-of-speech (POS) tags, which includes a fully hand-parsed version of the classic Brown corpus.
Book Chapter

DBpedia: a nucleus for a web of open data

TL;DR: The extraction of the DBpedia datasets is described, along with how the resulting information is published on the Web for human and machine consumption and how DBpedia could serve as a nucleus for an emerging Web of open data.
Posted Content

TnT - A Statistical Part-of-Speech Tagger

TL;DR: Trigrams'n'Tags (TnT) is an efficient statistical part-of-speech tagger based on Markov models; it has been shown to perform at least as well as other current approaches, including the Maximum Entropy framework.
Proceedings Article

TnT -- A Statistical Part-of-Speech Tagger

TL;DR: Contrary to claims found elsewhere in the literature, it is argued that a tagger based on Markov models performs at least as well as other current approaches, including the Maximum Entropy framework.