Multilingual Content Extraction Extended with Background Knowledge for Military Intelligence

Open Access


TLDR

The paper explains the combined deep and shallow parsing approach with Head-driven Phrase Structure Grammars, introduces the inference process, and shows how background knowledge is integrated into the logical inferences to increase the extent, quality, and accuracy of the content extraction.

Abstract:

Written information for military purposes is available in abundance, and the documents are written in many languages. The question is how we can automate the content extraction of these documents. One possible approach is based on shallow parsing (information extraction) with an application-specific combination of analysis results. One example of this approach, the ZENON research system, performs a partial content analysis of some English, Dari, and Tajik texts. Another principal approach to content extraction is based on a combination of deep and shallow parsing with logical inferences on the analysis results. In the project "Multilingual content analysis with semantic inference on military relevant texts" (mIE), we followed the second approach. In this paper, we present the results of the mIE project. First, we briefly contrast the ZENON project with the mIE project. The main part of the paper presents the mIE project. After explaining the combined deep and shallow parsing approach with Head-driven Phrase Structure Grammars, we introduce the inference process. Then we show how background knowledge (WordNet, YAGO) is integrated into the logical inferences to increase the extent, quality, and accuracy of the content extraction. The prototype is also presented, and the presentation includes briefing charts.
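To illustrate the general idea of extending extraction results with background knowledge, here is a minimal sketch. It is not the mIE inference engine: it assumes that extracted content has already been reduced to simple (subject, relation, object) facts, and it uses a small hypothetical is-a hierarchy as a stand-in for WordNet-style hypernym relations. All entity names, relation labels, and the single transitive rule are illustrative assumptions. The point it shows is that a query for a general concept ("military_vehicle") only succeeds once background knowledge bridges the gap from the specific term found in the text ("tank").

```python
from typing import Dict, Set, Tuple

# Hypothetical fact format: (subject, relation, object).
Fact = Tuple[str, str, str]


def hypernym_closure(concept: str, hypernyms: Dict[str, Set[str]]) -> Set[str]:
    """Return all ancestors of `concept` in the is-a hierarchy (iterative DFS)."""
    seen: Set[str] = set()
    stack = list(hypernyms.get(concept, ()))
    while stack:
        parent = stack.pop()
        if parent not in seen:  # the seen set also guards against cycles
            seen.add(parent)
            stack.extend(hypernyms.get(parent, ()))
    return seen


def extend_with_background(facts: Set[Fact], hypernyms: Dict[str, Set[str]]) -> Set[Fact]:
    """Derive (x, 'isa', ancestor) for every extracted (x, 'isa', c) fact."""
    derived = set(facts)
    for subj, rel, obj in facts:
        if rel == "isa":
            for ancestor in hypernym_closure(obj, hypernyms):
                derived.add((subj, "isa", ancestor))
    return derived


def query(facts: Set[Fact], relation: str, obj: str) -> Set[str]:
    """Return all subjects s with (s, relation, obj) in the fact set."""
    return {s for s, r, o in facts if r == relation and o == obj}


# Hypothetical facts as they might come out of text analysis.
extracted: Set[Fact] = {
    ("vehicle_7", "isa", "tank"),
    ("vehicle_7", "located_in", "village_A"),
    ("group_3", "isa", "convoy"),
}

# Hypothetical background knowledge (stand-in for WordNet hypernyms).
hypernyms: Dict[str, Set[str]] = {
    "tank": {"armored_vehicle"},
    "armored_vehicle": {"military_vehicle"},
    "military_vehicle": {"vehicle"},
}

print(query(extracted, "isa", "military_vehicle"))  # -> set()
print(query(extend_with_background(extracted, hypernyms), "isa", "military_vehicle"))  # -> {'vehicle_7'}
```

Computing the closure eagerly (materializing derived facts) keeps queries trivial; a real system working with large resources such as WordNet or YAGO might instead resolve hypernyms lazily at query time to avoid blowing up the fact base.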


Citations

NLP as an essential ingredient of effective OSINT frameworks

Automatic exploitation of multilingual information for military intelligence purposes

References

WordNet: An electronic lexical database

Building a large annotated corpus of English: The Penn Treebank

DBpedia: A nucleus for a web of open data

TnT: A statistical part-of-speech tagger
