Fine-Grained Named Entity Recognition in Legal Documents
Elena Leitner, Georg Rehm, Julián Moreno-Schneider +2 more
- pp 272-287
TLDR
The work presented in this paper was carried out under the umbrella of the European project LYNX that develops a semantic platform that enables the development of various document processing and analysis applications for the legal domain.
Abstract:
This paper describes an approach to Named Entity Recognition (NER) in German language documents from the legal domain. For this purpose, a dataset consisting of German court decisions was developed. The source texts were manually annotated with 19 semantic classes: person, judge, lawyer, country, city, street, landscape, organization, company, institution, court, brand, law, ordinance, European legal norm, regulation, contract, court decision, and legal literature. The dataset consists of approx. 67,000 sentences and contains 54,000 annotated entities. The 19 fine-grained classes were automatically generalised to seven more coarse-grained classes (person, location, organization, legal norm, case-by-case regulation, court decision, and legal literature). Thus, the dataset includes two annotation variants, i.e., coarse- and fine-grained. For the task of NER, Conditional Random Fields (CRFs) and bidirectional Long Short-Term Memory networks (BiLSTMs) were applied to the dataset as state-of-the-art models. Three different models were developed for each of these two model families and tested with the coarse- and fine-grained annotations. The BiLSTM models achieve the best performance with a 95.46 F\(_1\) score for the fine-grained classes and 95.95 for the coarse-grained ones. The CRF models reach a maximum of 93.23 for the fine-grained classes and 93.22 for the coarse-grained ones. The work presented in this paper was carried out under the umbrella of the European project LYNX that develops a semantic platform that enables the development of various document processing and analysis applications for the legal domain.
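The automatic generalisation from 19 fine-grained to seven coarse-grained classes can be pictured as a simple label lookup. The sketch below is illustrative only: the grouping is inferred from the order in which the abstract lists the classes, the BIO tag prefixes are an assumption, and the names `FINE_TO_COARSE` and `generalise` are hypothetical rather than taken from the authors' code.

```python
# Assumed grouping, inferred from the order of the class lists in the abstract.
FINE_TO_COARSE = {
    "person": "person",
    "judge": "person",
    "lawyer": "person",
    "country": "location",
    "city": "location",
    "street": "location",
    "landscape": "location",
    "organization": "organization",
    "company": "organization",
    "institution": "organization",
    "court": "organization",
    "brand": "organization",
    "law": "legal norm",
    "ordinance": "legal norm",
    "European legal norm": "legal norm",
    "regulation": "case-by-case regulation",
    "contract": "case-by-case regulation",
    "court decision": "court decision",
    "legal literature": "legal literature",
}


def generalise(tags):
    """Map fine-grained tags to coarse-grained ones.

    Assumes BIO-style tags such as "B-judge" or "I-judge"; "O" is kept as-is.
    """
    coarse = []
    for tag in tags:
        if tag == "O":
            coarse.append(tag)
            continue
        prefix, label = tag.split("-", 1)
        coarse.append(f"{prefix}-{FINE_TO_COARSE[label]}")
    return coarse
```

Under these assumptions, `generalise(["B-judge", "I-judge", "O", "B-court"])` yields person tags for the judge mention and an organization tag for the court mention, which gives the second, coarse-grained annotation variant without re-annotating the text.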
Citations
Journal ArticleDOI
An end-to-end joint model for evidence information extraction from court record document
TL;DR: A novel end-to-end model is presented that adopts a shared encoder followed by separate decoders for the two tasks and obtains a 72.36% F1 score, outperforming previous methods and strong baselines by a large margin.
Journal ArticleDOI
A comparative study of automated legal text classification using random forests and deep learning
TL;DR: In this paper, a machine learning algorithm using domain concepts as features and random forests as the classifier is proposed for U.S. legal text classification; it significantly outperformed a deep learning system built on multiple pre-trained word embeddings and deep neural networks.
Proceedings Article
A Dataset of German Legal Documents for Named Entity Recognition
TL;DR: A dataset developed for Named Entity Recognition in German federal court decisions that consists of approx.
Journal ArticleDOI
Named Entity Recognition in the Romanian Legal Domain
TL;DR: This work presents a named entity recognition system for the Romanian legal domain that makes use of the gold annotated LegalNERo corpus and combines multiple distributional representations of words, including word embeddings trained on a large legal domain corpus.
References
Proceedings ArticleDOI
Neural Architectures for Named Entity Recognition
TL;DR: Paper presented at the 2016 Conference of the North American Chapter of the Association for Computational Linguistics, held in San Diego (CA, USA), June 12 to 17, 2016.
Proceedings ArticleDOI
Introduction to the CoNLL-2003 shared task: language-independent named entity recognition
TL;DR: This paper introduces the CoNLL-2003 shared task on language-independent named entity recognition (NER), describing the data sets and the evaluation method and giving a general overview of the systems that participated in the task and their performance.
Proceedings ArticleDOI
Incorporating Non-local Information into Information Extraction Systems by Gibbs Sampling
TL;DR: By using simulated annealing in place of Viterbi decoding in sequence models such as HMMs, CMMs, and CRFs, it is possible to incorporate non-local structure while preserving tractable inference.
Posted Content
Bidirectional LSTM-CRF Models for Sequence Tagging
Zhiheng Huang, Wei Xu, Kai Yu +2 more
TL;DR: This work is the first to apply a bidirectional LSTM-CRF model to NLP benchmark sequence tagging data sets, and it is shown that the BI-LSTM-CRF model can efficiently use both past and future input features thanks to a bidirectional LSTM component.
Proceedings ArticleDOI
End-to-end Sequence Labeling via Bi-directional LSTM-CNNs-CRF
Xuezhe Ma, Eduard Hovy +1 more
TL;DR: This paper combines bidirectional LSTM, CNN, and CRF for sequence labeling and reports state-of-the-art performance on two datasets: one for POS tagging and the CoNLL 2003 corpus for NER.