Learning to Predict Charges for Criminal Cases with Legal Basis
01 Sep 2017, pp. 2727-2736
TL;DR: In this paper, an attention-based neural network method was proposed to jointly model the charge prediction task and the relevant article extraction task in a unified framework, which can effectively predict appropriate charges for cases with different expression styles.
Abstract: The charge prediction task is to determine appropriate charges for a given case, which is helpful for legal assistant systems where the user input is a fact description. We argue that relevant law articles play an important role in this task, and therefore propose an attention-based neural network method to jointly model the charge prediction task and the relevant article extraction task in a unified framework. The experimental results show that, besides providing legal basis, the relevant articles can also clearly improve the charge prediction results, and our full model can effectively predict appropriate charges for cases with different expression styles.
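The abstract sketches the core mechanism: attend over the fact description, extract relevant law articles, and let the selected articles inform the charge decision. Below is a minimal, hypothetical PyTorch sketch of that joint idea; the layer choices, dimensions, and soft article-selection step are illustrative assumptions, not the authors' exact architecture.

```python
import torch
import torch.nn as nn

class JointChargeModel(nn.Module):
    def __init__(self, vocab_size, emb_dim=100, hid=128, n_articles=50, n_charges=30):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.GRU(emb_dim, hid, batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * hid, 1)                    # word-level attention scores
        self.article_head = nn.Linear(2 * hid, n_articles)   # relevant-article extraction
        self.article_emb = nn.Embedding(n_articles, 2 * hid) # learned article vectors
        self.charge_head = nn.Linear(4 * hid, n_charges)     # reads fact + article context

    def forward(self, tokens):                               # tokens: (B, T)
        h, _ = self.encoder(self.embed(tokens))              # (B, T, 2*hid)
        a = torch.softmax(self.attn(h).squeeze(-1), dim=1)   # attention over words
        fact = (a.unsqueeze(-1) * h).sum(dim=1)              # pooled fact vector
        article_logits = self.article_head(fact)             # multi-label article scores
        # Soft-select article vectors, weighted by their predicted relevance.
        w = torch.sigmoid(article_logits)
        art_ctx = (w @ self.article_emb.weight) / w.sum(dim=1, keepdim=True)
        charge_logits = self.charge_head(torch.cat([fact, art_ctx], dim=-1))
        return charge_logits, article_logits

charges, articles = JointChargeModel(vocab_size=5000)(torch.randint(0, 5000, (2, 60)))
```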
Citations
01 Jan 2018
TL;DR: This work formalizes the dependencies among subtasks as a Directed Acyclic Graph (DAG) and proposes a topological multi-task learning framework, TopJudge, which incorporates multiple subtasks and DAG dependencies into judgment prediction.
Abstract: Legal Judgment Prediction (LJP) aims to predict the judgment result based on the facts of a case and has become a promising application of artificial intelligence techniques in the legal field. In real-world scenarios, legal judgment usually consists of multiple subtasks, such as the decisions of applicable law articles, charges, fines, and the term of penalty. Moreover, there exist topological dependencies among these subtasks. While most existing works only focus on a specific subtask of judgment prediction and ignore the dependencies among subtasks, we formalize the dependencies among subtasks as a Directed Acyclic Graph (DAG) and propose a topological multi-task learning framework, TopJudge, which incorporates multiple subtasks and DAG dependencies into judgment prediction. We conduct experiments on several real-world large-scale datasets of criminal cases in the civil law system. Experimental results show that our model achieves consistent and significant improvements over baselines on all judgment prediction tasks. The source code can be obtained from https://github.com/thunlp/TopJudge
198 citations
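As a rough illustration of the DAG-dependency idea in TopJudge, the hedged sketch below runs the subtasks in topological order and feeds each task's representation to its DAG children; the task set, dependency graph, and sizes are invented for the example and do not mirror the paper's exact cells.

```python
import torch
import torch.nn as nn

TASKS = ["article", "charge", "penalty"]                     # assumed topological order
PARENTS = {"article": [], "charge": ["article"], "penalty": ["article", "charge"]}
N_CLASSES = {"article": 50, "charge": 30, "penalty": 11}     # illustrative label counts

class DagJudge(nn.Module):
    def __init__(self, fact_dim=256, task_dim=64):
        super().__init__()
        self.task_dim = task_dim
        # Each subtask cell consumes the fact vector plus its DAG parents' states.
        self.cells = nn.ModuleDict({
            t: nn.GRUCell(fact_dim + task_dim * len(PARENTS[t]), task_dim) for t in TASKS
        })
        self.heads = nn.ModuleDict({t: nn.Linear(task_dim, N_CLASSES[t]) for t in TASKS})

    def forward(self, fact):                                 # fact: (B, fact_dim)
        h0 = fact.new_zeros(fact.size(0), self.task_dim)
        states, logits = {}, {}
        for t in TASKS:                                      # topological traversal
            inp = torch.cat([fact] + [states[p] for p in PARENTS[t]], dim=-1)
            states[t] = self.cells[t](inp, h0)
            logits[t] = self.heads[t](states[t])
        return logits

logits = DagJudge()(torch.randn(4, 256))                     # dict of per-task logits
```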
01 Jul 2019
TL;DR: This paper introduces a new English legal judgment prediction dataset containing cases from the European Court of Human Rights, and evaluates a broad variety of neural models on it, establishing strong baselines that surpass previous feature-based models in three tasks: (1) binary violation classification, (2) multi-label classification, and (3) case importance prediction.
Abstract: Legal judgment prediction is the task of automatically predicting the outcome of a court case, given a text describing the case's facts. Previous work on using neural models for this task has focused on Chinese; only feature-based models (e.g., using bags of words and topics) have been considered in English. We release a new English legal judgment prediction dataset, containing cases from the European Court of Human Rights. We evaluate a broad variety of neural models on the new dataset, establishing strong baselines that surpass previous feature-based models in three tasks: (1) binary violation classification; (2) multi-label classification; (3) case importance prediction. We also explore if models are biased towards demographic information via data anonymization. As a side-product, we propose a hierarchical version of BERT, which bypasses BERT's length limitation.
160 citations
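To make the "hierarchical version of BERT" idea concrete, here is a hedged sketch that splits a long document into fixed-size chunks, encodes each chunk independently, and attends over the chunk vectors; a small Transformer encoder stands in for BERT, and all names and sizes are assumptions rather than the paper's implementation.

```python
import torch
import torch.nn as nn

class HierEncoder(nn.Module):
    def __init__(self, vocab=30000, dim=128, chunk_len=64, n_classes=2):
        super().__init__()
        self.chunk_len = chunk_len
        self.embed = nn.Embedding(vocab, dim)
        # Stand-in chunk encoder; the paper encodes each chunk with BERT instead.
        self.chunk_enc = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True), num_layers=2)
        self.doc_attn = nn.Linear(dim, 1)        # attention over chunk vectors
        self.head = nn.Linear(dim, n_classes)    # e.g. binary violation classification

    def forward(self, tokens):                   # (B, T), T a multiple of chunk_len
        B = tokens.size(0)
        chunks = tokens.view(B, -1, self.chunk_len)               # (B, C, L)
        C = chunks.size(1)
        h = self.embed(chunks).flatten(0, 1)                      # (B*C, L, dim)
        h = self.chunk_enc(h).mean(dim=1).view(B, C, -1)          # one vector per chunk
        a = torch.softmax(self.doc_attn(h).squeeze(-1), dim=1)    # chunk weights
        doc = (a.unsqueeze(-1) * h).sum(dim=1)                    # pooled document vector
        return self.head(doc)

logits = HierEncoder()(torch.randint(0, 30000, (2, 256)))         # 2 docs, 4 chunks each
```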
01 Aug 2018
TL;DR: This work proposes an attribute-attentive charge prediction model to infer the attributes and charges simultaneously, achieving significant and consistent improvements over other state-of-the-art baselines in the few-shot scenario.
Abstract: Automatic charge prediction aims to predict the final charges according to the fact descriptions in criminal cases and plays a crucial role in legal assistant systems. Existing works on charge prediction perform adequately on those high-frequency charges but are not yet capable of predicting few-shot charges with limited cases. Moreover, there exist many confusing charge pairs, whose fact descriptions are fairly similar to each other. To address these issues, we introduce several discriminative attributes of charges as the internal mapping between fact descriptions and charges. These attributes provide additional information for few-shot charges, as well as effective signals for distinguishing confusing charges. More specifically, we propose an attribute-attentive charge prediction model to infer the attributes and charges simultaneously. Experimental results on real-world datasets demonstrate that our proposed model achieves significant and consistent improvements over other state-of-the-art baselines. Specifically, our model outperforms other baselines by more than 50% in the few-shot scenario. Our codes and datasets can be obtained from https://github.com/thunlp/attribute_charge.
147 citations
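A hedged sketch of the attribute-attentive idea: one learned attention query per discriminative attribute produces an attribute-aware summary of the facts, binary attribute heads supply auxiliary supervision, and the charge head reads all summaries. The attribute inventory and dimensions are illustrative, not the released model.

```python
import torch
import torch.nn as nn

class AttributeCharge(nn.Module):
    def __init__(self, vocab=5000, dim=128, n_attr=10, n_charges=30):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.enc = nn.GRU(dim, dim, batch_first=True, bidirectional=True)
        self.queries = nn.Parameter(torch.randn(n_attr, 2 * dim))  # one query per attribute
        self.attr_head = nn.Linear(2 * dim, 1)     # binary score per attribute summary
        self.charge_head = nn.Linear(n_attr * 2 * dim, n_charges)

    def forward(self, tokens):                     # tokens: (B, T)
        h, _ = self.enc(self.embed(tokens))        # (B, T, 2*dim)
        a = torch.softmax(h @ self.queries.t(), dim=1)        # (B, T, n_attr) weights
        summaries = torch.einsum("bta,btd->bad", a, h)        # (B, n_attr, 2*dim)
        attr_logits = self.attr_head(summaries).squeeze(-1)   # auxiliary attribute task
        charge_logits = self.charge_head(summaries.flatten(1))
        return charge_logits, attr_logits

charge_logits, attr_logits = AttributeCharge()(torch.randint(0, 5000, (2, 40)))
```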
25 Apr 2020
TL;DR: In this article, the authors introduce the history, the current state, and the future directions of research in LegalAI, illustrate the tasks from the perspectives of legal professionals and NLP researchers, and show several representative applications of LegalAI.
Abstract: Legal Artificial Intelligence (LegalAI) focuses on applying the technology of artificial intelligence, especially natural language processing, to benefit tasks in the legal domain. In recent years, LegalAI has rapidly drawn increasing attention from both AI researchers and legal professionals, as LegalAI is beneficial to the legal system for liberating legal professionals from a maze of paperwork. Legal professionals often think about how to solve tasks with rule-based and symbol-based methods, while NLP researchers concentrate more on data-driven and embedding methods. In this paper, we introduce the history, the current state, and the future directions of research in LegalAI. We illustrate the tasks from the perspectives of legal professionals and NLP researchers and show several representative applications in LegalAI. We conduct experiments and provide an in-depth analysis of the advantages and disadvantages of existing works to explore possible future directions. You can find the implementation of our work at https://github.com/thunlp/CLAIM.
84 citations
References
05 Dec 2013
TL;DR: This paper presents a simple method for finding phrases in text, shows that learning good vector representations for millions of phrases is possible, and describes a simple alternative to the hierarchical softmax called negative sampling.
Abstract: The recently introduced continuous Skip-gram model is an efficient method for learning high-quality distributed vector representations that capture a large number of precise syntactic and semantic word relationships. In this paper we present several extensions that improve both the quality of the vectors and the training speed. By subsampling of the frequent words we obtain significant speedup and also learn more regular word representations. We also describe a simple alternative to the hierarchical softmax called negative sampling.
An inherent limitation of word representations is their indifference to word order and their inability to represent idiomatic phrases. For example, the meanings of "Canada" and "Air" cannot be easily combined to obtain "Air Canada". Motivated by this example, we present a simple method for finding phrases in text, and show that learning good vector representations for millions of phrases is possible.
24,012 citations
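The negative-sampling objective mentioned in the abstract replaces the full softmax with a binary discrimination between an observed (word, context) pair and k sampled negatives. A minimal sketch of that loss, with illustrative vocabulary and embedding sizes:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SGNS(nn.Module):
    def __init__(self, vocab=10000, dim=100):
        super().__init__()
        self.in_emb = nn.Embedding(vocab, dim)     # center-word vectors
        self.out_emb = nn.Embedding(vocab, dim)    # context-word vectors

    def loss(self, center, context, negatives):   # shapes: (B,), (B,), (B, k)
        w = self.in_emb(center)                                     # (B, d)
        # Pull the true context toward the center word...
        pos = F.logsigmoid((self.out_emb(context) * w).sum(-1))
        # ...and push k sampled negative contexts away from it.
        neg = F.logsigmoid(-(self.out_emb(negatives) * w.unsqueeze(1)).sum(-1)).sum(-1)
        return -(pos + neg).mean()

model = SGNS()
l = model.loss(torch.tensor([3, 7]), torch.tensor([12, 5]),
               torch.randint(0, 10000, (2, 5)))   # negatives drawn from a noise dist.
```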
01 Jan 2015
TL;DR: It is conjectured that the use of a fixed-length vector is a bottleneck in improving the performance of this basic encoder-decoder architecture, and it is proposed to extend this by allowing a model to automatically (soft-)search for parts of a source sentence that are relevant to predicting a target word, without having to form these parts as a hard segment explicitly.
Abstract: Neural machine translation is a recently proposed approach to machine translation. Unlike traditional statistical machine translation, neural machine translation aims at building a single neural network that can be jointly tuned to maximize the translation performance. The models proposed recently for neural machine translation often belong to a family of encoder-decoders and consist of an encoder that encodes a source sentence into a fixed-length vector from which a decoder generates a translation. In this paper, we conjecture that the use of a fixed-length vector is a bottleneck in improving the performance of this basic encoder-decoder architecture, and propose to extend this by allowing a model to automatically (soft-)search for parts of a source sentence that are relevant to predicting a target word, without having to form these parts as a hard segment explicitly. With this new approach, we achieve a translation performance comparable to the existing state-of-the-art phrase-based system on the task of English-to-French translation. Furthermore, qualitative analysis reveals that the (soft-)alignments found by the model agree well with our intuition.
20,027 citations
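The "(soft-)search" is an additive attention: a score e_ij = v^T tanh(W_s s_{i-1} + W_h h_j) is computed for each encoder state, softmax-normalized into alignment weights, and used to form a context vector. A minimal sketch, where module and dimension names are mine:

```python
import torch
import torch.nn as nn

class AdditiveAttention(nn.Module):
    def __init__(self, dec_dim=128, enc_dim=256, attn_dim=128):
        super().__init__()
        self.W_s = nn.Linear(dec_dim, attn_dim, bias=False)   # projects decoder state
        self.W_h = nn.Linear(enc_dim, attn_dim, bias=False)   # projects encoder states
        self.v = nn.Linear(attn_dim, 1, bias=False)           # scoring vector

    def forward(self, s_prev, enc_states):        # (B, dec_dim), (B, T, enc_dim)
        e = self.v(torch.tanh(self.W_s(s_prev).unsqueeze(1) + self.W_h(enc_states)))
        a = torch.softmax(e.squeeze(-1), dim=1)   # soft alignment over source positions
        context = (a.unsqueeze(-1) * enc_states).sum(dim=1)   # weighted source summary
        return context, a

ctx, align = AdditiveAttention()(torch.randn(2, 128), torch.randn(2, 9, 256))
```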
01 Jan 2014
TL;DR: In this paper, the encoder and decoder of the RNN Encoder-Decoder model are jointly trained to maximize the conditional probability of a target sequence given a source sequence.
Abstract: In this paper, we propose a novel neural network model called RNN Encoder-Decoder that consists of two recurrent neural networks (RNN). One RNN encodes a sequence of symbols into a fixed-length vector representation, and the other decodes the representation into another sequence of symbols. The encoder and decoder of the proposed model are jointly trained to maximize the conditional probability of a target sequence given a source sequence. The performance of a statistical machine translation system is empirically found to improve by using the conditional probabilities of phrase pairs computed by the RNN Encoder-Decoder as an additional feature in the existing log-linear model. Qualitatively, we show that the proposed model learns a semantically and syntactically meaningful representation of linguistic phrases.
19,998 citations
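A minimal sketch of the encoder-decoder scheme described above: one GRU compresses the source into a fixed-length vector that initializes a second GRU, trained with teacher forcing to maximize the target sequence's conditional probability. Vocabulary sizes and dimensions are illustrative.

```python
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    def __init__(self, src_vocab=4000, tgt_vocab=4000, dim=128):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, dim)
        self.tgt_emb = nn.Embedding(tgt_vocab, dim)
        self.encoder = nn.GRU(dim, dim, batch_first=True)
        self.decoder = nn.GRU(dim, dim, batch_first=True)
        self.out = nn.Linear(dim, tgt_vocab)

    def forward(self, src, tgt_in):                # teacher-forced decoding
        _, h = self.encoder(self.src_emb(src))     # h: fixed-length summary (1, B, dim)
        dec, _ = self.decoder(self.tgt_emb(tgt_in), h)  # decode conditioned on h
        return self.out(dec)                       # (B, T_tgt, tgt_vocab) logits

logits = Seq2Seq()(torch.randint(0, 4000, (2, 12)), torch.randint(0, 4000, (2, 10)))
```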
25 Aug 2014
TL;DR: The CNN models discussed herein improve upon the state of the art on 4 out of 7 tasks, which include sentiment analysis and question classification, and a simple modification to the architecture is proposed to allow for the use of both task-specific and static vectors.
Abstract: We report on a series of experiments with convolutional neural networks (CNN) trained on top of pre-trained word vectors for sentence-level classification tasks. We show that a simple CNN with little hyperparameter tuning and static vectors achieves excellent results on multiple benchmarks. Learning task-specific vectors through fine-tuning offers further gains in performance. We additionally propose a simple modification to the architecture to allow for the use of both task-specific and static vectors. The CNN models discussed herein improve upon the state of the art on 4 out of 7 tasks, which include sentiment analysis and question classification.
9,776 citations
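A hedged sketch of the single-channel CNN classifier described above: parallel convolutions with several filter widths over word embeddings, max-over-time pooling, dropout, and a linear classifier. Filter widths and counts are illustrative; the paper's two-channel variant would add a second, frozen embedding table.

```python
import torch
import torch.nn as nn

class TextCNN(nn.Module):
    def __init__(self, vocab=5000, dim=100, n_classes=2, widths=(3, 4, 5), n_filt=100):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.convs = nn.ModuleList([nn.Conv1d(dim, n_filt, w) for w in widths])
        self.drop = nn.Dropout(0.5)
        self.fc = nn.Linear(n_filt * len(widths), n_classes)

    def forward(self, tokens):                     # tokens: (B, T)
        x = self.embed(tokens).transpose(1, 2)     # (B, dim, T) for Conv1d
        # Max-over-time pooling keeps the strongest activation per filter.
        feats = [torch.relu(c(x)).max(dim=2).values for c in self.convs]
        return self.fc(self.drop(torch.cat(feats, dim=1)))

logits = TextCNN()(torch.randint(0, 5000, (4, 30)))
```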
13 Jun 2016
TL;DR: Experiments conducted on six large-scale text classification tasks demonstrate that the proposed architecture outperforms previous methods by a substantial margin.
Abstract: We propose a hierarchical attention network for document classification. Our model has two distinctive characteristics: (i) it has a hierarchical structure that mirrors the hierarchical structure of documents; (ii) it has two levels of attention mechanisms applied at the word- and sentence-level, enabling it to attend differentially to more and less important content when constructing the document representation. Experiments conducted on six large-scale text classification tasks demonstrate that the proposed architecture outperforms previous methods by a substantial margin. Visualization of the attention layers illustrates that the model selects qualitatively informative words and sentences.
4,282 citations
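A minimal sketch of the two-level structure described above: a word-level BiGRU with attention builds one vector per sentence, and a sentence-level BiGRU with attention builds the document vector. All sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

def attend(h, scorer):                             # h: (B, T, d) -> pooled (B, d)
    a = torch.softmax(scorer(h).squeeze(-1), dim=1)
    return (a.unsqueeze(-1) * h).sum(dim=1)

class HAN(nn.Module):
    def __init__(self, vocab=5000, dim=64, n_classes=5):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.word_rnn = nn.GRU(dim, dim, batch_first=True, bidirectional=True)
        self.word_attn = nn.Linear(2 * dim, 1)     # word-level attention
        self.sent_rnn = nn.GRU(2 * dim, dim, batch_first=True, bidirectional=True)
        self.sent_attn = nn.Linear(2 * dim, 1)     # sentence-level attention
        self.fc = nn.Linear(2 * dim, n_classes)

    def forward(self, docs):                       # docs: (B, n_sent, n_word)
        B, S, W = docs.shape
        h, _ = self.word_rnn(self.embed(docs.view(B * S, W)))
        sents = attend(h, self.word_attn).view(B, S, -1)      # one vector per sentence
        h, _ = self.sent_rnn(sents)
        return self.fc(attend(h, self.sent_attn))             # document-level logits

logits = HAN()(torch.randint(0, 5000, (2, 6, 15)))
```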