Author

Anne-Laure Ligozat

Bio: Anne-Laure Ligozat is an academic researcher from École Normale Supérieure. The author has contributed to research on topics including question answering and annotation. The author has an h-index of 16 and has co-authored 87 publications receiving 781 citations. Previous affiliations of Anne-Laure Ligozat include Centre national de la recherche scientifique and Université Paris-Saclay.


Papers
Journal ArticleDOI
TL;DR: BLOOM, as discussed by the authors, is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total).
Abstract: Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License.

407 citations

Proceedings ArticleDOI
27 Apr 2014
TL;DR: The approach studies two parallel corpora to identify the linguistic phenomena involved in the manual simplification of French texts, organises them within a typology, and relies on that typology to generate simplified sentences.
Abstract: This paper presents a method for the syntactic simplification of French texts. Syntactic simplification aims at making texts easier to understand by simplifying complex syntactic structures that hinder reading. Our approach is based on the study of two parallel corpora (encyclopaedia articles and tales). It aims to identify the linguistic phenomena involved in the manual simplification of French texts and organise them within a typology. We then propose a syntactic simplification system that relies on this typology to generate simplified sentences. The module starts by generating all possible variants before selecting the best subset. The evaluation shows that about 80% of the simplified sentences produced by our system are accurate.

62 citations

Journal ArticleDOI
TL;DR: The authors confirm that the use of only machine-learning methods is highly dependent on the annotated training data, and thus obtained better results for well-represented classes.

61 citations

12 Nov 2010
TL;DR: The paper presents methods, mainly based upon machine-learning systems, for the 2010 i2b2/VA challenge, which is dedicated to medical concept extraction as well as the annotation of assertions and relationships of concepts.
Abstract: This year’s i2b2/VA challenge is dedicated to medical concept extraction as well as the annotation of assertions and relationships of concepts. Several kinds of concepts, assertions, and relations must be processed. In this paper, we present the methods we used, mainly based upon machine-learning systems. The results we obtained on the final ground truth (F-measures up to 0.773 for concepts, 0.931 for assertions, and 0.709 for relations) constitute a basis for

40 citations

Journal ArticleDOI
01 Jun 2018
TL;DR: A corpus of clinical narratives in French, annotated for linguistic, semantic and structural information and aimed at clinical information extraction, is presented, along with harmonization tools that automatically identify annotation differences to be addressed to improve the overall corpus quality.
Abstract: Quality annotated resources are essential for Natural Language Processing. The objective of this work is to present a corpus of clinical narratives in French annotated for linguistic, semantic and structural information, aimed at clinical information extraction. Six annotators contributed to the corpus annotation, using a comprehensive annotation scheme covering 21 entities, 11 attributes and 37 relations. All annotators trained on a small, common portion of the corpus before proceeding independently. An automatic tool was used to produce entity and attribute pre-annotations. About a tenth of the corpus was doubly annotated and annotation differences were resolved in consensus meetings. To ensure annotation consistency throughout the corpus, we devised harmonization tools to automatically identify annotation differences to be addressed to improve the overall corpus quality. The annotation project spanned over 24 months and resulted in a corpus comprising 500 documents (148,476 tokens) annotated with 44,740 entities and 26,478 relations. The average inter-annotator agreement is 0.793 F-measure for entities and 0.789 for relations. The performance of the pre-annotation tool for entities reached 0.814 F-measure when sufficient training data was available. The performance of our entity pre-annotation tool shows the value of the corpus to build and evaluate information extraction methods. In addition, we introduced harmonization methods that further improved the quality of annotations in the corpus.

38 citations


Cited by
01 Jan 2009

7,241 citations

01 Mar 2008
TL;DR: It’s time to get used to the idea that there is no such thing as a “magic bullet”.
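Abstract: China University of Technology, General Education Center: Implementation Guidelines for English Certification Incentive Awards. Approved by the General Education Committee meeting on 8 January 2016 (ROC year 105). 1. To encourage students of China University of Technology (hereinafter "the University") to pass English proficiency tests run by credible institutions or to obtain certifications, these "Implementation Guidelines for English Certification Incentive Awards of the General Education Center, China University of Technology" (hereinafter "the Guidelines") are established. 2. Students who, while enrolled at the University, pass an English proficiency test at or above the equivalent of level B1 (intermediate) of the Common European Framework of Reference (CEFR) may be rewarded under these Guidelines. For eligible tests, see the Center's "Comparison Table of the Common European Framework of Reference for Languages: Learning, Teaching, Assessment and English Proficiency Test Levels" (see attached table); tests not listed in the comparison table are not eligible for awards. 3. All students of the University, except those in the Department of Applied English, may apply. Undergraduate students may apply only once for a given level; they may apply repeatedly while enrolled, but the level of each application must not be lower than that of the previous one. The award is issued once per semester, each time to the top 10 students university-wide; the amount for each rank is given in the attached table. 4. Applicants must provide supporting documents for the test taken in the semester (academic term) of application while enrolled, together with the score report or certificate, for processing. 5. Application procedure: download the "English Certification Incentive Award Application Form" (Attachment 1) from the General Education Center website; after completing it, submit it to the General Education Center with one original and one photocopy of the score report (the photocopy signed on the back with a note that it is identical to the original) and a photocopy of the cover of the applicant's own bank passbook (post office or Land Bank). The General Education Center may select award representatives each semester, hold a public award ceremony on a chosen date, and handle the subsequent payment processing. 6. Application deadline: applications must be submitted within six months of passing the relevant certification test; late applications are deemed forfeited. 7. The award funds under these Guidelines are held in a dedicated General Education Center account opened by the University, with all income and expenditure earmarked for this purpose; any balance remaining at the end of a year is carried over to the following year. 8. These Guidelines take effect after being reviewed and approved by the General Education Center meeting, submitted to the President for approval, and announced; the same applies to amendments.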

1,468 citations

01 Jan 2016
TL;DR: Learning Vocabulary in Another Language is available in a digital library whose online access is set as public, so it can be obtained instantly, and it is compatible with any reading device.
Abstract: Thank you very much for downloading learning vocabulary in another language. As you may know, people have searched numerous times for their favorite novels like this learning vocabulary in another language, but end up with infectious downloads. Rather than enjoying a good book with a cup of tea in the afternoon, they instead cope with some infectious virus inside their laptop. learning vocabulary in another language is available in our digital library; online access to it is set as public, so you can get it instantly. Our digital library is hosted in multiple countries, allowing you to get the lowest latency when downloading any of our books like this one. Merely said, learning vocabulary in another language is universally compatible with any device to read.

1,311 citations

Journal ArticleDOI
TL;DR: The 2010 i2b2/VA Workshop on Natural Language Processing Challenges for Clinical Records presented three tasks, which showed that machine learning approaches could be augmented with rule-based systems to determine concepts, assertions, and relations.

1,111 citations