Author

George Paliouras

Bio: George Paliouras is an academic researcher. The author has contributed to research in topics: Event calculus & First-order logic. The author has an h-index of 8 and has co-authored 21 publications that have received 1026 citations.

Papers
Posted Content
TL;DR: The authors conclude that additional safety nets are needed for the Naive Bayesian anti-spam filter to be viable in practice.
Abstract: It has recently been argued that a Naive Bayesian classifier can be used to filter unsolicited bulk e-mail (“spam”). We conduct a thorough evaluation of this proposal on a corpus that we make publicly available, contributing towards standard benchmarks. At the same time we investigate the effect of attribute-set size, training-corpus size, lemmatization, and stop-lists on the filter’s performance, issues that had not been previously explored. After introducing appropriate cost-sensitive evaluation measures, we reach the conclusion that additional safety nets are needed for the Naive Bayesian anti-spam filter to be viable in practice.

641 citations

Posted Content
TL;DR: The construction of the datasets, the design of the tracks, the evaluation measures that the authors implemented, and a quick overview of the results are detailed.
Abstract: LSHTC is a series of challenges that aims to assess the performance of classification systems in large-scale classification with a large number of classes (up to hundreds of thousands). This paper describes the datasets that have been released along the LSHTC series. The paper details the construction of the datasets and the design of the tracks, as well as the evaluation measures that we implemented and a quick overview of the results. All of these datasets are available online, and runs may still be submitted on the online server of the challenges.

173 citations

Book Chapter
29 Mar 2015
TL;DR: BioASQ is a series of challenges that assess the performance of information systems in supporting two tasks that are central to the biomedical question answering process: (a) the indexing of large volumes of unlabeled data, primarily scientific articles, with biomedical concepts, and (b) the processing of biomedical questions and the generation of answers and supporting material.
Abstract: BioASQ is a series of challenges that aims to assess the performance of information systems in supporting two tasks that are central to the biomedical question answering process: (a) the indexing of large volumes of unlabelled data, primarily scientific articles, with biomedical concepts, and (b) the processing of biomedical questions and the generation of answers and supporting material. In this paper, the main results of the first two BioASQ challenges are presented.

80 citations

Journal Article
18 Aug 2010
TL;DR: This paper provides an overview of the approaches proposed by the participants of the workshop associated with the PASCAL 2 Large-Scale Hierarchical Text Classification Challenge, together with a summary of the results of the challenge.
Abstract: This paper reports on the Large Scale Hierarchical Classification workshop (http://kmi.open.ac.uk/events/ecir2010/workshops-tutorials), held in conjunction with the European Conference on Information Retrieval (ECIR) 2010. The workshop was associated with the PASCAL 2 Large-Scale Hierarchical Text Classification Challenge (http://lshtc.iit.demokritos.gr), which took place in 2009. We first provide information about the challenge, presenting the data used, the tasks and the evaluation measures and then we provide an overview of the approaches proposed by the participants of the workshop, together with a summary of the results of the challenge.

37 citations

Proceedings Article
01 Sep 2017
TL;DR: The proposed method excels in discerning AD patients in mild and moderate stages from NC leading to the in-depth understanding of language deficits.
Abstract: In the present study, we analyzed written samples obtained from Greek native speakers diagnosed with Alzheimer's disease (AD) in mild and moderate stages and from age-matched cognitively normal controls (NC). We adopted a computational approach for the comparison of morpho-syntactic complexity and lexical variety in the samples. We used text classification approaches to assign the samples to one of the two groups. The classifiers were tested using various features: morpho-syntactic and lexical characteristics. The proposed method excels in discerning AD patients in mild and moderate stages from NC, leading to the in-depth understanding of language deficits.

30 citations


Cited by
Proceedings Article
21 Mar 2006
TL;DR: A taxonomy of different types of attacks on machine learning techniques and systems, a variety of defenses against those attacks, and an analytical model giving a lower bound on attacker's work function are provided.
Abstract: Machine learning systems offer unparalleled flexibility in dealing with evolving input in a variety of applications, such as intrusion detection systems and spam e-mail filtering. However, machine learning algorithms themselves can be a target of attack by a malicious adversary. This paper provides a framework for answering the question, "Can machine learning be secure?" Novel contributions of this paper include a taxonomy of different types of attacks on machine learning techniques and systems, a variety of defenses against those attacks, a discussion of ideas that are important to security for machine learning, an analytical model giving a lower bound on attacker's work function, and a list of open problems.

853 citations

Book Chapter
01 Aug 2012
TL;DR: A survey of a wide variety of text classification algorithms for a number of diverse domains, including target marketing, medical diagnosis, news group filtering, and document organization is provided.
Abstract: The problem of classification has been widely studied in the data mining, machine learning, database, and information retrieval communities with applications in a number of diverse domains, such as target marketing, medical diagnosis, news group filtering, and document organization. In this paper we will provide a survey of a wide variety of text classification algorithms.

818 citations

Book Chapter
03 Feb 2012
TL;DR: Mining Text Data introduces an important niche in the text analytics field, and is an edited volume contributed by leading international researchers and practitioners focused on social networks & data mining.
Abstract: Text mining applications have experienced tremendous advances because of web 2.0 and social networking applications. Recent advances in hardware and software technology have led to a number of unique scenarios where text mining algorithms are learned. Mining Text Data introduces an important niche in the text analytics field, and is an edited volume contributed by leading international researchers and practitioners focused on social networks & data mining. This book contains a wide swath of topics across social networks & data mining. Each chapter contains a comprehensive survey including the key research content on the topic, and the future directions of research in the field. There is a special focus on Text Embedded with Heterogeneous and Multimedia Data which makes the mining process much more challenging. A number of methods have been designed such as transfer learning and cross-lingual mining for such cases. Mining Text Data simplifies the content, so that advanced-level students, practitioners and researchers in computer science can benefit from this book. Academic and corporate libraries, as well as ACM, IEEE, and Management Science focused on information security, electronic commerce, databases, data mining, machine learning, and statistics are the primary buyers for this reference book.

732 citations

Journal Article
TL;DR: An overview of text classification algorithms is discussed, which covers different text feature extractions, dimensionality reduction methods, existing algorithms and techniques, and evaluation methods.
Abstract: In recent years, there has been an exponential growth in the number of complex documents and texts that require a deeper understanding of machine learning methods to be able to accurately classify texts in many applications. Many machine learning approaches have achieved surpassing results in natural language processing. The success of these learning algorithms relies on their capacity to understand complex models and non-linear relationships within data. However, finding suitable structures, architectures, and techniques for text classification is a challenge for researchers. In this paper, a brief overview of text classification algorithms is discussed. This overview covers different text feature extractions, dimensionality reduction methods, existing algorithms and techniques, and evaluation methods. Finally, the limitations of each technique and their application in the real-world problem are discussed.

612 citations

Book Chapter
11 Sep 2017
TL;DR: Machine learning models are known to lack robustness against adversarial examples, which can be derived from regular inputs by introducing minor yet carefully selected perturbations.
Abstract: Machine learning models are known to lack robustness against inputs crafted by an adversary. Such adversarial examples can, for instance, be derived from regular inputs by introducing minor—yet carefully selected—perturbations.

512 citations