Journal ISSN: 1405-5546

Computación Y Sistemas 

National Polytechnic Institute
About: Computación Y Sistemas is an academic journal published by the National Polytechnic Institute. The journal publishes mainly in the areas of computer science and context (language use). Its ISSN is 1405-5546. Over its lifetime, the journal has published 1261 papers, which have received 3872 citations. The journal is also known as: Revista iberoamericana de computación.


Papers
Journal ArticleDOI
TL;DR: The proposed similarity measure, soft similarity, generalizes the well-known cosine similarity measure in the VSM by introducing what the authors call the “soft cosine measure”; various formulas for exact or approximate calculation of the soft cosine measure are proposed.
Abstract: We show how to consider similarity between features for calculation of similarity of objects in the Vector Space Model (VSM) for machine learning algorithms and other classes of methods that involve similarity between objects. Unlike LSA, we assume that similarity between features is known (say, from a synonym dictionary) and does not need to be learned from the data. We call the proposed similarity measure soft similarity. Similarity between features is common, for example, in natural language processing: words, n-grams, or syntactic n-grams can be somewhat different (which makes them different features) but still have much in common: for example, the words “play” and “game” are different but related. When there is no similarity between features, our soft similarity measure is equal to the standard similarity. For this, we generalize the well-known cosine similarity measure in VSM by introducing what we call the “soft cosine measure”. We propose various formulas for exact or approximate calculation of the soft cosine measure. For example, in one of them we consider for VSM a new feature space consisting of pairs of the original features weighted by their similarity. Again, for features that bear no similarity to each other, our formulas reduce to the standard cosine measure. Our experiments show that our soft cosine measure provides better performance in our case study: the entrance exams question answering task at CLEF. In these experiments, we use syntactic n-grams as features and Levenshtein distance as the similarity between n-grams, measured either in characters or in elements of n-grams.
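The core idea is to replace the dot products in the standard cosine with bilinear forms over a feature-similarity matrix. A minimal pure-Python sketch; the three features and the similarity matrix below are invented for illustration (the paper derives feature similarities from Levenshtein distance over n-grams):

```python
def soft_cosine(a, b, s):
    """Soft cosine of vectors a and b under feature-similarity matrix s.
    When s is the identity matrix, this reduces to the standard cosine."""
    def form(x, y):
        # Bilinear form x^T S y over all feature pairs.
        return sum(s[i][j] * x[i] * y[j]
                   for i in range(len(x)) for j in range(len(y)))
    return form(a, b) / (form(a, a) ** 0.5 * form(b, b) ** 0.5)

# Hypothetical features ["play", "game", "weather"]: "play" and "game"
# are different features but related, so s[0][1] > 0.
s = [[1.0, 0.8, 0.0],
     [0.8, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
a = [1.0, 0.0, 0.0]  # text mentioning only "play"
b = [0.0, 1.0, 0.0]  # text mentioning only "game"

identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
print(soft_cosine(a, b, s))         # 0.8: related through feature similarity
print(soft_cosine(a, b, identity))  # 0.0: standard cosine sees no overlap
```

Note how the measure credits texts that share no features at all, as long as their features are similar to each other.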

297 citations

Journal ArticleDOI
TL;DR: Experimental results show that PU learning not only outperforms supervised learning significantly, but also detects a large number of potentially fake reviews hidden in the unlabeled set that Dianping fails to detect.
Abstract: Fake review detection has been studied by researchers for several years. However, so far all reported studies are based on English reviews. This paper reports a study of detecting fake reviews in Chinese. Our review dataset is from the Chinese review hosting site Dianping, which has built a fake review detection system. They are confident that their algorithm has a very high precision, but they do not know the recall. This means that all fake reviews detected by the system are almost certainly fake, but the remaining reviews may not all be genuine. This paper first reports a supervised learning study of two classes, fake and unknown. However, since the unknown set may contain many fake reviews, it is more appropriate to treat it as an unlabeled set. This calls for the model of learning from positive and unlabeled examples (or PU learning). Experimental results show that PU learning not only outperforms supervised learning significantly, but also detects a large number of potentially fake reviews hidden in the unlabeled set that Dianping fails to detect.
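The two-step PU-learning scheme the abstract alludes to can be sketched on a toy 1-D example: first treat the unlabeled set as negative, keep only "reliable negatives" far from the positive class, then retrain on positives vs. reliable negatives. The data and the nearest-centroid "classifier" below are invented for illustration; the paper works with real review features and standard classifiers.

```python
# Toy 1-D feature: known fakes score high; the unlabeled pool hides one fake.
positives = [0.9, 0.8, 0.85, 0.95]           # labeled fake reviews
unlabeled = [0.1, 0.2, 0.15, 0.88, 0.05]     # unknown reviews (0.88 is a hidden fake)

# Step 1: provisionally treat all unlabeled points as negative.
pos_centroid = sum(positives) / len(positives)
unl_centroid = sum(unlabeled) / len(unlabeled)

# Keep as "reliable negatives" only points closer to the unlabeled centroid.
reliable_neg = [x for x in unlabeled
                if abs(x - unl_centroid) < abs(x - pos_centroid)]

# Step 2: retrain using positives vs. reliable negatives only.
neg_centroid = sum(reliable_neg) / len(reliable_neg)

def classify(x):
    return "fake" if abs(x - pos_centroid) < abs(x - neg_centroid) else "genuine"

# The hidden positive in the unlabeled set is now flagged as fake.
hidden = [x for x in unlabeled if classify(x) == "fake"]
print(hidden)  # [0.88]
```

The point of the two-step scheme is exactly what the paper reports: items the site labeled "unknown" can surface as likely fakes once the unlabeled set is no longer forced to play the role of a clean negative class.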

78 citations

Journal ArticleDOI
TL;DR: A survey of different methods of textual similarity is presented, along with a new method for measuring semantic similarity between sentences that uses the advantages of taxonomy methods and merges this information into a language model.
Abstract: Measuring Semantic Textual Similarity (STS) between words/terms, sentences, paragraphs and documents plays an important role in computer science and computational linguistics. It also has many applications over several fields such as Biomedical Informatics and Geoinformation. In this paper, we present a survey of different methods of textual similarity, and we also report on the availability of different software and tools that are useful for STS. In natural language processing (NLP), STS is an important component for many tasks such as document summarization, word sense disambiguation, short answer grading, information retrieval and extraction. We split the measures for semantic similarity into three broad categories: (i) topological/knowledge-based, (ii) statistical/corpus-based, and (iii) string-based. More emphasis is given to the methods related to the WordNet taxonomy, because topological methods play an important role in understanding the intended meaning of an ambiguous word, which is very difficult to process computationally. We also propose a new method for measuring semantic similarity between sentences. This proposed method uses the advantages of taxonomy methods and merges this information into a language model. It considers WordNet synsets for lexical relationships between nodes/words, and a uni-gram language model implemented over a large corpus assigns the information content value between the two nodes of different classes.
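As an illustration of the topological/knowledge-based category, a path-based similarity counts edges between two words through their lowest common ancestor in a taxonomy. The tiny hand-made hypernym chain below is a stand-in for WordNet (which has many more relations, senses, and an information-content weighting the survey also discusses):

```python
# Toy hypernym links: each word maps to its single parent concept.
hypernym = {"dog": "canine", "canine": "animal",
            "cat": "feline", "feline": "animal"}

def ancestors(word):
    """Return the chain from a word up to the taxonomy root."""
    chain = [word]
    while chain[-1] in hypernym:
        chain.append(hypernym[chain[-1]])
    return chain

def path_similarity(w1, w2):
    """1 / (1 + shortest path length through the lowest common ancestor)."""
    a1, a2 = ancestors(w1), ancestors(w2)
    for d1, node in enumerate(a1):
        if node in a2:
            return 1.0 / (1 + d1 + a2.index(node))
    return 0.0  # no common ancestor in this toy taxonomy

print(path_similarity("dog", "cat"))  # 0.2: two edges up on each side
print(path_similarity("dog", "dog"))  # 1.0: zero-length path
```

Identical words score 1.0, and the score decays as the connecting path through the taxonomy grows, which is the intuition behind the WordNet-based measures the survey emphasizes.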

57 citations

Journal ArticleDOI
TL;DR: This article analyzes the philosophy of technology, understood as an effort by philosophers to take technology seriously as a subject of systematic reflection, while trying not to identify it solely with the philosophical extension of technological attitudes.
Abstract: This article analyzes the philosophy of technology, which is an effort by philosophers to take technology seriously as a subject of systematic reflection, while trying not to identify it solely with the philosophical extension of technological attitudes.

45 citations

Journal ArticleDOI
TL;DR: This paper presents a distant supervision algorithm for automatically collecting and labeling ‘TEAD’, a dataset for Arabic Sentiment Analysis (SA), using emojis and sentiment lexicons, and presents the algorithm used to deal with mixed-content tweets.
Abstract: Our paper presents a distant supervision algorithm for automatically collecting and labeling ‘TEAD’, a dataset for Arabic Sentiment Analysis (SA), using emojis and sentiment lexicons. The data was gathered from Twitter between the 1st of June and the 30th of November 2017. Although the idea of using emojis to collect and label training data for SA is not novel, getting this approach to work for Arabic dialects was very challenging. We ended up with more than 6 million tweets labeled as Positive, Negative or Neutral. We present the algorithm used to deal with mixed-content tweets (Modern Standard Arabic, MSA, and Dialect Arabic, DA). We also provide properties and statistics of the dataset alongside experimental results. Our trials covered a wide range of standard classifiers that have proved efficient for the sentiment classification problem.
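The distant-supervision labeling idea can be sketched as: assign a tweet the polarity of the emojis it contains, and discard tweets with conflicting signals. The emoji sets and the fallback rule below are hypothetical stand-ins; the paper's actual pipeline also consults sentiment lexicons and handles MSA/DA mixing.

```python
# Hypothetical emoji polarity sets (not the paper's lexicons).
POS_EMOJIS = {"😍", "😂", "👍"}
NEG_EMOJIS = {"😡", "😢", "💔"}

def distant_label(tweet):
    """Label a tweet from its emojis; None means 'discard as ambiguous'."""
    has_pos = any(e in tweet for e in POS_EMOJIS)
    has_neg = any(e in tweet for e in NEG_EMOJIS)
    if has_pos and has_neg:
        return None        # mixed signals: drop from the training set
    if has_pos:
        return "Positive"
    if has_neg:
        return "Negative"
    return "Neutral"       # no emoji signal (the paper also uses lexicons here)

print(distant_label("the match was great 😍"))  # Positive
print(distant_label("so sad 💔"))               # Negative
```

Labels obtained this way are noisy by construction, which is why distant supervision is typically validated, as here, by training standard classifiers on the resulting corpus.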

43 citations

Performance
Metrics
No. of papers from the Journal in previous years

Year    Papers
2023    55
2022    154
2021    34
2020    100
2019    141
2018    168