Proceedings Article
String similarity algorithms for a ticket classification system
Malgorzata Pikies, Junade Ali
- pp 36-41
TLDR
This work considered the effectiveness of different algorithms and configurations to automatically identify keywords of interest in instances where such key phrases are misspelled, copied incorrectly or are otherwise differently formed, leading to a 15% improvement in the ratio of false positives to true positive classifications.
Abstract:
Fuzzy string matching allows for close, but not exactly, matching strings to be compared and extracted from bodies of text. As such, they are useful in systems which automatically extract and process documents. We summarise and compare various existing algorithms for achieving string similarity measures: Longest Common Subsequence (LCS), Dice coefficient, Cosine Similarity, Levenshtein distance and Damerau distance. Based on previously classified customer support enquiries (tickets), we considered the effectiveness of different algorithms and configurations to automatically identify keywords of interest (such as error phrases, product names and warning messages) in instances where such key phrases are misspelled, copied incorrectly or are otherwise differently formed. An optimal algorithm selection is made based on novel studies of the aforementioned similarity measures on text strings tokenised into characters. Such analysis also allowed for an optimum similarity threshold to be identified for various categories of enquiries, to reduce mismatched strings whilst allowing optimal coverage of the correctly matched key phrases. This led to a 15% improvement in the ratio of false positives to true positive classifications over the existing approach used by a customer support system.
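As a rough illustration of two of the measures named in the abstract, the sketch below computes a Levenshtein distance and a Dice coefficient on character-level tokens and applies a similarity threshold. It is not the authors' implementation: the function names, the normalisation of the edit distance by the longer string's length, the use of character bigrams for the Dice coefficient, and the default threshold of 0.8 are all assumptions made for the example.

```python
from collections import Counter


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insert, delete and substitute."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, 1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def levenshtein_similarity(a: str, b: str) -> float:
    """Assumed normalisation: 1 - distance / length of the longer string."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def char_bigrams(s: str) -> Counter:
    """Multiset of adjacent character pairs (an assumed tokenisation)."""
    return Counter(s[i:i + 2] for i in range(len(s) - 1))


def dice(a: str, b: str) -> float:
    """Dice coefficient: 2 * |X intersect Y| / (|X| + |Y|) on bigram multisets."""
    x, y = char_bigrams(a), char_bigrams(b)
    total = sum(x.values()) + sum(y.values())
    if total == 0:  # both strings shorter than two characters
        return 1.0 if a == b else 0.0
    return 2 * sum((x & y).values()) / total


def is_match(keyword: str, candidate: str, threshold: float = 0.8,
             measure=levenshtein_similarity) -> bool:
    """Hypothetical helper: accept the candidate if similarity >= threshold."""
    return measure(keyword.lower(), candidate.lower()) >= threshold


print(levenshtein("kitten", "sitting"))
print(dice("night", "nacht"))
print(is_match("timeout", "timeuot", threshold=0.7))
```

Raising the threshold trades coverage of misspelled key phrases for fewer mismatched strings, which is the balance the abstract describes tuning per enquiry category.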
Citations
Journal Article
Analysis and safety engineering of fuzzy string matching algorithms.
Malgorzata Pikies, Junade Ali
TL;DR: This paper complements fuzzy string matching algorithms with a second layer Convolutional Neural Network (CNN) binary classifier, achieving an improved keyword classification ratio for two ticket categories by a relative 69% and 78%.
Book Chapter
Using String-Comparison Measures to Improve and Evaluate Collaborative Filtering Recommender Systems
Luiz Mario L. Pascoal, Hugo Alexandre Dantas do Nascimento, Thierson Couto Rosa, Edjalma Queiroz da Silva, Everton Lima Aleixo
TL;DR: The general idea is to model the similarity computation between users as an approximate string matching problem and to employ classical algorithms that solve it and demonstrate that the measures based on a string-comparison approach can improve accuracy.
Journal Article
Ticket automation: An insight into current research with applications to multi-level classification scenarios
Posted Content
Novel Keyword Extraction and Language Detection Approaches
TL;DR: This paper proposes a fast novel approach to string tokenisation for fuzzy language matching and experimentally demonstrates an 83.6% decrease in processing time with an estimated improvement in recall at the cost of a 2.6% decrease in precision.
The automated machine learning classification approach on telco trouble ticket dataset
TL;DR: In this article, the authors presented automated machine learning for solving a practical problem of a telco trouble ticket system, in particular, the focus is on the classification of early resolution code from the trouble ticket dataset.
References
Book
Introduction to Information Retrieval
TL;DR: In this article, the authors present an up-to-date treatment of all aspects of the design and implementation of systems for gathering, indexing, and searching documents; methods for evaluating systems; and an introduction to the use of machine learning methods on text collections.
Journal Article
A vector space model for automatic indexing
Gerard Salton, A. Wong, C. S. Yang
TL;DR: An approach based on space density computations is used to choose an optimum indexing vocabulary for a collection of documents, demonstrating the usefulness of the model.
Journal Article
The String-to-String Correction Problem
TL;DR: An algorithm is presented which solves the string-to-string correction problem in time proportional to the product of the lengths of the two strings.
Domain names - concepts and facilities
TL;DR: This memo describes the domain style names and their use for host address look up and electronic mail forwarding and discusses the clients and servers in the domain name system and the protocol used between them.
Journal Article
Approximate string-matching with q-grams and maximal matches
TL;DR: Two string distance functions that are computable in linear time give a lower bound for the edit distance (in the unit cost model), which leads to fast hybrid algorithms for edit distance based string matching.