Journal ArticleDOI

Typo handling in searching of Quran verse based on phonetic similarities

TL;DR: In this research, Lafzi++, an improvement on previous work, handles typographical error types by applying typo correction with an autocomplete method to fix incorrect queries and the Damerau-Levenshtein distance to calculate the edit distance.
Abstract: The Quran search system was built to make it easier for Indonesians to find a verse using text written according to Indonesian pronunciation, a solution for users who have difficulty writing or typing Arabic characters. A Quran search system based on phonetic similarity can make it easier for Indonesian Muslims to find a particular verse. Lafzi was one of the systems that implemented such a search, and it was later extended under the name Lafzi+. The Lafzi+ system can handle queries containing typos, but it covers only a limited range of typing-error types. In this research, Lafzi++, an improvement on the previous systems, handles typographical error types by applying typo correction using an autocomplete method to fix incorrect queries and the Damerau-Levenshtein distance to calculate the edit distance, so that the system can offer query suggestions when a user mistypes a search, whether the error is a substitution, insertion, deletion, or transposition. Users can also search easily because they use Latin characters according to Indonesian pronunciation. The evaluation results show that the system improves on its predecessor: the accuracy of every tested query surpasses that of the previous system, with a highest recall of 96.20% and a highest Mean Average Precision (MAP) of 90.69%.
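As described in the abstract, Lafzi++ pairs an autocomplete-style suggestion step with the Damerau-Levenshtein distance. Below is a minimal sketch of that combination, assuming a tiny hypothetical vocabulary of Latin-transliterated terms; it illustrates the technique only and is not the authors' Lafzi++ implementation.

```python
# Sketch: Damerau-Levenshtein distance (optimal string alignment variant)
# plus a naive suggestion lookup. The vocabulary below is hypothetical.

def damerau_levenshtein(a: str, b: str) -> int:
    """Edit distance where substitution, insertion, deletion, and
    adjacent transposition each count as a single edit."""
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[m][n]

def suggest(query: str, vocabulary: list[str], max_dist: int = 2) -> list[str]:
    """Return vocabulary terms within max_dist edits, nearest first."""
    scored = sorted((damerau_levenshtein(query, w), w) for w in vocabulary)
    return [w for dist, w in scored if dist <= max_dist]

vocab = ["bismillah", "alhamdulillah", "arrahman", "arrahim"]  # hypothetical
print(damerau_levenshtein("bismillha", "bismillah"))  # 1: one transposition
print(suggest("bismilah", vocab))                     # ['bismillah']
```

A plain Levenshtein implementation would charge two edits for "bismillha" (two substitutions); counting the adjacent swap as a single edit is what lets transposition typos rank as near matches.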


Citations
DOI
01 Jan 2017
TL;DR: This paper explains the role and influence of MUI fatwas on socio-cultural change in Indonesia with reference to 47 MUI fatwas in the socio-cultural field. The role of scholars in translating socio-cultural transformation into fatwas had not previously been discussed.
Abstract: Social change as a result of cultural dynamics often leads to friction in society. Islam, as a universal religion with specific guidance based on the Quran and hadith, requires the role of scholars to translate socio-cultural transformation into the form of fatwas. This research explains the role and influence of MUI fatwas on socio-cultural change in Indonesia with reference to 47 MUI fatwas in the socio-cultural field. A descriptive-qualitative method with in-depth literature study using a socio-historical approach is used to answer the questions above. The results identify eight causes of socio-cultural change in Indonesia, namely ways of thinking, population growth, interaction with other communities and nations, new discoveries, technology, and disasters and conflicts in the community. Although fatwas are not legally binding, some of their influence is quite significant, such as public involvement in the KB (family planning) program, the development of sharia finance, and the correction of the ummah's aqidah regarding the celebration of Christmas. Keywords: Fatwa, MUI, Socio-cultural, Influence.

4 citations

01 Jan 2017
TL;DR: This master’s thesis project investigated whether the existing Damerau-Levenshtein edit-distance measurement between two strings could be made more useful for detecting and adjusting the distance between strings.
Abstract: This master’s thesis project investigated whether the existing Damerau-Levenshtein edit-distance measurement between two strings could be made more useful for detecting and adjusting m ...

4 citations

Journal ArticleDOI
TL;DR: In this article, a study aimed to map popular phonetics by repetition rate in the Qur'an in order to build implicative concepts for Arabic learning by non-native speakers.
Abstract: The advantages of Arabic as the language of the Qur’an lie in the variety of sounds in each word, with derivations in various forms and meanings. This study aimed to map popular phonetics by repetition rate in the Qur'an in order to build implicative concepts for Arabic learning by non-native speakers. The research used a qualitative approach of the literature-research type. As preliminary research, data were obtained through documentary techniques from the Qur'an, and Surah al-Wāqi'ah was taken as the data source because it met the criteria of a popular surah. The analysis used a content-analysis approach through the stages of condensation, presentation, and conclusion drawing. The study found that the popular phonetics showed repetitions of 1,015 short vowels, 246 long vowels, and 1,512 consonants. The results showed that the frequency of repetition of both vowels and consonants in the Qur’an followed a phonetic sequence pattern from front to back, in line with Chomsky’s universal rules and Krashen’s natural order hypothesis. In addition, the results were complemented by hypothetical presuppositions about the implications for Arabic learning by non-native speakers, linguistically, psychologically, and pedagogically.

1 citation

References
31 Dec 1994
TL;DR: An N-gram-based approach to text categorization that is tolerant of textual errors is described; it worked very well for language classification and reasonably well for classifying articles from a number of different computer-oriented newsgroups by subject.
Abstract: Text categorization is a fundamental task in document processing, allowing the automated handling of enormous streams of documents in electronic form. One difficulty in handling some classes of documents is the presence of different kinds of textual errors, such as spelling and grammatical errors in email, and character recognition errors in documents that come through OCR. Text categorization must work reliably on all input, and thus must tolerate some level of these kinds of problems. We describe here an N-gram-based approach to text categorization that is tolerant of textual errors. The system is small, fast and robust. This system worked very well for language classification, achieving in one test a 99.8% correct classification rate on Usenet newsgroup articles written in different languages. The system also worked reasonably well for classifying articles from a number of different computer-oriented newsgroups according to subject, achieving as high as an 80% correct classification rate. There are also several obvious directions for improving the system's classification performance in those cases where it did not do as well. The system is based on calculating and comparing profiles of N-gram frequencies. First, we use the system to compute profiles on training set data that represent the various categories, e.g., language samples or newsgroup content samples. Then the system computes a profile for a particular document that is to be classified. Finally, the system computes a distance measure between the document's profile and each of the category profiles. The system selects the category whose profile has the smallest distance to the document's profile. The profiles involved are quite small, typically 10K bytes for a category training set, and less than 4K bytes for an individual document. Using N-gram frequency profiles provides a simple and reliable way to categorize documents in a wide range of classification tasks.
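To make the profile method concrete, here is a compact sketch of rank-ordered N-gram frequency profiles compared by an out-of-place distance, in the style the abstract describes; the toy training strings and the profile size are assumptions for illustration, not the paper's data or settings.

```python
# Sketch: N-gram frequency profiles compared by an out-of-place measure.
from collections import Counter

def ngram_profile(text: str, n: int = 3, top: int = 300) -> list[str]:
    """Character n-grams of the text, most frequent first."""
    text = f" {text.lower()} "
    counts = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    return [g for g, _ in counts.most_common(top)]

def out_of_place(doc: list[str], cat: list[str]) -> int:
    """Sum of rank displacements; n-grams absent from the category
    profile receive the maximum displacement."""
    rank = {g: i for i, g in enumerate(cat)}
    return sum(abs(i - rank.get(g, len(cat))) for i, g in enumerate(doc))

categories = {  # toy stand-ins for real training samples
    "english": ngram_profile("the quick brown fox jumps over the lazy dog"),
    "indonesian": ngram_profile("pencarian ayat dengan pelafalan bahasa indonesia"),
}
doc = ngram_profile("pengguna mencari ayat dengan ejaan bahasa latin")
print(min(categories, key=lambda c: out_of_place(doc, categories[c])))
# expected: indonesian
```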

1,826 citations


"Typo handling in searching of Quran..." refers background in this paper

  • ...The advantage of using N-Gram in string matching is that if an error occurs in some strings it tends not to affect other strings because of its characteristics which divide the string into small parts [15]....

    [...]
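The excerpt above can be shown in miniature with trigrams: a single typo corrupts only the few trigrams that span it, so most of the string still matches. A toy demonstration, using hypothetical transliterated words:

```python
# Sketch: one dropped letter disturbs only a handful of trigrams.

def trigrams(s: str) -> set[str]:
    return {s[i:i + 3] for i in range(len(s) - 2)}

def dice(a: set[str], b: set[str]) -> float:
    return 2 * len(a & b) / (len(a) + len(b))

correct, typo = "alhamdulillah", "alhamdulilah"   # one 'l' dropped
print(dice(trigrams(correct), trigrams(typo)))    # ~0.86, still a strong match
```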

Book
30 Apr 2009
Abstract: 1. iMuslims and Cyber Islamic Environments 2. Accessing Cyber Islamic Environments 3. De-coding the Sacred: Islamic Source Code 4. The Islamic Blogosphere 5. The Cutting-edge: Militaristic Jihad in Cyberspace 6. Digital Jihadi Battlefields: Iraq and Palestine 7. The Transformation of Cyber Islamic Environments

164 citations

Journal ArticleDOI
TL;DR: The new problem of code design in the Damerau metric is introduced, motivated by applications in DNA-based storage, and constructions for joint block deletion and adjacent block transposition error-correcting codes are provided.
Abstract: Motivated by applications in DNA-based storage, we introduce the new problem of code design in the Damerau metric. The Damerau metric is a generalization of the Levenshtein distance which, in addition to deletions, insertions, and substitution errors, also accounts for adjacent transposition edits. We first provide constructions for codes that may correct either a single deletion or a single adjacent transposition and then proceed to extend these results to codes that can simultaneously correct a single deletion and multiple adjacent transpositions. We conclude with constructions for joint block deletion and adjacent block transposition error-correcting codes. Parts of the results were presented at the International Symposium on Information Theory in Barcelona, 2016.

53 citations


"Typo handling in searching of Quran..." refers methods in this paper

  • ...Damerau developed an edit-distance algorithm that made it possible to calculate the editing of transposition between two characters [23]....

    [...]

Journal ArticleDOI
01 Jul 2019
TL;DR: A chemical reaction optimization technique is proposed to solve the longest common subsequence problem for multiple instances in less execution time, and is compared with hyper-heuristic, ant colony optimization, beam ant colony optimization, and memory-bound anytime algorithms.
Abstract: Longest common subsequence (LCS) is a well-known NP-hard optimization problem that finds the longest subsequence shared by every member of a given set of strings. In computational biology, sequence alignment is a fundamental technique for measuring the similarity of biological sequences, such as DNA and genome sequences. High sequence similarity often corresponds to structural as well as functional similarity and can be used to determine whether (and how) sequences are related. Finding the longest common subsequence is one way to measure the similarity of sequences. It also has applications in data compression, FPGA circuit minimization, bioinformatics, etc. Exact algorithms are impractical since they fail to solve this problem for multiple instances of long lengths in polynomial time. Several approximation, heuristic, and metaheuristic methods have been proposed to solve the problem. Chemical reaction optimization (CRO) is a new metaheuristic method that maps the nature of chemical reactions onto optimization problems. In this paper, we propose a chemical reaction optimization technique to solve the longest common subsequence problem for multiple instances. We redesign the four elementary operators of CRO for the LCS problem. The operators of the CRO algorithm are used to explore the search space both locally and globally. A novel correction method has been designed to repair solutions; it runs after each search operator to ensure the validity of the changes made by the operators. Both solution quality and execution time are considered in designing the operators and the correction method. The proposed system thus brings robustness, efficiency, and effectiveness to solving the MLCS problem. Our approach is compared with hyper-heuristic, ant colony optimization, beam ant colony optimization, and memory-bound anytime algorithms. The experimental results, in terms of the lengths of the returned common sequences, show that the proposed algorithm gives the same or better results than all the other algorithms in less execution time.
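For orientation alongside this abstract: the two-string LCS case is solvable by classic dynamic programming, sketched below. The paper's CRO metaheuristic targets the multi-string (MLCS) case, which this sketch does not attempt.

```python
# Sketch: textbook dynamic-programming LCS for two strings.

def lcs(a: str, b: str) -> str:
    m, n = len(a), len(b)
    # dp[i][j] holds an LCS of the prefixes a[:i] and b[:j]
    dp = [[""] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + a[i - 1]
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1], key=len)
    return dp[m][n]

print(lcs("AGGTAB", "GXTXAYB"))  # GTAB
```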

22 citations


Additional excerpts

  • ...Besides being used in computer science, it can also be used in biology to sort DNA or RNA [21, 22]....

    [...]


Journal ArticleDOI
TL;DR: A new hybrid algorithm combining character n-gram and neural network methodologies is developed, and it is concluded that Google could be used as a pre-processed spelling correction method.
Abstract: Highlights: We used the character n-gram method to predict topic changes in search engine queries. We obtained more successful estimations than previous studies and made remarkable contributions. We compared the character n-gram method with the Levenshtein edit-distance method. We analyzed the ASPELL, Google, and Bing search engines as pre-processing spelling-correction methods. We conclude that Google could be used as a pre-processing spelling-correction method. The widespread availability of the Internet and the variety of Internet-based applications have resulted in a significant increase in the number of web pages. Determining the behaviors of search engine users has become a critical step in enhancing search engine performance. Search engine user behaviors can be determined by content-based or content-ignorant algorithms. Although many content-ignorant studies have been performed to automatically identify new topics, previous results have demonstrated that spelling errors can cause significant errors in topic shift estimates. In this study, we focused on minimizing the number of wrong estimates caused by spelling errors. We developed a new hybrid algorithm combining character n-gram and neural network methodologies, and compared the experimental results with results from previous studies. For the FAST and Excite datasets, the proposed algorithm improved topic shift estimates by 6.987% and 2.639%, respectively. Moreover, we analyzed the performance of the character n-gram method in several respects, including a comparison with the Levenshtein edit-distance method. The experimental results demonstrated that the character n-gram method outperformed the Levenshtein edit-distance method in terms of topic identification.
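As a companion to the comparison in this abstract, here is a minimal sketch of character-n-gram spelling correction used as a query pre-processing step; the padded-bigram and Jaccard choices and the tiny dictionary are illustrative assumptions, not the paper's configuration.

```python
# Sketch: pick the dictionary word whose character bigrams best overlap
# the (possibly misspelled) query word.

def bigrams(s: str) -> set[str]:
    s = f" {s} "  # pad so word boundaries contribute bigrams too
    return {s[i:i + 2] for i in range(len(s) - 1)}

def jaccard(a: set[str], b: set[str]) -> float:
    return len(a & b) / len(a | b)

def correct(word: str, dictionary: list[str]) -> str:
    return max(dictionary, key=lambda w: jaccard(bigrams(word), bigrams(w)))

print(correct("levensthein", ["levenshtein", "damerau", "distance"]))
# -> levenshtein
```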

21 citations


"Typo handling in searching of Quran..." refers methods in this paper

  • ...In addition to being able to predict words, the N-Gram method can also handle spelling errors on queries by detecting the spelling of characters [13]....

    [...]