Journal ArticleDOI

Improving Document Retrieval with Spelling Correction for Weak and Fabricated Indonesian-Translated Hadith

20 Jun 2020-Vol. 4, Iss: 3, pp 551-557
TL;DR: Spelling correction improves retrieval accuracy, which makes the hadith retrieval system more feasible and worth implementing in future work, because it corrects the typos that are common in Indonesian-translated hadiths in raw text from the Web and social media.
Abstract: Hadiths have several levels of authenticity, among which are weak (dhaif) and fabricated (maudhu) hadiths that may not originate from the Prophet Muhammad PBUH and thus should not be considered when deriving Islamic law (sharia). However, ordinary Muslims commonly mistake many such hadiths for authentic ones. To make such hadiths easier to identify, this paper proposes a method that checks the authenticity of a hadith by comparing it with a collection of fabricated hadiths in Indonesian. The proposed method applies the vector space model and performs spelling correction using SymSpell to test whether spell checking improves the accuracy of hadith retrieval; this had not been done in previous work, and typos are common in Indonesian-translated hadiths in raw text from the Web and social media. The experimental results show that spell checking raises mean average precision from 73% to 81% and recall from 80% to 89%. This improvement makes the hadith retrieval system more feasible and worth implementing in future work, because spelling correction fixes the typos that are common in raw text on the Internet.
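
The pipeline described in the abstract (correct the query's spelling, then rank documents with a vector space model) can be illustrated with a minimal sketch. The paper uses SymSpell; the sketch below swaps in a brute-force Levenshtein lookup against the collection vocabulary, which is not SymSpell's algorithm. The function names, the `log(N/df) + 1` weighting, the `max_dist` cutoff, and the placeholder documents in `DOCS` are all assumptions made for illustration, not details taken from the paper.

```python
import math
from collections import Counter

# Made-up placeholder documents (not hadith text), for illustration only.
DOCS = [
    "the river flows past the old mill",
    "farmers harvest rice in the valley",
    "the market sells rice and fish",
]

def tokenize(text):
    return text.lower().split()

def edit_distance(a, b):
    # Classic dynamic-programming Levenshtein distance.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def correct(token, vocab, max_dist=2):
    # Stand-in for SymSpell: pick the closest vocabulary word by brute force.
    if token in vocab:
        return token
    best = min(vocab, key=lambda w: (edit_distance(token, w), w))
    return best if edit_distance(token, best) <= max_dist else token

def tfidf_vectors(docs):
    tokenized = [tokenize(d) for d in docs]
    df = Counter(t for toks in tokenized for t in set(toks))
    # Assumed weighting: raw term count times (log(N/df) + 1).
    idf = {t: math.log(len(docs) / df[t]) + 1.0 for t in df}
    vecs = [{t: c * idf[t] for t, c in Counter(toks).items()} for toks in tokenized]
    return vecs, idf

def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def search(query, docs, spell=True):
    # Returns (score, document index) pairs, best match first.
    vecs, idf = tfidf_vectors(docs)
    tokens = tokenize(query)
    if spell:
        tokens = [correct(t, idf) for t in tokens]
    qvec = {t: c * idf[t] for t, c in Counter(tokens).items() if t in idf}
    scores = [(cosine(qvec, v), i) for i, v in enumerate(vecs)]
    return sorted(scores, key=lambda s: (-s[0], s[1]))
```

Calling `search("farmrs harvst rice", DOCS)` ranks the second document first, and its score is higher than with `spell=False`, because the misspelled tokens would otherwise match nothing in the vocabulary and drop out of the query vector.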


Citations
Journal ArticleDOI
16 Feb 2016
TL;DR: One strength of an institution or service unit is how quickly and accurately it handles customer complaints. Complaints submitted by customers generally resemble earlier complaints, so the solution to a new complaint can be based on the solution given to an older one.
Abstract: One strength of an institution or service unit is how quickly and accurately it handles customer complaints. Complaints submitted by customers generally resemble earlier complaints, so the solution to a new complaint can be based on the solution given to an older one. The Vector Space Model (VSM) is one model used to measure document similarity, and it is used here to generate FAQs automatically. Terms are weighted with Term Frequency-Inverse Document Frequency (TF-IDF). The TF-IDF notation combinations compared are TF-IDF itself, a logarithmic modification of TF, and a logarithmic modification of IDF. The similarity measure is cosine similarity. The study finds that the VSM algorithm with TF-IDF weighting can be used to generate automatic FAQs and relevant solutions. Based on the accuracy computed in each experiment, at a threshold of 0.5 the combination with the highest average accuracy and precision is the first modification, at 62.09% and 55.15% respectively. At a threshold of 0.65, the highest average accuracy and precision come from TF-IDF, at 83.18% and 68.35% respectively. In addition, an experiment with 171 data items, TF-IDF, and a threshold of 0.65 generated 27 FAQs, of which 70.37% were relevant.
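
The comparison of weighting variants under a cosine-similarity threshold can be sketched as follows. The abstract does not give the exact formulas, so the three schemes below are common textbook forms chosen as assumptions, and the function names and sample complaints in `OLD` are hypothetical.

```python
import math
from collections import Counter

# Hypothetical past complaints, for illustration only.
OLD = [
    "printer does not print",
    "cannot login to email account",
    "email account is locked",
]

def weight(tf, df, n_docs, scheme="tfidf"):
    # Assumed formulas; the abstract names the variants but not their definitions.
    if scheme == "tfidf":
        return tf * math.log10(n_docs / df)
    if scheme == "log_tf":
        return (1 + math.log10(tf)) * math.log10(n_docs / df)
    if scheme == "log_idf":
        return tf * math.log10(1 + n_docs / df)
    raise ValueError(f"unknown scheme: {scheme}")

def vectorize(tokens, df, n_docs, scheme):
    # Terms unseen in the collection are dropped from the vector.
    return {t: weight(c, df[t], n_docs, scheme)
            for t, c in Counter(tokens).items() if t in df}

def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def match(new_complaint, old_complaints, threshold=0.65, scheme="tfidf"):
    # Returns (index, score) for every old complaint at or above the threshold.
    tokenized = [c.lower().split() for c in old_complaints]
    n_docs = len(tokenized)
    df = Counter(t for toks in tokenized for t in set(toks))
    query = vectorize(new_complaint.lower().split(), df, n_docs, scheme)
    hits = []
    for i, toks in enumerate(tokenized):
        score = cosine(query, vectorize(toks, df, n_docs, scheme))
        if score >= threshold:
            hits.append((i, score))
    return hits
```

With these definitions, `match("printer does not print pages", OLD)` returns only the first complaint, whose stored solution could then be reused for the new one; raising or lowering `threshold` trades precision against the number of matches.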

4 citations

Journal ArticleDOI
21 Apr 2017
TL;DR: Such hadiths can have negative effects, including causing and deepening divisions among Muslims, defaming the person of the Prophet (saw), obscuring the understanding of Islam, and weakening the spirit and zeal of Islamic faith.
Abstract: Everything attributed to the Messenger of Allah (Saw) is a source of teaching, an example, and a value of great worth to Muslims. Research by hadith scholars has shown that some hadiths are unfit to serve as a source of teaching because they do not meet the established criteria; such hadiths are called maudhu (fabricated). Such hadiths can have negative effects, including causing and deepening divisions among Muslims, defaming the person of the Prophet (saw), obscuring the understanding of Islam, and weakening the spirit and zeal of Islamic faith.

3 citations

20 Mar 2020
TL;DR: This study applies the TF-IDF Vector Space Model (VSM) to information retrieval over the Indonesian translation of the Quran, supported by a synonym corpus (thesaurus), with extensions to keyword weighting and query processing.
Abstract: Information Retrieval (IR) is the search for information, usually within text documents. This study addresses IR over the Indonesian translation of the Quran. A synonym corpus (thesaurus) is built to support information retrieval so that search results become broader. The method used is the TF-IDF Vector Space Model (VSM) with extensions to keyword weighting and query processing: the result ranked first in the search output is used as the query for the next search. Cosine similarity is used to compute document similarity. The synonym corpus (thesaurus) database is built by developing a system that can construct it automatically. Testing consisted of searching for Quran verses in the information retrieval application and comparing the application's results with the opinions of experts on the Quran and Hadith. The success rate for single-word queries reached 100%. For queries of more than one word or a full sentence, the success rate within the top 10 retrieved documents reached 95.6%. This study shows that information retrieval using a synonym corpus (thesaurus), together with added weight for the first keyword searched, increases relevance, because it significantly broadens the search results and eliminates irrelevant documents.
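
The two query-side ideas in this abstract, synonym expansion and reuse of the top-ranked result as the next query, can be sketched as below. To keep the example short, a simple set-overlap (Jaccard) score stands in for the TF-IDF cosine similarity the study uses. The function names, the way first-pass and second-pass results are merged, and the sample documents and thesaurus are assumptions for illustration.

```python
# Made-up placeholder documents and thesaurus, for illustration only.
DOCS = [
    "rain brings water to the fields",
    "floods damage the fields and roads",
    "roads are closed after floods",
]
THESAURUS = {"downpour": ["rain"]}

def expand(tokens, thesaurus):
    # Append each token's synonyms, keeping order and avoiding duplicates.
    out = list(tokens)
    for t in tokens:
        for s in thesaurus.get(t, []):
            if s not in out:
                out.append(s)
    return out

def overlap_score(query_tokens, doc_tokens):
    # Jaccard overlap: a simplified stand-in for TF-IDF cosine similarity.
    q, d = set(query_tokens), set(doc_tokens)
    return len(q & d) / len(q | d) if q | d else 0.0

def rank(query_tokens, docs):
    # Indices of documents with a non-zero score, best first.
    scores = [(overlap_score(query_tokens, d.lower().split()), i)
              for i, d in enumerate(docs)]
    return [i for s, i in sorted(scores, key=lambda p: (-p[0], p[1])) if s > 0]

def search_with_feedback(query, docs, thesaurus):
    first = rank(expand(query.lower().split(), thesaurus), docs)
    if not first:
        return []
    # Second pass: the top-ranked document itself becomes the query.
    second = rank(docs[first[0]].lower().split(), docs)
    # Assumed merge: keep first-pass order, then append new second-pass hits.
    return first + [i for i in second if i not in first]
```

Here `search_with_feedback("downpour", DOCS, THESAURUS)` finds the first document only through the synonym "rain", and the second pass then surfaces the related second document; with an empty thesaurus the same query returns nothing.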

1 citation

References
Journal ArticleDOI
01 Sep 2017
TL;DR: In this article, the proposed term weighting schemes take the number of missing terms into account when calculating the weight of existing terms, and they show the highest performance for an SVM classifier, with a micro-average F1 classification performance value of 97%.
Abstract: With the rapid growth of textual content on the Internet, automatic text categorization is a comparatively more effective solution in information organization and knowledge management. Feature selection, one of the basic phases in statistical-based text categorization, crucially depends on the term weighting methods. In order to improve the performance of text categorization, this paper proposes four modified frequency-based term weighting schemes, namely mTF, mTFIDF, TFmIDF, and mTFmIDF. The proposed term weighting schemes take the number of missing terms into account when calculating the weight of existing terms. The proposed schemes show the highest performance for an SVM classifier, with a micro-average F1 classification performance value of 97%. Moreover, benchmarking results on the Reuters-21578, 20Newsgroups, and WebKB text-classification datasets, using different classifying algorithms such as SVM and KNN, show that the proposed schemes mTF, mTFIDF, and mTFmIDF outperform other weighting schemes such as TF, TFIDF, and Entropy. Additionally, the statistical significance tests show a significant enhancement of the classification performance based on the modified schemes.

87 citations

Journal ArticleDOI
TL;DR: All Hadith-relevant methods and algorithms from the literature are discussed and analyzed in terms of functionality, simplicity, F-score, and accuracy, and the evaluation reveals that neural networks classify the Hadith with 94% accuracy.
Abstract: Hadiths are important textual sources of law, tradition, and teaching in the Islamic world. Analyzing the unique linguistic features of Hadiths (e.g. ancient Arabic language and story-like text) requires compiling and using specific natural language processing methods. In the literature, no study is solely focused on Hadith from an artificial intelligence perspective, while many new developments have been overlooked and need to be highlighted. Therefore, this review analyzes all academic journal and conference publications that use two main methods of artificial intelligence for Hadith text: Hadith classification and mining. All Hadith-relevant methods and algorithms from the literature are discussed and analyzed in terms of functionality, simplicity, F-score, and accuracy. The use of different Hadith datasets makes a direct comparison between the evaluation results impossible. Therefore, we have re-implemented and evaluated the methods using a single dataset (i.e. 3150 Hadiths from the Sahih Al-Bukhari book). The evaluation of the classification methods reveals that neural networks classify the Hadith with 94% accuracy. This is because neural networks are capable of handling complex (high-dimensional) input data. The Hadith mining method that combines the vector space model, cosine similarity, and enriched queries obtains the best accuracy (i.e. 88%) among the re-evaluated Hadith mining methods. The most important aspect of Hadith mining methods is query expansion, since the query must be fitted to the Hadith lingo. The lack of knowledge-based methods is evident in Hadith classification and mining approaches, and this gap can be covered in future work using knowledge graphs.

57 citations


"Improving Document Retrieval with S..." refers methods in this paper

  • ...Therefore, the vector space model was chosen because the dataset used is Indonesian translated and is the best method for conducting retrieval documents [10]....

    [...]

01 Jan 2011
TL;DR: A review of Artificial Intelligence and Corpus Linguistics research at Leeds University on Arabic and the Quran and a proposal for further research: the Quranic Knowledge Map.
Abstract: We review a range of Artificial Intelligence and Corpus Linguistics research at Leeds University on Arabic and the Quran, which has produced a range of software and corpus datasets for research on Modern Standard Arabic and, more recently, Quranic Arabic. Our work on Quranic Arabic corpus linguistics has attracted widespread interest, not only from Arabic linguists but also from Quranic students and the general public. We see great potential impact in Artificial Intelligence modelling of the Quran. This leads us to present a proposal for further research: the Quranic Knowledge Map.

53 citations


"Improving Document Retrieval with S..." refers methods in this paper

  • ...In recent years, the implementation of natural language processing in the Qur'an and hadith are used to conduct information retrieval [5]....

    [...]

01 Jan 2010
TL;DR: This study examines knowledge discovery from AL-Hadith through a classification algorithm in order to classify AL-Hadith into one of a set of predefined classes (books), where AL-Hadith is the saying of Prophet Mohammed and the second religious source for all Muslims.
Abstract: Machine Learning and Data Mining are applied to language datasets in order to discover patterns for English and other European languages. The Arabic language belongs to the Semitic family of languages, which differs from European languages in syntax, semantics, and morphology. One of the difficulties of the Arabic language is that it has a complex morphological structure and orthographic variations. This study examines knowledge discovery from AL-Hadith through a classification algorithm in order to classify AL-Hadith into one of a set of predefined classes (books), where AL-Hadith is the saying of Prophet Mohammed (Peace and blessings of Allah be upon him (PBUH)) and the second religious source for all Muslims. Because of its importance for Muslims all over the world, knowledge discovery from AL-Hadith will make AL-Hadith more understandable for both Muslims and non-Muslims.

38 citations


"Improving Document Retrieval with S..." refers methods in this paper

  • ...[8] researched the classification of texts that existed in 1321 Sahih al Bukhari hadith by classifying classes that have the same topic then the testing is done by comparing with the similarity coefficient table and generate an accuracy of 73%....

    [...]

Journal ArticleDOI
TL;DR: Results and analysis show that the LSI technique outperformed the exact frequency-based technique despite its longer processing time during indexing.

17 citations


Additional excerpts

  • ...[7] conduct research on information retrieval in the form of halal product queries from Malay language documents using the latent semantic indexing method and generate an accuracy value of 86%....

    [...]