Improving Document Retrieval with Spelling Correction for Weak and Fabricated Indonesian-Translated Hadith

doi:10.29207/RESTI.V4I3.1913

Journal Article

Muhammad Zaky Ramadhan¹, Kemas Muslim Lhaksmana¹

Telkom University¹

20 Jun 2020-Vol. 4, Iss: 3, pp 551-557

TL;DR: The improvement in accuracy from implementing spelling correction makes the hadith retrieval system more feasible, and spelling correction is recommended for future work because it can correct typos that are common in Indonesian-translated hadiths on the Web and in raw social media text.

Abstract: Hadith has several levels of authenticity, among which are weak (dhaif) and fabricated (maudhu) hadiths, which may not originate from the Prophet Muhammad PBUH and thus should not be used in deriving Islamic law (sharia). However, such hadiths are commonly mistaken for authentic hadiths among ordinary Muslims. To make such hadiths easy to identify, this paper proposes a method to check the authenticity of a hadith by comparing it against a collection of fabricated hadiths in Indonesian. The proposed method applies the vector space model and also performs spelling correction using SymSpell, to test whether spelling correction improves the accuracy of hadith retrieval. This has not been done in previous works, and typos are common in Indonesian-translated hadiths on the Web and in raw social media text. The experimental results show that spelling correction improves mean average precision from 73% to 81% and recall from 80% to 89%. Therefore, implementing spelling correction makes the hadith retrieval system more feasible and is recommended for future work, because it can correct typos that are common in raw text on the Internet.
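SymSpell is built on the symmetric delete idea. Instead of generating every insertion, substitution, and transposition of a misspelled word at lookup time, it precomputes deletions of each dictionary word and matches them against deletions of the input. The following plain-Python sketch illustrates that idea under stated assumptions. The toy Indonesian word-frequency dictionary and the `SymmetricDeleteSpeller` class are hypothetical, the maximum edit distance is 2, and ties are broken by word frequency. It is not the symspellpy API and does not reproduce the paper's dictionary or settings.

```python
from __future__ import annotations


def deletes(word: str, max_distance: int) -> set[str]:
    """Return `word` plus every string obtained by deleting up to `max_distance` characters."""
    results = {word}
    frontier = {word}
    for _ in range(max_distance):
        next_frontier = set()
        for w in frontier:
            for i in range(len(w)):
                next_frontier.add(w[:i] + w[i + 1:])
        results |= next_frontier
        frontier = next_frontier
    return results


def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance: Levenshtein plus adjacent transpositions."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent transposition
    return d[len(a)][len(b)]


class SymmetricDeleteSpeller:
    """Toy spelling corrector built on the symmetric-delete idea (hypothetical class)."""

    def __init__(self, frequencies: dict[str, int], max_distance: int = 2):
        self.frequencies = frequencies
        self.max_distance = max_distance
        # Offline step: map every delete variant back to the dictionary words that produce it.
        self.index: dict[str, set[str]] = {}
        for word in frequencies:
            for variant in deletes(word, max_distance):
                self.index.setdefault(variant, set()).add(word)

    def correct(self, token: str) -> str:
        """Return the closest dictionary word (smaller distance first, then higher frequency)."""
        if token in self.frequencies:
            return token
        candidates: set[str] = set()
        # Online step: deletes of the input meet the precomputed deletes of dictionary words.
        for variant in deletes(token, self.max_distance):
            candidates |= self.index.get(variant, set())
        best_key, best_word = None, token
        for word in candidates:
            distance = osa_distance(token, word)
            if distance > self.max_distance:
                continue  # shared deletes can overshoot, so verify the real distance
            key = (distance, -self.frequencies[word], word)
            if best_key is None or key < best_key:
                best_key, best_word = key, word
        return best_word  # unchanged token if no candidate is close enough


if __name__ == "__main__":
    speller = SymmetricDeleteSpeller({"hadis": 50, "palsu": 30, "lemah": 20, "nabi": 40})
    print([speller.correct(t) for t in ["hadsi", "plasu", "lemha", "nabi"]])
```

Here the typos "hadsi", "plasu", and "lemha" map back to "hadis", "palsu", and "lemah", and a correctly spelled word is returned unchanged. The expensive work (generating deletes) happens once per dictionary word, so each lookup only has to generate deletes of the query token. That is the property that makes SymSpell fast enough to run over raw Web and social media text.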

Citations

Journal Article

Implementation of the Vector Space Model for Automatically Generating Frequently Asked Questions and Relevant Solutions to Customer Complaints

Abdul Aziz¹, Ristu Saptono², Kartika Permatasari Suryajaya²

State University of Semarang¹, Sebelas Maret University²

16 Feb 2016

TL;DR: One of the strengths of a service institution or unit is how quickly and accurately it handles customer complaints. Complaints submitted by customers generally resemble earlier complaints, so the solution to a new complaint can be based on the solutions given to old complaints.

Abstract: One of the strengths of a service institution or unit is how quickly and accurately it handles customer complaints. Complaints submitted by customers generally resemble earlier complaints, so the solution to a new complaint can be based on the solutions given to old complaints. The Vector Space Model (VSM) is one of the models used to measure document similarity, and it is used here to generate FAQs automatically. Terms are weighted with the Term Frequency-Inverse Document Frequency (TF-IDF) technique. The TF-IDF variants compared are plain TF-IDF, a logarithmic modification of TF, and a logarithmic modification of IDF. The similarity measure is cosine similarity. The results show that the VSM algorithm with TF-IDF weighting can be used to generate FAQs and relevant solutions automatically. Based on the accuracy computed in each experiment, at a threshold of 0.5 the TF-IDF variant with the highest average accuracy and precision is the first modification, at 62.09% and 55.15% respectively. At a threshold of 0.65, the variant with the highest average accuracy and precision is plain TF-IDF, at 83.18% and 68.35% respectively. In addition, an experiment using 171 records, TF-IDF, and a threshold of 0.65 generated 27 FAQs, of which 70.37% were relevant.
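The abstract describes weighting terms with TF-IDF, scoring past complaints by cosine similarity to a new one, and keeping matches above a threshold. The following minimal sketch of that pipeline makes several assumptions. The text is already tokenized, idf is taken as log(N / df), and the logarithmic TF variant is assumed to be 1 + log(tf). The exact modified formulas used in the paper are not given in the abstract, and the log-IDF variant is not reproduced. The complaint data and function names are hypothetical.

```python
from __future__ import annotations

import math
from collections import Counter


def weight_tf(count: int, log_tf: bool) -> float:
    """Raw term frequency, or the assumed logarithmic variant 1 + log(tf)."""
    return 1.0 + math.log(count) if log_tf else float(count)


def build_index(docs: list[list[str]], log_tf: bool = False):
    """Return TF-IDF vectors for `docs` and the idf table, with idf = log(N / df)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    idf = {term: math.log(n / count) for term, count in df.items()}
    vectors = [
        {term: weight_tf(c, log_tf) * idf[term] for term, c in Counter(doc).items()}
        for doc in docs
    ]
    return vectors, idf


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def similar_complaints(query: list[str], docs: list[list[str]],
                       threshold: float = 0.5, log_tf: bool = False) -> list[tuple[int, float]]:
    """Indices and scores of past complaints whose cosine similarity is >= threshold."""
    vectors, idf = build_index(docs, log_tf)
    # Query terms unseen in the collection have no idf weight and are dropped.
    q_counts = Counter(term for term in query if term in idf)
    q_vec = {term: weight_tf(c, log_tf) * idf[term] for term, c in q_counts.items()}
    scored = [(i, cosine(q_vec, vec)) for i, vec in enumerate(vectors)]
    hits = [(i, s) for i, s in scored if s >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    past = [["login", "gagal", "password"],
            ["pembayaran", "gagal", "bank"],
            ["password", "lupa", "reset"]]
    print(similar_complaints(["lupa", "password"], past, threshold=0.5))
```

With a threshold of 0.5 only the third complaint (about a forgotten password) is returned. Lowering the threshold also admits the first complaint, which shares just one term. This matches the trade-off the abstract reports between the 0.5 and 0.65 thresholds: a higher threshold keeps fewer, closer matches.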

4 citations

Journal Article

Maudhu (Fabricated) Hadith and Its Consequences

Rabiatul Aslamiah

21 Apr 2017

TL;DR: Such hadiths can have negative effects, including creating and deepening divisions within the Muslim community, defaming the person of the Prophet (PBUH), obscuring the understanding of Islam, and weakening the Islamic spirit and zeal.

Abstract: Everything attributed to the Messenger of Allah (PBUH) is a source of teaching, a model to follow, and a highly valued reference for Muslims. Research by hadith scholars has found that some hadiths are not fit to serve as a source of teaching because they do not meet the established criteria; these are called maudhu (fabricated) hadiths. Such hadiths can have negative effects, including creating and deepening divisions within the Muslim community, defaming the person of the Prophet (PBUH), obscuring the understanding of Islam, and weakening the Islamic spirit and zeal.

3 citations

Application of the TF-IDF Vector Space Model (VSM) Algorithm to Information Retrieval on the Indonesian Translation of the Qur'an, Surah 1 to Surah 16, Based on Similarity of Meaning

Irfan Humaini, Lily Wulandari, Diana Ikasari, Tristyanti Yusnitasari

20 Mar 2020

TL;DR: In this article, the TF-IDF Vector Space Model (VSM) is applied to information retrieval over the Indonesian translation of the Qur'an, supported by a synonym corpus (thesaurus) and extended with keyword weighting and query processing.

Abstract: Information Retrieval (IR) is the search for information, usually within text documents. This study addresses IR over the Indonesian translation of the Qur'an. A synonym corpus (thesaurus) was built to support information retrieval so that search results are broader. The method used is the TF-IDF Vector Space Model (VSM), extended with keyword weighting and query processing: the top-ranked result of a query is used as the query for the next search. Cosine similarity is used to compute document similarity. The synonym corpus (thesaurus) database is built automatically by a purpose-built system. Testing was done by searching for Qur'anic verses in the information retrieval application and comparing the application's results with the opinions of Qur'an and Hadith experts. The success rate for single-word queries reached 100%. For queries of more than one word or a full sentence, the success rate within the top 10 ranked documents reached 95.6%. This study shows that information retrieval using a synonym corpus (thesaurus), together with extra weight on the first keyword searched, increases relevance because it significantly broadens the search results and eliminates irrelevant documents.
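The retrieval loop described above can be sketched as follows. It combines thesaurus-based query expansion, extra weight on the first keyword, and reuse of the top-ranked result as the next query. The thesaurus entries, the boost value of 2.0, and the use of raw term-frequency vectors instead of full TF-IDF weights are all simplifying assumptions for illustration, not the authors' settings. The function names are hypothetical.

```python
from __future__ import annotations

import math
from collections import Counter


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def expand_query(terms: list[str], thesaurus: dict[str, list[str]],
                 keyword_boost: float = 2.0) -> dict[str, float]:
    """Weight query terms: the first keyword gets `keyword_boost`, synonyms are added with weight 1."""
    weights: dict[str, float] = {}
    for position, term in enumerate(terms):
        boost = keyword_boost if position == 0 else 1.0
        weights[term] = weights.get(term, 0.0) + boost
        for synonym in thesaurus.get(term, []):
            weights[synonym] = weights.get(synonym, 0.0) + 1.0
    return weights


def rank(query: dict[str, float], docs: list[list[str]]) -> list[tuple[int, float]]:
    """Score every document by cosine similarity between the query and its term counts."""
    scores = [(i, cosine(query, dict(Counter(doc)))) for i, doc in enumerate(docs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


def two_pass_search(terms: list[str], docs: list[list[str]],
                    thesaurus: dict[str, list[str]], top_k: int = 3) -> list[tuple[int, float]]:
    """First pass with the expanded query; second pass uses the top-ranked document as the query."""
    first = rank(expand_query(terms, thesaurus), docs)
    best_doc = docs[first[0][0]]
    second = rank(expand_query(best_doc, thesaurus), docs)
    return second[:top_k]


if __name__ == "__main__":
    verses = [["sabar", "ujian", "pahala"], ["salat", "waktu", "wajib"],
              ["tabah", "cobaan", "pahala"], ["zakat", "harta"]]
    synonyms = {"sabar": ["tabah"], "ujian": ["cobaan"]}
    print(two_pass_search(["sabar"], verses, synonyms))
```

In this toy run the third document shares no word with the query "sabar" (patience), yet it ranks second after the second pass because its words "tabah" and "cobaan" are listed as synonyms of words in the top-ranked document. This is the broadening effect the abstract attributes to the thesaurus.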

1 citation

References

Journal Article

Modified frequency-based term weighting schemes for text classification

Thabit Sabbah¹, Thabit Sabbah², Ali Selamat³, Ali Selamat¹, Hafiz Selamat¹, Fawaz S. Al-Anzi⁴, Enrique Herrera Viedma⁵, Enrique Herrera Viedma⁶, Ondrej Krejcar³, Hamido Fujita⁷

Universiti Teknologi Malaysia¹, Al-Quds Open University², University of Hradec Králové³, Kuwait University⁴, King Abdulaziz University⁵, University of Granada⁶, Iwate Prefectural University⁷

01 Sep 2017

TL;DR: In this article, the proposed term weighting schemes take the number of missing terms into account when calculating the weight of existing terms, and the proposed schemes show the highest performance for an SVM classifier, with a micro-averaged F1 score of 97%.

Abstract: With the rapid growth of textual content on the Internet, automatic text categorization is a comparatively more effective solution for information organization and knowledge management. Feature selection, one of the basic phases in statistics-based text categorization, depends crucially on the term weighting method. To improve the performance of text categorization, this paper proposes four modified frequency-based term weighting schemes, namely mTF, mTFIDF, TFmIDF, and mTFmIDF. The proposed term weighting schemes take the number of missing terms into account when calculating the weight of existing terms. The proposed schemes show the highest performance for an SVM classifier, with a micro-averaged F1 score of 97%. Moreover, benchmarking results on the Reuters-21578, 20Newsgroups, and WebKB text-classification datasets, using different classification algorithms such as SVM and KNN, show that the proposed schemes mTF, mTFIDF, and mTFmIDF outperform other weighting schemes such as TF, TFIDF, and Entropy. Additionally, statistical significance tests show a significant improvement in classification performance with the modified schemes.

87 citations

Journal Article

Hadith data mining and classification: a comparative analysis

Mohammad Arshi Saloot¹, Norisma Idris¹, Rohana Mahmud¹, Salinah Jaafar¹, Dirk Thorleuchter, Abdullah Gani¹

University of Malaya¹

01 Jun 2016-Artificial Intelligence Review

TL;DR: All Hadith-related methods and algorithms in the literature are discussed and analyzed in terms of functionality, simplicity, F-score, and accuracy, and the evaluation reveals that neural networks classify the Hadith with 94% accuracy.

Abstract: Hadiths are important textual sources of law, tradition, and teaching in the Islamic world. The unique linguistic features of Hadiths (e.g. ancient Arabic language and story-like text) call for specific natural language processing methods. In the literature, no study focuses solely on Hadith from an artificial intelligence perspective, and many new developments have been overlooked and need to be highlighted. Therefore, this review analyzes all academic journal and conference publications that apply the two main artificial intelligence approaches to Hadith text: Hadith classification and Hadith mining. All Hadith-related methods and algorithms in the literature are discussed and analyzed in terms of functionality, simplicity, F-score, and accuracy. Because the studies use different Hadith datasets, a direct comparison of their reported results is impossible. Therefore, we re-implemented and evaluated the methods on a single dataset (i.e. 3150 Hadiths from the Sahih Al-Bukhari book). The evaluation of the classification methods reveals that neural networks classify the Hadith with 94% accuracy, because neural networks are capable of handling complex (high-dimensional) input data. The Hadith mining method that combines the vector space model, cosine similarity, and enriched queries obtains the best accuracy (i.e. 88%) among the re-evaluated Hadith mining methods. The most important aspect of Hadith mining methods is query expansion, since the query must be fitted to the Hadith lingo. Knowledge-based methods are notably absent from Hadith classification and mining approaches, and this gap can be addressed in future work using knowledge graphs.

57 citations

"Improving Document Retrieval with Spelling Correction for Weak and Fabricated Indonesian-Translated Hadith" refers to methods in this paper

...Therefore, the vector space model was chosen because the dataset is an Indonesian translation and the vector space model is the best method for document retrieval [10]....

An artificial intelligence approach to Arabic and Islamic content on the internet

Eric Atwell, C Brierley, Kais Dukes, Majdi Sawalha, Abdul-Baquee Sharaf

01 Jan 2011

TL;DR: A review of Artificial Intelligence and Corpus Linguistics research at Leeds University on Arabic and the Quran and a proposal for further research: the Quranic Knowledge Map.

Abstract: We review a range of Artificial Intelligence and Corpus Linguistics research at Leeds University on Arabic and the Quran, which has produced a range of software and corpus datasets for research on Modern Standard Arabic and, more recently, Quranic Arabic. Our work on Quranic Arabic corpus linguistics has attracted widespread interest, not only from Arabic linguists but also from Quranic students and the general public. We see great potential impact in Artificial Intelligence modelling of the Quran. This leads us to present a proposal for further research: the Quranic Knowledge Map.

...read moreread less

53 citations

"Improving Document Retrieval with Spelling Correction for Weak and Fabricated Indonesian-Translated Hadith" refers to methods in this paper

...In recent years, natural language processing has been applied to the Qur'an and hadith to conduct information retrieval [5]....

Knowledge Discovery in Al-Hadith Using Text Classification Algorithm

Khitam Jbara, King Abdullah

01 Jan 2010

TL;DR: This study examines knowledge discovery from AL-Hadith through a classification algorithm that assigns AL-Hadith to one of a set of predefined classes (books), where AL-Hadith is the saying of Prophet Mohammed and the second religious source for all Muslims.

Abstract: Machine Learning and Data Mining are applied to language datasets to discover patterns for English and other European languages. The Arabic language belongs to the Semitic family of languages, which differs from European languages in syntax, semantics, and morphology. One of the difficulties of Arabic is its complex morphological structure and orthographic variation. This study examines knowledge discovery from AL-Hadith through a classification algorithm that assigns AL-Hadith to one of a set of predefined classes (books), where AL-Hadith is the saying of Prophet Mohammed (Peace and blessings of Allah be upon him (PBUH)) and the second religious source for all Muslims. Because of its importance to Muslims all over the world, knowledge discovery from AL-Hadith will make AL-Hadith more understandable for both Muslims and non-Muslims.

38 citations

"Improving Document Retrieval with Spelling Correction for Weak and Fabricated Indonesian-Translated Hadith" refers to methods in this paper

...[8] studied text classification on 1321 Sahih al-Bukhari hadiths, grouping them into classes that share the same topic; testing was then done by comparing against a similarity coefficient table, yielding an accuracy of 73%....

Journal Article

Using Topic Analysis for Querying Halal Information on Malay Documents

Haslizatul Mohamed Hanum¹, Zainab Abu Bakar¹, Nurazzah Abd Rahman¹, Marshima Mohd Rosli¹, Norzilah Musa¹

Universiti Teknologi MARA¹

19 Mar 2014-Procedia - Social and Behavioral Sciences

TL;DR: Results and analysis show that the LSI technique outperformed the exact frequency-based technique, despite the longer processing time it required during indexing.

17 citations

Additional excerpts

...[7] conducted research on information retrieval for halal product queries over Malay-language documents using the latent semantic indexing method, achieving an accuracy of 86%....