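Application of the TF-IDF Vector Space Model (VSM) Algorithm to Information Retrieval on the Translation of the Al Quran, Surah 1 to Surah 16, Based on Similarity of Meaning (original title: Penerapan Algoritma TF-IDF Vector Space Model (VSM) Pada Information Retrieval Terjemahan Al Quran Surat 1 Sampai Dengan Surat 16 Berdasarkan Kesamaan Makna)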

TL;DR: This article applies the TF-IDF Vector Space Model (VSM) to information retrieval over the Indonesian translation of the Al Quran, supported by a synonym corpus (thesaurus), with extensions to the keyword weighting and to the query process.
Abstract: Information Retrieval (IR) is the search for information, usually within text documents. This study addresses IR over the Indonesian translation of the Al Quran. A synonym corpus (thesaurus) is built to support retrieval so that search results become broader. The method used is the TF-IDF Vector Space Model (VSM), extended in its keyword weighting and in its query process: the result ranked first in the retrieval output is used as the query for the next search pass. Cosine similarity is used to compute document similarity. The synonym corpus (thesaurus) database is built by developing a system that allows this to be done automatically. Testing was carried out by searching for Al Quran verses in the information retrieval application and comparing the application's search results with the opinions of Al Quran and Hadith experts. The success rate for tests using a single word reached 100%. For searches using more than one word or a full sentence, the success rate within the top 10 ranked documents reached 95.6%. This study shows that information retrieval using a synonym corpus (thesaurus), together with added weight for the words of the first keyword searched, increases relevance, because it significantly broadens the search results and eliminates irrelevant documents.
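The pipeline described in the abstract (synonym expansion, TF-IDF weighting, cosine ranking, then re-querying with the top-ranked result) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the toy documents, the tiny thesaurus, the function names, and the smoothed IDF variant `log(N/df) + 1` are all assumptions, and the extra weighting of the first keyword is left out because the abstract does not specify the scheme.

```python
import math
from collections import Counter

# Hypothetical toy data: three short verse translations and a tiny thesaurus.
DOCS = {
    "d1": "allah maha pengasih lagi maha penyayang",
    "d2": "segala puji bagi allah tuhan semesta alam",
    "d3": "tunjukilah kami jalan yang lurus",
}
THESAURUS = {"rabb": ["tuhan"], "pemurah": ["pengasih"]}


def tokenize(text):
    """Lowercase and split on whitespace (no stemming in this sketch)."""
    return text.lower().split()


def expand(tokens, thesaurus):
    """Append the synonyms of every query token to the query."""
    expanded = list(tokens)
    for token in tokens:
        expanded.extend(thesaurus.get(token, []))
    return expanded


def build_idf(docs):
    """Assumed IDF variant: log(N / df) + 1, so no term weight drops to zero."""
    n_docs = len(docs)
    df = Counter()
    for text in docs.values():
        df.update(set(tokenize(text)))
    return {term: math.log(n_docs / count) + 1.0 for term, count in df.items()}


def vectorize(tokens, idf):
    """Sparse TF-IDF vector; terms unknown to the collection are dropped."""
    tf = Counter(t for t in tokens if t in idf)
    return {term: count * idf[term] for term, count in tf.items()}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query_tokens, docs, idf):
    """Rank all documents by cosine similarity to the query, best first."""
    q = vectorize(query_tokens, idf)
    scores = {doc_id: cosine(q, vectorize(tokenize(text), idf))
              for doc_id, text in docs.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


def search_with_feedback(query, docs, thesaurus):
    """First pass with synonym expansion; second pass uses the top hit as query."""
    idf = build_idf(docs)
    first_pass = search(expand(tokenize(query), thesaurus), docs, idf)
    top_id, top_score = first_pass[0]
    if top_score == 0.0:
        return first_pass  # nothing matched, so there is no result to feed back
    return search(tokenize(docs[top_id]), docs, idf)


if __name__ == "__main__":
    for doc_id, score in search_with_feedback("rabb", DOCS, THESAURUS):
        print(doc_id, round(score, 3))
```

In this toy run the query "rabb" matches nothing literally, but the thesaurus maps it to "tuhan", which retrieves `d2`; re-querying with the text of `d2` then also surfaces `d1` through the shared term "allah". That is the broadening effect the abstract attributes to the synonym corpus and the feedback step.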
Citations
Journal Article
20 Jun 2020
TL;DR: Spelling correction improves accuracy and makes the hadith retrieval system more feasible; its use is encouraged in future work because it can correct the typos that are common in Indonesian-translated hadiths in raw text from the Web and social media.
Abstract: Hadith has several levels of authenticity, among them weak (dhaif) and fabricated (maudhu) hadith, which may not originate from the prophet Muhammad PBUH and thus should not be considered when deriving Islamic law (sharia). However, many such hadiths are commonly mistaken for authentic hadiths by ordinary Muslims. To make such hadiths easier to distinguish, this paper proposes a method that checks the authenticity of a hadith by comparing it with a collection of fabricated hadiths in Indonesian. The proposed method applies the vector space model and also performs spelling correction using SymSpell, to test whether spell checking can improve the accuracy of hadith retrieval; this had not been done in previous work, and typos are common in Indonesian-translated hadiths in raw text from the Web and social media. The experimental results show that spell checking improves mean average precision from 73% to 81% and recall from 80% to 89%. Spelling correction therefore makes the hadith retrieval system more feasible, and its use is encouraged in future work because it can correct typos that are common in raw text on the Internet.
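For readers unfamiliar with the two metrics reported above, the sketch below shows how mean average precision and recall are commonly computed from ranked result lists. The function names and the example rankings are hypothetical, and the paper may use a variant (for example, a rank cutoff) that the abstract does not state.

```python
def average_precision(ranked, relevant):
    """Mean of precision@k taken at each rank k where a relevant doc appears."""
    hits = 0
    total = 0.0
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / rank
    # Dividing by all relevant docs penalises relevant docs never retrieved.
    return total / len(relevant) if relevant else 0.0


def recall(ranked, relevant):
    """Fraction of the relevant documents that appear anywhere in the ranking."""
    if not relevant:
        return 0.0
    return len(set(ranked) & set(relevant)) / len(relevant)


def mean_average_precision(runs):
    """Average of average_precision over (ranked, relevant) pairs, one per query."""
    if not runs:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


if __name__ == "__main__":
    # Hypothetical rankings for two queries, with their relevant sets.
    runs = [
        (["a", "b", "c", "d"], {"a", "c"}),
        (["x", "y"], {"y", "z"}),
    ]
    print(mean_average_precision(runs))
    print([recall(r, rel) for r, rel in runs])
```

A comparison like the one in the abstract would evaluate the same queries twice, once with and once without the spelling-correction step, and compare the two scores.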

3 citations


Cites methods from "Penerapan Algoritma TF-IDF Vector S..."

  • ...[9] design an algorithm for Indonesian translated Qur’an based on the vector space model and TF-IDF which generate 98....

    [...]

References
07 Nov 2003
TL;DR: This thesis evaluates the existing stemmer for Bahasa Indonesia and compares it with a purely rule-based stemmer, which is developed from a study of the morphological structure of Bahasa Indonesia words.
Abstract: Stemming is a process that maps different morphological variants of words to their base/common word (stem). This process is also known as conflation. Based on the assumption that terms with a common stem will usually have similar meanings, stemming is widely used in Information Retrieval as a way to improve retrieval performance. In addition to improving retrieval performance, stemming, which is done at indexing time, also reduces the size of the index file. This thesis is a study of stemming algorithms for Bahasa Indonesia, especially their effect on information retrieval. We evaluate the existing stemmer for Bahasa Indonesia and compare it with a purely rule-based stemmer, which we created for this purpose. This rule-based stemmer is developed from a study of the morphological structure of Bahasa Indonesia words.
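To make the idea of purely rule-based conflation concrete, here is a toy affix stripper for Indonesian. It is only an illustrative sketch and not the stemmer developed in the thesis: the affix lists, the stripping order, and the minimum stem length of 3 are simplifying assumptions.

```python
# Assumed, deliberately incomplete affix lists for illustration only.
PARTICLES = ("lah", "kah", "pun")
POSSESSIVES = ("nya", "ku", "mu")
SUFFIXES = ("kan", "an", "i")
PREFIXES = ("meng", "mem", "men", "me", "ber", "ter", "di", "ke", "se",
            "peng", "pem", "pen", "pe")

MIN_STEM = 3  # never strip an affix if fewer than 3 characters would remain


def strip_suffix(word, suffixes):
    """Remove the first matching suffix, if enough of the word remains."""
    for suffix in suffixes:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM:
            return word[:-len(suffix)]
    return word


def strip_prefix(word, prefixes):
    """Remove the first matching prefix, if enough of the word remains."""
    for prefix in prefixes:
        if word.startswith(prefix) and len(word) - len(prefix) >= MIN_STEM:
            return word[len(prefix):]
    return word


def stem(word):
    """Strip particle, possessive, derivational suffix, then one prefix."""
    w = word.lower()
    w = strip_suffix(w, PARTICLES)
    w = strip_suffix(w, POSSESSIVES)
    w = strip_suffix(w, SUFFIXES)
    w = strip_prefix(w, PREFIXES)
    return w


if __name__ == "__main__":
    for word in ["membaca", "dibaca", "bacaan", "bukunya"]:
        print(word, "->", stem(word))
```

Here "membaca", "dibaca", and "bacaan" all conflate to "baca", so a query containing one variant would match documents containing the others. Without a root-word dictionary, rules like these also overstem: this sketch reduces "berjalan" to "jal" rather than "jalan", which is the kind of error a comparison between stemmers has to measure.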

231 citations

Journal Article
TL;DR: This work surveys existing techniques for stemming Indonesian words to their morphological roots, presents the novel and highly accurate CS algorithm, and explores the effectiveness of stemming in the context of general-purpose text information retrieval through ad hoc queries.
Abstract: Stemming words to (usually) remove suffixes has applications in text search, machine translation, document summarization, and text classification. For example, English stemming reduces the words "computer," "computing," "computation," and "computability" to their common morphological root, "comput-." In text search, this permits a search for "computers" to find documents containing all words with the stem "comput-." In the Indonesian language, stemming is of crucial importance: words have prefixes, suffixes, infixes, and confixes that make matching related words difficult. This work surveys existing techniques for stemming Indonesian words to their morphological roots, presents our novel and highly accurate CS algorithm, and explores the effectiveness of stemming in the context of general-purpose text information retrieval through ad hoc queries.

168 citations
