Journal ArticleDOI

Query reformulation approach using domain specific ontology for semantic information retrieval

01 Oct 2021-International Journal of Information Technology (Springer Singapore)-Vol. 13, Iss: 5, pp 1-9
TL;DR: It can safely be concluded that in the domain of research the OBSIRM performs better in terms of users’ expectations than Google (for string instrument domain considered by the researcher) as far as relevant information retrieval is concerned.
Abstract: Conventional search engines return many irrelevant results for a user's search query, which costs time and effort, because the context and semantics of the user's request are not analyzed to the optimum extent. This highlights the need for a methodology, embedded in the search machinery, that enables semantic web search. Ontologies have proven to be an effective technology for such semantic knowledge representation and information retrieval. In this research paper, the researcher has constructed a string ontology in a new domain (the music domain) from scratch using Protege 5.0 for semantic information retrieval for a query in the search engine. This ontology is further used in the proposed ontology-based semantic information retrieval method (OBSIRM), which has been built to refine web search in the music domain. The researcher has proposed a novel approach to refine the web search in which the query is first replaced with abbreviations, along with the use of the multilingual concept to search a query. The results of the proposed method have been compared with Google. The accuracy and efficiency of the proposed method have been measured in terms of precision and recall. The average precision and recall of OBSIRM are 1.43 and 0.7, compared with 1.04 and 0.51 for the generic Google search engine. Hence it can safely be concluded that in the domain of research the OBSIRM performs better in terms of users' expectations than Google (for the string instrument domain considered by the researcher) as far as relevant information retrieval is concerned.
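For reference, the set-based definitions of precision and recall commonly used in information retrieval can be sketched as follows. This is a minimal illustration, not the paper's evaluation code: the function name and the sample document IDs are hypothetical, and the abstract does not state how its averaged figures were computed, so the reported values may rest on a different scheme.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    retrieved: iterable of document IDs returned by the search system
    relevant:  iterable of document IDs judged relevant for the query
    """
    retrieved_set = set(retrieved)
    relevant_set = set(relevant)
    hits = len(retrieved_set & relevant_set)  # relevant documents that were retrieved

    # Guard against empty sets to avoid division by zero.
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall


# Hypothetical example: 4 documents retrieved, 3 judged relevant, 2 in common.
p, r = precision_recall(["d1", "d2", "d3", "d4"], ["d1", "d3", "d5"])
print(p, r)
```

Under these definitions both values lie between 0 and 1, so averaging them over a query set also yields values in that range.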
Citations
Journal ArticleDOI
TL;DR: In this paper, an improved aquila optimization-based COOT (IAOCOOT) algorithm for query expansion was proposed to deal with the problems of uncertainty, imprecision, and semantic ambiguity of indexed terms in both the local and global perspectives.
Abstract: Query expansion is an important approach utilized to improve the efficiency of data retrieval tasks. Researchers have carried out numerous works to generate fair, constructive results; however, these do not provide acceptable results for all kinds of queries, particularly phrase and individual queries. The utilization of identical data sources and weighting strategies for expanding such terms is the major cause of this issue, which leaves the model unable to capture the comprehensive relationship between the query terms. In order to tackle this issue, we developed a novel query expansion approach that analyzes different data sources, namely WordNet, Wikipedia, and the Text REtrieval Conference. This paper presents an Improved Aquila Optimization-based COOT (IAOCOOT) algorithm for query expansion which retrieves the semantic aspects that match the query term. The semantic heterogeneity associated with document retrieval mainly impacts the relevance matching between the query and the document. The main cause of this issue is that the similarity among the words is not evaluated correctly. To overcome this problem, we use a Modified Needleman Wunsch algorithm to deal with the problems of uncertainty, imprecision in the information retrieval process, and semantic ambiguity of indexed terms in both the local and global perspectives. The k most similar words are determined and returned from a candidate set through the top-k words selection technique, which is widely utilized in different tasks. The proposed IAOCOOT model is evaluated using different standard Information Retrieval performance metrics to compute the validity of the proposed work by comparing it with other state-of-the-art techniques.
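To make the similarity and top-k steps concrete, here is a minimal sketch that scores words with the classic Needleman-Wunsch global alignment and then picks the k best candidates. It is not the modified variant the abstract refers to, whose details are not given here; the scoring parameters (match=1, mismatch=-1, gap=-1), the function names, and the sample words are all assumptions made for illustration.

```python
import heapq


def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment score of strings a and b (classic Needleman-Wunsch)."""
    # prev[j] holds the best score for aligning the processed prefix of a with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # aligning a[:i] with the empty string costs i gaps
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap inserted in b
            left = curr[j - 1] + gap  # gap inserted in a
            curr.append(max(diag, up, left))
        prev = curr
    return prev[-1]


def top_k_similar(query, candidates, k=3):
    """Return the k candidates with the highest alignment score to query."""
    return heapq.nlargest(k, candidates, key=lambda w: needleman_wunsch_score(query, w))


# Hypothetical candidate set for the query term "guitar".
print(top_k_similar("guitar", ["guitars", "sitar", "violin", "guitar"], k=2))
```

Only two rows of the dynamic-programming table are kept because the sketch needs the final score, not the alignment itself.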

4 citations

Proceedings ArticleDOI
01 Aug 2020
TL;DR: This paper shows the implementation and analysis of various popular information retrieval metrics, be it a set retrieval or rank retrieval that can be used for performance evaluation of web retrieval of data.
Abstract: With ever-increasing data on the internet, efficient retrieval of information for its users has always been at stake. The issue is deeper in the e-governance-based real estate scenario, where there is a lack of interoperability in the formats used for buying and selling property, and loads of miscellaneous legal terminology that fetch inappropriate legal documentation for the users. Hence, in today's Semantic web scenario, a Real estate ontology has been proposed under the Real Estate Information Retrieval Model, and the model is evaluated by applying various information retrieval measures. This paper shows the implementation and analysis of various popular information retrieval metrics, be it a set retrieval or rank retrieval, that can be used for performance evaluation of web retrieval of data. The proposed system depicts better results in almost all the metrics as compared to the initial user query set.

4 citations


Cites methods from "Query reformulation approach using ..."

  • ...Kaur and Aggarwal [5] in their work measure the accuracy and efficiency of their proposed method OBSIRM in terms of average precision and recall....

Journal ArticleDOI
TL;DR: Gap-text-2-SQL as mentioned in this paper is an approach that allows long-text sequences to be handled by transformers with up to 512 input tokens and runs database schema pruning (removal of table names and column names that are useless for the query of interest).
Abstract: Databases have a large amount of information that can be accessed by the structured query language (SQL), but this language requires technical knowledge. An alternative to facilitating access to this information is to use natural language to make queries, and an artificial intelligence model to translate to SQL. Transformer-based language models have been incredibly successful in this regard. However, transformers are limited by the size of the input text; therefore, long sentences can interfere with the quality of the results. We present two techniques to improve results. The first is an innovative technique that allows long-text sequences to be handled by transformers with up to 512 input tokens. We run database schema pruning (removal of table names and column names that are useless for the query of interest) during a fine-tuning process. The second technique is a multilingual approach. The model is fine-tuned using a data-augmented Spider dataset [a specialized dataset for Natural Language to SQL (NL2SQL)] in four languages simultaneously: English, Portuguese, Spanish, and French. The combination of these techniques allowed an increase in the exact set match accuracy results from 0.718 to 0.736 in our validation dataset. The process of improving results is challenging because NL2SQL techniques are already significantly optimized, and the two techniques presented here are important because they are applied in the training dataset, allowing them to be used with any current technique. Source code, evaluations, and checkpoints are available at https://github.com/C4AI/gap-text2sql .
TL;DR: A focused crawler that attempts to index only web pages that contain information about jobs, launch-related events, and news and uses contextual pseudo-relevance feedback using machine learning algorithms to get more relevant documents than regular search engines.
Abstract: The number of internet users has increased, and so has the volume of data. This makes it a hassle for users to find exactly the relevant information on the Internet relatively quickly. The search process on the web is also inaccurate, and it takes a lot of time to get results. Domain-specific web search engines contain information that is specific to the subject at hand. This domain-specific web search engine aims to improve accuracy and provide additional functionality. This makes it easy to connect young minds with start-ups that have turned out to be quite different from common web search engines. This proposed task uses a focused crawler that attempts to index only web pages that contain information about jobs, launch-related events, and news. It also uses contextual pseudo-relevance feedback using machine learning algorithms to get more relevant documents than regular search engines. This proposed task produces search results by reflecting feedback without human intervention. This task uses machine learning techniques to improve accuracy with domain-specific web search engines for better results. Index Terms: Web Crawler, Inverted Indexes, Relevance
Journal ArticleDOI
TL;DR: HiveRel as mentioned in this paper presents search results as tiled hexagons on a map-like surface with center-out relevance ordering and allows on-demand display of relationships between search results.
Abstract: The growing abundance in complex network data models is constantly increasing the challenges for non-expert users who perform an effective exploratory search in large data collections. In such domains, users search for entities related to a topic of interest and acquire knowledge by investigating the relationships between these entities. Designers, in turn, are challenged by the need to provide tools that enable convenient search and exploration to facilitate productive performance on the task. For this purpose, we introduce HiveRel, a search system that presents search results as tiled hexagons on a map-like surface with center-out relevance ordering and allows on-demand display of relationships between search results. HiveRel’s user interface is based on theoretical principles that reflect how users acquire knowledge through relationships. For the search mechanism, we provide a set of information retrieval definitions leading to the formalization of the Maximal n-Bounded Exploration Subgraph problem and present an implementation of a greedy heuristic algorithm that provides non-optimal solutions to this problem. We develop a proof of concept version of HiveRel. We evaluate it in two user studies that compare users’ performance using HiveRel to standard web search over a range of search knowledge acquisition tasks and two different domains. The results indicate that despite the lack of familiarity with the new system, users were generally more accurate and as fast using HiveRel, and provided positive evaluations for the search experience.
References
Journal ArticleDOI
TL;DR: A novel metric to measure the semantic relatedness between words is proposed. It is based on ontologies represented using a general knowledge base for dynamically building a semantic network; this network, based on linguistic properties, is combined with the metric to create a measure of semantic relatedness.
Abstract: The concept of relevance is a hot topic in the information retrieval process. In recent years the extreme growth of digital documents brought to light the need for novel approaches and more efficient techniques to improve the accuracy of IR systems to take into account real users' information needs. In this article we propose a novel metric to measure the semantic relatedness between words. Our approach is based on ontologies represented using a general knowledge base for dynamically building a semantic network. This network is based on linguistic properties and it is combined with our metric to create a measure of semantic relatedness. In this way we obtain an efficient strategy to rank digital documents from the Internet according to the user's interest domain. The proposed methods, metrics, and techniques are implemented in a system for information retrieval on the Web. Experiments are performed on a test set built using a directory service having information about analyzed documents. The obtained results compared to other similar systems show an effective improvement.

78 citations

01 Jan 2009
TL;DR: This paper compared the retrieval effectiveness of Google and Yahoo and showed that the precision of Google was high for simple multi-word queries and Yahoo had comparatively high precision for complex multi-word queries.
Abstract: This paper compared the retrieval effectiveness of Google and Yahoo. Both precision and relative recall were considered for evaluating the effectiveness of the search engines. Queries using concepts in the field of library and information science were tested and were divided into one-word queries, simple multi-word queries and complex multi-word queries. Results of the study showed that the precision of Google was high for simple multi-word queries (0.97) and Yahoo had comparatively high precision for complex multi-word queries (0.76). Relative recall of Google was high for simple one-word queries (0.92) while Yahoo had higher relative recall for complex multi-word queries (0.61).

41 citations

01 Jan 2011
TL;DR: This paper explains the terms of a university through a university ontology built using Protege, a popular tool for ontology editing and development, which enables ontology developers to concentrate on conceptual terms without thinking about the syntax of an output language.
Abstract: The current web is based on HTML, which can display information simply. Researchers are working towards the semantic web, an intelligent and meaningful web proposed by Tim Berners-Lee. Ontology and ontology-based applications are its basic ingredients. With an ontology we can focus on only the main concepts and their relationships rather than on information. Protege is a popular tool for ontology editing and for developing ontologies (1). It has a GUI which enables ontology developers to concentrate on conceptual terms without thinking about the syntax of an output language. Protege has a flexible knowledge model and an extensible plug-in architecture. This paper explains the terms of a university through a university ontology. We will focus on creating a university ontology using Protege. Rajiv Gandhi Technical University, Bhopal, India has been taken as an example for the ontology development, and various aspects have been demonstrated: super class and subclass hierarchy, creating subclass instances for class illustration, the query retrieval process, visualization view and graph view.

37 citations

Journal ArticleDOI
S. Remi, S.C. Varghese
TL;DR: A novel method for supporting semantic information retrieval is proposed by building a domain specific ontology and a prototype of a fuzzy semantic search engine is developed and the results are compared with that of a traditional search engine.

19 citations

Posted Content
TL;DR: SIEU as discussed by the authors is a semantic search engine that uses ontology as a knowledge base for the information retrieval process, which is one layer above what Google or any other search engines retrieve by analyzing just the keywords.
Abstract: Today's conventional search engines hardly provide the essential content relevant to the user's search query. This is because the context and semantics of the request made by the user are not analyzed to the full extent. So here the need for a semantic web search (SWS) arises. SWS is upcoming in the area of web search and combines Natural Language Processing and Artificial Intelligence. The objective of the work done here is to design, develop and implement a semantic search engine, SIEU (Semantic Information Extraction in University Domain), confined to the university domain. SIEU uses ontology as a knowledge base for the information retrieval process. It is not just a mere keyword search. It is one layer above what Google or any other search engines retrieve by analyzing just the keywords. Here the query is analyzed both syntactically and semantically. The developed system retrieves web results more relevant to the user query through keyword expansion. The results obtained here will be accurate enough to satisfy the request made by the user. The level of accuracy will be enhanced since the query is analyzed semantically. The system will be of great use to the developers and researchers who work on the web. The Google results are re-ranked and optimized for providing the relevant links. For ranking, an algorithm has been applied which fetches more apt results for the user query.
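The keyword expansion step described above can be illustrated with a toy sketch. This is not SIEU's implementation: the ontology is reduced to a hypothetical dictionary mapping a term to related terms, and the names `TOY_ONTOLOGY` and `expand_query` as well as the sample entries are assumptions made for illustration only.

```python
# Hypothetical stand-in for a domain ontology: term -> related concepts.
TOY_ONTOLOGY = {
    "admission": ["enrollment", "application"],
    "faculty": ["professor", "lecturer"],
}


def expand_query(query, ontology):
    """Return the query terms followed by related terms found in the ontology."""
    terms = query.lower().split()
    expanded = list(terms)  # keep the original terms first, in order
    for term in terms:
        for related in ontology.get(term, []):
            if related not in expanded:  # avoid duplicate terms
                expanded.append(related)
    return expanded


print(expand_query("Faculty admission", TOY_ONTOLOGY))
```

A real system would query an OWL ontology rather than a dictionary; the sketch only shows how related concepts widen the set of search terms.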

17 citations

Trending Questions (1)
Can the incorporation of domain-specific ontologies improve the accuracy of text document comparison for information retrieval?

Yes, the integration of domain-specific ontologies, as demonstrated by OBSIRM for the music domain, enhances precision and recall in semantic information retrieval compared to generic search engines like Google.