Book Chapter

An Approach to Design an IoT Service for Business—Domain Specific Web Search

TL;DR: The purpose of this work is to design a crawler that crawls domain-specific web pages according to the stated ontologies and uses an effective page ranking algorithm to lead the user to the most relevant results in a specific domain.
Abstract: This work aims to extract the most relevant results for a specific query from the vast amount of content available on the WWW. Most Web pages are written in such a way that a crawler finds it difficult to identify the specific domain they belong to. The concept of ontology is used to find domain-specific results. The purpose of this work is to design a crawler that crawls domain-specific web pages according to the stated ontologies. To minimize bias in accessing highly relevant links in the deep web, we propose an intelligent crawling mechanism. The intelligent crawler uses an effective page ranking algorithm that leads the user to the most relevant results in a specific domain. The proposed Internet of Things (IoT) service will incorporate this mechanism.
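The abstract does not spell out the crawling algorithm, so the following is only a minimal sketch of one way an ontology could steer a crawl: pages are scored against weighted domain terms and the frontier is expanded best-first. The `ONTOLOGY` weights, the in-memory `PAGES` graph, the `relevance` scoring rule, and the `threshold` value are all hypothetical and are not taken from the paper.

```python
import heapq

# Hypothetical ontology: domain terms with hand-picked weights (assumption).
ONTOLOGY = {"invoice": 3.0, "supplier": 2.0, "logistics": 2.0, "retail": 1.0}

# Toy in-memory "web": url -> (page text, outgoing links). Purely illustrative.
PAGES = {
    "a": ("retail supplier directory", ["b", "c"]),
    "b": ("cat pictures and jokes", ["d"]),
    "c": ("invoice and logistics for supplier networks", ["d"]),
    "d": ("logistics invoice templates", []),
}


def relevance(text):
    """Sum the ontology weights of the words in text, divided by word count."""
    words = text.lower().split()
    if not words:
        return 0.0
    return sum(ONTOLOGY.get(word, 0.0) for word in words) / len(words)


def focused_crawl(seed, threshold=0.5):
    """Visit pages best-first; keep and expand only pages scoring >= threshold."""
    # heapq is a min-heap, so scores are negated to pop the best page first.
    # A real crawler would have to estimate this priority before fetching,
    # for example from anchor text; the toy graph lets us score directly.
    frontier = [(-relevance(PAGES[seed][0]), seed)]
    seen = {seed}
    kept = []
    while frontier:
        negated_score, url = heapq.heappop(frontier)
        score = -negated_score
        if score < threshold:
            continue  # off-topic page: neither stored nor expanded
        kept.append((url, round(score, 2)))
        for link in PAGES[url][1]:
            if link not in seen:
                seen.add(link)
                heapq.heappush(frontier, (-relevance(PAGES[link][0]), link))
    return kept
```

Calling `focused_crawl("a")` on this toy graph keeps the three on-topic pages and skips page `b`, whose text contains no ontology terms.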
Citations
Proceedings Article
01 Jan 2017
TL;DR: This work proposes a novel approach that distributes the web search engine itself, along with its index, over multiple nodes to handle the sub-parts of entity and sensor pages.
Abstract: Real-time business search relies heavily on the Internet of Things, which has become a huge set of objects with a large number of intercommunication links and services. This scalability issue can be addressed with the Social IoT (SIoT), in which an object looks for social partners that follow a similar set of rules, to positively influence the performance of the service. In this work, we analyze the impact of the deep web on real-time search. We also propose a novel approach based on an analysis of the shortcomings of existing techniques. The novel idea is to distribute the web search engine itself, along with the index, over multiple nodes to handle the sub-parts of entity and sensor pages. Deep websites need to be included for accurate results. This work offers better accuracy and a significant speedup for multiple-query execution in a distributed environment.

1 citations


Cites background from "An Approach to Design an IoT Servic..."

  • ...The relevance of a web page[8] in a particular domain is the most important aspect of our system with respect to accuracy....


Journal Article
TL;DR: The proposed query processing and analysing system (QPAS) for social networks extracts the user's intent from various social networks using existing NLP techniques; compared with traditional keyword-based searching, it retrieves more user-centric results and does so in a timely manner.
Abstract: User intention and the nature of the network play a vital role in the quality of the response to any user query. Therefore, there is a clear need for a system that understands both the user's intent and network dynamism. The proposed query processing and analysing system (QPAS) for social networks is based on extracting the user's intent from various social networks using existing NLP techniques. It fetches the information and then employs hybrid ensemble k-means hierarchical agglomerative clustering (HEKHAC) and a modified Bitonic sort to improve the responses. The proposed approach offers an edge over other mechanisms, as it not only retrieves more user-centric results than traditional keyword-based searching but also does so in a timely manner. It is an innovative approach to investigating new aspects of social networks. The proposed model offers a noteworthy revolution scoring up to precision and recall respectively.
References
Proceedings Article
11 Nov 1999
TL;DR: This paper describes PageRank, a method for rating Web pages objectively and mechanically, effectively measuring the human interest and attention devoted to them, and shows how to efficiently compute PageRank for large numbers of pages.
Abstract: The importance of a Web page is an inherently subjective matter, which depends on the reader's interests, knowledge and attitudes. But there is still much that can be said objectively about the relative importance of Web pages. This paper describes PageRank, a method for rating Web pages objectively and mechanically, effectively measuring the human interest and attention devoted to them. We compare PageRank to an idealized random Web surfer. We show how to efficiently compute PageRank for large numbers of pages. And, we show how to apply PageRank to search and to user navigation.
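The random-surfer idea mentioned in the abstract can be illustrated with a small power-iteration sketch. This is the commonly taught formulation and not necessarily the exact procedure of the paper; the damping factor of 0.85, the fixed iteration count, and the even redistribution of rank from pages without outgoing links are assumptions made for this example.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Power iteration over links: page -> list of pages it links to.

    Every link target must also appear as a key of links.
    The damping value and iteration count are illustrative defaults.
    """
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page receives the "random jump" share first.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outgoing in links.items():
            if outgoing:
                # Split this page's rank evenly among its link targets.
                share = damping * rank[page] / len(outgoing)
                for target in outgoing:
                    new_rank[target] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


toy_graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(toy_graph)
```

In this three-page graph, page `c` ends up with the highest rank because both other pages link to it, and the ranks sum to one.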

14,400 citations

Proceedings Article
08 May 2007
TL;DR: A new framework is proposed whereby crawlers automatically learn patterns of promising links and adapt their focus as the crawl progresses, thus greatly reducing the amount of required manual setup and tuning.
Abstract: In this paper we describe new adaptive crawling strategies to efficiently locate the entry points to hidden-Web sources. The fact that hidden-Web sources are very sparsely distributed makes the problem of locating them especially challenging. We deal with this problem by using the contents of pages to focus the crawl on a topic; by prioritizing promising links within the topic; and by also following links that may not lead to immediate benefit. We propose a new framework whereby crawlers automatically learn patterns of promising links and adapt their focus as the crawl progresses, thus greatly reducing the amount of required manual setup and tuning. Our experiments over real Web pages in a representative set of domains indicate that online learning leads to significant gains in harvest rates: the adaptive crawlers retrieve up to three times as many forms as crawlers that use a fixed focus strategy.

190 citations

Proceedings Article
13 Jun 2010
TL;DR: This paper provides a general method for analyzing nondeterministic programs that use reducers, and shows that for a graph G=(V,E) with diameter D and bounded out-degree, a data-race-free version of the PBFS algorithm attains near-perfect linear speedup if P << (V+E)/(D lg^3(V/D)).
Abstract: We have developed a multithreaded implementation of breadth-first search (BFS) of a sparse graph using the Cilk++ extensions to C++. Our PBFS program on a single processor runs as quickly as a standard C++ breadth-first search implementation. PBFS achieves high work-efficiency by using a novel implementation of a multiset data structure, called a "bag," in place of the FIFO queue usually employed in serial breadth-first search algorithms. For a variety of benchmark input graphs whose diameters are significantly smaller than the number of vertices -- a condition met by many real-world graphs -- PBFS demonstrates good speedup with the number of processing cores. Since PBFS employs a nonconstant-time "reducer" -- a "hyperobject" feature of Cilk++ -- the work inherent in a PBFS execution depends nondeterministically on how the underlying work-stealing scheduler load-balances the computation. We provide a general method for analyzing nondeterministic programs that use reducers. PBFS also is nondeterministic in that it contains benign races which affect its performance but not its correctness. Fixing these races with mutual-exclusion locks slows down PBFS empirically, but it makes the algorithm amenable to analysis. In particular, we show that for a graph G=(V,E) with diameter D and bounded out-degree, this data-race-free version of PBFS algorithm runs in time O((V+E)/P + D lg^3(V/D)) on P processors, which means that it attains near-perfect linear speedup if P
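The idea of replacing the FIFO queue with an unordered "bag" can be pictured with a serial, layer-by-layer sketch. This is not the Cilk++ PBFS implementation: a plain Python `set` stands in for the bag, nothing runs in parallel, and the function name `layered_bfs` is hypothetical.

```python
def layered_bfs(graph, source):
    """Breadth-first search that processes one whole layer at a time.

    graph maps each vertex to an iterable of its neighbours.
    Returns a dict of vertex -> distance (in edges) from source.
    """
    distance = {source: 0}
    frontier = {source}  # the "bag": an unordered collection of vertices
    depth = 0
    while frontier:
        depth += 1
        next_frontier = set()
        # Vertices within a layer can be handled in any order, which is
        # what makes the layer a candidate for parallel processing.
        for vertex in frontier:
            for neighbour in graph.get(vertex, ()):
                if neighbour not in distance:
                    distance[neighbour] = depth
                    next_frontier.add(neighbour)
        frontier = next_frontier
    return distance


example_graph = {0: [1, 2], 1: [3], 2: [3], 3: [4], 4: []}
distances = layered_bfs(example_graph, 0)
```

Because every vertex in a layer has the same distance from the source, the order in which the bag is traversed does not change the result.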

174 citations

Proceedings Article
23 Jul 2007
TL;DR: It is found that HITS outperforms PageRank, but is about as effective as web-page in-degree, and that link-based features perform better for general queries, whereas BM25F performs better for specific queries.
Abstract: This paper describes a large-scale evaluation of the effectiveness of HITS in comparison with other link-based ranking algorithms, when used in combination with a state-of-the-art text retrieval algorithm exploiting anchor text. We quantified their effectiveness using three common performance measures: the mean reciprocal rank, the mean average precision, and the normalized discounted cumulative gain measurements. The evaluation is based on two large data sets: a breadth-first search crawl of 463 million web pages containing 17.6 billion hyperlinks and referencing 2.9 billion distinct URLs; and a set of 28,043 queries sampled from a query log, each query having on average 2,383 results, about 17 of which were labeled by judges. We found that HITS outperforms PageRank, but is about as effective as web-page in-degree. The same holds true when any of the link-based features are combined with the text retrieval algorithm. Finally, we studied the relationship between query specificity and the effectiveness of selected features, and found that link-based features perform better for general queries, whereas BM25F performs better for specific queries.
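The three performance measures named in the abstract can be sketched for a single query as follows; mean reciprocal rank and mean average precision are then averages of the per-query values over all queries. The exact variants used in the paper are not given here, so two choices are assumptions: average precision is normalised by the number of relevant results in the ranked list, and DCG uses a linear gain with a log2 position discount.

```python
import math


def reciprocal_rank(relevances):
    """1 / position of the first relevant result, or 0.0 if there is none."""
    for position, rel in enumerate(relevances, start=1):
        if rel > 0:
            return 1.0 / position
    return 0.0


def average_precision(relevances):
    """Mean of precision@k taken at each position k holding a relevant result."""
    hits = 0
    total = 0.0
    for position, rel in enumerate(relevances, start=1):
        if rel > 0:
            hits += 1
            total += hits / position
    return total / hits if hits else 0.0


def dcg(relevances):
    """Discounted cumulative gain with linear gain and log2 discount."""
    return sum(
        rel / math.log2(position + 1)
        for position, rel in enumerate(relevances, start=1)
    )


def ndcg(relevances):
    """DCG divided by the DCG of the ideally ordered list."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal else 0.0
```

Each function takes the relevance labels of one ranked result list, in rank order, with 0 meaning not relevant.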

89 citations

Journal Article
TL;DR: This special issue features recent and emerging advances in IoT architecture, protocols, services and applications, covering topics including sensors and devices for IoT, efficient communications and networking for IoT, security and privacy in IoT, crowdsensing and crowdsourcing, localization and tracking, services and applications, and IoT data modeling and management.
Abstract: The internet of things (IoT) has been emerging as the next big 'thing' on the Internet. It is envisioned that billions of physical things or objects will be outfitted with different kinds of sensors and actuators and connected to the Internet via heterogeneous access networks enabled by technologies such as embedded sensing and actuating, radio frequency identification (RFID), wireless sensor networks, real-time and semantic web services, etc. IoT is in effect a cyber-physical system, or a network of networks. With the huge number of things/objects and sensors/actuators connected to the Internet, a massive and in some cases real-time data flow will be automatically produced by connected things and sensors. It is important to collect correct raw data in an efficient way; but more important is to analyze and mine the raw data to abstract more valuable information, such as correlations among things and services, to provide a web of things or Internet of services. This special issue features recent and emerging advances in IoT architecture, protocols, services and applications. More than one hundred papers were received and peer-reviewed, out of which thirty-five papers were selected for publication, which cover topics including sensors and devices for IoT, efficient communications and networking for IoT, security and privacy in IoT, crowdsensing and crowdsourcing, localization and tracking, services and applications, and IoT data modeling and management. The first set of five papers discusses IoT sensor and device related issues. The second set includes fourteen papers about efficient communications and networking for IoT. Security and privacy is another important aspect in IoT. Five papers focusing on IoT security and privacy were selected. Four papers concentrating on tracking and localization were selected and three more papers about IoT data modeling and management were included in this special issue. Finally, there were three papers which presented new IoT services and applications.

85 citations