scispace - formally typeset
Author

Santanu Roy

Bio: Santanu Roy is an academic researcher from the Future Institute of Engineering and Management. He has contributed to research on materialized views and data warehouses, has an h-index of 3, and has co-authored 10 publications receiving 28 citations.

Papers
Journal ArticleDOI
TL;DR: This research work dynamically finds the most cost-effective path through the lattice structure of cuboids, based on the concept hierarchy, to minimize query access time.
Abstract: Analytical processing on multi-dimensional data is performed over a data warehouse, where the data are generally presented in the form of cuboids. The central theme of the data warehouse is represented by the fact table, which is built from the related dimension tables. The cuboid that corresponds to the fact table is called the base cuboid. All possible combinations of cuboids can be generated from the base cuboid using successive roll-up operations, and together they form a lattice structure. Some dimensions may have a concept hierarchy with multiple granularities of data, meaning a dimension is represented in more than one abstract form. Typically, neither all the cuboids nor all the concept hierarchies are required for a specific business process. The cuboids reside in different layers of the memory hierarchy, such as cache memory, primary memory, and secondary memory. This research work dynamically finds the most cost-effective path through the lattice structure of cuboids, based on the concept hierarchy, to minimize query access time; knowledge of the location of each cuboid among the memory elements is used for this purpose.

13 citations
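The path search described in the abstract can be illustrated as a shortest-path computation over the roll-up lattice, where reaching a cuboid costs more the slower the memory tier that holds it. The tier costs, the toy lattice, and the Dijkstra formulation below are illustrative assumptions, not the paper's actual cost model:

```python
import heapq

# Illustrative memory-tier access costs (arbitrary units); the paper's
# actual cost model is not reproduced here.
TIER_COST = {"cache": 1, "primary": 10, "secondary": 100}

def cheapest_path(lattice, location, start, goal):
    """Dijkstra over the roll-up lattice.

    lattice  : dict cuboid -> list of cuboids reachable by one roll-up
    location : dict cuboid -> memory tier ("cache"/"primary"/"secondary")
    Returns (total_cost, path), or (inf, []) if goal is unreachable.
    """
    pq = [(TIER_COST[location[start]], start, [start])]
    seen = set()
    while pq:
        cost, node, path = heapq.heappop(pq)
        if node == goal:
            return cost, path
        if node in seen:
            continue
        seen.add(node)
        for nxt in lattice.get(node, []):
            if nxt not in seen:
                heapq.heappush(pq, (cost + TIER_COST[location[nxt]],
                                    nxt, path + [nxt]))
    return float("inf"), []

# Toy two-dimension lattice: base (A,B) -> (A), (B) -> apex ()
lattice = {("A", "B"): [("A",), ("B",)], ("A",): [()], ("B",): [()]}
location = {("A", "B"): "secondary", ("A",): "primary",
            ("B",): "cache", (): "cache"}
cost, path = cheapest_path(lattice, location, ("A", "B"), ())
# rolling up via ("B",) (in cache) is cheaper than via ("A",) (in primary)
```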

Book ChapterDOI
05 Nov 2014
TL;DR: This paper proposes a materialized view construction methodology that first analyzes attribute similarity using the Jaccard index and then applies clustering over a similarity-based weighted connected graph; the resulting clusters are validated to check the correctness of the materialized views.
Abstract: Materialized views are important to any data-intensive system where answering queries at runtime is of interest. Users are not aware of the presence of materialized views in the system, but their presence results in fast access to data and therefore optimized query execution. Many techniques have evolved over time to construct materialized views. However, a survey of the literature reveals only a few attempts to construct materialized views based on an attribute-similarity measure computed by a statistical similarity function, followed by clustering. In this paper we propose a materialized view construction methodology that first analyzes attribute similarity using the Jaccard index and then applies clustering over a similarity-based weighted connected graph. Finally, the clusters are validated to check the correctness of the materialized views.

10 citations
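The first step of the methodology above can be sketched as follows: measure the Jaccard index between attributes (here, over the sets of queries each attribute appears in), join attribute pairs whose similarity clears a threshold, and take connected components as candidate view clusters. The usage representation, the threshold, and the component-based clustering are assumptions for illustration; the paper's exact clustering step may differ:

```python
from itertools import combinations

def jaccard(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B| between two sets."""
    return len(a & b) / len(a | b) if a | b else 0.0

def cluster_attributes(usage, threshold=0.5):
    """Group attributes whose query-usage sets are similar.

    usage: dict attribute -> set of query ids referencing it.
    Attribute pairs with Jaccard >= threshold are joined by an edge;
    connected components (via union-find) become candidate views.
    """
    attrs = list(usage)
    parent = {a: a for a in attrs}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    for a, b in combinations(attrs, 2):
        if jaccard(usage[a], usage[b]) >= threshold:
            parent[find(a)] = find(b)       # union similar attributes

    clusters = {}
    for a in attrs:
        clusters.setdefault(find(a), set()).add(a)
    return list(clusters.values())

usage = {
    "customer_id": {1, 2, 3},
    "order_date":  {1, 2, 4},
    "zip_code":    {5},
}
clusters = cluster_attributes(usage)
# customer_id and order_date share 2 of 4 queries (Jaccard 0.5) -> one cluster
```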

Book ChapterDOI
TL;DR: A new methodology for efficient implementation by forming a lattice on query parameters helps to correlate the different query parameters, which in turn form association rules among them.
Abstract: This research work optimizes the number of query parameters required to make recommendations on an e-learning platform. The paper proposes a new methodology for efficient implementation by forming a lattice on the query parameters. This lattice structure helps to correlate the different query parameters, which in turn form association rules among them. The proposed methodology is conceptualized on an e-learning platform, with the objective of formulating an effective recommendation system that determines associations between the various products offered by the platform by analyzing a minimum set of query parameters.

8 citations
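The rule-mining idea above can be sketched in its simplest, pairwise form: count how often parameters co-occur in logged queries and keep rules that clear support and confidence thresholds. The log format and thresholds are assumptions, and only pairwise rules are shown; the paper's lattice generalizes this to larger parameter sets:

```python
from itertools import combinations
from collections import Counter

def parameter_rules(queries, min_support=2, min_conf=0.6):
    """Derive pairwise association rules P -> Q from query-parameter sets.

    queries: list of sets of parameter names (one set per logged query).
    A rule P -> Q is kept when {P, Q} occurs in >= min_support queries
    and confidence count({P, Q}) / count({P}) >= min_conf.
    """
    single, pair = Counter(), Counter()
    for q in queries:
        for p in q:
            single[p] += 1
        for a, b in combinations(sorted(q), 2):
            pair[(a, b)] += 1

    rules = []
    for (a, b), n in pair.items():
        if n < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            conf = n / single[lhs]
            if conf >= min_conf:
                rules.append((lhs, rhs, conf))
    return rules

# Hypothetical query log from an e-learning platform
logs = [{"level", "topic"}, {"level", "topic"}, {"level"}, {"price"}]
rules = parameter_rules(logs)
# "topic" never appears without "level", so topic -> level has confidence 1.0
```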

Book ChapterDOI
01 Jan 2017
TL;DR: This paper proposes an algorithm that generates a materialized view by considering the frequencies of multiple attributes at a time, taken from a database, with the help of the Apriori algorithm; the proposed methodology supports scalability as well as flexibility.
Abstract: Analysis of data is an inherent part of the business world, used to identify interesting patterns underlying a data set. The size of the data is usually huge in modern applications, and searching a huge data set with low time complexity is always of interest. These data are mostly stored in tables based on the relational model and are fetched using SQL queries. Query response time is an important quality factor for this type of system, and materialized view formation is the most common way of enhancing query execution speed across industries. Different approaches have been applied over time to generate materialized views. However, few attempts have been made to construct materialized views with the help of association-based mining algorithms, and none of the existing association-based methods measure the performance of the views in terms of both hit-miss ratio and view-size scalability. This paper proposes an algorithm that generates a materialized view by considering the frequencies of multiple attributes at a time, taken from a database, with the help of the Apriori algorithm. Apriori is used to generate frequent attribute sets, which are then considered for materialization. Moreover, by varying the support count and thereby changing the sizes of the frequent attribute sets, the proposed methodology supports scalability as well as flexibility. Experimental results are given to demonstrate improvements over existing inter-attribute-analysis-based materialized view formation.

5 citations
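The core step above, mining frequent attribute sets from a query workload with Apriori, can be sketched as follows. The transaction format (one set of referenced attributes per query) and the support threshold are assumptions for illustration:

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Frequent attribute sets via the Apriori property: every subset of a
    frequent set is frequent, so candidate k-sets are built only from
    frequent (k-1)-sets.  Returns {frozenset: support count}.
    """
    def count(candidates):
        return {c: sum(1 for t in transactions if c <= t) for c in candidates}

    items = {frozenset([i]) for t in transactions for i in t}
    frequent = {c: n for c, n in count(items).items() if n >= min_support}
    result = dict(frequent)
    k = 2
    while frequent:
        # join step: unions of frequent (k-1)-sets that form k-sets
        cands = {a | b for a in frequent for b in frequent if len(a | b) == k}
        frequent = {c: n for c, n in count(cands).items() if n >= min_support}
        result.update(frequent)
        k += 1
    return result

# Each transaction = attributes referenced together by one query (assumed log)
queries = [
    {"name", "dept", "salary"},
    {"name", "dept"},
    {"name", "salary"},
    {"dept", "salary"},
]
freq = apriori(queries, min_support=2)
# {"name", "dept"} is frequent (support 2) -> candidate for materialization
```

Raising `min_support` shrinks the frequent sets and hence the view size, which is the scalability/flexibility knob the abstract describes.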

Proceedings ArticleDOI
07 Jul 2013
TL;DR: The focus of this research work is to dynamically identify the most cost-effective path within the lattice structure of cuboids, given knowledge of each cuboid's location among the memory elements, in order to minimize query access time.
Abstract: A data warehouse represents multi-dimensional data suitable for analytical processing, and logically the data are organized in the form of data cubes, or cuboids. A data warehouse represents a business theme, which is captured in a fact table. The cuboid that identifies the complete fact table is called the base cuboid. All possible combinations of cuboids that can be generated from the base cuboid correspond to a lattice structure. A lattice consists of a number of cuboids, but in real life not all of them are important for business analysis, so not all are called during business processing. The cuboids referred to by different applications are fetched from diverse memory levels, such as cache memory, primary memory, and secondary memory; the different execution speeds of these memory elements form a memory hierarchy. The focus of this research work is to dynamically identify the most cost-effective path within the lattice structure of cuboids to minimize query access time, given knowledge of each cuboid's location among the memory elements.

3 citations


Cited by
01 Jan 2008
TL;DR: In this paper, a new approach for materialized view selection using Parallel Simulated Annealing (PSA) is introduced, which selects views from an input Multiple View Processing Plan (MVPP).
Abstract: In order to facilitate efficient query processing, the information contained in data warehouses is typically stored as a set of materialized views. Deciding which views to materialize is a challenge when minimizing view-maintenance and query-processing costs, and some existing approaches are applicable only to small problems that are far from realistic. In this paper we introduce a new approach for materialized view selection using Parallel Simulated Annealing (PSA), which selects views from an input Multiple View Processing Plan (MVPP). With PSA, we are able to perform view selection on MVPPs having hundreds of queries and thousands of views. Our experimental study also shows that the method provides a significant improvement in the quality of the obtained set of materialized views over existing heuristic and sequential simulated annealing algorithms.

41 citations
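To make the annealing idea concrete, the view-selection step can be sketched as follows. Note this is a plain sequential simulated annealing over subsets of candidate views, not the paper's parallel variant, and the additive query-saving/maintenance cost model is invented purely for illustration:

```python
import math
import random

def anneal_view_selection(candidates, cost, steps=5000, t0=10.0, seed=0):
    """Simulated-annealing sketch for view selection.

    cost(selected) returns the combined query-processing + maintenance
    cost of a frozenset of views; the model is the caller's assumption.
    Each move flips one candidate view in or out of the selection.
    """
    rng = random.Random(seed)
    state = frozenset()
    cur_cost = cost(state)
    best, best_cost = state, cur_cost
    for step in range(steps):
        temp = t0 * (1 - step / steps) + 1e-9      # linear cooling
        v = rng.choice(candidates)
        nxt = state - {v} if v in state else state | {v}
        nxt_cost = cost(nxt)
        # accept improvements always, worse moves with Boltzmann probability
        if nxt_cost < cur_cost or rng.random() < math.exp((cur_cost - nxt_cost) / temp):
            state, cur_cost = nxt, nxt_cost
            if cur_cost < best_cost:
                best, best_cost = state, cur_cost
    return best, best_cost

# Toy cost model: each view saves query cost but adds maintenance cost.
QUERY_SAVING = {"v1": 50, "v2": 40, "v3": 5}
MAINTENANCE  = {"v1": 10, "v2": 15, "v3": 30}
BASE_COST = 200

def cost(selected):
    return BASE_COST - sum(QUERY_SAVING[v] - MAINTENANCE[v] for v in selected)

best, best_cost = anneal_view_selection(["v1", "v2", "v3"], cost)
# v1 and v2 pay for themselves (net savings 40 and 25); v3 does not
```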

Journal ArticleDOI
TL;DR: The proposed MOTH system exploits coarse-grained, fully and partially reuse-based opportunities among queries, considering non-equal tuple sizes and non-uniform data distribution, to avoid repeated computations and reduce multi-query execution time.
Abstract: Multi-query optimization in Big Data has become a promising research direction due to the popularity of massive data analytical systems (e.g., MapReduce and Flink). A multi-query workload is translated into jobs, and these jobs are routinely submitted with similar tasks to the underlying Big Data analytical systems; the similar tasks represent complicated computational overhead. Some techniques have therefore been proposed for exploiting shared tasks in Big Data multi-query optimization (e.g., MRShare and Relaxed MRShare), but they are heavily tailored to relaxed optimizing factors of fine-grained reuse-based opportunities. The existing fine-grained techniques are concerned only with equal tuple sizes and uniform data distribution; these assumptions are not applicable to real-world distributed applications, which depend on coarse-grained reuse-based opportunities such as non-equal tuple sizes and non-uniform data distribution. These two issues deserve more attention in Big Data multi-query optimization, in order to minimize the data read from, or written back to, Big Data infrastructures (e.g., Hadoop). In this paper, the Multi-Query Optimization using Tuple Size and Histogram (MOTH) system is proposed to account for the granularity of reuse-based opportunities. The proposed MOTH system exploits coarse-grained, fully and partially reuse-based opportunities among queries, considering non-equal tuple sizes and non-uniform data distribution, to avoid repeated computations. A combined technique is introduced for estimating the coarse-grained reuse-based opportunities horizontally and vertically: the horizontal estimation of non-equal tuple sizes is done by extracting column-level metadata, while the vertical estimation of non-uniform data distribution uses pre-computed row-level histograms. In addition, the MOTH system estimates the coarse-grained reuse-based opportunities while considering slow storage (i.e., limited physical resources or fewer allocated virtualized resources) to produce accurate estimates of reused-result costs. A cost-based heuristic algorithm is then introduced to select the best reuse-based opportunity and generate an efficient multi-query execution plan. Because partial reuse-based opportunities are considered, extra computations are needed to retrieve the non-derived results, so a partial-reuse optimizer has been tailored and added to the MOTH system to reformulate the generated multi-query plan and improve the shared partial queries. Experimental results using the TPC-H benchmark show that multi-query execution time is reduced by considering the granularity of the reused results.

20 citations

Journal ArticleDOI
TL;DR: This research proposes a virtual and intelligent agent-based recommendation system that uses users' profile information and preferences to recommend proper content and search results based on their search history, applying Natural Language Processing techniques and semantic analysis approaches to recommend course selections to e-learners and tutors.
Abstract: E-learning is a popular means of learning from social media websites, across various topics and content types, for groups of people all over the world with different knowledge backgrounds and jobs. E-learning sites help users such as students, business workers, instructors, and those searching for different educational institutions. Beyond the benefits of such systems, users face various challenges on online platforms; an important one is obtaining true information and the right content from these resources, in terms of search results and quality. This research proposes a virtual and intelligent agent-based recommendation system that uses users' profile information and preferences to recommend proper content and search results based on their search history. We applied Natural Language Processing (NLP) techniques and semantic analysis approaches to recommend course selections to e-learners and tutors. Moreover, machine-learning performance analysis is applied to improve user rating results in the e-learning environment. The system automatically learns and analyzes learner characteristics and processes the learning style through a clustering strategy. Compared with the recent state of the art in this field, the simulation results of the proposed system show fewer metric errors than other works. The presented approach provides a comfortable platform for course selection and recommendation, avoids recommending the same contents and courses repeatedly, and analyzes user preferences to provide highly related content based on the user's profile. The prediction accuracy of the proposed system is 98%, compared to hybrid filtering, self-organizing systems, and ensemble modeling.

14 citations

01 Jan 2011
Abstract: (The cited work is the proceedings volume Information and Automation: International Symposium, ISIA 2010, Guangzhou, China, November 10-11, 2010, Revised Selected Papers, Communications in Computer and Information Science; the auto-generated abstract contained no meaningful content.)

13 citations