
Showing papers by Mansaf Alam published in 2017


Journal ArticleDOI
TL;DR: This research paper investigates the current trends and identifies the existing challenges in the development of a big scholarly data platform, maps them to the different phases of the big data lifecycle, and focuses on directions for future research.
Abstract: Surveys big scholarly data with respect to the different phases of the big data lifecycle. Identifies the different big data tools and technologies that can be used for the development of scholarly applications. Investigates research challenges and limitations specific to big scholarly data and its applications. Provides research directions and paves the way towards the development of a generic and comprehensive big scholarly data platform. Recently, there has been a shifting focus of organizations and governments towards digitization of academic and technical documents, adding a new facet to the concept of digital libraries. The volume, variety and velocity of this generated data satisfy the big data definition, as a result of which this scholarly reserve is popularly referred to as big scholarly data. In order to facilitate data analytics for big scholarly data, architectures and services for the same need to be developed. The evolving nature of research problems has made them essentially interdisciplinary. As a result, there is a growing demand for scholarly applications like collaborator discovery, expert finding and research recommendation systems, in addition to several others. This research paper investigates the current trends and identifies the existing challenges in the development of a big scholarly data platform, with a specific focus on directions for future research, and maps them to the different phases of the big data lifecycle.

104 citations


Journal ArticleDOI
TL;DR: In this article, a behavioral biometric signature-based authentication mechanism is proposed to ensure the security of e-medical data access in a cloud-based healthcare management system; it achieves a high accuracy rate for secure data access and retrieval.

63 citations


Posted Content
TL;DR: BAMHealthCloud, as discussed by the authors, is a cloud-based system for management of healthcare data that ensures security of data through biometric authentication. It has been developed after performing a detailed case study on the healthcare sector in a developing country. Training of the signature samples for authentication has been performed in parallel on the Hadoop MapReduce framework using a Resilient Backpropagation neural network.
Abstract: Advancements in the healthcare industry with new technology and population growth have given rise to security threats to our most personal data. The healthcare data management system consists of records in different formats such as text, numeric data, pictures and videos, leading to data which is big and unstructured. Also, hospitals have several branches at different locations throughout a country and overseas. In view of these requirements, a cloud-based healthcare management system can be an effective solution for efficient healthcare data management. One of the major concerns of a cloud-based healthcare system is the security aspect. It includes identity theft, tax fraud, insurance fraud, medical fraud and defamation of high-profile patients. Hence, secure data access and retrieval is needed in order to protect critical medical records in a healthcare management system. A biometric authentication mechanism is suitable in this scenario since it overcomes the limitations of token theft and forgotten passwords in the conventional token id-password mechanism used for providing security. It also has a high accuracy rate for secure data access and retrieval. In this paper we propose BAMHealthCloud, a cloud-based system for management of healthcare data that ensures security of data through biometric authentication. It has been developed after performing a detailed case study on the healthcare sector in a developing country. Training of the signature samples for authentication has been performed in parallel on the Hadoop MapReduce framework using a Resilient Backpropagation neural network. From rigorous experiments it can be concluded that it achieves a speedup of 9x, an equal error rate (EER) of 0.12, a sensitivity of 0.98 and a specificity of 0.95 compared to other approaches existing in the literature.
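The abstract reports training signature samples in parallel on Hadoop MapReduce with a Resilient Backpropagation (Rprop) neural network. For orientation only, the snippet below is a minimal NumPy sketch of the standard Rprop per-weight update rule; the network architecture, gradient computation, signature features and the MapReduce partitioning used in BAMHealthCloud are not described in the abstract, so everything here is an illustrative assumption.

```python
import numpy as np

# Minimal sketch of the Rprop (resilient backpropagation) per-weight update,
# shown only to illustrate the training rule named in the abstract; the actual
# network and its MapReduce-distributed training are not reproduced here.

def rprop_step(w, grad, prev_grad, step, eta_plus=1.2, eta_minus=0.5,
               step_min=1e-6, step_max=50.0):
    """One Rprop update: adapt each weight's step size from its gradient sign."""
    sign_change = grad * prev_grad
    step = np.where(sign_change > 0, np.minimum(step * eta_plus, step_max), step)
    step = np.where(sign_change < 0, np.maximum(step * eta_minus, step_min), step)
    grad = np.where(sign_change < 0, 0.0, grad)   # skip the update after a sign flip
    w = w - np.sign(grad) * step                  # move against the gradient sign only
    return w, grad, step
```

In the setting the abstract describes, an update like this would run inside each map task over its shard of signature samples, with the trained parameters combined afterwards.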

32 citations


Posted Content
TL;DR: This chapter surveys the big data concept, discusses the mathematical and data analytics techniques that can be used for big data, and gives a taxonomy of the existing tools, frameworks and platforms available for different big data computing models.
Abstract: The excessive amounts of data generated by devices and Internet-based sources on a regular basis constitute big data. This data can be processed and analyzed to develop useful applications for specific domains. Several mathematical and data analytics techniques have found use in this sphere. This has given rise to the development of computing models and tools for big data computing. However, the storage and processing requirements are overwhelming for traditional systems and technologies. Therefore, there is a need for infrastructures that can adjust their storage and processing capability in accordance with the changing data dimensions. Cloud computing serves as a potential solution to this problem. However, big data computing in the cloud has its own set of challenges and research issues. This chapter surveys the big data concept, discusses the mathematical and data analytics techniques that can be used for big data, and gives a taxonomy of the existing tools, frameworks and platforms available for different big data computing models. Besides this, it also evaluates the viability of cloud-based big data computing, examines existing challenges and opportunities, and provides future research directions in this field.

19 citations


Journal ArticleDOI
TL;DR: This paper proposes a framework that can analyze Twitter data and classify tweets on a specific subject to generate trends, and illustrates the use of the framework by analyzing tweets on the "Politics" domain as a subject.
Abstract: The increasing popularity of social networking sites like Facebook, Twitter, Google+ etc. is contributing to the fast proliferation of big data. Amongst social networking sites, Twitter is one of the most common sources of big data, where people from across the world share their views on various topics and subjects. With a daily active user count of over 100 million, Twitter is becoming a rich information source for finding trends and current happenings around the world. Twitter does provide a limited "trends" feature. To make Twitter trends more interesting and informative, in this paper we propose a framework that can analyze Twitter data and classify tweets on a specific subject to generate trends. We illustrate the use of the framework by analyzing tweets on the "Politics" domain as a subject. In order to classify tweets we propose a tweet classification algorithm that efficiently classifies them.
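The abstract proposes a tweet classification algorithm but does not spell it out here. As a stand-in only, the following minimal Python sketch classifies tweets for the "Politics" subject with a keyword lookup; the keyword list, labels and function name are hypothetical and are not the paper's algorithm.

```python
# Hypothetical keyword-based tweet classifier, used only to illustrate the idea
# of routing tweets into a subject such as "politics".

POLITICS_KEYWORDS = {"election", "parliament", "minister", "policy", "vote"}

def classify_tweet(text: str, keywords=POLITICS_KEYWORDS) -> str:
    """Label a tweet as 'politics' if it mentions any subject keyword."""
    tokens = {token.strip("#@.,!?").lower() for token in text.split()}
    return "politics" if tokens & keywords else "other"

tweets = [
    "The minister announced a new education policy today",
    "Loved the match last night!",
]
political_tweets = [t for t in tweets if classify_tweet(t) == "politics"]
print(political_tweets)
```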

18 citations


Proceedings ArticleDOI
05 May 2017
TL;DR: The prevalent Hadoop framework, Hive, NoSQL, NewSQL, MapReduce and HBase have been analyzed and compared for addressing the biggest challenge of Bigdata, i.e., data analytics.
Abstract: Recent technological advancements in typical domains (e.g. the internet, financial companies, health care, user-generated data, supply chain systems etc.) have led to an inundation of data from these domains. This data outburst has given meaning to the buzzword 'Bigdata'. Compared with traditional data, Bigdata exhibits some unique characteristics: it is commonly enormous and unstructured data that cannot be handled using traditional databases. Hence, new system designs are required for data collection, data transmission, storage, and large-scale data processing. The definition of Bigdata has been presented from many aspects in this paper. We analyzed the Bigdata system architecture and various challenges of Bigdata. The prevalent Hadoop framework, Hive, NoSQL, NewSQL, MapReduce and HBase have also been analyzed and compared for addressing the biggest challenge of Bigdata, i.e., data analytics.
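For readers unfamiliar with the MapReduce model the paper compares against Hive, HBase and the NoSQL/NewSQL stores, here is a minimal in-memory illustration in Python. Word counting is a generic stand-in workload, not an example taken from the paper, and a real Hadoop deployment would distribute the map and reduce phases across nodes.

```python
from collections import defaultdict
from itertools import chain

# Toy, single-process illustration of the map and reduce phases.

def mapper(line):
    # Emit (key, value) pairs: one count per word occurrence.
    return [(word.lower(), 1) for word in line.split()]

def reducer(pairs):
    # Sum the values for each key.
    counts = defaultdict(int)
    for word, one in pairs:
        counts[word] += one
    return dict(counts)

documents = ["big data needs new systems", "big data is unstructured"]
print(reducer(chain.from_iterable(mapper(d) for d in documents)))
```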

6 citations



Posted Content
TL;DR: A quality framework for higher education that evaluates the performance of institutions on the basis of the performance of outgoing students is proposed; the cloud-based big data technology BigQuery was used with R to perform the analytics.
Abstract: This research paper proposes a quality framework for higher education that evaluates the performance of institutions on the basis of the performance of outgoing students. Literature was surveyed to evaluate existing quality frameworks and develop a framework that provides insights on an unexplored dimension of quality. In order to implement and test the framework, the cloud-based big data technology BigQuery was used with R to perform analytics. It was found that how students fare after completing a course is an outcome of the educational process. This aspect can also be used as a quality metric for performance evaluation and management of educational organizations. However, it has not been taken into account in existing research. The lack of an integrated data collection system and of rich datasets for educational intelligence applications are some of the limitations that plague this area of research. Educational organizations are responsible for the performance of their students even after they complete their course. The inclusion of this dimension in quality assessment shall allow evaluation of educational institutions on these grounds. Assurance of this quality dimension shall boost enrolments in postgraduate and research degrees. Moreover, educational institutions will be motivated to groom students for placements or higher studies.
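The study ran its analytics with BigQuery and R. As an analogous illustration only, the snippet below issues a similar aggregate query through Google's official Python client (google-cloud-bigquery); the project, dataset, table and column names are hypothetical placeholders and not the study's actual schema, and running it requires configured Google Cloud credentials.

```python
from google.cloud import bigquery

# Hypothetical query over a graduate-outcomes table, to sketch how
# institution-level performance of outgoing students could be aggregated.
client = bigquery.Client(project="my-project")

QUERY = """
    SELECT institution, AVG(placement_salary) AS avg_outcome
    FROM `my-project.education.graduate_outcomes`
    GROUP BY institution
    ORDER BY avg_outcome DESC
"""

results = client.query(QUERY).to_dataframe()  # needs pandas and db-dtypes installed
print(results.head())
```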

4 citations


Journal ArticleDOI
TL;DR: E-GENMR provides interoperability as it takes queries written in various RDBMS dialects such as SQL Server, Oracle, DB2 and MySQL and converts them into MapReduce code, which is considered an efficient way of processing large data.
Abstract: Big Data, cloud computing and data science are the booming future of the IT industry. The common thread among all these new techniques is that they deal not just with data but with Big Data. Users store various kinds of data in cloud repositories, and a cloud database management system deals with these large sets of data. Cloud database service providers face many obstacles while providing their services; amongst all the challenges, processing of large amounts of data, interoperability and security are the major concerns explained in this study. Enhanced Generalized Query Processing through MapReduce (E-GENMR) is a prototype model that provides a solution to these problems. Firstly, traditional approaches are not suitable for processing such gigantic amounts of data, as they are unable to handle data at this scale. Various solutions have been developed, such as Hadoop, MapReduce programming, HIVE, PIG etc., but these technologies do not address all of these problems at the same time, and moreover users are not comfortable with the latest technologies like writing MapReduce code. E-GENMR provides interoperability as it takes queries written in various RDBMS dialects such as SQL Server, Oracle, DB2 and MySQL and converts them into MapReduce code, which is considered an efficient way of processing large data. Secondly, the client's data is stored in encrypted form and processing is done on this data, which addresses the security aspect. Indexing plays a very important role in processing queries; in E-GENMR, indexing is implemented using a closed double hashing technique. We compared the query processing times of E-GENMR for encrypted and unencrypted data. A comparison of various queries has been done to evaluate the performance of E-GENMR against recent techniques like HadoopDB, SQLMR, HIVE and PIG, and it has been concluded that E-GENMR shows better performance.
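The abstract states that E-GENMR builds its index with a closed (open-addressing) double hashing technique. Below is a generic Python sketch of such a table; E-GENMR's actual key format, table sizing and collision policy are not given in the abstract, so those details are assumptions.

```python
# Illustrative open-addressing ("closed") hash table with double hashing.
# The second hash determines the probe step, so colliding keys follow
# different probe sequences and clustering stays low.

class DoubleHashIndex:
    def __init__(self, size=101):                  # prime size keeps probes well spread
        self.size = size
        self.slots = [None] * size

    def _h1(self, key):
        return hash(key) % self.size               # primary slot

    def _h2(self, key):
        return 1 + (hash(key) % (self.size - 1))   # step in [1, size-1], never zero

    def put(self, key, value):
        h1, h2 = self._h1(key), self._h2(key)
        for i in range(self.size):
            idx = (h1 + i * h2) % self.size
            if self.slots[idx] is None or self.slots[idx][0] == key:
                self.slots[idx] = (key, value)
                return
        raise RuntimeError("index full")

    def get(self, key):
        h1, h2 = self._h1(key), self._h2(key)
        for i in range(self.size):
            idx = (h1 + i * h2) % self.size
            if self.slots[idx] is None:
                return None
            if self.slots[idx][0] == key:
                return self.slots[idx][1]
        return None
```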

3 citations


Posted Content
TL;DR: This research investigates the efficacy of workflow-based big data analytics in the cloud environment, giving insights into the research already performed in the area and possible future research directions in the field.
Abstract: Workflow is a common term used to describe a systematic breakdown of the tasks that need to be performed to solve a problem. This concept has found its best use in scientific and business applications for streamlining and improving the performance of the underlying processes targeted towards achieving an outcome. The growing complexity of big data analytical problems has invited the use of scientific workflows for performing complex tasks in specific domain applications. This research investigates the efficacy of workflow-based big data analytics in the cloud environment, giving insights into the research already performed in the area and possible future research directions in the field.

3 citations


Posted Content
TL;DR: In this paper, efficient management of the data generated by an electron cryo-microscopy (cryoEM) lab in a cloud-based environment was tested, showing that data on the order of terabytes could be efficiently reduced to its minimal essential self in a cost-effective, scalable manner.
Abstract: Cloud computing is a cost-effective way for start-up life sciences laboratories to store and manage their data. However, in many instances the data stored in the cloud can be redundant, which makes cloud-based data management inefficient and costly because one has to pay for every byte of data stored in the cloud. Here, we tested efficient management of the data generated by an electron cryo-microscopy (cryoEM) lab in a cloud-based environment. The test data was obtained from the cryoEM repository EMPIAR. All the images were subjected to an in-house parallelized version of principal component analysis. An efficient cloud-based MapReduce modality was used for parallelization. We showed that data on the order of terabytes could be efficiently reduced to its minimal essential self in a cost-effective, scalable manner. Furthermore, spot instances on Amazon EC2 were shown to reduce costs by a margin of about 27 percent. This approach can be scaled to data of any large volume and type.
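The abstract describes reducing terabytes of cryoEM images with an in-house parallelized principal component analysis run as a cloud MapReduce job. The serial NumPy sketch below shows only the core PCA projection step; the image dimensions, component count and the MapReduce/EC2 parallelization are assumptions or omitted entirely.

```python
import numpy as np

# Serial sketch of PCA-based dimensionality reduction over a stack of images:
# flatten, center, take the top right singular vectors, and project onto them.

def pca_reduce(images: np.ndarray, n_components: int = 50) -> np.ndarray:
    """Project flattened images onto their top principal components."""
    flat = images.reshape(images.shape[0], -1).astype(np.float64)
    centered = flat - flat.mean(axis=0)
    # Thin SVD: rows of vt are the principal axes of the centered data.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

# e.g. 200 hypothetical 64x64-pixel images reduced to 50 coefficients each
reduced = pca_reduce(np.random.rand(200, 64, 64))
print(reduced.shape)  # (200, 50)
```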

Journal ArticleDOI
TL;DR: A novel framework for doing analysis on Big Data is proposed and its implementation is shown by creating a "Twitter Mart", a compilation of subject-specific tweets that addresses some of the challenges for industries engaged in analyzing subject-specific data.
Abstract: The contemporary era of technological quest is buzzing with two words: Big Data and Cloud Computing. Digital data is growing rapidly from gigabytes (GBs) and terabytes (TBs) to petabytes (PBs), thereby burgeoning data management challenges. Social networking sites like Twitter, Facebook, Google+ etc. generate huge data chunks on a daily basis. Among them, Twitter stands as the largest source of publicly available mammoth data chunks intended for various objectives of research and development. In order to further research in this fast-emerging area of managing Big Data, we propose a novel framework for doing analysis on Big Data and show its implementation by creating a "Twitter Mart", a compilation of subject-specific tweets that addresses some of the challenges for industries engaged in analyzing subject-specific data. In this paper, we present algorithms and a holistic model that aid in stockpiling and retrieving data effectively and efficiently.
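The "Twitter Mart" is described as a compilation of subject-specific tweets. As a purely illustrative sketch (the paper's own stockpiling and retrieval algorithms are not reproduced in the abstract), the snippet below groups already-classified tweets by subject so they can be retrieved per topic.

```python
from collections import defaultdict

# Hypothetical in-memory "tweet mart": tweets grouped by subject for retrieval.

def build_tweet_mart(classified_tweets):
    """classified_tweets: iterable of (subject, tweet_text) pairs."""
    mart = defaultdict(list)
    for subject, text in classified_tweets:
        mart[subject].append(text)
    return mart

mart = build_tweet_mart([
    ("politics", "New policy announced by the ministry"),
    ("sports", "What a final over!"),
    ("politics", "Parliament session begins today"),
])
print(mart["politics"])
```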

Posted Content
TL;DR: A five-layer model for cloud-based big data analytics that uses dew computing and edge computing concepts is proposed, along with an approach for creating a custom big data stack by selecting technologies on the basis of the identified data and computing models for the application.
Abstract: Big data analytics has gathered immense research attention lately because of its ability to harness useful information from heaps of data. Cloud computing has been adjudged one of the best infrastructural solutions for the implementation of big data analytics. This research paper proposes a five-layer model for cloud-based big data analytics that uses dew computing and edge computing concepts. Besides this, the paper also presents an approach for the creation of a custom big data stack by selecting technologies on the basis of the identified data and computing models for the application.