Author

M. Varalakshmi

Bio: M. Varalakshmi is an academic researcher from VIT University. The author has contributed to research in topics: Embarrassingly parallel & Atmospheric model. The author has an h-index of 1, co-authored 4 publications receiving 3 citations.

Papers
Journal ArticleDOI
TL;DR: A two-bit geohash coding algorithm divides the search space into four equal partitions, each assigned a two-bit label (00, 01, 10, or 11), which uniquely identifies a chosen data point and the two neighbors on either side of it along a particular dimension, simplifying geohash code generation for neighboring grid cells.
Abstract: Insights from geohash coding algorithms introduce significant opportunities for various spatial applications. However, these algorithms require massive storage, complex bit manipulation, and extensive code modification when scaled to higher dimensions. In this article, we have developed a two-bit geohash coding algorithm that divides the search space into four equal partitions, where each partition is assigned a two-bit label (00, 01, 10, or 11). This labeling uniquely identifies a chosen data point and the two neighbors on either side of it along a particular dimension, a salient feature that simplifies the generation of geohash codes for the neighboring grid cells. In addition, the algorithm achieves efficient memory utilization by storing the geohash values of the training points as integers. Demonstrated by experiments for climate data assimilation, model-to-observation space mapping with a geohash code length of 24 bits for the Lat-Lon extent of India has shown favorable results with an accuracy of 85%. Performance and scalability evaluation of the proposed algorithm, optimized for multicore and many-core processors, has shown significant speedups, outperforming a tree-based approach. This algorithm provides a foundation for new spatial statistical methods that can be used for pattern discovery and detection in spatial big data.
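To make the partitioning concrete, the sketch below (Python, not the authors' implementation) encodes a coordinate as a sequence of two-bit labels: at each refinement level the current interval along one dimension is split into four equal partitions labeled 00, 01, 10, and 11, and the labels are packed into a plain integer, mirroring the integer storage described in the abstract. The India extent bounds and the 12-bits-per-dimension split of the 24-bit code are illustrative assumptions.

```python
# A minimal sketch (not the authors' code) of two-bit-per-level geohash
# encoding: each level splits the current interval into four equal
# partitions labeled 0..3 (i.e., 00/01/10/11) and appends the label.
def geohash_2bit(value, lo, hi, levels):
    """Encode `value` in [lo, hi) as `levels` two-bit labels packed in an int."""
    code = 0
    for _ in range(levels):
        width = (hi - lo) / 4.0
        part = min(int((value - lo) / width), 3)  # partition label 0..3
        code = (code << 2) | part                 # append the two-bit label
        lo = lo + part * width                    # descend into that partition
        hi = lo + width
    return code

# Assumed split: 6 levels x 2 bits = 12 bits per dimension -> 24-bit code.
lat_code = geohash_2bit(13.08, 6.0, 38.0, levels=6)   # approx. India latitude extent
lon_code = geohash_2bit(80.27, 68.0, 98.0, levels=6)  # approx. India longitude extent
combined = (lat_code << 12) | lon_code
print(f"{combined:024b}")
```

Under this non-interleaved packing, each dimension's labels form a contiguous base-4 number, so adding or subtracting one from a dimension's sub-code yields the codes of the two neighboring cells along that dimension, which is consistent with the simplified neighbor-cell generation the abstract highlights.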

3 citations

Journal ArticleDOI
01 Jan 2018
TL;DR: An out-of-core CUDA implementation of an iterative matrix inversion method is presented, showing significant speedups over a modified Gauss-Jordan approach without sacrificing accuracy.
Abstract: Attempts to harness the big climate data that come from high-resolution model output and advanced sensors to provide more accurate and rapidly-updated weather prediction call for innovations in the existing data assimilation systems. Matrix inversion is a key operation in a majority of data assimilation techniques. Hence, this article presents an out-of-core CUDA implementation of an iterative method of matrix inversion. The results show significant speedup even for square matrices of size 1024 × 1024 and larger, without sacrificing the accuracy of the results. In a similar test environment, a comparison of this approach with a direct method such as the Gauss-Jordan approach, modified to process large matrices that cannot be processed within a single kernel call, shows that the former is twice as efficient as the latter. This acceleration is attributed to the division-free design and the embarrassingly parallel nature of every sub-task of the algorithm. The parallel algorithm has been designed to be highly scalable when implemented with multiple GPUs for handling large matrices.

Keywords: Big Climate Data, Convergence Rate, GPU, Iterative Method, Matrix Type Identification, Numerical Weather Prediction, Parallel Matrix Inverse, Parallel Reduction
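The abstract does not name the iterative method, so as a hedged illustration the sketch below uses the classic Newton-Schulz (Hotelling-Bodewig) iteration, a standard division-free scheme built entirely from matrix multiplications, which is the property the article credits for the GPU speedup. This is a CPU-side NumPy sketch; an out-of-core CUDA version would tile these products across kernel calls.

```python
# Division-free iterative matrix inversion (Newton-Schulz), shown in NumPy
# as a stand-in for the paper's unnamed iterative method.
import numpy as np

def newton_schulz_inverse(A, tol=1e-10, max_iter=100):
    """Approximate inv(A) using only matrix multiplications."""
    n = A.shape[0]
    # Standard convergent initial guess: X0 = A^T / (||A||_1 * ||A||_inf).
    X = A.T / (np.linalg.norm(A, 1) * np.linalg.norm(A, np.inf))
    I = np.eye(n)
    for _ in range(max_iter):
        X_next = X @ (2.0 * I - A @ X)  # division-free update: two GEMMs
        if np.linalg.norm(X_next - X, 'fro') < tol:
            return X_next
        X = X_next
    return X

A = np.random.rand(1024, 1024) + 1024 * np.eye(1024)  # well-conditioned test matrix
X = newton_schulz_inverse(A)
print(np.linalg.norm(A @ X - np.eye(1024)))  # residual should be tiny
```

Each update is just two matrix products, so every sub-task is embarrassingly parallel and maps naturally onto one or more GPUs, in line with the scalability claim in the abstract.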

3 citations

Journal Article
01 Jan 2015
TL;DR: A comparative study of various parallel programming models for a compute-intensive application in atmospheric modeling, observing that creating more software threads than the available hardware threads yields no additional benefit, as the execution resources must be shared among them.
Abstract: This study aims at making a comparative study of various parallel programming models for a compute-intensive application pertaining to atmospheric modeling. Atmospheric modeling deals with predicting the behavior of the atmosphere through mathematical equations governing atmospheric fluid flows. These equations are nonlinear partial differential equations that are difficult to solve analytically. Thus, the fundamental governing equations of atmospheric motion are discretized into algebraic forms that are solved using numerical methods to obtain flow-field values at discrete points in time and/or space. Solving these equations often requires huge computational resources, normally available only on high-speed supercomputers. The shallow water equations provide a useful framework for analyzing the dynamics of large-scale atmospheric flow and the various numerical methods that might be applied to their solution. In this study, a finite volume approach has been used to discretize these equations, leading to a number of algebraic equations equal to the number of time instants at which the flow-field values are to be evaluated. The application is embarrassingly parallel, and its parallelization suppresses communication overhead. A high-performance compute cluster has been employed for solving the equations involved in atmospheric modeling. Using the OpenMP and MPI APIs has made it possible to study the behavior of the shared memory and message passing programming models in the context of such a highly compute-intensive application. It is observed that creating more software threads than the available hardware threads yields no additional benefit, as the execution resources must be shared among them.
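For concreteness, the sketch below (Python/NumPy, not the study's code) performs one finite-volume update of the 1D shallow water equations with a Lax-Friedrichs flux; the domain, grid size, and time step are illustrative assumptions. Each cell's update reads only its immediate neighbors, which is the per-cell data parallelism the study maps onto OpenMP threads and MPI ranks.

```python
# One Lax-Friedrichs finite-volume step for the 1D shallow water equations;
# a simplified stand-in for the study's discretization.
import numpy as np

g = 9.81  # gravitational acceleration

def step(h, hu, dx, dt):
    """Update conserved variables (h, hu) on a periodic grid."""
    U = np.vstack([h, hu])
    F = np.vstack([hu, hu**2 / h + 0.5 * g * h**2])  # physical flux
    # In the OpenMP version this per-cell update is the parallel-for region;
    # in MPI each rank owns a contiguous block of cells plus halo cells.
    Ul, Ur = np.roll(U, 1, axis=1), np.roll(U, -1, axis=1)
    Fl, Fr = np.roll(F, 1, axis=1), np.roll(F, -1, axis=1)
    U_new = 0.5 * (Ul + Ur) - dt / (2.0 * dx) * (Fr - Fl)
    return U_new[0], U_new[1]

# Dam-break style initial condition on a periodic domain.
x = np.linspace(0.0, 1.0, 400)
h = np.where(x < 0.5, 2.0, 1.0)
hu = np.zeros_like(x)
for _ in range(200):
    h, hu = step(h, hu, dx=x[1] - x[0], dt=0.0005)
```

Once every hardware thread owns a block of cells, spawning additional software threads only adds scheduling contention, which matches the study's observation about thread oversubscription.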

Cited by
Journal ArticleDOI
TL;DR: In this article, a bibliometric analysis of 247 published papers on predicting infectious diseases in Africa, published in the Web of Science core collection databases, is presented in order to reveal the research trends, gaps, and hotspots in predicting Africa's infectious diseases using bibliometric tools.
Abstract: Africa has a long history of novel and re-emerging infectious disease outbreaks. This reality has attracted the attention of researchers interested in the general research theme of predicting infectious diseases. However, a knowledge mapping analysis of literature to reveal the research trends, gaps, and hotspots in predicting Africa’s infectious diseases using bibliometric tools has not been conducted. A bibliometric analysis of 247 published papers on predicting infectious diseases in Africa, published in the Web of Science core collection databases, is presented in this study. The results indicate that the severe outbreaks of infectious diseases in Africa have increased scientific publications during the past decade. The results also reveal that African researchers are highly underrepresented in these publications and that the United States of America (USA) is the most productive and collaborative country. The relevant hotspots in this research field include malaria, models, classification, associations, COVID-19, and cost-effectiveness. Furthermore, weather-based prediction using meteorological factors is an emerging theme, and very few studies have used the fourth industrial revolution (4IR) technologies. Therefore, there is a need to explore 4IR predicting tools such as machine learning and consider integrated approaches that are pivotal to developing robust prediction systems for infectious diseases, especially in Africa. This review paper provides a useful resource for researchers, practitioners, and research funding agencies interested in the research theme—the prediction of infectious diseases in Africa—by capturing the current research hotspots and trends.

7 citations

Journal ArticleDOI
TL;DR: A new QRB-tree index with two levels of optimization is proposed: a bucket introduced in every grid accelerates the loading and search steps for grids within the query region, and candidate elimination is limited to features registered to the grids covering the corners of the query region.
Abstract: Support for region queries is crucial in geographic information systems, which process exact queries through spatial indexing to filter features and subsequently refine the selection. Although the filtering step has been extensively studied, the refinement step has received little attention. This research builds upon the QR-tree index, which decomposes space into hierarchical grids, registers features to the grids, and builds an R-tree for each grid, to develop a new QRB-tree index with two levels of optimization. In the first level, a bucket is introduced in every grid in the QR-tree index to accelerate the loading and search steps of a query region for the grids within the query region. In the second level, the number of candidate features to be eliminated is reduced by limiting the features to those registered to the grids covering the corners of the query region. Subsequently, an approach for determining the maximal grid level, which significantly affects the performance of the QR-tree index, is proposed. Direct comparisons of time costs with the QR-tree index and geohash index show that the QRB-tree index outperforms the other two approaches for rough queries in large query regions and exact queries in all cases.
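As a simplified illustration of the filter-and-refine pattern the QRB-tree optimizes, the single-level grid sketch below (Python, with an assumed cell size; not the QRB-tree itself) accepts features in cells fully covered by the query region without exact tests and confines the costly refinement step to boundary cells, in the spirit of the paper's corner-grid optimization.

```python
# Single-level grid filter-and-refine for rectangular region queries;
# CELL is an assumed resolution, not a value from the paper.
from collections import defaultdict

CELL = 1.0

def cell_of(x, y):
    return (int(x // CELL), int(y // CELL))

def build_index(points):
    index = defaultdict(list)
    for p in points:
        index[cell_of(*p)].append(p)  # register each feature to its grid cell
    return index

def region_query(index, xmin, ymin, xmax, ymax):
    hits = []
    for cx in range(int(xmin // CELL), int(xmax // CELL) + 1):
        for cy in range(int(ymin // CELL), int(ymax // CELL) + 1):
            fully_inside = (cx * CELL >= xmin and (cx + 1) * CELL <= xmax and
                            cy * CELL >= ymin and (cy + 1) * CELL <= ymax)
            for (x, y) in index.get((cx, cy), ()):
                # Exact refinement test is needed only for boundary cells.
                if fully_inside or (xmin <= x <= xmax and ymin <= y <= ymax):
                    hits.append((x, y))
    return hits

idx = build_index([(0.5, 0.5), (2.3, 3.7), (5.1, 1.2)])
print(region_query(idx, 0.0, 0.0, 3.0, 4.0))  # -> [(0.5, 0.5), (2.3, 3.7)]
```

The QRB-tree refines this picture with hierarchical grids, per-grid R-trees and buckets, and the corner-grid restriction on candidate elimination described above.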

3 citations

Journal ArticleDOI
TL;DR: In this article, a scale-elastic grid structure (SEGS) for 3D discrete grid systems is proposed to satisfy the anisotropic scale requirements of voxels in three dimensions.

2 citations

Posted Content
TL;DR: In this article, a new risk measure that directly quantifies a model's tendency to overfit is proposed; it is a good proxy for the true generalization risk and can be used to concomitantly train the model and calibrate the hyperparameters.
Abstract: Generalization is a central problem in Machine Learning. Indeed, most prediction methods require careful calibration of hyperparameters, usually carried out on a hold-out validation dataset, to achieve generalization. The main goal of this paper is to introduce a novel approach to achieve generalization without any data splitting, based on a new risk measure which directly quantifies a model's tendency to overfit. To fully understand the intuition and advantages of this new approach, we illustrate it in the simple linear regression model ($Y = X\beta + \xi$), where we develop a new criterion. We highlight how this criterion is a good proxy for the true generalization risk. Next, we derive different procedures which tackle several structures simultaneously (correlation, sparsity, ...). Noticeably, these procedures concomitantly train the model and calibrate the hyperparameters. In addition, these procedures can be implemented via classical gradient descent methods when the criterion is differentiable w.r.t. the hyperparameters. Our numerical experiments reveal that our procedures are computationally feasible and compare favorably to the popular approach (Ridge, LASSO, and Elastic-Net combined with grid-search cross-validation) in terms of generalization. They also outperform the baseline on two additional tasks: estimation and support recovery of $\beta$. Moreover, our procedures do not require any expertise for the calibration of the initial parameters, which remain the same for all the datasets we experimented on.
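The abstract does not spell out the new risk measure, but the enabling mechanism it describes, gradient descent on a criterion that is differentiable w.r.t. the hyperparameters, can be sketched. Below, generalized cross-validation (GCV) stands in as a split-free differentiable criterion purely for illustration; it is not the paper's risk measure.

```python
# Hyperparameter calibration by gradient descent on a differentiable,
# split-free criterion (GCV here as a stand-in, NOT the paper's measure).
import torch

torch.manual_seed(0)
X = torch.randn(100, 10)
y = X @ torch.randn(10) + 0.1 * torch.randn(100)

log_lam = torch.zeros((), requires_grad=True)  # optimize log(lambda), keeps lambda > 0
opt = torch.optim.SGD([log_lam], lr=0.5)

for _ in range(100):
    lam = log_lam.exp()
    # Closed-form ridge hat matrix, differentiable w.r.t. lambda.
    H = X @ torch.linalg.solve(X.T @ X + lam * torch.eye(10), X.T)
    resid = y - H @ y
    gcv = X.shape[0] * (resid @ resid) / (X.shape[0] - torch.trace(H)) ** 2
    opt.zero_grad()
    gcv.backward()
    opt.step()

print("calibrated lambda:", log_lam.exp().item())
```

The same pattern extends to training the model parameters concomitantly with the hyperparameters, as the abstract describes, by adding them to the optimizer's parameter list.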

1 citation

Posted Content
TL;DR: This article proposes Muddling labels for Regularization (MLR), which penalizes memorization through the generation of uninformative labels and the application of a differentiable closed-form regularization scheme on the last hidden layer during training.
Abstract: Deep Learning (DL) is considered the state-of-the-art in computer vision, speech recognition, and natural language processing. Until recently, it was also widely accepted that DL is irrelevant for learning tasks on tabular data, especially in the small sample regime, where ensemble methods are acknowledged as the gold standard. We present a new end-to-end differentiable method to train a standard FFNN. Our method, Muddling labels for Regularization (MLR), penalizes memorization through the generation of uninformative labels and the application of a differentiable closed-form regularization scheme on the last hidden layer during training. MLR outperforms classical NN and the gold standard (GBDT, RF) for regression and classification tasks on several datasets from the UCI database and Kaggle, covering a large range of sample sizes and feature-to-sample ratios. Researchers and practitioners can use MLR on its own as an off-the-shelf DL solution or integrate it into the most advanced ML pipelines.
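The abstract gives enough of the mechanism for a speculative sketch: a differentiable closed-form ridge head on the last hidden layer, with randomly permuted labels standing in for the "uninformative labels". The exact loss, permutation scheme, and ridge strength below are guesses for illustration, not the paper's definitions.

```python
# A speculative reading of the MLR idea: reward fitting true labels, penalize
# the capacity to fit permuted (uninformative) labels via the same ridge head.
import torch

def ridge_head(H, y, lam=1.0):
    """Differentiable closed-form ridge fit on last-hidden-layer features H."""
    d = H.shape[1]
    beta = torch.linalg.solve(H.T @ H + lam * torch.eye(d), H.T @ y)
    return H @ beta

def mlr_style_loss(H, y, lam=1.0):
    y_perm = y[torch.randperm(y.shape[0])]  # uninformative labels
    fit_true = torch.mean((ridge_head(H, y, lam) - y) ** 2)
    fit_perm = torch.mean((ridge_head(H, y_perm, lam) - y_perm) ** 2)
    # Minimizing this rewards fitting real targets while penalizing the
    # network's ability to memorize noise.
    return fit_true - fit_perm

# Toy usage: H would be the FFNN's last hidden activations during training.
H = torch.randn(64, 16, requires_grad=True)
loss = mlr_style_loss(H, torch.randn(64))
loss.backward()
```

Because the ridge fit is closed-form, the whole loss stays end-to-end differentiable, which is what lets MLR train a standard FFNN with ordinary gradient descent.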