scispace - formally typeset
Search or ask a question
Author

Xiaogang Ma

Bio: Xiaogang Ma is an academic researcher from University of Idaho. The author has contributed to research in topics: Computer science & Ontology (information science). The author has an hindex of 16, co-authored 83 publications receiving 711 citations. Previous affiliations of Xiaogang Ma include China University of Geosciences (Wuhan) & University of Twente.


Papers
More filters
Journal ArticleDOI
TL;DR: A workflow and a few empirical case studies for Chinese word segmentation rules of the Conditional Random Fields model are presented, and the potential of leveraging natural language processing and knowledge graph technologies for geoscience is shown.

125 citations

Journal ArticleDOI
TL;DR: In this paper , the authors present work led by the NASA Earth Science Data Systems Working Groups and ESIP machine learning cluster to give a comprehensive overview of AI in Earth sciences and provide all the levels of AI practitioners in geosciences with an overall big picture.

54 citations

Journal ArticleDOI
TL;DR: This work proposes a locality-based MPS approach to reconstruct 3-D geological models on the basis of such 2-D cross sections (3DRCS), making3-D training images unnecessary and shows better performance in portraying anisotropy characteristics and in CPU cost.
Abstract: . Multiple-point statistics (MPS) has shown promise in representing complicated subsurface structures. For a practical three-dimensional (3-D) application, however, one of the critical issues is the difficulty in obtaining a credible 3-D training image. However, bidimensional (2-D) training images are often available because established workflows exist to derive 2-D sections from scattered boreholes and/or other samples. In this work, we propose a locality-based MPS approach to reconstruct 3-D geological models on the basis of such 2-D cross sections (3DRCS), making 3-D training images unnecessary. Only several local training subsections closer to the central uninformed node are used in the MPS simulation. The main advantages of this partitioned search strategy are the high computational efficiency and a relaxation of the stationarity assumption. We embed this strategy into a standard MPS framework. Two probability aggregation formulas and their combinations are used to assemble the probability density functions (PDFs) from different subsections. Moreover, a novel strategy is adopted to capture more stable PDFs, where the distances between patterns and flexible neighborhoods are integrated on multiple grids. A series of sensitivity analyses demonstrate the stability of the proposed approach. Several hydrogeological 3-D application examples illustrate the applicability of the 3DRCS approach in reproducing complex geological features. The results, in comparison with previous MPS methods, show better performance in portraying anisotropy characteristics and in CPU cost.

50 citations

Proceedings ArticleDOI
Han Wang1, Jin Guang Zheng1, Xiaogang Ma1, Peter Fox1, Heng Ji1 
01 Jan 2015
TL;DR: A novel unsupervised algorithm named Quantified Collective Validation is presented that avoids excessive linguistic analysis on the source documents and fully leverages the knowledge base structure for the entity linking task.
Abstract: Linking named mentions detected in a source document to an existing knowledge base provides disambiguated entity referents for the mentions. This allows better document analysis, knowledge extraction and knowledge base population. Most of the previous research extensively exploited the linguistic features of the source documents in a supervised or semi-supervised way. These systems therefore cannot be easily applied to a new language or domain. In this paper, we present a novel unsupervised algorithm named Quantified Collective Validation that avoids excessive linguistic analysis on the source documents and fully leverages the knowledge base structure for the entity linking task. We show our approach achieves stateof-the-art English entity linking performance and demonstrate successful deployment in a new language (Chinese) and two new domains (Biomedical and Earth Science). Experiment datasets and system demonstration are available at http://tw.rpi.edu/web/doc/ hanwang_emnlp_2015 for research purpose.

45 citations

Journal ArticleDOI
29 Dec 2020
TL;DR: In this paper, the authors present a literature review of local outlier detection algorithms in static and stream environments, with an emphasis on Local Outlier Factor (LOF), a density-based technique.
Abstract: Outlier detection is a statistical procedure that aims to find suspicious events or items that are different from the normal form of a dataset. It has drawn considerable interest in the field of data mining and machine learning. Outlier detection is important in many applications, including fraud detection in credit card transactions and network intrusion detection. There are two general types of outlier detection: global and local. Global outliers fall outside the normal range for an entire dataset, whereas local outliers may fall within the normal range for the entire dataset, but outside the normal range for the surrounding data points. This paper addresses local outlier detection. The best-known technique for local outlier detection is the Local Outlier Factor (LOF), a density-based technique. There are many LOF algorithms for a static data environment; however, these algorithms cannot be applied directly to data streams, which are an important type of big data. In general, local outlier detection algorithms for data streams are still deficient and better algorithms need to be developed that can effectively analyze the high velocity of data streams to detect local outliers. This paper presents a literature review of local outlier detection algorithms in static and stream environments, with an emphasis on LOF algorithms. It collects and categorizes existing local outlier detection algorithms and analyzes their characteristics. Furthermore, the paper discusses the advantages and limitations of those algorithms and proposes several promising directions for developing improved local outlier detection methods for data streams.

44 citations


Cited by
More filters
01 Mar 1999

3,234 citations

Posted Content
TL;DR: Analyzing this data to find the subtle effects missed by previous studies requires algorithms that can simultaneously deal with huge datasets and that can find very subtle effects --- finding both needles in the haystack and finding very small haystacks that were undetected in previous measurements.
Abstract: This is a thought piece on data-intensive science requirements for databases and science centers. It argues that peta-scale datasets will be housed by science centers that provide substantial storage and processing for scientists who access the data via smart notebooks. Next-generation science instruments and simulations will generate these peta-scale datasets. The need to publish and share data and the need for generic analysis and visualization tools will finally create a convergence on common metadata standards. Database systems will be judged by their support of these metadata standards and by their ability to manage and access peta-scale datasets. The procedural stream-of-bytes-file-centric approach to data analysis is both too cumbersome and too serial for such large datasets. Non-procedural query and analysis of schematized self-describing data is both easier to use and allows much more parallelism.

476 citations

Journal Article
TL;DR: In this article, the authors proposed a measure on local outliers based on a symmetric neighborhood relationship, which considers both neighbors and reverse neighbors of an object when estimating its density distribution.
Abstract: Mining outliers in database is to find exceptional objects that deviate from the rest of the data set. Besides classical outlier analysis algorithms, recent studies have focused on mining local outliers, i.e., the outliers that have density distribution significantly different from their neighborhood. The estimation of density distribution at the location of an object has so far been based on the density distribution of its k-nearest neighbors [2,11]. However, when outliers are in the location where the density distributions in the neighborhood are significantly different, for example, in the case of objects from a sparse cluster close to a denser cluster, this may result in wrong estimation. To avoid this problem, here we propose a simple but effective measure on local outliers based on a symmetric neighborhood relationship. The proposed measure considers both neighbors and reverse neighbors of an object when estimating its density distribution. As a result, outliers so discovered are more meaningful. To compute such local outliers efficiently, several mining algorithms are developed that detects top-n outliers based on our definition. A comprehensive performance evaluation and analysis shows that our methods are not only efficient in the computation but also more effective in ranking outliers.

321 citations

01 Jan 2016
TL;DR: The foundations of the theory of signs is universally compatible with any devices to read and is available in the book collection an online access to it is set as public so you can get it instantly.
Abstract: Thank you very much for downloading foundations of the theory of signs. Maybe you have knowledge that, people have look numerous times for their chosen readings like this foundations of the theory of signs, but end up in infectious downloads. Rather than enjoying a good book with a cup of tea in the afternoon, instead they are facing with some infectious virus inside their computer. foundations of the theory of signs is available in our book collection an online access to it is set as public so you can get it instantly. Our books collection hosts in multiple locations, allowing you to get the most less latency time to download any of our books like this one. Kindly say, the foundations of the theory of signs is universally compatible with any devices to read.

283 citations