Author

Tianyu Cao

Other affiliations: University of Vermont

Bio: Tianyu Cao is an academic researcher at Amazon.com whose research topics include Computer science and Maximization. The author has an h-index of 6 and has co-authored 14 publications receiving 170 citations. Previous affiliations of Tianyu Cao include the University of Vermont.

Papers

Journal Article

Subkilometer crater discovery with boosting and transfer learning

Wei Ding¹, Tomasz F. Stepinski², Yang Mu¹, Lourenço Bandeira³, Ricardo Ricardo⁴, Youxi Wu⁵, Zhenyu Lu⁵, Tianyu Cao⁵, Xindong Wu⁵

University of Massachusetts Boston¹, University of Cincinnati², Instituto Superior Técnico³, University of Houston⁴, University of Vermont⁵

15 Jul 2011, ACM Transactions on Intelligent Systems and Technology

TL;DR: An integrated framework for the automatic detection of subkilometer craters, combining boosting and transfer learning, achieves an F1 score above 0.85, a significant improvement over other crater detection algorithms.

Abstract: Counting craters in remotely sensed images is the only tool that provides relative dating of remote planetary surfaces. Surveying craters requires counting a large number of small subkilometer craters, which calls for highly efficient automatic crater detection. In this article, we present an integrated framework for the automatic detection of subkilometer craters with boosting and transfer learning. The framework contains three key components. First, we use mathematical morphology to efficiently identify crater candidates, the regions of an image that can potentially contain craters. Only these regions, which occupy relatively small portions of the original image, are processed further. Second, we extract and select image texture features and combine them with supervised boosting ensemble learning algorithms to accurately classify crater candidates into craters and noncraters. Third, we integrate transfer learning into boosting to enhance detection performance in regions where the surface morphology differs from that characterized by the training set. Our framework is evaluated on a large test image of 37,500 m × 56,250 m on Mars, which exhibits a heavily cratered Martian terrain characterized by nonuniform surface morphology. Empirical studies demonstrate that the proposed crater detection framework can achieve an F1 score above 0.85, a significant improvement over other crater detection algorithms.
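The F1 score quoted above is the harmonic mean of precision and recall over detected craters. A minimal Python sketch of the metric, assuming detections have already been matched to ground-truth craters and summarized as true-positive, false-positive, and false-negative counts (the counts and the name `f1_score` are illustrative, not the paper's data or code):

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall, computed from detection counts."""
    if tp == 0:
        return 0.0  # no correct detections: precision and recall are both 0 (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Illustrative counts: 90 correct detections, 10 false alarms, 20 missed craters.
print(round(f1_score(90, 10, 20), 4))  # 0.8571
```

Equivalently, F1 = 2TP / (2TP + FP + FN); for the counts above that is 180/210, just above the 0.85 level mentioned in the abstract.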

62 citations

Proceedings Article

OASNET: an optimal allocation approach to influence maximization in modular social networks

Tianyu Cao¹, Xindong Wu², Song Wang¹, Xiaohua Hu³

University of Vermont¹, Hefei University of Technology², Drexel University³

22 Mar 2010

TL;DR: It is proved that finding an optimal allocation in a modular social network is NP-hard and a new optimal dynamic programming algorithm is proposed to solve the problem, which is named OASNET (Optimal Allocation in a Social NETwork).

Abstract: Influence maximization in a social network is the problem of targeting a given number of nodes such that the expected number of nodes activated by them is maximized. A social network usually exhibits some degree of modularity. Previous research efforts that made use of this topological property are restricted to random networks with two communities. In this paper, we first transform the influence maximization problem in a modular social network into an optimal resource allocation problem in the same network. We assume that the communities of the social network are disconnected. We then propose a recursive relation for finding such an optimal allocation. We prove that finding an optimal allocation in a modular social network is NP-hard and propose a new optimal dynamic programming algorithm to solve the problem, which we name OASNET (Optimal Allocation in a Social NETwork). We compare OASNET with equal allocation, proportional allocation, random allocation, and selecting the top-degree nodes without any allocation strategy on both synthetic and real-world datasets. Experimental results show that OASNET outperforms these four heuristics.
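The abstract reduces influence maximization over disconnected communities to allocating a seed budget across communities. The sketch below shows a textbook dynamic program for that kind of allocation, assuming each community's expected spread for 0, 1, 2, … seeds has already been estimated and is passed in as a table (`spread` and `allocate_seeds` are hypothetical names). It is not the published OASNET recurrence, and estimating the per-community spread tables, the expensive part, is not shown.

```python
from typing import List, Tuple


def allocate_seeds(spread: List[List[float]], k: int) -> Tuple[float, List[int]]:
    """Split at most k seeds across disconnected communities.

    spread[i][x] is the (assumed, precomputed) expected spread inside
    community i when it receives x seeds; spread[i][0] is normally 0.
    Returns the best total spread and the seeds given to each community.
    """
    m = len(spread)
    # best[i][b]: best total spread from the first i communities with at most b seeds.
    best = [[0.0] * (k + 1) for _ in range(m + 1)]
    choice = [[0] * (k + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        table = spread[i - 1]
        for b in range(k + 1):
            # Option x = 0: community i gets no seeds.
            best[i][b] = best[i - 1][b] + table[0]
            for x in range(1, min(b, len(table) - 1) + 1):
                value = best[i - 1][b - x] + table[x]
                if value > best[i][b]:
                    best[i][b] = value
                    choice[i][b] = x
    # Walk the stored choices backwards to recover the allocation.
    alloc = [0] * m
    b = k
    for i in range(m, 0, -1):
        alloc[i - 1] = choice[i][b]
        b -= alloc[i - 1]
    return best[m][k], alloc


total, alloc = allocate_seeds([[0, 5, 8, 9], [0, 4, 6, 7]], k=3)
print(total, alloc)  # 12.0 [2, 1]
```

With two communities and k = 3, giving two seeds to the first community and one to the second (8 + 4 = 12) beats every other split of the budget.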

44 citations

Journal Article

Maximizing influence spread in modular social networks by optimal resource allocation

Tianyu Cao¹, Xindong Wu², Song Wang¹, Xiaohua Hu³

University of Vermont¹, Hefei University of Technology², University of Pennsylvania³

01 Sep 2011, Expert Systems With Applications

TL;DR: It is proved that finding an optimal allocation in a modular social network is NP-hard and a new dynamic programming algorithm is proposed to solve the problem, which is named OASNET (Optimal Allocation in a Social NETwork).

Abstract: Influence maximization in a social network is the problem of targeting a given number of nodes such that the expected number of nodes activated by them is maximized. A social network usually exhibits some degree of modularity. Previous research efforts that made use of this topological property are restricted to random networks with two communities. In this paper, we first transform the influence maximization problem in a modular social network into an optimal resource allocation problem. We assume that the communities of the social network are disconnected. We then propose a recursive relation for finding such an optimal allocation. We prove that finding an optimal allocation in a modular social network is NP-hard and propose a new dynamic programming algorithm to solve the problem, which we name OASNET (Optimal Allocation in a Social NETwork). We compare OASNET with the high-degree heuristic, the single degree discount heuristic, and the degree discount heuristic on three real-world datasets. Experimental results show that OASNET significantly outperforms the comparison heuristics under the independent cascade model when the diffusion probability is greater than a certain threshold.
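The comparison above is run under the independent cascade model. A minimal Monte Carlo sketch of that model, assuming a directed adjacency dict and a single uniform diffusion probability `p` on every edge (a simplification; `ic_spread` is a hypothetical helper, not code from the paper):

```python
import random
from typing import Dict, Iterable, List, Optional


def ic_spread(graph: Dict[int, List[int]], seeds: Iterable[int], p: float,
              runs: int = 1000, rng: Optional[random.Random] = None) -> float:
    """Monte Carlo estimate of expected spread under the independent cascade model."""
    rng = rng or random.Random(0)  # fixed seed so estimates are reproducible
    seeds = list(seeds)
    total = 0
    for _ in range(runs):
        active = set(seeds)
        frontier = list(active)
        while frontier:
            newly_active = []
            for u in frontier:
                for v in graph.get(u, []):
                    # A newly activated node gets one chance to activate each inactive out-neighbour.
                    if v not in active and rng.random() < p:
                        active.add(v)
                        newly_active.append(v)
            frontier = newly_active
        total += len(active)
    return total / runs


print(ic_spread({0: [1, 2, 3]}, [0], p=0.5, runs=20000))
```

For that star, where node 0 points to three leaves and p = 0.5, the exact expected spread is 1 + 3 × 0.5 = 2.5, so the printed estimate should land close to it. Sweeping `p` is a simple way to explore the diffusion-probability threshold the abstract refers to.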

34 citations

Proceedings Article

Multilingual Knowledge Graph Completion with Self-Supervised Adaptive Graph Alignment

Zijie Huang, Zheng Li, Hao Jiang, Tianyu Cao, Hanqing Lu, Bing Yin, Karthikeyan Subbian, Yizhou Sun, Wei Wang

28 Mar 2022

TL;DR: This paper explores multilingual KG completion, which leverages limited seed alignment as a bridge to embrace the collective knowledge from multiple languages, and proposes a novel self-supervised adaptive graph alignment (SS-AGA) method, which fuses all KGs into a whole graph by regarding alignment as a new edge type.

Abstract: Predicting missing facts in a knowledge graph (KG) is crucial because modern KGs are far from complete. Because human labeling is labor-intensive, this incompleteness worsens when handling knowledge represented in various languages. In this paper, we explore multilingual KG completion, which leverages limited seed alignment as a bridge to embrace the collective knowledge from multiple languages. However, language alignment used in prior works is still not fully exploited: (1) alignment pairs are treated equally to maximally push parallel entities to be close, which ignores KG capacity inconsistency; (2) seed alignment is scarce, and new alignments are usually identified in a noisy, unsupervised manner. To tackle these issues, we propose a novel self-supervised adaptive graph alignment (SS-AGA) method. Specifically, SS-AGA fuses all KGs into a whole graph by regarding alignment as a new edge type. As such, information propagation and noise influence across KGs can be adaptively controlled via relation-aware attention weights. Meanwhile, SS-AGA features a new pair generator that dynamically captures potential alignment pairs in a self-supervised paradigm. Extensive experiments on both the public multilingual DBPedia KG and a newly created industrial multilingual E-commerce KG empirically demonstrate the effectiveness of SS-AGA.
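The abstract describes fusing all KGs into one graph by treating alignment as a new edge type. The sketch below illustrates only that data-structure step, assuming triples are (head, relation, tail) strings, entities are namespaced by a language code, and alignment edges use a hypothetical relation name `ALIGN`; `fuse_kgs` is likewise hypothetical, and SS-AGA's relation-aware attention and self-supervised pair generator are not modeled.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]


def fuse_kgs(kgs: Dict[str, List[Triple]],
             seed_alignment: List[Tuple[Tuple[str, str], Tuple[str, str]]],
             align_relation: str = "ALIGN") -> Set[Triple]:
    """Merge per-language KGs into one graph, adding alignment as an extra edge type."""
    fused: Set[Triple] = set()
    for lang, triples in kgs.items():
        for h, r, t in triples:
            # Prefix entities with their language so identical strings stay distinct nodes.
            fused.add((f"{lang}:{h}", r, f"{lang}:{t}"))
    for (lang_a, ent_a), (lang_b, ent_b) in seed_alignment:
        a, b = f"{lang_a}:{ent_a}", f"{lang_b}:{ent_b}"
        # Add the alignment edge in both directions so information can flow either way.
        fused.add((a, align_relation, b))
        fused.add((b, align_relation, a))
    return fused


kgs = {"en": [("Paris", "capitalOf", "France")], "fr": [("Paris", "capitaleDe", "France")]}
fused = fuse_kgs(kgs, [(("en", "Paris"), ("fr", "Paris"))])
print(len(fused))  # 4
```

Once alignment is just another relation, any relation-aware graph encoder can propagate information across languages over the fused edge set.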

20 citations

Book Chapter

Active learning of model parameters for influence maximization

Tianyu Cao¹, Xindong Wu¹, Tony Hu², Song Wang¹

University of Vermont¹, Drexel University²

05 Sep 2011

TL;DR: Extensive experimental evaluations on five popular network datasets demonstrate that the proposed weighted sampling algorithm outperforms pure random sampling in terms of both model accuracy and the proposed objective function.

Abstract: Previous research efforts on the influence maximization problem assume that the network model parameters are known beforehand. However, this is rarely true in real-world networks. This paper deals with the situation in which the network information diffusion parameters are unknown. To this end, we first examine the parameter sensitivity of a popular diffusion model in influence maximization, the linear threshold model, to motivate the necessity of learning the unknown model parameters. Experiments show that the influence maximization problem is sensitive to the model parameters under the linear threshold model. Next, we formally define the problem of finding the model parameters for influence maximization as an active learning problem under the linear threshold model. We then propose a weighted sampling algorithm to solve this active learning problem. Extensive experimental evaluations on five popular network datasets demonstrate that the proposed weighted sampling algorithm outperforms pure random sampling in terms of both model accuracy and the proposed objective function.
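The sensitivity study above concerns the linear threshold model. A minimal simulation sketch, assuming each node's incoming edge weights sum to at most 1 and each node's threshold is drawn uniformly at random; `lt_spread` and the `in_edges` format are illustrative assumptions, and the paper's weighted sampling algorithm for learning the weights is not shown.

```python
import random
from typing import Dict, Iterable, List, Optional, Tuple


def lt_spread(in_edges: Dict[int, List[Tuple[int, float]]], nodes: Iterable[int],
              seeds: Iterable[int], runs: int = 1000,
              rng: Optional[random.Random] = None) -> float:
    """Monte Carlo estimate of expected spread under the linear threshold model.

    in_edges[v] lists (u, weight) pairs for edges u -> v; the weights into
    each node are assumed to sum to at most 1.
    """
    rng = rng or random.Random(0)  # fixed seed so estimates are reproducible
    nodes = list(nodes)
    seeds = set(seeds)
    total = 0
    for _ in range(runs):
        # Each node draws its own activation threshold uniformly from [0, 1).
        threshold = {v: rng.random() for v in nodes}
        active = set(seeds)
        changed = True
        while changed:  # repeat until no further node becomes active
            changed = False
            for v in nodes:
                if v in active:
                    continue
                weight = sum(w for u, w in in_edges.get(v, []) if u in active)
                if weight >= threshold[v]:
                    active.add(v)
                    changed = True
        total += len(active)
    return total / runs


print(lt_spread({1: [(0, 0.5)]}, nodes=[0, 1], seeds=[0], runs=20000))
```

With a single edge of weight 0.5, node 1 activates whenever its threshold is at most 0.5, so the estimate should be close to 1.5. Re-running with perturbed weights is one simple way to observe the kind of parameter sensitivity the abstract describes.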

18 citations

Cited by

Journal Article

A Survey of Discretization Techniques: Taxonomy and Empirical Analysis in Supervised Learning

Salvador García¹, Julián Luengo², José A. Sáez³, Victoria López³, Francisco Herrera³

University of Jaén¹, University of Burgos², University of Granada³

01 Apr 2013, IEEE Transactions on Knowledge and Data Engineering

TL;DR: This paper surveys discretization methods, whose main goal is to transform a set of continuous attributes into discrete ones by associating categorical values with intervals, thus transforming quantitative data into qualitative data.

Abstract: Discretization is an essential preprocessing technique used in many knowledge discovery and data mining tasks. Its main goal is to transform a set of continuous attributes into discrete ones by associating categorical values with intervals, thus transforming quantitative data into qualitative data. In this manner, symbolic data mining algorithms can be applied to continuous data, and the representation of information is simplified, making it more concise and specific. The literature provides numerous discretization proposals, and some attempts to categorize them into a taxonomy can be found. However, previous papers lack consensus on the definition of the properties, and no formal categorization has been established yet, which may be confusing for practitioners. Furthermore, only a small set of discretizers has been widely considered, while many other methods have gone unnoticed. To alleviate these problems, this paper provides a survey of discretization methods proposed in the literature from a theoretical and an empirical perspective. From the theoretical perspective, we develop a taxonomy based on the main properties pointed out in previous research, unifying the notation and including all methods known to date. Empirically, we conduct an experimental study in supervised classification involving the most representative and newest discretizers, different types of classifiers, and a large number of data sets. Their performance, measured in terms of accuracy, number of intervals, and inconsistency, has been verified by means of nonparametric statistical tests. Additionally, a set of discretizers is highlighted as the best performing.
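To make the core idea concrete (continuous values mapped to interval labels), here is equal-width binning, one of the simplest unsupervised discretizers. It is chosen only for illustration and is not claimed to be among the best performers highlighted by the survey; `equal_width_bins` is a hypothetical name.

```python
from typing import List


def equal_width_bins(values: List[float], n_bins: int) -> List[int]:
    """Map each continuous value to the index of its equal-width interval."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)  # all values identical: a single interval
    width = (hi - lo) / n_bins
    # min(...) keeps the maximum value inside the last interval instead of bin n_bins.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


print(equal_width_bins([1.0, 2.5, 4.0, 7.0, 10.0], 3))  # [0, 0, 1, 2, 2]
```

Here the range [1, 10] is cut into three intervals of width 3, so each interval index acts as the categorical value replacing the original number. Supervised discretizers, the focus of the survey's experiments, would instead place cut points using the class labels.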

419 citations

Journal Article

Online Feature Selection with Streaming Features

Xindong Wu¹, Kui Yu¹, Wei Ding², Hao Wang¹, Xingquan Zhu³

Hefei University of Technology¹, University of Massachusetts Boston², University of Technology, Sydney³

01 May 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: A novel Online Streaming Feature Selection (OSFS) method is proposed to select strongly relevant and nonredundant features on the fly, together with an efficient Fast-OSFS algorithm that improves feature selection performance.

Abstract: We propose a new online feature selection framework for applications with streaming features, where the full feature space is not known in advance. We define streaming features as features that flow in one by one over time while the number of training examples remains fixed. This is in contrast with traditional online learning methods, which only deal with sequentially added observations and pay little attention to streaming features. The critical challenges for Online Streaming Feature Selection (OSFS) include 1) the continuous growth of feature volumes over time, 2) a large feature space, possibly of unknown or infinite size, and 3) the unavailability of the entire feature set before learning starts. In this paper, we present a novel OSFS method to select strongly relevant and nonredundant features on the fly. An efficient Fast-OSFS algorithm is proposed to improve feature selection performance. The proposed algorithms are evaluated extensively on high-dimensional datasets and also in a real-world case study on impact crater detection. Experimental results demonstrate that the algorithms achieve better compactness and higher prediction accuracy than existing streaming feature selection algorithms.
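A rough sketch of the streaming setting described above: features arrive one at a time, and each is kept or discarded immediately after an online relevance check and an online redundancy check. As a loud simplification, it swaps OSFS's conditional-independence tests for Pearson-correlation thresholds (the 0.3 and 0.9 cut-offs are arbitrary assumptions) and never revisits previously kept features, so it is neither OSFS nor Fast-OSFS; `stream_select` and `pearson` are hypothetical helpers.

```python
import math
from typing import Iterable, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 when either column is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def stream_select(features: Iterable[Tuple[str, List[float]]], target: List[float],
                  relevance: float = 0.3, redundancy: float = 0.9) -> List[str]:
    """Decide on each arriving feature without seeing the full feature set."""
    selected: List[Tuple[str, List[float]]] = []
    for name, column in features:
        # Online relevance check: discard features weakly correlated with the target.
        if abs(pearson(column, target)) < relevance:
            continue
        # Online redundancy check: discard features that nearly duplicate a kept one.
        if any(abs(pearson(column, kept)) >= redundancy for _, kept in selected):
            continue
        selected.append((name, column))
    return [name for name, _ in selected]


stream = [("a", [1, 2, 3, 4, 5]), ("b", [2, 4, 6, 8, 10]),
          ("c", [3, 3, 3, 3, 3]), ("d", [1, 3, 2, 5, 4])]
print(stream_select(stream, target=[1, 2, 3, 4, 5]))  # ['a', 'd']
```

Feature `b` is dropped as a rescaled copy of `a`, `c` carries no information about the target, and `d` is relevant (correlation 0.8) without being redundant.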

240 citations

Journal Article

CoFIM: A community-based framework for influence maximization on large-scale networks

Jiaxing Shang¹, Shangbo Zhou¹, Xin Li², Lianchen Liu³, Hongchun Wu¹

Chongqing University¹, City University of Hong Kong², Tsinghua University³

01 Feb 2017, Knowledge Based Systems

TL;DR: CoFIM, a community-based framework for influence maximization on large-scale networks, derives a simple, submodular evaluation form of the total influence spread that can be computed efficiently, together with a fast algorithm for selecting the seed nodes.

Abstract: Influence maximization is a classic optimization problem studied in the area of social network analysis and viral marketing. Given a network, it is defined as the problem of finding k seed nodes so that the influence spread in the network is maximized. Kempe et al. proved that this problem is NP-hard and that the objective function is submodular, and on this basis proposed a greedy algorithm that gives a near-optimal solution. However, this simple greedy algorithm is time-consuming, which limits its application to large-scale networks, while heuristic algorithms generally cannot provide any performance guarantee. To solve this problem, in this paper we propose CoFIM, a community-based framework for influence maximization on large-scale networks. In our framework, the influence propagation process is divided into two phases: (i) seed expansion and (ii) intra-community propagation. The first phase is the expansion of seed nodes among different communities at the beginning of diffusion. The second phase is the influence propagation within communities, which are independent of each other. Based on the framework, we derive a simple evaluation form of the total influence spread that is submodular and can be computed efficiently. We then propose a fast algorithm to select the seed nodes. Experimental results on synthetic datasets and nine large real-world datasets, including networks with millions of nodes and hundreds of millions of edges, show that our algorithm achieves competitive influence spread compared with state-of-the-art algorithms and is much more efficient in terms of both time and memory usage.
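The abstract builds on the fact that a submodular spread function can be maximized greedily. The sketch shows generic greedy selection by marginal gain, the approach attributed to Kempe et al. above, applied to a hypothetical coverage function (`reach` lists the nodes each candidate is assumed to reach). CoFIM's community-based evaluation form and its faster selection algorithm are not reproduced, and `greedy_max` and `coverage` are illustrative names.

```python
from typing import Callable, FrozenSet, Iterable, List


def greedy_max(candidates: Iterable[int], k: int,
               f: Callable[[FrozenSet[int]], float]) -> List[int]:
    """Greedily pick up to k elements, each time taking the largest marginal gain of f."""
    chosen: List[int] = []
    pool = set(candidates)
    for _ in range(min(k, len(pool))):
        base = f(frozenset(chosen))
        # sorted() makes tie-breaking deterministic (smallest id wins).
        best = max(sorted(pool), key=lambda c: f(frozenset(chosen + [c])) - base)
        chosen.append(best)
        pool.remove(best)
    return chosen


# Hypothetical data: the set of nodes each candidate seed is assumed to reach.
reach = {0: {0, 1, 2, 3}, 1: {1, 2}, 4: {4, 5, 6}, 5: {5, 6}}


def coverage(seeds: FrozenSet[int]) -> int:
    """Number of distinct nodes reached by a seed set (monotone and submodular)."""
    return len(set().union(*(reach[s] for s in seeds)))


print(greedy_max(reach, 2, coverage))  # [0, 4]
```

For monotone submodular functions under a cardinality constraint, this greedy rule is guaranteed to reach at least a (1 - 1/e) fraction of the optimum; the cost lies in the many evaluations of `f`, which is the bottleneck a cheap evaluation form is meant to relieve.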

139 citations

Journal Article

Streaming Feature Selection for Multilabel Learning Based on Fuzzy Mutual Information

Yaojin Lin¹, Qinghua Hu², Jinghua Liu¹, Jinjin Li¹, Xindong Wu³

Zhangzhou Normal University¹, Tianjin University², University of Louisiana at Lafayette³

03 Aug 2017, IEEE Transactions on Fuzzy Systems

TL;DR: This paper introduces fuzzy mutual information to evaluate the quality of features in multilabel learning and designs efficient algorithms to conduct multilabel feature selection when the feature space is completely known or partially known in advance.

Abstract: Due to complex semantics, a sample may be associated with multiple labels in various classification and recognition tasks. Multilabel learning builds models that map feature vectors to multiple labels. There are several significant challenges in multilabel learning. Samples are usually described with high-dimensional features, and some features may be extracted sequentially, so the full feature set is not known at the beginning of learning; such features are referred to as streaming features. In this paper, we introduce fuzzy mutual information to evaluate the quality of features in multilabel learning and design efficient algorithms to conduct multilabel feature selection when the feature space is completely known or partially known in advance. These algorithms are called multilabel feature selection with label correlation (MUCO) and multilabel streaming feature selection (MSFS), respectively. MSFS consists of two key steps: online relevance analysis and online redundancy analysis. In addition, we design a metric to measure the correlation between label sets, and both MUCO and MSFS take label correlation into consideration. The proposed algorithms can select features not only from streaming features but also for ordinal multilabel learning. However, streaming feature selection is more efficient. The proposed algorithms are tested on a collection of multilabel learning tasks. The experimental results illustrate the effectiveness of the proposed algorithms.
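Fuzzy mutual information is commonly described as an extension of Shannon mutual information to fuzzy relations. The sketch below computes only the ordinary, crisp Shannon version for two discrete columns, as a stand-in showing the kind of relevance score a feature receives against a single label; it is not the paper's fuzzy measure, and `mutual_information` is a hypothetical helper.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def mutual_information(x: Sequence[Hashable], y: Sequence[Hashable]) -> float:
    """Shannon mutual information I(X;Y) in bits for two discrete sequences."""
    n = len(x)
    joint = Counter(zip(x, y))
    px, py = Counter(x), Counter(y)
    mi = 0.0
    for (a, b), c in joint.items():
        # p(a,b) * log2(p(a,b) / (p(a) p(b))), written with raw counts.
        mi += (c / n) * math.log2(c * n / (px[a] * py[b]))
    return mi


print(mutual_information([0, 0, 1, 1], [0, 0, 1, 1]))  # 1.0
```

A feature that determines a balanced binary label scores 1 bit, while a feature independent of the label scores 0. In a multilabel setting, one simple option is to compute this score per label and aggregate, though how the paper combines labels and label correlation is not reflected here.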

121 citations

Journal Article

Online feature selection for high-dimensional class-imbalanced data

Peng Zhou¹, Xuegang Hu¹, Peipei Li¹, Xindong Wu²

Hefei University of Technology¹, University of Louisiana at Lafayette²

15 Nov 2017, Knowledge Based Systems

TL;DR: This work formalizes the problem of online streaming feature selection for class imbalanced data, and presents an efficient online feature selection framework regarding the dependency between condition features and decision classes, and proposes a new algorithm of Online Feature Selection based on the Dependency in K nearest neighbors, called K-OFSD.

Abstract: When tackling high dimensionality in data mining, online feature selection, which deals with features flowing in one by one over time, has more advantages than traditional feature selection methods. However, in real-world applications such as fraud detection and medical diagnosis, the data is high-dimensional and highly class-imbalanced; that is, some classes have many more instances than others. In such cases, existing online feature selection algorithms usually ignore the small classes, which can be important in these applications. It is hence a challenge to learn from high-dimensional, class-imbalanced data in an online manner. Motivated by this, we first formalize the problem of online streaming feature selection for class-imbalanced data, and then present an efficient online feature selection framework based on the dependency between condition features and decision classes. Meanwhile, we propose a new algorithm of Online Feature Selection based on the Dependency in K nearest neighbors, called K-OFSD. Using Neighborhood Rough Set theory, K-OFSD uses the information of nearest neighbors to select relevant features that yield higher separability between the majority class and the minority class. Finally, experimental studies on seven high-dimensional, class-imbalanced data sets show that our algorithm achieves better performance than traditional feature selection methods with the same numbers of features and than state-of-the-art online streaming feature selection algorithms in an online manner.
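K-OFSD scores features by how well nearest neighbors separate the classes. As an assumption-laden stand-in (not the paper's Neighborhood Rough Set definition), the sketch measures dependency as the fraction of samples whose k nearest neighbors, under Euclidean distance on the chosen features, all share the sample's class; `knn_dependency` and the toy data are hypothetical.

```python
import math
from typing import Sequence


def knn_dependency(X: Sequence[Sequence[float]], y: Sequence[int],
                   features: Sequence[int], k: int = 3) -> float:
    """Fraction of samples whose k nearest neighbours (on `features`) all share their class."""
    n = len(X)
    consistent = 0
    for i in range(n):
        dists = []
        for j in range(n):
            if j == i:
                continue
            d = math.sqrt(sum((X[i][f] - X[j][f]) ** 2 for f in features))
            dists.append((d, j))
        dists.sort()  # ties broken by sample index
        neighbours = [j for _, j in dists[:k]]
        if all(y[j] == y[i] for j in neighbours):
            consistent += 1
    return consistent / n


# Toy data: feature 0 separates the classes, feature 1 does not.
X = [[0.0, 5.0], [0.1, 1.0], [0.2, 9.0], [5.0, 2.0], [5.1, 8.0], [5.2, 4.0]]
y = [0, 0, 0, 1, 1, 1]
print(knn_dependency(X, y, [0], k=2))  # 1.0
```

An online selector in this spirit would keep an arriving feature only if adding it raises the dependency score; with class-imbalanced data, the per-sample check also counts minority-class samples individually rather than letting the majority class dominate.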

113 citations
