Author

Fei Wu

Other affiliations: BioMérieux, Google, Wuhan University
Bio: Fei Wu is an academic researcher from Nanjing University of Posts and Telecommunications. The author has contributed to research in topics including Computer science and Discriminative model. The author has an h-index of 57 and has co-authored 472 publications receiving 12,266 citations. Previous affiliations of Fei Wu include BioMérieux and Google.


Papers
Proceedings Article
11 Jul 2010
TL;DR: WOE, an open IE system that improves dramatically on TextRunner's precision and recall, is presented; the key to its performance is a novel form of self-supervised learning for open extractors, using heuristic matches between Wikipedia infobox attribute values and corresponding sentences to construct training data.
Abstract: Information-extraction (IE) systems seek to distill semantic relations from natural-language text, but most systems use supervised learning of relation-specific examples and are thus limited by the availability of training data. Open IE systems such as TextRunner, on the other hand, aim to handle the unbounded number of relations found on the Web. But how well can these open systems perform? This paper presents WOE, an open IE system which improves dramatically on TextRunner's precision and recall. The key to WOE's performance is a novel form of self-supervised learning for open extractors -- using heuristic matches between Wikipedia infobox attribute values and corresponding sentences to construct training data. Like TextRunner, WOE's extractor eschews lexicalized features and handles an unbounded set of semantic relations. WOE can operate in two modes: when restricted to POS tag features, it runs as quickly as TextRunner, but when set to use dependency-parse features its precision and recall rise even higher.
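As a rough illustration of the heuristic-matching step this abstract describes, the following Python sketch pairs each infobox attribute value with a sentence that mentions both the article subject and the value, yielding labeled tuples for an open extractor. All function and variable names are hypothetical, not from the WOE implementation.

# Sketch of WOE-style self-supervised training-data construction:
# match infobox attribute values against article sentences to label
# positive extraction examples. All names here are illustrative.

import re

def build_training_tuples(title, infobox, sentences):
    """infobox: dict of attribute -> value strings from one article;
    sentences: list of sentence strings from the same article."""
    tuples = []
    for attribute, value in infobox.items():
        pattern = re.compile(re.escape(value), re.IGNORECASE)
        for sentence in sentences:
            # Heuristic: a sentence mentioning both the article subject
            # and the attribute value is assumed to express the relation.
            if title.lower() in sentence.lower() and pattern.search(sentence):
                tuples.append((sentence, title, attribute, value))
                break
    return tuples

# Toy usage
infobox = {"spouse": "Michelle Obama", "birth_place": "Honolulu"}
sentences = [
    "Barack Obama was born in Honolulu, Hawaii.",
    "Barack Obama married Michelle Obama in 1992.",
]
print(build_training_tuples("Barack Obama", infobox, sentences))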

634 citations

Proceedings ArticleDOI
01 Aug 2017
TL;DR: In this article, an Attentional Factorization Machine (AFM) is proposed that learns the importance of each feature interaction from data via a neural attention network; AFM outperforms Wide&Deep and DeepCross with a much simpler structure and fewer model parameters.
Abstract: Factorization Machines (FMs) are a supervised learning approach that enhances the linear regression model by incorporating second-order feature interactions. Despite their effectiveness, FMs can be hindered by modelling all feature interactions with the same weight, as not all feature interactions are equally useful and predictive. For example, interactions with useless features may introduce noise and degrade performance. In this work, we improve FM by discriminating the importance of different feature interactions. We propose a novel model named Attentional Factorization Machine (AFM), which learns the importance of each feature interaction from data via a neural attention network. Extensive experiments on two real-world datasets demonstrate the effectiveness of AFM. Empirically, AFM betters FM with an 8.6% relative improvement on the regression task, and consistently outperforms the state-of-the-art deep learning methods Wide&Deep and DeepCross with a much simpler structure and fewer model parameters. Our implementation of AFM is publicly available at: this https URL
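The attention-weighted pairwise interactions at the heart of AFM can be sketched in a few lines of NumPy. This is a minimal, assumed reading of the model described above (names, shapes, and the random initialization are illustrative), not the authors' released implementation.

# Minimal NumPy sketch of an AFM-style scoring function.
import numpy as np
from itertools import combinations

rng = np.random.default_rng(0)
n, k, t = 5, 8, 4            # features, embedding size, attention size

x = rng.random(n)            # feature values of one instance
V = rng.normal(size=(n, k))  # feature embeddings v_i
w0, w = 0.0, rng.normal(size=n)              # global bias, linear weights
W, b = rng.normal(size=(t, k)), np.zeros(t)  # attention network params
h = rng.normal(size=t)       # attention projection vector
p = rng.normal(size=k)       # output projection over interactions

# Pairwise interactions: element-wise products (v_i * v_j) * x_i * x_j
pairs = [(V[i] * V[j]) * (x[i] * x[j]) for i, j in combinations(range(n), 2)]
E = np.stack(pairs)                       # (num_pairs, k)

# Attention scores a_ij = softmax(h^T ReLU(W e_ij + b))
scores = np.maximum(E @ W.T + b, 0) @ h   # (num_pairs,)
a = np.exp(scores - scores.max()); a /= a.sum()

# AFM prediction: bias + linear part + attended interaction part
y_hat = w0 + w @ x + p @ (a[:, None] * E).sum(axis=0)
print(float(y_hat))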

583 citations

Journal ArticleDOI
04 Jan 2018 - Nature
TL;DR: It is shown that PD-L1 protein abundance is regulated by cyclin D–CDK4 and the cullin 3–SPOP E3 ligase via proteasome-mediated degradation, revealing the potential of combining CDK4/6 inhibitors with PD-1–PD-L1 immune checkpoint blockade to enhance therapeutic efficacy in human cancers.
Abstract: Treatments that target immune checkpoints, such as the one mediated by programmed cell death protein 1 (PD-1) and its ligand PD-L1, have been approved for treating human cancers with durable clinical benefit. However, many patients with cancer fail to respond to compounds that target the PD-1 and PD-L1 interaction, and the underlying mechanism(s) is not well understood. Recent studies revealed that response to PD-1-PD-L1 blockade might correlate with PD-L1 expression levels in tumour cells. Hence, it is important to understand the mechanistic pathways that control PD-L1 protein expression and stability, which can offer a molecular basis to improve the clinical response rate and efficacy of PD-1-PD-L1 blockade in patients with cancer. Here we show that PD-L1 protein abundance is regulated by cyclin D-CDK4 and the cullin 3-SPOP E3 ligase via proteasome-mediated degradation. Inhibition of CDK4 and CDK6 (hereafter CDK4/6) in vivo increases PD-L1 protein levels by impeding cyclin D-CDK4-mediated phosphorylation of speckle-type POZ protein (SPOP) and thereby promoting SPOP degradation by the anaphase-promoting complex activator FZR1. Loss-of-function mutations in SPOP compromise ubiquitination-mediated PD-L1 degradation, leading to increased PD-L1 levels and reduced numbers of tumour-infiltrating lymphocytes in mouse tumours and in primary human prostate cancer specimens. Notably, combining CDK4/6 inhibitor treatment with anti-PD-1 immunotherapy enhances tumour regression and markedly improves overall survival rates in mouse tumour models. Our study uncovers a novel molecular mechanism for regulating PD-L1 protein stability by a cell cycle kinase and reveals the potential for using combination treatment with CDK4/6 inhibitors and PD-1-PD-L1 immune checkpoint blockade to enhance therapeutic efficacy for human cancers.

577 citations

Journal ArticleDOI
TL;DR: This paper proposes a multi-task deep saliency model based on a fully convolutional neural network with global input (whole raw images) and global output (whole saliency maps), and presents a graph Laplacian regularized nonlinear regression model for saliency refinement.
Abstract: A key problem in salient object detection is how to effectively model the semantic properties of salient objects in a data-driven manner. In this paper, we propose a multi-task deep saliency model based on a fully convolutional neural network with global input (whole raw images) and global output (whole saliency maps). In principle, the proposed saliency model takes a data-driven strategy for encoding the underlying saliency prior information, and then sets up a multi-task learning scheme for exploring the intrinsic correlations between saliency detection and semantic image segmentation. Through collaborative feature learning on these two correlated tasks, the shared fully convolutional layers produce effective features for object perception. Moreover, the model captures semantic information on salient objects across different levels using the fully convolutional layers, exploiting the feature-sharing properties of salient object detection and greatly reducing feature redundancy. Finally, we present a graph Laplacian regularized nonlinear regression model for saliency refinement. Experimental results demonstrate the effectiveness of our approach in comparison with state-of-the-art approaches.
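The graph Laplacian regularized refinement step admits a compact closed-form sketch: penalizing f^T L f while keeping f close to the initial scores y gives f = (I + lambda*L)^{-1} y. The NumPy snippet below is a minimal illustration under that assumption; the paper's actual nonlinear regression model is richer, and all names here are assumed.

# Illustrative graph-Laplacian-regularized smoothing of saliency scores:
# solve min_f ||f - y||^2 + lam * f^T L f  =>  f = (I + lam*L)^{-1} y.
import numpy as np

def refine_saliency(y, W, lam=1.0):
    """y: (n,) initial saliency per superpixel; W: (n, n) symmetric affinities."""
    D = np.diag(W.sum(axis=1))   # degree matrix
    L = D - W                    # unnormalized graph Laplacian
    n = len(y)
    return np.linalg.solve(np.eye(n) + lam * L, y)

# Toy example: a 3-node chain graph pulls the middle score toward its neighbors
W = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
y = np.array([0.9, 0.1, 0.8])
print(refine_saliency(y, W, lam=0.5))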

497 citations

Proceedings ArticleDOI
06 Nov 2007
TL;DR: This paper argues that autonomously "Semantifying Wikipedia" is the best way to solve the chicken-and-egg problem, and describes a prototype implementation of a self-supervised, machine learning system which realizes the vision.
Abstract: Berners-Lee's compelling vision of a Semantic Web is hindered by a chicken-and-egg problem, which can best be solved by a bootstrapping method - creating enough structured data to motivate the development of applications. This paper argues that autonomously "Semantifying Wikipedia" is the best way to solve the problem. We choose Wikipedia as an initial data source because it is comprehensive, not too large, high-quality, and contains enough manually-derived structure to bootstrap an autonomous, self-supervised process. We identify several types of structure which can be automatically enhanced in Wikipedia (e.g., link structure, taxonomic data, infoboxes, etc.), and we describe a prototype implementation of a self-supervised, machine learning system which realizes our vision. Preliminary experiments demonstrate the high precision of our system's extracted data - in one case equaling that of humans.
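One concrete flavor of the "manually-derived structure" the paper bootstraps from is the infobox. The Python sketch below harvests attribute/value pairs from raw wikitext; the regex and names are simplified and hypothetical, and real infobox markup (nested templates, links, references) is considerably messier.

# Hedged sketch of one bootstrapping step: harvesting infobox
# attribute/value pairs from wikitext to seed a self-supervised learner.
import re

INFOBOX_FIELD = re.compile(r"^\|\s*(\w+)\s*=\s*(.+?)\s*$", re.MULTILINE)

def parse_infobox_fields(wikitext):
    """Return attribute -> value pairs from '| attr = value' infobox lines."""
    return {attr: value for attr, value in INFOBOX_FIELD.findall(wikitext)}

wikitext = """{{Infobox scientist
| name = Tim Berners-Lee
| birth_place = London, England
| known_for = World Wide Web
}}"""
print(parse_infobox_fields(wikitext))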

413 citations


Cited by
Journal ArticleDOI
TL;DR: Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis.
Abstract: Machine Learning is the study of methods for programming computers to learn. Computers are applied to a wide range of tasks, and for most of these it is relatively easy for programmers to design and implement the necessary software. However, there are many tasks for which this is difficult or impossible. These can be divided into four general categories.

First, there are problems for which there exist no human experts. For example, in modern automated manufacturing facilities, there is a need to predict machine failures before they occur by analyzing sensor readings. Because the machines are new, there are no human experts who can be interviewed by a programmer to provide the knowledge necessary to build a computer system. A machine learning system can study recorded data and subsequent machine failures and learn prediction rules.

Second, there are problems where human experts exist, but where they are unable to explain their expertise. This is the case in many perceptual tasks, such as speech recognition, hand-writing recognition, and natural language understanding. Virtually all humans exhibit expert-level abilities on these tasks, but none of them can describe the detailed steps that they follow as they perform them. Fortunately, humans can provide machines with examples of the inputs and correct outputs for these tasks, so machine learning algorithms can learn to map the inputs to the outputs.

Third, there are problems where phenomena are changing rapidly. In finance, for example, people would like to predict the future behavior of the stock market, of consumer purchases, or of exchange rates. These behaviors change frequently, so that even if a programmer could construct a good predictive computer program, it would need to be rewritten frequently. A learning program can relieve the programmer of this burden by constantly modifying and tuning a set of learned prediction rules.

Fourth, there are applications that need to be customized for each computer user separately. Consider, for example, a program to filter unwanted electronic mail messages. Different users will need different filters. It is unreasonable to expect each user to program his or her own rules, and it is infeasible to provide every user with a software engineer to keep the rules up-to-date. A machine learning system can learn which mail messages the user rejects and maintain the filtering rules automatically; a minimal sketch follows below.

Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis. Statistics focuses on understanding the phenomena that have generated the data, often with the goal of testing different hypotheses about those phenomena. Data mining seeks to find patterns in the data that are understandable by people. Psychological studies of human learning aspire to understand the mechanisms underlying the various learning behaviors exhibited by people (concept learning, skill acquisition, strategy change, etc.).
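The fourth category invites a tiny concrete example. The Python sketch below uses a naive Bayes text classifier, one standard choice for mail filtering (the article does not prescribe a specific algorithm), with illustrative names throughout.

# Toy naive-Bayes mail filter: learns per-user filtering rules from
# messages the user kept ("ham") or rejected ("spam"). Illustrative only.
import math
from collections import Counter

class MailFilter:
    def __init__(self):
        self.words = {"spam": Counter(), "ham": Counter()}
        self.counts = {"spam": 0, "ham": 0}

    def train(self, message, label):
        self.counts[label] += 1
        self.words[label].update(message.lower().split())

    def is_spam(self, message):
        scores = {}
        total = sum(self.counts.values())
        for label in ("spam", "ham"):
            vocab = self.words[label]
            denom = sum(vocab.values()) + len(vocab) + 1
            score = math.log(self.counts[label] / total)
            for w in message.lower().split():
                score += math.log((vocab[w] + 1) / denom)  # Laplace smoothing
            scores[label] = score
        return scores["spam"] > scores["ham"]

f = MailFilter()
f.train("win money now", "spam")
f.train("meeting at noon", "ham")
print(f.is_spam("free money"))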

13,246 citations

Christopher M. Bishop
01 Jan 2006
TL;DR: Probability distributions and linear models for regression and classification are covered, along with a discussion of combining models in the context of machine learning and classification.
Abstract: Contents: Probability Distributions; Linear Models for Regression; Linear Models for Classification; Neural Networks; Kernel Methods; Sparse Kernel Machines; Graphical Models; Mixture Models and EM; Approximate Inference; Sampling Methods; Continuous Latent Variables; Sequential Data; Combining Models.

10,141 citations

01 Jan 2002

9,314 citations