Author

Ji Wu

Bio: Ji Wu is an academic researcher from Tsinghua University. The author has contributed to research in topics including Computer science & Germline. The author has an h-index of 23 and has co-authored 173 publications receiving 1,908 citations. Previous affiliations of Ji Wu include the Chinese Ministry of Education & Shanghai Jiao Tong University.


Papers
Journal ArticleDOI
TL;DR: Extensive experimental results on the AURORA2 corpus show that the DBN-based VAD not only outperforms eleven reference VADs but also meets the real-time detection requirements of VAD.
Abstract: Fusing the advantages of multiple acoustic features is important for the robustness of voice activity detection (VAD). Recently, machine-learning-based VADs have shown superiority over traditional VADs on multiple-feature fusion tasks. However, existing machine-learning-based VADs only utilize shallow models, which cannot explore the underlying manifold of the features. In this paper, we propose to fuse multiple features via a deep model, the deep belief network (DBN). The DBN is a powerful hierarchical generative model for feature extraction; it can describe highly variant functions and discover the manifold of the features. We take the serially concatenated features as the input layer of the DBN, extract a new feature by passing these features through multiple nonlinear hidden layers, and finally predict the class of the new feature with a linear classifier. Our analysis further shows that even a belief network with a single hidden layer is as powerful as the state-of-the-art models used in machine-learning-based VADs. In our empirical comparison, ten common features are used for performance analysis. Extensive experimental results on the AURORA2 corpus show that the DBN-based VAD not only outperforms eleven reference VADs but also meets the real-time detection requirements of VAD. The results also show that the DBN-based VAD can fuse the advantages of multiple features effectively.

326 citations
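
The abstract above describes a concrete pipeline: serially concatenate several acoustic features, pass them through stacked nonlinear hidden layers, and classify the resulting feature with a linear output. The Python sketch below illustrates only that forward pass with random parameters; it is not the authors' implementation, and the layer-wise DBN pre-training of the hidden layers is omitted.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def dbn_vad_forward(feature_list, hidden_weights, clf_w, clf_b):
    """Hypothetical forward pass: concatenate acoustic features,
    propagate through nonlinear hidden layers, classify linearly.

    feature_list   : list of 1-D arrays (e.g. MFCC, pitch, energy) for one frame
    hidden_weights : list of (W, b) pairs for the hidden layers
    clf_w, clf_b   : linear classifier parameters (speech vs. non-speech)
    """
    h = np.concatenate(feature_list)           # serial feature concatenation
    for W, b in hidden_weights:                # nonlinear hidden layers
        h = sigmoid(h @ W + b)
    score = h @ clf_w + clf_b                  # linear classifier on the new feature
    return score > 0                           # True -> frame labelled as speech

# Toy usage with random parameters (illustration only).
rng = np.random.default_rng(0)
feats = [rng.normal(size=13), rng.normal(size=4)]        # e.g. MFCC + pitch-like features
weights = [(rng.normal(size=(17, 32)), np.zeros(32)),
           (rng.normal(size=(32, 16)), np.zeros(16))]
print(dbn_vad_forward(feats, weights, rng.normal(size=16), 0.0))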

Journal ArticleDOI
Qiang Zhang, Ji Wu, J. J. Wang, W. L. Zheng, J. G. Chen, A. B. Li
TL;DR: In this article, the corrosion behavior of weathering steel and carbon steel exposed to a marine atmosphere for 4 years was studied by means of regression analysis, rust-structure observation, X-ray diffraction, Raman spectroscopy, and electrochemical impedance spectroscopy.

153 citations
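
The summary above mentions regression analysis of 4-year marine exposure data without giving the model. One common choice in atmospheric-corrosion studies, assumed here purely for illustration, is the power-law kinetics D = A * t^n fitted by linear regression in log-log space; the numbers below are made up.

import numpy as np

# Hypothetical thickness-loss data (micrometres) after 1-4 years of exposure; not from the paper.
t = np.array([1.0, 2.0, 3.0, 4.0])          # exposure time, years
D = np.array([38.0, 55.0, 67.0, 76.0])      # corrosion loss, micrometres

# Fit D = A * t**n by ordinary least squares on log(D) = log(A) + n*log(t).
n, logA = np.polyfit(np.log(t), np.log(D), 1)
A = np.exp(logA)
print(f"D(t) = {A:.1f} * t^{n:.2f}")        # n < 1 is usually read as a protective rust layer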

Journal ArticleDOI
TL;DR: Faster regions with convolutional neural network features (faster R-CNN), implemented with the TensorFlow tool package, are used to detect and number teeth in dental periapical films; the results indicate that machines already perform close to the level of a junior dentist.
Abstract: We propose using faster regions with convolutional neural network features (faster R-CNN) in the TensorFlow tool package to detect and number teeth in dental periapical films. To improve detection precision, we propose three post-processing techniques that supplement the baseline faster R-CNN with prior domain knowledge. First, a filtering algorithm deletes overlapping boxes that faster R-CNN detects for the same tooth. Next, a neural network model detects missing teeth. Finally, a rule-based module built on a teeth-numbering system matches the labels of detected tooth boxes and corrects detected results that violate certain intuitive rules. The intersection-over-union (IOU) value between detected and ground-truth boxes is calculated to obtain precision and recall on a test dataset. Results demonstrate that both precision and recall exceed 90%, and the mean IOU between detected boxes and ground truths reaches 91%. Moreover, three dentists are invited to annotate the test dataset independently; their annotations are then compared with the labels obtained by our proposed algorithms. The results indicate that machines already perform close to the level of a junior dentist.

141 citations
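
Two quantitative steps in the abstract above are easy to make concrete: the intersection-over-union (IOU) between a detected box and a ground-truth box, and the post-processing filter that deletes overlapping boxes detected for the same tooth. The sketch below implements a generic non-maximum-suppression-style filter as an illustration; the 0.7 overlap threshold and the score-based ordering are assumptions, not the paper's exact algorithm.

def iou(a, b):
    """IOU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def filter_overlaps(detections, thresh=0.7):
    """Keep the highest-scoring box among detections that overlap heavily,
    i.e. boxes the detector likely produced for the same tooth.
    detections: list of (box, score)."""
    kept = []
    for box, score in sorted(detections, key=lambda d: d[1], reverse=True):
        if all(iou(box, k) < thresh for k, _ in kept):
            kept.append((box, score))
    return kept

# Example: two boxes on one tooth, one on a neighbouring tooth.
dets = [((10, 10, 50, 90), 0.95), ((12, 12, 52, 88), 0.80), ((60, 10, 100, 90), 0.90)]
print(filter_overlaps(dets))   # the 0.80 duplicate is removed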

Posted Content
Xien Liu, Xinxin You, Xiao Zhang, Ji Wu, Ping Lv
TL;DR: This paper investigates graph-based neural networks for the text classification problem and presents a new framework, TensorGCN (tensor graph convolutional networks), which offers an effective way to harmonize and integrate heterogeneous information from different kinds of graphs.
Abstract: Compared to sequential learning models, graph-based neural networks exhibit excellent properties, such as the ability to capture global information. In this paper, we investigate graph-based neural networks for the text classification problem. A new framework, TensorGCN (tensor graph convolutional networks), is presented for this task. A text graph tensor is first constructed to describe semantic, syntactic, and sequential contextual information. Then, two kinds of propagation learning are performed on the text graph tensor. The first is intra-graph propagation, used for aggregating information from neighboring nodes within a single graph. The second is inter-graph propagation, used for harmonizing heterogeneous information between graphs. Extensive experiments are conducted on benchmark datasets, and the results illustrate the effectiveness of our proposed framework. TensorGCN presents an effective way to harmonize and integrate heterogeneous information from different kinds of graphs.

108 citations
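
The abstract above alternates two propagation steps over a graph tensor: intra-graph propagation (GCN-style aggregation within each of the semantic, syntactic, and sequential graphs) and inter-graph propagation (harmonizing the same node's representations across graphs). The sketch below is a simplified reading of that scheme; the shared weight matrix and the averaging used for the inter-graph step are assumptions rather than the paper's exact formulation.

import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def tensor_gcn_layer(adjs, H, W):
    """One simplified TensorGCN-style layer.

    adjs : list of g normalized adjacency matrices (n x n), one per graph
           (e.g. semantic, syntactic, sequential)
    H    : list of g node-feature matrices (n x d)
    W    : shared weight matrix (d x d_out) -- a simplifying assumption
    """
    # Intra-graph propagation: aggregate from neighbours within each graph.
    H_intra = [relu(A @ Hi @ W) for A, Hi in zip(adjs, H)]
    # Inter-graph propagation: harmonize the same node across the g graphs
    # (simple averaging here, one of several possible choices).
    mean = sum(H_intra) / len(H_intra)
    return [0.5 * Hi + 0.5 * mean for Hi in H_intra]

# Toy example: 4 nodes, 3 graphs, 8-dimensional node features.
rng = np.random.default_rng(0)
adjs = [np.eye(4) for _ in range(3)]                 # identity adjacency for illustration
H = [rng.normal(size=(4, 8)) for _ in range(3)]
W = rng.normal(size=(8, 8))
print(tensor_gcn_layer(adjs, H, W)[0].shape)         # (4, 8)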

Journal ArticleDOI
Xien Liu, Xinxin You, Xiao Zhang, Ji Wu, Ping Lv
03 Apr 2020
TL;DR: In this paper, a text graph tensor is constructed to describe semantic, syntactic, and sequential contextual information, and two kinds of propagation learning are used to harmonize and integrate heterogeneous information from different kinds of graphs.

101 citations


Cited by
Book
Li Deng, Dong Yu
12 Jun 2014
TL;DR: This monograph provides an overview of general deep learning methodology and its applications to a variety of signal and information processing tasks, including natural language and text processing, information retrieval, and multimodal information processing empowered by multi-task deep learning.
Abstract: This monograph provides an overview of general deep learning methodology and its applications to a variety of signal and information processing tasks. The application areas are chosen with the following three criteria in mind: (1) expertise or knowledge of the authors; (2) the application areas that have already been transformed by the successful use of deep learning technology, such as speech recognition and computer vision; and (3) the application areas that have the potential to be impacted significantly by deep learning and that have been experiencing research growth, including natural language and text processing, information retrieval, and multimodal information processing empowered by multi-task deep learning.

2,817 citations

Journal ArticleDOI
TL;DR: A comprehensive overview of deep learning-based supervised speech separation can be found in this paper, where three main components of supervised separation are discussed: learning machines, training targets, and acoustic features.
Abstract: Speech separation is the task of separating target speech from background interference. Traditionally, speech separation is studied as a signal processing problem. A more recent approach formulates speech separation as a supervised learning problem, where the discriminative patterns of speech, speakers, and background noise are learned from training data. Over the past decade, many supervised separation algorithms have been put forward. In particular, the recent introduction of deep learning to supervised speech separation has dramatically accelerated progress and boosted separation performance. This paper provides a comprehensive overview of the research on deep learning based supervised speech separation in the last several years. We first introduce the background of speech separation and the formulation of supervised separation. Then, we discuss three main components of supervised separation: learning machines, training targets, and acoustic features. Much of the overview is on separation algorithms where we review monaural methods, including speech enhancement (speech-nonspeech separation), speaker separation (multitalker separation), and speech dereverberation, as well as multimicrophone techniques. The important issue of generalization, unique to supervised learning, is discussed. This overview provides a historical perspective on how advances are made. In addition, we discuss a number of conceptual issues, including what constitutes the target source.

1,009 citations
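
Among the training targets discussed in this overview, a widely used one is a time-frequency mask computed from the clean-speech and noise magnitudes. The sketch below shows an ideal-ratio-mask computation under the assumption that STFT magnitude spectrograms of both sources are available; it illustrates one target only, not the overview's full taxonomy.

import numpy as np

def ideal_ratio_mask(speech_mag, noise_mag, beta=0.5):
    """Ideal ratio mask from clean-speech and noise STFT magnitudes.

    speech_mag, noise_mag : arrays of shape (freq_bins, frames)
    beta : exponent; 0.5 is a common choice in this literature
    """
    s2, n2 = speech_mag ** 2, noise_mag ** 2
    return (s2 / (s2 + n2 + 1e-12)) ** beta     # small epsilon avoids division by zero

# A supervised separator is trained to predict this mask from noisy features;
# at test time the predicted mask is applied to the noisy spectrogram.
rng = np.random.default_rng(0)
S = np.abs(rng.normal(size=(257, 100)))
N = np.abs(rng.normal(size=(257, 100)))
mask = ideal_ratio_mask(S, N)
print(mask.min(), mask.max())                   # values lie in [0, 1]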

19 Apr 2011
TL;DR: Administration of spermidine markedly extended the lifespan of yeast, flies, worms, and human immune cells and inhibited oxidative stress in ageing mice; enhanced autophagy was found to be crucial for polyamine-induced suppression of necrosis and enhanced longevity.
Abstract: Ageing results from complex genetically and epigenetically programmed processes that are elicited in part by noxious or stressful events that cause programmed cell death. Here, we report that administration of spermidine, a natural polyamine whose intracellular concentration declines during human ageing, markedly extended the lifespan of yeast, flies and worms, and human immune cells. In addition, spermidine administration potently inhibited oxidative stress in ageing mice. In ageing yeast, spermidine treatment triggered epigenetic deacetylation of histone H3 through inhibition of histone acetyltransferases (HAT), suppressing oxidative stress and necrosis. Conversely, depletion of endogenous polyamines led to hyperacetylation, generation of reactive oxygen species, early necrotic death and decreased lifespan. The altered acetylation status of the chromatin led to significant upregulation of various autophagy-related transcripts, triggering autophagy in yeast, flies, worms and human cells. Finally, we found that enhanced autophagy is crucial for polyamine-induced suppression of necrosis and enhanced longevity.

974 citations

Journal ArticleDOI
TL;DR: This letter presents a regression-based speech enhancement framework using deep neural networks (DNNs) with a multiple-layer deep architecture that tends to achieve significant improvements in terms of various objective quality measures.
Abstract: This letter presents a regression-based speech enhancement framework using deep neural networks (DNNs) with a multiple-layer deep architecture. In the DNN learning process, a large training set ensures a powerful modeling capability for estimating the complicated nonlinear mapping from observed noisy speech to the desired clean signal. Acoustic context was found to help separate speech from the background noise with improved continuity and without the annoying musical artifact commonly observed in conventional speech enhancement algorithms. A series of pilot experiments were conducted under multi-condition training with more than 100 hours of simulated speech data, resulting in good generalization capability even under mismatched testing conditions. Compared with the logarithmic minimum mean square error approach, the proposed DNN-based algorithm tends to achieve significant improvements in terms of various objective quality measures. Furthermore, in a subjective preference evaluation with 10 listeners, 76.35% of the subjects were found to prefer the DNN-enhanced speech to that obtained with the conventional technique.

860 citations
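
The abstract above describes a regression DNN that maps noisy speech features, expanded with acoustic context from neighbouring frames, to clean targets under a mean-squared-error objective. The sketch below illustrates only the context expansion and the regression objective; a linear least-squares map stands in for the multi-layer DNN, and all data are synthetic.

import numpy as np

def add_context(frames, n=3):
    """Stack each frame with its n left and n right neighbours
    (edge frames are padded by repetition)."""
    T, d = frames.shape
    padded = np.vstack([np.repeat(frames[:1], n, axis=0),
                        frames,
                        np.repeat(frames[-1:], n, axis=0)])
    return np.hstack([padded[i:i + T] for i in range(2 * n + 1)])   # (T, (2n+1)*d)

# Toy data: noisy and clean log-power-spectrum-like features (illustrative shapes only).
rng = np.random.default_rng(0)
noisy = rng.normal(size=(200, 64))
clean = noisy * 0.8 + rng.normal(scale=0.1, size=(200, 64))

X = add_context(noisy, n=3)
# The paper trains a multi-layer DNN with an MSE regression objective;
# a linear least-squares map is used here only as a minimal stand-in.
W, *_ = np.linalg.lstsq(X, clean, rcond=None)
mse = np.mean((X @ W - clean) ** 2)
print(f"regression MSE on the toy data: {mse:.4f}")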