Author

Yongfeng Huang

Bio: Yongfeng Huang is an academic researcher from Tsinghua University. The author has contributed to research in topics: Steganography & Steganalysis. The author has an h-index of 30 and has co-authored 230 publications receiving 2,703 citations. Previous affiliations of Yongfeng Huang include the Association for Computing Machinery & Microsoft.

Papers published on a yearly basis

Papers
Proceedings ArticleDOI
25 Jul 2019
TL;DR: In this article, a neural news recommendation model with personalized attention (NPA) is proposed, which exploits the embedding of the user ID to generate the query vector for the word- and news-level attentions.
Abstract: News recommendation is very important to help users find news they are interested in and to alleviate information overload. Different users usually have different interests, and the same user may have various interests. Thus, different users may click the same news article while attending to different aspects of it. In this paper, we propose a neural news recommendation model with personalized attention (NPA). The core of our approach is a news representation model and a user representation model. In the news representation model we use a convolutional neural network (CNN) to learn hidden representations of news articles based on their titles. In the user representation model we learn the representations of users based on the representations of their clicked news articles. Since different words and different news articles may have different informativeness for representing news and users, we apply both word- and news-level attention mechanisms to help our model attend to important words and news articles. In addition, the same news article and the same word may have different informativeness for different users. Thus, we propose a personalized attention network which exploits the embedding of the user ID to generate the query vector for the word- and news-level attentions. Extensive experiments are conducted on a real-world news recommendation dataset collected from MSN News, and the results validate the effectiveness of our approach on news recommendation.
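The key mechanism is the personalized attention pooling: a query vector derived from the user-ID embedding scores each word (or news) representation before pooling. Below is a minimal PyTorch sketch of that idea; the layer sizes, names, and exact scoring form are illustrative assumptions, not the authors' released code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PersonalizedAttention(nn.Module):
    """Pools a sequence (word or news vectors) with a user-conditioned query."""
    def __init__(self, num_users, id_dim, query_dim, value_dim):
        super().__init__()
        self.user_embed = nn.Embedding(num_users, id_dim)  # embedding of user ID
        self.to_query = nn.Linear(id_dim, query_dim)       # user ID -> query vector
        self.proj = nn.Linear(value_dim, query_dim)        # projects values for scoring

    def forward(self, user_ids, values):
        # values: (batch, seq_len, value_dim), e.g. CNN-contextualized title words
        q = F.relu(self.to_query(self.user_embed(user_ids)))       # (batch, query_dim)
        scores = torch.bmm(torch.tanh(self.proj(values)),
                           q.unsqueeze(2)).squeeze(2)              # (batch, seq_len)
        alpha = F.softmax(scores, dim=1)                           # attention weights
        return torch.bmm(alpha.unsqueeze(1), values).squeeze(1)    # pooled vector
```

The same module can be reused twice: once over title words to build each news vector, and once over clicked-news vectors to build the user vector.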

218 citations

Proceedings ArticleDOI
Chuhan Wu, Fangzhao Wu, Suyu Ge, Tao Qi, Yongfeng Huang, Xing Xie
01 Nov 2019
TL;DR: A neural news recommendation approach that uses multi-head self-attention to learn news representations from news titles by modeling the interactions between words, and applies additive attention to learn more informative news and user representations by selecting important words and news.
Abstract: News recommendation can help users find news they are interested in and alleviate information overload. Precisely modeling news and users is critical for news recommendation, and capturing the contexts of words and news is important for learning news and user representations. In this paper, we propose a neural news recommendation approach with multi-head self-attention (NRMS). The core of our approach is a news encoder and a user encoder. In the news encoder, we use multi-head self-attention to learn news representations from news titles by modeling the interactions between words. In the user encoder, we learn representations of users from their browsed news and use multi-head self-attention to capture the relatedness between the news. In addition, we apply additive attention to learn more informative news and user representations by selecting important words and news. Experiments on a real-world dataset validate the effectiveness and efficiency of our approach.
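The encoder structure described here (multi-head self-attention followed by additive attention pooling) is straightforward to sketch. The following PyTorch code is a minimal illustration under assumed dimensions; it is not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdditiveAttention(nn.Module):
    """Additive attention pooling: alpha_i = softmax(v^T tanh(W h_i))."""
    def __init__(self, dim, hidden=200):
        super().__init__()
        self.proj = nn.Linear(dim, hidden)
        self.v = nn.Linear(hidden, 1, bias=False)

    def forward(self, h):                                   # h: (batch, seq, dim)
        alpha = F.softmax(self.v(torch.tanh(self.proj(h))).squeeze(2), dim=1)
        return torch.bmm(alpha.unsqueeze(1), h).squeeze(1)  # (batch, dim)

class NewsEncoder(nn.Module):
    """Title words -> self-attention (word-word interactions) -> pooled news vector."""
    def __init__(self, vocab_size, dim=256, heads=8):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.pool = AdditiveAttention(dim)

    def forward(self, title_ids):                           # (batch, title_len)
        x = self.embed(title_ids)
        x, _ = self.self_attn(x, x, x)                      # model word interactions
        return self.pool(x)                                 # news representation
```

The user encoder follows the same pattern, applying self-attention and additive attention over the vectors of a user's browsed news instead of over title words.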

180 citations

Journal ArticleDOI
TL;DR: A linguistic steganography method based on recurrent neural networks, which can automatically generate high-quality text covers on the basis of a secret bitstream that needs to be hidden, and achieves state-of-the-art performance.
Abstract: Linguistic steganography based on automatic text-carrier generation is a current research topic with great promise and challenges. Limited by automatic text generation technology or the corresponding text coding methods, the quality of the steganographic text generated by previous methods is inferior, which makes its imperceptibility unsatisfactory. In this paper, we propose a linguistic steganography method based on recurrent neural networks, which can automatically generate high-quality text covers on the basis of a secret bitstream that needs to be hidden. We trained our model with a large number of artificially generated samples and obtained a good estimate of the statistical language model. In the text generation process, we propose fixed-length coding and variable-length coding to encode words based on their conditional probability distribution. We designed several experiments to test the proposed model from the perspectives of information hiding efficiency, information imperceptibility, and information hiding capacity. The experimental results show that the proposed model outperforms all previous related methods and achieves state-of-the-art performance.
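The fixed-length coding scheme mentioned here admits a compact sketch: at each generation step, the 2^k most probable next words form a candidate pool, and the next k secret bits select which word to emit. The code below is an illustrative Python sketch, with `lm_step` standing in for any language model that returns a conditional next-word distribution.

```python
import heapq

def embed_bits(lm_step, init_state, bits, k=2):
    """Generate cover text, hiding `bits` k at a time via fixed-length coding."""
    state, words = init_state, []
    for i in range(0, len(bits), k):
        probs, state = lm_step(state, words)   # conditional probability distribution
        # Candidate pool: the 2**k most probable next-word indices, ordered
        # deterministically so sender and receiver agree on the mapping.
        pool = sorted(heapq.nlargest(2 ** k, range(len(probs)),
                                     key=probs.__getitem__))
        index = int("".join(map(str, bits[i:i + k])), 2)  # next k bits -> pool index
        words.append(pool[index])
    return words
```

The receiver recovers the bitstream by rerunning the same language model: each received word's position in the recomputed candidate pool reveals the next k bits.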

164 citations

Journal ArticleDOI
TL;DR: It is revealed that, contrary to existing thought, the inactive frames of VoIP streams are more suitable for data embedding than the active frames of the streams; that is, steganography in the inactive audio frames attains a larger data embedding capacity than that in the active audio frames under the same imperceptibility.
Abstract: This paper describes a novel high-capacity steganography algorithm for embedding data in the inactive frames of low bit-rate audio streams encoded by the G.723.1 source codec, which is used extensively in Voice over Internet Protocol (VoIP). This study reveals that, contrary to existing thought, the inactive frames of VoIP streams are more suitable for data embedding than the active frames; that is, steganography in the inactive audio frames attains a larger data embedding capacity than in the active audio frames under the same imperceptibility. By analyzing the concealment of steganography in the inactive frames of low bit-rate audio streams encoded by the G.723.1 codec at 6.3 kb/s, the authors propose a new algorithm for steganography in different speech parameters of the inactive frames. Performance evaluation shows that embedding data in different speech parameters leads to different levels of concealment. An improved voice activity detection algorithm is suggested for detecting inactive audio frames while taking packet loss into account. Experimental results show that the proposed steganography algorithm not only achieves perfect imperceptibility but also attains a high data embedding rate of up to 101 bits/frame, indicating that its data embedding capacity is much larger than those of previously suggested algorithms.
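The core idea, embedding only where voice activity detection flags a frame as inactive, can be illustrated with a short sketch. The frame layout, field names, and LSB substitution below are simplifying assumptions; the paper's actual G.723.1 parameter selection and capacity analysis are more involved.

```python
def embed_in_inactive_frames(frames, secret_bits, is_inactive,
                             fields=("gain", "lag")):
    """Hide bits in the LSBs of selected parameters of inactive frames only.

    frames:      list of dicts of quantized frame parameters (assumed layout)
    is_inactive: voice-activity-detection predicate for a frame
    """
    bit_iter = iter(secret_bits)
    for frame in frames:
        if not is_inactive(frame):         # active speech is left untouched
            continue
        for field in fields:
            bit = next(bit_iter, None)
            if bit is None:
                return frames              # all secret bits embedded
            frame[field] = (frame[field] & ~1) | bit  # overwrite parameter LSB
    return frames
```

The receiver applies the same voice activity decision to locate the carrier frames and reads the LSBs back in the same parameter order.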

127 citations

Journal ArticleDOI
TL;DR: A new algorithm is proposed for steganography in low bit-rate VoIP audio streams by integrating information hiding into the process of speech encoding, thus maintaining synchronization between information hiding and speech encoding.
Abstract: Low bit-rate speech codecs have been widely used in audio communications such as VoIP and mobile communications, so steganography in low bit-rate audio streams would have broad applications in practice. In this paper, the authors propose a new algorithm for steganography in low bit-rate VoIP audio streams that integrates information hiding into the process of speech encoding. The proposed algorithm performs data embedding while pitch period prediction is conducted during low bit-rate speech encoding, thus maintaining synchronization between information hiding and speech encoding. The steganography algorithm not only achieves high speech quality and resists steganalysis detection, but also offers good compatibility with a standard low bit-rate speech codec, introducing no further delay for data embedding and extraction. Testing shows that, with the proposed algorithm, the data embedding rate of the secret message can reach 4 bits/frame (133.3 bits/second).
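One common way to couple embedding with pitch period prediction, sketched below, is to restrict the closed-loop pitch search to lags whose parity matches the secret bit, so the transmitted lag itself carries the bit. This is an illustrative simplification under assumed names, not necessarily the paper's exact embedding rule.

```python
def search_pitch_with_bit(candidate_lags, match_score, secret_bit):
    """Pick the best pitch lag whose parity encodes `secret_bit`.

    candidate_lags: the codec's full lag search range, e.g. range(20, 144)
    match_score:    the codec's correlation measure for a lag (higher is better)
    """
    allowed = [lag for lag in candidate_lags if lag % 2 == secret_bit]
    return max(allowed, key=match_score)   # encoding and embedding in one search

def extract_bit(transmitted_lag):
    """Receiver side: the hidden bit is just the parity of the decoded lag."""
    return transmitted_lag % 2
```

Because the embedding happens inside the encoder's own parameter search, no extra pass over the bitstream is needed and the output remains a standard-compliant stream, which matches the synchronization and compatibility claims above.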

109 citations


Cited by
Journal ArticleDOI
Eric J. Topol
TL;DR: Over time, marked improvements in accuracy, productivity, and workflow will likely be actualized, but whether that will be used to improve the patient–doctor relationship or facilitate its erosion remains to be seen.
Abstract: The use of artificial intelligence, and the deep-learning subtype in particular, has been enabled by the use of labeled big data, along with markedly enhanced computing power and cloud storage, across all sectors. In medicine, this is beginning to have an impact at three levels: for clinicians, predominantly via rapid, accurate image interpretation; for health systems, by improving workflow and the potential for reducing medical errors; and for patients, by enabling them to process their own data to promote health. The current limitations, including bias, privacy and security, and lack of transparency, along with the future directions of these applications will be discussed in this article. Over time, marked improvements in accuracy, productivity, and workflow will likely be actualized, but whether that will be used to improve the patient-doctor relationship or facilitate its erosion remains to be seen.

2,574 citations

Posted Content
TL;DR: This paper defines and explores proofs of retrievability (PORs): a POR scheme enables an archive or back-up service to produce a concise proof that a user can retrieve a target file F, that is, that the archive retains and reliably transmits file data sufficient for the user to recover F in its entirety.
Abstract: In this paper, we define and explore proofs of retrievability (PORs). A POR scheme enables an archive or back-up service (prover) to produce a concise proof that a user (verifier) can retrieve a target file F, that is, that the archive retains and reliably transmits file data sufficient for the user to recover F in its entirety. A POR may be viewed as a kind of cryptographic proof of knowledge (POK), but one specially designed to handle a large file (or bitstring) F. We explore POR protocols here in which the communication costs, number of memory accesses for the prover, and storage requirements of the user (verifier) are small parameters essentially independent of the length of F. In addition to proposing new, practical POR constructions, we explore implementation considerations and optimizations that bear on previously explored, related schemes. In a POR, unlike a POK, neither the prover nor the verifier need actually have knowledge of F. PORs give rise to a new and unusual security definition whose formulation is another contribution of our work. We view PORs as an important tool for semi-trusted online archives. Existing cryptographic techniques help users ensure the privacy and integrity of files they retrieve. It is also natural, however, for users to want to verify that archives do not delete or modify files prior to retrieval. The goal of a POR is to accomplish these checks without users having to download the files themselves. A POR can also provide quality-of-service guarantees, i.e., show that a file is retrievable within a certain time bound.
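A simple way to build intuition for PORs is the sentinel construction: before upload, the verifier hides a few pseudorandom "sentinel" blocks at key-derived positions, and an audit challenges the archive to return them verbatim. The Python sketch below is a deliberately simplified illustration; practical schemes in this line of work add encryption, error-correcting codes, and block permutation so that sentinels are indistinguishable from data.

```python
import hmac
import hashlib

def plant_sentinels(file_blocks, key, n_sentinels=4):
    """Verifier derives sentinel positions/values from a secret key before upload."""
    sentinels = {}
    for i in range(n_sentinels):
        digest = hmac.new(key, str(i).encode(), hashlib.sha256).digest()
        pos = int.from_bytes(digest[:4], "big") % len(file_blocks)
        file_blocks[pos] = digest[4:20]   # 16-byte pseudorandom sentinel block
        sentinels[pos] = file_blocks[pos]
    return sentinels   # the verifier needs only the key and the sentinel count

def audit(archive_read, sentinels):
    """Challenge: the archive must return each sentinel block unmodified."""
    return all(archive_read(pos) == val for pos, val in sentinels.items())
```

An archive that has discarded or corrupted a noticeable fraction of the file will, with high probability, fail to return some sentinel, and the audit's communication cost is independent of the file length, which is the property the abstract emphasizes.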

1,783 citations

Posted Content
TL;DR: Multi-task learning (MTL) is a learning paradigm in machine learning whose aim is to leverage useful information contained in multiple related tasks to help improve the generalization performance of all the tasks.
Abstract: Multi-Task Learning (MTL) is a learning paradigm in machine learning whose aim is to leverage useful information contained in multiple related tasks to help improve the generalization performance of all the tasks. In this paper, we give a survey of MTL from the perspectives of algorithmic modeling, applications, and theoretical analyses. For algorithmic modeling, we give a definition of MTL and then classify different MTL algorithms into five categories, including the feature learning approach, low-rank approach, task clustering approach, task relation learning approach, and decomposition approach, and discuss the characteristics of each approach. To further improve the performance of learning tasks, MTL can be combined with other learning paradigms including semi-supervised learning, active learning, unsupervised learning, reinforcement learning, multi-view learning, and graphical models. When the number of tasks is large or the data dimensionality is high, we review online, parallel, and distributed MTL models as well as dimensionality reduction and feature hashing to reveal their computational and storage advantages. Many real-world applications use MTL to boost their performance, and we review representative works in this paper. Finally, we present theoretical analyses and discuss several future directions for MTL.
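Among the categories listed, the feature learning approach is the easiest to make concrete: a shared encoder learns representations reused by task-specific heads (often called hard parameter sharing). The PyTorch sketch below is a minimal illustration of that one idea under assumed sizes, not a summary of the survey's taxonomy.

```python
import torch.nn as nn

class SharedBottomMTL(nn.Module):
    """Hard parameter sharing: one shared encoder, one small head per task."""
    def __init__(self, in_dim, hidden, task_out_dims):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                                    nn.Linear(hidden, hidden), nn.ReLU())
        self.heads = nn.ModuleList(nn.Linear(hidden, d) for d in task_out_dims)

    def forward(self, x):
        z = self.shared(x)                        # features shared across tasks
        return [head(z) for head in self.heads]   # one prediction per task
```

Training sums (or weights) the per-task losses over these outputs, which is how the shared features end up leveraging information from all related tasks.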

1,202 citations

Journal ArticleDOI
TL;DR: The most useful techniques and how machine learning can promote data-driven decision making in drug discovery and development are discussed and major hurdles in the field are highlighted.
Abstract: Drug discovery and development pipelines are long, complex and depend on numerous factors. Machine learning (ML) approaches provide a set of tools that can improve discovery and decision making for well-specified questions with abundant, high-quality data. Opportunities to apply ML occur in all stages of drug discovery. Examples include target validation, identification of prognostic biomarkers and analysis of digital pathology data in clinical trials. Applications have ranged in context and methodology, with some approaches yielding accurate predictions and insights. The challenges of applying ML lie primarily with the lack of interpretability and repeatability of ML-generated results, which may limit their application. In all areas, systematic and comprehensive high-dimensional data still need to be generated. With ongoing efforts to tackle these issues, as well as increasing awareness of the factors needed to validate ML approaches, the application of ML can promote data-driven decision making and has the potential to speed up the process and reduce failure rates in drug discovery and development.

1,159 citations