Feature hashing

About: Feature hashing is a research topic. Over its lifetime, 993 publications have been published within this topic, receiving 51,462 citations.

Papers published on a yearly basis (chart; the years listed range from 1970 to 2023, per-year values are not recoverable)

Papers

Proceedings Article•DOI•

A Novel Feature Hashing With Efficient Collision Resolution for Bag-of-Words Representation of Text Data

Bobby A. Eclarin¹, Arnel C. Fajardo, Ruji P. Medina¹•Institutions (1)

Technological Institute of the Philippines¹

07 Sep 2018

TL;DR: Using a vector data structure, lookup performance is improved while collisions are resolved, and memory usage remains efficient.

Abstract: Text mining is widely used in many areas, transforming unstructured text data from sources such as patient records, social media networks, insurance data, and news into an invaluable source of information. The Bag of Words (BoW) representation is a means of extracting features from text data for use in modeling. In text classification, a word in a document is assigned a weight according to its frequency within the document and its frequency across different documents; the words together with their weights form the BoW. One way to handle voluminous data is to use the feature hashing method, also known as the hashing trick. However, collisions are inevitable and might change the result of the whole process of feature generation and selection. Using a vector data structure, lookup performance is improved while collisions are resolved, and memory usage remains efficient.

2 citations
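
The hashing trick and the collision problem mentioned in the abstract above can be illustrated with a minimal Python sketch. It is not the authors' method: the function names (`bucket_of`, `hash_bow`, `hash_bow_chained`), the MD5-based bucket function, and the per-bucket list used to keep colliding tokens apart are all assumptions chosen for illustration.

```python
import hashlib


def bucket_of(token: str, n_buckets: int) -> int:
    """Map a token to a bucket index with a stable hash (MD5 is an arbitrary choice)."""
    digest = hashlib.md5(token.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n_buckets


def hash_bow(tokens, n_buckets):
    """Plain hashing trick: tokens that collide share a single count."""
    vec = [0] * n_buckets
    for token in tokens:
        vec[bucket_of(token, n_buckets)] += 1
    return vec


def hash_bow_chained(tokens, n_buckets):
    """Illustrative collision handling: each bucket holds [token, count] pairs,
    so colliding tokens keep separate counts at the cost of extra memory."""
    buckets = [[] for _ in range(n_buckets)]
    for token in tokens:
        chain = buckets[bucket_of(token, n_buckets)]
        for entry in chain:
            if entry[0] == token:
                entry[1] += 1
                break
        else:
            chain.append([token, 1])
    return buckets


tokens = "the cat sat on the mat the end".split()
print(hash_bow(tokens, 4))          # counts merged per bucket
print(hash_bow_chained(tokens, 4))  # counts kept per token inside each bucket
```

With six distinct tokens and four buckets, at least two tokens must share a bucket, so the plain version necessarily merges some counts while the chained version keeps them distinguishable.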

Journal Article•DOI•

Robust supervised matrix factorization hashing with application to cross-modal retrieval

Zhenqiu Shu, Kailing Yong, Dongling Zhang, Zhengtao Yu, Xiao-Jun Wu (+1 more)

27 Nov 2022-Neural Computing and Applications

2 citations

Journal Article•DOI•

Image perceptual hashing for content authentication based on Watson’s visual model and LLE

Hui Xing, Huifang Che, Qilin Wu, Honghai Wang

01 Feb 2023-Journal of Real-time Image Processing

2 citations

Journal Article•DOI•

Managing statistical behavior of large data sets in shared-nothing architectures

Isidore Rigoutsos¹, Alex Delis•Institutions (1)

IBM¹

01 Nov 1998-IEEE Transactions on Parallel and Distributed Systems

TL;DR: A two-stage methodology that uses the knowledge of the hashing function to reorganize the group assignments so that the resulting groups have similar expected cardinalities, and is generally applicable and independent of the used hashing function.

Abstract: Increasingly larger data sets are being stored in networked architectures. Many of the available data structures are not easily amenable to parallel realizations. Hashing schemes show promise in that respect for the simple reason that the underlying data structure can be decomposed and spread among the set of cooperating nodes with minimal communication and maintenance requirements. In all cases, storage utilization and load balancing are issues that need to be addressed. One can identify two basic approaches to tackle the problem. One way is to address it as part of the design of the data structure that is used to store and retrieve the data. The other is to maintain the data structure intact but address the problem separately. The method that we present here falls in the latter category and is applicable whenever a hash table is the preferred data structure. Intrinsically attached to the used hash table is a hashing function that allows one to partition a possibly unbounded set of data items into a finite set of groups; the hashing function provides the partitioning by assigning each data item to one of the groups. In general, the hashing function cannot guarantee that the various groups will have the same cardinality on average, for all possible data item distributions. In this paper, we propose a two-stage methodology that uses the knowledge of the hashing function to reorganize the group assignments so that the resulting groups have similar expected cardinalities. The method is generally applicable and independent of the used hashing function. We show the power of the methodology using both synthetic and real-world databases. The derived quasi-uniform storage occupancy and associated load-balancing gains are significant.

2 citations
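
The abstract above describes hashing items into groups and then reorganizing the group assignments so that storage is spread evenly. The paper's actual two-stage procedure is not given here, so the following Python sketch substitutes a generic greedy heuristic (largest group first, onto the least-loaded node). The CRC32 hash and the names `group_of`, `estimate_group_sizes`, and `assign_groups_to_nodes` are hypothetical.

```python
import heapq
import zlib
from collections import Counter


def group_of(key: str, n_groups: int) -> int:
    """Stage 1 (assumed): the hash function partitions keys into a fixed set of groups."""
    return zlib.crc32(key.encode("utf-8")) % n_groups


def estimate_group_sizes(sample_keys, n_groups):
    """Estimate each group's cardinality from a sample of keys."""
    counts = Counter(group_of(k, n_groups) for k in sample_keys)
    return [counts.get(g, 0) for g in range(n_groups)]


def assign_groups_to_nodes(group_sizes, n_nodes):
    """Stage 2 (assumed): place the largest groups first, each on the
    currently least-loaded node. Returns {group index: node index}."""
    heap = [(0, node) for node in range(n_nodes)]
    heapq.heapify(heap)
    assignment = {}
    for g in sorted(range(len(group_sizes)), key=lambda g: -group_sizes[g]):
        load, node = heapq.heappop(heap)
        assignment[g] = node
        heapq.heappush(heap, (load + group_sizes[g], node))
    return assignment


sizes = [10, 1, 1, 1, 7, 2]  # skewed group cardinalities
assignment = assign_groups_to_nodes(sizes, n_nodes=2)
loads = [sum(sizes[g] for g, n in assignment.items() if n == node) for node in range(2)]
print(assignment, loads)
```

The hash function itself is left untouched; only the mapping from groups to nodes changes, which matches the abstract's description of keeping the data structure intact and addressing balance separately.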

Journal Article•DOI•

A comparison of three strategies for computing letter oriented, minimal perfect hashing functions

John A. Trono¹•Institutions (1)

Saint Michael's College¹

01 Apr 1995-Sigplan Notices

TL;DR: Improvements to Cichelli's method for computing the set of weights used for minimal perfect hashing functions by adding a "MOD number_of_keys" operation to the hashing function, and to the removal of unnecessary backtracking due to "guaranteed collisions".

Abstract: This paper will discuss improvements to Cichelli's method for computing the set of weights used for minimal perfect hashing functions[1]. The major modifications investigated here pertain to adding a "MOD number_of_keys" operation to the hashing function, and to the removal of unnecessary backtracking due to "guaranteed collisions".

2 citations
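
As a rough illustration of the ideas named in the abstract above, the Python sketch below searches for letter weights for a Cichelli-style hash of the form (key length + weight of first letter + weight of last letter) MOD number_of_keys. It is a plain exhaustive backtracking search written for this page, not the paper's algorithm. The names `cichelli_hash`, `has_guaranteed_collision`, and `find_weights`, and the exact form of the guaranteed-collision check, are assumptions.

```python
def cichelli_hash(key, weights, n_keys):
    """Letter-oriented hash with the MOD number_of_keys step applied."""
    return (len(key) + weights[key[0]] + weights[key[-1]]) % n_keys


def has_guaranteed_collision(keys):
    """Two keys with the same length (mod n) and the same first/last letter
    pair collide for every weight choice, so searching would be wasted."""
    n = len(keys)
    seen = set()
    for k in keys:
        signature = (len(k) % n, tuple(sorted((k[0], k[-1]))))
        if signature in seen:
            return True
        seen.add(signature)
    return False


def find_weights(keys):
    """Backtracking search over letter weights in 0..n-1.
    Returns a dict of weights, or None if no minimal perfect hash exists."""
    n = len(keys)
    if has_guaranteed_collision(keys):
        return None
    letters = sorted({k[0] for k in keys} | {k[-1] for k in keys})
    weights = {}

    def slots_ok():
        # Check only keys whose first and last letters both have weights.
        used = set()
        for k in keys:
            if k[0] in weights and k[-1] in weights:
                slot = cichelli_hash(k, weights, n)
                if slot in used:
                    return False
                used.add(slot)
        return True

    def search(i):
        if i == len(letters):
            return True
        for w in range(n):
            weights[letters[i]] = w
            if slots_ok() and search(i + 1):
                return True
        del weights[letters[i]]
        return False

    return dict(weights) if search(0) else None


keys = ["cat", "dog", "emu"]
weights = find_weights(keys)
print(weights, [cichelli_hash(k, weights, len(keys)) for k in keys])
```

Because the result is reduced modulo the number of keys, weights outside 0..n-1 add nothing, which is why the search range stops at n-1 in this sketch.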


Network Information

Performance

Metrics

Papers: 1,120
Citations: 57,460

No. of papers in the topic in previous years
Year	Papers
2023	33
2022	89
2021	11
2020	16
2019	16
2018	38
