Home
/
Authors
/
Yan Solihin

Author

Yan Solihin

Other affiliations: Wilmington University, Michigan State University, Los Alamos National Laboratory ...read more

Bio: Yan Solihin is an academic researcher from University of Central Florida. The author has contributed to research in topics: Cache & Cache pollution. The author has an hindex of 35, co-authored 174 publications receiving 5970 citations. Previous affiliations of Yan Solihin include Wilmington University & Michigan State University.

Topics: Cache, Cache pollution, Cache algorithms, Cache coloring, CPU cache ...read more

Papers published on a yearly basis

2023
2022
2021
2020
2019
2018
2017
2016
2015
2014
2013
2012
2011
2010
2009
2008
2007
2006
2005
2004
2003
2002
2001
2000
1999
1997
1996

Papers

PDF

Open Access

More filters

Proceedings Article•DOI•

Fair Cache Sharing and Partitioning in a Chip Multiprocessor Architecture

[...]

Seongbeom Kim¹, Dhruba Chandra¹, Yan Solihin¹•Institutions (1)

North Carolina State University¹

29 Sep 2004

TL;DR: It is found that optimizing fairness usually increases throughput, while maximizing throughput does not necessarily improve fairness, and two algorithms are proposed that optimize fairness.

...read moreread less

Abstract: This paper presents a detailed study of fairness in cache sharing between threads in a chip multiprocessor (CMP) architecture. Prior work in CMP architectures has only studied throughput optimization techniques for a shared cache. The issue of fairness in cache sharing, and its relation to throughput, has not been studied. Fairness is a critical issue because the operating system (OS) thread scheduler's effectiveness depends on the hardware to provide fair cache sharing to co-scheduled threads. Without such hardware, serious problems, such as thread starvation and priority inversion, can arise and render the OS scheduler ineffective. This paper makes several contributions. First, it proposes and evaluates five cache fairness metrics that measure the degree of fairness in cache sharing, and shows that two of them correlate very strongly with the execution-time fairness. Execution-time fairness is defined as how uniform the execution times of co-scheduled threads are changed, where each change is relative to the execution time of the same thread running alone. Secondly, using the metrics, the paper proposes static and dynamic L2 cache partitioning algorithms that optimize fairness. The dynamic partitioning algorithm is easy to implement, requires little or no profiling, has low overhead, and does not restrict the cache replacement algorithm to LRU. The static algorithm, although requiring the cache to maintain LRU stack information, can help the OS thread scheduler to avoid cache thrashing. Finally, this paper studies the relationship between fairness and throughput in detail. We found that optimizing fairness usually increases throughput, while maximizing throughput does not necessarily improve fairness. Using a set of co-scheduled pairs of benchmarks, on average our algorithms improve fairness by a factor of 4/spl times/, while increasing the throughput by 15%, compared to a nonpartitioned shared cache.

...read moreread less

544 citations

Proceedings Article•DOI•

Predicting inter-thread cache contention on a chip multi-processor architecture

[...]

Dhruba Chandra¹, Fei Guo¹, Seongbeom Kim¹, Yan Solihin¹•Institutions (1)

North Carolina State University¹

12 Feb 2005

TL;DR: Three performance models are proposed that predict the impact of cache sharing on co-scheduled threads and the most accurate model, the inductive probability model, achieves an average error of only 3.9%.

...read moreread less

Abstract: This paper studies the impact of L2 cache sharing on threads that simultaneously share the cache, on a chip multi-processor (CMP) architecture. Cache sharing impacts threads nonuniformly, where some threads may be slowed down significantly, while others are not. This may cause severe performance problems such as sub-optimal throughput, cache thrashing, and thread starvation for threads that fail to occupy sufficient cache space to make good progress. Unfortunately, there is no existing model that allows extensive investigation of the impact of cache sharing. To allow such a study, we propose three performance models that predict the impact of cache sharing on co-scheduled threads. The input to our models is the isolated L2 cache stack distance or circular sequence profile of each thread, which can be easily obtained on-line or off-line. The output of the models is the number of extra L2 cache misses for each thread due to cache sharing. The models differ by their complexity and prediction accuracy. We validate the models against a cycle-accurate simulation that implements a dual-core CMP architecture, on fourteen pairs of mostly SPEC benchmarks. The most accurate model, the inductive probability model, achieves an average error of only 3.9%. Finally, to demonstrate the usefulness and practicality of the model, a case study that details the relationship between an application's temporal reuse behavior and its cache sharing impact is presented.

...read moreread less

543 citations

Proceedings Article•DOI•

Scaling the bandwidth wall: challenges in and avenues for CMP scaling

[...]

Brian Rogers¹, Anil Krishna², Gordon B. Bell², Ken Vu², Xiaowei Jiang¹, Yan Solihin¹ - Show less +2 more•Institutions (2)

North Carolina State University¹, Research Triangle Park²

20 Jun 2009

TL;DR: A simple but powerful analytical model is developed to predict the number of on-chip cores that a CMP can support given a limited growth in memory traffic capacity, and it is found that the bandwidth wall can severely limit core scaling.

...read moreread less

Abstract: As transistor density continues to grow at an exponential rate in accordance to Moore's law, the goal for many Chip Multi-Processor (CMP) systems is to scale the number of on-chip cores proportionally. Unfortunately, off-chip memory bandwidth capacity is projected to grow slowly compared to the desired growth in the number of cores. This creates a situation in which each core will have a decreasing amount of off-chip bandwidth that it can use to load its data from off-chip memory. The situation in which off-chip bandwidth is becoming a performance and throughput bottleneck is referred to as the bandwidth wall problem.In this study, we seek to answer two questions: (1) to what extent does the bandwidth wall problem restrict future multicore scaling, and (2) to what extent are various bandwidth conservation techniques able to mitigate this problem. To address them, we develop a simple but powerful analytical model to predict the number of on-chip cores that a CMP can support given a limited growth in memory traffic capacity. We find that the bandwidth wall can severely limit core scaling. When starting with a balanced 8-core CMP, in four technology generations the number of cores can only scale to 24, as opposed to 128 cores under proportional scaling, without increasing the memory traffic requirement. We find that various individual bandwidth conservation techniques we evaluate have a wide ranging impact on core scaling, and when combined together, these techniques have the potential to enable super-proportional core scaling for up to 4 technology generations.

...read moreread less

297 citations

Proceedings Article•DOI•

QoS policies and architecture for cache/memory in CMP platforms

[...]

Ravi Iyer¹, Li Zhao¹, Fei Guo², Ramesh Illikkal¹, Srihari Makineni¹, Donald Newell¹, Yan Solihin², Lisa Hsu³, Steve Reinhardt³ - Show less +5 more•Institutions (3)

Intel¹, North Carolina State University², University of Michigan³

12 Jun 2007

TL;DR: A QoS-enabled memory architecture for CMP platforms that enables more cache resources and memory resources for high priority applications based on guidance from the operating environment and allows dynamic resource reassignment during run-time to further optimize the performance of the high priority application with minimal degradation to low priority.

...read moreread less

Abstract: As we enter the era of CMP platforms with multiple threads/cores on the die, the diversity of the simultaneous workloads running on them is expected to increase. The rapid deployment of virtualization as a means to consolidate workloads on to a single platform is a prime example of this trend. In such scenarios, the quality of service (QoS) that each individual workload gets from the platform can widely vary depending on the behavior of the simultaneously running workloads. While the number of cores assigned to each workload can be controlled, there is no hardware or software support in today's platforms to control allocation of platform resources such as cache space and memory bandwidth to individual workloads. In this paper, we propose a QoS-enabled memory architecture for CMP platforms that addresses this problem. The QoS-enabled memory architecture enables more cache resources (i.e. space) and memory resources (i.e. bandwidth) for high priority applications based on guidance from the operating environment. The architecture also allows dynamic resource reassignment during run-time to further optimize the performance of the high priority application with minimal degradation to low priority. To achieve these goals, we will describe the hardware/software support required in the platform as well as the operating environment (O/S and virtual machine monitor). Our evaluation framework consists of detailed platform simulation models and a QoS-enabled version of Linux. Based on evaluation experiments, we show the effectiveness of a QoS-enabled architecture and summarize key findings/trade-offs.

...read moreread less

282 citations

Journal Article•DOI•

Improving Cost, Performance, and Security of Memory Encryption and Authentication

[...]

Chenyu Yan¹, Daniel Englender¹, Milos Prvulovic¹, Brian Rogers², Yan Solihin² - Show less +1 more•Institutions (2)

Georgia Institute of Technology¹, North Carolina State University²

01 May 2006

TL;DR: The new split counters for counter-mode encryption simultaneously eliminate counter overflow problems and reduce per-block counter size and dramatically improve authentication performance and security by using the Galois/counter mode of operation (GCM), which leverages counter- mode encryption to reduce authentication latency and overlap it with memory accesses.

...read moreread less

Abstract: Protection from hardware attacks such as snoopers and mod chips has been receiving increasing attention in computer architecture. This paper presents a new combined memory encryption/authentication scheme. Our new split counters for counter-mode encryption simultaneously eliminate counter overflow problems and reduce per-block counter size, and we also dramatically improve authentication performance and security by using the Galois/Counter Mode of operation (GCM), which leverages counter-mode encryption to reduce authentication latency and overlap it with memory accesses. Our results indicate that the split-counter scheme has a negligible overhead even with a small (32KB) counter cache and using only eight counter bits per data block. The combined encryption/authentication scheme has an IPC overhead of 5% on average across SPEC CPU 2000 benchmarks, which is a significant improvement over the 20% overhead of existing encryption/authentication schemes.

...read moreread less

261 citations

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39

Collapse

Cited by

PDF

Open Access

More filters

Journal Article•DOI•

Survey over image thresholding techniques and quantitative performance evaluation

[...]

Mehmet Sezgin¹, Bulent Sankur•Institutions (1)

TÜBİTAK Marmara Research Center¹

01 Jan 2004-Journal of Electronic Imaging

TL;DR: 40 selected thresholding methods from various categories are compared in the context of nondestructive testing applications as well as for document images, and the thresholding algorithms that perform uniformly better over nonde- structive testing and document image applications are identified.

...read moreread less

Abstract: We conduct an exhaustive survey of image thresholding methods, categorize them, express their formulas under a uniform notation, and finally carry their performance comparison. The thresholding methods are categorized according to the information they are exploiting, such as histogram shape, measurement space clustering, entropy, object attributes, spatial correlation, and local gray-level surface. 40 selected thresholding methods from various categories are compared in the context of nondestructive testing applications as well as for document images. The comparison is based on the combined performance measures. We identify the thresholding algorithms that perform uniformly better over nonde- structive testing and document image applications. © 2004 SPIE and IS&T. (DOI: 10.1117/1.1631316)

...read moreread less

4,543 citations

Journal Article•DOI•

Online and off-line handwriting recognition: a comprehensive survey

[...]

Réjean Plamondon¹, Sargur N. Srihari²•Institutions (2)

École Normale Supérieure¹, University at Buffalo²

01 Jan 2000-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: The nature of handwritten language, how it is transduced into electronic data, and the basic concepts behind written language recognition algorithms are described.

...read moreread less

Abstract: Handwriting has continued to persist as a means of communication and recording information in day-to-day life even with the introduction of new technologies. Given its ubiquity in human transactions, machine recognition of handwriting has practical significance, as in reading handwritten notes in a PDA, in postal addresses on envelopes, in amounts in bank checks, in handwritten fields in forms, etc. This overview describes the nature of handwritten language, how it is transduced into electronic data, and the basic concepts behind written language recognition algorithms. Both the online case (which pertains to the availability of trajectory data during writing) and the off-line case (which pertains to scanned images) are considered. Algorithms for preprocessing, character and word recognition, and performance with practical systems are indicated. Other fields of application, like signature verification, writer authentification, handwriting learning tools are also considered.

...read moreread less

2,653 citations

DOI•

International Technology Roadmap for Semiconductors 2003の要求清浄度について－シリコンウエハ表面と雰囲気環境に要求される清浄度, 分析方法の現状について－

[...]

飯田裕幸, 竹田菊男, 藤本武利

20 Sep 2004

1,387 citations

Proceedings Article•DOI•

Utility-Based Cache Partitioning: A Low-Overhead, High-Performance, Runtime Mechanism to Partition Shared Caches

[...]

Moinuddin K. Qureshi, Yale N. Patt

09 Dec 2006

TL;DR: In this article, the authors propose a low-overhead, runtime mechanism that partitions a shared cache between multiple applications depending on the reduction in cache misses that each application is likely to obtain for a given amount of cache resources.

...read moreread less

Abstract: This paper investigates the problem of partitioning a shared cache between multiple concurrently executing applications. The commonly used LRU policy implicitly partitions a shared cache on a demand basis, giving more cache resources to the application that has a high demand and fewer cache resources to the application that has a low demand. However, a higher demand for cache resources does not always correlate with a higher performance from additional cache resources. It is beneficial for performance to invest cache resources in the application that benefits more from the cache resources rather than in the application that has more demand for the cache resources. This paper proposes utility-based cache partitioning (UCP), a low-overhead, runtime mechanism that partitions a shared cache between multiple applications depending on the reduction in cache misses that each application is likely to obtain for a given amount of cache resources. The proposed mechanism monitors each application at runtime using a novel, cost-effective, hardware circuit that requires less than 2kB of storage. The information collected by the monitoring circuits is used by a partitioning algorithm to decide the amount of cache resources allocated to each application. Our evaluation, with 20 multiprogrammed workloads, shows that UCP improves performance of a dual-core system by up to 23% and on average 11% over LRU-based cache partitioning.

...read moreread less

1,083 citations

Journal Article•DOI•

A survey on automated dynamic malware-analysis techniques and tools

[...]

Manuel Egele¹, Theodoor Scholte, Engin Kirda², Christopher Kruegel³•Institutions (3)

Vienna University of Technology¹, Institut Eurécom², University of California, Santa Barbara³

05 Mar 2008-ACM Computing Surveys

TL;DR: An overview of techniques based on dynamic analysis that are used to analyze potentially malicious samples and analysis programs that employ these techniques to assist human analysts in assessing whether a given sample deserves closer manual inspection due to its unknown malicious behavior is provided.

...read moreread less

Abstract: Anti-virus vendors are confronted with a multitude of potentially malicious samples today. Receiving thousands of new samples every day is not uncommon. The signatures that detect confirmed malicious threats are mainly still created manually, so it is important to discriminate between samples that pose a new unknown threat and those that are mere variants of known malware.This survey article provides an overview of techniques based on dynamic analysis that are used to analyze potentially malicious samples. It also covers analysis programs that leverage these It also covers analysis programs that employ these techniques to assist human analysts in assessing, in a timely and appropriate manner, whether a given sample deserves closer manual inspection due to its unknown malicious behavior.

...read moreread less

815 citations

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200

Collapse