# A Hardware Architecture for Real-Time Video Segmentation Utilizing Memory Reduction Techniques

## Summary (3 min read)

### Introduction

- A multimodal background is caused by repetitive background object motion, e.g., swaying trees or a flickering monitor. Furthermore, FPGAs give real-time performance that is hard to achieve with DSP processors, while limiting the extensive design work required for application-specific integrated circuits.
- In this work, an automated surveillance system has been chosen as the target application.
- Sections II and III discuss the original algorithm and possible modifications for hardware efficiency.

### II. GAUSSIAN MIXTURE BACKGROUND MODEL

- For a more thorough description the authors refer to [13].
- Each distribution has a weight that indicates the probability of matching a new incoming pixel.
- The higher the weight, the more likely the distribution belongs to the background.
- For those unmatched, the weight is updated according to (8) while the mean and the variance remain the same.
- Rather than focusing mainly on improving robustness, the authors propose several modifications to the algorithm, chosen primarily for their potential to improve hardware efficiency.
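
The matching and updating steps described above can be sketched as follows. This is a minimal single-channel illustration in the style of the standard mixture-of-Gaussians update, not the paper's equations: the learning rate `alpha`, the matching threshold `match_sigmas`, the initial values for a replaced distribution, and the choice to reuse `alpha` for the mean and variance update are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Gaussian:
    weight: float
    mean: float
    var: float


def update_pixel(dists, x, alpha=0.01, match_sigmas=2.5,
                 init_weight=0.05, init_var=100.0):
    """Update the per-pixel distributions in place; return the matched index or None.

    All default parameter values are hypothetical, chosen only for illustration.
    """
    matched = None
    for i, g in enumerate(dists):
        # A pixel matches if it lies within match_sigmas standard deviations.
        if (x - g.mean) ** 2 <= (match_sigmas ** 2) * g.var:
            matched = i
            break

    if matched is None:
        # No match: replace the distribution with the lowest weight.
        k = min(range(len(dists)), key=lambda i: dists[i].weight)
        dists[k] = Gaussian(init_weight, float(x), init_var)
    else:
        for i, g in enumerate(dists):
            if i == matched:
                diff = x - g.mean
                g.weight = (1 - alpha) * g.weight + alpha
                g.mean += alpha * diff
                g.var = (1 - alpha) * g.var + alpha * diff * diff
            else:
                # Unmatched: only the weight decays; mean and variance are kept.
                g.weight = (1 - alpha) * g.weight

    # Renormalise so the weights sum to one.
    total = sum(g.weight for g in dists)
    for g in dists:
        g.weight /= total
    return matched


dists = [Gaussian(0.6, 100.0, 25.0), Gaussian(0.3, 50.0, 25.0), Gaussian(0.1, 200.0, 25.0)]
first = update_pixel(dists, 102)   # close to the first distribution
second = update_pixel(dists, 0)    # far from every distribution
```

In the sketch, a pixel value of 102 strengthens the first distribution, while a value of 0 matches nothing and overwrites the lowest-weight distribution.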

### A. Color Space Transformation

- Multimodal situations only occur when repetitive background objects are present in the scene.
- By transforming RGB into YCbCr space, the correlation among different color coordinates is mostly removed, resulting in nearly independent color components.
- As shown in Fig. 2(a), most pixel distributions are transformed from cylinders back to spheres, capable of being modeled with a single spherical distribution.
- The authors propose two simplifications to the algorithm.
- The distribution with dominant weight but large variance does not get to the top, identified as background distribution.
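
As a rough illustration of the color space transformation, the sketch below converts an RGB pixel to a luminance/chrominance representation. It assumes a YCbCr target space with the common full-range BT.601 (JPEG/JFIF) coefficients; the exact matrix and fixed-point format used in the hardware are not given in this summary, and the function name is hypothetical.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert 8-bit RGB to full-range YCbCr (BT.601 / JFIF coefficients).

    The coefficients are an assumption for illustration; the paper's
    hardware may use a different matrix or fixed-point approximation.
    """
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr


# A change in brightness of a gray pixel moves only Y; Cb and Cr stay at 128.
dark = rgb_to_ycbcr(60, 60, 60)
bright = rgb_to_ycbcr(180, 180, 180)
```

The transform uses only constant multiplications and additions, in contrast to the division and square root needed for cylindrical coordinates.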

### III. HARDWARE ARCHITECTURE

- To perform the algorithm with VGA resolution in real-time, a dedicated hardware architecture, with a streamlined data flow and memory bandwidth reduction schemes, is implemented to address the computation capacity and memory bandwidth bottlenecks.
- Algorithm modifications covered in previous sections are implemented with potential benefits on hardware efficiency and segmentation quality.
- The image data is captured with one color component at a time, and the three color components are sent to the system after serial-parallel transformation.
- With the image data captured and transformed, the match and switch block tries to match the incoming pixel with Gaussian distributions obtained from the previous frame.
- Similar updating schemes are utilized for variance update.
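
To see why memory bandwidth becomes a bottleneck at VGA resolution, a back-of-the-envelope estimate helps. The sketch below assumes that every Gaussian parameter set is read and written once per pixel per frame; the frame rate of 25 fps is a hypothetical value, while the three Gaussians per pixel and the 81-bit lower bound per distribution are the figures quoted elsewhere in this summary for the unreduced wordlength.

```python
def gaussian_memory_bandwidth(width, height, fps, num_gaussians, bits_per_gaussian):
    """Off-chip traffic in bits per second, counting one read and one write
    of every Gaussian parameter set per pixel per frame (an assumption)."""
    bits_per_frame = width * height * num_gaussians * bits_per_gaussian
    return 2 * bits_per_frame * fps


# VGA frame, hypothetical 25 fps, three Gaussians of 81 bits each.
bandwidth = gaussian_memory_bandwidth(640, 480, 25, 3, 81)
print(f"{bandwidth / 1e9:.2f} Gbit/s")
```

Under these assumptions the traffic is several gigabits per second, which is why reducing the bits stored per distribution pays off directly.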

### A. Sorting

- The updated Gaussian parameters have to be sorted for use in the next frame.
- By observing that only one Gaussian distribution is updated at a time and all the distributions are initially sorted, the sorting of Gaussian distributions can be changed to rearranging an updated distribution among ordered distributions.
- The output of each comparator signifies which distribution is to be multiplexed to the output, e.g., if the weight of any unmatched distribution is smaller than the updated one, all unmatched distributions below the current one are switched to the output at the next lower MUX.
- Since only three Gaussians are utilized in their implementation, this is a trivial task.
- The paper also compares the hardware complexity of the proposed sorting architecture with that of the other schemes mentioned above.
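
The idea of rearranging one updated distribution among already ordered ones can be sketched in software as follows. This is only an analogy for the comparator-and-MUX structure: each comparison in the list comprehension stands in for one parallel comparator, and the function name and the use of plain weights as sort keys are assumptions for the example.

```python
def reinsert_updated(weights, updated_index):
    """Restore descending order when only weights[updated_index] has changed.

    The other entries are assumed to be sorted already, so one comparison
    per remaining entry is enough to find the new position.
    """
    item = weights[updated_index]
    rest = weights[:updated_index] + weights[updated_index + 1:]
    # One "comparator" per remaining distribution.
    stays_above = [w >= item for w in rest]
    position = sum(stays_above)
    return rest[:position] + [item] + rest[position:]


# The third weight was just updated from 0.2 to 0.4.
print(reinsert_updated([0.5, 0.3, 0.4], 2))
```

Because the remaining entries are already ordered, counting the comparator outputs gives the insertion position without a general sorting network.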

### B. Wordlength Reduction

- Slow background updating requires a large dynamic range for each parameter in the distributions, since parameter values change only slightly between frames but can accumulate over time.
- Together with a 16-bit weight and the integer parts of the mean and the variance, 81–100 bits are needed for a single Gaussian distribution.
- From (13), a small positive or negative number is derived depending on whether the incoming pixel is above or below the current mean.
- The coarse updating scheme, on the other hand, relieves the problem to a certain extent, since consecutive ones are added or subtracted to keep track of relatively fast changes.
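
A minimal sketch of the coarse updating idea, assuming the mean is kept as an integer and moved by a fixed step of one toward the incoming pixel: the step size and function names are hypothetical, and the variance (which the summary says is updated similarly) is left out.

```python
def coarse_update(mean, pixel, step=1):
    """Move an integer mean one step toward the pixel value.

    Only the sign of the difference is used, so no fractional bits
    are needed to accumulate very small increments.
    """
    if pixel > mean:
        return mean + step
    if pixel < mean:
        return mean - step
    return mean


def track(mean, pixels):
    """Apply the coarse update over a sequence of frames."""
    for p in pixels:
        mean = coarse_update(mean, p)
    return mean


# The background brightens from 100 to 110; the mean follows one step per frame.
final_mean = track(100, [110] * 12)
```

The trade-off is that the mean can follow a change at most one step per frame, in exchange for storing no fractional bits.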

### C. Pixel Locality

- In addition to wordlength reduction, a data compression scheme for further bandwidth reduction is proposed by utilizing pixel locality for Gaussian distributions in adjacent areas.
- The reason for such a criterion lies in the fact that a pixel that matches one distribution will most likely match the other.
- Various threshold values are selected to evaluate the efficiency for the memory bandwidth reduction.
- As foreground objects enter the scene, some of the Gaussian distributions are replaced, which decreases the number of similar Gaussian distributions.
- Foreground object activity can vary between video scenes, e.g., continuous activity in Fig. 8(a), where people go up and down the stairs all the time, and the two peak activity periods around frames 600–900 and frames 2100–2500 in Fig. 8(b), where people walk by in two discrete time periods.
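
The pixel-locality compression can be illustrated with a run-length style sketch. It assumes each Gaussian is approximated by a cube centred on its mean with half-side `k` standard deviations, and that two neighbours count as similar when the overlap volume, normalised by the larger cube, exceeds a threshold. The normalisation, `k`, the threshold value, and the function names are assumptions for the example, not the paper's exact criterion.

```python
def cube_overlap_ratio(mean_a, sigma_a, mean_b, sigma_b, k=2.0):
    """Overlap volume of two axis-aligned cubes divided by the larger cube's volume."""
    half_a, half_b = k * sigma_a, k * sigma_b
    volume = 1.0
    for ca, cb in zip(mean_a, mean_b):
        lo = max(ca - half_a, cb - half_b)
        hi = min(ca + half_a, cb + half_b)
        if hi <= lo:
            return 0.0  # No overlap along this axis.
        volume *= hi - lo
    larger = (2 * max(half_a, half_b)) ** len(mean_a)
    return volume / larger


def compress_row(gaussians, threshold=0.8):
    """Store each distinct Gaussian once, with the number of similar successors."""
    runs = []
    for mean, sigma in gaussians:
        if runs:
            (ref_mean, ref_sigma), count = runs[-1]
            if cube_overlap_ratio(ref_mean, ref_sigma, mean, sigma) >= threshold:
                runs[-1] = ((ref_mean, ref_sigma), count + 1)
                continue
        runs.append(((mean, sigma), 0))
    return runs


row = [((100, 100, 100), 5), ((100, 100, 100), 5),
       ((101, 100, 100), 5), ((200, 50, 50), 5)]
compressed = compress_row(row)
```

Here the first three neighbouring distributions collapse into one stored entry with a successor count of two, while the clearly different fourth distribution starts a new entry. A lower threshold merges more neighbours and saves more bandwidth, at the cost of a coarser background model.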

### IV. RESULTS

- The segmentation unit is prototyped on a Xilinx VirtexII vp30 development board, as shown in Fig. 11.
- The 24 BRAMs used for the DDR controller can be reduced by using low depth Gaussian parameter buffers to write/read to the off-chip DDR memory.
- Dual-port block RAMs are used as video RAMs in the VGA controller, which are shared by different blocks of the complete surveillance system to display the results from different stages on a monitor.
- Thus, the memory requirements directly dedicated to the algorithm are low, while the DDR and VGA controllers utilize a substantial amount of memory.

### V. CONCLUSION

- By utilizing combined memory reduction schemes, off-chip memory access can be reduced by over 70%.
- With real time performance, tracking schemes can be evaluated in varied environments for system robustness testing.
- To address the issue, a joint memory reduction scheme is proposed by utilizing pixel locality and wordlength reduction.
- By measuring the similarity of neighboring Gaussian distributions as the overlapping volume of two cubes, a threshold can be set to classify Gaussian similarities.
- Careful tradeoffs should be made based on different application environments.

##### Citations

84 citations

### Cites background or methods from "A Hardware Architecture for Real-Ti..."

...In [25], the processing capability only reaches 7....

[...]

...References [24] and [25] used 140 bits per pixel, while [27] proposed the use of 116 bits....

[...]

...The segmentation circuit of [23] is improved in [24] and [25]....

[...]

...as shown in Section VII-B and reported in [25], can be fed with the luminance channel of the YCrCb color space....

[...]

...The circuit of [24] is improved in [25]....

[...]

58 citations

### Cites background or methods from "A Hardware Architecture for Real-Ti..."

...[28, 29], Minghua and Bermak [30] and Genovese et al....

[...]

...[28, 29] and Minghua and Bermak [30] are not able to process 41....

[...]

...[29] improves the memory throughput with respect to [28] employing a memory reduction scheme....

[...]

33 citations

### Cites background from "A Hardware Architecture for Real-Ti..."

...In the work of [33] and the later improvement of [34] the authors propose a real-time video segmentation/surveillance system using a GMM also handling memory bandwidth reduction requirements....

[...]

30 citations

### Cites methods from "A Hardware Architecture for Real-Ti..."

...Keywords Field programmable gate arrays · Fixed-point arithmetic · Real time image processing · Video surveillance...

[...]

...…a spatial background subtraction technique (Jodoin et al 2007), a DSP-embedded implementation (Ierodiaconou et al 2006), a memory reduction scheme (Jiang et al 2009) or static algorithms where it is assumed that the background is fixed as the one proposed by Horprasert et al. (Karaman et al…...

[...]

27 citations

##### References

16,062 citations

7,660 citations

### "A Hardware Architecture for Real-Ti..." refers background or methods in this paper

...To overcome this, in [13], all updated Gaussian distributions are sorted according to the ratio ω/σ....

[...]

...In this section the used algorithm is briefly described, for a more thorough description we refer to [13]....

[...]

...In [13], only a frame rate of 11–13 fps is obtained even for a small frame size of 160 × 120 on an SGI O2 workstation....

[...]

...A background model based on pixel wise multimodal Gaussian distribution was proposed in [13] with robustness to multimodal background situations, which are quite common in both indoor and outdoor environments....

[...]

...Table I shows five segmentation algorithms that are cited by many literatures, namely frame difference (FD) [2]–[5], median filter [6]–[8], linear predictive filter (LPF) [1], [9]–[12], mixture of Gaussian (MoG) [13]–[19] and kernel density estimation (KDE) [20]....

[...]

2,553 citations

### "A Hardware Architecture for Real-Ti..." refers background in this paper

...In order to reduce hardware complexity found in parallel sorting networks, such as [33]–[35], while still maintaining the speed, a specific feature in the algorithm is explored....

[...]

2,432 citations

### "A Hardware Architecture for Real-Ti..." refers background in this paper

...From Table I it can be seen that the KDE approach has the highest segmentation quality which however comes at the cost of a high hardware complexity and even to a larger extent, increased memory requirements....

[...]

...Table I shows five segmentation algorithms that are cited by many literatures, namely frame difference (FD) [2]–[5], median filter [6]–[8], linear predictive filter (LPF) [1], [9]–[12], mixture of Gaussian (MoG) [13]–[19] and kernel density estimation (KDE) [20]....

[...]

1,971 citations

### "A Hardware Architecture for Real-Ti..." refers background in this paper

...Table I shows five segmentation algorithms that are cited by many literatures, namely frame difference (FD) [2]–[5], median filter [6]–[8], linear predictive filter (LPF) [1], [9]–[12], mixture of Gaussian (MoG) [13]–[19] and kernel density estimation (KDE) [20]....

[...]

...In [1], comparisons on segmentation qualities are made to evaluate a variety of approaches....

[...]

##### Frequently Asked Questions (15)

###### Q2. How can the authors reduce the wordlength of the Gaussian parameters?

By utilizing a coarse parameter updating scheme, the wordlength of each Gaussian parameter is reduced substantially, which effectively decreases the memory bandwidth to off-chip memories.

###### Q3. What is the CMOS image sensor's oversampling scheme?

In their implementation, an oversampling scheme with a higher clock frequency (100 MHz) is used to ensure the accuracy of the image data.

###### Q4. What is the common way to model a pixel?

A pixel containing several background object colors, e.g., the leaves of a swaying tree and a road, can be modeled with a mixture of Gaussian distributions.

###### Q5. How can the authors reduce memory bandwidth beyond wordlength reduction?

In addition to wordlength reduction, a data compression scheme for further bandwidth reduction is proposed by utilizing pixel locality for Gaussian distributions in adjacent areas.

###### Q6. How can the sorting of a Gaussian distribution be changed?

By observing that only one Gaussian distribution is updated at a time and all the distributions are initially sorted, the sorting of Gaussian distributions can be changed to rearranging an updated distribution among ordered distributions.

###### Q7. How is the memory bandwidth reduction achieved?

By only saving non-overlapping distributions together with the number of equivalent succeeding distributions, memory bandwidth is reduced.

###### Q8. What is the main bottleneck of the whole system?

For the implementation of the hardware units, memory usage is identified as the main bottleneck of the whole system, which is common in many image processing systems.

###### Q9. How many BRAMs are used for the DDR controller?

The 24 BRAMs used for the DDR controller can be reduced by using low depth Gaussian parameter buffers to write/read to the off-chip DDR memory.

###### Q10. What is the primary goal of the coarse parameter updating scheme?

With the primary goal of reducing wordlength, the coarse parameter updating scheme results in limited improvements to the segmentation results.

###### Q11. What is the case for the No_match signal?

For the case that no match is found, a MUX is used together with the No_match signal to update all parameters for the distribution (3 in the figure) with predefined values.

###### Q12. What are the benefits of the algorithm modifications?

Algorithm modifications covered in previous sections are implemented with potential benefits on hardware efficiency and segmentation quality.

###### Q13. How can the authors model a sphere distribution without extensive hardware overhead?

To be able to model background pixels using a single distribution without extensive hardware overhead, color space transformation has been investigated.

###### Q14. Why does the DDR controller take up a large part of the design?

The DDR controller contributes to a large part of the whole design due to complicated memory command and data signal manipulations, clock schemes, and buffer controls.

###### Q15. What is the cost of transforming RGB values to cylindrical coordinates?

Transforming RGB values to cylindrical coordinates is costly in hardware, since it requires operations such as division and square root.