# A Hardware Architecture for Real-Time Video Segmentation Utilizing Memory Reduction Techniques

## Summary

### Introduction

- A multimodal background is caused by repetitive background object motion, e.g., swaying trees or a flickering monitor. Furthermore, FPGAs give real-time performance, which is hard to achieve with DSP processors, while limiting the extensive design work required for application-specific integrated circuits.
- In this work, an automated surveillance system has been chosen as the target application.
- Sections II and III discuss the original algorithm and possible modifications for hardware efficiency.

### II. GAUSSIAN MIXTURE BACKGROUND MODEL

- For a more thorough description the authors refer to [13].
- Each distribution has a weight that indicates the probability of matching a new incoming pixel.
- The higher the weight, the more likely the distribution belongs to the background.
- For those unmatched, the weight is updated according to (8) while the mean and the variance remain the same.
- Rather than focusing mainly on robustness, the authors propose several modifications to the algorithm, chiefly concerned with their potential to improve hardware efficiency.
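The per-pixel update described above can be sketched as follows. This is a minimal illustration in the style of the widely used Stauffer-Grimson mixture model, not the authors' exact equations: the single-channel pixel, the learning rate `alpha`, the 2.5-sigma match test, the initial variance and weight, and the ordering by weight alone are all assumptions made here for brevity.

```python
from dataclasses import dataclass


@dataclass
class Gaussian:
    weight: float
    mean: float
    var: float


def update_pixel(gaussians, x, alpha=0.01, match_sigmas=2.5,
                 init_var=900.0, init_weight=0.05):
    """Update a weight-sorted list of Gaussians in place with pixel value x.

    Returns the index of the matched distribution (before re-sorting),
    or None if no distribution matched. All default values are hypothetical.
    """
    matched = None
    for i, g in enumerate(gaussians):
        # Match test: pixel lies within match_sigmas standard deviations.
        if (x - g.mean) ** 2 < (match_sigmas ** 2) * g.var:
            matched = i
            break

    if matched is None:
        # No match: replace the lowest-weight distribution (last in the list).
        gaussians[-1] = Gaussian(init_weight, float(x), init_var)
    else:
        for i, g in enumerate(gaussians):
            if i == matched:
                # Matched: weight, mean and variance all move toward the pixel.
                diff = x - g.mean
                g.weight = (1 - alpha) * g.weight + alpha
                g.mean += alpha * diff
                g.var = (1 - alpha) * g.var + alpha * diff * diff
            else:
                # Unmatched: only the weight decays; mean and variance stay.
                g.weight = (1 - alpha) * g.weight

    # Renormalize the weights and restore the descending order.
    total = sum(g.weight for g in gaussians)
    for g in gaussians:
        g.weight /= total
    gaussians.sort(key=lambda g: g.weight, reverse=True)
    return matched
```

Calling `update_pixel` once per frame for each pixel keeps the list ordered, so the leading entries are the most likely background distributions.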

### A. Color Space Transformation

- Multimodal situations only occur when repetitive background objects are present in the scene.
- By transforming RGB into YCbCr space, the correlation among the color coordinates is mostly removed, resulting in nearly independent color components.
- As shown in Fig. 2(a), most pixel distributions are transformed from cylinders back to spheres, capable of being modeled with a single spherical distribution.
- The authors propose two simplifications to the algorithm.
- A distribution with a dominant weight but a large variance does not rise to the top of the ordering, where it would be identified as a background distribution.
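To illustrate why a luma/chroma space decorrelates the color components, the sketch below converts one surface color under varying illumination. It assumes the target space is YCbCr with the full-range BT.601 coefficients used by JFIF, and it models an illumination change as a uniform gain on all three RGB channels; both are assumptions for illustration, and the base color and gains are made up.

```python
def rgb_to_ycbcr(r, g, b):
    """Full-range BT.601 (JFIF) RGB to YCbCr conversion for 8-bit values."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr


def spread(values):
    """Range of a sequence of numbers."""
    return max(values) - min(values)


# Hypothetical surface color observed under five illumination gains.
base = (120, 110, 100)
gains = [0.8, 0.9, 1.0, 1.1, 1.2]
samples = [rgb_to_ycbcr(*(k * c for c in base)) for k in gains]

# Per-channel spread over the illumination sweep.
y_spread, cb_spread, cr_spread = (spread(ch) for ch in zip(*samples))
print(f"Y: {y_spread:.2f}  Cb: {cb_spread:.2f}  Cr: {cr_spread:.2f}")
```

In RGB, all three channels of this pixel vary together with the gain. After the transform, almost all of the variation lands in Y while Cb and Cr barely move, which is the kind of behavior that lets a single, roughly spherical distribution model the pixel.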

### III. HARDWARE ARCHITECTURE

- To perform the algorithm with VGA resolution in real-time, a dedicated hardware architecture, with a streamlined data flow and memory bandwidth reduction schemes, is implemented to address the computation capacity and memory bandwidth bottlenecks.
- The algorithm modifications covered in the previous sections are implemented for their potential benefits to hardware efficiency and segmentation quality.
- The image data is captured with one color component at a time, and the three color components are sent to the system after serial-parallel transformation.
- With the image data captured and transformed, the match and switch block tries to match the incoming pixel with Gaussian distributions obtained from the previous frame.
- Similar updating schemes are utilized for variance update.
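The memory bandwidth bottleneck mentioned above can be made concrete with a back-of-the-envelope estimate. The sketch takes VGA resolution, the three Gaussians per pixel, and the 81 to 100 bits per Gaussian quoted in the wordlength discussion; the frame rate of 25 fps and the assumption that every parameter set is read and written back once per frame are mine.

```python
def gaussian_bandwidth_bits_per_s(width=640, height=480, gaussians_per_pixel=3,
                                  bits_per_gaussian=100, fps=25):
    """Off-chip traffic if every Gaussian is read and written once per frame.

    fps and the read-plus-write access pattern are assumptions.
    """
    bits_per_pixel = gaussians_per_pixel * bits_per_gaussian
    # Factor 2: parameters are read before the update and written back after.
    return 2 * bits_per_pixel * width * height * fps


low = gaussian_bandwidth_bits_per_s(bits_per_gaussian=81)
high = gaussian_bandwidth_bits_per_s(bits_per_gaussian=100)
print(f"{low / 1e9:.2f} to {high / 1e9:.2f} Gbit/s")
```

Under these assumptions the unoptimized parameter traffic is several gigabits per second, which gives a sense of why bandwidth reduction schemes are needed.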

### A. Sorting

- The updated Gaussian parameters have to be sorted for use in the next frame.
- By observing that only one Gaussian distribution is updated at a time and all the distributions are initially sorted, the sorting of Gaussian distributions can be changed to rearranging an updated distribution among ordered distributions.
- The output of each comparator signifies which distribution is to be multiplexed to the output, e.g., if the weight of any unmatched distribution is smaller than the updated one, all unmatched distributions below the current one are switched to the output at the next lower MUX.
- Since only three Gaussians are utilized in their implementation, this is a trivial task.
- The paper compares the hardware complexity of the proposed sorting architecture with that of the other schemes mentioned above.
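The idea of replacing a full sort with the repositioning of a single updated entry can be sketched in software. This is an illustrative analogue of the comparator-and-MUX structure, not the hardware design itself; the tuple layout `(weight, label)` and the function name are hypothetical.

```python
def reinsert_updated(dists, updated_idx, key=lambda d: d[0]):
    """Reposition one updated entry in an otherwise descending-sorted list.

    dists is sorted by key in descending order except possibly for
    dists[updated_idx]. Returns a new, fully sorted list.
    """
    updated = dists[updated_idx]
    rest = dists[:updated_idx] + dists[updated_idx + 1:]
    # One comparison per remaining entry, like one comparator per
    # unmatched distribution; the comparisons are independent of each other.
    pos = sum(1 for d in rest if key(d) >= key(updated))
    # Entries that compare lower shift down by one slot.
    return rest[:pos] + [updated] + rest[pos:]


# Distribution 'c' was matched and its weight grew from 0.2 to 0.4.
dists = [(0.5, "a"), (0.3, "b"), (0.4, "c")]
print(reinsert_updated(dists, updated_idx=2))
```

Because the other entries are already ordered, counting how many of them outrank the updated one is enough to find its slot, so no general-purpose sorting network is required.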

### B. Wordlength Reduction

- Slow background updating requires a large dynamic range for each parameter in the distributions, since parameter values change only slightly between frames but the changes can accumulate over time.
- Together with a 16-bit weight and the integer parts of the mean and the variance, 81–100 bits are needed for a single Gaussian distribution.
- From (13), a small positive or negative number is derived depending on whether the incoming pixel is above or below the current mean.
- The coarse updating scheme, on the other hand, relieves the problem to a certain extent: consecutive ones are added or subtracted to keep track of relatively fast changes.
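The effect of a short wordlength on a slow update, and how a sign-based step avoids it, can be sketched as below. This is an illustration in the spirit of the coarse updating scheme rather than the authors' equation (13): the 4 fractional bits, the learning rate of 2^-10, and the step of one least significant bit are assumed values.

```python
FRAC_BITS = 4  # assumed number of fractional bits kept for the mean


def truncated_update(mean_fx, pixel, alpha_shift=10):
    """Conventional update mean += alpha * (pixel - mean), alpha = 2**-alpha_shift.

    mean_fx is the mean scaled by 2**FRAC_BITS; the increment is truncated
    toward zero to fit the fixed-point format.
    """
    diff_fx = (pixel << FRAC_BITS) - mean_fx
    inc = abs(diff_fx) >> alpha_shift
    return mean_fx + (inc if diff_fx >= 0 else -inc)


def coarse_update(mean_fx, pixel):
    """Sign-based update: move the mean by one LSB toward the pixel."""
    target = pixel << FRAC_BITS
    if target > mean_fx:
        return mean_fx + 1
    if target < mean_fx:
        return mean_fx - 1
    return mean_fx


# Background brightens from 100 to 110 and stays there for 200 frames.
mean_trunc = mean_coarse = 100 << FRAC_BITS
for _ in range(200):
    mean_trunc = truncated_update(mean_trunc, 110)
    mean_coarse = coarse_update(mean_coarse, 110)

print(mean_trunc / 2**FRAC_BITS, mean_coarse / 2**FRAC_BITS)
```

With so few fractional bits the conventional increment truncates to zero and the mean never moves, whereas the sign-based step reaches the new level after a bounded number of frames. The price is a fixed tracking speed that no longer depends on how far the pixel is from the mean.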

### C. Pixel Locality

- In addition to wordlength reduction, a data compression scheme for further bandwidth reduction is proposed by utilizing pixel locality for Gaussian distributions in adjacent areas.
- The reason for such a criterion lies in the fact that a pixel that matches one distribution will most likely match the other.
- Various threshold values are selected to evaluate the efficiency for the memory bandwidth reduction.
- As foreground objects enter the scene, some of the Gaussian distributions are replaced, which decreases the number of similar Gaussian distributions.
- Foreground object activity can vary between video scenes, e.g., the continuous activity in Fig. 8(a), where people go up and down the stairs all the time, and the two peak activity periods around frames 600–900 and frames 2100–2500 in Fig. 8(b), where people walk by in two discrete time periods.
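The similarity criterion for neighboring Gaussians, which the conclusion describes as the overlapping volume of two cubes, might be sketched as follows. The cube half-width of `k` standard deviations, the normalization by the larger cube, and the default threshold are assumptions chosen for illustration; the summary does not give the exact definition.

```python
def cube_overlap_ratio(mean_a, sigma_a, mean_b, sigma_b, k=2.5):
    """Overlap volume of two axis-aligned cubes, relative to the larger cube.

    Each Gaussian is represented by a cube centered at its three-component
    mean with half-width k * sigma (k is an assumed value).
    """
    half_a, half_b = k * sigma_a, k * sigma_b
    overlap = 1.0
    for ca, cb in zip(mean_a, mean_b):
        lo = max(ca - half_a, cb - half_b)
        hi = min(ca + half_a, cb + half_b)
        if hi <= lo:
            return 0.0  # cubes are disjoint along this axis
        overlap *= hi - lo
    larger = (2 * max(half_a, half_b)) ** 3
    return overlap / larger


def is_similar(mean_a, sigma_a, mean_b, sigma_b, threshold=0.8):
    """Treat two Gaussians as similar if the cubes overlap enough.

    The threshold value is hypothetical.
    """
    return cube_overlap_ratio(mean_a, sigma_a, mean_b, sigma_b) >= threshold


left = ((100.0, 128.0, 128.0), 4.0)
right = ((101.0, 128.0, 128.0), 4.0)
print(cube_overlap_ratio(*left, *right), is_similar(*left, *right))
```

When a pixel's Gaussian is judged similar to its neighbor's, the parameters need not be stored again, so raising or lowering the threshold trades bandwidth savings against segmentation accuracy.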

### IV. RESULTS

- The segmentation unit is prototyped on a Xilinx VirtexII vp30 development board, as shown in Fig. 11.
- The 24 BRAMs used for the DDR controller can be reduced by using low depth Gaussian parameter buffers to write/read to the off-chip DDR memory.
- Dual-port block RAMs are used as video RAMs in the VGA controller, which are shared by different blocks of the complete surveillance system to display the results from different stages on a monitor.
- Thus, the memory requirements directly dedicated to the algorithm are low, while the DDR and VGA controllers use a substantial amount of memory.

### V. CONCLUSION

- By utilizing combined memory reduction schemes, off-chip memory access can be reduced by over 70%.
- With real-time performance, tracking schemes can be evaluated in varied environments for system robustness testing.
- To address the issue a joint memory reduction scheme is proposed by utilizing pixel locality and wordlength reduction.
- By measuring the similarity of neighboring Gaussian distributions as the overlapping volume of two cubes, a threshold can be set to classify Gaussian similarities.
- Careful tradeoffs should be made based on different application environments.

