Motion compensation of ultrasonic perfusion images
Sebastian Schäfer,a Kim Nylund,b,c Odd H. Gilja,b,c and Klaus D. Tönniesa
a Department of Simulation and Graphics, University of Magdeburg, Universitätsplatz 2, 39106 Magdeburg, Germany;
b Institute of Medicine, University of Bergen, Bergen, Norway;
c National Centre of Ultrasound in Gastroenterology, Haukeland University Hospital, Bergen, Norway
ABSTRACT
Contrast-enhanced ultrasound (CEUS) is a rapid and inexpensive medical imaging technique to assess tissue perfusion with a high temporal resolution. It comprises a sequence of ultrasound brightness values and a contrast sequence acquired simultaneously. However, the image acquisition is disturbed by various motion influences. Registration is needed to obtain reliable spatial correspondence and to analyze perfusion characteristics over time. We present an approach to register an ultrasonography sequence using a feature label map. This label map is generated from the b-mode data sequence by a Markov-Random-Field (MRF) based analysis, where each location is assigned to one of several user-defined regions according to its statistical parameters. The MRF reduces the chance that outliers are represented in the label map and provides stable feature labels over the time frames. A registration consisting of rigid and non-rigid transformations is determined consecutively, using the generated label maps of the respective frames for similarity calculation. For evaluation, the standard deviation within specific regions of intestinal CEUS images has been measured before and after registration, resulting in an average decrease of 8.6 %. Additionally, this technique has proven to be more robust against noise than similarity calculation based on image intensities only; the latter leads to a decrease of the standard deviation of only 7.6 %.
Keywords: Motion compensation, registration, 2D ultrasound, perfusion imaging, CEUS, MRF
1. INTRODUCTION
2D ultrasonography (US) is one of the most widely used medical imaging techniques. It enables immediate and
inexpensive examinations with high spatial resolution. It is well suited for imaging abdominal and thoracic organs.
There are no contraindications and the patient is not exposed to radiation. US is also used for perfusion imaging employing contrast agents (CA), consisting of gas-filled microbubbles that have a high degree of echogenicity as they increase the US backscatter.1 By acquiring 2D multi-frame data, propagation and contrast uptake after the injection of the CA can be measured. This has become a suitable tool for delineating the vascular structure in normal and pathological tissue in order to detect primary tumors or metastases in various organs, or to assess disease activity in Crohn’s disease.2,3 CEUS of the intestine is also included in the latest guidelines.4 The manual perfusion analysis is performed by extracting and interpreting the perfusion kinetics of the blood in the tissue of interest from the acquired multi-frame data. An essential advantage of CEUS imaging is the assessment of contrast enhancement with a considerably higher temporal resolution compared to other perfusion imaging techniques.5
5
During 2D contrast-enhanced US (CEUS) examinations studying perfusion, the sonographer normally holds the probe still in a particular position and orientation to image a suitable slice of the tissue of interest during CA administration. However, the data acquired with this examination methodology often contains significant motion: patient movement through breathing and other intrinsically induced motion is present in addition to the motion caused by tilting or moving the US probe (extrinsically induced motion), since US imaging is normally performed hand-held.
Corresponding author information:
S. Schäfer: E-mail: sebastian.schaefer@ovgu.de, Telephone: +49 391 67 11441
Figure 1: A representative frame of a CEUS acquisition of a patient with a stenosis of the small bowel due to
Crohn’s disease, showing b-mode data on the left and contrast data on the right hand side.
While this motion can normally be interpreted by well-trained physicians,6 in computer-assisted analysis the different image frames of a time-dependent acquisition must be kept aligned in order to extract valid perfusion parameters over time.
The acquisition usually produces two parallel image sequences: standard b-mode* and the measured CA enhancement (Fig. 1). The frames are acquired alternatingly, resulting in around 10 frames per second. The CA only slightly affects the signal intensity in the b-mode sequence, which therefore has an approximately constant brightness over time for a specific organ or tissue. This paper addresses motion compensation by image registration of US b-mode frames, taking intestinal images as an example. To this end, a statistically based Markov Random Field (MRF) segmentation7 of the b-mode sequence is used to produce feature images that are less subject to noise influence over time. The Markov property provides local interaction between labels and compensates the influence of speckle and other artifacts. The generated feature images are then used to calculate the registration transformation.
A particular restriction of 2D image acquisition is that the observed region may be missed during part of the examination (out-of-plane motion), due to the three-dimensional nature of the motion described above. This is taken into account by defining temporal regions with groups of frames that show the same region of interest (ROI).8 Only frames within the same temporal region can be registered and thus provide valid pixel correspondences over time.
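To make the idea of temporal regions concrete, the following rough sketch starts a new region whenever the correlation between consecutive frames drops below a threshold, which we treat here as a hint of out-of-plane motion. This is only an illustrative simplification with assumed names and threshold; the actual method of Schäfer et al. is user-assisted and more elaborate.

```python
import numpy as np

def temporal_regions(frames, threshold=0.8):
    """Group frame indices into temporal regions; a new region starts
    when consecutive-frame correlation falls below the threshold."""
    regions, current = [], [0]
    for k in range(1, len(frames)):
        a = frames[k - 1].ravel().astype(float)
        b = frames[k].ravel().astype(float)
        if np.corrcoef(a, b)[0, 1] < threshold:
            regions.append(current)   # close the current region
            current = [k]
        else:
            current.append(k)
    regions.append(current)
    return regions
```

Frames within one returned group would then be considered candidates for in-plane registration.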
2. RELATED WORK
CEUS imaging has a time-dependent component, resulting in motion influence stemming from different sources. It also comprises functional information in terms of blood perfusion leading to CA enhancement. As quantification and visualization of functional signals require correct spatial correspondence of temporal samples, the removal of motion influence is necessary. Since this application is relatively new, the literature does not yet offer methods adapted to the specific scenario of CEUS data registration.
However, intramodal registration of US images has been applied in other applications. Rohling et al.9 proposed the first application of automatic registration for US images. They combine multiple 3D US volumes acquired from different angles to improve data quality through averaging. For accurate results, a registration is required to align the volumes. Registration has been performed using correlation of 3D gradient information, originally developed for MRI-to-CT registration, delivering improved segmentation and visualization of the combined volume.
* brightness modulation: measured amplitudes are mapped onto intensity values
Shekhar et al.10 used mutual information as the similarity measure to register time-dependent 3D US data of the left ventricle. Calculation of the mutual information is preceded by median filtering the image data, dropping least-significant-bit information and using partial volume distribution interpolation when transforming the moving image. This results in a smoother objective function and reduces the probability that the optimizer ends up in a local maximum.
To calculate and analyze the deformation of the human heart, Ledesma-Carbayo et al.11 presented a combined spatio-temporal registration procedure. Similarity of the deformed 2D US frames was measured by the mean squared distance of all frames in the temporal sequence to a specified reference frame. Transformation parameters for B-Splines are found for all frames simultaneously, restricting the parameters to be continuously smooth over time.
In contrast to the above-cited papers, Woo et al.12 present a method where the similarity is not directly extracted from intensity values. Instead, the local phase information13 is used as a feature for registration. The technique is suited to register consecutive frames and was tested on synthetic and cardiac US images.
The b-mode sequence of 2D contrast-enhanced US data, which can ideally be used to calculate the registration, is disturbed by many influences. US data has a low signal-to-noise ratio, resulting in poor data quality and artifacts. The CA is detected with a different modulation, but it has a slight impact on the measured b-mode intensity. Moreover, 2D imaging depicts an image plane while displacement also occurs in 3D. This causes objects to move out of the imaging plane, leading to varying intensity levels at similar image locations over time. Although we try to register frames with in-plane motion only, it is impossible to eliminate all 3D motion influence. The aforementioned approaches have not been designed to manage these influences.
The approach described in this paper aims to generate features that are more stable over time (refer to Fig. 3) to overcome the above-mentioned limitations. Therefore, segmentation-based label maps are generated, which effectively lead to a registration of object boundaries, as these label maps no longer provide information within the regions.
3. METHOD
The registration procedure consists of four main steps (Fig. 2). The first step is a frame selection that groups the frames into temporal regions and excludes frames from registration. Frames within the same temporal region ideally show almost the same ROI, disturbed by motion in the image plane only. Frames are excluded if there are not enough in-plane motion frames to be registered. For each temporal region, the frame with the highest average similarity to all frames of the region is defined as the reference frame for registration. This user-assisted technique is described in detail by Schäfer et al.8
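As an illustration of the reference-frame rule only (not the user-assisted procedure of Schäfer et al.), one could pick the frame with the highest mean correlation to all other frames of a temporal region; all names here are our own:

```python
import numpy as np

def reference_frame(frames):
    """Return the index of the frame with the highest average
    correlation to all frames of a temporal region."""
    flat = [f.ravel().astype(float) for f in frames]
    n = len(flat)
    corr = np.ones((n, n))
    for a in range(n):
        for b in range(a + 1, n):
            c = np.corrcoef(flat[a], flat[b])[0, 1]
            corr[a, b] = corr[b, a] = c
    # mean similarity of each frame to the whole region (incl. itself)
    return int(np.argmax(corr.mean(axis=1)))
```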
1. Frame selection: to remove out-of-plane motion
2. MRF label-map generation: used as feature map for registration
3. Rigid registration: using translation and rotation
4. Non-rigid registration: using B-Splines
Figure 2: The four steps of the workflow. The frame selection is handled according to Schaefer et al.,8 and the generation of the label map represents the key part of this work.
The second step comprises the generation of features, with the objective to provide stable information over time for a specific location in the image. These features are derived from the brightness values of the b-mode sequence and are used to calculate the registration transformations. Ideally, brightness values of the same location in the b-mode data do not vary over time, but noise influence will hamper the matching process. The only reason this assumption might not hold is that the temporal regions defined by step one do not exclude all parts with out-of-plane motion. Intensity-based segmentation generally produces a stable value at specific locations (see Fig. 5b). However, the result will still be influenced by noise, and this influence is likely to differ over time. Thus, we introduce an MRF smoothness prior7 to increase the probability that each site has the same label as its neighbors. This smoothing of the initial labeling should enhance the temporal robustness of the feature (compare Fig. 5c). It should be mentioned that the labeling is only intended as a feature for registration and does not represent structural semantics like tissue classes.
Figure 3: Two consecutive frames of the b-mode sequence (a, b), the corresponding label maps (d, e) and the corresponding absolute differences of both (c, f).
An initial labeling is obtained using predefined gray value classes (with mean and standard deviation) and calculating the energy of each pixel. To obtain predefined classes for a dataset, the user has to specify different areas, each exhibiting a particular average brightness µ and brightness variation σ. In practice, between 2 and 4 areas have proven suitable. The initial label map is created by determining the site energy E_site for each pixel in the image with respect to all labels. The label yielding the lowest energy is taken as the initial label:7

E_site(i, L) = ln(√(2π) σ_L) + (i − µ_L)² / (2σ_L²),   (1)
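As a minimal sketch (not the authors' implementation), the initial labeling can be computed by evaluating the site energy of Eq. (1) for every pixel and each user-defined class and taking the minimum; the class parameters in the usage example are hypothetical:

```python
import numpy as np

def site_energy(i, mu, sigma):
    # Eq. (1): energy of intensity i under a class with mean mu
    # and standard deviation sigma
    return np.log(np.sqrt(2 * np.pi) * sigma) + (i - mu) ** 2 / (2 * sigma ** 2)

def initial_label_map(image, classes):
    """classes: user-defined (mu, sigma) pairs; returns per pixel the
    index of the class with the lowest site energy."""
    energies = np.stack([site_energy(image.astype(float), mu, sigma)
                         for mu, sigma in classes])
    return np.argmin(energies, axis=0)
```

For example, with two classes (µ=10, σ=5) and (µ=200, σ=5), dark pixels receive label 0 and bright pixels label 1.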
where i is the current site and L the current label class. Subsequently, the iterated conditional modes (ICM) method14 is used to incorporate the Markov property, ensuring that the labels depend on their neighboring label values,7 resulting in an MRF formulation.
The local energy E_local for each pixel is calculated for each of the labels, and the label with the lowest energy is taken as the new label at this site:

E_local(i, L) = E_site(i, L) + Σ_{i,j}∈C_i β γ(L_i, L_j),   (2)

where C_i is the set of all neighbors paired with i (the Moore neighborhood is used), β is a control parameter defining the influence of the MRF prior and thus the homogeneity of regions, and γ(·, ·) returns −1 if its arguments (label values) are equal and 1 otherwise.
After each iteration, the global energy, consisting of the sum of E_local over all sites in the image, is calculated. The ICM terminates when the global energy converges.
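A possible ICM sweep implementing Eq. (2) is sketched below. The Moore neighborhood, the ±1 γ term and the global-energy stopping rule follow the text, while the raster scan order and the parameter values in the example are our own assumptions:

```python
import numpy as np

def icm(image, labels, classes, beta=1.0, max_iter=20):
    """Iterated conditional modes with the MRF prior of Eq. (2).
    classes: (mu, sigma) pairs; labels: initial label map."""
    img = image.astype(float)
    labels = labels.copy()
    h, w = labels.shape
    prev_energy = np.inf
    for _ in range(max_iter):
        total = 0.0
        for y in range(h):
            for x in range(w):
                best, best_e = labels[y, x], np.inf
                for L, (mu, s) in enumerate(classes):
                    # site energy, Eq. (1)
                    e = np.log(np.sqrt(2 * np.pi) * s) + (img[y, x] - mu) ** 2 / (2 * s ** 2)
                    # prior over the Moore neighborhood, Eq. (2)
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = y + dy, x + dx
                            if (dy or dx) and 0 <= ny < h and 0 <= nx < w:
                                e += beta * (-1 if labels[ny, nx] == L else 1)
                    if e < best_e:
                        best_e, best = e, L
                labels[y, x] = best
                total += best_e
        if total >= prev_energy:   # convergence of the global energy
            break
        prev_energy = total
    return labels
```

Run on a two-region image with a single noisy outlier pixel, the prior pulls the outlier's label toward that of its neighbors, which is exactly the temporal-robustness effect intended here.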
Calculation of the label map is designed to provide a more robust guidance for the registration. In Fig. 3, two consecutive images with very little motion influence and the corresponding label maps are shown. The difference images indicate a higher agreement when using the label map and a lower risk of arbitrary dissimilarity due to noise.
Steps three and four of our registration system consist of the registration calculation itself, incorporating rigid transformations (step three) and consecutive B-Spline-based transformations (step four). The goal is to remove coarse motion influence, mainly caused by extrinsic motion and patient breathing, with translation and rotation (rigid transformations), and the remaining non-linear motion with B-Spline-controlled transformations on an 8 × 8 point grid (Fig. 4c, 4f). In both cases, registration is performed using the label map of the b-mode sequence within the temporal regions defined in step one, i.e., each frame of a temporal region is registered to a fixed image frame. The fixed image frame is the frame of the respective temporal region with the highest average similarity to all other frames and is determined by the method described in Schaefer et al.8
Figure 4: (a) Label map of the b-mode frame depicted in (b), where the area of interest for similarity calculation is shown (outlined in white). (d) shows anatomical annotations of the current frame. (e) is the corresponding contrast frame. (c) and (f) are the transformed b-mode and contrast frames after registration to a fixed image frame.
The difference between the label maps of the fixed and the moving image is measured by correlation. Rigid registration is performed in a multi-resolution setup with increasing accuracy at higher resolution steps. To optimize the 64 point locations (128 parameters) of the B-Splines, a bounded limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS-B) algorithm is used, where we constrain each point to move within only half of the grid point spacing to prevent degeneration of the grid. The transformation parameters of each frame registration are initialized with the final transformation parameters of the preceding frame for reasons of stability and efficiency. After the registration has terminated, the final transformations are also applied to the contrast sequence.
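The bounded optimization can be illustrated with SciPy's L-BFGS-B. For brevity, this sketch optimizes only a 2-parameter translation under a correlation metric rather than the full 128 B-Spline parameters of the paper, and all function names are ours:

```python
import numpy as np
from scipy.ndimage import shift
from scipy.optimize import minimize

def neg_correlation(params, fixed, moving):
    # Warp the moving image by a translation and return the negative
    # correlation to the fixed image (to be minimized)
    warped = shift(moving.astype(float), params, order=3, mode='nearest')
    return -np.corrcoef(fixed.ravel(), warped.ravel())[0, 1]

def register_translation(fixed, moving, bound=4.0):
    # Bounded L-BFGS-B, analogous to the constrained B-Spline step of
    # the paper; here only a 2-parameter translation for illustration
    res = minimize(neg_correlation, x0=np.zeros(2),
                   args=(fixed, moving), method='L-BFGS-B',
                   bounds=[(-bound, bound)] * 2)
    return res.x
```

In the full method, the bounds would instead limit each B-Spline control point to half of the grid spacing, which is what prevents the grid from degenerating.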