Michael J. Ireland, John D. Monnier, and Nathalie Thureau, "Monte-Carlo imaging for optical interferometry," Proc. SPIE 6268, Advances in Stellar Interferometry, 62681T (28 June 2006); doi: 10.1117/12.670940.

Monte-Carlo Imaging for Optical Interferometry

Michael J. Ireland^a, John D. Monnier^b and Nathalie Thureau^b

^a Caltech, MC 150-21, 1200 E. California Blvd., Pasadena, CA 91125, USA
^b Department of Astronomy, University of Michigan, 501 East University Av., Ann Arbor, MI 48109, USA

Further author information: E-mail: mireland@gps.caltech.edu
ABSTRACT
We present a flexible code created for imaging from the bispectrum and V². By using a simulated annealing method, we limit the probability of converging to local chi-squared minima, as can occur when traditional imaging methods are used on data sets with limited phase information. We present the results of our code used on a simulated data set utilizing a number of regularization schemes, including maximum entropy. Using the statistical properties of Monte-Carlo Markov chains of images, we show how this code can place statistical limits on image features such as unseen binary companions.
Keywords: astronomical software, aperture synthesis imaging, optical interferometry, Bayesian statistics
1. INTRODUCTION
It is well known that a large class of images can be consistent with a particular interferometric data set. This is even more true for optical interferometry than for radio interferometry, due to the general unavailability of absolute visibility phase. An imaging algorithm such as CLEAN or Maximum Entropy combined with self-calibration attempts to find the 'best' possible image consistent with the interferometric data. Both finding this 'best' image and interpreting features within the image can be difficult, and in general requires some kind of regularization. Regularization penalizes images that look 'bad' (such as having too much unresolved structure), so as to find a compromise between lowering the χ² statistic and achieving an optimal regularization statistic.

The imaging code MACIM described in this paper is a Monte-Carlo Markov chain algorithm that aims both to reliably find the global minimum of a regularized χ² statistic in image space, and to characterize this minimum. The algorithm can operate without any regularization to find images that are optimal in the Bayesian sense. In this mode, the code can also characterize the joint probability density of images consistent with the data. Alternatively, the code can combine model-fitting and imaging, or use novel regularizations based on a priori imaging constraints such as the expected existence of connected regions of zero flux.
1.1. Markov Chains and Bayesian Inference
Bayes' theorem states that the probability that a model θ (i.e. an image in our context) is correct, given a data set D (which includes errors on the data), is:^1

$$p(\theta|D) = \frac{f(D|\theta)\,p(\theta)}{f(D)}, \qquad (1)$$

where

$$f(D) = \int f(D|\theta)\,p(\theta)\,d\theta. \qquad (2)$$

Here p(θ) is the prior distribution of θ, and p(θ|D) is the posterior distribution. In the case of independent Gaussian errors, the likelihood function f(D|θ) takes a multivariate Gaussian form:

$$f(D|\theta) \propto \exp\left(-\sum_i \frac{\left(D_{m,i}(\theta) - D_i\right)^2}{2\sigma_i^2}\right) = \exp(-\chi^2/2). \qquad (3)$$

Here D_m(θ) is the model data (V², bispectrum) corresponding to the image θ. In the context of imaging, a regularization technique is contained in the pre-determined prior distribution p(θ).
When imaging, or when tackling many other problems with high dimensionality, the integral in Equation 2 cannot be evaluated in a reasonable time, so it is not possible to explicitly evaluate Equation 1. An alternative to explicit evaluation is to use a Monte-Carlo Markov Chain technique to sample the regions of image space where p(θ|D) is highest.^1 The distribution of images in the resultant Markov Chain θ_j then becomes a discrete version of the posterior distribution p(θ|D), from which inferences on the set of possible images can be made.
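Since the Metropolis sampler introduced below only ever needs ratios of posterior probabilities between a current image θ and a tentative image θ′, the intractable normalization f(D) cancels and never needs to be evaluated. Explicitly (a standard identity, not part of the original text):

$$\frac{p(\theta'|D)}{p(\theta|D)} = \frac{f(D|\theta')\,p(\theta')}{f(D|\theta)\,p(\theta)} = \exp\!\left(\frac{\chi^2(\theta) - \chi^2(\theta')}{2}\right)\frac{p(\theta')}{p(\theta)}.$$

This ratio, with the addition of a temperature T and a regularization term, is exactly what appears in the acceptance probability of Equation 4 below.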
2. MACIM IMAGING ALGORITHM
2.1. General Algorithm
The general algorithm used by MACIM is a simulated annealing algorithm with the Metropolis sampler.^{1,2} The image state space θ_j at iteration j consists of the set of pixel vectors {p_i}_j for all flux elements i with 1 ≤ i ≤ λ, where λ is the total number of flux elements. The flux in the image is constrained to be equal to 1, unless the model fitting of Section 2.3 is used. The vectors p_i exist on a finite square grid with resolution at least λ/(4 max({B})), with max({B}) the maximum baseline length. There are two classes of steps that the algorithm can take. The first class of step moves a flux element: it randomly chooses a flux element I and modifies p_I to form the tentative state {q_i} = {p_1, p_2, ..., p_I + s, ..., p_λ} for some step s, chosen to be in a random direction. Given a temperature T, the modification to the image state is accepted with probability:

$$p(j, j+1) = \min\left(1,\ \exp\left(\frac{\chi^2(\{p_i\}_j) - \chi^2(\{q_i\})}{2T} + \alpha\,\Delta R\right)\right). \qquad (4)$$
Here the χ² function is the total χ² calculated directly from the interferometric data in OIFITS format (V², bispectrum, complex visibility), ΔR is the change in the regularization statistic R, and α is a regularization scaling parameter. If the tentative state is accepted, then {p_i}_{j+1} is set to {q_i}; otherwise, we set {p_i}_{j+1} = {p_i}_j.
The tentative moves for p_i include several different types of flux steps s: moving one or several pixels along one of the image axes, moving the flux unit anywhere in the image, or moving to the location of another randomly selected flux element. Large steps in general have a smaller probability of acceptance than small steps. For this reason, the step type for the tentative move is chosen so that on average the probability of accepting the tentative state is between 0.2 and 0.45. For all steps s, the probability of choosing the tentative reverse transition at random is equal to the probability of choosing the forward transition.
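The acceptance test of Equation 4 reduces to a few elementary operations. The following C sketch shows one way to implement it; the identifiers are illustrative rather than MACIM's actual internals, and a POSIX drand48() is assumed for uniform random numbers.

    #include <math.h>
    #include <stdlib.h>

    /* Metropolis acceptance test of Equation 4 (illustrative sketch).
     * chi2_old, chi2_new: total chi-squared of the current and tentative states;
     * T: annealing temperature; alpha: regularization scaling parameter;
     * dR: change in the regularization statistic R. Returns 1 to accept. */
    int accept_move(double chi2_old, double chi2_new, double T,
                    double alpha, double dR)
    {
        double log_p = (chi2_old - chi2_new) / (2.0 * T) + alpha * dR;
        if (log_p >= 0.0)
            return 1;                   /* min(1, exp(...)): always accept */
        return drand48() < exp(log_p);  /* accept with probability exp(log_p) */
    }

Working in log space avoids overflow of the exponential for strongly unfavourable moves.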
The configuration-space entropy (i.e. the logarithm of the image degeneracy) does not explicitly enter Equation 4, but does enter the picture if one wishes to find the most probable, or mode, image. To understand why, consider the image representation {N_k}, where N_k represents the number of flux elements in pixel k. Two images with the same {N_k} are equal but can be degenerate, as each can be formed by a number of possible state vectors {p_i}. We will follow the notation of Ref. 3 and call this the multiplicity W:

$$W = \frac{\lambda!}{N_1!\,N_2!\,\cdots\,N_n!}. \qquad (5)$$
Here n is the total number of pixels in the image. Changing the total number of image elements λ was done in Ref. 3 by assuming a uniform prior on λ: all numbers of non-zero flux elements were assumed equally probable. This means that the normalized prior distribution of {N_k} is given by:

$$p(\{N_k\}) = \frac{W}{n^{(1-\delta)\lambda}}, \qquad (6)$$

with the parameter δ = 0. In general, this prior distribution for {N_k} with δ = 0 does not give 'sensible' images when λ is permitted to vary (the Markov chain converges at very high χ² and a low number of elements). Therefore, non-zero values of δ can be input as an optional parameter. Note that δ = 1 is equivalent to the prior distribution for p({N_k}) in which all configurations are equally probable, as opposed to all values of λ being equally probable.
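In code, W and the prior of Equation 6 are best handled in log space, since the factorials overflow for any realistic λ. A minimal sketch, using lgamma(x + 1) = log(x!) from the C math library (identifiers illustrative):

    #include <math.h>

    /* log W of Equation 5. N[k]: flux elements in pixel k; n: number of
     * pixels; lambda: total number of flux elements. */
    double log_multiplicity(const int *N, int n, int lambda)
    {
        double logW = lgamma(lambda + 1.0);
        for (int k = 0; k < n; k++)
            logW -= lgamma(N[k] + 1.0);
        return logW;
    }

    /* log p({N_k}) = log W - (1 - delta) * lambda * log n   (Equation 6). */
    double log_prior(const int *N, int n, int lambda, double delta)
    {
        return log_multiplicity(N, n, lambda) - (1.0 - delta) * lambda * log(n);
    }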
The second class of step consists of adding a new flux element in pixel K or removing flux element I (i.e. changing λ). This step is intrinsically asymmetrical, as the probability of the reverse step is not equal to that of the forward step. However, for δ = 0, the ratio of the probabilities of the forward and reverse steps is equal to the inverse of the ratio of the prior probabilities from Equation 6, meaning that Equation 4 is still appropriate for determining the Metropolis acceptance probability. For other values of δ, the exponential function in Equation 4 is multiplied by n^δ when adding a flux element, and divided by n^δ when removing one. For removing flux element I, we use {q_i} = {p_1, p_2, ..., p_{I−1}, p_{I+1}, ..., p_λ}, and for adding to pixel K we use {q_i} = {p_i, K}.
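In log space, multiplying or dividing the exponential of Equation 4 by n^δ simply adds or subtracts δ log n in the exponent, so the earlier hypothetical accept_move extends naturally to this second class of step:

    /* Acceptance test for the add/remove (birth/death) steps of Section 2.1.
     * dir = +1 when adding a flux element, -1 when removing one; the
     * dir * delta * log(n) term is the n^delta factor described above. */
    int accept_birth_death(double chi2_old, double chi2_new, double T,
                           double alpha, double dR,
                           int dir, int n, double delta)
    {
        double log_p = (chi2_old - chi2_new) / (2.0 * T) + alpha * dR
                     + dir * delta * log(n);
        if (log_p >= 0.0)
            return 1;
        return drand48() < exp(log_p);
    }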
The annealing temperature T is modified based on the reduced χ², χ²_r, according to the following algorithm:

$$T_{j+1} = T_j + \frac{\left(\chi^2_{r,j} - \gamma T_j\right)\left(1 - \chi^2_t/\chi^2_{r,j}\right)}{\Delta j}. \qquad (7)$$

The parameter γ is always greater than 1 (set to 4 by default). The other parameters are the reduced-χ² target χ²_t and the timescale of temperature changes Δj. This algorithm keeps T_j of order χ²_{r,j} (within the factor γ) during convergence, and then fixes χ²_r to be near χ²_t once the algorithm has converged. A minimum temperature limit T_min can also be placed on T_j.
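Equation 7 transcribes directly into code; a sketch with the T_min clamp included (names illustrative, with Δj written as dj):

    /* Annealing temperature update of Equation 7, clamped at T_min.
     * chi2_r: current reduced chi-squared; chi2_t: reduced chi-squared
     * target; gamma: parameter > 1 (default 4); dj: update timescale. */
    double update_temperature(double T, double chi2_r, double chi2_t,
                              double gamma, double dj, double T_min)
    {
        T += (chi2_r - gamma * T) * (1.0 - chi2_t / chi2_r) / dj;
        return (T > T_min) ? T : T_min;
    }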
2.2. Regularizers
There are currently two regularizers implemented in MACIM, although many others are possible, given that, unlike in many other imaging algorithms, no derivative of the regularizer is required. The first regularizer is simply the maximum entropy regularizer R = log(W). With a sufficient number of flux elements, MACIM can therefore be used to find the maximum-entropy regularized image. The second implemented regularizer is a dark interaction energy regularizer. This regularizer is the sum of all pixel boundaries with zero flux on either side of the pixel boundary. Inspired by the Ising model, this regularizer encourages large regions of dark space in between regions of flux, and represents a means to utilize a priori knowledge of source structure.
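Because the regularizer requires no derivative, it can be as simple as a count over neighbouring pixel pairs. The sketch below reads "zero flux on either side of the pixel boundary" as both neighbours being dark; this interpretation and the identifiers are mine, not taken from the MACIM source:

    /* Dark interaction energy: number of boundaries between horizontally or
     * vertically adjacent pixels that both contain zero flux. N is an
     * nx-by-ny image of flux-element counts in row-major order. */
    int dark_interaction_energy(const int *N, int nx, int ny)
    {
        int energy = 0;
        for (int y = 0; y < ny; y++)
            for (int x = 0; x < nx; x++) {
                if (x + 1 < nx && N[y*nx + x] == 0 && N[y*nx + x + 1] == 0)
                    energy++;   /* dark boundary with right-hand neighbour */
                if (y + 1 < ny && N[y*nx + x] == 0 && N[(y+1)*nx + x] == 0)
                    energy++;   /* dark boundary with neighbour below */
            }
        return energy;
    }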
2.3. Model Fitting
For certain astrophysical targets, a combination of model-fitting and imaging can significantly aid in data interpretation. An example of this is the point-source plus extended-flux images of VY CMa and NML Cyg in Ref. 4, where the use of a maximum entropy prior with a central point source changed the image morphology significantly. Model fitting is combined with imaging in MACIM by varying model parameters simultaneously with flux movement. Currently, the only implemented model-fitting option is a centrally-located uniform disk (or point source) that takes up some fraction of the total image flux, plus an over-resolved (background) flux component. This model has three parameters: the flux fraction of the central source, the diameter of the central source, and the over-resolved flux fraction. Any of these parameters can be fixed, or allowed to move freely according to the Metropolis-Hastings algorithm. The parameter step sizes are chosen so that the probability of accepting the tentative new parameter is on average 0.3.
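The central uniform disk has the standard analytic visibility 2J1(x)/x. The sketch below shows how the three model parameters could combine with a pre-computed image visibility; the combination rule is my reading of the text rather than MACIM's verified internals, and the image visibility is taken as real for brevity:

    #include <math.h>

    /* Model visibility for the three-parameter model of Section 2.3:
     * a central uniform disk of angular diameter theta (radians) with flux
     * fraction f_disk, an over-resolved background of fraction f_bg
     * (no correlated flux on any baseline), and the image carrying the rest.
     * b_wl: baseline length in wavelengths; V_img: visibility of the
     * unit-flux image component. j1() is the POSIX Bessel function. */
    double model_visibility(double b_wl, double theta,
                            double f_disk, double f_bg, double V_img)
    {
        double x = M_PI * theta * b_wl;
        double V_disk = (x > 1e-8) ? 2.0 * j1(x) / x : 1.0; /* point-source limit */
        return f_disk * V_disk + (1.0 - f_disk - f_bg) * V_img;
    }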
2.4. Specific Modes of Operation
There are several ways in which MACIM can create images:

Bayesian mean map. This is the default mode of operation, with the settings T_min = 1, χ²_t = 0, λ fixed at the number of input degrees of freedom, and no regularization. Starting from an initial map (by default a point source), the simulated annealing algorithm converges to a global minimum where, as long as χ²_r < γ (default γ = 4), we have T = 1. Once we have T = 1, the properties of Markov chains enable the full posterior distribution of images to be sampled. Optionally, the full chain can be output instead of just the Bayesian mean.

Variable-λ Bayesian mean map. By choosing λ_min < λ_max, the number of image elements is permitted to vary. Due to frequent convergence problems, δ is set to 0.1 rather than 0 by default.

Bayesian mode map. This map is also output whenever the Bayesian mean map is output. The average of a number of images near the maximum of p({N_k}) can be output, giving a variant of a maximum-entropy map (a maximum-multiplicity map). Due to the quantization of flux, averaging a number of images (say, 10% of the final chain) about the mode is more aesthetically pleasing than the single mode map. An alternative to this kind of averaging is using the single mode map to adaptively bin the image plane according to the level of quantization noise.

Pseudo-maximum entropy map. By setting T_min = 0, χ²_t = 1 and fixing λ to a large number (e.g. double the number of input degrees of freedom), the multiplicity (which converges to entropy for large λ) is maximized while fixing χ²_r = 1. Three kinds of potentially useful maps are simultaneously output in this case: the mean map, the mode map, and the 'maximum entropy' map, which contains the same number of images as the mode map but weights by multiplicity so that the mean χ²_r in the final image is 1.

Regularized map. In general, model fitting and dark interaction energy regularization are most easily performed using a fixed λ. Clearly, a large range of input parameters is possible here, depending on the exact nature of any a priori information; a hypothetical structure collecting these settings is sketched below.
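To make the parameter space of these modes concrete, the settings named above can be gathered into a single structure. This is purely a hypothetical illustration, not MACIM's actual interface:

    /* Hypothetical bundle of the run parameters named in this section. */
    struct macim_settings {
        double T_min;      /* minimum annealing temperature */
        double chi2_t;     /* reduced chi-squared target */
        int    lambda_min; /* minimum number of flux elements */
        int    lambda_max; /* maximum number of flux elements */
        double delta;      /* prior parameter of Equation 6 */
        double alpha;      /* regularization scaling of Equation 4 */
    };

    /* Defaults for the Bayesian mean map: T_min = 1, chi2_t = 0, lambda
     * fixed at the number of input degrees of freedom, no regularization. */
    struct macim_settings bayesian_mean_defaults(int n_dof)
    {
        struct macim_settings s = { 1.0, 0.0, n_dof, n_dof, 0.0, 0.0 };
        return s;
    }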
2.5. Difficulties and Future Work
The greatest difficulty in using MACIM to make images appears to be choosing the value of λ (or δ, if λ is allowed to vary). One argument for an optimal λ choice comes from the requirement that MACIM converges and well samples the posterior distribution.

The optimal value of the acceptance probability p(j, j+1) is thought to be in the range 0.2 to 0.5.^1 For acceptance probabilities outside this range, the Markov Chain samples the posterior distribution at a much slower rate. Acceptance probabilities in this range can only be found at moderate values of λ. For this reason, MACIM cannot well sample the posterior distribution in the high-λ limit (where it behaves like the MEM algorithm) or in the low-λ limit (as occurs if δ is set near zero).

Another argument for an optimal λ may be the desire that the mean value of χ²_r is 1.0. In principle, there is a minimum in the mean χ²_r (generally less than 1.0) at some value λ = λ_min, and χ²_r increases on either side of this minimum.^3 Therefore, there should be two λ values for which χ²_r = 1. However, the lower-λ value for mean χ²_r = 1 cannot be well sampled, because of the very high barriers to flux movement or to adding/removing flux elements. This problem will certainly require more work, for either completely automatic operation of MACIM or at least a well-defined knowledge of the influence of λ on deriving statistical inferences from the output MACIM Markov Chain.
3. SOFTWARE IMPLEMENTATION
MACIM is written in the C programming language, with an option for multi-threaded operation (multiple Markov chains running simultaneously, which are combined on completion). The transform between image space and complex visibility is stored in memory as vectors containing exp(iu_m x_k) and exp(iv_m y_k) for baselines m and pixels k, where u_m and v_m are the standard u and v coordinates for baseline m. Splitting the pixel coordinates into x_k and y_k in this fashion means that only 2M_b√n complex numbers need to be stored in memory (with M_b the number of baselines). At each iteration, the mathematical functions required are limited to elementary arithmetic operations and one evaluation of the exponential function (Equation 4). No evaluations of FFTs, trigonometric functions or square roots are required. For this reason, the millions of iterations required to characterize the posterior distribution can be run on a 2 GHz-class computer in several minutes for a typical modern interferometric data set.
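The factored storage also admits an O(M_b) incremental update of the model visibilities when a single flux element moves, which is what limits each iteration to elementary arithmetic. A sketch under the paper's description (array layout, names and normalization are assumptions):

    #include <complex.h>

    /* Update the model visibilities V[m] when one flux element moves from
     * pixel (x_old, y_old) to (x_new, y_new). Eu[m][x] = exp(i*u_m*x) and
     * Ev[m][y] = exp(i*v_m*y) are the precomputed factors described above;
     * Mb: number of baselines; lambda: number of flux elements, each
     * carrying 1/lambda of the unit total flux. */
    void move_flux_element(double complex *V,
                           double complex **Eu, double complex **Ev,
                           int Mb, int lambda,
                           int x_old, int y_old, int x_new, int y_new)
    {
        for (int m = 0; m < Mb; m++) {
            double complex old_term = Eu[m][x_old] * Ev[m][y_old];
            double complex new_term = Eu[m][x_new] * Ev[m][y_new];
            V[m] += (new_term - old_term) / lambda;
        }
    }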
Only one argument is required to run MACIM: the input OIFITS file name. However, the default image size of λ/min(B), with min(B) the minimum baseline length, is often not appropriate for a given data set. The maximum number of image elements λ_max and the pixel scale are other parameters that sometimes should be tweaked for a particular data set.

REFERENCES
1. Gamerman, D., Markov Chain Monte Carlo: Stochastic Simulation for Bayesian Inference, Chapman & Hall, London (1997).
2. Brémaud, P., Markov Chains: Gibbs Fields, Monte Carlo Simulation, and Queues, Springer, New York (1999).