Michael J. Ireland, John D. Monnier, and Nathalie Thureau, "Monte-Carlo imaging for optical interferometry," Proc. SPIE 6268, Advances in Stellar Interferometry, 62681T (28 June 2006); doi: 10.1117/12.670940.

Monte-Carlo Imaging for Optical Interferometry

Michael J. Ireland^a, John D. Monnier^b and Nathalie Thureau^b

^a Caltech, MC 150-21, 1200 E. California Blvd., Pasadena, CA 91125, USA
^b Department of Astronomy, University of Michigan, 501 East University Av., Ann Arbor, MI 48109, USA

Further author information: E-mail: mireland@gps.caltech.edu
ABSTRACT
We present a flexible code created for imaging from the bispectrum and V². By using a simulated annealing method, we limit the probability of converging to local chi-squared minima, as can occur when traditional imaging methods are used on data sets with limited phase information. We present the results of our code used on a simulated data set utilizing a number of regularization schemes, including maximum entropy. Using the statistical properties of Monte-Carlo Markov chains of images, we show how this code can place statistical limits on image features such as unseen binary companions.
Keywords: astronomical software, aperture synthesis imaging, optical interferometry, Bayesian statistics
1. INTRODUCTION
It is well known that a large class of images can be consistent with a particular interferometric data set. This is even more true for optical interferometry than for radio interferometry, due to the general unavailability of absolute visibility phase. An imaging algorithm such as CLEAN or Maximum Entropy combined with self-calibration attempts to find the 'best' possible image consistent with the interferometric data. Both finding this 'best' image and interpreting features within the image can be difficult, and in general requires some kind of regularization. Regularization penalizes images that look 'bad' (such as having too much unresolved structure), so as to find a compromise between lowering the χ² statistic and achieving an optimal regularization statistic.

The imaging code MACIM described in this paper is a Monte-Carlo Markov chain algorithm that aims both to reliably find the global minimum of a regularized χ² statistic in image space, and to characterize this minimum. The algorithm can operate without any regularization to find images that are optimal in the Bayesian sense. In this mode, the code can also characterize the joint probability density of images consistent with the data. Alternatively, the code can combine model-fitting and imaging, or use novel regularizations based on a priori imaging constraints such as the expected existence of connected regions of zero flux.
1.1. Markov Chains and Bayesian Inference
Bayes' theorem states that the probability that a model θ (i.e. an image in our context) is correct, given a data set D (which includes errors on the data), is:^1

$$p(\theta|D) = \frac{f(D|\theta)\,p(\theta)}{f(D)}, \qquad (1)$$

where

$$f(D) = \int f(D|\theta)\,p(\theta)\,d\theta. \qquad (2)$$

Here p(θ) is the prior distribution of θ, and p(θ|D) is the posterior distribution. In the case of independent Gaussian errors, the likelihood function f(D|θ) takes a multivariate Gaussian form:

$$f(D|\theta) \propto \exp\left(-\sum_i \frac{\left(D_{m,i}(\theta) - D_i\right)^2}{2\sigma_i^2}\right) = \exp(-\chi^2/2). \qquad (3)$$

Here D_m(θ) is the model data (V², bispectrum) corresponding to the image θ. In the context of imaging, a regularization technique is contained in the pre-determined prior distribution p(θ).
When imaging, or when tackling many other problems with high dimensionality, the integral in Equation 2 cannot be evaluated in a reasonable time, so it is not possible to explicitly evaluate Equation 1. An alternative to explicit evaluation is to use a Monte-Carlo Markov Chain technique to sample the regions of image space where p(θ|D) is highest.^1 The distribution of images in the resultant Markov Chain θ_j then becomes a discrete version of the posterior distribution p(θ|D), from which inferences on the set of possible images can be made.
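Since the Metropolis sampler introduced below only ever needs ratios of posterior probabilities between a current image θ and a tentative image θ′, the intractable normalization f(D) cancels and never needs to be evaluated. Explicitly (a standard identity, not part of the original text):

$$\frac{p(\theta'|D)}{p(\theta|D)} = \frac{f(D|\theta')\,p(\theta')}{f(D|\theta)\,p(\theta)} = \exp\!\left(\frac{\chi^2(\theta) - \chi^2(\theta')}{2}\right)\frac{p(\theta')}{p(\theta)}.$$

This ratio, with the addition of a temperature T and a regularization term, is exactly what appears in the acceptance probability of Equation 4 below.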
2. MACIM IMAGING ALGORITHM
2.1. General Algorithm
The general algorithm used by MACIM is a simulated annealing algorithm with the Metropolis sampler.^{1,2} The image state space θ_j at iteration j consists of the set of pixel vectors {p_i}_j for all flux elements i with 1 ≤ i ≤ λ, where λ is the total number of flux elements. The flux in the image is constrained to be equal to 1, unless the model fitting of Section 2.3 is used. The vectors p_i exist on a finite square grid with resolution at least λ/(4 max({B})), with max({B}) the maximum baseline length. There are two classes of steps that the algorithm can take. The first class of step moves a flux element: it randomly chooses a flux element I and modifies p_I to form the tentative state {q_i} = {p_1, p_2, ..., p_I + s, ..., p_λ} for some step s, chosen to be in a random direction. Given a temperature T, the modification to the image state is accepted with probability:

$$p(j, j+1) = \min\left(1,\ \exp\left(\frac{\chi^2(\{p_i\}_j) - \chi^2(\{q_i\})}{2T} + \alpha\,\Delta R\right)\right). \qquad (4)$$
Here the χ² function is the total χ² calculated directly from the interferometric data in OIFITS format (V², bispectrum, complex visibility), ΔR is the change in the regularization statistic R, and α is a regularization scaling parameter. If the tentative state is accepted, then {p_i}_{j+1} is set to {q_i}; otherwise, we set {p_i}_{j+1} = {p_i}_j.
The tentative moves for p_i include several different types of flux steps s: moving one or several pixels along one of the image axes, moving the flux unit anywhere in the image, or moving to the location of another randomly selected flux element. Large steps in general have a smaller probability of acceptance than small steps. For this reason, the step type for the tentative move is chosen so that on average the probability of accepting the tentative state is between 0.2 and 0.45. For all steps s, the probability of choosing the tentative reverse transition at random is equal to the probability of choosing the forward transition.
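The acceptance test of Equation 4 reduces to a few elementary operations. The following C sketch shows one way to implement it; the identifiers are illustrative rather than MACIM's actual internals, and a POSIX drand48() is assumed for uniform random numbers.

    #include <math.h>
    #include <stdlib.h>

    /* Metropolis acceptance test of Equation 4 (illustrative sketch).
     * chi2_old, chi2_new: total chi-squared of the current and tentative states;
     * T: annealing temperature; alpha: regularization scaling parameter;
     * dR: change in the regularization statistic R. Returns 1 to accept. */
    int accept_move(double chi2_old, double chi2_new, double T,
                    double alpha, double dR)
    {
        double log_p = (chi2_old - chi2_new) / (2.0 * T) + alpha * dR;
        if (log_p >= 0.0)
            return 1;                   /* min(1, exp(...)): always accept */
        return drand48() < exp(log_p);  /* accept with probability exp(log_p) */
    }

Working in log space avoids overflow of the exponential for strongly unfavourable moves.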
The configuration-space entropy (i.e. the logarithm of the image degeneracy) does not explicitly enter Equation 4, but does enter the picture if one wishes to find the most probable, or mode, image. To understand why, consider the image representation {N_k}, where N_k represents the number of flux elements in pixel k. Two images with the same {N_k} are equal but can be degenerate, as each can be formed by a number of possible state vectors {p_i}. We will follow the notation of Ref. 3 and call this the multiplicity W:

$$W = \frac{\lambda!}{N_1!\,N_2!\,\cdots\,N_n!}. \qquad (5)$$
Here n is the total number of pixels in the image. Changing the total number of image elements λ was done in Ref. 3 by assuming a uniform prior on λ: all numbers of non-zero flux elements were assumed equally probable. This means that the normalized prior distribution of {N_k} is given by:

$$p(\{N_k\}) = \frac{W}{n^{(1-\delta)\lambda}}, \qquad (6)$$

with the parameter δ = 0. In general, this prior distribution for {N_k} with δ = 0 does not give 'sensible' images when λ is permitted to vary (the Markov chain converges at very high χ² and a low number of elements). Therefore, non-zero values of δ can be input as an optional parameter. Note that δ = 1 is equivalent to the prior distribution for p({N_k}) in which all configurations are equally probable, as opposed to all values of λ being equally probable.
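In code, W and the prior of Equation 6 are best handled in log space, since the factorials overflow for any realistic λ. A minimal sketch, using lgamma(x + 1) = log(x!) from the C math library (identifiers illustrative):

    #include <math.h>

    /* log W of Equation 5. N[k]: flux elements in pixel k; n: number of
     * pixels; lambda: total number of flux elements. */
    double log_multiplicity(const int *N, int n, int lambda)
    {
        double logW = lgamma(lambda + 1.0);
        for (int k = 0; k < n; k++)
            logW -= lgamma(N[k] + 1.0);
        return logW;
    }

    /* log p({N_k}) = log W - (1 - delta) * lambda * log n   (Equation 6). */
    double log_prior(const int *N, int n, int lambda, double delta)
    {
        return log_multiplicity(N, n, lambda) - (1.0 - delta) * lambda * log(n);
    }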
The second class of step consists of adding a new flux element in pixel K or removing flux element I (i.e. changing λ). This step is intrinsically asymmetrical, as the probability of the reverse step is not equal to that of the forward step. However, for δ = 0, the ratio of the probabilities of the forward and reverse steps is equal to the inverse of the ratio of the prior probabilities from Equation 6, meaning that Equation 4 is still appropriate for determining the Metropolis acceptance probability. For other values of δ, the exponential function in Equation 4 is multiplied by n^δ when adding a flux element, and divided by n^δ when removing one. For removing flux element I, we use {q_i} = {p_1, p_2, ..., p_{I−1}, p_{I+1}, ..., p_λ}, and for adding to pixel K we use {q_i} = {p_i, K}.
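In log space, multiplying or dividing the exponential of Equation 4 by n^δ simply adds or subtracts δ log n in the exponent, so the earlier hypothetical accept_move extends naturally to this second class of step:

    /* Acceptance test for the add/remove (birth/death) steps of Section 2.1.
     * dir = +1 when adding a flux element, -1 when removing one; the
     * dir * delta * log(n) term is the n^delta factor described above. */
    int accept_birth_death(double chi2_old, double chi2_new, double T,
                           double alpha, double dR,
                           int dir, int n, double delta)
    {
        double log_p = (chi2_old - chi2_new) / (2.0 * T) + alpha * dR
                     + dir * delta * log(n);
        if (log_p >= 0.0)
            return 1;
        return drand48() < exp(log_p);
    }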
The annealing temperature T is modified based on the reduced χ², χ²_r, according to the following algorithm:

$$T_{j+1} = T_j + \frac{\left(\chi^2_{r,j} - \gamma T_j\right)\left(1 - \chi^2_t/\chi^2_{r,j}\right)}{\Delta j}. \qquad (7)$$

The parameter γ is always greater than 1 (set to 4 by default). The other parameters are the reduced-χ² target χ²_t and the timescale of temperature changes Δj. This algorithm keeps T_j of order χ²_{r,j} (within the factor γ) during convergence, and then fixes χ²_r to be near χ²_t once the algorithm has converged. A minimum temperature limit T_min can also be placed on T_j.
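Equation 7 transcribes directly into code; a sketch with the T_min clamp included (names illustrative, with Δj written as dj):

    /* Annealing temperature update of Equation 7, clamped at T_min.
     * chi2_r: current reduced chi-squared; chi2_t: reduced chi-squared
     * target; gamma: parameter > 1 (default 4); dj: update timescale. */
    double update_temperature(double T, double chi2_r, double chi2_t,
                              double gamma, double dj, double T_min)
    {
        T += (chi2_r - gamma * T) * (1.0 - chi2_t / chi2_r) / dj;
        return (T > T_min) ? T : T_min;
    }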
2.2. Regularizers
There are currently two regularizers implemented in MACIM, although many others are possible, given that, unlike in many other imaging algorithms, no derivative of the regularizer is required. The first regularizer is simply the maximum entropy regularizer R = log(W). With a sufficient number of flux elements, MACIM can therefore be used to find the maximum-entropy regularized image. The second implemented regularizer is a dark interaction energy regularizer. This regularizer is the sum of all pixel boundaries with zero flux on either side of the pixel boundary. Inspired by the Ising model, this regularizer encourages large regions of dark space in between regions of flux, and represents a means to utilize a priori knowledge of source structure.
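Because the regularizer requires no derivative, it can be as simple as a count over neighbouring pixel pairs. The sketch below reads "zero flux on either side of the pixel boundary" as both neighbours being dark; this interpretation and the identifiers are mine, not taken from the MACIM source:

    /* Dark interaction energy: number of boundaries between horizontally or
     * vertically adjacent pixels that both contain zero flux. N is an
     * nx-by-ny image of flux-element counts in row-major order. */
    int dark_interaction_energy(const int *N, int nx, int ny)
    {
        int energy = 0;
        for (int y = 0; y < ny; y++)
            for (int x = 0; x < nx; x++) {
                if (x + 1 < nx && N[y*nx + x] == 0 && N[y*nx + x + 1] == 0)
                    energy++;   /* dark boundary with right-hand neighbour */
                if (y + 1 < ny && N[y*nx + x] == 0 && N[(y+1)*nx + x] == 0)
                    energy++;   /* dark boundary with neighbour below */
            }
        return energy;
    }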
2.3. Model Fitting
For certain astrophysical targets, a combination of model-fitting and imaging can significantly aid in data interpretation. An example of this is the point-source plus extended-flux images of VY CMa and NML Cyg in Ref. 4, where the use of a maximum entropy prior with a central point source changed the image morphology significantly. Model fitting is combined with imaging in MACIM by varying model parameters simultaneously with flux movement. Currently, the only implemented model-fitting option is a centrally-located uniform disk (or point source) that takes up some fraction of the total image flux, plus an over-resolved (background) flux component. This model has three parameters: the flux fraction of the central source, the diameter of the central source, and the over-resolved flux fraction. Any of these parameters can be fixed, or allowed to move freely according to the Metropolis-Hastings algorithm. The parameter step sizes are chosen so that the probability of accepting the tentative new parameter is on average 0.3.
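The central uniform disk has the standard analytic visibility 2J1(x)/x. The sketch below shows how the three model parameters could combine with a pre-computed image visibility; the combination rule is my reading of the text rather than MACIM's verified internals, and the image visibility is taken as real for brevity:

    #include <math.h>

    /* Model visibility for the three-parameter model of Section 2.3:
     * a central uniform disk of angular diameter theta (radians) with flux
     * fraction f_disk, an over-resolved background of fraction f_bg
     * (no correlated flux on any baseline), and the image carrying the rest.
     * b_wl: baseline length in wavelengths; V_img: visibility of the
     * unit-flux image component. j1() is the POSIX Bessel function. */
    double model_visibility(double b_wl, double theta,
                            double f_disk, double f_bg, double V_img)
    {
        double x = M_PI * theta * b_wl;
        double V_disk = (x > 1e-8) ? 2.0 * j1(x) / x : 1.0; /* point-source limit */
        return f_disk * V_disk + (1.0 - f_disk - f_bg) * V_img;
    }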
2.4. Specific Modes of Operation
There are several ways in which MACIM can create images:

Bayesian mean map. This is the default mode of operation, with the settings T_min = 1, χ²_t = 0, λ fixed at the number of input degrees of freedom, and no regularization. Starting from an initial map (by default a point source), the simulated annealing algorithm converges to a global minimum where, as long as χ²_r < γ (default γ = 4), we have T = 1. Once we have T = 1, the properties of Markov chains enable the full posterior distribution of images to be sampled. Optionally, the full chain can be output instead of just the Bayesian mean.

Variable-λ Bayesian mean map. By choosing λ_min < λ_max, the number of image elements is permitted to vary. Due to frequent convergence problems, δ is set to 0.1 rather than 0 by default.

Bayesian mode map. This map is also output whenever the Bayesian mean map is output. The average of a number of images near the maximum of p({N_k}) can be output, giving a variant of a maximum-entropy map (a maximum-multiplicity map). Due to the quantization of flux, averaging a number of images (say, 10% of the final chain) about the mode is more aesthetically pleasing than the single mode map. An alternative to this kind of averaging is using the single mode map to adaptively bin the image plane according to the level of quantization noise.

Pseudo-maximum entropy map. By setting T_min = 0, χ²_t = 1 and fixing λ to a large number (e.g. double the number of input degrees of freedom), the multiplicity (which converges to entropy for large λ) is maximized while fixing χ²_r = 1. Three kinds of potentially useful maps are simultaneously output in this case: the mean map, the mode map, and the 'maximum entropy' map, which contains the same number of images as the mode map but weights by multiplicity so that the mean χ²_r in the final image is 1.

Regularized map. In general, model fitting and dark interaction energy regularization are most easily performed using a fixed λ. Clearly, a large range of input parameters is possible here, depending on the exact nature of any a priori information; a hypothetical structure collecting these settings is sketched below.
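To make the parameter space of these modes concrete, the settings named above can be gathered into a single structure. This is purely a hypothetical illustration, not MACIM's actual interface:

    /* Hypothetical bundle of the run parameters named in this section. */
    struct macim_settings {
        double T_min;      /* minimum annealing temperature */
        double chi2_t;     /* reduced chi-squared target */
        int    lambda_min; /* minimum number of flux elements */
        int    lambda_max; /* maximum number of flux elements */
        double delta;      /* prior parameter of Equation 6 */
        double alpha;      /* regularization scaling of Equation 4 */
    };

    /* Defaults for the Bayesian mean map: T_min = 1, chi2_t = 0, lambda
     * fixed at the number of input degrees of freedom, no regularization. */
    struct macim_settings bayesian_mean_defaults(int n_dof)
    {
        struct macim_settings s = { 1.0, 0.0, n_dof, n_dof, 0.0, 0.0 };
        return s;
    }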
2.5. Difficulties and Future Work
The greatest difficulty in using MACIM to make images appears to be choosing the value of λ (or δ, if λ is allowed to vary). One argument for an optimal λ choice comes from the requirement that MACIM converges and well samples the posterior distribution.

The optimal value of the acceptance probability p(j, j+1) is thought to be in the range 0.2 to 0.5.^1 For acceptance probabilities outside this range, the Markov Chain samples the posterior distribution at a much slower rate. Acceptance probabilities in this range can only be found at moderate values of λ. For this reason, MACIM cannot well sample the posterior distribution in the high-λ limit (where it behaves like the MEM algorithm) or in the low-λ limit (as occurs if δ is set near zero).

Another argument for an optimal λ may be the desire that the mean value of χ²_r is 1.0. In principle, there is a minimum in the mean χ²_r (generally less than 1.0) at some value λ = λ_min, and χ²_r increases on either side of this minimum.^3 Therefore, there should be two λ values for which χ²_r = 1. However, the lower-λ value for mean χ²_r = 1 cannot be well sampled, because of the very high barriers to flux movement or to adding/removing flux elements. This problem will certainly require more work, for either completely automatic operation of MACIM or at least a well-defined knowledge of the influence of λ on deriving statistical inferences from the output MACIM Markov Chain.
3. SOFTWARE IMPLEMENTATION
MACIM is written in the C programming language, with an option for multi-threaded operation (multiple Markov chains running simultaneously, which are combined on completion). The transform between image space and complex visibility is stored in memory as vectors containing exp(iu_m x_k) and exp(iv_m y_k) for baselines m and pixels k, where u_m and v_m are the standard u and v coordinates for baseline m. Splitting the pixel coordinates into x_k and y_k in this fashion means that only 2M_b√n complex numbers need to be stored in memory (with M_b the number of baselines). At each iteration, the mathematical functions required are limited to elementary arithmetic operations and one evaluation of the exponential function (Equation 4). No evaluations of FFTs, trigonometric functions or square roots are required. For this reason, the millions of iterations required to characterize the posterior distribution can be run on a 2 GHz-class computer in several minutes for a typical modern interferometric data set.
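The factored storage also admits an O(M_b) incremental update of the model visibilities when a single flux element moves, which is what limits each iteration to elementary arithmetic. A sketch under the paper's description (array layout, names and normalization are assumptions):

    #include <complex.h>

    /* Update the model visibilities V[m] when one flux element moves from
     * pixel (x_old, y_old) to (x_new, y_new). Eu[m][x] = exp(i*u_m*x) and
     * Ev[m][y] = exp(i*v_m*y) are the precomputed factors described above;
     * Mb: number of baselines; lambda: number of flux elements, each
     * carrying 1/lambda of the unit total flux. */
    void move_flux_element(double complex *V,
                           double complex **Eu, double complex **Ev,
                           int Mb, int lambda,
                           int x_old, int y_old, int x_new, int y_new)
    {
        for (int m = 0; m < Mb; m++) {
            double complex old_term = Eu[m][x_old] * Ev[m][y_old];
            double complex new_term = Eu[m][x_new] * Ev[m][y_new];
            V[m] += (new_term - old_term) / lambda;
        }
    }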
Only one argument is required to run MACIM: the input OIFITS file name. However, the default image size of λ/min(B), with min(B) the minimum baseline length, is often not appropriate for a given data set. The maximum number of image elements λ_max and the pixel scale are other parameters that sometimes should be tweaked for a particular data set.

REFERENCES
1. Gamerman, D., Markov Chain Monte Carlo: Stochastic Simulation for Bayesian Inference, Chapman & Hall, London (1997).
2. Brémaud, P., Markov Chains: Gibbs Fields, Monte Carlo Simulation, and Queues, Springer, New York (1999).