Proceedings Article

Adaptive Strategies for Dynamic Pricing Agents

22 Aug 2011-Vol. 2, pp 323-328

Abstract: Dynamic Pricing (DyP) is a form of Revenue Management in which the price of a (usually) perishable good is changed over time to increase revenue. It is an effective method that has become even more relevant and useful with the emergence of Internet firms and the possibility of readily and frequently updating prices. In this paper a new approach to DyP is presented. We design an adaptive dynamic pricing strategy and optimize its parameters with an Evolutionary Algorithm (EA) offline, while the strategy can deal with stochastic market dynamics quickly online. We design the adaptive heuristic dynamic pricing strategy in a duopoly where each firm has a finite inventory of a single type of good. We consider two cases, one in which the average of a customer population's stochastic valuation for each of the goods is constant throughout the selling horizon and one in which the average customer valuation for each good is changed according to a random Brownian motion. We also design an agent-based software framework for simulating various dynamic pricing strategies in agent-based marketplaces with multiple firms in a bounded time horizon. We use an EA to optimize the parameters of the pricing strategy in each of the settings and compare our strategy with other strategies from the literature. We also perform sensitivity analysis and show that the optimized strategy works well even when used in settings with varied demand functions.

Topics: Dynamic pricing (71%), Pricing strategies (66%), Time horizon (56%), Revenue management (55%), Duopoly (51%)

Summary (4 min read)

Introduction

  • Dynamic Pricing (DyP) is a form of Revenue Management (RM) that involves changing the price of goods or services over time with the aim of increasing revenue.
  • Today, the Internet provides exceptional opportunities for practicing RM and particularly DyP.
  • By changing prices in time, firms can ask for the price that yields the highest revenue at each moment.
  • So on one hand, the strategies are capable of very fast adaptive decision making in run-time, and on the other hand their parameters are optimized offline to tune them for a more specific setting.
  • Also, because of the adaptiveness of the proposed pricing strategies, any deviations from the expected dynamics of the market will be detected quickly and accounted for by the strategy online, and thus the strategies also work reasonably well in various market settings different from what they have been tuned for.

III. MODEL

  • The authors have a market with two competing firms.
  • The revenue (sum of price of items sold) of one of the firms is optimized using DyP.
  • Each firm can change the price of its goods at the start of equi-distant time intervals.

A. Firms

  • The model is a finite horizon model, the goods left at the end of each time step are transferred into the next and all goods are lost at the end of the whole time span.
  • Each firm announces a selling price, pj(t), for each good type in each time interval t.
  • A cost for each good type, crj , serves as a reserve price for goods of that type.

B. Customers

  • The customers specify their preferences using non-negative cardinal utilities that are exchangeable with monetary payments.
  • Thus, a customer’s utility for getting an item is equal to the difference between its valuation for the item and the item’s price, which is $u_j(t) = v(g_j) - p_j(t)$ for firm j’s good at time t.
  • This distribution, which is denoted by Prj,t for firm j’s good at time t, may or may not change in time.
  • The number of customers that arrive in each time step follows a Poisson process with a constant intensity a.
  • They may or may not buy a product based on their choice function and, in any case, leave the market afterwards.

C. Modeling Time

  • In any dynamic pricing model, by definition, the firms should be able to adjust their prices in time.
  • While changing prices at any particular moment may become more plausible, particularly with internet firms, it is still more common in the literature for the change of prices to occur in fixed time intervals.
  • At the start of each time interval, all firms set the prices for their goods.

IV. MARKET SIMULATION

  • The authors have developed software for simulating a marketplace described in the previous section.
  • The software uses an event queue to keep track of two types of events: pricing events, firms setting the price for each of their item types at the beginning of each time period, and customer arrival events.
  • Some notation that helps describe the experiments in the following section is defined here.
  • These parameters consist of the properties of the firms (costs of goods, initial stock, etc.) and the valuation distributions and arrival rate of the customers.
  • Based on the above definitions, an instance of the problem is deterministic given the firms’ strategies, i.e. will yield the same results when the firms use the same strategies, while a configuration alone does not contain enough information to determine an outcome.

A. The Inventory Based (IB) Strategy

  • The first strategy is one that adaptively adjusts the prices for a firm based on the number of goods it has left and the number of goods that it has sold in the previous time interval, the authors call it the Inventory Based (IB) strategy.
  • The maxDecPercent and maxIncPercent parameters along with the distance that the sales rate has from the expected sales rate control the amount of change in price in each time step.
  • Finally, pastCustomers is the total number of customers in the previous time step and aveCustomers is the average number of customers per time step (same as a).
  • Note that this is a dynamic indicator updated in the beginning of each time step, so it will take into account the current state of the agent.
  • If α is smaller than one, then the sales rate is too slow, and if it is larger than one, the inventory would be exhausted sooner than the end of the time horizon, so there is an opportunity for increasing the price.

B. The Revenue Based (RB) Strategy

  • The Revenue Based (RB) strategy uses an estimation of a desirable price to estimate the price in each time step.
  • The algorithm for this strategy can be seen in algorithm 2.
  • These variables are also used: RP , the revenue per customer in the previous time step, and the expected RPC, used as a control parameter, expRPC.
  • This is not always the case though, but this can be a safe assumption when the initial price and expected price are chosen properly, as the authors can see from the experimental results.
  • Note that both strategies depend only on information from the sales in one previous time step.

C. Computing the Parameters

  • The authors want to have settings for the parameters of the strategies that they have defined such that the strategies perform well.
  • The authors therefore need black-box optimization algorithms that are capable of tackling a large class of problems effectively.
  • AMaLGaM is essentially an Evolutionary Algorithm (EA) in which a normal distribution is estimated from the better, selected solutions and subsequently adapted to be aligned favorably with the local structure of the search space.
  • In order to tune the experiments for the EA, which is not designed to handle stochasticity on one hand, and not to over-fit a single instance of the problem on the other, the authors use the following method.
  • The parameters of the heuristic strategies defined in section V are optimized for firm 0, given that firm 1 follows a fixed price strategy.

VI. EXPERIMENTAL RESULTS

  • The authors ran the EA in this way multiple times for each strategy, both for the case where the customers’ valuation distribution does not change with Brownian motion, and for the case in which it does.
  • It also illustrates a case where firm 1 has a slightly more “expensive” good, its cost, price and the customers’ valuation for it are higher than firm 0’s good.
  • In the Brownian case, the IB strategy has a 66% increase in profit and the RB algorithm has a 63% increase compared to the fixed price strategy, with similar results for the DF strategy.
  • GD still yields an almost 35% profit gain compared to the FP, which is pretty good for an algorithm that is using only one parameter to adjust the price.
  • In each cell of these tables, the first number (from top) shows the percentage of instances in which the first (row) strategy performs better than the other (column strategy).

VII. SENSITIVITY ANALYSIS

  • In this section the authors study the robustness of the optimized strategies by computing the amount of revenue loss suffered in case of wrong assumptions about the market configuration.
  • To do this, the authors run the best performing strategy that they have designed up to now, IB, for some configurations that vary with their default Brownian and non-Brownian configurations (see section VI).
  • The authors consider a class of varied configurations where the customer arrival rate, a, is changed compared to the standard non-Brownian configuration discussed above.
  • The results show that even in the most severe cases in their experiments less than 10% of the profit can be lost by incorrectly predicting the model.

VIII. CONCLUSIONS AND FURTHER WORK

  • The authors have presented a framework for implementing dynamic pricing in an interactive agent-based marketplace.
  • The authors showed that for the cases they study, the heuristics yield revenues that are consistently better than that of the best offline-optimized fixed price and the results of various derivative follower algorithms.
  • The strategies are also adaptive and robust to market dynamics.
  • In both the Brownian and non-Brownian cases, their IB strategy can still perform well with the same optimized parameters when the demand is increased up to 160% of the original configuration, compared to when the parameters are specifically optimized considering the demand change.


Adaptive Strategies for Dynamic Pricing Agents
Sara Ramezani, Peter A. N. Bosman, and Han La Poutré
CWI, Dutch National Institute for Mathematics and Computer Science
P.O. Box 94079, NL-1090 GB, Amsterdam, The Netherlands
Email: {S.Ramezani, Peter.Bosman, Han.La.Poutre}@cwi.nl
Abstract—Dynamic Pricing (DyP) is a form of Revenue
Management in which the price of a (usually) perishable good
is changed over time to increase revenue. It is an effective
method that has become even more relevant and useful with
the emergence of Internet firms and the possibility of readily
and frequently updating prices. In this paper a new approach
to DyP is presented. We design adaptive dynamic pricing
strategies and optimize their parameters with an Evolutionary
Algorithm (EA) offline while the strategies can deal with
stochastic market dynamics quickly online. We design two
adaptive heuristic dynamic pricing strategies in a duopoly
where each firm has a finite inventory of a single type of
good. We consider two cases, one in which the average of a
customer population’s stochastic valuation for each of the goods
is constant throughout the selling horizon and one in which the
average customer valuation for each good is changed according
to a random Brownian motion. We also design an agent-based
software framework for simulating various dynamic pricing
strategies in agent-based marketplaces with multiple firms
in a bounded time horizon. We use an EA to optimize the
parameters for each of the pricing strategies in each of the
settings and compare the strategies with other strategies from
the literature. We also perform sensitivity analysis and show
that the optimized strategies work well even when used in
settings with varied demand functions.
I. INTRODUCTION
Dynamic Pricing (DyP) is a form of Revenue Manage-
ment (RM) that involves changing the price of goods or ser-
vices over time with the aim of increasing revenue. Revenue
management is a much broader term that refers to various
techniques for increasing revenue of (usually) perishable
goods or services. RM particularly became popular within
the airline industry after the deregulation of the industry in
the United States in the late 1970’s [18].
Today, the Internet provides exceptional opportunities
for practicing RM and particularly DyP. This is due both
to the amount of data available and the restructuring of
price posting procedures. Thus, the Internet can facilitate
offering different prices for different customers and posting
new prices with minimum extra costs. This also allows for
the increased use of intelligent autonomous agents in
e-commerce: agents designed for automatically buying,
selling, price comparison, bargaining, etc.
Most RM methods exploit the differences in different cus-
tomers’ valuations of a good or changes in these valuations
in time to boost revenue. This has led to different methods of
distinguishing between customers based on their valuation
for the goods, such as fare class distinction, capacity control,
dynamic pricing, auctions, promotions, coupons, and price
discrimination methods such as group discounts [18].
Here we focus on dynamic pricing [7]. By changing prices
in time, firms can ask for the price that yields the highest
revenue at each moment. This allows them to distinguish
between customers in cases where customers with different
utilities buy at different times, as well as exploit the changes
of valuation of the same customers in time. Many RM
methods can be categorized as dynamic pricing, be it the
end-of-season markdown of a fashion retailer, or the inflated
last-minute price of a business-class flight ticket. The main
question is when and how to change the prices in order
to obtain the most revenue. This depends on the market
structure and dynamics, most importantly, on the customer
demand rate and how it changes in time.
In this paper we study dynamic pricing of a limited supply
of goods in a competitive finite-horizon market. We design
and implement an interactive agent based marketplace where
the agents are the firms who wish to increase their revenue
using dynamic pricing strategies. We study two cases, one
in which the customers’ valuations for the products follow
the same distribution throughout the selling horizon and one
in which the average of their valuations follows a random
Brownian motion over time.
We design two adaptive pricing strategies and use learning
to optimize them. The execution of the strategies is not
computationally intensive (O(1) complexity) and they are
understandable from a practical perspective. The strategies
use only the observed market response to set new prices at
each point in time. The parameters used in each strategy
are then optimized using an Evolutionary Algorithm (EA)
[3]. A simulation software has been implemented that can
generate and simulate dynamic pricing strategies in various
oligopolistic markets. The EA uses this simulator as a black
box, and optimizes the parameters for a given strategy using
the obtained revenue as the fitness criteria. So on one hand,
the strategies are capable of very fast adaptive decision
making in run-time, and on the other hand their parameters
are optimized offline to tune them for a more specific setting.
In many real world applications, some general knowledge
of the market dynamics exists beforehand, although it may
be different from what will actually happen, both because
of inaccuracies in the estimations and predictions and
unexpected changes to the market. Using our proposed approach,
this knowledge can be used for offline learning, to optimize
the parameters of the strategies before the actual selling
starts. Also, because of the adaptiveness of the proposed
pricing strategies, any deviations from the expected dynam-
ics of the market will be detected quickly and accounted
for by the strategy online, and thus the strategies also work
reasonably well in various market settings different from
what they have been tuned for.
In the strategies we present, the selling agents do not
assume the demand to have an a priori structure, though the
EA uses the market simulation with a particular structure to
optimize the performance of the strategy over the space of
its parameters. Thus, information of the demand structure is
passed to the selling agents implicitly through the parame-
ters. The strategies are adaptive in the sense that they detect
changes in the market and adjust their prices accordingly.
Hence they can effectively deal with market changes.
In order to evaluate our strategies’ performance, we com-
pare them to a number of strategies previously studied in
the DyP literature. We show that our strategies outperform
a fixed price (FP) strategy that is optimized offline. This
is significant, because FP strategies perform very well in
many configurations [9]. It should be noted here that in most
other models a FP strategy is actually the optimal strategy
and DyP is used to find this optimal fixed price, but in our
case no fixed price is optimal due to the combination of
competition, finite inventory, and a finite time horizon. This
can also be shown by the fact that our strategies outperform
the offline-optimized FP (that is very close to the actual best
possible fixed price). The same features make the analytical
computation of the best solution in our model intractable,
requiring experimentation to evaluate our strategies. In fact,
a benefit of using simulations is that we are able to tackle
more complicated models that are too difficult to approach
theoretically.
Furthermore, the strategies also perform better than
optimized versions of the derivative follower (DF) learning
algorithm, which keeps changing the price in the same direction
(increasing or decreasing) as long as the revenue keeps
increasing, and reverses the direction of the price change otherwise.
Such algorithms have previously been used successfully in
DyP settings [4], [6], [14]. Our proposed strategies are also
compared to the Goal Directed (GD) strategy of [6] which is
particularly similar to one of our strategies. Both strategies
outperform the GD strategy for which the only parameter,
the initial price, is optimized for the given setting.
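The derivative follower idea described above can be illustrated with a minimal sketch. This is not the implementation from [4], [6], or [14]: the fixed step size and the function name `derivative_follower_step` are assumptions made only for illustration.

```python
def derivative_follower_step(price, step, revenue, previous_revenue):
    """Return (new_price, new_step) for one derivative-follower update.

    Assumption: the price moves by a fixed signed `step` each interval.
    The step keeps its sign while revenue does not fall, and flips
    sign as soon as revenue drops below the previous interval's revenue.
    """
    if revenue < previous_revenue:
        step = -step  # revenue fell: reverse the direction of change
    return price + step, step


# Revenue rose from 4 to 5, so the price keeps moving upward.
rising = derivative_follower_step(price=10.0, step=1.0, revenue=5.0, previous_revenue=4.0)
# Revenue fell from 4 to 3, so the direction is reversed.
falling = derivative_follower_step(price=10.0, step=1.0, revenue=3.0, previous_revenue=4.0)
```

The optimized DF variants mentioned in the text would tune quantities such as the initial price and the step size offline; those details are not given here.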
Finally, we show that optimized adaptive strategies still
perform reasonably well when various changes are made
to the market configuration after the learning phase. To
this end, we evaluate the performance of a strategy
with parameters optimized for a given configuration, on a
stochastically varied configuration. The obtained revenue
is then compared with the revenue obtained by using the
same strategy with parameters optimized for the varied
setting. This can give us the regret of wrongly estimating
the market structure. The variations in configuration that we
study include altering the demand function by changing the
customer/good ratio.
II. RELATED WORK
DyP has been a very active research area in recent
years. Many studies try to learn the demand structure, or
the parameters for a known demand function, on the fly.
They typically use part of the selling horizon for exploring
the market, trying out the demand rate for different prices
in a systematic way, and another portion of the time for
exploiting the market, using the best price(s) based on their
estimates [2], [8]. Others use statistical learning methods and
heuristics based on mathematical estimations of the optimal
price [1]. While most DyP models are monopolies, there
are some that model competitors in the marketplace as well
[14], [15], [16]. In this work we deal with a duopoly market,
though the firm does not model a competitor explicitly.
Works that are similar to ours in experimenting with
heuristic strategies by simulation are fewer. The Information
Economics group at IBM has investigated the effect of
interacting pricing agents which they call pricebots in a
number of works (see [14] for a survey). In some, they
use game-theoretic analysis and experiment with heuristics
that aim at achieving the optimal equilibrium price [11].
They focus on the market dynamics and pricing patterns
that arise when using these strategies against each other.
They also study shopbots [10], strategic buyer agents, and
pricing where agents may be differentiated horizontally or
vertically based on their preferences for different attributes
of a product. Some of the algorithms discussed in these
works rely on more information than can be obtained from
the market simulations only, but we have compared our work
to the FP and a few versions of the DF strategies, both of
which are used in these works.
Multi-attribute DyP is also discussed in [5] and [13]. In
[13] a heuristic method for dynamic pricing is presented
which consists of a preference elicitation algorithm and a
dynamic pricing algorithm. The method is then compared to
a DF algorithm and the GD algorithm from [6]. Their model
differs from the one we use in the existence of multiple
attributes (we consider a single attribute here, the price) and
also because they have a finite number of identifiable buyers
(compared to our infinite population of one-time buyers).
Their algorithms, although similar to ours in fast online
decision making and the use of simulation for the evaluation,
were not directly comparable to the strategies in our current
model due to the strong dependence on these differences,
the simplification of which would significantly undermine
the strengths of their strategies.

In [4], a heuristic Model Optimizer (MO) method is
designed and compared to a DF pricing strategy. The MO
strategy uses information from the previous time intervals
for a more detailed model of the demand, and solves a
non-linear equation in each time step using a simplex hill-
climbing approach. This heuristic strategy differs from ours
in its online computational complexity, which is much higher
than ours due to the online optimization.
In [6] a DF algorithm and an inventory-based GD algo-
rithm are used in a number of simulations to show how they
actually behave in a market and in which scenarios each
one is useful. We compare our strategies with both of these
strategies, because they are both compatible with our model
and comparable with our strategies in their computational
intensity and the information they use.
In [17], a few different EA methods are used to solve a
dynamic pricing problem. Their approach is not comparable
to ours since they use their optimizer to optimize actual
prices for a dynamic pricing model for a small number
(less than 10) of time steps. A method similar to [17]
is not useful in our stochastic model because optimizing
prices using an EA would lead to an over-fitted solution
that works better than the adaptive strategies only for the
instances (see definition 2 in section IV) it is optimized for
and considerably worse on average. It is also far more time-
consuming when considering a larger number of time steps.
III. MODEL
We have a market with two competing firms. The revenue
(sum of price of items sold) of one of the firms is optimized
using DyP. Each firm can change the price of its goods at
the start of equi-distant time intervals.
A. Firms
We have a finite number, $m$, of firms, $\{0, 1, \ldots, m-1\}$.
We refer to firm $j$'s good type as $g_j$. The firm starts off with
an initial inventory $Y_j$ of its product and the capacity left of
the good at time $t$ is denoted by $y_j(t)$ (the $t$ can be omitted
if there is no chance of ambiguity).
The model is a finite horizon model: the goods left at the
end of each time step are transferred into the next and all
goods are lost at the end of the whole time span. Each firm
announces a selling price, $p_j(t)$, for each good type in each
time interval $t$. A cost for each good type, $cr_j$, serves as a
reserve price for goods of that type.
B. Customers
1) Preferences: The customers specify their preferences
using non-negative cardinal utilities that are exchangeable
with monetary payments. Each customer has a valuation
function that determines these utilities. Customers are
unit-demand: they only have preferences on sets consisting of one
item, so their valuation functions are defined as $v : G \to \mathbb{R}^{+}$.
Thus, customers have to specify only a single number for
each good type and its valuation for getting more than one
item is always zero. Thus, a customer’s utility for getting an
item is equal to the difference between its valuation for the
item and the item’s price, which is $u_j(t) = v(g_j) - p_j(t)$
for firm $j$'s good at time $t$.
2) Population: We model the customer populations as an
unbounded population. This means that the distribution of
customers does not change after an item is sold. Also, the
valuations of all of the customers for each unit of each of the
good types follow the same distribution. This distribution,
which is denoted by $Pr_{j,t}$ for firm $j$'s good at time $t$, may
or may not change in time.
These distributions are all normal distributions. We consider
two settings. In the first setting the normal distribution
is the same for each good type and customer segment
pair throughout the time horizon (so the $t$ can be omitted
from $Pr_{j,t}$). In the second setting, which we refer to as
the Brownian setting, the mean of each of the $Pr_{j,t}$
distributions changes over time, following a basic model
of Brownian motion: the mean increases by a constant
amount ($b$), decreases by the same constant amount, or does
not change, each of these cases happening with equal
probability ($\frac{1}{3}$). This allows for some structured dynamism in
the demand pattern in the model.
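As an illustration of the Brownian setting, the following sketch generates one path of the valuation mean and draws a customer valuation from the resulting normal distribution. It assumes that the mean is updated once per time step and that the standard deviation stays fixed; neither detail is stated explicitly above. The function names and the numeric values are hypothetical, and no truncation of negative draws is applied.

```python
import random


def brownian_mean_path(initial_mean, b, num_steps, rng):
    """Return the valuation mean for each of `num_steps` time steps.

    Assumption: one update per time step. In every update the mean
    moves by +b, -b, or 0, each with probability 1/3.
    """
    means = [initial_mean]
    for _ in range(num_steps - 1):
        step = rng.choice((-b, 0.0, b))  # three equally likely outcomes
        means.append(means[-1] + step)
    return means


def sample_valuation(mean, std_dev, rng):
    """Draw one customer's valuation from a normal distribution."""
    return rng.gauss(mean, std_dev)


rng = random.Random(42)  # fixed seed so the path is reproducible
path = brownian_mean_path(initial_mean=100.0, b=2.0, num_steps=10, rng=rng)
valuation = sample_valuation(path[-1], std_dev=10.0, rng=rng)
```

Setting `b` to zero recovers the first setting, in which the distribution does not change over the horizon.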
3) Customer arrival: The number of customers that ar-
rive in each time step follows a Poisson process with a con-
stant intensity a. The firms may be aware of the parameter
of this process when making their pricing decisions. In each
time interval, the customers arrive consecutively after the
firms have set their prices. They may or may not buy a
product based on their choice function and, in any case,
leave the market afterwards.
4) Choice Model: At any time $t$ that a customer has to
make a purchase decision, it will buy one unit of good $g^{*}$
offered by firm $f^{*} = \arg\max_j \{u_j(t) \mid u_j(t) > 0\}$, if it exists, i.e.
the item for which it has the highest utility among the items
that are not priced higher than it is willing to pay, with probability
$1 - \lambda$, and does not purchase anything with probability $\lambda$.
The $\lambda$ factor models a general chance for a purchase not
occurring; this is close to the natural behavior of customers
in many contexts. Note that the effect of $\lambda$ can also be
achieved by changing the arrival rate when we are using
a Poisson arrival process, but it is not so with all arrival
models. If $g^{*}$ does not exist, the customer will not make a
purchase. Ties are broken randomly.
As is evident from the model, the customers are myopic
(greedy) and purchase only based on current utilities, not
any prediction of what will happen next.
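The choice model can be sketched as a small function. This is only an illustration under stated assumptions: the name `choose_good` and its argument layout are hypothetical, and stock availability is ignored because the description above does not say how sold-out goods are handled.

```python
import random


def choose_good(valuations, prices, no_purchase_prob, rng):
    """Return the index of the firm bought from, or None for no purchase.

    valuations[j] is the customer's valuation of firm j's good and
    prices[j] is firm j's current price. `no_purchase_prob` plays the
    role of lambda in the text.
    """
    # With probability lambda the customer leaves without buying.
    if rng.random() < no_purchase_prob:
        return None
    utilities = [v - p for v, p in zip(valuations, prices)]
    best = max(utilities)
    # Only strictly positive utilities lead to a purchase.
    if best <= 0:
        return None
    # Ties between equally attractive goods are broken at random.
    candidates = [j for j, u in enumerate(utilities) if u == best]
    return rng.choice(candidates)


rng = random.Random(0)
bought = choose_good([10.0, 12.0], [8.0, 11.0], no_purchase_prob=0.0, rng=rng)
priced_out = choose_good([5.0, 5.0], [6.0, 7.0], no_purchase_prob=0.0, rng=rng)
```

In the first call the utilities are 2 and 1, so the customer buys from firm 0; in the second call both utilities are negative, so nothing is bought.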
C. Modeling Time
In any dynamic pricing model, by definition, the firms
should be able to adjust their prices in time. While changing
prices at any particular moment may become more plausible,
particularly with Internet firms, it is still more common in
the literature for the change of prices to occur in fixed
time intervals. We suppose that there are T time intervals,
numbered from 1 to T successively. At the start of each time
interval, all firms set the prices for their goods.
IV. MARKET SIMULATION
We have developed software for simulating a marketplace
described in the previous section. The software uses an event
queue to keep track of two types of events: pricing events,
firms setting the price for each of their item types at the
beginning of each time period, and customer arrival events.
Some notation that helps describe the experiments in the
following section is defined here.
Definition 1 (Configuration): A configuration is a model
where all parameters are set. These parameters consist of the
properties of the firms (costs of goods, initial stock, etc.) and
the valuation distributions and arrival rate of the customers.
Definition 2 (Instance): An instance of the problem is
a specific configuration together with samplings for the
stochastic variables (i.e. a fixed random seed for the pseudo
random generator in the software).
Definition 3 (Pricing Strategy): A pricing strategy, or
simply strategy, is a function that given a fixed number of
parameters, sets a new price for a unit of the firm’s good in
each time step. The function can also depend on the previous
events that have occurred in the market. We assume here that
the firm is aware of the previous customers’ behavior and
previous prices, and that the firm knows the customer arrival
rate, but nothing about the customers’ valuation functions.
All strategies are deterministic.
Based on the above definitions, an instance of the problem
is deterministic given the firms’ strategies, i.e. will yield the
same results when the firms use the same strategies, while
a configuration alone does not contain enough information
to determine an outcome.
Definition 4 (Simulation): By a simulation, we designate
a single execution of a particular instance of the problem
with fixed strategies for the firms.
Definition 5 (Batch run): By a batch run, or simply
batch, consisting of n simulations, we mean the simulation
of n different instances of the problem that share the same
configuration and use the same strategy for each of the firms
throughout the n instances.
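The relation between configuration, instance, simulation, and batch run can be illustrated with a deliberately simplified sketch. It is not the authors' event-queue simulator: it assumes a single firm with a fixed price, and the class `Configuration`, the functions `simulate` and `batch_run`, and all numeric values are hypothetical. The Poisson sampler uses Knuth's multiplication method.

```python
import math
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Configuration:
    """All model parameters fixed, but no random sampling yet."""
    arrival_rate: float    # Poisson intensity a
    num_steps: int         # number of time intervals T
    valuation_mean: float
    valuation_std: float
    initial_stock: int


def poisson(rate, rng):
    """Sample a Poisson-distributed count (Knuth's method)."""
    limit = math.exp(-rate)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def simulate(config, seed, price):
    """One simulation: an instance (configuration + seed), fixed price."""
    rng = random.Random(seed)  # the seed fixes every stochastic draw
    stock, revenue = config.initial_stock, 0.0
    for _ in range(config.num_steps):
        for _ in range(poisson(config.arrival_rate, rng)):
            valuation = rng.gauss(config.valuation_mean, config.valuation_std)
            if stock > 0 and valuation > price:
                stock -= 1
                revenue += price
    return revenue


def batch_run(config, seeds, price):
    """A batch: same configuration and strategy, one instance per seed."""
    return [simulate(config, seed, price) for seed in seeds]


config = Configuration(arrival_rate=20.0, num_steps=10,
                       valuation_mean=100.0, valuation_std=10.0,
                       initial_stock=50)
first = simulate(config, seed=7, price=100.0)
second = simulate(config, seed=7, price=100.0)   # same instance, same result
revenues = batch_run(config, seeds=[1, 2, 3], price=100.0)
```

Because the strategy is deterministic and the seed fixes the sampling, repeating a simulation on the same instance reproduces the revenue exactly, while different seeds in a batch generally give different revenues.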
V. ADAPTIVE HEURISTIC STRATEGIES
We present two heuristic pricing strategies in this section.
A. The Inventory Based (IB) Strategy
The first strategy is one that adaptively adjusts the prices
for a firm based on the number of goods it has left and
the number of goods that it has sold in the previous time
interval, we call it the Inventory Based (IB) strategy.
In each time step, the strategy retains the previous price
if the rate of items sold in the previous time interval is
close to the rate needed to sell all the items by the end of
the time horizon (this “closeness” is controlled by the
parameters noChangeThreshUp and noChangeThreshDown).
It increases the price if too many items have been sold
in the previous interval and decreases it if too few have
been sold. The maxDecPercent and maxIncPercent
parameters along with the distance that the sales rate has
from the expected sales rate control the amount of change
in price in each time step. The other parameter used in this
strategy is initialPrice, the price the firm uses in the first
time interval. The details of the algorithm of this strategy
can be seen in algorithm 1.
Algorithm 1 InventoryBasedStrategy(initialPrice,
noChangeThreshUp, noChangeThreshDown, maxIncPercent,
maxDecPercent)
 1: if time = 0 then
 2:     price ← initialPrice
 3:     return price
 4: if numLeft = 0 then
 5:     return lastPrice
 6: if pastCustomers = 0 then
 7:     pastCustomers ← 1
 8: pastSold ← pastSold × aveCustomers / pastCustomers
 9: α ← (pastSold × timeLeft) / numGoodsLeft
10: if α < 1 then
11:     β ← α − 1
12: else
13:     β ← 1 − 1/α
14: if β < 0 then
15:     if |β| < noChangeThreshDown then
16:         price ← lastPrice
17:     else
18:         price ← lastPrice × (1 + β × maxDecPercent)
19: else
20:     if β < noChangeThreshUp then
21:         price ← lastPrice
22:     else
23:         price ← lastPrice × (1 + β × maxIncPercent)
24: return price
In algorithm 1, pastSold is the number of items sold
in the previous time step, and numGoodsLeft is the number
of items left in the inventory. timeLeft is the number of
time steps left in the selling horizon, and lastPrice is
the price of a unit of the good in the previous time step.
Finally, pastCustomers is the total number of customers
in the previous time step and aveCustomers is the average
number of customers per time step (same as a).
In line 8, the number of items sold in the past time interval
is normalized by the number of customers relative to the
average number of customers arriving in each time step and,
in line 9, by the number of goods left, to factor out the
stochasticity as much as possible. Note that the latter is
a dynamic indicator updated at the beginning of each time
step, so it takes the current state of the agent into
account. In line 9, α is defined as an indicator of
how fast the inventory would be exhausted if sales
continued at the current rate. If α is smaller than one, then
the sales rate is too slow, and if it is larger than one,
the inventory would be exhausted before the end of
the time horizon, so there is an opportunity for increasing
the price. The parameter ε is then (in the if-then statement
starting at line 10) defined as a normalized version of α
that is negative if the sales rate is too low and positive if it is
too high: ε = α − 1 when α < 1 and ε = 1 − 1/α otherwise,
so ε always lies between −1 and 1. The if-then statement
starting at line 14 is where the final pricing decision is
made. If the absolute value of ε is smaller than the
respective threshold parameter for negative or positive ε,
i.e. if the sales rate is close enough to the desired rate,
then the price is not changed; otherwise it is changed in
proportion to ε, scaled by the maximum allowable change rate.
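The Inventory Based rule described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name, the snake_case argument names, the use of the name eps for the normalized deviation, and the keyword-only calling convention are all choices made here for readability. The logic follows the steps of Algorithm 1 (normalize sales, compute α, map it to a signed deviation, apply the no-change thresholds).

```python
def inventory_based_price(*, time, last_price, past_sold, past_customers,
                          ave_customers, time_left, num_goods_left,
                          initial_price, no_change_thresh_up,
                          no_change_thresh_down, max_inc_percent,
                          max_dec_percent):
    """Return the price for the current time step (sketch of Algorithm 1)."""
    if time == 0:
        return initial_price
    if num_goods_left == 0:
        # Nothing left to sell: keep the previous price.
        return last_price
    if past_customers == 0:
        past_customers = 1  # avoid division by zero, as in the pseudocode

    # Rescale last step's sales to an "average" number of customers.
    normalized_sold = past_sold * ave_customers / past_customers
    # alpha < 1: selling too slowly; alpha > 1: selling too quickly.
    alpha = normalized_sold * time_left / num_goods_left

    # Signed, normalized deviation in [-1, 1).
    eps = alpha - 1.0 if alpha < 1 else 1.0 - 1.0 / alpha

    if eps < 0:
        if abs(eps) < no_change_thresh_down:
            return last_price
        return last_price * (1.0 + eps * max_dec_percent)
    if eps < no_change_thresh_up:
        return last_price
    return last_price * (1.0 + eps * max_inc_percent)


if __name__ == "__main__":
    # Hypothetical numbers: 2 items sold to 10 customers, 40 items and
    # 10 steps left. alpha = 2 * 10 / 40 = 0.5, eps = -0.5, so the price
    # drops by 0.5 * 20% = 10%, from 100 to about 90.
    print(inventory_based_price(
        time=5, last_price=100.0, past_sold=2, past_customers=10,
        ave_customers=10, time_left=10, num_goods_left=40,
        initial_price=100.0, no_change_thresh_up=0.1,
        no_change_thresh_down=0.1, max_inc_percent=0.1,
        max_dec_percent=0.2))
```

Because the deviation is bounded in magnitude by one, the relative price change in a single step never exceeds maxDecPercent or maxIncPercent, which is what keeps the price path stable in this sketch.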
B. The Revenue Based (RB) Strategy
The Revenue Based (RB) strategy uses an estimate of
a desirable price to set the price in each time step.
It uses the revenue per customer (RPC) criterion (which
considers all customers, even the ones that did not buy
from the firm) to assess the revenue obtained when using
a particular price and compares that to the RPC needed to
finish the inventory by the end of the selling horizon. The
algorithm for this strategy can be seen in Algorithm 2. The
additional parameters used in this strategy are expPrice,
the expected price used in the estimation of the expected
RPC, and maxDelta, the maximum amount the price can
change per time step. These variables are also used: RPC,
the revenue per customer in the previous time step, and
expRPC, the expected RPC, used as a control parameter.
Algorithm 2 RevenueBasedStrategy(initialPrice, expPrice,
maxDelta)
 1: if time = 0 then
 2:     price ← initialPrice
 3:     return price
 4: expRPC ← (numGoodsLeft × expPrice) / (timeLeft × aveCustomers)
 5: if numGoodsLeft = 0 then
 6:     return lastPrice
 7: if pastCustomers = 0 then
 8:     pastCustomers ← 1
 9: RPC ← pastSold × lastPrice / pastCustomers
10: α ← RPC / expRPC
11: if α ≤ 1 then
12:     ε ← α − 1
13: else
14:     ε ← 1 − 1/α
15: price ← lastPrice + ε × maxDelta
16: return price
In this algorithm, the variable expRPC (defined in line 4)
is the expected revenue per customer for the rest of the
selling horizon, provided that the firm sells at the expected
price expPrice, which is one of the input parameters.
The variable RPC is the revenue obtained per
customer in the previous time step (regardless of whether
they made a purchase or not). In this algorithm, α is defined
as the ratio between the RPC from the previous time step
and the expected RPC (line 10). Here, also, ε is a
normalization of α, positive if the obtained revenue is higher
than expected and negative if it is lower (line 11), and the
price changes in proportion to ε. Note that here a lower
than expected revenue is attributed to a price that is too high,
thus preventing many customers from making a purchase.
This is not always the case, but it can be a safe
assumption when the initial price and expected price are
chosen properly, as we can see from the experimental results.
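A minimal Python sketch of the Revenue Based rule is given below, following the steps of Algorithm 2. The function name, argument names, and the name eps for the normalized deviation are assumptions made for this illustration. One deliberate deviation from the pseudocode is labeled in the comments: the empty-inventory check is done before computing the expected RPC, which gives the same result but avoids an unnecessary division. The sketch also assumes time_left and ave_customers are positive.

```python
def revenue_based_price(*, time, last_price, past_sold, past_customers,
                        ave_customers, time_left, num_goods_left,
                        initial_price, exp_price, max_delta):
    """Return the price for the current time step (sketch of Algorithm 2)."""
    if time == 0:
        return initial_price
    # Checked before computing exp_rpc (the pseudocode checks it after);
    # the returned price is the same either way.
    if num_goods_left == 0:
        return last_price

    # Revenue per customer needed to clear the inventory at exp_price.
    exp_rpc = num_goods_left * exp_price / (time_left * ave_customers)

    if past_customers == 0:
        past_customers = 1  # avoid division by zero, as in the pseudocode

    # Revenue per customer actually obtained in the previous step,
    # counting customers who did not buy.
    rpc = past_sold * last_price / past_customers
    alpha = rpc / exp_rpc

    # Signed, normalized deviation in [-1, 1).
    eps = alpha - 1.0 if alpha <= 1 else 1.0 - 1.0 / alpha

    # Additive update, bounded in magnitude by max_delta.
    return last_price + eps * max_delta


if __name__ == "__main__":
    # Hypothetical numbers: rpc = 2 * 100 / 10 = 20,
    # exp_rpc = 40 * 100 / (10 * 10) = 40, alpha = 0.5, eps = -0.5,
    # so the price moves from 100 to 100 - 0.5 * 10 = 95.
    print(revenue_based_price(
        time=5, last_price=100.0, past_sold=2, past_customers=10,
        ave_customers=10, time_left=10, num_goods_left=40,
        initial_price=100.0, exp_price=100.0, max_delta=10.0))
```

Unlike the Inventory Based sketch, the update here is additive rather than multiplicative and has no no-change band, so the price moves in every step unless the obtained RPC matches the expected RPC exactly.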
Note that both strategies depend only on information from
the sales in the single previous time step. Also, the parameters
are designed so as to sustain a certain amount of stability in
the price. This serves both to limit the effect of
stochastic noise, which could otherwise cause sudden jumps
in the price (especially since the changes depend on only
one previous time step), and to avoid excessive price
fluctuation, which is undesirable from the customers'
perspective. Also, in all strategies reported
in this work, the previous price is kept if there are no
more goods to be sold. This has no effect on the simulation
because customers will not be given the option to buy from
firms that have no goods left.
C. Computing the Parameters
We want to have settings for the parameters of the strategies
that we have defined such that the strategies perform
well. The numerical optimization task associated with this is
generally not easy because it is the outcome of a non-trivial
simulation that we want to optimize. The problem at hand
can thus be seen as a black-box optimization problem with
unknown difficulty. We therefore need black-box optimization
algorithms that are capable of tackling a large class of
problems effectively. The algorithm of our choice is called
AMaLGaM. AMaLGaM is essentially an Evolutionary Algorithm
(EA) in which a normal distribution is estimated
from the better, selected solutions and subsequently adapted
to be aligned favorably with the local structure of the search
space. New solutions are then constructed by sampling the
normal distribution. A parameter-free version of AMaLGaM
exists that can easily be applied to solve any optimization
problem. This version was recently found to be among the
most competent black-box optimization algorithms [3], [12].
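The estimate-select-sample loop described above can be illustrated with a much simpler algorithm. The sketch below is not AMaLGaM: it is a plain estimation-of-distribution loop with an independent normal distribution per parameter, and it omits the adaptation steps that distinguish AMaLGaM. All names (gaussian_eda, pop_size, n_select, and so on) and the toy quadratic fitness are assumptions made for illustration; in the setting of this paper, the fitness would be the revenue returned by the market simulation.

```python
import random
import statistics


def gaussian_eda(fitness, mean, stdev, pop_size=50, n_select=15,
                 generations=40, seed=0):
    """Maximize `fitness` with a simple diagonal-Gaussian EDA.

    Each generation: sample a population from the current normal
    distribution, keep the best n_select solutions, and re-estimate
    the per-dimension mean and standard deviation from them.
    """
    rng = random.Random(seed)
    mean, stdev = list(mean), list(stdev)
    best, best_fit = list(mean), fitness(mean)
    for _ in range(generations):
        population = [[rng.gauss(m, s) for m, s in zip(mean, stdev)]
                      for _ in range(pop_size)]
        scored = sorted(((fitness(x), x) for x in population),
                        key=lambda pair: pair[0], reverse=True)
        if scored[0][0] > best_fit:
            best_fit, best = scored[0]
        selected = [x for _, x in scored[:n_select]]
        for d in range(len(mean)):
            column = [x[d] for x in selected]
            mean[d] = statistics.fmean(column)
            # Small floor keeps sampling valid if the selection collapses.
            stdev[d] = max(statistics.pstdev(column), 1e-12)
    return best, best_fit


def toy_fitness(x):
    """Hypothetical stand-in for the simulation: peak at (3, -1)."""
    return -((x[0] - 3.0) ** 2) - ((x[1] + 1.0) ** 2)


if __name__ == "__main__":
    solution, value = gaussian_eda(toy_fitness, mean=[0.0, 0.0],
                                   stdev=[3.0, 3.0])
    print(solution, value)
```

A loop this simple can shrink its variance too early and stall before reaching the optimum, which is one reason to prefer an algorithm that adapts the estimated distribution, as the text describes for AMaLGaM.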
The EA is not designed to handle stochastic fitness values,
and at the same time we do not want to over-fit a single
instance of the problem. To address both concerns,
we use the following method. The fitness used in the EA
is the average revenue obtained from a batch run of 100

References

Book
17 Jun 2004
Abstract: Quantity-Based RM.- Single-Resource Capacity Control.- Network Capacity Control.- Overbooking.- Price-based RM.- Dynamic Pricing.- Auctions.- Common Elements.- Customer-Behavior and Market-Response Models.- The Economics of RM.- Estimation and Forecasting.- Industry Profiles.- Implementation.

1,974 citations


Journal ArticleDOI
Abstract: In many industries, managers face the problem of selling a given stock of items by a deadline. We investigate the problem of dynamically pricing such inventories when demand is price sensitive and stochastic and the firm's objective is to maximize expected revenues. Examples that fit this framework include retailers selling fashion and seasonal goods and the travel and leisure industry, which markets space such as seats on airline flights, cabins on vacation cruises, and rooms in hotels that become worthless if not sold by a specific time. We formulate this problem using intensity control and obtain structural monotonicity results for the optimal intensity (resp., price) as a function of the stock level and the length of the horizon. For a particular exponential family of demand functions, we find the optimal pricing policy in closed form. For general demand functions, we find an upper bound on the expected revenue based on analyzing the deterministic version of the problem and use this bound to prove that simple, fixed price policies are asymptotically optimal as the volume of expected sales tends to infinity. Finally, we extend our results to the case where demand is compound Poisson; only a finite number of prices is allowed; the demand rate is time varying; holding costs are incurred and cash flows are discounted; the initial stock is a decision variable; and reordering, overbooking, and random cancellations are allowed.

1,427 citations


Journal ArticleDOI
Abstract: The benefits of dynamic pricing methods have long been known in industries, such as airlines, hotels, and electric utilities, where the capacity is fixed in the short-term and perishable. In recent years, there has been an increasing adoption of dynamic pricing policies in retail and other industries, where the sellers have the ability to store inventory. Three factors contributed to this phenomenon: (1) the increased availability of demand data, (2) the ease of changing prices due to new technologies, and (3) the availability of decision-support tools for analyzing demand data and for dynamic pricing. This paper constitutes a review of the literature and current practices in dynamic pricing. Given its applicability in most markets and its increasing adoption in practice, our focus is on dynamic (intertemporal) pricing in the presence of inventory considerations.

1,007 citations


Proceedings ArticleDOI
07 Jul 2010
TL;DR: Results of the BBOB-2009 benchmarking of 31 search algorithms on 24 noiseless functions in a black-box optimization scenario in continuous domain are presented and the choice of the best algorithm depends remarkably on the available budget of function evaluations.
Abstract: This paper presents results of the BBOB-2009 benchmarking of 31 search algorithms on 24 noiseless functions in a black-box optimization scenario in continuous domain. The runtime of the algorithms, measured in number of function evaluations, is investigated and a connection between a single convergence graph and the runtime distribution is uncovered. Performance is investigated for different dimensions up to 40-D, for different target precision values, and in different subgroups of functions. Searching in larger dimension and multi-modal functions appears to be more difficult. The choice of the best algorithm also depends remarkably on the available budget of function evaluations.

359 citations


Journal ArticleDOI
Abstract: We consider a single-product revenue management problem where, given an initial inventory, the objective is to dynamically adjust prices over a finite sales horizon to maximize expected revenues. Realized demand is observed over time, but the underlying functional relationship between price and mean demand rate that governs these observations (otherwise known as the demand function or demand curve) is not known. We consider two instances of this problem: (i) a setting where the demand function is assumed to belong to a known parametric family with unknown parameter values; and (ii) a setting where the demand function is assumed to belong to a broad class of functions that need not admit any parametric representation. In each case we develop policies that learn the demand function “on the fly,” and optimize prices based on that. The performance of these algorithms is measured in terms of the regret: the revenue loss relative to the maximal revenues that can be extracted when the demand function is known prior to the start of the selling season. We derive lower bounds on the regret that hold for any admissible pricing policy, and then show that our proposed algorithms achieve a regret that is “close” to this lower bound. The magnitude of the regret can be interpreted as the economic value of prior knowledge on the demand function, manifested as the revenue loss due to model uncertainty.

315 citations