Journal ArticleDOI

A comparison of variable-ratio and variable-interval schedules of reinforcement.

01 May 1970 - Journal of the Experimental Analysis of Behavior (Society for the Experimental Analysis of Behavior) - Vol. 13, Iss. 3, pp. 369-374
TL;DR: Four pigeons responded under a two-component multiple schedule of reinforcement; when rates of reinforcement were equal in the two components, the rate of response in the variable-ratio component was nearly twice that in the variable-interval component.
Abstract: Four pigeons responded under a two-component multiple schedule of reinforcement. Responses were reinforced in one component under a variable-ratio schedule and in the other component under a variable-interval schedule. It was found that when rates of reinforcement were equal in the two components, the rate of response in the variable-ratio component was nearly twice that in the variable-interval component. Furthermore, for three of the four subjects, the function relating response rate to relative rate of reinforcement in the variable-ratio component had a slope 2.5 to 3 times the slope of the corresponding function for the variable-interval component.
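The molar difference between the two schedules can be sketched in a small simulation (illustrative Python, not from the paper; the session length, interresponse times, and schedule parameters are assumptions): on a variable-interval schedule the obtained reinforcement rate is capped by the programmed interval, while on a variable-ratio schedule it grows with the response rate.

```python
import random

def vi_reinforcers(irts, mean_interval):
    """Variable interval: a reinforcer is set up after an exponentially
    distributed delay, and the next response collects it."""
    t, earned = 0.0, 0
    setup = random.expovariate(1 / mean_interval)
    for irt in irts:
        t += irt
        if t >= setup:
            earned += 1
            setup = t + random.expovariate(1 / mean_interval)
    return earned

def vr_reinforcers(n_responses, ratio):
    """Variable ratio: each response is reinforced with probability 1/ratio."""
    return sum(random.random() < 1 / ratio for _ in range(n_responses))

random.seed(0)
# A 1-hr session at 1 response/s vs. 0.5 responses/s on VI 60-s:
# both earn close to the programmed ~60 reinforcers.
fast_vi = vi_reinforcers([1.0] * 3600, 60)
slow_vi = vi_reinforcers([2.0] * 1800, 60)
# On VR 60, halving the response count roughly halves the reinforcers.
fast_vr = vr_reinforcers(3600, 60)
slow_vr = vr_reinforcers(1800, 60)
```

Doubling the response rate roughly doubles reinforcement on the ratio schedule but leaves it nearly unchanged on the interval schedule, which is one standard account of why ratio schedules sustain faster responding.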


Citations
Journal ArticleDOI
TL;DR: The extension of reinforcement learning models to free-operant tasks unites psychologically and computationally inspired ideas about the role of tonic dopamine in striatum, explaining from a normative point of view why higher levels of dopamine might be associated with more vigorous responding.
Abstract: Rationale: Dopamine neurotransmission has long been known to exert a powerful influence over the vigor, strength, or rate of responding. However, there exists no clear understanding of the computational foundation for this effect; predominant accounts of dopamine's computational function focus on a role for phasic dopamine in controlling the discrete selection between different actions and have nothing to say about response vigor or indeed the free-operant tasks in which it is typically measured. Objectives: We seek to accommodate free-operant behavioral tasks within the realm of models of optimal control and thereby capture how dopaminergic and motivational manipulations affect response vigor. Methods: We construct an average reward reinforcement learning model in which subjects choose both which action to perform and also the latency with which to perform it. Optimal control balances the costs of acting quickly against the benefits of getting reward earlier and thereby chooses a best response latency. Results: In this framework, the long-run average rate of reward plays a key role as an opportunity cost and mediates motivational influences on rates and vigor of responding. We review evidence suggesting that the average reward rate is reported by tonic levels of dopamine putatively in the nucleus accumbens. Conclusions: Our extension of reinforcement learning models to free-operant tasks unites psychologically and computationally inspired ideas about the role of tonic dopamine in striatum, explaining from a normative point of view why higher levels of dopamine might be associated with more vigorous responding.
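The opportunity-cost argument above can be written down in a few lines. Assuming, as a simplification of the model described in the abstract, that emitting a response with latency tau incurs a vigor cost C/tau while every unit of delay forgoes the average reward rate Rbar, the best latency minimizes C/tau + Rbar*tau (variable names here are illustrative, not the authors' notation):

```python
import math

def optimal_latency(vigor_cost, avg_reward_rate):
    """Minimize C/tau + Rbar*tau over tau > 0.
    Setting the derivative -C/tau**2 + Rbar to zero gives tau* = sqrt(C/Rbar)."""
    return math.sqrt(vigor_cost / avg_reward_rate)

# A richer environment (higher average reward rate, the quantity tonic
# dopamine is proposed to report) prescribes shorter latencies, i.e. vigor.
lazy = optimal_latency(1.0, 0.25)     # tau* = 2.0
vigorous = optimal_latency(1.0, 4.0)  # tau* = 0.5
```

The key qualitative prediction is visible directly: raising the average reward rate shrinks the optimal latency, so a generally rewarding context should speed all responding.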

1,007 citations


Cites background or result from "A comparison of variable-ratio and ..."

  • Other characteristics such as faster responding on ratio schedules compared with yoked interval schedules (Zuriff 1970; Catania et al. 1977; Dawson and Dickinson 1990) are also reproduced by the model (Niv et al. 2005a).

  • …in slower response rates the higher the interval or ratio schedule (Herrnstein 1970; Barrett and Stanley 1980; Mazur 1983; Killeen 1995; Foster et al. 1997) and faster responding on ratio schedules compared with yoked interval schedules (Zuriff 1970; Catania et al. 1977; Dawson and Dickinson 1990).
Journal ArticleDOI
TL;DR: College students' presses on a telegraph key were occasionally reinforced by light onsets; in the presence of the light, button presses (consummatory responses) produced points later exchangeable for money. The results confirm that human responding is maximally sensitive to schedule contingencies when instructions are minimized and the reinforcer requires a consummatory response.
Abstract: College students' presses on a telegraph key were occasionally reinforced by light onsets in the presence of which button presses (consummatory responses) produced points later exchangeable for money. One student's key presses were reinforced according to a variable-ratio schedule; key presses of another student in a separate room were reinforced according to a variable-interval schedule yoked to the interreinforcement intervals produced by the first student. Instructions described the operation of the reinforcement button, but did not mention the telegraph key; instead, key pressing was established by shaping. Performances were comparable to those of infrahuman organisms: variable-ratio key-pressing rates were higher than yoked variable-interval rates. With some yoked pairs, schedule effects occurred so rapidly that rate reversals produced by schedule reversals were demonstrable within one session. But sensitivity to these contingencies was not reliably obtained with other pairs for whom an experimenter demonstrated key pressing or for whom the reinforcer included automatic point deliveries instead of points produced by button presses. A second experiment with uninstructed responding demonstrated sensitivity to fixed-interval contingencies. These findings clarify prior failures to demonstrate human sensitivity to schedule contingencies: human responding is maximally sensitive to these contingencies when instructions are minimized and the reinforcer requires a consummatory response.
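The yoking manipulation at the center of this design is easy to sketch (illustrative Python; the function, session parameters, and 0.5-s interresponse time are assumptions, not the authors' procedure): the VR subject's obtained inter-reinforcement intervals become the programmed intervals of the partner's VI schedule, equating the reinforcement distributions while decoupling the response requirement.

```python
import random

def yoked_intervals(irts, ratio, rng):
    """Run a VR subject through a session; record the elapsed time between
    its successive reinforcers. These intervals then program the yoked VI."""
    t, last, intervals = 0.0, 0.0, []
    for irt in irts:
        t += irt
        if rng.random() < 1 / ratio:  # VR: reinforce with probability 1/ratio
            intervals.append(t - last)
            last = t
    return intervals

rng = random.Random(42)
# 1-hr session at 1 response/s on VR 60 yields roughly 60 intervals.
intervals = yoked_intervals([1.0] * 3600, 60, rng)
```

The yoked subject then works on a VI whose interval sequence is `intervals`; any remaining rate difference within the pair reflects the response requirement rather than the reinforcement distribution.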

369 citations

Journal ArticleDOI
TL;DR: In this article, the authors use the slopes of learning curves to infer which definition of the response comes closest to the organism's own. The resulting exponentially weighted moving average provides a model of memory that grounds a quantitative theory of reinforcement, in which incentives excite behavior and focus that excitement on responses that are contemporaneous in memory.
Abstract: Effective conditioning requires a correlation between the experimenter's definition of a response and an organism's, but an animal's perception of its behavior differs from ours. These experiments explore various definitions of the response, using the slopes of learning curves to infer which comes closest to the organism's definition. The resulting exponentially weighted moving average provides a model of memory that is used to ground a quantitative theory of reinforcement. The theory assumes that incentives excite behavior and focus the excitement on responses that are contemporaneous in memory. The correlation between the organism's memory and the behavior measured by the experimenter is given by coupling coefficients, which are derived for various schedules of reinforcement. The coupling coefficients for simple schedules may be concatenated to predict the effects of complex schedules. The coefficients are inserted into a generic model of arousal and temporal constraint to predict response rates under any scheduling arrangement. The theory posits a response-indexed decay of memory, not a time-indexed one. It requires that incentives displace memory for the responses that occur before them, and may truncate the representation of the response that brings them about. As a contiguity-weighted correlation model, it bridges opposing views of the reinforcement process. By placing the short-term memory of behavior in so central a role, it provides a behavioral account of a key cognitive process.
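The response-indexed memory at the heart of this theory is an exponentially weighted moving average over the response stream. A minimal sketch (illustrative Python; the decay parameter beta and the 0/1 coding of target vs. other behavior are assumptions made for the example):

```python
def memory_trace(responses, beta=0.5):
    """Response-indexed EWMA: each emitted response x (1 = target response,
    0 = other behavior) updates memory as m <- (1 - beta) * m + beta * x.
    Decay is counted in responses, not in seconds."""
    m, trace = 0.0, []
    for x in responses:
        m = (1 - beta) * m + beta * x
        trace.append(m)
    return trace

# Three target responses in a row: memory of the target approaches 1,
# weighting the most recent response most heavily.
trace = memory_trace([1, 1, 1])   # [0.5, 0.75, 0.875]
```

In the theory's terms, the value of such a trace at the moment a reinforcer arrives plays the role of the coupling between reinforcement and the measured response.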

250 citations

Journal ArticleDOI
TL;DR: The interaction between instrumental behavior and environment can be conveniently described at a molar level as a feedback system; two possible theories, the matching law and optimization, differ primarily in the reference criterion they suggest for the system.
Abstract: The interaction between instrumental behavior and environment can be conveniently described at a molar level as a feedback system. Two different possible theories, the matching law and optimization, differ primarily in the reference criterion they suggest for the system. Both offer accounts of most of the known phenomena of performance on concurrent and single variable-interval and variable-ratio schedules. The matching law appears stronger in describing concurrent performances, whereas optimization appears stronger in describing performance on single schedules.
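The molar feedback functions that drive this comparison are simple to state. A sketch (illustrative Python; the hyperbolic VI form is one common approximation, not this article's own notation):

```python
def vr_feedback(b, n):
    """Ratio feedback: reinforcement rate is proportional to response rate b."""
    return b / n

def vi_feedback(b, t):
    """A common hyperbolic approximation to the VI feedback function: with
    mean programmed interval t, the obtained rate 1 / (t + 1/b) saturates
    at the programmed rate 1/t as response rate b grows."""
    return 1.0 / (t + 1.0 / b)

# Doubling response rate (60 -> 120 per min) on VR 30 adds 2 rft/min...
gain_vr = vr_feedback(120, 30) - vr_feedback(60, 30)
# ...but on VI 1-min it adds less than 0.01 rft/min.
gain_vi = vi_feedback(120, 1.0) - vi_feedback(60, 1.0)
```

On the ratio schedule the environment returns more for faster responding; on the interval schedule it does not. That structural difference is what both the matching and optimization accounts must explain.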

239 citations

Journal ArticleDOI
TL;DR: Four pigeons on concurrent variable-interval, variable-ratio schedules approximated the matching relationship, with biases toward the variable interval when time spent responding was the measure of behavior and toward the variable ratio when frequency of pecking was the measure of behavior.
Abstract: Four pigeons on concurrent variable-interval, variable-ratio schedules approximated the matching relationship, with biases toward the variable interval when time spent responding was the measure of behavior and toward the variable ratio when frequency of pecking was the measure of behavior. The local rates of responding were consistently higher on the variable ratio, even when there was overall preference for the variable interval. Matching on concurrent variable-interval, variable-ratio schedules was shown to be incompatible with maximization of total reinforcement, given the observed local rates of responding and rates of alternation between the schedules. Furthermore, it was shown that the subjects were losing reinforcements at a rate of about 60 per hour by matching rather than maximizing.
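The matching-versus-maximizing comparison can be reproduced numerically. A sketch under assumed parameters (illustrative Python; the overall response rate, VI interval, and VR ratio are arbitrary choices, and the hyperbolic VI feedback function is an approximation rather than the authors' analysis):

```python
def rft_rates(p, b=60.0, t=1.0, n=30.0):
    """Reinforcers per minute from each component when a fraction p of a
    fixed overall response rate b (per minute) goes to the VI (mean
    interval t minutes) and the rest to the VR (ratio n)."""
    vi = 1.0 / (t + 1.0 / (p * b)) if p > 0 else 0.0
    vr = (1.0 - p) * b / n
    return vi, vr

# Maximizing: grid-search the allocation with the highest total rate.
max_p = max((i / 1000 for i in range(1, 1000)),
            key=lambda p: sum(rft_rates(p)))

# Matching: iterate to the allocation where relative responding to the VI
# equals its relative reinforcement rate, p = vi / (vi + vr).
match_p = 0.5
for _ in range(100):
    vi, vr = rft_rates(match_p)
    match_p = vi / (vi + vr)

lost_per_hour = 60 * (sum(rft_rates(max_p)) - sum(rft_rates(match_p)))
```

With these parameters the maximizing allocation puts only a small fraction of responding on the VI, while matching splits responding almost evenly and forgoes reinforcers at a rate of a few dozen per hour, the same order of magnitude as the roughly 60 per hour reported above.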

232 citations

References
Journal ArticleDOI
TL;DR: The present experiment studies strength of response in pigeons on a concurrent schedule under which they peck at either of two response-keys, investigating output as a function of frequency of reinforcement.
Abstract: A previous paper (Herrnstein, 1958) reported how pigeons behave on a concurrent schedule under which they peck at either of two response-keys. The significant finding of this investigation was that the relative frequency of responding to each of the keys may be controlled within narrow limits by adjustments in an independent variable. In brief, the requirement for reinforcement in this procedure is the emission of a minimum number of pecks to each of the keys. The pigeon receives food when it completes the requirement on both keys. The frequency of responding to each key was a close approximation to the minimum requirement. The present experiment explores the relative frequency of responding further. In the earlier study it was shown that the output of behavior to each of two keys may be controlled by specific requirements of outputs. Now we are investigating output as a function of frequency of reinforcement. The earlier experiment may be considered a study of differential reinforcement; the present one, a study of strength of response. Both experiments are attempts to elucidate the properties of relative frequency of responding as a dependent variable.
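The relation this line of work established, strict matching of relative response rate to relative reinforcement rate, is compact enough to state directly (a sketch in Python; the variable names are mine):

```python
def matching_relative_rate(r1, r2):
    """Herrnstein's matching relation: the proportion of responses on key 1,
    B1 / (B1 + B2), equals the proportion of reinforcement, r1 / (r1 + r2)."""
    return r1 / (r1 + r2)

# A key delivering 40 rft/hr vs. one delivering 20 rft/hr should draw
# two-thirds of the pecks.
share = matching_relative_rate(40, 20)
```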

2,220 citations

Journal ArticleDOI
TL;DR: Four pigeons were trained to peck at either of two response-keys, and the relative rate at which each pigeon pecked to obtain a secondary reinforcer equalled the relative rate of primary reinforcement in its presence.
Abstract: Four pigeons were trained to peck at either of two response-keys. Pecking at either key occasionally produced a secondary reinforcer. Then, in the presence of the secondary reinforcer, further pecking occasionally produced the primary reinforcer, food. The relative rate at which each pigeon pecked to obtain a secondary reinforcer equalled the relative rate of primary reinforcement in its presence.

311 citations

Journal ArticleDOI
TL;DR: Post-reinforcement pause was approximately equal for the yoked and ratio pigeons and was relatively insensitive to changes in the tandem requirement, but terminal response rate increased with increases in the tandem requirement, even though reinforcement rate was invariant.
Abstract: Two variables often confounded in fixed-ratio schedules are reinforcement frequency and response requirement. These variables were isolated by a technique that yoked the distributions of reinforcements in time for one group of pigeons to those of pigeons responding on various fixed-ratio schedules. The contingencies for the yoked birds were then manipulated by adding various tandem fixed-ratio requirements to their schedules. Post-reinforcement pause was approximately equal for the yoked and ratio pigeons, and was relatively insensitive to changes in the tandem requirement. Terminal response rate increased with increases in the tandem requirement, even though reinforcement rate was invariant. This increase was attributed to the progressive interference of the tandem requirement with the differential reinforcement of long interresponse times.

112 citations