scispace - formally typeset

Showing papers on "Reinforcement" published in 2016


Proceedings Article
10 Jun 2016
TL;DR: The authors proposed a model-free imitation learning algorithm that obtains significant performance gains over existing model-free methods in imitating complex behaviors in large, high-dimensional environments, and showed that a certain instantiation of their framework draws an analogy between imitation learning and generative adversarial networks.
Abstract: Consider learning a policy from example expert behavior, without interaction with the expert or access to a reinforcement signal. One approach is to recover the expert's cost function with inverse reinforcement learning, then extract a policy from that cost function with reinforcement learning. This approach is indirect and can be slow. We propose a new general framework for directly extracting a policy from data as if it were obtained by reinforcement learning following inverse reinforcement learning. We show that a certain instantiation of our framework draws an analogy between imitation learning and generative adversarial networks, from which we derive a model-free imitation learning algorithm that obtains significant performance gains over existing model-free methods in imitating complex behaviors in large, high-dimensional environments.

942 citations
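The adversarial analogy can be illustrated with a minimal sketch. It assumes a logistic-regression discriminator over hand-made state-action feature vectors (the names `train_discriminator` and `surrogate_reward` are hypothetical) and uses the convention that D(x) is the probability that x came from the expert, with surrogate reward -log(1 - D(x)). This is a common implementation convention, not necessarily the paper's exact parameterization, and the policy-optimization step that would consume these rewards is omitted.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_discriminator(expert, policy, epochs=200, lr=0.1):
    """Fit D(x) = P(x came from the expert) by logistic regression (SGD).

    expert, policy: lists of equal-length feature vectors, e.g. concatenated
    state-action features. Stands in for the neural discriminator used in practice.
    """
    dim = len(expert[0])
    w, b = [0.0] * dim, 0.0
    data = [(x, 1.0) for x in expert] + [(x, 0.0) for x in policy]
    for _ in range(epochs):
        for x, y in data:
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            g = p - y  # gradient of binary cross-entropy w.r.t. the logit
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b


def surrogate_reward(w, b, x):
    """Reward -log(1 - D(x)): near 0 for policy-like pairs, large for expert-like ones."""
    d = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    return -math.log(max(1.0 - d, 1e-12))


if __name__ == "__main__":
    rng = random.Random(0)
    expert = [[rng.gauss(1.0, 0.3), rng.gauss(1.0, 0.3)] for _ in range(50)]
    policy = [[rng.gauss(-1.0, 0.3), rng.gauss(-1.0, 0.3)] for _ in range(50)]
    w, b = train_discriminator(expert, policy)
    print(surrogate_reward(w, b, [1.0, 1.0]), surrogate_reward(w, b, [-1.0, -1.0]))
```

In a full adversarial loop the discriminator and the policy would be updated in alternation, so the reward signal keeps shifting as the policy's samples become harder to tell apart from the expert's.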


Journal ArticleDOI
TL;DR: In this article, a holistic review of current research on the mechanical properties of alkali-activated concretes is presented, including research on its compressive strength, tensile strength, elastic modulus, Poisson's ratio, stress-strain relationship under uniaxial compression, fracture properties, bond mechanism with steel reinforcement, dynamic mechanical properties, and high-temperature performance.

227 citations


Journal ArticleDOI
05 Oct 2016-Neuron
TL;DR: Using a probabilistic reinforcement learning task combined with reinforcement learning models and fMRI, the authors found that adolescents showed better reinforcement learning and a stronger link between reinforcement learning and episodic memory for rewarding outcomes.

168 citations
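Reinforcement learning models of probabilistic tasks are often built on a prediction-error (delta-rule) update combined with softmax choice. The sketch below assumes a generic task with independent reward probabilities per option and hypothetical parameters `alpha` (learning rate) and `beta` (inverse temperature); it is not the specific model fitted in the study.

```python
import math
import random


def softmax_choice(q, beta, rng):
    """Sample an option with probability proportional to exp(beta * Q)."""
    m = max(q)  # subtract the max for numerical stability
    weights = [math.exp(beta * (v - m)) for v in q]
    return rng.choices(range(len(q)), weights=weights)[0]


def simulate(reward_probs, alpha=0.3, beta=5.0, trials=200, seed=0):
    """Delta-rule learner on a probabilistic reward task.

    reward_probs: probability that each option pays 1 (otherwise 0).
    Returns the final option values and the trial-by-trial prediction errors.
    """
    rng = random.Random(seed)
    q = [0.5] * len(reward_probs)
    errors = []
    for _ in range(trials):
        choice = softmax_choice(q, beta, rng)
        reward = 1.0 if rng.random() < reward_probs[choice] else 0.0
        delta = reward - q[choice]  # reward prediction error
        q[choice] += alpha * delta  # move the value a fraction alpha toward the outcome
        errors.append(delta)
    return q, errors
```

With `reward_probs=[1.0, 0.0]` the value of the always-rewarded option ends above that of the never-rewarded one; with probabilistic outcomes the values only approximately track the reward probabilities because the fixed learning rate keeps them fluctuating.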


Journal ArticleDOI
01 Jan 2016-Brain
TL;DR: It is shown that affected individuals can learn using a reinforcement mechanism despite a deficit in error-based motor learning, and a critical feature of cerebellar patients’ movements (motor noise) determines the effectiveness of learning under reinforcement.
Abstract: Reinforcement and error-based processes are essential for motor learning, with the cerebellum thought to be required only for the error-based mechanism. Here we examined learning and retention of a reaching skill under both processes. Control subjects learned similarly from reinforcement and error-based feedback, but showed much better retention under reinforcement. To apply reinforcement to cerebellar patients, we developed a closed-loop reinforcement schedule in which task difficulty was controlled based on recent performance. This schedule produced substantial learning in cerebellar patients and controls. Cerebellar patients varied in their learning under reinforcement but fully retained what was learned. In contrast, they showed complete lack of retention in error-based learning. We developed a mechanistic model of the reinforcement task and found that learning depended on a balance between exploration variability and motor noise. While the cerebellar and control groups had similar exploration variability, the patients had greater motor noise and hence learned less. Our results suggest that cerebellar damage indirectly impairs reinforcement learning by increasing motor noise, but does not interfere with the reinforcement mechanism itself. Therefore, reinforcement can be used to learn and retain novel skills, but optimal reinforcement learning requires a balance between exploration variability and motor noise.

150 citations
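The balance between exploration variability and motor noise can be illustrated with a toy simulation. Everything here is an assumption for illustration, not the authors' model: the learner shifts its aim only by the exploratory component of a rewarded reach, motor noise is added on top but never credited, and the closed-loop schedule is approximated by rewarding any reach whose error beats the median of the last `window` errors. Raising `motor_sd` relative to `explore_sd` makes rewards less informative about which exploration helped, so the aim typically converges more slowly.

```python
import random
import statistics


def simulate_reaching(explore_sd, motor_sd, target=10.0, trials=400, window=10, seed=1):
    """Toy reinforcement-based reaching learner (hypothetical model).

    explore_sd: deliberate trial-to-trial variability the learner knows about.
    motor_sd: execution noise the learner cannot observe or credit.
    Returns the absolute distance between the final aim and the target.
    """
    rng = random.Random(seed)
    aim = 0.0
    recent = []  # errors on the last `window` reaches
    for _ in range(trials):
        explore = rng.gauss(0.0, explore_sd)
        executed = aim + explore + rng.gauss(0.0, motor_sd)
        error = abs(executed - target)
        # Closed-loop criterion: beat the median of recent errors.
        criterion = statistics.median(recent) if recent else float("inf")
        if error < criterion:
            aim += explore  # credit only the exploration, not the motor noise
        recent = (recent + [error])[-window:]
    return abs(aim - target)


if __name__ == "__main__":
    for motor_sd in (0.0, 1.0, 3.0):
        print(motor_sd, simulate_reaching(explore_sd=1.0, motor_sd=motor_sd))
```

Because the criterion tracks recent performance, the task never becomes trivially easy or impossibly hard, which loosely mirrors the purpose of the closed-loop schedule described in the abstract.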


Journal ArticleDOI
TL;DR: Results suggest that schedule-thinning procedures that use discriminative stimuli can maintain the effectiveness of FCT while they minimize the need for punishment or other supplemental procedures.
Abstract: Two principal goals of functional communication training (FCT) are (a) to eliminate destructive behavior and (b) to establish a more acceptable, yet functionally equivalent, communication response (FCR). A related and critically important goal is to thin the schedule of reinforcement for the FCR to levels that can be reasonably managed by caregivers. Researchers have described several approaches to thinning FCT reinforcement schedules. We summarize the results of 25 consecutive applications (among 20 cases) in which schedule-thinning procedures employed discriminative stimuli to signal when the FCR would and would not produce reinforcement (i.e., using multiple schedules, response restriction, or chained schedules). Results suggest that schedule-thinning procedures that use discriminative stimuli can maintain the effectiveness of FCT while they minimize the need for punishment or other supplemental procedures.

133 citations


Journal ArticleDOI
TL;DR: In this article, an experimental investigation was conducted to collect fundamental data and to better understand the effect of steel reinforcement distribution on the dynamic response of reinforced concrete plates, in which five high-strength concrete (HSC) plates were tested using a free-fall low-velocity impact technique.

89 citations


Journal ArticleDOI
TL;DR: In this article, the authors investigated the bond-slip behavior of steel reinforcement embedded in engineered cementitious composites (ECC), a ductile concrete exhibiting tensile strain hardening performance.

80 citations


Journal ArticleDOI
TL;DR: These findings are inconsistent with the momentum-based model of resurgence, suggesting that the model mischaracterizes the effects of reinforcer rates on persistence and resurgence of operant behavior.
Abstract: The behavioral-momentum model of resurgence predicts that reinforcer rates within a resurgence preparation should have three effects on target behavior. First, higher reinforcer rates in baseline (Phase 1) produce more persistent target behavior during extinction plus alternative reinforcement. Second, higher-rate alternative reinforcement during Phase 2 generates greater disruption of target responding during extinction. Finally, higher rates of either reinforcement source should produce greater responding when alternative reinforcement is suspended in Phase 3. Recent empirical reports have produced mixed results in terms of these predictions. Thus, the present experiment further examined reinforcer-rate effects on persistence and resurgence. Rats pressed target levers for high-rate or low-rate variable-interval food during Phase 1. In Phase 2, target-lever pressing was extinguished, an alternative nose-poke became available, and nose-poking produced either high-rate variable-interval, low-rate variable-interval, or no (an extinction control) alternative reinforcement. Alternative reinforcement was suspended in Phase 3. For groups that received no alternative reinforcement, target-lever pressing was less persistent following high-rate than low-rate Phase-1 reinforcement. Target behavior was more persistent with low-rate alternative reinforcement than with high-rate alternative reinforcement or extinction alone. Finally, no differences in Phase-3 responding were observed for groups that received either high-rate or low-rate alternative reinforcement, and resurgence occurred only following high-rate alternative reinforcement. These findings are inconsistent with the momentum-based model of resurgence. We conclude that this model mischaracterizes the effects of reinforcer rates on persistence and resurgence of operant behavior.

78 citations
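The variable-interval (VI) schedules used in Phases 1 and 2 can be sketched as follows, assuming exponentially distributed intervals and a timer that restarts when a reinforcer is delivered; the class, helper, and parameter names are hypothetical. A higher-rate VI schedule simply has a shorter mean interval.

```python
import random


class VariableInterval:
    """Variable-interval (VI) schedule: the first response made after a
    randomly drawn interval has elapsed produces a reinforcer.

    Intervals are drawn from an exponential distribution with mean `mean_s`
    (an assumption; laboratory VI schedules are often built from
    Fleshler-Hoffman interval lists instead).
    """

    def __init__(self, mean_s, seed=0):
        self.mean_s = mean_s
        self.rng = random.Random(seed)
        self.available_at = self._next_interval(0.0)

    def _next_interval(self, now):
        return now + self.rng.expovariate(1.0 / self.mean_s)

    def respond(self, t):
        """Return True if a response at time t (seconds) is reinforced."""
        if t >= self.available_at:
            self.available_at = self._next_interval(t)  # timer restarts at delivery
            return True
        return False


def count_reinforcers(schedule, session_s, response_every_s=1.0):
    """Count reinforcers earned by a subject responding at a steady rate."""
    earned = 0
    t = response_every_s
    while t <= session_s:
        earned += schedule.respond(t)
        t += response_every_s
    return earned
```

For a 1-hour session at one response per second, `count_reinforcers(VariableInterval(30.0), 3600.0)` should typically return roughly four times as many reinforcers as a VI 120-s schedule, since with frequent responding the reinforcer rate is set mainly by the mean interval rather than by the response rate.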


Journal ArticleDOI
TL;DR: In this article, the authors present the mechanical properties and durability of different types of fiber-reinforced polymer (FRP) rebars and their use in the construction of bridges, and show that reinforcement corrosion initiated by chlorides is the main cause of the loss of serviceability of bridge structures.

73 citations


Journal ArticleDOI
01 May 2016-Pain
TL;DR: This operant learning task might provide a valid paradigm to study pain-related avoidance behavior in future studies, and avoidance behavior was operationalized as the maximal distance from the shortest trajectory.
Abstract: Ample empirical evidence endorses the role of associative learning in pain-related fear acquisition. Nevertheless, research typically focused on self-reported and psychophysiological measures of fear. Avoidance, which is overt behavior preventing the occurrence of an aversive (painful) stimulus, has been largely neglected so far. Therefore, we aimed to fill this gap and developed an operant conditioning procedure for pain-related avoidance behavior. Participants moved their arm to a target location using the HapticMaster (FCS Robotics; Moog Inc, East Aurora, New York), a 3 degrees-of-freedom, force-controlled robotic arm. Three movement trajectories led to the target location. If participants in the Experimental Group took the shortest/easiest trajectory, they always received a painful stimulus (T1 = 100% reinforcement; no resistance). If they deviated from this trajectory, the painful stimulus could be partly or totally prevented (T2 = 50% reinforcement; T3 = 0% reinforcement), but more effort was needed (T2 = moderate resistance and deviation; T3 = strongest resistance and largest deviation). The Yoked Group received the same reinforcement schedule irrespective of their own behavior. During the subsequent extinction phase, no painful stimuli were delivered. Self-reported pain-expectancy and pain-related fear were assessed, and avoidance behavior was operationalized as the maximal distance from the shortest trajectory. During acquisition, the Experimental Group reported more pain-related fear and pain-expectancy to T1 vs T2 vs T3 and deviated more from the shortest trajectory than the Yoked Group. During subsequent extinction, avoidance behavior, self-reported fear, and pain-expectancy decreased significantly, but conditioned differences persisted despite the absence of painful stimuli. To conclude, this operant learning task might provide a valid paradigm to study pain-related avoidance behavior in future studies.

61 citations
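The avoidance measure, maximal distance from the shortest trajectory, can be computed from sampled arm positions. This sketch assumes 2-D position samples and treats the shortest trajectory as the straight segment from the start point to the target, which is a simplification of the HapticMaster setup; the function names are hypothetical.

```python
import math


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment a-b (2-D tuples)."""
    ax, ay = a
    bx, by = b
    px, py = p
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the line, then clamp the projection to the segment's end points.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length_sq))
    cx, cy = ax + t * dx, ay + t * dy
    return math.hypot(px - cx, py - cy)


def max_deviation(trajectory, start, target):
    """Avoidance index: largest distance of any sample from the start-target path."""
    return max(point_segment_distance(p, start, target) for p in trajectory)


if __name__ == "__main__":
    straight = [(0, 0), (5, 0), (10, 0)]
    detour = [(0, 0), (5, 3), (10, 0)]
    print(max_deviation(straight, (0, 0), (10, 0)), max_deviation(detour, (0, 0), (10, 0)))
```

For example, a path that bulges 3 units away from a straight 10-unit reach scores 3.0, and a straight reach scores 0.0. Clamping matters when a movement overshoots past the start or target, where the nearest point of the reference path is an end point rather than a perpendicular foot.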


Journal ArticleDOI
TL;DR: In this article, the impact of load cycling on the deformation capacity of reinforced high-performance fiber-reinforced cement-based composite (HPFRCC) flexural members subject to three-point and four-point bending was investigated.
Abstract: High-performance fiber-reinforced cement-based composites (HPFRCCs) reinforced with mild steel have been proposed for use in structural elements to enhance component strength and ductility, increase damage tolerance, and reduce reinforcement congestion. Recent research has shown that HPFRCCs have a high resistance to splitting cracks, which causes reinforcement strains to concentrate when a dominant tensile crack forms, leading to early reinforcement strain hardening and reinforcement fracture. This paper presents the impact of longitudinal reinforcement ratio, ranging from 0.54 to 2.0%, and the influence of monotonic and cyclic loading histories on the deformation capacity of reinforced HPFRCC flexural members subject to three-point and four-point bending. Experimental results show that load cycling can decrease deformation capacity of flexural members by up to 67% when compared to monotonic deformation capacity. The impact of load cycling on deformation capacity is shown to be strongly affected ...

Journal ArticleDOI
TL;DR: The hypothesis that pigeons prefer the alternative with the conditioned reinforcer that best predicts reinforcement, whereas its frequency may be relatively unimportant is supported.
Abstract: Pigeons have sometimes shown a preference for a signaled 50% reinforcement alternative (leading half of the time to a stimulus that signaled 100% reinforcement and otherwise to a stimulus that signaled 0% reinforcement) over a 100% reinforcement alternative. We hypothesized that pigeons may actually be indifferent between the 2 alternatives with previous inconsistent preferences resulting in part from an artifact of the use of a spatial discrimination. In the present experiments, we tested the hypothesis that pigeons would be indifferent between alternatives that provide conditioned reinforcers of equal value. In Experiment 1, we used the signaled 50% reinforcement versus 100% reinforcement procedure, but cued the alternatives with shapes that varied in their spatial location from trial to trial. Consistent with the stimulus value hypothesis, the pigeons showed indifference between the alternatives. In Experiment 2, to confirm that the pigeons could discriminate between the shapes, we removed the discriminative function from the 50% reinforcement alternative and found a clear preference for the 100% reinforcement alternative. Finally, in Experiment 3, when we returned the discriminative function to the 50% reinforcement alternative and reduced the 100% reinforcement alternative to 50% reinforcement, we found a clear preference for the discriminative stimulus alternative. These results support the hypothesis that pigeons prefer the alternative with the conditioned reinforcer that best predicts reinforcement, whereas its frequency may be relatively unimportant.

Journal ArticleDOI
TL;DR: In this paper, the number of reinforcement layers is found to correlate with the flexural stiffness, crack width, and crack spacing, which were determined experimentally in beams with different numbers of reinforcement layers.

Journal ArticleDOI
TL;DR: It is suggested that organisms may be seen as natural stimulus-correlation detectors so that behavioral change affects the overall correlation and directs the organism toward currently appetitive goals and away from potential aversive goals.
Abstract: Reinforcers affect behavior. A fundamental assumption has been that reinforcers strengthen the behavior they follow, and that this strengthening may be context-specific (stimulus control). Less frequently discussed, but just as evident, is the observation that reinforcers have discriminative properties that also guide behavior. We review findings from recent research that approaches choice using nontraditional procedures, with a particular focus on how choice is affected by reinforcers, by time since reinforcers, and by recent sequences of reinforcers. We also discuss how conclusions about these results are impacted by the choice of measurement level and display. Clearly, reinforcers as traditionally considered are conditionally phylogenetically important to animals. However, their effects on behavior may be solely discriminative, and contingent reinforcers may not strengthen behavior. Rather, phylogenetically important stimuli constitute a part of a correlated compound stimulus context consisting of stimuli arising from the organism, from behavior, and from physiologically detected environmental stimuli. Thus, the three-term contingency may be seen, along with organismic state, as a correlation of stimuli. We suggest that organisms may be seen as natural stimulus-correlation detectors so that behavioral change affects the overall correlation and directs the organism toward currently appetitive goals and away from potential aversive goals. As a general conclusion, both historical and recent choice research supports the idea that stimulus control, not reinforcer control, may be fundamental.

Journal ArticleDOI
TL;DR: In this article, an experimental test program on full-scale half-joint beams was conducted to evaluate the contribution of the internal steel reinforcing bars found in a typical half-joint detail, and the results indicated that if certain bars are missing, the overall load-bearing capacity of a half-joint could be approximately 40% lower than that of a properly designed detail.

Journal ArticleDOI
01 Jun 2016
TL;DR: Behavioral economics is an approach to understanding decision making and behavior using principles of behavioral science and economics, and an area of investigation in behavioral economics includes evaluating demand for a commodity (such as drug and non-drug reinforcers), given changes in p
Abstract: Behavioral economics is an approach to understanding decision making and behavior using principles of behavioral science and economics (Hursh, 1980). An area of investigation in behavioral economics includes evaluating demand for a commodity (such as drug and nondrug reinforcers), given changes in p

Journal ArticleDOI
TL;DR: In this paper, following a differentiated functional analysis that relied on an interview-informed synthesized test condition, functional communication training (FCT) was applied across the three suspected contingencies of reinforcement, partly in an attempt to understand the relevance of each.
Abstract: It is common to isolate reinforcement contingencies across several test conditions in functional analyses of problem behavior; however, synthesizing all reinforcement contingencies in a single test condition may also have merit and even be necessary in some cases. Following a differentiated functional analysis, which relied on an interview-informed synthesized test condition, functional communication training (FCT) was applied across the three suspected contingencies of reinforcement, partly in an attempt to understand the relevance of each. Communication responses were acquired for all three reinforcers, and problem behavior ceased only when all contingencies were addressed via FCT, suggesting that problem behavior was controlled by multiple contingencies of reinforcement. These analyses suggest that control by multiple contingencies of reinforcement can be understood during the treatment development process following a highly efficient functional analysis. Copyright © 2015 John Wiley & Sons, Ltd.

Journal ArticleDOI
TL;DR: In this paper, the performance of 11 different shear reinforcement systems against punching of inner slab-column connections under gravity loading was compared on the basis of experiments on 12 full-scale specimens, 8 of them newly reported.
Abstract: The performance of 11 different shear reinforcement systems against punching of inner slab-column connections under gravity loading was compared on the basis of experiments on 12 full-scale specimens, 8 of them newly reported. The slab geometry and flexural reinforcement ratio (1.5%) were kept constant. The shear reinforcement systems included different layouts of double-headed studs, individual links, bent-up bars and bonded post-installed reinforcement. All the systems were found to increase both the strength and the deformation capacity of the members but exhibited varying performances. The factors influencing the maximum punching strength of different systems, such as the layout and the anchorage conditions of the transverse reinforcement units, are described and analyzed. The mechanical model of the Critical Shear Crack Theory is used to explain the observed differences and provide design guidance. Comparisons to the codes of practice (ACI 318, Eurocode 2 and Model Code 2010) are also presented.
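The CSCT explains punching by intersecting a failure criterion with the slab's load-rotation curve. The sketch below uses the failure criterion and simplified load-rotation relation in the form commonly cited for slabs without shear reinforcement, V_R = (3/4) · b0 · d · √fc / (1 + 15 ψ d / (d_g0 + d_g)) with d_g0 = 16 mm, and ψ = 1.5 (r_s/d)(f_y/E_s)(V/V_flex)^1.5. The additional checks needed for the shear-reinforced systems compared in this paper are not included, and all input values are hypothetical. Units: lengths in mm, stresses in MPa, forces in N.

```python
import math


def csct_failure_criterion(psi, b0, d, fc, dg, dg0=16.0):
    """Punching resistance V_R [N] at slab rotation psi [rad] (no shear reinforcement)."""
    return 0.75 * b0 * d * math.sqrt(fc) / (1.0 + 15.0 * psi * d / (dg0 + dg))


def load_rotation(v, rs, d, fy, es, v_flex):
    """Simplified load-rotation relation psi(V) for an inner slab-column connection."""
    return 1.5 * (rs / d) * (fy / es) * (v / v_flex) ** 1.5


def punching_strength(b0, d, fc, dg, rs, fy, es, v_flex, iters=100):
    """Intersect the failure criterion with the load-rotation curve by bisection.

    V - V_R(psi(V)) increases monotonically with V, so a single root exists
    between 0 and the rotation-free bound 0.75 * b0 * d * sqrt(fc).
    """
    lo, hi = 0.0, 0.75 * b0 * d * math.sqrt(fc)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        psi = load_rotation(mid, rs, d, fy, es, v_flex)
        if mid < csct_failure_criterion(psi, b0, d, fc, dg):
            lo = mid  # still below the failure criterion: the load can increase
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    v = punching_strength(b0=1700.0, d=200.0, fc=30.0, dg=16.0,
                          rs=1500.0, fy=500.0, es=200000.0, v_flex=2.0e6)
    print(f"V_R = {v / 1000:.0f} kN")
```

With these hypothetical inputs the intersection falls between about 800 kN and 830 kN, well below the assumed flexural capacity of 2000 kN, so punching rather than flexure would govern. Shear reinforcement changes the picture by adding a resistance contribution and further failure modes, which is where the differences between the 11 systems arise.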

Journal ArticleDOI
21 Jul 2016-PLOS ONE
TL;DR: In this paper, the authors used a simplified design that pitted a discriminative alternative, which led on 50% of trials to a signal for 100% reinforcement and otherwise to a blackout period indicative of 0% reinforcement, against a nondiscriminative alternative that always led to a signal that predicted 50% reinforcement.
Abstract: Pigeons have shown suboptimal gambling-like behavior when preferring a stimulus that infrequently signals reliable reinforcement over alternatives that provide greater reinforcement overall. As a mechanism for this behavior, recent research proposed that the stimulus value of alternatives with more reliable signals for reinforcement will be preferred relatively independently of their frequencies. The present study tested this hypothesis using a simplified design of a Discriminative alternative that, 50% of the time, led to either a signal for 100% reinforcement or a blackout period indicative of 0% reinforcement, against a Nondiscriminative alternative that always led to a signal that predicted 50% reinforcement. Pigeons showed a strong preference for the Discriminative alternative that remained despite reducing the frequency of the signal for reinforcement in subsequent phases to 25% and then 12.5%. In Experiment 2, using the original design of Experiment 1, the stimulus following choice of the Nondiscriminative alternative was increased to 75% and then to 100%. Results showed that preference for the Discriminative alternative decreased only when the signals for reinforcement for the two alternatives predicted the same probability of reinforcement. The ability of several models to predict this behavior is discussed, but the terminal-link stimulus value offers the most parsimonious account of this suboptimal behavior.
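The suboptimality can be made concrete with simple arithmetic. Here the overall payoff of an alternative is the probability of reaching each terminal-link stimulus times the reinforcement probability that stimulus signals, and the "signal value" is taken, as a simplification, to be the reinforcement probability of the best signal on that side; the function names are hypothetical.

```python
def overall_probability(p_signal, p_reinf_given_signal, p_reinf_otherwise=0.0):
    """Overall chance that choosing an alternative ends in reinforcement."""
    return p_signal * p_reinf_given_signal + (1.0 - p_signal) * p_reinf_otherwise


def compare(p_signal, nondisc_value=0.5):
    """Return ((payoff, signal value) discriminative, (payoff, signal value) nondiscriminative).

    Discriminative: a signal for 100% reinforcement with probability p_signal,
    otherwise a blackout with 0% reinforcement.
    Nondiscriminative: one stimulus on every trial predicting nondisc_value.
    """
    disc = (overall_probability(p_signal, 1.0), 1.0)
    nondisc = (overall_probability(1.0, nondisc_value), nondisc_value)
    return disc, nondisc


if __name__ == "__main__":
    # Experiment 1 phases: signal frequency 50%, 25%, 12.5%.
    for p in (0.5, 0.25, 0.125):
        print("Exp 1", p, compare(p))
    # Experiment 2 phases: nondiscriminative signal raised to 75% and 100%.
    for value in (0.75, 1.0):
        print("Exp 2", value, compare(0.5, value))
```

At a 12.5% signal frequency the Discriminative alternative pays off on 12.5% of trials versus 50% for the Nondiscriminative one, yet its reinforcement signal is worth 1.0 versus 0.5. In Experiment 2, raising the Nondiscriminative signal to 100% equalizes the two signal values, which matches the condition under which preference decreased.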

Journal ArticleDOI
TL;DR: Although higher rate alternative reinforcement appears to more effectively suppress drug seeking, should it become unavailable, it can have the unfortunate effect of increasing relapse.

Journal Article
TL;DR: When learners have equal levels of experience, the experimental results suggest that sharing Q-values is not beneficial and produces results similar to single-agent Q-learning; when learners have different levels of experience, most of the cooperative Q-learning algorithms perform similarly to one another but better than single-agent Q-learning, especially when Q-values are shared frequently.
Abstract: Cooperative reinforcement learning algorithms such as BEST-Q, AVE-Q, PSO-Q, and WSS use Q-value sharing strategies between reinforcement learners to accelerate the learning process. This paper presents a comparison study of the performance of these cooperative algorithms as well as an algorithm that aggregates their results. In addition, this paper studies the effects of the frequency of Q-value sharing on the learning speed of the independent learners that share their Q-values among each other. The algorithms are compared using the taxi problem (multi-task problem) and different instances of the shortest path problem (single-task problem). The experimental results when learners have equal levels of experience suggest that sharing of Q-values is not beneficial and produces similar results to single agent Q-learning. However, the experimental results when learners have different levels of experience suggest that most of the cooperative Q-learning algorithms perform similarly, but better than single agent Q-learning, especially when Q-value sharing is highly frequent. This paper then places Q-value sharing in the context of modern reinforcement learning techniques and suggests some future directions for research.
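Q-value sharing can be sketched as ordinary tabular Q-learning plus a periodic merge of the learners' tables. The merge below simply averages the tables, in the spirit of what the name AVE-Q suggests; the exact definitions of AVE-Q, BEST-Q, PSO-Q, and WSS in the paper may differ, and the helper names and hyperparameters are hypothetical. Sharing frequency would correspond to how often `share_average` is called between learning steps.

```python
from collections import defaultdict


def q_update(q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """One-step tabular Q-learning update; q maps (state, action) -> value."""
    best_next = max(q[(s_next, b)] for b in actions)
    q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])


def share_average(tables):
    """Overwrite every learner's table with the element-wise mean of all tables.

    Averaging is one simple sharing rule (assumed here); entries a learner has
    never visited count as 0.0 because the tables are defaultdict(float).
    """
    keys = set().union(*tables)
    mean = {k: sum(t[k] for t in tables) / len(tables) for k in keys}
    for t in tables:
        t.update(mean)


if __name__ == "__main__":
    learners = [defaultdict(float) for _ in range(2)]
    # Learner 0 has experienced a rewarding transition; learner 1 has not.
    q_update(learners[0], s=0, a=1, r=1.0, s_next=1, actions=[0, 1])
    share_average(learners)  # in a full run, call this every k steps
    print(learners[1][(0, 1)])  # learner 1 now holds half of learner 0's estimate
```

The example hints at why experience levels matter: when one learner knows something the other does not, averaging transfers part of that knowledge, whereas learners with equally informed tables gain little from the merge.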

Journal ArticleDOI
TL;DR: Direct observation of implementation in concurrent-chains procedures may allow the identification of interventions that are implemented with sufficient integrity and preferred by caregivers.
Abstract: Social validity of behavioral interventions typically is assessed with indirect methods or by determining preferences of the individuals who receive treatment, and direct observation of caregiver preference rarely is described. In this study, preferences of 5 caregivers were determined via a concurrent-chains procedure. Caregivers were neurotypical, and children had been diagnosed with developmental disabilities and engaged in problem behavior maintained by positive reinforcement. Caregivers were taught to implement noncontingent reinforcement (NCR), differential reinforcement of alternative behavior (DRA), and differential reinforcement of other behavior (DRO), and the caregivers selected interventions to implement during sessions with the child after they had demonstrated proficiency in implementing the interventions. Three caregivers preferred DRA, 1 caregiver preferred differential reinforcement procedures, and 1 caregiver did not exhibit a preference. Direct observation of implementation in concurrent-chains procedures may allow the identification of interventions that are implemented with sufficient integrity and preferred by caregivers.
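The three procedures the caregivers implemented differ mainly in what triggers a reinforcer. The sketch below assumes fixed-time NCR, whole-interval (non-resetting) DRO, reinforcement of every alternative response under DRA, and a simple event log of (time, kind) pairs; real clinical protocols vary, and the function and event labels are hypothetical.

```python
def deliveries(events, session_s, procedure, interval_s=30.0):
    """Return the times (s) at which a reinforcer is delivered.

    events: list of (time_s, kind) pairs, kind is "problem" or "alternative".
    procedure: "NCR", "DRA", or "DRO".
    """
    if procedure == "DRA":
        # Every alternative response is reinforced; problem behavior is not.
        return [t for t, kind in events if kind == "alternative"]
    out = []
    start = 0.0
    while start + interval_s <= session_s:
        end = start + interval_s
        if procedure == "NCR":
            out.append(end)  # time-based: behavior is irrelevant
        elif procedure == "DRO":
            # Reinforce only if no problem behavior occurred in [start, end).
            if not any(start <= t < end and k == "problem" for t, k in events):
                out.append(end)
        else:
            raise ValueError(f"unknown procedure: {procedure}")
        start = end
    return out


if __name__ == "__main__":
    log = [(10.0, "problem"), (45.0, "alternative")]
    for proc in ("NCR", "DRO", "DRA"):
        print(proc, deliveries(log, 90.0, proc))
```

For a 90-s session with problem behavior at 10 s and an alternative response at 45 s, NCR delivers at 30, 60, and 90 s, DRO skips the first interval and delivers at 60 and 90 s, and DRA delivers once, at 45 s.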

Journal ArticleDOI
TL;DR: In this paper, a non-destructive technique is presented to predict the residual life of reinforced concrete beams with different levels of cracking caused by steel reinforcement corrosion, based on the change in their dynamic behavior as measured by the shift in the first natural vibration frequency.

Journal ArticleDOI
TL;DR: Results suggest that delayed food produced greater response persistence than did delayed tokens; responding was compared under no reinforcement, immediate reinforcement, and delayed reinforcement in which responses produced a reinforcer after 1 of 6 delays.
Abstract: We examined the effects of delayed reinforcement on the responding of individuals with intellectual disabilities. Three conditions were evaluated: (a) food reinforcement, (b) token reinforcement with a postsession exchange opportunity, and (c) token reinforcement with a posttrial exchange opportunity. Within each condition, we assessed responding given (a) a no-reinforcement baseline, (b) immediate reinforcement, and (c) delayed reinforcement, in which responses produced a reinforcer after 1 of 6 delays. Results suggest that delayed food produced greater response persistence than did delayed tokens.

Journal ArticleDOI
TL;DR: In this article, the influence of carbon nanotube (CNT) reinforcement mechanisms on the mechanical, wear, and fatigue behavior of an aluminium-silicon (AlSi) alloy was investigated.
Abstract: This work is concerned with understanding the influence of reinforcement mechanisms of carbon nanotubes (CNTs) on the mechanical, wear, and fatigue behavior of an Aluminium-Silicon (AlSi) alloy. The reinforcement mechanism is presented through the observation of fracture morphology in the different tests. Results for mechanical properties, fatigue life performance, and wear loss are presented and discussed. It is shown that the CNT reinforcement effect is active simultaneously in all of these properties, and that the physical reinforcement mechanism seems to be essentially an interface-strengthening effect that is similar under all the loading conditions considered.

Journal ArticleDOI
TL;DR: The results suggest that the strong difference found between pigeons and rats in the suboptimal choice procedure is potentially related to differences in the impact of conditioned inhibitors.

Journal ArticleDOI
TL;DR: Polyrhodanine (PRd) was wrapped onto the surface of halloysite nanotubes (HNTs) through the oxidative polymerization of rhodanine on Fe3+-impregnated HNTs.
Abstract: Polyrhodanine (PRd) was wrapped onto the surface of halloysite nanotubes (HNTs) through the oxidative polymerization of rhodanine on Fe3+-impregnated HNTs. The wrapping mechanisms were elucidated. The PRd-HNTs exhibited a prominent reinforcing effect for rubber. With the incorporation of 30 phr of PRd-HNTs, the tensile strength was increased by almost 8-fold, and the modulus (at 300% strain) was increased by 257% compared to neat SBR. More strikingly, with only 2.9 wt.% of PRd (relative to HNTs), the tensile strength and modulus of the composite were enhanced by 117% and 87%, respectively, suggesting the high efficiency of the modification. Such profound changes in the reinforcement were attributed to the formation of covalent linkages between PRd-HNTs and rubber through the participation of PRd-HNTs in the curing process. In view of the versatility of the PRd-wrapping procedure, this method offers significant insight into the interfacial design of rubber nanocomposites consisting of nonpolar matrices and inorganic reinforcements.

Journal ArticleDOI
TL;DR: The usefulness of 2 assessments to guide treatment selection for individuals whose prior functional analysis indicated that automatic reinforcement maintained their problem behavior was evaluated.
Abstract: We evaluated the usefulness of 2 assessments to guide treatment selection for individuals whose prior functional analysis indicated that automatic reinforcement maintained their problem behavior. In the 1st assessment, we compared levels of problem behavior during a noncontingent play condition and an alone or ignore condition. In the 2nd, we assessed participants' relative preferences for automatic reinforcement and social reinforcers in a concurrent-operants arrangement. We used the results of these 2 assessments to assign 5 participants to a treatment based on noncontingent access to social reinforcers or to a treatment based on differential access to social reinforcers. We conducted monthly probes with the participants over 10 to 12 months to evaluate the effects of the treatment procedures. All participants showed reductions in problem behavior over this period.


Journal ArticleDOI
TL;DR: An adapted alternating treatments design was used to compare skill acquisition during discrete-trial instruction using immediate reinforcement, delayed reinforcement with immediate praise, and delayed reinforcement for 2 children with autism spectrum disorder.
Abstract: We used an adapted alternating treatments design to compare skill acquisition during discrete-trial instruction using immediate reinforcement, delayed reinforcement with immediate praise, and delayed reinforcement for 2 children with autism spectrum disorder. Participants acquired the skills taught with immediate reinforcement; however, delayed reinforcement decreased the efficiency and effectiveness of discrete-trial instruction. We discuss the importance of evaluating the influence of treatment-integrity errors on skill acquisition during discrete-trial instruction.