The copyist model and the shaping view of reinforcement
Introduction
In Principles of Psychology, Keller and Schoenfeld (1950) noted that operant conditioning is “merely the strengthening of a reflex that already exists in the organism’s repertory” (p. 48). This observation suggests that they viewed operant conditioning as altering the strength of operants through their consequent reinforcers. For Skinner (1953), on the other hand, “reinforcement” may refer, at least in part, to the process of shaping operants. In Science and Human Behavior, he wrote that “operant reinforcement resembles the natural selection of evolutionary theory. Just as genetic characteristics which arise as mutations are selected or discarded by their consequences, so novel forms of behavior are selected or discarded through reinforcement” (p. 430). Here Skinner viewed reinforcement as a process of shaping rather than strengthening.
These two views of reinforcement, one based on strengthening and the other on shaping, differ in their implications. Although it is generally acknowledged that reinforcement has both effects (Morse, 1966), their generality and implications remain controversial (e.g., Shimp, 1976, 2013). In this paper, we evaluate how well these two views explain: (1) the response-rate difference between variable-ratio (VR) and variable-interval (VI) schedules that provide the same reinforcement rate (Ferster and Skinner, 1957); and (2) the phenomenon of relative response rates matching relative reinforcer rates in concurrent schedules (Herrnstein, 1961). With regard to these two phenomena, we argue that the idea that reinforcement shapes behavior is predictively superior to the idea that reinforcement strengthens behavior.
Section snippets
The copyist model
Tanno and Silberberg’s (2012) copyist model belongs to the family of accounts based on the shaping view of reinforcement. The computational algorithm of the copyist model is shown in Fig. 1. While the algorithm is similar to interresponse time (IRT) reinforcement theory broadly conceived (Morse, 1966, Peele et al., 1984, Wearden and Clark, 1988), the copyist model differs from the IRT reinforcement theory in one important regard. In earlier IRT accounts such as Peele et al. (1984), the IRTs in
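The full algorithm appears in Fig. 1. As a rough illustration of the shaping idea it embodies, the sketch below generates behavior by sampling IRTs from a memory of previously reinforced IRTs and copies a reinforced IRT back into memory. The memory size, initial distribution, and update rule here are placeholder assumptions for illustration, not the parameters of the published model.

```python
import random

class CopyistSketch:
    """Toy illustration of a shaping-based (copyist-style) learner.

    Behavior is generated by sampling interresponse times (IRTs) from a
    finite memory; reinforced IRTs are copied back into memory, displacing
    the oldest entry.  Memory size and the initial IRT distribution are
    placeholder assumptions, not values from Tanno and Silberberg (2012).
    """

    def __init__(self, memory_size=50, seed=0):
        self.rng = random.Random(seed)
        # Start with a broad, arbitrary spread of IRTs (in seconds).
        self.memory = [self.rng.uniform(0.5, 10.0) for _ in range(memory_size)]

    def emit_irt(self):
        # Behavior is a copy of a remembered (previously reinforced) IRT.
        return self.rng.choice(self.memory)

    def reinforce(self, irt):
        # The just-reinforced IRT displaces the oldest memory entry, so
        # the emitted IRT distribution drifts toward reinforced forms.
        self.memory.pop(0)
        self.memory.append(irt)
```

Under a contingency that reinforces only short IRTs, the memory (and hence the emitted behavior) shortens over trials: reinforcement reshapes the repertoire rather than strengthening a fixed response.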
The VR–VI rate difference
The VR–VI rate difference is a phenomenon for which the strengthening view and the shaping view provide clearly different accounts of behavior. In VR schedules, the number of responses required to deliver a reinforcer varies between successive reinforcers. The mean of these interreinforcement ratios (the number of responses between successive reinforcers) defines the schedule's value (e.g., VR 30). In a VI schedule, on the other hand, a reinforcer is delivered for the first response following the lapse of a variable period of time since the preceding reinforcer.
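The contingency difference between the two schedules can be made concrete with a short sketch. Below, a random-ratio and a random-interval approximation (the schedule values and the exponential-interval assumption are illustrative choices, not taken from the paper) show that the probability a response is reinforced is flat across IRT durations on VR but an increasing, bounded function of IRT duration on VI:

```python
import math

def vr_reinforcement_prob(mean_ratio, irt=None):
    # Random-ratio approximation of VR: every response is reinforced
    # with fixed probability 1/N, no matter how long the preceding IRT.
    return 1.0 / mean_ratio

def vi_reinforcement_prob(mean_interval, irt):
    # Random-interval approximation of VI with exponential intervals:
    # the chance the interval timer has elapsed by the end of an IRT is
    # 1 - exp(-IRT / mean interval): increasing and bounded in the IRT.
    return 1.0 - math.exp(-irt / mean_interval)
```

On a VR 30, every response pays off with probability 1/30 (about 0.033) regardless of pacing; on a VI 30-s, a 1-s IRT yields about 0.033 but a 10-s IRT about 0.28. Long IRTs are thus differentially reinforced on VI, and this asymmetry is what the shaping view uses to explain the lower VI response rate.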
The matching law and concurrent-schedule performance
The matching law is another phenomenon for which the strengthening view and the shaping view provide clearly different accounts of behavior. Herrnstein (1961) exposed pigeons to concurrent VI VI schedules and found that the relative response rate for one alternative equaled or “matched” the relative reinforcement rate for that alternative. This relation is expressed as:

B₁/(B₁ + B₂) = R₁/(R₁ + R₂)

where B and R denote responses and reinforcements, and subscripts distinguish the two alternatives.
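Numerically, matching is just an equality between two proportions. The helper below (an illustrative function with made-up numbers, not data from Herrnstein, 1961) computes the relative measure on each side of the relation:

```python
def relative_rate(x1, x2):
    """Proportion of the total (responses or reinforcers) for alternative 1."""
    return x1 / (x1 + x2)

# Matching holds when the two proportions agree, e.g. 3000 vs. 1000
# responses emitted against 45 vs. 15 reinforcers earned per hour:
# relative_rate(3000, 1000) == relative_rate(45, 15) == 0.75
```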
Changeover delay and copyist model II
Skinner’s (1950) first published considerations of concurrent performances were in response to Tolman’s (1938) claim that the “determiners of behavior at a choice point” could be used as a surrogate for measuring strength. Skinner thought preference was a poor measure of strength, holding instead that it reflected the shaping effects on behavior of the differential reinforcement of switching (see also Skinner, 1986).
Given Skinner’s view, Herrnstein’s (1961) finding of matching in choice can be seen, at least
Limitations and implications
The copyist model has two major limitations. The first relates to stimulus control: the model has no mechanism for reflecting the discriminative control of operant responses. For example, it cannot explain schedule performances under fixed-interval and fixed-ratio schedules, in which timing and counting, respectively, play important roles. One possible solution is to define multiple memory sets and assign one to each discriminative stimulus, as McDowell’s (2013) selection by
Acknowledgements
This work was supported by JSPS KAKENHI Grants #00237309 and #26995075. The authors thank Dr. Kyoichi Hiraoka for his suggestions on the stay/switch idea of the copyist model.
References (41)

The correlation-based law of effect. J. Exp. Anal. Behav. (1973)
On two types of deviation from the matching law: bias and undermatching. J. Exp. Anal. Behav. (1974)
Optimization and the matching law as accounts of instrumental behavior. J. Exp. Anal. Behav. (1981)
Performances on ratio and interval schedules of reinforcement: data and theory. J. Exp. Anal. Behav. (1993)
Choice as time allocation. J. Exp. Anal. Behav. (1969)
Concurrent performances: a baseline for the study of reinforcement magnitude. J. Exp. Anal. Behav. (1963)
Interresponse-time sensitivity during discrete-trial and free-operant concurrent variable-interval schedules. J. Exp. Anal. Behav. (1999)
Response-rate differences in variable-interval and variable-ratio schedules: an old problem revisited. J. Exp. Anal. Behav. (1994)
Schedules of Reinforcement (1957)
Relative and absolute strength of response as a function of frequency of reinforcement. J. Exp. Anal. Behav. (1961)
Is matching compatible with reinforcement maximization on concurrent variable interval, variable ratio? J. Exp. Anal. Behav.
Principles of Psychology (1950)
A local model of concurrent performance. J. Exp. Anal. Behav.
The stay/switch model of concurrent choice. J. Exp. Anal. Behav.
Uninstructed human responding: sensitivity to ratio and interval contingencies. J. Exp. Anal. Behav.
A quantitative evolutionary theory of adaptive behavior dynamics. Psychol. Rev. (2013)
Matching-based hedonic scaling in the pigeon. J. Exp. Anal. Behav.
Intermittent reinforcement (1966)
The Effects of Motivation on Habitual Instrumental Behavior. Unpublished doctoral dissertation
Tonic dopamine: opportunity costs and the control of response vigor. Psychopharmacology (Berl.)