Author: SMAART Horses

Treatless Clicks

While working through a lot of reading material this week, many things really started to fall into place. Although I could already describe what happens with treatless clicks, it has now taken on a whole new meaning. So here goes…

As I understand it, treatless clicks are intended to put the behaviour on a variable ratio of reinforcement. The idea is that this makes the behaviour more resistant to extinction (the reduction in frequency, or total loss, of a behaviour due to total withdrawal of reinforcement for the behaviour). It’s intended to make the behaviour stay for a long time, if not forever.

In this approach to training, the click is initially paired contingently with the +Rer in use: each time there is a click, a +Rer is coming. The click becomes a conditioned reinforcer (CR) which has been contingently paired with an unconditioned reinforcer (UR). Note: the click is not paired with the behaviour, it’s paired with the UR.

Then this continuous ratio (1:1) of click to +Rer changes, so that only every now and then is a click (CR) followed by a +Rer (UR). However, because the contingent relationship is between the CR and the UR, each time there is a click the behaviour IS being reinforced… so it’s not a variable ratio of reinforcement.

Over a period of time the CR no longer has a reliable (contingent) relationship with the UR, so the relationship between click and +Rer gets damaged or broken (it’s no longer the consistent marker that +Rment will follow) and the learner will find something else that is a more reliable marker.
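As a rough illustration of this point, the reliability of the click as a marker can be thought of as the proportion of clicks that are followed by a +Rer. The short Python sketch below is only an illustration: the function name and the two example sessions are invented for this purpose and do not come from any training protocol or data.

```python
def click_reliability(session):
    """Return the fraction of clicks that were followed by a +Rer.

    `session` is a list of booleans, one per click:
    True = click followed by a +Rer, False = treatless click.
    """
    if not session:
        raise ValueError("session must contain at least one click")
    return sum(session) / len(session)


# Hypothetical sessions, invented for illustration only.
one_to_one = [True] * 10  # every click is followed by a +Rer
treatless = [True, False, False, True, False,
             False, False, True, False, False]  # 3 of 10 clicks followed by a +Rer

print(click_reliability(one_to_one))  # 1.0
print(click_reliability(treatless))   # 0.3
```

With a 1:1 relationship the click predicts the +Rer every time. Once treatless clicks are introduced, the click predicts it far less reliably, which is the weakening of the CR-UR relationship described above.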

The click will stop acting as a CR, and the behaviour is just as susceptible to extinction as before, because it was never put on a variable ratio of reinforcement (at least not until the relationship between the CR and UR began to break down).

If we want to strengthen a behaviour against extinction then the most effective way to do this is to have a 1:1 CR-UR relationship at all times and build a long and strong reinforcement history with that 1:1 relationship. A behaviour that is more resistant to extinction is created from an increase in number, magnitude and quality of +Rment given.

Reinforc-er, -ment, -ing

Reinforcing/Punishing (adjective) – the property of a stimulus.

Note; this is not reinforcer/punisher (noun), which describes the intent of the consequence used.

My interpretation: it is not describing what the consequence is, it’s describing the property (or effect) it has in that set of circumstances (the internal environment of the learner as well as the external environment).

If the behaviour did not decrease in some way (duration, frequency etc) then the consequence did not have punishing properties for the target behaviour. If the behaviour did not increase in some way (duration, frequency etc) then the consequence did not have reinforcing properties for the target behaviour.

What makes a reinforcer reinforcing?

….given a person’s individual history and current motivational state, and the environmental conditions, “any stimulus change can be a ‘reinforcer’ if the characteristics of the change and the temporal relation of the change to the response under observation are properly selected” (Schoenfeld, 1995).

In other words, whether something is reinforcing/punishing depends on many variables. It is not a given that something will be reinforcing/punishing to a learner all the time.

The words reinforcer and punisher indicate a functional relationship, not the consequence itself.

A Positive Reinforcer for afters

“….positive reinforcers are not defined with terms such as pleasant or satisfying, aversive stimuli should not be defined with terms such as annoying or unpleasant. The terms reinforcer and punisher should not be used on the basis of a stimulus event’s assumed effect on behaviour or on any inherent property of the stimulus event itself.”
Cooper et al.

In other words, reinforcer and punisher simply describe if a behaviour increases as a result of the consequence or decreases as a result of the consequence. They have no bearing on whether the consequence was good or bad, liked or disliked.

Stimulus Control, poor stimulus control and Intelligent Disobedience

When I started to write this it seemed like a simple explanation, but it has morphed into a version of War and Peace!…

One of the aims of training is to have a behaviour under stimulus control. However, since each behaviour is learned through a set of cues (stimuli or, more accurately, discriminative stimuli), what the learner is really learning is a stimulus class that signals a reinforcer/punisher is available should the learner perform behaviour X.

But every now and then the behaviour is not performed when the discriminative stimulus (or stimuli) is presented: the learner performs the wrong behaviour, or does nothing.

Depending on the circumstances I have heard this called poor stimulus control of a behaviour or intelligent disobedience. It would be called poor stimulus control if the learner is thought to just not know what behaviour that set of stimuli are cueing, and it would be called intelligent disobedience if the learner is thought to do something else (or do nothing) because there would be a good reason not to do the behaviour.

When we deem this poor stimulus control we tend to lay responsibility on the learner: the learner did not understand the cues (stimuli). But what if the trainer did not present the cues consistently and clearly enough for the learner to understand that they were all the same cue? The learner may have responded to the cue when it looked like V, but when the trainer presented what they thought was the same cue and it looked more like W, the learner might get confused. Or perhaps the learner responded to the cue correctly yesterday and not today, and we have not noticed that the black bucket that was in the training space yesterday is not there today. Unwittingly, the black bucket was learned as part of the environmental stimuli, and without one of the stimuli the others combined no longer function in the way we thought they did.

Although we present one cue that we intend to prompt a behaviour the learner may learn a number of cues combined. Without one part of the combination the behaviour will not be prompted. I often see people offering a cue to their learner saying “this is the cue” but in reality when looking closely the learner has picked out something else, something more meaningful or that has been more consistent throughout the learning process as the cue.

In addition, it is usually very hard, if not impossible, for us to present a cue in the same way every time. As such we are really hoping that the learner will learn a stimulus class (a class of stimuli that all prompt the same behaviour). In the same way, no two performances of a behaviour will look exactly the same; each attempt will have a slightly different topography, and so it is a response class.

Intelligent Disobedience: the learner always performs a behaviour ‘on cue’, so the behaviour is under stimulus control. Then one day they do not perform the behaviour, and not performing is deemed to be the right answer. For example, a guide dog always crosses the road but today did not because there was a car coming. This is often called intelligent disobedience in response to the cue (stimulus). However, if we look at this situation, the dog is not being disobedient; it is responding to the cues in the environment, which were different. Reinforcers are only available if X, Y and Z are presented, but today X, Y, Z and A were all presented. The behaviour is therefore still under stimulus control, as it did not occur when a different set of stimuli was presented.

Or say we ask a horse to back up and they always do, but today they didn’t. If we look closely we will see one or more antecedent stimuli that indicated backing was not the appropriate response.
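The idea that a behaviour is cued by a combination of stimuli, rather than by the single cue we think we are giving, can be sketched in a few lines of Python. This is a deliberately simplified illustration: the labels X, Y, Z and A follow the example above, the function name is made up, and treating the learned set as an exact match is an assumption (real learners respond to stimulus classes, not exact lists).

```python
# Hypothetical stimulus labels, following the X, Y, Z and A used in the text.
REINFORCED_CONTEXT = frozenset({"X", "Y", "Z"})


def behaviour_is_cued(stimuli_present):
    """Return True only when the stimuli present match the learned set exactly."""
    return frozenset(stimuli_present) == REINFORCED_CONTEXT


print(behaviour_is_cued({"X", "Y", "Z"}))       # True: the usual situation
print(behaviour_is_cued({"X", "Y", "Z", "A"}))  # False: something extra is present (the car)
print(behaviour_is_cued({"X", "Y"}))            # False: something is missing (the black bucket)
```

In both of the "False" cases the learner is still responding to the stimuli that are actually present, which is why neither needs to be explained as disobedience.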

Habituation

Habituation.
Updated: where the repeated (note: not continuous) presentation of a stimulus that initially results in, e.g., a startle response gradually no longer elicits that response.

We often unwittingly rely on habituation and it works….until the stimulus is no longer presented frequently. Habituation is not permanent and so the reaction to the stimulus can return after no or delayed exposure to the stimulus.

A Daily Challenge

Observe without judgement. This includes our internal thoughts and reactions.

Update; this includes our internal thoughts and reactions about others AND about ourselves.

When we judge our thoughts and reactions as good or bad we add emotional tags that can prevent us from learning a deeper lesson. To learn from our thoughts and reactions more productively we need to observe without judgement.

A huge challenge.

What is your end goal?

When we are training how do we determine if we are training towards the end goal, or a part of the behaviour?

Think about a horse learning to have a headcollar on, or a dog learning to have a collar on. We tend to think of training for headcollar/collar on. We may even add duration to the training plan, but is putting it on really the end goal? Or is the end goal taking it off safely, without the learner panicking or pulling away from the object?

How does a shift in thinking about what we are really training for change the emotional impact on the learner?