Reliability versus validity - read on it’s important! April 16, 2009
Posted by Paul Duignan in: Evaluation debates, Indicators, Accountability, Measurement

Now that Easter is over (and the yard gate has been built to keep in the dog that my wife and the kids have their hearts set on getting), I’m back blogging. Today I want to talk about the difference between reliability and validity. It sounds technical, but read on, it’s really important in a lot of results and outcomes areas. In psychology, where I come from, they spend a lot of time drumming this distinction into you. Reliability is whether measurements at different times and by different people will give you the same result. Validity is whether you are measuring the right thing.
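For readers who like to see a distinction in numbers, here is a minimal sketch in Python. Everything in it is an assumption made up for illustration: `measure_a`, `measure_b`, the weights and the noise levels are hypothetical, and no real instrument is being modeled. Reliability is approximated as the correlation between two repeated measurements of the same people, and validity as the correlation between a measurement and the thing we actually care about.

```python
import random
from statistics import mean


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate(n=2000, seed=0):
    rng = random.Random(seed)
    # `target` is the thing we really want to know about each person.
    target = [rng.gauss(0, 1) for _ in range(n)]
    # `other` is something easy to observe but only loosely related to it.
    other = [rng.gauss(0, 1) for _ in range(n)]

    def measure_a():
        # Hypothetical measure A: very little noise, but it mostly tracks `other`.
        return [0.2 * t + 0.98 * o + rng.gauss(0, 0.1)
                for t, o in zip(target, other)]

    def measure_b():
        # Hypothetical measure B: tracks `target` directly, but with a lot of noise.
        return [t + rng.gauss(0, 0.8) for t in target]

    a1, a2 = measure_a(), measure_a()  # two independent measurement occasions
    b1, b2 = measure_b(), measure_b()
    return {
        "A": {"reliability": pearson(a1, a2), "validity": pearson(a1, target)},
        "B": {"reliability": pearson(b1, b2), "validity": pearson(b1, target)},
    }


if __name__ == "__main__":
    for name, scores in simulate().items():
        print(name, {k: round(v, 2) for k, v in scores.items()})
```

Under these made-up settings, measure A should repeat itself almost perfectly while telling you little about the target, and measure B should do the reverse. The exact figures depend entirely on the weights I picked, so treat the output as an illustration of the trade-off rather than as evidence of anything.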
Take one of my favorite stories - the old one about the drunk looking for his or her keys. Seen searching under a street lamp, they’re asked whether they lost the keys under the lamp, and they reply: ‘No, but I’ve a better chance of seeing them here because of the light’. In a sense they are opting for reliability - the ability to measure whether the keys are there or not - rather than validity, which would mean maximizing the chance that they will actually find the keys.
Take another example, from psychology. In the DSM-IV (the fourth edition of the Diagnostic and Statistical Manual of Mental Disorders), an attempt was made to tighten up the reliability of diagnoses of schizophrenia (among other mental disorders). Now, schizophrenia has what are called positive symptoms - like hallucinations - and negative symptoms - like lack of emotional connection. The DSM-IV sets out diagnostic criteria by which one can make a diagnosis of schizophrenia. Some contend that an overemphasis on reliability has led to an excessive focus on the measurement of positive symptoms of schizophrenia, because they are easier to measure. Since they are easier to measure, different people at different times are likely to come up with the same diagnosis for the same person - making it a reliable measure. However, the critics argue that negative symptoms are even more important than positive symptoms when accurately diagnosing schizophrenia. Because of the relative subtlety of negative symptoms (in comparison to the ‘yes/no’ type questions you can use in regard to positive symptoms - ‘has the patient had hallucinations or not?’), negative symptoms are harder to measure and it is harder to get different observers to make the same judgment about their presence or absence. So negative symptoms may be a less reliable but more valid measure of the presence of schizophrenia, whereas positive symptoms may be a more reliable but less valid measure.
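Agreement between observers, the kind of reliability at issue in this example, is often summarized with Cohen’s kappa, which corrects raw agreement for the agreement two raters would reach by chance. The sketch below is only an illustration: the ten ‘patients’ and both raters’ yes/no judgments are invented, and the variable names are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_1, rater_2):
    """Cohen's kappa for two raters judging the same cases."""
    n = len(rater_1)
    # Observed agreement: share of cases where both raters gave the same label.
    observed = sum(a == b for a, b in zip(rater_1, rater_2)) / n
    # Chance agreement: from each rater's own label frequencies.
    counts_1, counts_2 = Counter(rater_1), Counter(rater_2)
    labels = set(rater_1) | set(rater_2)
    expected = sum(counts_1[k] * counts_2[k] for k in labels) / (n * n)
    return (observed - expected) / (1 - expected)


# Invented ratings (1 = symptom judged present, 0 = absent) for ten patients.
positive_rater_1 = [1, 1, 0, 0, 1, 0, 1, 0, 0, 1]
positive_rater_2 = [1, 1, 0, 0, 1, 0, 1, 0, 1, 1]
negative_rater_1 = [1, 0, 1, 0, 1, 0, 1, 0, 1, 0]
negative_rater_2 = [1, 1, 0, 0, 1, 0, 0, 1, 1, 0]

if __name__ == "__main__":
    # With these invented ratings: roughly 0.8 for positive, 0.2 for negative.
    print("positive symptoms:", round(cohens_kappa(positive_rater_1, positive_rater_2), 2))
    print("negative symptoms:", round(cohens_kappa(negative_rater_1, negative_rater_2), 2))
```

Note what kappa does not tell you: a high value says only that the raters agree with each other, not that the thing they agree about is the right basis for the diagnosis. That second question is the validity question.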
Now, applying this specifically to results and outcomes systems. We are constantly building systems which focus on reliability because we want results which are ‘objective’ and can be confirmed independently of the person who is observing them. This is particularly the case as soon as such measurements are being used for accountability: no one wants to be held to account for measures which vary depending on who makes them. When we are attempting to implement results and outcomes systems we also tend to want results which are easy and cheap to measure, to keep down the administrative cost of the whole system - this also tends to encourage an emphasis on reliability and less concern about validity. In fact, for reasons which I fully understand, the whole accounting profession has taken a stand in favor of reliability - they term it verifiability, reproducibility and auditability - rather than validity, because they don’t want to get caught up in fights about ‘what the figures actually are’.
However, in some cases the reliable measurements we make may not actually be measuring the thing that we really want to measure (i.e. they may be very reliable, but not particularly valid). This seems to have been a problem with the pay-for-performance system I discussed in my last blog. It was measuring things and rating people all right, but there were some who thought that what it was measuring was how well people could fill in the performance management paperwork. Being good at paperwork is something which it is probably relatively easy to measure reliably, but is it a valid measure of how well staff are performing relative to the achievement of organizational outcomes?
The whole Wall Street melt-down can be seen as another case of reliability being preferred over validity. The focus of attention was on measures which could be objectively reported and were uncontested, e.g. the dollar amount of mortgage business written. The much more important question was the validity of this figure as a measure of the long-term worth of the mortgages which were being written, something which proved much more difficult to measure in an uncontested way.
Paul Duignan, PhD
Comments»
I’m pleased to see some discussion about the properties of the metrics - validity and reliability being good places to start. Going further, there are more than a few measures out there, many from ostensibly reputable sources, that start to look fairly wobbly if you apply some psychometric analysis.
Yes Sue, it would be great if everything had been subject to psychometric analysis. I’m a clinical psychologist originally and spent some time doing such analysis so I know the benefits of it. Important metrics should be examined from this point of view where it is appropriate.
In monitoring and evaluation there are, of course, situations where there is little time and few resources to do such analysis. One of the main points I try to push is that there should be transparency about the status of the information that people are using for decision-making. So where it is not particularly robust, they should be informed of this. This is why I always think that sets of performance indicators should be mapped back onto an underlying visual model. (As argued in this article http://knol.google.com/k/paul-duignan-phd/indicators-why-they-should-be-mapped/2m7zd68aaz774/72 and this YouTube video http://www.outcomescentral.org/performanceindicators5.html).