At what sample size do correlation coefficients stabilise?
Let's think for a minute about a different question:
- If I were to repeat the experiment, how similar would the results be?
Try again to generate 1000 samples from the population with ρ = 0.21, but this time with n = 10, 50 and 100.
Notice the range of sample r values for each sample size.
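As a rough illustration of this exercise, the sketch below draws 1000 samples at each sample size and reports the range of r. It is not the code from CorrelationSim.m: the variable names, the fixed seed and the way the correlated data are generated (a weighted sum of two independent normal variables) are all assumptions made for the example, chosen so that only base MATLAB is needed.

```matlab
% Sketch: spread of sample r for several sample sizes (base MATLAB only).
% All names and settings here are illustrative assumptions.
rng(1);                      % fixed seed so the run is repeatable
rho   = 0.21;                % population correlation used in this tutorial
nSims = 1000;                % number of simulated samples per sample size
nVals = [10 50 100];         % sample sizes to compare

rRange = zeros(size(nVals)); % will hold max(r) - min(r) for each n
for k = 1:numel(nVals)
    n = nVals(k);
    r = zeros(nSims, 1);
    for s = 1:nSims
        x = randn(n, 1);
        % y is built so that its population correlation with x equals rho
        y = rho * x + sqrt(1 - rho^2) * randn(n, 1);
        c = corrcoef(x, y);  % 2-by-2 correlation matrix
        r(s) = c(1, 2);      % off-diagonal element is the sample r
    end
    rRange(k) = max(r) - min(r);
    fprintf('n = %3d: r ranges from %.2f to %.2f (mean r - rho = %.3f)\n', ...
            n, min(r), max(r), mean(r) - rho);
end
```

The range of r should shrink noticeably as n grows, which is the pattern the exercise asks you to look for.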
If we want our sample correlation r to approximate the population coefficient ρ,
it is not sufficient to say either that r is significantly greater than zero, or that
the experiment is well powered.
We need a measure of the difference r-ρ.
The simulation above explored how stable the sample correlation coefficients r are
(arrr!)
given a certain sample size.
Point of Stability
In this paper, Schönbrodt and Perugini (2013) asked:
- How many subjects do I need to get a stable estimate of the correlation coefficient?
They answered the question by simulating a sample, to which they kept incrementally adding more subjects
until the sample correlation coefficient r came within an acceptable range of the population
coefficient ρ.
Then they asked how many subjects there were at the point r entered the 'corridor of stability'.
By repeating this for many samples, they could work out for what sample size the sample correlation r
would be acceptably near to the population correlation ρ a certain proportion, say 80%, of the time.
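The procedure described above can be sketched for a single simulated sample as follows. This is a minimal illustration, not the code in CorrelationSim.m. It assumes the corridor is simply ρ ± w, that subjects are added one at a time from an assumed starting size nMin up to an assumed maximum nMax, and that the Point of Stability is the sample size after which r no longer leaves the corridor. That is one reading of the description above, and the course file may define it differently.

```matlab
% Sketch: Point of Stability (PoS) for one simulated sample.
% Settings and names are assumptions for illustration.
rng(2);                 % fixed seed for repeatability
rho  = 0.21;            % population correlation
w    = 0.1;             % half-width of the corridor (assumed: rho +/- w)
nMin = 20;              % assumed starting sample size
nMax = 1000;            % assumed maximum sample size

% Draw all nMax subjects up front; "adding subjects" means using the first n
x = randn(nMax, 1);
y = rho * x + sqrt(1 - rho^2) * randn(nMax, 1);

% Sample correlation after each added subject
nVals = (nMin:nMax)';
r = zeros(size(nVals));
for k = 1:numel(nVals)
    c = corrcoef(x(1:nVals(k)), y(1:nVals(k)));
    r(k) = c(1, 2);
end

% PoS: first sample size after the last excursion outside the corridor
outside   = abs(r - rho) > w;
lastBreak = find(outside, 1, 'last');
if isempty(lastBreak)
    PoS = nMin;                     % r never left the corridor
elseif lastBreak == numel(nVals)
    PoS = NaN;                      % still outside at nMax: no PoS found
else
    PoS = nVals(lastBreak + 1);
end
fprintf('Point of Stability for this sample: n = %g\n', PoS);
```

Repeating this for many samples and collecting the PoS values gives the distribution from which a sample size for, say, 80% of samples can be read off.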
Running the simulation
Select section 4 in the Matlab file CorrelationSim.m and run it
- It will take about 60 seconds to run.
You should see a plot of how r evolves with sample size for a large number of simulated samples, as in
the paper by Schönbrodt and Perugini (2013).
You should also have a histogram showing the distribution of the Point of Stability.
For what sample size do you think about 80% of the samples fall within the corridor of stability?
You can find out exactly by sorting the vector containing the PoS for all samples.
The vector contains 100 samples, so if we sort them from smallest to largest, the 80th
one will be the sample size for which 80% of samples fall within the corridor of stability:
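>> PoS_sorted=sort(PoS)   % sort the PoS values from smallest to largest
>> PoS_sorted(80)         % 80th value of the sorted vector (not of PoS itself)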
I get 228 as the point at which 80% of samples are in the corridor of stability.
However, you may get a slightly different value, as the simulation is random.
Does this agree with the value given by Schönbrodt and Perugini (2013)?
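The sort-and-index lookup above only works as written when the vector holds exactly 100 values. A small sketch of the same idea for any vector length and any proportion is given below. The vector PoS_example contains made-up numbers purely for illustration (it is not simulation output), and the names p, idx and nStable are assumptions.

```matlab
% Sketch: sample size by which a given proportion of samples are stable.
% PoS_example holds made-up values for illustration only.
PoS_example = [150 90 310 240 60 120 200 75 180 270];
p = 0.8;                              % proportion of samples required

PoS_sorted = sort(PoS_example);       % smallest to largest
idx = ceil(p * numel(PoS_sorted));    % position of the p-th fraction (8th of 10 here)
nStable = PoS_sorted(idx);
fprintf('%d%% of samples are stable by n = %d\n', round(100 * p), nStable);
```

With your own results, replace PoS_example with the PoS vector produced by the simulation. For a vector of 100 values and p = 0.8 this picks the 80th sorted value, as in the commands above.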
How close is close?
The width of the corridor of stability is calculated based on a w-value.
In his classic work on power, Jacob Cohen (1988) suggested that effect sizes of w=0.1, 0.3, 0.5
represented small, medium and large effect sizes for correlation, so a corridor of stability
bounded by w=0.1 (as in the Schönbrodt paper) allows for only a small deviation in effect
size between the sample and the population.
We can be more liberal by defining the corridor of stability using w=0.3 or w=0.5.
Try changing w in the Matlab file.
For w=0.3, at what sample size are 80% of simulations within the corridor of stability?