Measures of Variation
From our raw data, we were able to calculate a measure of central location. Although we found five measures of central location, we shall, for the remainder of this course, concentrate only on the arithmetic mean.
Having found the mean of our data set, we can now proceed to calculate a statistic that measures how much our observed data vary around that mean.
Suppose that we had two sections of students (X & Y) taking an exam graded out of ten.
Observation    X     Y
1              7     6
2              9     10
3              6     6
4              9     4
5              4     2
6              7     8
7              5     10
8              8     6
9              8     9
10             7     9
Σ              70    70
\(\bar{X} = \Sigma X / n = 70/10 = 7\) and \(\bar{Y} = \Sigma Y / n = 70/10 = 7\)
The mean of both data sets is 7, yet closer inspection reveals that there is greater variation in data set Y than in data set X. For starters, the top score in X is 9, and the low score is 4. By contrast the high and low scores in Y are 10 and 2 respectively.
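If you would like to check these figures yourself, here is a minimal Python sketch. The variable names (x, y, range_x and so on) are chosen purely for illustration and are not part of the course notation.

```python
# Exam scores for the two sections, copied from the table above.
x = [7, 9, 6, 9, 4, 7, 5, 8, 8, 7]
y = [6, 10, 6, 4, 2, 8, 10, 6, 9, 9]

# Arithmetic mean: sum of the observations divided by their count.
mean_x = sum(x) / len(x)
mean_y = sum(y) / len(y)

# Range: distance between the highest and the lowest score.
range_x = max(x) - min(x)
range_y = max(y) - min(y)

print(mean_x, mean_y)    # 7.0 7.0
print(range_x, range_y)  # 5 8
```

The means agree, but the range of Y is wider, which is the informal observation made above.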
But this is hardly rigorous. What we need is a statistic that is calculated from as much of our data as possible, not merely the high and low scores.
The statistics of choice will be the Variance, the Standard Deviation and the Coefficient of Variation.
But before I start throwing equations around the shop, I need to sell you on the idea of why we should be interested in a measure of variation. What does it tell us?
Consider first meteorology. The average July temperature in Buffalo NY is 69°F; the average July temperature in Seattle WA is also 69°F. However, the average January temperature in Buffalo is 25°F, while in Seattle it’s a balmy 41°F. The two cities look identical in July, yet Buffalo's temperatures swing much further over the year.
In finance, the riskiness of an asset is measured by the standard deviation (variation) of its price over a period of time (say, 250 days). Indeed, there is a relationship between an asset’s return and its riskiness: assets with low risk (such as US T-Bills) also have low returns, while assets with higher risk tend to have higher returns (for example, the stock of Google). Even in the universe of stocks, some are considered stable (General Electric and IBM, to name but two), while others are considered volatile, such as the stocks in the bio-technology sector. For those of you who might be interested, the translation of the word risk into Mandarin Chinese is 風險. It is sometimes read as meaning risk but also opportunity, not just plain risk.
In sports, the idea of variation is pegged to consistency. For example, in baseball the closer might not necessarily be the best pitcher on a team, but he’s probably the most consistent. In golf, major tournaments are decided over four rounds. The winner is rarely the golfer who scored the lowest round in the tournament, but is usually among the most consistent.
Anyhow, the variation found in a data set is measured in the following way.
Variance = \(\dfrac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}\)
We subtract the mean from each observation, square each deviation, and sum the squares. We then divide by the number of observations minus 1. The reason we have to square the deviations is that the simple sum of the deviations from the mean is always zero. The reason we divide by ‘n-1’ rather than ‘n’ is a bit tricky, but I’ll deal with that later. Let’s have a look at the X data from earlier, where we found the mean to be 7.
Observation    X     \(X - \bar{X}\)    \((X - \bar{X})^2\)
1              7     0      0
2              9     2      4
3              6     -1     1
4              9     2      4
5              4     -3     9
6              7     0      0
7              5     -2     4
8              8     1      1
9              8     1      1
10             7     0      0
Σ              70    0      24
Therefore our variance = 24 ÷ 9 = 2⅔ or 2.6667
We can repeat the process for our Y data and will find that its variance is 7.1111, or seven and one ninth.
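The calculation for both data sets can be sketched in Python as follows. The function name sample_variance is my own label for illustration; it simply follows the recipe described above (subtract the mean, square, sum, divide by n-1).

```python
def sample_variance(data):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(data)
    mean = sum(data) / n
    squared_deviations = [(value - mean) ** 2 for value in data]
    return sum(squared_deviations) / (n - 1)


x = [7, 9, 6, 9, 4, 7, 5, 8, 8, 7]
y = [6, 10, 6, 4, 2, 8, 10, 6, 9, 9]

print(round(sample_variance(x), 4))  # 2.6667
print(round(sample_variance(y), 4))  # 7.1111
```

For Y the squared deviations sum to 64, and 64 ÷ 9 gives the 7.1111 quoted above.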
So, the variance of the Y data is larger than the variance of the X data, which is what we expected when we first looked at the data. However, what we now want is a way to interpret 2.6667 and 7.1111. What do those numbers mean? The answer is not immediately obvious. We had to square the deviations from the mean in order to ensure that they did not sum to zero, but in doing so we inflated each deviation, so the variance is expressed in squared units (exam points squared) rather than in exam points.
The obvious thing to do would be to somehow undo the squaring, by taking the square root of the variance. This statistic is called the Standard Deviation.
Standard Deviation = \(\sqrt{\dfrac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}\)
So the standard deviation of X = \((2.6667)^{1/2}\) = 1.6330
And the standard deviation of Y = \((7.1111)^{1/2}\) = 2.6667
Note: The fact that the standard deviation of Y happened to be the variance of X was purely accidental. The data sets are independent of each other.
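As a cross-check, Python's standard library has a statistics module whose stdev and variance functions use the same n-1 divisor. The sketch below is only a confirmation of the figures above, not a required method for this course.

```python
from statistics import stdev, variance

x = [7, 9, 6, 9, 4, 7, 5, 8, 8, 7]
y = [6, 10, 6, 4, 2, 8, 10, 6, 9, 9]

sd_x = stdev(x)  # square root of the sample variance of X
sd_y = stdev(y)  # square root of the sample variance of Y

print(f"{sd_x:.4f}")  # 1.6330
print(f"{sd_y:.4f}")  # 2.6667

# The numerical coincidence mentioned in the note: the standard deviation
# of Y happens to equal the variance of X.
print(f"{variance(x):.4f}")  # 2.6667
```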
Now we can interpret the standard deviation.
Standard Deviation: The standard deviation is, loosely speaking, the typical deviation of the individual observations from their mean. Strictly, it is the square root of the average squared deviation, with n-1 in place of n.
For practical purposes, we are only really interested in the standard deviation. The fact that we have to calculate the variance first is neither here nor there.
Notation
The variance of a sample is denoted \(S^2\)
The standard deviation of a sample is denoted \(S\)
\(S^2\) The sample variance is a statistic. It is our best estimate of the population variance, which is denoted by \(\sigma^2\) (sigma squared). \(\sigma^2\) is a parameter.
\(S\) The sample standard deviation is a statistic. It is our best estimate of the population standard deviation, which is denoted by \(\sigma\) (sigma). \(\sigma\) is a parameter.
Recall that we referred to the mean of a data set as its first moment. The variance is called the second moment about the mean (the second central moment), and the standard deviation is its square root.
Degrees of Freedom
We can now return to the thorny issue of why we divided by (n-1) rather than simply (n).
The short answer is that we lost one degree of freedom, but I would venture to guess that this fact alone is not particularly helpful.
Formally,
Degrees of freedom are the number of independent pieces of information (our observations) that are available to estimate another piece of information. More concretely, the number of degrees of freedom is the number of independent observations in a sample of data that are available to estimate a parameter of the population from which that sample is drawn.
Example 1) If we have two observations, when calculating the mean we have two independent observations; however, when calculating the variance, we have only one independent observation, since the two observations are equally distant from the mean.
Example 2) Suppose we have three observations (n = 3). If I tell you that the arithmetic mean of this data set is five (5), I have lost a degree of freedom. Otherwise stated, only two of the original three values can actually vary; the third has to be fixed and can no longer vary.
I have three volunteers: Tom, Dick and Harry, who are free to choose any number they wish, but I tell them that the mean of their choices must be 5.
Tom scratches his head and comes up with three (3).
Dick chooses nine (9).
Now Harry, unlike Tom & Dick, cannot choose any number he wants; he is constrained by the fact that the mean is five (5). Therefore he must say three (3), because only three can give us a mean of five: (3 + 9 + 3)/3 = 5.
Thus, having calculated the mean, we no longer have (n) values that can vary; we now only have (n-1), since the last has to be fixed.
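The Tom, Dick and Harry story can be sketched in a few lines of Python. The helper forced_last_value is a hypothetical name of my own; it just rearranges the definition of the mean, since a mean of 5 over three numbers requires a total of 15.

```python
def forced_last_value(chosen, target_mean):
    """Given all but one value, return the one value that hits target_mean."""
    n = len(chosen) + 1  # the chosen values plus the one still to be fixed
    return target_mean * n - sum(chosen)


tom, dick = 3, 9
harry = forced_last_value([tom, dick], target_mean=5)

print(harry)                     # 3
print((tom + dick + harry) / 3)  # 5.0

# The same constraint is why deviations from the mean always sum to zero.
deviations = [value - 5 for value in (tom, dick, harry)]
print(sum(deviations))           # 0
```

Whatever Tom and Dick choose, the function returns exactly one value for Harry, which is the sense in which only n-1 of the values are free.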
Alternative Way to Calculate the Standard Deviation/Variance
The formula given earlier on in this note has the advantage of being intuitive: we can immediately see that we are summing squares of deviations from the mean. Pedagogically this is a desirable quality. Unfortunately, it is not computationally efficient, in the sense that we can compute the standard deviation in fewer steps.
Rather than simply give you the alternative formula, I will derive it for you. I do this not to aggravate you, or to show off, but I want to give you a sense of what Mathematical Statistics looks like. Since this is a course in Business Statistics, you are not required to learn this, but I believe you will benefit from the exercise.
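We will start with the variance
\(S^2 = \dfrac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}\) (1)
Since I am not going to play with the denominator, I will omit it for clarity and bring it back later. Again for clarity, I’ll also lose the super & subscripts from the Sigma.
\(\sum (X - \bar{X})^2\) (2)
I expand the term in the bracket
\(\sum (X^2 - 2X\bar{X} + \bar{X}^2)\) (3)
Now, I will run the sigma operator through the equation. We distribute Σ over each term in exactly the same way as we would a constant (like a fixed number) multiplying the bracket.
\(\sum X^2 - 2\sum X\bar{X} + \sum \bar{X}^2\) (4)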
Now before this gets too unmanageable, why don’t we concoct a little data set to help us unravel the three above terms.
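Suppose X = {4, 8, 6}. So \(\Sigma X = 18\), \(n = 3\) and \(\bar{X} = 6\)
X     \(\bar{X}^2\)    \(X\bar{X}\)    \(X^2\)
4     36     24     16
8     36     48     64
6     36     36     36
\(\Sigma X = 18\)    \(\Sigma \bar{X}^2 = 108\)    \(\Sigma X\bar{X} = 108\)    \(\Sigma X^2 = 116\)
So, we notice that
a) \(\Sigma \bar{X}^2 = \Sigma X\bar{X}\). Think about why this has to be so.
b) \(\Sigma \bar{X}^2 = n\bar{X}^2\)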
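To convince yourself that a) and b) are not a fluke of this little data set, the sketch below computes the three quantities for any list of numbers. The function name identity_terms is my own invention for illustration.

```python
def identity_terms(data):
    """Return (sum of mean squared, sum of X times mean, n times mean squared)."""
    n = len(data)
    x_bar = sum(data) / n
    sum_xbar_sq = sum(x_bar ** 2 for _ in data)      # add the squared mean once per observation
    sum_x_xbar = sum(value * x_bar for value in data)  # add X times the mean for each observation
    return sum_xbar_sq, sum_x_xbar, n * x_bar ** 2


print(identity_terms([4, 8, 6]))                        # (108.0, 108.0, 108.0)
print(identity_terms([7, 9, 6, 9, 4, 7, 5, 8, 8, 7]))   # (490.0, 490.0, 490.0)
```

All three numbers agree for the concocted data set and for the original exam scores.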
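Returning to equation (4)
\(\sum X^2 - 2\sum X\bar{X} + \sum \bar{X}^2\) (4)
From a) above we get
\(\sum X^2 - 2\sum \bar{X}^2 + \sum \bar{X}^2\) (5)
\(\sum X^2 - \sum \bar{X}^2\) (6)
From b) above we get
\(\sum X^2 - n\bar{X}^2\) (7)
Replacing the denominator we get the variance
Variance = \(\dfrac{\sum X^2 - n\bar{X}^2}{n - 1}\)
And, taking the square root we recover the standard deviation
Standard Deviation = \(\sqrt{\dfrac{\sum X^2 - n\bar{X}^2}{n - 1}}\) (8)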
We can now double check whether this new equation is actually correct, using our original X data.
Observation    X     \(X^2\)
1              7     49
2              9     81
3              6     36
4              9     81
5              4     16
6              7     49
7              5     25
8              8     64
9              8     64
10             7     49
\(\Sigma X = 70\)    \(\Sigma X^2 = 514\)
\(n = 10\)
\(\bar{X} = 7\)
Standard Deviation = \(\sqrt{\dfrac{514 - 10(7)^2}{10 - 1}} = \sqrt{\dfrac{24}{9}}\) = 1.6330 Yes!!!
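I don’t care which method you use as long as the answer is correct. Calculators and simple programs often use the above method because it is computationally more efficient than the mean deviation method: it needs only the running totals \(\Sigma X\) and \(\Sigma X^2\), with no need to compute each deviation from the mean. Be aware, though, that with very large values it can lose precision on a computer, because it subtracts two large numbers that are nearly equal.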
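Here is a short Python sketch that computes the standard deviation both ways for the X data, so you can see that the two formulas agree. The function names stdev_definition and stdev_shortcut are labels I made up for the two methods.

```python
import math


def stdev_definition(data):
    """Square root of the summed squared deviations from the mean over n - 1."""
    n = len(data)
    x_bar = sum(data) / n
    return math.sqrt(sum((value - x_bar) ** 2 for value in data) / (n - 1))


def stdev_shortcut(data):
    """Equation (8): uses only n, the sum of X and the sum of X squared."""
    n = len(data)
    sum_x = sum(data)
    sum_x_sq = sum(value * value for value in data)
    x_bar = sum_x / n
    return math.sqrt((sum_x_sq - n * x_bar ** 2) / (n - 1))


x = [7, 9, 6, 9, 4, 7, 5, 8, 8, 7]
print(f"{stdev_definition(x):.4f}")  # 1.6330
print(f"{stdev_shortcut(x):.4f}")    # 1.6330
```

The shortcut version never computes an individual deviation; it only accumulates two totals while reading through the data.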
The Coefficient of Variation
Suppose we have a random variable X with sample mean \(\bar{X}\) and sample standard deviation \(S\).
The coefficient of variation is given by: Coefficient of Variation = \(\dfrac{S}{\bar{X}}\)
This quantity, which gives the standard deviation as a proportion of the mean, is sometimes informative. For example, the value S = 10 has little meaning unless we can compare it to something else. If S is observed to be 10 and \(\bar{X}\) is observed to be 1,000, the amount of variation is small relative to the size of the mean. However, if S is observed to be 10 and \(\bar{X}\) is observed to be 5, the variation is quite large relative to the size of the mean.
Example: In statistics, the term precision has a special meaning.
Precision refers to the variation in repeated measurements: the smaller the variation, the more precise the instrument.
Suppose we were interested in testing a measuring instrument, such as those stupid plastic things nurses shove into one's ear to take your temperature. A coefficient of variation of 10/1,000 = 0.01 might be acceptable. However, a coefficient of variation of 10/5 = 2 might be unacceptable.
Example: We have two stocks: ABC Corp. and XYZ Corp. Which has more risk?
ABC has a standard deviation of $12 and an average price of $50.
XYZ has a standard deviation of $6 and an average price of $24.
Coefficient of Variation for ABC Corp. is $12/$50 = 0.24
Coefficient of Variation for XYZ Corp. is $6/$24 = 0.25
Remember, in finance less risky is good, more risky is bad.
Thus, ABC Corp. is slightly less risky (but not by much).
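The comparison of the two stocks can be sketched in Python like this. The function name coefficient_of_variation and the variable names are my own; the figures are the ones given in the example.

```python
def coefficient_of_variation(std_dev, mean):
    """Standard deviation expressed as a proportion of the mean."""
    return std_dev / mean


cv_abc = coefficient_of_variation(12, 50)  # ABC Corp.
cv_xyz = coefficient_of_variation(6, 24)   # XYZ Corp.
print(cv_abc, cv_xyz)  # 0.24 0.25

# The stock with the smaller coefficient of variation is the less risky one.
less_risky = "ABC Corp." if cv_abc < cv_xyz else "XYZ Corp."
print(less_risky)  # ABC Corp.
```

Notice that ABC has the larger standard deviation in dollars, yet the smaller variation relative to its price.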
Summary
1) Our main measure of variability is the sample Standard Deviation, denoted S.
2) S is given by either \(\sqrt{\dfrac{\sum (X - \bar{X})^2}{n - 1}}\) or \(\sqrt{\dfrac{\sum X^2 - n\bar{X}^2}{n - 1}}\)
3) \(S^2\) is the sample variance, given by either \(\dfrac{\sum (X - \bar{X})^2}{n - 1}\) or \(\dfrac{\sum X^2 - n\bar{X}^2}{n - 1}\)
4) The Coefficient of Variation, given by \(S/\bar{X}\), allows us to compare the relative variability of two data sets.
5) Degrees of freedom is the number of independent observations in a sample of data that are available to estimate a parameter of the population from which that sample is drawn. For the purposes of this course, whenever a mean is calculated we lose one degree of freedom. Later on in the course we will be dealing with two random variables and how they vary together (Covariance). Not surprisingly, if we calculate the mean of both variables for the sake of calculating their covariance, we lose two degrees of freedom.
6) S and \(S^2\) are sample statistics. They are our best estimates of the population standard deviation and variance, \(\sigma\) and \(\sigma^2\), which are population parameters.
7) The coefficient of variation for a random variable X, given by \(S/\bar{X}\), is also a sample statistic. It is our best estimate of the population coefficient of variation, which is a parameter given by \(\sigma/\mu\), where \(\mu\) is the population mean of X (a parameter) and \(\sigma\) is the population standard deviation of X (also a parameter). There is no ancient Greek letter for the Coefficient of Variation.
Incidentally, these ancient Greek letters were not chosen at random.
μ Is pronounced “mu”, chosen to represent the mean.
Σ Is the ancient Greek capital letter “sigma”, chosen to represent the sum.
σ Is the ancient Greek lower case letter “sigma”, chosen to represent the standard deviation.
Π Is the ancient Greek capital letter “pi”, chosen to represent the product.
π Is the ancient Greek lower case letter “pi”, which you know from grade school to represent the mathematical constant approximately equal to 3.14159. It represents the ratio of any circle's circumference to its diameter in Euclidean geometry.
Never be afraid of notation. It’s like manners: it’s there to put you at ease, not to frighten you.