Statistical Methods
Introduction
· In any field of inquiry or investigation, data is first obtained and is subsequently classified, analysed and tested for accuracy by statistical methods.
· Data that is obtained directly from an individual is called primary data.
· The census of 1991 is an example of collecting primary data relating to the population.
· The collection of data about the health and sickness of a population is also primary data.
· Data that is obtained from an outside source is called secondary data.
· If we are studying hospital records and want to use the census data, the census data becomes secondary data.
· Primary data gives the precise information wanted, which secondary data may not give.
TABULATION –
- Tables are devices for presenting data simply from masses of statistical data.
- Tabulation is the first step before the data is used for analysis or interpretation.
- A table can be simple or complex, depending upon the number or measurement of a single set or multiple sets of items.
- Whether simple or complex, there are certain general principles which should be borne in mind in designing tables:
  (a) The tables should be numbered, e.g., Table 1, Table 2, etc.
  (b) A title must be given to each table. The title must be brief and self-explanatory.
  (c) The headings of columns or rows should be clear and concise.
  (d) The data must be presented according to size or importance; chronologically, alphabetically or geographically.
  (e) If percentages or averages are to be compared, they should be placed as close together as possible.
  (f) No table should be too large.
  (g) Most people find a vertical arrangement better than a horizontal one, because it is easier to scan the data from top to bottom than from left to right.
  (h) Footnotes may be given, where necessary, providing explanatory notes or additional information.
- Some examples of tabulation are given below:
Simple Tables –
TABLE 1 - Population of some states in |
|
States |
Population 2001 |
Andhra
Pradesh MP UP |
75 727 541 82 878 796 60 385 118 116 052 859 |
TABLE 2 Population of |
Year | Population |
1901 | 238 396 000 |
1921 | 251 321 000 |
1981 | 685 185 000 |
1991 | 843 930 000 |
2001 | 1 027 015 247 |
Frequency Distribution Table –
- In a frequency distribution table, the data is first split up into convenient groups (class intervals) and the number of items (frequency) which occur in each group is shown in the adjacent column.
- Example: The following figures are the ages of patients admitted to a hospital with poliomyelitis. Construct a frequency distribution table.
- 8, 24, 18, 5, 6, 12, 4, 3, 3, 2, 3, 23, 9, 18, 16, 1, 2, 3, 5, 11, 13, 15, 9, 11, 11, 7, 10, 6, 9, 5, 16, 20, 4, 3, 3, 3, 10, 3, 2, 1, 6, 9, 3, 7, 14, 8, 1, 4, 6, 4, 15, 22, 2, 1, 4, 7, 1, 12, 3, 23, 4, 19, 6, 2, 2, 4, 14, 2, 2, 21, 3, 2, 9, 3, 2, 1, 7, 19
- The data given above may be conveniently analysed as shown below:
Age group | Frequency |
0-4 | 35 |
5-9 | 18 |
10-14 | 11 |
15-19 | 8 |
20-24 | 6 |
- The data analysed above is then presented as a frequency table, as shown below:
TABLE 3 - Age distribution of polio patients |
Age group | No. of patients |
0-4 | 35 |
5-9 | 18 |
10-14 | 11 |
15-19 | 8 |
20-24 | 6 |
- In the above example, the age is split into groups of five years. These are known as class intervals.
- The number of observations in each group is called the frequency.
- In constructing frequency distribution tables, the questions that arise are: Into how many groups should the data be split? And what class intervals should be chosen?
- As a practical rule, a maximum of 20 groups may conveniently be taken when there is a large amount of data, and a minimum of 5 groups when there is not much data.
- As far as possible, the class intervals should be equal, so that observations can be compared.
- The merits of a frequency distribution table are that it shows at a glance how many individual observations are in a group, and where the main concentration lies.
- It also shows the range and the shape of the distribution.
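The grouping step described above can be sketched in Python. This is only a minimal illustration, assuming equal class intervals of width 5 starting at 0 and whole-number ages; the function name `frequency_table` and the short list of ages are hypothetical and are not the hospital data.

```python
from collections import Counter

def frequency_table(values, width=5, start=0):
    """Count how many values fall in each equal-width class interval.

    Assumes whole-number values that are all >= start.
    """
    # Index of the class interval each value belongs to (0 for 0-4, 1 for 5-9, ...)
    counts = Counter((v - start) // width for v in values)
    table = []
    for k in range(max(counts) + 1):
        low = start + k * width
        high = low + width - 1
        table.append((f"{low}-{high}", counts.get(k, 0)))
    return table

# Hypothetical ages, for illustration only
ages = [2, 3, 7, 12, 4, 18, 21, 9, 1, 14]
for group, freq in frequency_table(ages):
    print(group, freq)
```

Keeping the class width as a parameter makes it easy to try fewer or more groups, in line with the practical rule of using between 5 and 20 groups.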
Charts & Diagrams
· Charts and diagrams are useful methods of presenting simple statistical data.
· They have a powerful impact on the imagination of people.
· Therefore, they are a popular medium for expressing statistical data, especially in newspapers and magazines.
· The impact of the picture depends on the way it is drawn.
· A few general remarks need to be made about charts and diagrams.
· Diagrams are better retained in the memory than statistical tables.
· The data that is to be presented by diagrams ought to be simple.
· Then there is no risk that the reader will misunderstand.
· However, simplicity may be obtained only at the expense of detail and accuracy.
· That is, a lot of the detail of the original data may be lost in charts and diagrams.
· If we want to study the data in full, we have to go back to the original data.
1. Bar Charts –
- Bar charts are merely a way of presenting a set of numbers by the length of a bar: the length of the bar is proportional to the magnitude to be represented.
- Bar charts are a popular medium for presenting statistical data because they are easy to prepare and enable values to be compared visually.
- The following are some examples of bar charts.
- (a) SIMPLE BAR CHART: Bars may be vertical or horizontal (Fig. 1 and Fig. 2). The bars are usually separated by appropriate spaces with an eye to neatness and clear presentation. A suitable scale must be chosen to present the length of the bars.
- (b) MULTIPLE BAR CHARTS: Fig. 3 gives an example of a multiple bar chart or a compound bar chart. Two or more bars can be grouped together. In Fig. 3, population and land area by region are compared.
- (c) COMPONENT BAR CHART: The bars may be divided into two or more parts, each part representing a certain item and proportional to the magnitude of that particular item (Fig. 4).
2. Histogram –
- It is a pictorial diagram of a frequency distribution.
- It consists of a series of blocks (Fig. 5).
3. Pie Charts –
- Instead of comparing the length of a bar, the areas of segments of a circle are compared. The area of each segment depends upon the angle. Pie charts are extremely popular with the laity, but not with statisticians, who consider them inferior to bar charts. It is often necessary to indicate the percentages in the segments (Fig. 8), as it may not always be easy to compare the areas of segments visually.
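Because the area of each segment depends on its angle, the angle can be worked out as the item's share of the total multiplied by 360 degrees. The sketch below assumes that relationship and uses the frequencies of Table 3 purely as sample input; the function name `pie_angles` is hypothetical.

```python
def pie_angles(frequencies):
    """Return the angle in degrees of each pie segment (share of total x 360)."""
    total = sum(frequencies)
    return [360 * f / total for f in frequencies]

# Frequencies from Table 3 (age distribution of polio patients)
freqs = [35, 18, 11, 8, 6]
angles = pie_angles(freqs)
percentages = [100 * f / sum(freqs) for f in freqs]

for angle, pct in zip(angles, percentages):
    # The percentage is what would be written inside the segment
    print(f"{angle:.1f} degrees, {pct:.1f}%")
```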
4. Pictograms –
- They are a popular method of presenting data to the "man in the street" and to those who cannot understand orthodox charts. Small pictures or symbols are used to present the data. For example, a picture of a doctor may be used to represent the population per physician (Fig. 9). Fractions of the picture can be used to represent numbers smaller than the value of a whole symbol. In essence, pictograms are a form of bar chart.
STATISTICAL AVERAGES
- The word "average" implies a value in the distribution, around which the other values are distributed. It gives a mental picture of the central value. There are several kinds of averages, of which the commonly used are: (1) The Arithmetic Mean, (2) The Median and (3) The Mode.
- The Mean - The arithmetic mean is widely used in statistical calculation. It is sometimes simply called the mean. To obtain the mean, the individual observations are first added together, and then divided by the number of observations. The operation of adding together is called 'summation' and is denoted by the sign $\sum$. An individual observation is denoted by the sign $x$ and the mean is denoted by the sign $\bar{x}$ (called "x bar").
- The mean ($\bar{x}$) is calculated thus: the diastolic blood pressure of 10 individuals was 83, 75, 81, 79, 71, 95, 75, 77, 84, 90. The total was 810. The mean is 810 divided by 10, which is 81.
- The advantages of the mean are that it is easy to calculate and understand. The disadvantage is that it may sometimes be unduly influenced by abnormal values in the distribution. Sometimes it may even look ridiculous; for instance, the average number of children born to a woman in a certain place was found to be 4.76, which never occurs in reality. Nevertheless, the arithmetic mean is by far the most useful of the statistical averages.
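The calculation of the mean can be checked with a few lines of Python. This is a minimal sketch; the function name `arithmetic_mean` is chosen here for illustration.

```python
def arithmetic_mean(observations):
    """Sum the observations and divide by how many there are."""
    return sum(observations) / len(observations)

# Diastolic blood pressure of the 10 individuals in the example
diastolic_bp = [83, 75, 81, 79, 71, 95, 75, 77, 84, 90]

print(sum(diastolic_bp))              # 810
print(arithmetic_mean(diastolic_bp))  # 81.0
```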
The Median –
- The median is an average of a different kind, which does not depend upon the total and number of items. To obtain the median, the data is first arranged in ascending or descending order of magnitude, and the value of the middle observation is then located.
- The median is 79, which is the value of the middle observation (Fig. 12).
- If there are 10 values instead of 9, the median is worked out by taking the average of the two middle values. That is, if the number of items or values is even, the practice is to take the average of the two middle values. For example, the diastolic blood pressure of 10 individuals was: Fig. 13.
- The relative merits of median and mean may be examined from the following example: The income of 7 people per day in Rupees was as follows:
- 5, 5, 5, 7, 10, 20, 102 (Total = 154)
- The mean is 154 divided by 7, which is 22; the median is 7, which is the value of the middle observation. In this example, the income of the seventh individual (102) has seriously affected the mean, whereas it has not affected the median. In an example of this kind the median is nearer the truth, and therefore more representative than the mean.
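The rule for odd and even numbers of values, and the comparison with the mean, can be sketched as follows. The function name `median` is an illustrative choice; the second data set reuses the ten diastolic readings from the mean example as an assumed even-sized case.

```python
def median(values):
    """Middle value of the ordered data; average of the two middle values if the count is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

# Daily income in Rupees of the 7 people in the example
incomes = [5, 5, 5, 7, 10, 20, 102]
print(sum(incomes) / len(incomes))  # 22.0, pulled upwards by the value 102
print(median(incomes))              # 7

# Even number of values: the two middle values (79 and 81) are averaged
print(median([83, 75, 81, 79, 71, 95, 75, 77, 84, 90]))  # 80.0
```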
The Mode –
- The mode is the most commonly occurring value in a distribution of data. It is the most frequent item or the most "fashionable" value in a series of observations. For example, the diastolic blood pressure of 20 individuals was:
- 85, 75, 81, 79, 71, 95, 75, 77, 75, 90,
- 71, 75, 79, 95, 75, 77, 84, 75, 81, 75
- The mode, or the most frequently occurring value, is 75. The advantages of the mode are that it is easy to understand and is not affected by extreme items. The disadvantages are that its exact location is often uncertain and it is often not clearly defined. Therefore, the mode is not often used in biological or medical statistics.
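Finding the mode amounts to counting how often each value occurs. The sketch below returns a list rather than a single number, an illustrative design choice that makes visible the case where two or more values tie and the mode is not clearly defined; the function name `modes` is hypothetical.

```python
from collections import Counter

def modes(values):
    """Return every value that occurs with the highest frequency."""
    counts = Counter(values)
    highest = max(counts.values())
    return sorted(v for v, c in counts.items() if c == highest)

# Diastolic blood pressure of the 20 individuals in the example
bp = [85, 75, 81, 79, 71, 95, 75, 77, 75, 90,
      71, 75, 79, 95, 75, 77, 84, 75, 81, 75]

print(modes(bp))            # [75]
print(modes([1, 1, 2, 2]))  # [1, 2], a tie: no single mode
```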
MEASURES OF DISPERSION
- The daily calorie requirement of a normal adult doing sedentary work is laid down as 2,400 calories. This clearly is not universally true.
- There must be individual variations. If we examine the data of blood pressure or heights or weights of a large group of individuals, we will find that the values vary from person to person. Even within the same subject, there may be variation from time to time. The questions that arise are: What is normal variation? And how is the variation to be measured?
- There are several measures of variation (or "dispersion" as it is technically called), of which the following are widely known: (a) The Range; (b) The Mean or Average Deviation; (c) The Standard Deviation.
(a) The Range –
- The range is by far the simplest measure of dispersion. It is defined as the difference between the highest and lowest figures in a given sample. Consider, for example, the following record of diastolic blood pressure of 10 individuals:
- 83, 75, 81, 79, 71, 90, 75, 95, 77, 94.
- It can be seen that the highest value was 95 and the lowest 71. The range is expressed as 71 to 95 or by the actual difference (24). If we have grouped data, the range is taken as the difference between the mid-points of the extreme categories. The range is not of much practical importance, because it indicates only the two extreme values and nothing about the dispersion of values between them.
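As a quick check of the example, the range needs only the lowest and highest values. A minimal sketch, with the function name `value_range` chosen for illustration:

```python
def value_range(values):
    """Return (lowest, highest, difference) for a set of ungrouped values."""
    lowest, highest = min(values), max(values)
    return lowest, highest, highest - lowest

bp = [83, 75, 81, 79, 71, 90, 75, 95, 77, 94]
lowest, highest, difference = value_range(bp)
print(f"{lowest} to {highest}")  # 71 to 95
print(difference)                # 24
```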
(b) The Mean Deviation –
- It is the average of the deviations from the arithmetic mean, ignoring the sign of each deviation.
- It is given by the formula:
- $\text{M.D.} = \dfrac{\sum |x - \bar{x}|}{n}$
- Example: The diastolic blood pressure of 10 individuals was as follows: 83, 75, 81, 79, 71, 95, 75, 77, 84 and 90. Find the mean deviation.
- Answer (Mean deviation)
Diastolic B.P. ($x$) | Arithmetic mean ($\bar{x}$) | Deviation from the mean ($x - \bar{x}$) |
83 | 81 | 2 |
75 | 81 | -6 |
81 | 81 | 0 |
79 | 81 | -2 |
71 | 81 | -10 |
95 | 81 | 14 |
75 | 81 | -6 |
77 | 81 | -4 |
84 | 81 | 3 |
90 | 81 | 9 |
Total = 810 | | Total = 56 (ignoring ± sign) |
- Mean = 810 / 10 = 81
- The Mean Deviation = 56 / 10 = 5.6
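The worked answer can be reproduced with a short Python sketch. It assumes, as in the table, that the sign of each deviation is ignored before averaging; the function name `mean_deviation` is illustrative.

```python
def mean_deviation(values):
    """Average of the absolute deviations from the arithmetic mean."""
    mean = sum(values) / len(values)
    return sum(abs(x - mean) for x in values) / len(values)

bp = [83, 75, 81, 79, 71, 95, 75, 77, 84, 90]
total_deviation = sum(abs(x - 81) for x in bp)

print(total_deviation)     # 56
print(mean_deviation(bp))  # 5.6
```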
(c) The Standard Deviation –
- The standard deviation is the most frequently used measure of dispersion. In simple terms, it is defined as the "Root-Mean-Square Deviation." It is denoted by the Greek letter sigma ($\sigma$) or by the initials S.D. The standard deviation is calculated from the basic formula:
- $\text{S.D.} = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n}}$
- When the sample size is more than 30, the above basic formula may be used without modification. For smaller samples, the above formula tends to underestimate the standard deviation, and therefore needs correction, which is done by substituting the denominator $(n - 1)$ for $n$. The modified formula is as follows:
- $\text{S.D.} = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n - 1}}$
- The steps involved in calculating the standard deviation are as follows:
- (a) First of all, take the deviation of each value from the arithmetic mean: $(x - \bar{x})$
- (b) Then, square each deviation: $(x - \bar{x})^2$
- (c) Add up the squared deviations: $\sum (x - \bar{x})^2$
- (d) Divide the result by the number of observations $n$ [or $(n - 1)$ in case the sample size is less than 30]
- (e) Then take the square root, which gives the standard deviation.
- Example: The diastolic blood pressure of 10 individuals was as follows: 83, 75, 81, 79, 71, 95, 75, 77, 84, 90. Calculate the standard deviation.
Answer
$x$ | $(x - \bar{x})$ | $(x - \bar{x})^2$ |
83 | 2 | 4 |
75 | -6 | 36 |
81 | 0 | 0 |
79 | -2 | 4 |
71 | -10 | 100 |
95 | 14 | 196 |
75 | -6 | 36 |
77 | -4 | 16 |
84 | 3 | 9 |
90 | 9 | 81 |
 | | Total = 482 |
- $\text{S.D.} = \sqrt{\dfrac{482}{10 - 1}} = \sqrt{53.56} \approx 7.32$
- The meaning of standard deviation can only be appreciated fully when we study it with reference to what is described as the normal curve. For the present, we may content ourselves with the basic significance of standard deviation: that it is an abstract number; that it gives us an idea of the 'spread' of the dispersion; and that the larger the standard deviation, the greater the dispersion of values about the mean.
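The five steps can be followed one by one in Python. This is a sketch under one stated assumption: the text gives $n$ for samples of more than 30 and $(n - 1)$ for samples of less than 30, so the treatment of a sample of exactly 30 (here, divided by $n$) is a choice made for illustration. The function name `standard_deviation` and its `threshold` parameter are hypothetical.

```python
import math

def standard_deviation(values, threshold=30):
    """Root-mean-square deviation, using (n - 1) for samples smaller than threshold."""
    n = len(values)
    mean = sum(values) / n
    deviations = [x - mean for x in values]   # step (a)
    squared = [d ** 2 for d in deviations]    # step (b)
    total = sum(squared)                      # step (c)
    denominator = n - 1 if n < threshold else n
    return math.sqrt(total / denominator)     # steps (d) and (e)

bp = [83, 75, 81, 79, 71, 95, 75, 77, 84, 90]
sum_of_squares = sum((x - 81) ** 2 for x in bp)

print(sum_of_squares)                       # 482
print(round(standard_deviation(bp), 2))     # 7.32
```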
SAMPLING
- When a large number of individuals, items or units have to be studied, we take a sample. It is easier and more economical to study the sample than the whole population or universe. Great care is therefore taken in obtaining a sample. It is important to ensure that the group of people or items included in the sample is representative of the whole population to be studied.
The Sampling Frame –
- Once the universe has been defined, a sampling frame must be prepared. A sampling frame is a listing of the members of the universe from which the sample is to be drawn. The accuracy and completeness of the sampling frame influence the quality of the sample drawn from it.
Sampling Methods –
- The following three methods are most commonly used:
- (1) Simple random sample: This is done by assigning a number to each of the units (the individuals or households) in the sampling frame. A table of random numbers is then used to determine which units are to be included in the sample. Random numbers are a haphazard collection of numbers, arranged so as to eliminate personal selection or unconscious bias in taking out the sample. With this procedure, the sample is drawn in such a way that each unit has an equal chance of being drawn in the sample. This technique provides the greatest number of possible samples.
- (2) Systematic random sample: This is done by picking every 5th or 10th unit at regular intervals. For example, to carry out a filaria survey in a town, we take a 10 per cent sample. The houses are numbered first. Then a number is selected at random between 1 and 10 (say four). Then every 10th number is selected from that point on: 4, 14, 24, 34, etc. By this method, each unit in the sampling frame would have the same chance of being selected, but the number of possible samples is greatly reduced.
- (3) Stratified random sample: The sample is deliberately drawn in a systematic way so that each portion of the sample represents a corresponding stratum of the universe. This method is particularly useful where one is interested in analysing the data by a certain characteristic of the population, viz. Hindus, Christians, Muslims, age-groups etc., as we know these groups are not equally distributed in the population.
- It is useful to note at this stage that Greek letters are usually used to refer to population characteristics: mean ($\mu$), standard deviation ($\sigma$); and Roman letters to indicate sample characteristics: mean ($\bar{x}$), standard deviation ($s$).
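The three sampling methods can be sketched in Python. Several things here are assumptions made for illustration: a seeded pseudo-random generator stands in for a printed table of random numbers, the frame is a hypothetical list of 100 numbered houses, the two age strata are invented, and the stratified sample draws the same fraction from every stratum (proportional allocation). All function names are hypothetical.

```python
import random

def simple_random_sample(frame, size, rng):
    """Every unit in the frame has an equal chance of being drawn."""
    return rng.sample(frame, size)

def systematic_sample(frame, interval, rng):
    """Pick a random start between 1 and interval, then every interval-th unit."""
    start = rng.randint(1, interval)  # e.g. 4 when the interval is 10
    return frame[start - 1::interval]

def stratified_sample(strata, fraction, rng):
    """Draw the same fraction at random from each stratum (name -> list of units)."""
    sample = {}
    for name, units in strata.items():
        k = max(1, round(len(units) * fraction))
        sample[name] = rng.sample(units, k)
    return sample

rng = random.Random(0)          # stands in for a table of random numbers
houses = list(range(1, 101))    # hypothetical frame: houses numbered 1 to 100

print(simple_random_sample(houses, 10, rng))
print(systematic_sample(houses, 10, rng))

# Hypothetical strata of unequal size
strata = {"under 15": list(range(1, 41)), "15 and over": list(range(41, 101))}
print(stratified_sample(strata, 0.1, rng))
```

With an interval of 10 and a start of 4, `systematic_sample` returns houses 4, 14, 24, 34 and so on, matching the filaria survey example.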
Sampling Errors –
- If we take repeated samples from the same population or universe, the results obtained from one sample will differ to some extent from the results of another sample. This type of variation from one sample to another is called sampling error. It occurs because data were gathered from a sample rather than from the entire population of concern. Presuming that the sampling procedure is such that all the individuals in the population have an equal chance of entering the sample, the factors that influence the sampling error are: (a) the size of the sample and (b) the natural variability of the individual readings. As the size of the sample increases, sampling error will decrease. If the individual readings vary widely from one another, we get more variability from one sample to another.
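Both influences on sampling error can be seen in a small simulation. This is a sketch with hypothetical populations of readings; it measures sampling error as the spread (standard deviation) of the means of repeated random samples, which is one reasonable way to quantify it. The function name `spread_of_sample_means` is illustrative.

```python
import random
import statistics

def spread_of_sample_means(population, sample_size, repeats, rng):
    """Standard deviation of the means of repeated random samples."""
    means = [statistics.mean(rng.sample(population, sample_size))
             for _ in range(repeats)]
    return statistics.pstdev(means)

rng = random.Random(42)

# Hypothetical populations of readings: one widely spread, one narrowly spread
wide_population = list(range(60, 111))    # readings from 60 to 110
narrow_population = list(range(80, 91))   # readings from 80 to 90

small_samples = spread_of_sample_means(wide_population, 5, 500, rng)
large_samples = spread_of_sample_means(wide_population, 40, 500, rng)
narrow_small = spread_of_sample_means(narrow_population, 5, 500, rng)

# (a) larger samples give less variation from one sample to another
print(small_samples, large_samples)
# (b) less variable readings give less variation from one sample to another
print(small_samples, narrow_small)
```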
Non-Sampling Errors –
- The sampling error is not the only error which arises in a sample survey. Errors may occur due to inadequately calibrated instruments, due to observer variation, as well as due to incomplete coverage achieved in examining the subjects selected and conceptual errors. These are often more important than the sampling errors.
Standard Error –
- If we take a random sample (of size $n$) from the population, and similar samples over and over again, we will find that every sample will have a different mean ($\bar{x}$).
Confidence limits | Normal deviate (N.D.) = $(\bar{x} - \mu)$ / S.E. | Significance |
$\mu$ is outside the 95 per cent confidence limits | N.D. > 2 | P < 0.05; Significant at 5% level |
$\mu$ is just within the 95 per cent confidence limits | N.D. = 2 | P = 0.05; Just significant at 5% level |
$\mu$ is within the 95 per cent confidence limits | N.D. < 2 | P > 0.05; Not significant at 5% level |
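The decision rule in the table can be sketched in Python. Two assumptions are made here and should be read as such: the denominator of the normal deviate is taken to be the standard error of the sample mean, and the size of the deviate is compared with 2 regardless of its sign. The function names and the numbers in the example are hypothetical.

```python
def normal_deviate(sample_mean, population_mean, standard_error):
    """Distance between sample mean and population mean, in standard errors."""
    return abs(sample_mean - population_mean) / standard_error

def significance(nd, cutoff=2.0):
    """Apply the three rows of the table to a normal deviate."""
    if nd > cutoff:
        return "P < 0.05; significant at 5% level"
    if nd == cutoff:
        return "P = 0.05; just significant at 5% level"
    return "P > 0.05; not significant at 5% level"

# Hypothetical figures, for illustration only
for sample_mean, standard_error in [(85.0, 1.0), (84.0, 1.5), (82.0, 1.0)]:
    nd = normal_deviate(sample_mean, 81.0, standard_error)
    print(nd, significance(nd))
```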