Statistical Methods


Introduction

· In any field of inquiry or investigation, data is first obtained which is subsequently classified, analysed and tested for accuracy by statistical methods.

· Data that is obtained directly from an individual is called primary data.

· The census of 1991 is an example of collecting primary data relating to the population.

· The collection of data about the health and sickness of a population is primary data.

· Data that is obtained from an outside source is called secondary data.

· If we are studying the hospital records and want to use the census data, the census data becomes secondary data.

· Primary data gives precisely the information wanted, which secondary data may not.

TABULATION –

Tables are a device for presenting masses of statistical data in a simple form.
Tabulation is the first step before the data is used for analysis or interpretation.
A table can be simple or complex, depending upon the number or measurement of a single set or multiple sets of items.
Whether simple or complex, there are certain general principles which should be borne in mind in designing tables:
(a) The tables should be numbered, e.g. Table 1, Table 2, etc.
(b) A title must be given to each table. The title must be brief and self-explanatory.
(c) The headings of columns or rows should be clear and concise.
(d) The data must be presented according to size or importance; chronologically, alphabetically or geographically.
(e) If percentages or averages are to be compared, they should be placed as close together as possible.
(f) No table should be too large.
(g) Most people find a vertical arrangement better than a horizontal one, because it is easier to scan the data from top to bottom than from left to right.
(h) Footnotes may be given, where necessary, providing explanatory notes or additional information.
Some examples of tabulation are given below:

Simple Tables –

TABLE 1 - Population of some states in India*

States	Population 2001
Andhra Pradesh	75 727 541
Bihar	82 878 796
[state not named]	60 385 118
[state not named]	116 052 859

TABLE 2 - Population of India*

Year	Population
1901	238 396 000
1921	251 321 000
1981	685 185 000
1991	843 930 000
2001	1 027 015 247

Frequency Distribution Table –

In a frequency distribution table, the data is first split up into convenient groups (class intervals) and the number of items (frequency) which occur in each group is shown in the adjacent column.
Example: The following figures are the ages of patients admitted to a hospital with poliomyelitis. Construct a frequency distribution table.
8, 24, 18, 5, 6, 12, 4, 3, 3, 2, 3, 23, 9, 18, 16, 1, 2, 3, 5, 11, 13, 15, 9, 11, 11, 7, 10, 6, 9, 5, 16, 20, 4, 3, 3, 3, 10, 3, 2, 1, 6, 9, 3, 7, 14, 8, 1, 4, 6, 4, 15, 22, 2, 1, 4, 7, 1, 12, 3, 23, 4, 19, 6, 2, 2, 4, 14, 2, 2, 21, 3, 2, 9, 3, 2, 1, 7, 19
Tallying these figures by five-year age group gives the frequency table below:

TABLE 3 - Age distribution of polio patients

Age group	No. of patients
0-4	35
5-9	19
10-14	10
15-19	8
20-24	6
Total	78

In the above example, the ages are split into groups of five years. These are known as class intervals.
The number of observations in each group is called the frequency.
In constructing frequency distribution tables, the questions that arise are: into how many groups should the data be split, and what class intervals should be chosen?
As a practical rule, a large body of data can conveniently be split into a maximum of 20 groups, and a small one into a minimum of 5 groups.
As far as possible, the class intervals should be equal, so that observations can be compared.
The merit of a frequency distribution table is that it shows at a glance how many individual observations are in each group, and where the main concentration lies.
It also shows the range, and the shape of distribution.
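The grouping into class intervals can also be done mechanically. The following is a minimal Python sketch, not part of the original article; the function name `frequency_table` is hypothetical, and the entry printed as "106" in the list of ages is read here as "10, 6", which is an assumption.

```python
from collections import Counter

# Ages of the polio patients from the example.
# Assumption: the entry printed as "106" is read as the two ages 10 and 6.
ages = [
    8, 24, 18, 5, 6, 12, 4, 3, 3, 2, 3, 23, 9, 18, 16, 1, 2, 3, 5, 11,
    13, 15, 9, 11, 11, 7, 10, 6, 9, 5, 16, 20, 4, 3, 3, 3, 10, 3, 2, 1,
    6, 9, 3, 7, 14, 8, 1, 4, 6, 4, 15, 22, 2, 1, 4, 7, 1, 12, 3, 23,
    4, 19, 6, 2, 2, 4, 14, 2, 2, 21, 3, 2, 9, 3, 2, 1, 7, 19,
]


def frequency_table(values, width=5):
    """Count how many values fall in each class interval of equal width."""
    # Map every value to the lower limit of its class interval.
    counts = Counter((v // width) * width for v in values)
    table = []
    for lower in range(0, max(values) + 1, width):
        label = f"{lower}-{lower + width - 1}"
        table.append((label, counts.get(lower, 0)))
    return table


table = frequency_table(ages)
for label, frequency in table:
    print(f"{label:>6}  {frequency}")
```

Changing `width` shows how the choice of class interval alters the number of groups and the apparent shape of the distribution.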

Charts & Diagrams

· Charts and diagrams are useful methods of presenting simple statistical data.

· They have a powerful impact on the imagination of people.

· Therefore, they are a popular medium for expressing statistical data, especially in newspapers and magazines.

· The impact of the picture depends on the way it is drawn.

· A few general remarks need be mentioned about charts and diagrams.

· Diagrams are better retained in the memory than statistical tables.

· The data that is to be presented by diagrams ought to be simple.

· Then there is no risk that the reader will misunderstand.

· However, simplicity may be obtained only at the expense of details and accuracy.

· That is, a lot of the detail in the original data may be lost in charts and diagrams.

· If we want the real study, we have to go back to the original data.

1. Bar Charts –

Bar charts are merely a way of presenting a set of numbers by the length of a bar - the length of the bar is proportional to the magnitude to be represented.
Bar charts are a popular medium for presenting statistical data because they are easy to prepare, and enable values to be compared visually.
The following are some examples of bar charts.
(a) SIMPLE BAR CHART: Bars may be vertical or horizontal (Fig. 1 and Fig. 2). The bars are usually separated by appropriate spaces with an eye to neatness and clear presentation. A suitable scale must be chosen to present the length of the bars.

(b) MULTIPLE BAR CHARTS: Fig. 3 gives an example of a multiple bar chart or a compound bar chart. Two or more bars can be grouped together. In Fig. 3, population and land area by region are compared.
(c) COMPONENT BAR CHART: The bars may be divided into two or more parts, each part representing a certain item and proportional to the magnitude of that particular item (Fig. 4).

2. Histogram –

It is a pictorial diagram of frequency distribution.
It consists of a series of blocks (Fig. 5).
The class intervals are given along the horizontal axis and the frequencies along the vertical axis. The area of each block or rectangle is proportional to the frequency. Fig.5 is the histogram of the frequency distribution of blood pressure in females 45-64 years.

3. Pie Charts –

Instead of comparing the length of a bar, the areas of segments of a circle are compared. The area of each segment depends upon the angle. Pie charts are extremely popular with the laity, but not with statisticians, who consider them inferior to bar charts. It is often necessary to indicate the percentages in the segments (Fig. 8), as it is sometimes not very easy to compare the areas of segments visually.

4. Pictograms –

They are a popular method of presenting data to the "man in the street" and to those who cannot understand orthodox charts. Small pictures or symbols are used to present the data. For example, a picture of a doctor may be used to represent the population per physician (Fig. 9). Fractions of the picture can be used to represent numbers smaller than the value of a whole symbol. In essence, pictograms are a form of bar chart.

STATISTICAL AVERAGES

The word "average" implies a value in the distribution, around which the other values are distributed. It gives a mental picture of the central value. There are several kinds of averages, of which the commonly used are: - (1) The Arithmetic Mean, (2) Median and (3) The Mode.
The Mean - The arithmetic mean is widely used in statistical calculation. It is sometimes simply called the mean. To obtain the mean, the individual observations are first added together, and then divided by the number of observations. The operation of adding together is called 'summation' and is denoted by the sign Σ or S. The individual observation is denoted by the sign x and the mean is denoted by the sign x̄ (called "x bar").
The mean (x̄) is calculated thus: the diastolic blood pressure of 10 individuals was 83, 75, 81, 79, 71, 95, 75, 77, 84, 90. The total was 810. The mean is 810 divided by 10, which is 81.
The advantages of the mean are that it is easy to calculate and understand. The disadvantage is that it may sometimes be unduly influenced by abnormal values in the distribution. Sometimes it may even look ridiculous; for instance, the average number of children born to a woman in a certain place was found to be 4.76, which never occurs in reality. Nevertheless, the arithmetic mean is by far the most useful of the statistical averages.
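The calculation of the mean can be checked with a few lines of Python. This is only an illustrative sketch; the function name `arithmetic_mean` is hypothetical and not taken from the article.

```python
# Diastolic blood pressure of the 10 individuals in the example above.
readings = [83, 75, 81, 79, 71, 95, 75, 77, 84, 90]


def arithmetic_mean(values):
    """Add the observations together, then divide by their number."""
    return sum(values) / len(values)


total = sum(readings)
mean_bp = arithmetic_mean(readings)
print(total, mean_bp)  # 810 81.0
```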

The Median –

The median is an average of a different kind, which does not depend upon the total and number of items. To obtain the median, the data is first arranged in an ascending or descending order of magnitude, and then the value of the middle observation is located, which is called the median. For example, the diastolic blood pressure of 9 individuals was as follows (Fig. 11).
The median is 79 which is the value of the middle observation (Fig. 12).
If there are 10 values instead of 9, the median is worked out by taking the average of the two middle values. That is, if the number of items or values is even, the practice is to take the average of the two middle values. For example, the diastolic blood pressure of 10 individuals was: Fig. 13.
In the example given, the median will be (79 + 81) divided by 2, which is 80 (Fig. 14).
The relative merits of median and mean may be examined from the following example: The income of 7 people per day in Rupees was as follows:
5, 5, 5, 7, 10, 20, 102 (Total = 154)
The mean is 154 divided by 7, which is 22; the median is 7, which is the value of the middle observation. In this example, the income of the seventh individual (102) has seriously affected the mean, whereas it has not affected the median. In an example of this kind the median is nearer the truth, and therefore more representative than the mean.
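The rule for the median (middle value for an odd number of items, average of the two middle values for an even number) might be sketched in Python as follows. The function name `median` is chosen for illustration, and the incomes are those of the example above.

```python
def median(values):
    """Return the middle value of the ordered data.

    With an even number of items, the two middle values are averaged.
    """
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


# Daily income in Rupees of the 7 people in the example.
incomes = [5, 5, 5, 7, 10, 20, 102]
income_mean = sum(incomes) / len(incomes)
income_median = median(incomes)
print(income_mean, income_median)  # 22.0 7
```

Removing the extreme income of 102 changes the mean considerably but moves the median very little, which is the point the example makes.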

The Mode –

The mode is the most commonly occurring value in a distribution of data. It is the most frequent item or the most "fashionable" value in a series of observations. For example, the diastolic blood pressure of 20 individuals was:
85,75,81,79,71,95,75,77,75,90,
71,75,79,95,75,77,84,75,81,75
The mode, or the most frequently occurring value, is 75. The advantages of the mode are that it is easy to understand and is not affected by extreme items. The disadvantage is that its exact location is often uncertain and not clearly defined. Therefore, the mode is not often used in biological or medical statistics.
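As an illustration, the mode can be found by counting how often each value occurs. The sketch below is an assumption about how one might do this in Python; the function name `modes` is hypothetical. It returns a list, because two or more values may tie for the highest frequency, which is one reason the mode is sometimes not clearly defined.

```python
from collections import Counter

# Diastolic blood pressure of the 20 individuals in the example above.
bp = [85, 75, 81, 79, 71, 95, 75, 77, 75, 90,
      71, 75, 79, 95, 75, 77, 84, 75, 81, 75]


def modes(values):
    """Return every value that occurs with the highest frequency."""
    counts = Counter(values)
    highest = max(counts.values())
    return sorted(v for v, c in counts.items() if c == highest)


bp_modes = modes(bp)
print(bp_modes)  # [75]
```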

MEASURES OF DISPERSION

The daily calorie requirement of a normal adult doing sedentary work is laid down as 2,400 calories. This clearly is not universally true.
There must be individual variations. If we examine the data of blood pressure or heights or weights of a large group of individuals, we will find that the values vary from person to person. Even within the same subject, there may be variation from time to time. The questions that arise are: What is normal variation? And how is the variation to be measured?
There are several measures of variation (or "dispersion" as it is technically called) of which the following are widely known: (a) The Range; (b) The Mean or Average Deviation; (c) The Standard Deviation.

(a) The Range –

The range is by far the simplest measure of dispersion. It is defined as the difference between the highest and lowest figures in a given sample. For example, consider the following record of the diastolic blood pressure of 10 individuals:
83, 75, 81, 79, 71, 90, 75, 95, 77, 94
The highest value is 95 and the lowest 71. The range is expressed as 71 to 95 or by the actual difference (24). If we have grouped data, the range is taken as the difference between the mid-points of the extreme categories. The range is not of much practical importance, because it indicates only the two extreme values and says nothing about the dispersion of the values between them.
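A short Python sketch of the range calculation, using the ten readings from the example (the variable names are illustrative only):

```python
# Diastolic blood pressure of the 10 individuals in the range example.
bp_range_data = [83, 75, 81, 79, 71, 90, 75, 95, 77, 94]

lowest = min(bp_range_data)
highest = max(bp_range_data)
value_range = highest - lowest

print(f"Range: {lowest} to {highest} (difference {value_range})")
```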

(b) The Mean Deviation –

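It is the average of the deviations from the arithmetic mean, taken without regard to sign.
It is given by the formula:
M.D. = Σ|x − x̄| / n
where x is an individual observation, x̄ is the arithmetic mean and n is the number of observations.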
Example: The diastolic blood pressure of 10 individuals was as follows: 83, 75,81, 79, 71, 95, 75, 77,84 and 90. Find the mean deviation.
Answer (Mean deviation)

Diastolic B.P. (x)	Arithmetic Mean (x̄)	Deviation from the Mean (x − x̄)
83	81	2
75	81	-6
81	81	0
79	81	-2
71	81	-10
95	81	14
75	81	-6
77	81	-4
84	81	3
90	81	9
Total = 810		Total = 56 (ignoring ± sign)

Mean = 810 / 10 = 81
The Mean Deviation = 56/10 = 5.6
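The worked answer above can be reproduced with the following Python sketch. The function name `mean_deviation` is hypothetical; the sketch simply averages the absolute deviations from the mean.

```python
# Diastolic blood pressure of the 10 individuals in the example.
bp_md = [83, 75, 81, 79, 71, 95, 75, 77, 84, 90]


def mean_deviation(values):
    """Average of the deviations from the mean, ignoring their sign."""
    mean = sum(values) / len(values)
    return sum(abs(x - mean) for x in values) / len(values)


md = mean_deviation(bp_md)
print(round(md, 1))  # 5.6
```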

(c) The Standard Deviation –

The standard deviation is the most frequently used measure of dispersion. In simple terms, it is defined as the "Root-Mean-Square Deviation". It is denoted by the Greek letter sigma (σ) or by the initials S.D. The standard deviation is calculated from the basic formula:
S.D. = √[ Σ(x − x̄)² / n ]
where n is the number of observations. When the sample size is more than 30, the above basic formula may be used without modification. For smaller samples, the above formula tends to underestimate the standard deviation, and therefore needs correction, which is done by substituting the denominator (n − 1) for n. The modified formula is as follows:
S.D. = √[ Σ(x − x̄)² / (n − 1) ]
The steps involved in calculating the standard deviation are as follows:
(a) First of all, take the deviation of each value from the arithmetic mean: (x − x̄)
(b) Then, square each deviation: (x − x̄)²
(c) Add up the squared deviations: Σ(x − x̄)²
(d) Divide the result by the number of observations n [or by (n − 1) if the sample size is less than 30]
(e) Then take the square root, which gives the standard deviation.
Example: The diastolic blood pressure of 10 individuals was as follows: 83, 75, 81, 79, 71, 95, 75, 77, 84, 90. Calculate the standard deviation.

Answer
x	(x − x̄)	(x − x̄)²
83	2	4
75	-6	36
81	0	0
79	-2	4
71	-10	100
95	14	196
75	-6	36
77	-4	16
84	3	9
90	9	81
x̄ = 81, n = 10		Total = 482

S.D. = √[ 482 / (10 − 1) ] = √53.56 = 7.32
The meaning of standard deviation can only be appreciated fully when we study it with reference to what is described as the normal curve. For the present, we may content ourselves with the basic significance of the standard deviation: that it is an abstract number; that it gives us an idea of the 'spread' of the dispersion; and that the larger the standard deviation, the greater the dispersion of values about the mean.
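The five steps for the standard deviation can be followed one by one in Python. This is a minimal sketch rather than a definitive implementation; the function name and the `small_sample` flag are assumptions made for illustration, with the flag selecting the (n − 1) divisor recommended for samples of fewer than 30.

```python
import math

# Diastolic blood pressure of the 10 individuals in the example.
bp_sd = [83, 75, 81, 79, 71, 95, 75, 77, 84, 90]


def standard_deviation(values, small_sample=True):
    """Root-mean-square deviation about the arithmetic mean.

    small_sample=True divides by (n - 1); otherwise by n.
    """
    n = len(values)
    mean = sum(values) / n
    deviations = [x - mean for x in values]        # step (a)
    squared = [d ** 2 for d in deviations]         # step (b)
    total = sum(squared)                           # step (c)
    divisor = n - 1 if small_sample else n         # step (d)
    return math.sqrt(total / divisor)              # step (e)


sd = standard_deviation(bp_sd)
print(round(sd, 2))  # 7.32
```

Python's standard library offers `statistics.stdev` (divisor n − 1) and `statistics.pstdev` (divisor n), which can be used to cross-check the result.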

SAMPLING

When a large number of individuals or items or units have to be studied, we take a sample. It is easier and more economical to study the sample than the whole population or universe. Great care must therefore be taken in obtaining a sample. It is important to ensure that the group of people or items included in the sample is representative of the whole population to be studied.

The Sampling frame –

Once the universe has been defined, a sampling frame must be prepared. A sampling frame is a listing of the members of the universe from which the sample is to be drawn. The accuracy and completeness of the sampling frame influences the quality of the sample drawn from it.

Sampling Methods –

The following three methods are most commonly used:
(1) Simple random sample: This is done by assigning a number to each of the units (the individuals or households) in the sampling frame. A table of random numbers is then used to determine which units are to be included in the sample. Random numbers are a haphazard collection of numbers, arranged so as to eliminate personal selection or unconscious bias in taking out the sample. With this procedure, the sample is drawn in such a way that each unit has an equal chance of being drawn in the sample. This technique provides the greatest number of possible samples.
(2) Systematic random sample: This is done by picking every 5th or 10th unit at regular intervals. For example, to carry out a filaria survey in a town, we take a 10 per cent sample. The houses are numbered first. Then a number is selected at random between 1 and 10 (say four). Then every 10th number is selected from that point on: 4, 14, 24, 34, etc. By this method, each unit in the sampling frame would have the same chance of being selected, but the number of possible samples is greatly reduced.
(3) Stratified random sample: The sample is deliberately drawn in a systematic way so that each portion of the sample represents a corresponding stratum of the universe. This method is particularly useful where one is interested in analysing the data by a certain characteristic of the population, viz. Hindus, Christians, Muslims, age-groups etc., as we know these groups are not equally distributed in the population.
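The three sampling methods might be sketched in Python as below. This is an illustration under stated assumptions, not a prescribed procedure: the function names, the frame of 100 numbered houses and the two strata are all hypothetical, and Python's pseudo-random generator stands in for a table of random numbers.

```python
import random


def simple_random_sample(frame, k, rng):
    """Every unit in the frame has an equal chance of being drawn."""
    return rng.sample(frame, k)


def systematic_sample(frame, interval, rng):
    """Pick a random start between 1 and interval, then every interval-th unit."""
    start = rng.randint(1, interval)
    return frame[start - 1::interval]


def stratified_sample(frame, stratum_of, fraction, rng):
    """Draw the same fraction at random from each stratum of the frame."""
    strata = {}
    for unit in frame:
        strata.setdefault(stratum_of(unit), []).append(unit)
    sample = []
    for members in strata.values():
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample


rng = random.Random(0)            # fixed seed so the sketch is repeatable
houses = list(range(1, 101))      # hypothetical frame: houses numbered 1 to 100

srs = simple_random_sample(houses, 10, rng)
systematic = systematic_sample(houses, 10, rng)
# Hypothetical strata: houses 1-30 form one ward, houses 31-100 another.
stratified = stratified_sample(
    houses, lambda h: "ward A" if h <= 30 else "ward B", 0.1, rng
)
```

With a start of 4 and an interval of 10, `systematic_sample` would return houses 4, 14, 24, 34 and so on, as in the filaria survey example.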
It is useful to note at this stage that Greek letters are usually used to refer to population characteristics, mean (μ) and standard deviation (σ), and Roman letters to indicate sample characteristics, mean (x̄) and standard deviation (s).

Sampling Errors –

If we take repeated samples from the same population or universe, the results obtained from one sample will differ to some extent from the results of another sample. This type of variation from one sample to another is called sampling error. It occurs because data were gathered from a sample rather than from the entire population of concern. Presuming that the sampling procedure is such that all the individuals in the population have an equal chance of entering the sample, the factors that influence the sampling error are: (a) the size of the sample and (b) the natural variability of the individual readings. As the size of the sample increases, the sampling error will decrease. The more widely the individual readings vary from one another, the more variability we get from one sample to another.

Non-Sampling Errors –

The sampling error is not the only error which arises in a sample survey. Errors may occur due to inadequately calibrated instruments, due to observer variation, as well as due to incomplete coverage achieved in examining the subjects selected and conceptual errors. These are often more important than the sampling errors.

Standard Error –

If we take a random sample of size n from the population, and similar samples over and over again, we will find that every sample has a different mean (x̄). If we make a frequency distribution of all the sample means drawn from the same population, we will find that the distribution of the means is nearly a normal distribution and that the mean of the sample means is practically the same as the population mean (μ). This is a very important observation: the sample means are distributed normally about the population mean (μ). The standard deviation of the means is a measure of the sampling error; it is called the standard error, or the standard error of the mean, and is given by the formula:
S.E. = σ / √n
Since the distribution of the means follows the pattern of a normal distribution, it is not difficult to visualize that 95 per cent of the sample means will lie within limits of two standard errors [μ ± 2(σ/√n)] on either side of the true or population mean. Therefore the standard error (S.E.) is a measure which enables us to judge whether the mean of a given sample is within the set confidence limits or not.

Confidence limits	Normal deviate (N.D.) = (x̄ − μ) / S.E.	Significance
μ is outside the 95 per cent confidence limits	N.D. > 2	P < 0.05; Significant at 5% level
μ is just within the 95 per cent confidence limits	N.D. = 2	P = 0.05; Just significant at 5% level
μ is within the 95 per cent confidence limits	N.D. < 2	P > 0.05; Not significant at 5% level
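To make the table concrete, the sketch below computes the standard error and the normal deviate for an invented case. All numbers here (population mean 80, standard deviation 10, samples of 100) are hypothetical and chosen only to keep the arithmetic easy; the function names are likewise illustrative. The sign of the deviate is ignored when comparing with 2, which is an assumption on my part about how the table is meant to be read.

```python
import math


def standard_error(sd, n):
    """Standard error of the mean: standard deviation divided by root n."""
    return sd / math.sqrt(n)


def normal_deviate(sample_mean, population_mean, sd, n):
    """Distance of the sample mean from the population mean, in standard errors."""
    return (sample_mean - population_mean) / standard_error(sd, n)


def significant_at_5_percent(nd):
    """True when the deviate lies beyond two standard errors (sign ignored)."""
    return abs(nd) > 2


# Hypothetical figures, not taken from the article.
se = standard_error(10, 100)
nd_far = normal_deviate(83, 80, 10, 100)
nd_near = normal_deviate(81, 80, 10, 100)

print(se, nd_far, significant_at_5_percent(nd_far))    # 1.0 3.0 True
print(se, nd_near, significant_at_5_percent(nd_near))  # 1.0 1.0 False
```

Quadrupling the sample size halves the standard error, which is the sense in which a larger sample reduces the sampling error.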
