Description
This is a presentation explaining various methods of data collection and sampling.

Data Collection and Sampling

January 31, 2013

Dr.B.Sasidhar


Types of Data - Two Types
• Qualitative - categorical or nominal. Examples are:
– Color
– Gender
– Nationality
• Quantitative - measurable or countable. Examples are:
– Temperatures
– Salaries
– Number of points scored on a 100-point exam

Scales of Measurement
• Nominal Scale - groups or classes
– Gender, Nationality
• Ordinal Scale - order matters
– Ranks (top ten videos)
• Interval Scale - difference or distance matters; has an arbitrary zero value.
– Temperatures (°F, °C), Likert Scale
• Ratio Scale - ratio matters; has a natural zero value.
– Salaries, Age, Income
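The four scales form a hierarchy: each scale supports the summaries of the scales before it, plus some of its own. The sketch below is one way to write that down. The statistics listed per scale follow the usual textbook convention and are not taken from the slide, and the names SCALES, NEW_STATISTICS and permitted_statistics are illustrative.

```python
# Scales ordered from weakest to strongest.
SCALES = ["nominal", "ordinal", "interval", "ratio"]

# Statistics that first become meaningful at each scale (textbook convention).
NEW_STATISTICS = {
    "nominal": ["counts", "mode"],
    "ordinal": ["median", "percentiles"],
    "interval": ["mean", "standard deviation"],
    "ratio": ["ratios", "coefficient of variation"],
}


def permitted_statistics(scale):
    """Return every statistic that is meaningful for the given scale."""
    level = SCALES.index(scale)
    allowed = []
    # A scale inherits everything permitted on the weaker scales.
    for name in SCALES[: level + 1]:
        allowed.extend(NEW_STATISTICS[name])
    return allowed


print(permitted_statistics("ordinal"))
```

For ranks such as "top ten videos" this returns counts, mode, median and percentiles, but not the mean, because the distance between ranks is not defined.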

Types of Data

PUBLISHED DATA
– This is often a preferred source of data due to low cost and convenience.
– Published data is found as printed material, tapes, disks, and on the Internet.
– Data published by the organization that has collected it is called PRIMARY DATA. For example:
• Data published by the US Bureau of Census.
– Data published by an organization different than the organization that has collected it is called SECONDARY DATA. For example:
• The Statistical Abstract of the United States compiles data from primary sources.
• Compustat sells a variety of financial data tapes compiled from primary sources.

Secondary Data
• Secondary data are data that have already been collected for a purpose other than the problem at hand.
• Can be located quickly
• Inexpensive
• Examples: stock exchange directory, government data on population, etc.
• IMF, World Bank, ADB and other international organisations
• Web pages of different organisations

Primary versus Secondary data
Differences
                     Primary data              Secondary data
Collection purpose   For the problem at hand   For other problems
Collection process   Very involved             Rapid and easy
Collection cost      High                      Relatively low
Collection time      Long                      Short

Classification of Secondary data
Secondary data
• Internal
– Ready to use
– Further process
• External
– Published material
– Computerised databases
– Syndicated services

External Secondary Sources
Published Secondary data
• General Business sources
– Guides
– Indexes
– Annual reports of companies
– Directories
• Government sources
– Census data
– Other Government Publications
• Internet Sources

Observational and experimental studies
• When published data is unavailable, one needs to conduct a study to generate the data.
– An observational study is one in which measurements representing a variable of interest are observed and recorded, without controlling any factor that might influence their values.
– An experimental study is one in which measurements representing a variable of interest are observed and recorded, while controlling factors that might influence their values.

Primary Data
Primary data are collected by conducting a survey. If the survey covers the entire population, it is known as a census survey or complete enumeration. In contrast, if the survey covers only a part of the population, or a subset from a set of units, with the object of investigating the properties of the parent population or set, it is known as a sample survey.

Advantages of Sampling
1. A sample survey is cheaper than a census survey.
2. Since the magnitude of operations involved in a sample survey is small, both the execution of the field work and the analysis of the results can be carried out speedily.
3. Sampling results in a greater economy of effort, as a relatively small staff is required to carry out the survey and to tabulate and process the survey data.
4. As compared to the census survey, more detailed information can be collected in a sample survey.
5. Since the scale of operations involved in a sample survey is small, the quality of interviewing, supervision and other related activities can be better than that in a census survey.

Limitations of Sampling
1. When information is needed on every unit in the population, such as individuals, dwelling units or business establishments, a sample survey cannot be of much help, for it fails to provide information on each individual unit.
2. Sampling gives rise to certain errors. If these errors are too large, the results of the sample survey will be of extremely limited use.
3. While in a census survey it may be easy to check the omissions of certain units in view of complete coverage, this is not so in the case of a sample survey.

Surveys
• Surveys solicit information from people.
• Surveys can be made by means of
– personal interview
– telephone interview
– mailed questionnaire

Survey Methods
• Telephone
– Traditional telephone
– Computer-assisted telephone interviewing (CATI)
• Personal
– In-home
– Mall intercept
– Computer-assisted personal interviewing (CAPI)
• Mail
– Mail interview
– Mail panel

Questionnaire
• Questionnaire: A structured technique for data collection consisting of a series of questions, written or verbal, that a respondent answers.
• Objectives of a questionnaire:
– 1. Translate the information needed into a set of questions
– 2. Motivate and encourage the respondent to give answers
– 3. Minimise response error

Questionnaire design process
1. Specify the information needed
2. Specify the type of interviewing method
3. Determine the content of individual questions
4. Design the questions to overcome the respondent's unwillingness to answer
5. Decide the structure of the questions
6. Determine the question wording
7. Arrange the questions in proper order
8. Decide the form and layout
9. Pre-test the questionnaire
10. Eliminate bugs, if any

Question Content
• Every question should elicit information that is needed for the study
• If there is no use for the data resulting from a question, that question should be eliminated
• However, introductory questions should be kept, otherwise the respondents may not co-operate
• Some questions may be duplicated to test reliability or validity
• Questions unrelated to the immediate problem may be included and used later

Questions - Structure
• Double-barreled questions - a single question attempts to cover two issues simultaneously
• Example: Is Coca-Cola a tasty and refreshing soft drink?
• Tasty and refreshing are two issues and should be asked about in two questions. Otherwise the answers will be ambiguous.
• Multiple questions: Why do you shop in Big Bazaar?
• Such a question draws many types of answers, some of them unwanted, and coding will be difficult

Structured Questions
• A structured question could be
– 1. Multiple choice
– 2. Dichotomous
– 3. A scale
• Multiple choice questions can be coded easily, but it is difficult to list all the alternatives
• Dichotomous questions have only two answers: yes or no, agree or disagree, etc.
• Sometimes "don't know", "both" or "none" are given as additional alternatives
• In scaling, attributes like quality, beauty, etc. are measured with, say, a Likert scale

Question Wording
• 1. Define the issue in simple words
• 2. Use unambiguous words
• 3. Avoid leading and biasing questions (a question that leads the respondent to the answer desired by the researcher)
• 4. Avoid implicit alternatives
• 5. Avoid implicit assumptions
• 6. Avoid generalisations and estimates
• 7. Use positive and negative statements for measuring attitudes and lifestyles

Inability to Answer
• The respondents may not remember, may not be able to articulate or describe, or may not be informed
• A wife may not inform her husband about grocery purchases, and vice versa
• Many things are hard to remember, such as what you ate four days ago
• Respondents with less education may not be able to articulate an answer
• Filter questions may be asked to screen out respondents who cannot answer. For example, before asking how many children a respondent has, ask whether he or she is married.

Unwillingness to Answer
• The possible reasons for unwillingness to answer:
• 1. Too much effort is necessary
• 2. The situation or context may not be appropriate
• 3. No legitimate purpose is apparent (for example, if income is asked)
• 4. Sensitive questions are asked
• 5. Privacy is touched

Questionnaire Form and Layout
• 1. Divide the questionnaire into several parts
• 2. Questions in each part should be numbered
• 3. The questionnaire should be precoded
• 4. The questionnaires themselves should be numbered serially
• 5. This facilitates the control of questionnaires in the field as well as coding

Questionnaire Pretesting
• 1. All aspects of the questionnaire should be tested (content, wording, sequence, form and layout, difficulty and instructions)
• 2. The respondents should be similar to those who will be included in the actual survey
• 3. The pretest sample size should be around 30
• 4. After seeing the results, revise the questions
• 5. Pretest again
• 6. The responses obtained in the pretest should be coded and analysed to see approximate results

Sampling
• Motivation for conducting a sampling procedure:
– Costs.
– Population size.
– The possible destructive nature of the sampling process.
• The sampled population and the target population should be similar to one another.

Sampling and Non-sampling errors
• Two major types of errors can arise when a sampling procedure is performed: sampling error and non-sampling error.
• Sampling Error
– Sampling error refers to differences between the sample and the population that arise because of the specific observations that happen to be selected.
– Differences caused by a wrong process of selection (bias) are a separate problem; selection bias is classed as a non-sampling error.
– Sampling error is expected whenever a statement about the population is based on a sample. Expect it to decrease as the sample size increases.
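The claim that sampling error shrinks as the sample grows can be checked with a small simulation. This is only a sketch under assumptions: the "incomes" are made-up uniform random numbers, the seed and trial count are arbitrary, and average_sampling_error is an illustrative helper, not a standard function.

```python
import random
from statistics import fmean


def average_sampling_error(population, n, trials, rng):
    """Average absolute gap between the sample mean and the population mean."""
    mu = fmean(population)
    gaps = []
    for _ in range(trials):
        sample = rng.sample(population, n)  # simple random sample, no replacement
        gaps.append(abs(fmean(sample) - mu))
    return fmean(gaps)


rng = random.Random(2013)  # fixed seed so the run is repeatable
# Made-up population of 5,000 incomes.
population = [rng.uniform(10_000, 100_000) for _ in range(5_000)]

# Repeat the survey 200 times at each sample size and average the error.
errors = {n: average_sampling_error(population, n, 200, rng) for n in (10, 100, 1000)}
for n, err in errors.items():
    print(n, round(err, 1))
```

The printed error falls as n rises from 10 to 1000, and it reaches zero when the sample is the whole population, which is the census case.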

Sampling Errors
[Figure: population income distribution, marking μ (population mean) and x̄ (sample mean). The gap between the two is the sampling error. The sample mean falls where it does only because certain randomly selected observations were included in the sample.]

Non-sampling Errors
• Non-sampling errors occur due to mistakes made along the process of data acquisition.
• Increasing the sample size will not reduce this type of error.
• There are three types of non-sampling errors:
– Errors in data acquisition,
– Non-response errors,
– Selection bias.

Data Acquisition Error
[Figure: population and sample. If an observation from the population is wrongly recorded in the sample, then the sample mean is affected. The total error is sampling error plus data acquisition error.]

Non-Response Error

[Figure: population and sample. No response from part of the selected sample may lead to biased results in the sample.]
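A tiny worked example of how non-response can bias a result. The incomes and the response rule are invented for illustration: it is assumed that households above a cutoff income simply do not reply.

```python
from statistics import mean

# Hypothetical incomes (in thousands) for a selected sample of eight households.
sample_incomes = [20, 25, 30, 35, 40, 90, 110, 130]


def responds(income, cutoff=80):
    """Assumed response rule: households above the cutoff do not reply."""
    return income <= cutoff


def respondent_mean(incomes, cutoff=80):
    """Mean of the incomes that are actually reported back."""
    answered = [x for x in incomes if responds(x, cutoff)]
    return mean(answered)


full_mean = mean(sample_incomes)                 # if everyone had answered
observed_mean = respondent_mean(sample_incomes)  # what the survey actually sees
print(full_mean, observed_mean)
```

Here the selected sample has a mean of 60, but the respondents alone give 30. Drawing a larger sample under the same response rule would not close this gap, which is why non-response counts as a non-sampling error.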

Selection Bias

[Figure: population and sample. When parts of the population cannot be selected, the sample cannot represent the whole population.]

Causes of Non-sampling Errors
1. Using an imprecise definition or a wrong concept while launching the survey.
2. Entrusting the survey work to untrained and inexperienced investigators.
3. Despatching a defective mail questionnaire to respondents, who may not clearly understand certain questions.
4. Errors that may arise on account of non-response from respondents.
5. Poor supervision of the field staff.
6. Faulty tabulation while transferring the questionnaire data to tabulation sheets.
7. Calculation mistakes in the processing and analysis of data.
8. Mistakes made during the oral or written presentation of the survey results.

Target Population
• Population: The collection of units or objects that possess the information sought by the researcher and about which inferences are to be made
– Example: dividend-paying companies
• Sampling unit: The basic unit containing the elements of the population to be sampled
– Example: SAIL, Infosys, etc.
• Element: An object that possesses the information sought by the researcher and about which inferences are to be made
– Example: current ratio, ROI, debt-equity ratio

Sampling Frame
• Frame: A representation of the units of the target population that consists of a list or set of directions for identifying the target population
• Examples: stock exchange directories, industry publications, etc.
• Often it is the researcher who compiles the frame
• He or she should avoid including unnecessary units and should not omit relevant units

Sampling Techniques Classification

• Non-probability Sampling
– Convenience
– Judgmental
– Quota
– Snowball
• Probability Sampling
– Random
– Systematic
– Stratified
– Cluster

Probability and Nonprobability Sampling
• Nonprobability Sampling: Sampling techniques that do not use chance selection procedures but rely on the personal judgement of the researcher (convenience procedure)
• Probability Sampling: A sampling procedure in which each element of the population has a fixed probabilistic chance of being selected (lottery method)

Convenience Sampling
• The researcher selects the sampling units according to his or her convenience.
• Example: a convenience sample may be used while studying consumer behaviour
• This sampling is widely used in exploratory research
• Useful in pretesting the questionnaire and in pilot studies
• It is the least time-consuming and least expensive method
• The sample will not be truly representative of the population being studied

Judgmental Sampling
• It is a form of convenience sampling in which the researcher selects the elements according to some criterion
• Example: questions on e-commerce cannot be asked of everyone, only of people who appear to have computer and internet knowledge
• Useful in testing hypotheses of a specialised nature
• Limited use
• It cannot be used for studies of a general nature

Quota Sampling
• It is a two-stage restricted judgmental sampling.
• The first stage develops control categories on some criteria like age, sex, etc.
• In the second stage the sample elements are selected based on convenience or judgment until the quota is fulfilled
• Quota sampling will be effective in determining magazine readership

Control characteristic   Sample %   Sample items
Sex
  Male                   48         480
  Female                 52         520
  Total                  100        1000
Age
  18-30                  27         270
  31-45                  39         390
  46-60                  34         340
  Total                  100        1000
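The two stages of quota sampling can be sketched in code, using the percentages from the table on this slide and a total of 1000. The function names quota_counts and fill_quotas are illustrative, and "accept respondents in arrival order" is an assumed stand-in for selection by convenience.

```python
def quota_counts(total, percentages):
    """Stage 1: convert category percentages into required respondent counts."""
    if sum(percentages.values()) != 100:
        raise ValueError("percentages must add up to 100")
    # Integer division; exact whenever total * pct is a multiple of 100.
    return {category: total * pct // 100 for category, pct in percentages.items()}


def fill_quotas(respondents, quotas, key):
    """Stage 2: accept respondents in arrival order until each quota is full."""
    accepted = []
    filled = {category: 0 for category in quotas}
    for person in respondents:
        category = person[key]
        if category in filled and filled[category] < quotas[category]:
            accepted.append(person)
            filled[category] += 1
    return accepted


sex_quota = quota_counts(1000, {"Male": 48, "Female": 52})
age_quota = quota_counts(1000, {"18-30": 27, "31-45": 39, "46-60": 34})
print(sex_quota)  # {'Male': 480, 'Female': 520}
print(age_quota)  # {'18-30': 270, '31-45': 390, '46-60': 340}
```

Because stage 2 takes whoever comes first, the quotas control only the listed characteristics. Within each category the selection is still non-random.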

Snowball Sampling
• The sample grows like a rolling snowball
• An initial sample is selected first on a random basis, and subsequent respondents are selected based on referrals
• Example: an advertisement was placed to collect data on workshop machines, and the respondents were asked to give the names and addresses of people who use the same or similar types of machines
• This technique is useful in collecting data about rare items such as antique pieces
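The referral process can be pictured as following links from the initial respondents outward. The sketch below assumes a made-up referral list and a cap on the sample size. snowball_sample is an illustrative name, and visiting referrals in first-come order is an assumption.

```python
from collections import deque


def snowball_sample(seeds, referrals, max_size):
    """Grow a sample from initial respondents by following their referrals."""
    sample = []
    seen = set()
    queue = deque(seeds)
    while queue and len(sample) < max_size:
        person = queue.popleft()
        if person in seen:
            continue  # already interviewed through another referral
        seen.add(person)
        sample.append(person)
        # Each respondent names other users of the same kind of machine.
        queue.extend(referrals.get(person, []))
    return sample


# Hypothetical referral lists: who each respondent names when asked.
referrals = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D", "E"],
    "D": [],
    "E": ["A"],
}
print(snowball_sample(["A"], referrals, max_size=4))  # ['A', 'B', 'C', 'D']
```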

Simple Random Sampling
• Each element in the population has a known and equal probability of selection (lottery method)
• To draw a random sample, the researcher first compiles a sampling frame in which each element has a unique identification number
• Traditionally, numbers are drawn from a random number table and the matching elements are added to the sample
• Nowadays computer-generated random numbers are used instead
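A minimal sketch of drawing a simple random sample with computer-generated random numbers, using Python's standard random module. The frame of 500 identification numbers and the seed are assumptions for illustration.

```python
import random


def simple_random_sample(frame, n, seed=None):
    """Draw n distinct units from the sampling frame, each equally likely."""
    rng = random.Random(seed)
    return rng.sample(frame, n)  # sampling without replacement


# Hypothetical frame: identification numbers 1..500, one per population element.
frame = list(range(1, 501))
sample = simple_random_sample(frame, 20, seed=41)
print(sorted(sample))
```

Fixing the seed makes the draw repeatable for audit purposes. Leaving it as None gives a fresh draw each time.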

Random Sampling (contd.)
• Merits:
• 1. It is easy to understand
• 2. The results can be projected to the target population
• 3. Most statistical inference procedures assume that the data have been collected by random sampling
• Demerits:
• 1. Difficult to construct a sampling frame
• 2. Time-consuming
• 3. Cost of data collection is also high
• 4. Often results in larger standard errors
• 5. It may or may not result in a representative sample

Systematic Sampling / Quasi-random Sampling

• In the systematic sampling procedure, the sample is chosen by selecting a random starting point and then picking every i-th element in succession from the sampling frame.
• Example: in a Gallup poll, data collection may start at, say, the 4th house and then cover every 5th house, i.e. 4, 9, 14, 19, 24, 29, 34 and so on.
• Sales of companies can be arranged in ascending order and systematic sampling can be used. This will give a fair sample of big and small firms for analysis
• It is less costly and data are easy to collect
• The researcher does not need to understand random number generation
• Can be applied even if a sampling frame is not available
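The house example (start at the 4th house, then every 5th) can be reproduced with list slicing. This is a sketch: the street of 40 houses is invented, and systematic_sample is an illustrative helper that draws a random start when none is given.

```python
import random


def systematic_sample(frame, interval, start=None, seed=None):
    """Pick a random start within the first interval, then every interval-th unit."""
    if start is None:
        start = random.Random(seed).randint(1, interval)  # 1-based position
    # Convert the 1-based start to a 0-based index and step through the frame.
    return frame[start - 1 :: interval]


houses = list(range(1, 41))  # hypothetical street with 40 numbered houses
print(systematic_sample(houses, interval=5, start=4))
# [4, 9, 14, 19, 24, 29, 34, 39]
```

Only the starting point is random. Once it is fixed, every other selection is determined by the interval.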

Stratified Sampling
• This sampling uses a two-step process. First, it partitions the population into sub-populations, or strata. Second, the elements are selected from each stratum by a random procedure.
• The elements within each stratum should be as homogeneous as possible, and the strata should be heterogeneous with respect to one another
• People can be classified into strata on the basis of income, age, etc., and samples can be collected from each stratum
• This increases precision without increasing cost
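The two steps of stratified sampling can be sketched as follows. The income bands and population sizes are invented, and taking the same sampling fraction from every stratum (proportional allocation) is an assumption; the slide does not say how many elements to take per stratum.

```python
import random
from collections import defaultdict


def stratified_sample(units, stratum_of, fraction, seed=None):
    """Take the same sampling fraction, at random, from every stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:  # step 1: partition the population into strata
        strata[stratum_of(unit)].append(unit)
    sample = []
    for name in sorted(strata):  # step 2: random draw inside each stratum
        members = strata[name]
        k = max(1, round(len(members) * fraction))  # at least one per stratum
        sample.extend(rng.sample(members, k))
    return sample


# Hypothetical population of 100 people tagged with an income band.
people = (
    [(i, "low") for i in range(60)]
    + [(i, "middle") for i in range(60, 90)]
    + [(i, "high") for i in range(90, 100)]
)
sample = stratified_sample(people, stratum_of=lambda p: p[1], fraction=0.1, seed=44)
print(sample)
```

With a 10% fraction the sample holds 6 low, 3 middle and 1 high income person, so every sub-population is represented in proportion to its size.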

Cluster Sampling / Area Sampling
• This sampling uses a three-step process. First, it partitions the population into sub-populations, or clusters. Second, a few clusters are selected by a random sampling procedure. Third, either all elements in each selected cluster or a few elements from each selected cluster are taken as the sample
• The elements within each cluster should be as heterogeneous as possible, and the clusters should be as similar to one another as possible

Stratified Sampling
1. All strata are selected for sampling
2. Homogeneity within subgroups and heterogeneity between subgroups
3. Elements are randomly selected from within each subgroup
Cluster Sampling
1. Few clusters are selected
2. Heterogeneity within subgroups and homogeneity between subgroups
3. Only a few subgroups are randomly selected and all elements in those subgroups are covered
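The three steps of cluster (area) sampling can be sketched as below. The areas and household labels are invented, and cluster_sample is an illustrative helper. Leaving per_cluster unset covers every element of the chosen clusters; setting it takes only a few elements from each.

```python
import random


def cluster_sample(clusters, n_clusters, per_cluster=None, seed=None):
    """Randomly pick whole clusters, then take all or some of their elements."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # step 2: pick a few clusters
    sample = {}
    for name in chosen:  # step 3: select elements within the chosen clusters
        members = clusters[name]
        if per_cluster is None or per_cluster >= len(members):
            sample[name] = list(members)  # cover every element in the cluster
        else:
            sample[name] = rng.sample(members, per_cluster)  # a few elements only
    return sample


# Step 1: a hypothetical city already partitioned into areas of households.
areas = {
    "North": ["N1", "N2", "N3", "N4"],
    "South": ["S1", "S2", "S3"],
    "East": ["E1", "E2", "E3", "E4", "E5"],
    "West": ["W1", "W2"],
}
print(cluster_sample(areas, n_clusters=2, seed=45))
```

Unlike stratified sampling, areas that are not chosen contribute nothing to the sample, which is why each cluster should resemble the population as a whole.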

Comparison of sampling techniques
1. Convenience
– Merits: Least expensive, least time-consuming, most convenient
– Demerits: Selection bias, not representative, not suitable for descriptive and causal research
2. Judgmental
– Merits: Low cost, convenient, not time-consuming
– Demerits: Does not allow generalisation, subjective
3. Quota
– Merits: Sample can be controlled for certain characteristics
– Demerits: Selection bias, no assurance of representativeness
4. Snowball
– Merits: Can estimate rare characteristics
– Demerits: Time-consuming
5. Random
– Merits: Easy to understand, results can be projected
– Demerits: Difficult to construct sampling frame, expensive, lower precision, no assurance of representativeness
6. Systematic
– Merits: Easy, no sampling frame needed, can increase representativeness
– Demerits: No randomness
7. Stratified
– Merits: Includes all sub-populations, precision
– Demerits: Selection of criteria for stratification is difficult, expensive
8. Cluster
– Merits: Easy, cost-effective
– Demerits: Difficult to cluster and interpret, imprecise


