CHAPTER 1: Introduction

TERMS: We shall begin our discussion with some basic terminology:

STATISTICS: A method of quantifying, organizing and summarizing data.
DATA: Another name for the numbers or scores collected.
POPULATION: The collection of individuals with whom the research is concerned, or whom the study tries to describe.
SAMPLE: A portion or subset of the population of interest.
PARAMETER: A characteristic of the population, such as the average age of its members.
STATISTIC: A characteristic of the sample; therefore, an estimate of a parameter. The average age of the members of a sample may be the best guess of the average age of the population from which the sample was drawn.
INFERENTIAL STATISTICS: A method of quantifying, organizing and summarizing data based upon a sample.
DESCRIPTIVE STATISTICS: A method of quantifying, organizing and summarizing data based upon an entire population.

*Note: The term statistics is not used here to mean more than one statistic. It is reserved for the method of quantification, and each statistic is evaluated one characteristic at a time.

DISCUSSION: Statistics is a technique used to describe and assess the features of a specified group of people, the population. The term population is not necessarily defined by a geographic boundary or nationality, the way it usually is in casual conversation. If the researcher is interested in the AIDS epidemic, then all people with AIDS may be included in the referent population. If interest is limited to the cases within a specific nation, say the United States, then the referent population will be US AIDS cases only. If interest is further limited to young men, as they are the majority of AIDS cases, the population may be called US AIDS AMONG MEN. Note that the title of the population grows longer as the population grows more specific.
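The relation between a PARAMETER and a STATISTIC defined above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the population of ages is made-up data, and the function name sample_statistic, the sample size of 50, and the random seeds are hypothetical choices, not part of the course material.

```python
import random
import statistics


def sample_statistic(population, n, seed=0):
    """Draw a simple random sample of size n and return its mean (a statistic)."""
    rng = random.Random(seed)
    sample = rng.sample(population, n)  # sampling without replacement
    return statistics.mean(sample)


# Hypothetical population: the ages of 1,000 individuals (made-up data).
rng = random.Random(42)
population_ages = [rng.randint(18, 80) for _ in range(1000)]

# PARAMETER: a characteristic of the whole population.
parameter = statistics.mean(population_ages)

# STATISTIC: the same characteristic measured in a sample of 50,
# used as a best guess of the parameter.
statistic = sample_statistic(population_ages, n=50)

print(f"Population mean age (parameter): {parameter:.1f}")
print(f"Sample mean age (statistic):     {statistic:.1f}")
```

The two printed values will usually be close but not identical. When the sample includes every member of the population, the statistic and the parameter coincide.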
As it is not likely that every member of the population can participate in the study, only a portion of the population will actually be assessed. That portion, or sample, will serve as a 'best guess' of the status of the referent population. Statistical techniques are used to quantify certain features, or parameters, of the population. When a sample is used, those same techniques (with minor adjustments to the formulas) are applied to each feature, or statistic, of that sample. As a result, there are two types of Statistics: Inferential, in the case of the sample, and Descriptive, in those cases when the researcher can actually use the entire population. As you might guess, the former happens much more often than the latter. Unless the population is very small, Inferential Statistics will be used.

LEVEL OF DATA: Traditionally, statistical techniques were chosen to fit the level of the data collected. The data level, or scale, refers to the degree of precision of the measure used to score the subjects. The higher the level of the data, the greater the precision and, presumably, the greater the degree of detail gleaned from the measures. This was considered very desirable, as it was presumed that more sophisticated analyses, which require more precise measures, could then be applied. In actuality, it is common practice for researchers to ignore the level of the data. Further, there is mathematical support for the view that, over the great number of studies a researcher might conduct in a career, any minor distortions in the interpretation of the data would balance out. Inasmuch as many social scientists, and especially clinicians, will not conduct many studies, this generous indulgence may be rather risky. As a result, many discussions of introductory statistics advise the novice to consider the level, or scale, of the data being analyzed. An American psychologist named Stevens suggested a standard:

Nominal - data which is classified or named only, e.g., cats and dogs.
Ordinal - data which is ranked by class, e.g., course grades, A > B > C > D > F.
Interval - data ranked by equal classes, e.g., the number of stars a critic gives a movie, if the steps between stars are treated as equal.
Ratio - data ranked by equal classes, with an absolute zero, e.g., height or weight.

Most sophisticated statistical procedures would require an interval scale, though a ratio scale was better still. The less precise nominal and ordinal scales would require special techniques. An American by the name of Savage, however, pointed out that, as there were only two levels of techniques, only two levels of data were required:

Discrete - data with a limited number of classes. Example: It is either a dog or a cat, never both, though in some ways a dog may be preferred to a cat and vice versa. This level covers the nominal and ordinal scales.
Continuous - data with an unlimited number of classes. Example: Inches of height and satisfaction with one's spouse can both be considered in degrees, though an absolute zero can only be specified in the case of inches. One may no longer be satisfied with one's spouse, but it may not make sense to specify an absolute zero. This level covers the interval and ratio scales.

It bears pointing out that the two levels of statistical techniques referred to are parametric and nonparametric procedures, which use separate sets of formulas. These terms should not be equated with the two types of statistics, descriptive and inferential, which refer to the source of the data. Nonparametric procedures, which are often given less credence, will be discussed further in the final chapters of this course.

NOTATION: Sometimes letters, particularly Greek letters, are used to denote specific functions in statistics. Knowing these will aid in calculating outcomes from formulas. The only new symbol introduced at this point is the upper case (capital) sigma [∑], which indicates that a summation is required. This does not negate the directives of parentheses [( )] to specify the order of calculation.
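How the summation sign interacts with parentheses can be checked with a short Python sketch. The column of numbers X below is made-up data, and the function names are hypothetical labels chosen for this illustration. The sketch computes the sum of the column, the sum of its squared values, and the square of its sum.

```python
def sum_of_values(xs):
    """∑X: add up every number in the column."""
    return sum(xs)


def sum_of_squares(xs):
    """∑X²: square each number first, then add the squares."""
    return sum(x ** 2 for x in xs)


def square_of_sum(xs):
    """(∑X)²: add the numbers first, then square the total."""
    return sum(xs) ** 2


# Hypothetical column of scores called X (made-up data).
X = [1, 2, 3, 4]

print(sum_of_values(X))   # 1 + 2 + 3 + 4       = 10
print(sum_of_squares(X))  # 1 + 4 + 9 + 16      = 30
print(square_of_sum(X))   # (1 + 2 + 3 + 4) ** 2 = 100
```

For this column the two quantities differ (30 versus 100), which is why the placement of the parentheses matters.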
In the notation ∑X, one is to sum all the numbers in the column called X. If X is squared and no parentheses appear, as in ∑X², it is implied that each number in the column called X is squared first, and it is the squares of those numbers that are added. Therefore, ∑X² reads 'the sum of the squares'. If parentheses enclose the X and the upper case sigma, as in (∑X)², the numbers in the column called X are summed first, and it is the sum of those numbers that is squared. Therefore, (∑X)² is called 'the square of the sum'. The 'sum of the squares', then, is not equal to the 'square of the sum', as they are distinct concepts. This distinction becomes crucial in succeeding chapters, so be sure to do the problems in the text and the work book to clear up any uncertainty you may have about them.

METHODOLOGY: It was noted in the BASIC SKILLS section that the method of research determines the statistical technique required. The basic types of investigation in the social sciences are:

NATURAL OBSERVATION (NO): Subjects are observed in their natural setting, without interference. Ideally, the observer will not be detected, though this is not always possible.
ADVANTAGE: NO provides frequencies without the contrived trappings of a laboratory to tamper with the true flow of events.
DISADVANTAGE: NO can cause subjects to behave self-consciously if the observer is detected, and observers usually are. Further, observers who hope to stay long enough for the self-consciousness to wear off may linger long enough to become engrossed in the phenomenon, to the point of losing their objectivity.

CORRELATION (Corr): Studies in which subjects may be asked questions for further clarification, usually in the form of a survey. Scales can add a degree of precision, or magnitude, as well (Likert's Strongly Agree, Moderately Agree, etc.).
ADVANTAGE: Corr provides frequency and magnitude of response, allowing comparison of fluctuations across factors, or variables.
DISADVANTAGE: Corr can only assess linear relations. In the case of answer scales, such as Likert's subjective rating scale, subjects are told to restrict their answers to those available, which may or may not fit the actuality. Further, the quality of the data received can be determined, in part, by the form of the survey. Consider the pros and cons of each:

INTERVIEW: Most expensive and least honest, as it is the least anonymous; however, it yields the most returns and detail.
QUESTIONNAIRE: Least expensive and most honest, as it is the most anonymous; however, it yields the least detail, as the researcher is not there to take comments on each item. Even worse, many subjects will simply not respond. Losing 70% of the subjects is common.
PHONE SURVEY: Rather popular, as it seems to fall in the middle in all aspects (expense, honesty, detail and returns). Still, if money or subjects are scarce, the researcher may not have a choice.

THE EXPERIMENT: Unlike the first two methods discussed, which are only quasi-experimental, the true experiment is the only investigative technique appropriate when considering a causal relation. This is because only an experiment includes a manipulation under controlled circumstances. Specifically, the presumed causal agent (the independent variable) is manipulated and the presumed effect (the dependent variable) is observed. Variables should be defined operationally; that is, in a way which can be measured. Thus, the dependent variable is also referred to as the dependent measure.
ADVANTAGE: The experiment can support a causal inference, in addition to providing frequencies and magnitudes.
DISADVANTAGE: The experiment is the most contrived, and subjects may behave quite differently in the testing situation than they would outside of it (in the real world).

Even though the results of an experiment are generally given more credence than those of quasi-experimental procedures, it may be necessary to do the latter for preliminary or exploratory purposes.
Grant requests require some initial investigation to justify supporting a potential experiment. Further, conditions and data quality may make true experimentation impossible.

DOCUMENTATION: Research at all levels benefits from clear documentation. Documentation may mean official certificates which verify subject records. In the most technical sense, the CASE STUDY is a form of record keeping, or documentation. It is not a specific research method; all manner of information about a specific case may be included: the individual's response to a survey, the individual's participation in an experiment, their IQ, health record, school grades or even their credit rating.

DATA COLLECTION: The method of data collection also affects the data quality. This includes concerns about from whom the data was collected. If data was collected from all the major subsets of the population, the study can be described as CROSS-SECTIONAL. If subjects are followed for a long time, the study can be described as LONGITUDINAL. These are not types of research, per se. A given study can be both cross-sectional and longitudinal.

MEDICAL RESEARCH: Medical research is especially sensitive to ethical restraints, so many studies are abridged compared to those in the social sciences. Natural Observation is not directly useful to the development of medical interventions. Correlations are distinguished as:

CROSS-SECTION: Usually a preliminary, one-time survey.
RETROSPECTIVE: A longitudinal study of archival cases, which is convenient but relies upon the memory of survivors or upon old documents, either of which may be inaccurate.
PROSPECTIVE: A longitudinal study of individual cases (a cohort) followed forward in time; more accurate but very expensive.

Medical experiments are called CLINICAL TRIALS. These are abridged in that, the moment a treatment's effectiveness is suspected, ethical restraint requires that all patients receive that treatment, even the control group.