Archive for October, 2010

Study Design

October 30, 2010 Comments off


100% of all disasters are failures of design, not analysis.

— Ron Marks, Toronto, August 16, 1994

To propose that poor design can be corrected by subtle analysis

techniques is contrary to good scientific thinking.

— Stuart Pocock (Controlled Clinical Trials, p 58) regarding the use of retrospective adjustment for trials with historical controls.

Issues of design always trump issues of analysis.

— GE Dallal, 1999, explaining to a client why it would be wasted effort to focus on the analysis of data from a study whose design was fatally flawed.

Bias dominates variability.

— John C. Bailar, III, Indianapolis, August 14, 2000

Statistics is not just a collection of computational techniques. It is a way of thinking about the world. Anyone can take a set of numbers and apply formulas to them. There are many computer programs that will do the calculations for you. But there is no point to analyzing data from a study that was not properly designed to answer the research question under investigation. In fact, there’s a real point in refusing to analyze such data lest faulty results be responsible for implementing a program or policy contrary to what’s really needed. Continue to read this valuable article at this link. From “Some Aspects of Study Design” by Gerard E. Dallal, Ph.D. via Study Design.

However, statistics still often gets a bad press:

Lies, Damned Lies, and Medical Science

Much of what medical researchers conclude in their studies is misleading, exaggerated, or flat-out wrong. So why are doctors—to a striking extent—still drawing upon misinformation in their everyday practice? Dr. John Ioannidis has spent his career challenging his peers by exposing their bad science.

Read more of this here: link

Additional material on research design.


Cause & Effect

October 30, 2010 Comments off

“Cause and Effect”! You almost never hear these words in an introductory statistics course. The subject is commonly ignored. Even on this site, all it gets is this one web page. If cause and effect is addressed at all, it is usually by giving the (proper) warning “Association does not imply causation!” along with a few illustrations.

For example, in the early part of the twentieth century, it was noticed that, when viewed over time, the number of crimes increased with membership in the Church of England. This had nothing to do with criminals finding religion. Rather, both crimes and Church membership increased as the population increased.
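The Church-of-England example can be reproduced in a few lines. Below is a minimal sketch (the numbers are invented, not the historical data): two series that both scale with a growing population end up strongly correlated even though neither causes the other.

```python
# Hypothetical simulation: two unrelated quantities that both grow with
# population show a strong correlation despite no causal link.
import numpy as np

rng = np.random.default_rng(0)
population = np.linspace(1.0, 2.0, 50)                  # population grows over time
crimes = 100 * population + rng.normal(0, 5, 50)        # scales with population
membership = 500 * population + rng.normal(0, 20, 50)   # also scales with population

r = np.corrcoef(crimes, membership)[0, 1]
print(round(r, 2))  # close to 1, yet neither variable causes the other
```

Regressing either series on population (or differencing over time) would make the spurious association largely disappear, which is exactly the point of the warning above.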

via Causality: Cause & Effect.

There are more examples of this important principle at the address above.

Categories: First Principles

Is statistics hard?

October 30, 2010 Comments off

Statistics is backwards! You might think that given a particular set of data, you are able to say how likely it is that a particular theory is true. Unfortunately, you would be wrong!

One thing most people (even statisticians!) would like to do is describe how likely a theory or hypothesis might be in light of a particular set of data. This is not possible in the commonly used classical/frequentist approach to statistics, which is the approach taken in these notes. Instead, statistics talks about the probability of observing particular sets of data, assuming a theory holds.

We are NOT allowed to say, “Because of these data, there is only a small probability that this theory is true.” Instead, we say things like, “If this theory is true, the probability of seeing data like these is small.”

The first statement is relatively clear. If we could say that based on a particular set of data a theory has a 10% chance of being true, then the theory has a 10% chance of being true. The second statement is murky. If the result is that there is only a 10% chance of seeing data like these if a theory is true, is that small enough to make us doubt the theory? How likely are the data under some other theory? Perhaps there’s no theory under which data like these are more likely! This means we need methods for translating this latter type of statement into a declaration that a theory is true or false. As a result…
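The two kinds of statement can be made concrete with a small coin-tossing sketch. The frequentist quantity is P(data | theory); to get P(theory | data) you need a prior and a rival theory, both of which are assumptions supplied here for illustration (a 50/50 prior over a fair coin and a hypothetical biased coin with p = 0.9).

```python
# Sketch contrasting P(data | theory) with P(theory | data) for a coin.
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k heads in n tosses with head-probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, k = 10, 9  # observed: 9 heads in 10 tosses

# Frequentist-style statement: P(data at least this extreme | fair coin).
p_data_given_fair = sum(binom_pmf(i, n, 0.5) for i in range(k, n + 1))
print(round(p_data_given_fair, 4))  # 0.0107

# The other statement needs a prior and an alternative theory -- both
# assumptions here: 50/50 prior over "fair" vs a biased coin with p = 0.9.
p_data_given_biased = sum(binom_pmf(i, n, 0.9) for i in range(k, n + 1))
posterior_fair = 0.5 * p_data_given_fair / (
    0.5 * p_data_given_fair + 0.5 * p_data_given_biased)
print(round(posterior_fair, 4))  # small, but only given these assumptions
```

Note that the first number stands on its own, while the second changes entirely if you change the prior or the alternative theory, which is why the classical approach refuses to make statements of that second kind.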

Statistical methods are convoluted! In order to show an effect exists, […]

via Is statistics hard?.

There are many more important points made on this site that need to be read by all those new (and not so new) to statistics.

Main parameters of a distribution

October 29, 2010 Comments off

The main parameters of a distribution are generally spoken of as (i) central tendency, (ii) variability, (iii) skew, (iv) kurtosis, and (v) modality. Central tendency and variability will need to be examined separately in considerable detail, since these two parameters need to be clearly understood and precisely measured…for more on this follow the link below…
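Four of the five parameters listed above are easy to compute for a sample. Here is a minimal sketch using simulated right-skewed data (an exponential distribution with mean 2, chosen only for illustration); modality is usually judged from a histogram rather than a single number.

```python
# Compute central tendency, variability, skew, and kurtosis for a sample.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
sample = rng.exponential(scale=2.0, size=10_000)  # right-skewed data

print(round(np.mean(sample), 2))          # central tendency (mean)
print(round(np.std(sample, ddof=1), 2))   # variability (sample std. dev.)
print(round(stats.skew(sample), 2))       # skew (positive: long right tail)
print(round(stats.kurtosis(sample), 2))   # excess kurtosis (0 for a normal)
```

For this distribution the theoretical values are mean 2, standard deviation 2, skewness 2, and excess kurtosis 6, so the printed estimates should land near those.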

via Ch2 Distributions Pt1.

Categories: Statistical Concepts

What shape is my data?

October 29, 2010 Comments off

Fitting my observed data to a distribution

For predictive purposes it is often desirable to understand the shape of the underlying distribution of the population. To determine this underlying distribution, it is common to fit the observed distribution to a theoretical distribution by comparing the frequencies observed in the data to the expected frequencies of the theoretical distribution (i.e., a Chi-square goodness of fit test). In addition to this type of test, some software packages also allow you to compute Maximum Likelihood tests and Method of Matching Moments (see Fitting Distributions by Moments in the Process Analysis topic) tests.
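The chi-square goodness-of-fit test described above can be sketched in a few lines. The counts below are invented for illustration: observed die-roll frequencies compared against the expected counts for a fair die.

```python
# Minimal sketch of a chi-square goodness-of-fit test: compare observed
# die-roll counts against the expected counts under a fair-die model.
from scipy import stats

observed = [18, 22, 16, 25, 19, 20]   # hypothetical counts from 120 rolls
expected = [sum(observed) / 6] * 6    # fair die: 20 expected per face

chi2, p = stats.chisquare(observed, f_exp=expected)
print(round(chi2, 2), round(p, 3))
```

A large p-value here means the observed frequencies are consistent with the theoretical distribution; a small one means the fit is poor.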

Which Distribution to use. As described above, certain types of variables follow specific distributions. Variables whose values are determined by an infinite number of independent random events will be distributed following the normal distribution, whereas variables whose values are the result of an extremely rare event would follow the Poisson distribution. The major distributions that have been proposed for modeling survival or failure times are the exponential (and linear exponential) distribution, the Weibull distribution of extreme events, and the Gompertz distribution. The section on types of distributions covers a number of distributions, generally giving a brief example of the type of data that most commonly follows each, along with its probability density function (pdf).
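As a sketch of the maximum-likelihood fitting mentioned above, applied to the survival-time setting: the code below simulates failure times from a Weibull distribution with known parameters (shape 1.5, scale 10, chosen arbitrarily) and then recovers them by MLE with `scipy.stats.weibull_min.fit`.

```python
# Sketch: fit a Weibull distribution to simulated failure times by
# maximum likelihood, as the text describes for survival/failure data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
times = rng.weibull(1.5, size=2000) * 10.0   # true shape 1.5, scale 10

# floc=0 fixes the location parameter, the usual choice for failure times.
shape, loc, scale = stats.weibull_min.fit(times, floc=0)
print(round(shape, 1), round(scale, 1))      # estimates near 1.5 and 10
```

With real data you would follow the fit with a goodness-of-fit check, since the Weibull is only one of the candidate families named above.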

via Distribution Fitting.

Categories: Statistical Concepts

Statistical Tests & Distributions

October 29, 2010 Comments off

Are all test statistics normally distributed?

Not all, but most of them are either based on the normal distribution directly or on distributions that are related to, and can be derived from, the normal, such as t, F, or Chi-square.

Typically, these tests require that the variables analyzed are themselves normally distributed in the population, that is, they meet the so-called “normality assumption.” Many observed variables actually are normally distributed, which is another reason why the normal distribution represents a “general feature” of empirical reality.

The problem may occur when we try to use a normal distribution-based test to analyze data from variables that are themselves not normally distributed. In such cases, we have two general choices. First, we can use some alternative "nonparametric" test; but this is often inconvenient because such tests are typically less powerful and less flexible in the types of conclusions they can provide.

Alternatively, in many cases we can still use the normal distribution-based test if we only make sure that the size of our samples is large enough. The latter option is based on an extremely important principle that is largely responsible for the popularity of tests that are based on the normal function. Namely, as the sample size increases, the shape of the sampling distribution (i.e., distribution of a statistic from the sample; this term was first used by Fisher, 1928a) approaches normal shape, even if the distribution of the variable in question is not normal.
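The principle described above (the central limit theorem at work) is easy to see by simulation. This sketch draws sample means from a strongly right-skewed exponential population and shows their skewness shrinking toward 0, the value for a normal distribution, as the sample size grows.

```python
# Sketch: the sampling distribution of the mean of a skewed (exponential)
# population looks more and more normal as the sample size n grows.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

skews = {}
for n in (2, 30, 500):
    # 20,000 samples of size n; one mean per sample.
    means = rng.exponential(1.0, size=(20_000, n)).mean(axis=1)
    skews[n] = stats.skew(means)
    print(n, round(skews[n], 2))  # skewness shrinks toward 0 (normal)
```

Theoretically the skewness of these means is 2/sqrt(n), so the printed values should fall from roughly 1.4 down toward 0.1.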

via Elementary Concepts in Statistics.


October 29, 2010 Comments off
A selection of Normal Distribution Probability… (image via Wikipedia)

The concept of the probability distribution and the random variables they describe underlies the mathematical discipline of probability theory, and the science of statistics. There is spread or variability in almost any value that can be measured in a population (e.g. height of people, durability of a metal, sales growth, traffic flow, etc.). For these and many other reasons, simple numbers are often inadequate for describing a quantity, while probability distributions are often more appropriate.

There are various probability distributions that show up in various different applications. Two of the most important ones are the normal distribution and the categorical distribution. The normal distribution, also known as the Gaussian distribution, has a familiar "bell curve" shape and approximates many different naturally occurring distributions over real numbers. The categorical distribution describes the result of an experiment with a fixed, finite number of outcomes. For example, the outcome of a toss of a fair coin follows a categorical distribution, where the possible outcomes are heads and tails, each with probability 1/2.
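The coin example above is easy to simulate. A minimal sketch: draw from a two-outcome categorical distribution with equal probabilities and check that the empirical frequency of heads lands near 1/2.

```python
# Sketch: simulate a categorical distribution (a fair coin toss) and
# compare the empirical frequency of heads to its probability, 1/2.
import numpy as np

rng = np.random.default_rng(4)
tosses = rng.choice(["heads", "tails"], size=100_000, p=[0.5, 0.5])

frac_heads = np.mean(tosses == "heads")
print(round(frac_heads, 2))  # close to 0.5
```

The same `p=` argument generalizes to any fixed, finite set of outcomes, such as the six faces of a die.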

via Probability distribution – Wikipedia, the free encyclopedia.

For most people, knowing about distributions is only important insofar as it helps you to decide whether to use parametric or non-parametric tests on your data.