Confounding is a statistical phenomenon. The variable responsible for it is known as a confounding variable, confounding factor, lurking variable, confound or confounder. The topic features in statistical experimentation and in Six Sigma studies, along with practical discussion of how to deal with it.
In some situations the effects of two or more factors cannot be distinguished from one another. Their contributions to the response variable are intertwined and cannot be separated. This is called confounding. It occurs to some degree in nearly all situations, but least in a well-designed experiment where the data are collected with a predefined objective.
Statistically speaking, a confounding variable is an extraneous variable in a statistical study that correlates, positively or negatively, with both the independent and the dependent variable. It can lead to a Type I error (a false positive), so methods should be adopted to check for it. It threatens internal validity, that is, the validity of inferences about cause and effect, because the observed effect may be due not to the independent variable, which was assumed to be the cause, but to the confounder.
By definition, a confounding variable is associated with both the probable cause and the effect. It is not itself an effect of the probable cause; rather, it is a factor that is really or apparently intertwined with the probable cause and may itself be responsible for the outcome.
A simple example should clarify the concept. Suppose that a child’s height and a country’s gross national product (GNP) both rise over time. One might conclude that the child grows because GNP rises, or that GNP rises because the child grows. Both conclusions overlook the lurking variable, time, with which both quantities increase. An even simpler example: a soccer coach tells his team to run 5 miles every morning in preparation for an upcoming match. At the same time, the players start taking protein supplements. The result on match day could then be due to the running, to the protein supplements, or to both, and the two effects cannot be separated. This type of occurrence is called confounding.
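The height and GNP example can be sketched as a small simulation. This is only an illustration under stated assumptions: the monthly time scale, the slopes, and the noise levels are invented, and the helper names `pearson` and `residuals` are hypothetical. Both series are generated from time alone, so any correlation between them comes from the shared driver.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def residuals(ys, xs):
    """Residuals of a simple least-squares line fitting ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return [y - (my + slope * (x - mx)) for x, y in zip(xs, ys)]

rng = random.Random(0)            # fixed seed so the run is repeatable
months = list(range(120))         # assumed: ten years of monthly observations

# Invented data: each series depends on time plus independent noise,
# and neither depends on the other.
height = [80 + 0.5 * t + rng.gauss(0, 2) for t in months]
gnp = [500 + 2.0 * t + rng.gauss(0, 10) for t in months]

# Raw correlation is strong because both series trend upward with time.
raw_corr = pearson(height, gnp)

# After removing the linear time trend from each series, little is left.
adj_corr = pearson(residuals(height, months), residuals(gnp, months))

print(f"raw correlation:        {raw_corr:.2f}")
print(f"time-adjusted correlation: {adj_corr:.2f}")
```

In a run like this the raw correlation should come out close to 1, while the time-adjusted correlation should be close to 0, which is what one would expect when time is the only link between the two series.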
Although the criteria for causality have been studied extensively, confounding cannot be defined in purely statistical terms; some causal assumptions are necessary. In 1965, Austin Bradford Hill proposed a set of criteria for causation. A simple criterion used when causal assumptions are expressed in a causal graph is the “backdoor” criterion, which is used to identify sets of possible lurking factors.
• Experimental Controls: A study can be modified in a number of ways to include, exclude, or otherwise control for suspected confounding variables.
• Case-Control Studies: In this method, confounders are distributed equally between the two groups: the cases, from which inferences are drawn, and the controls, against which the cases are compared to judge whether a factor is a likely cause. For example, in a study of the causes of myocardial infarction where age may be a confounder, each 56-year-old patient with myocardial infarction would be matched with a healthy 56-year-old person who acts as the control. The factors most commonly matched are age and sex.
• Cohort Studies: Some degree of matching is possible in this method as well, typically by admitting only a certain age group or sex into the study population, so that all cohorts are comparable with regard to the confounding variable. Continuing the example above, if age and sex are thought to be confounders, only males aged 40 to 60 would be admitted to the study, and the risk of myocardial infarction would then be assessed in cohorts that differ in how physically active a lifestyle they lead.
• Stratification: In the example above, physical activity is taken to be one of the factors influencing myocardial infarction, and age is a suspected lurking variable. The data are stratified by age group, so that the relation between level of physical activity and risk of myocardial infarction is assessed within each age group. If the results within the age groups differ markedly from the unstratified result, age should be treated as a confounder.
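To make stratification concrete, the sketch below compares a crude risk ratio with the ratios inside each age stratum, and with a pooled Mantel-Haenszel risk ratio. All counts are invented for illustration, and the function names are hypothetical; the counts are deliberately chosen so that inactivity has no effect within either age group, while older people are both less active and at higher risk.

```python
# Hypothetical counts per age stratum: (cases of myocardial infarction, group size).
strata = {
    "40-49": {"inactive": (2, 200), "active": (8, 800)},
    "50-60": {"inactive": (80, 800), "active": (20, 200)},
}

def risk_ratio(exposed, unexposed):
    """Risk in the exposed group divided by risk in the unexposed group."""
    (a, n1), (c, n0) = exposed, unexposed
    return (a / n1) / (c / n0)

def crude_risk_ratio(strata):
    """Risk ratio after pooling all strata, i.e. ignoring age."""
    a = sum(s["inactive"][0] for s in strata.values())
    n1 = sum(s["inactive"][1] for s in strata.values())
    c = sum(s["active"][0] for s in strata.values())
    n0 = sum(s["active"][1] for s in strata.values())
    return risk_ratio((a, n1), (c, n0))

def mantel_haenszel_risk_ratio(strata):
    """Age-adjusted risk ratio using Mantel-Haenszel weights."""
    num = den = 0.0
    for s in strata.values():
        (a, n1), (c, n0) = s["inactive"], s["active"]
        total = n1 + n0
        num += a * n0 / total
        den += c * n1 / total
    return num / den

crude = crude_risk_ratio(strata)
by_stratum = {age: risk_ratio(s["inactive"], s["active"])
              for age, s in strata.items()}
adjusted = mantel_haenszel_risk_ratio(strata)

print(f"crude risk ratio:    {crude:.2f}")
for age, rr in by_stratum.items():
    print(f"risk ratio, age {age}: {rr:.2f}")
print(f"age-adjusted (M-H):  {adjusted:.2f}")
```

With these invented counts the crude risk ratio is about 2.93, while the ratio within each age group and the adjusted ratio are 1.00. The gap between the crude and the stratified figures is what points to age as a confounder here.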
However, these methods have their own drawbacks. Matching in case-control studies can be difficult; cohort studies with restricted admission can discard a lot of data; and with stratification, inferences may rest on very small groups and so differ greatly from the truth. Confounding variables can also be controlled by measuring them and including them as covariates in a multivariate analysis, but this has the drawback that it may not give a true measure of the strength of the confounder. A further problem is that not all confounding variables are known or measurable, which can lead to residual confounding. The best way to deal with this is randomization: if it is carried out successfully across a large sample, confounding variables will, on average, be distributed equally among all study groups.
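The balancing effect of randomization can be sketched with a short simulation. The setup is assumed for illustration only: ages are drawn uniformly between 40 and 60, and in the self-selected scenario younger people are made more likely to choose the active group. The variable names are hypothetical.

```python
import random

def mean(xs):
    return sum(xs) / len(xs)

rng = random.Random(42)                      # fixed seed for repeatability
ages = [rng.uniform(40, 60) for _ in range(10_000)]  # invented population

# Scenario 1: self-selection. The chance of being active falls linearly
# from 1 at age 40 to 0 at age 60, so age is tied to group membership.
self_active, self_inactive = [], []
for age in ages:
    p_active = (60 - age) / 20
    (self_active if rng.random() < p_active else self_inactive).append(age)

# Scenario 2: randomization. A fair coin decides the group, regardless of age.
rand_active, rand_inactive = [], []
for age in ages:
    (rand_active if rng.random() < 0.5 else rand_inactive).append(age)

# Difference in mean age between the groups under each scenario.
self_gap = mean(self_inactive) - mean(self_active)
rand_gap = mean(rand_inactive) - mean(rand_active)

print(f"mean age gap, self-selected groups: {self_gap:.2f} years")
print(f"mean age gap, randomized groups:    {rand_gap:.2f} years")
```

Under self-selection the inactive group should come out several years older on average, so age would be confounded with activity. Under random assignment the gap should be close to zero, and the same would hold for confounders that were never measured.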