Fundamental Science: How do we know that smoking causes cancer?

December 20, 2021

Robert

Randomized clinical trials (RCTs, for "randomized controlled trials") entered the vocabulary of everyone who followed the discussions about Covid-19. In these experiments, participants are divided into groups at random. One group receives, for example, a vaccine or a new drug. The other, known as the control group, receives something whose effect is known, such as an inert substance or the usual treatment for that condition. These trials involve many participants and are expensive and complex to conduct and monitor. They also often run into ethical issues.

Such experiments allow us to distinguish a causal relationship from a mere statistical association. But how could one carry out a randomized clinical trial to establish a causal relationship between smoking and lung cancer? We would have to make one of the groups smoke for years, knowingly exposing these people to harm. We would then have to follow them over those years to make sure they kept smoking until tumors appeared. If no such RCT has ever been conducted, how is it that there has been a scientific consensus since the mid-1960s that smoking causes cancer?

First, it is important to understand the historical context and the alternative hypotheses. Until the 18th century, lung cancer had not even been described as a disease, and around 1900 only 140 cases had been documented. Technological advances then made the mass production and marketing of cigarettes possible, and by 1950 lung cancer was the most frequently diagnosed tumor among men in the United States. One might suppose that it had earlier been misdiagnosed as some other disease. However, detailed autopsy records from Germany indicate that it really was uncommon.

According to a 2013 article by Otis Brawley, per capita consumption in the United States was 54 cigarettes per year in 1900 and had jumped to 4,345 by 1963. Correlation does not imply causation, but the simultaneous explosion in consumption and in cases of the disease caught the attention of several scientists. Alternative hypotheses blamed car exhaust and industrial smoke, the 1918 influenza pandemic, and even a gene that would predispose people both to smoking and to developing the disease. Several prominent figures took part in this discussion, from Darrell Huff, author of “How to Lie with Statistics”, to the famous statistician Sir Ronald Fisher.

Unfortunately, both were on the “wrong” side of the story, downplaying the role of cigarettes in the development of the disease. An RCT was not possible, and methods for causal inference from observational data were far less sophisticated than they are today (the 2019 and 2021 Sveriges Riksbank Prizes in Economic Sciences in Memory of Alfred Nobel honored researchers in this area). As a result, it was not easy to rule out confounding biases that could distort estimates of the effect of smoking on lung cancer.
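The confounding worry can be made concrete with a small simulation of the gene hypothesis mentioned above. The following is a minimal Python sketch, not a historical analysis. All probabilities are invented for illustration. The hypothetical model assumes, by construction, that smoking has no effect at all on cancer, and that a gene raises the chances of both smoking and cancer. A crude comparison of smokers with non-smokers still yields a relative risk well above 1. Comparing them within each gene group (stratifying on the confounder) makes the association vanish.

```python
import random

def simulate_common_cause(n=200_000, seed=1):
    """Hypothetical model: a gene G raises both the chance of smoking and
    the chance of cancer, while smoking itself has NO effect on cancer."""
    rng = random.Random(seed)
    people = []
    for _ in range(n):
        g = rng.random() < 0.30                    # carries the hypothetical gene
        s = rng.random() < (0.70 if g else 0.30)   # gene makes smoking more likely
        c = rng.random() < (0.10 if g else 0.02)   # cancer depends on the gene only
        people.append((g, s, c))
    return people

def risk_ratio(rows):
    """Risk of cancer among smokers divided by risk among non-smokers."""
    smokers = [c for _, s, c in rows if s]
    nonsmokers = [c for _, s, c in rows if not s]
    return (sum(smokers) / len(smokers)) / (sum(nonsmokers) / len(nonsmokers))

people = simulate_common_cause()
crude = risk_ratio(people)  # expected value about 1.85 under these assumptions
by_gene = {g: risk_ratio([p for p in people if p[0] == g]) for g in (False, True)}
print(f"crude RR = {crude:.2f}")
print({g: round(rr, 2) for g, rr in by_gene.items()})  # each stratum close to 1
```

Stratification only works here because the simulated confounder is observed. The historical difficulty was that a gene of this kind, if it existed, had not been measured.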

Natural experiments, among other methods, belong to a large area of the causality literature called “causal inference in observational data”. With these methods we try to answer the same questions that RCTs address, and sometimes more. One such method is Mendelian randomization. Consider that smokers die earlier than non-smokers. However, other habits, such as heavier drinking and less healthy diets, are also more common among smokers. One way to rule out these confounding factors would be an RCT, but as we have seen, that is neither practical nor ethical. Observational data, for their part, guarantee neither that our measurements are accurate nor that no other confounding factors are at play.

Our genes, however, are with us from conception, and our lifelong choices and experiences do not alter them. It is as if our lives were part of a randomized experiment. Which version of a gene we carry does not depend on the factors that cause problems in conventional studies.

Here is one example of such problems. Most participants in the intervention group of a study might have a severe form of the disease, because their terminal condition is precisely what led them to risk trying an experimental drug. This would bias the study's results. It should not happen when participants are randomly assigned to groups, and the same holds for Mendelian randomization: no one chooses which version of a given gene they are born with.
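The self-selection problem can be simulated in a few lines. In this hypothetical Python sketch, the drug has no effect on mortality by construction, and death depends only on disease severity. The severity and death rates are invented for illustration. When patients choose their own group, sicker patients opt in more often, and the drug appears harmful. When a coin flip assigns the groups, the gap shrinks to roughly zero.

```python
import random

def mortality_gap(n=100_000, randomized=True, seed=7):
    """Death risk in the drug group minus death risk in the control group,
    in a hypothetical population where the drug has NO effect on death."""
    rng = random.Random(seed)
    deaths = {True: 0, False: 0}
    counts = {True: 0, False: 0}
    for _ in range(n):
        severe = rng.random() < 0.30
        if randomized:
            treated = rng.random() < 0.5                         # coin flip decides the group
        else:
            treated = rng.random() < (0.80 if severe else 0.20)  # sicker patients opt in
        died = rng.random() < (0.50 if severe else 0.05)         # depends on severity only
        counts[treated] += 1
        deaths[treated] += died
    return deaths[True] / counts[True] - deaths[False] / counts[False]

print(f"self-selected: {mortality_gap(randomized=False):+.3f}")  # drug looks harmful
print(f"randomized:    {mortality_gap(randomized=True):+.3f}")   # close to zero
```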

With this in mind, scientists identified a gene known as CHRNA5. Smokers with one version of this gene tended to smoke less than those with another variant. When people were grouped by CHRNA5 variant, those carrying the version associated with heavy smoking tended to die younger, from heart disease or lung cancer. Fisher had once argued that some gene might shorten these individuals' lives through effects unrelated to smoking. But studies of non-smokers carrying the heavy-smoking variant found no difference in life expectancy. The reduction in longevity must therefore be due to smoking itself.
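Mendelian randomization treats the gene as an instrumental variable. A standard estimator is the Wald ratio: the gene's association with the outcome divided by its association with the exposure. The estimator is valid only under three assumptions. The gene must be associated with the exposure. It must be independent of confounders. And it must affect the outcome only through the exposure. The comparison among non-smokers described above checks the last of these assumptions.

The following Python sketch uses entirely hypothetical numbers, not real CHRNA5 effect sizes. An unmeasured lifestyle confounder `U` increases smoking and shortens life. The true effect is set to -0.3 years per daily cigarette. Under these assumptions, the naive regression slope should come out more negative (around -0.4) because it absorbs the confounder's effect. The Wald ratio should land near -0.3, and the gene's slope among non-smokers should land near zero.

```python
import random

def cov(a, b):
    """Sample covariance of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)

def simulate(n=100_000, seed=3):
    """Hypothetical data: Z = copies of a 'heavy smoking' allele (0, 1, 2),
    U = unmeasured lifestyle confounder, X = cigarettes per day,
    Y = years of life relative to average. True effect of X on Y: -0.3."""
    rng = random.Random(seed)
    Z, X, Y, S = [], [], [], []
    for _ in range(n):
        z = (rng.random() < 0.35) + (rng.random() < 0.35)  # two inherited alleles
        u = rng.gauss(0, 1)                                # never seen by the analyst
        smoker = rng.random() < 0.5
        x = max(0.0, 10 + 5 * z + 4 * u + rng.gauss(0, 1)) if smoker else 0.0
        y = -0.3 * x - 3 * u + rng.gauss(0, 1)             # gene acts on Y only via X
        Z.append(z)
        X.append(x)
        Y.append(y)
        S.append(smoker)
    return Z, X, Y, S

Z, X, Y, S = simulate()
naive = cov(X, Y) / cov(X, X)   # ordinary regression slope, confounded by U
wald = cov(Z, Y) / cov(Z, X)    # Mendelian randomization estimate (Wald ratio)
nz = [z for z, s in zip(Z, S) if not s]
ny = [y for y, s in zip(Y, S) if not s]
placebo = cov(nz, ny) / cov(nz, nz)  # gene vs. outcome among non-smokers only
print(f"naive {naive:.2f}, Wald {wald:.2f}, non-smokers {placebo:.2f}")
```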

However, the consensus dates from the 1960s, when causal inference and genetic sequencing were far less developed than they are today. So how did scientists reach that conclusion? Through a combination of strategies. Population studies that followed similar people, whose only substantial difference was whether or not they smoked, found a higher relative risk of death from lung cancer among smokers. Experiments with animal models, such as rabbits and mice, produced similar results. Sensitivity analyses, which assess how plausible alternative explanations are, suggested that those explanations could not account for an association of that strength. Pathologists also observed that cigarette smoke damaged cells in the airways, in the regions where tumors typically appeared. Other scientists found that cigarettes contain polycyclic aromatic hydrocarbons, chemicals already known to be carcinogenic.
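The relative risk from such cohort studies, and one classic form of sensitivity analysis, can be sketched in Python. Cornfield and colleagues argued in 1959 that an unmeasured factor could fully explain an observed relative risk RR only if it were more than RR times as prevalent among smokers as among non-smokers. The E-value of VanderWeele and Ding (2017), RR + sqrt(RR(RR - 1)), is a modern generalization of this idea. The counts in this sketch are hypothetical. The confidence interval uses the usual large-sample standard error of log(RR).

```python
import math

def relative_risk(cases_exp, n_exp, cases_unexp, n_unexp):
    """Risk among the exposed divided by risk among the unexposed."""
    return (cases_exp / n_exp) / (cases_unexp / n_unexp)

def rr_confidence_interval(cases_exp, n_exp, cases_unexp, n_unexp, z=1.96):
    """Approximate 95% CI based on the standard error of log(RR)."""
    rr = relative_risk(cases_exp, n_exp, cases_unexp, n_unexp)
    se = math.sqrt(1 / cases_exp - 1 / n_exp + 1 / cases_unexp - 1 / n_unexp)
    return rr * math.exp(-z * se), rr * math.exp(z * se)

def e_value(rr):
    """Minimum strength (on the RR scale) an unmeasured confounder would need
    with both exposure and outcome to fully explain away an observed RR > 1."""
    return rr + math.sqrt(rr * (rr - 1))

# Hypothetical cohort counts, chosen only for illustration.
rr = relative_risk(500, 100_000, 50, 100_000)
low, high = rr_confidence_interval(500, 100_000, 50, 100_000)
print(f"RR = {rr:.1f} (95% CI {low:.1f} to {high:.1f})")
print(f"Cornfield: a confounder must be more than {rr:.0f}x as prevalent among smokers")
print(f"E-value = {e_value(rr):.1f}")
```

The larger the observed relative risk, the stronger and more lopsided a hidden confounder would have to be. This is the sense in which alternative explanations became implausible.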

This accumulation of evidence, together with the scale the lung cancer epidemic had reached, helped form a consensus. That consensus has only grown firmer as stronger evidence emerged over the years. Along the way, systematic reviews and meta-analyses pooled the studies of that period and reinforced their conclusions. In addition, anti-smoking public policies grounded in what the science indicated produced favorable results, further supporting what is now the consensus view.
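A meta-analysis combines studies by weighting each one by its precision. The following Python sketch shows the simplest version: fixed-effect, inverse-variance pooling of log relative risks. Each study's standard error is recovered from its 95% confidence interval. The three study results are invented for illustration. Real meta-analyses also assess heterogeneity between studies, for example with a random-effects model, which this sketch omits.

```python
import math

def pool_fixed_effect(studies, z=1.96):
    """Inverse-variance fixed-effect pooling of relative risks.
    Each study is (rr, ci_low, ci_high); the standard error of log(RR)
    is recovered from the width of its 95% confidence interval."""
    num = den = 0.0
    for rr, low, high in studies:
        se = (math.log(high) - math.log(low)) / (2 * z)
        w = 1 / se ** 2              # more precise studies get more weight
        num += w * math.log(rr)
        den += w
    pooled, se_pooled = num / den, 1 / math.sqrt(den)
    return (math.exp(pooled),
            math.exp(pooled - z * se_pooled),
            math.exp(pooled + z * se_pooled))

# Hypothetical study results, for illustration only.
studies = [(8.0, 5.0, 12.8), (12.0, 7.0, 20.6), (9.0, 6.0, 13.5)]
rr, low, high = pool_fixed_effect(studies)
print(f"pooled RR = {rr:.1f} (95% CI {low:.1f} to {high:.1f})")
```

The pooled interval is narrower than any single study's interval. This is one way meta-analyses "gave strength" to the individual studies.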

Complex phenomena tend to be controversial at some point. Over time, however, efforts from different areas of science tend to complement one another. They allow a reasonable understanding to emerge and guide our decisions toward a better society.

Marcel Ribeiro-Dantas is a researcher at the Institut Curie, part of PSL Research University, and a doctoral candidate at Sorbonne University, where he researches causal inference from observational data in health.

Subscribe to Serrapilheira’s newsletter to follow more news from the institute and from the Ciência Fundamental blog.