STS and epidemiological scientific production using large databases in Brazil

Mariana Pitta Lima and Bethânia Almeida
10/30/2023 | Projects

Data, the science of social epidemiology and cohort studies

"Data are not raw materials; they are, as suggested by Latour (1999), achievements (p. 42)." When we started looking through the STS literature on the relationship between data production and health, we found the above paraphrase in Hoeyer, Bauer and Pickersgill’s work (p. 467). In Pandora's Hope, Bruno Latour follows and unpacks the process of botanical data production to develop his argument. In a similar line of argument, in Data Journeys in Science, Sabina Leonelli uses the metaphor of the journey to draw attention to the tendency to consider data as raw material, suggesting the importance of examining the different stages of data production. The author proposes transforming the question from “What constitutes raw data?” to “What typologies of data processing are there, and what do they achieve within different types of inquiry?” (p. 17) to help us move forward in this reflection.

This post aims to explore initial reflections from an STS perspective on the production of science in epidemiology and data for health research in Brazil. We believe that STS can deepen critical reflection on the production of research involving the relationship between population data and health outcomes in specific studies that explore how "the social" acts on health and disease. At the same time, health research based on large linked databases conducted in the Global South can also contribute to Critical Data Studies.
We base our reflection on research conducted at the Centre for Data and Knowledge Integration for Health (CIDACS) in Salvador, Bahia, Brazil, where we are researchers. When CIDACS was created in 2016, it aimed to integrate data and knowledge to answer questions related to social determinants of health from a population point of view in Brazil. It aimed to conduct interdisciplinary research, develop new methodologies, and promote professional training using large-scale linked databases, high-performance computing and a secure environment. The Centre focuses mainly on "administrative data" - electronic hospital records and other health-related information systems.

The 100 Million Brazilians Cohort. [Image credit: Cidacs/Fiocruz]

Data sources also include non-health registries, such as the Cad Único (Unified Registry for Social Programmes), which is a type of population registry consisting of all individuals who have applied for any type of social welfare assistance in Brazil since 2001. 

Although these data are not originally produced for research, they can be prepared and integrated for use and reuse in research, which poses methodological challenges in obtaining high-quality, representative linked data. To this end, CIDACS developed specific algorithms to support the data linkage process. The use of large-scale administrative databases and data linkage for population health research was initially developed in high-income countries (Global North). Our preliminary literature review on STS and data production for health shows that studies along these lines have likewise been developed mainly in high-income countries, in contexts that differ from the Brazilian one.

The research produced at CIDACS is mainly based on cohort studies - a method of epidemiological study that allows populations to be "followed" over time, to borrow from Kalender and Holmberg’s (2019, p. 585) analysis of this form of epidemiological knowledge production.
Producing research with this design in Brazil has some peculiarities. The first is the large size of the population. The data produced in this continental country cover more than 100 million people, which gives the Centre’s database the name "The 100 Million Brazilians Cohort". The STS lens allows us to analytically explore the relationships involved in producing this category of science, among technologies, infrastructure, algorithms, and researchers. This work is "packed" and involves different actors from research, government, and the general population (the data subjects), who have also, more recently, taken part in discussions about the very design of scientific investigations in the Centre’s initiatives.

As Hoeyer et al. (p. 467) argue, "technologies of counting" operate through specific materialities of creating, curating, storing, distributing, and analyzing data. How data are produced and what is done with them are also shaped by political and economic contexts.

In this post, we would like to draw attention to the potential of exploring the experiences of public health research based on large volumes of data in the Global South, in dialogue with STS publications. This epidemiological scientific production runs parallel to the efforts of the interdisciplinary research field of Social Medicine (or Collective Health, in Brazil), which uses data to apprehend "the social" in health and medicine. Brazil also has good-quality administrative population registers, which makes this possible. One study produced using this data infrastructure is the paper “Ethnoracial inequalities and child mortality in Brazil: a nationwide longitudinal study of 19 million newborn babies”, published in The Lancet Global Health, which revealed how child mortality is unequally distributed.
The project on which we are collaborating directly is the LIFE Zika Study, which involves different cohorts. The project provides for cross-analyses of data produced through the Zika Brazilian Cohorts (ZBC) Consortium and the CIDACS "100 Million Brazilians Cohort", where we STSers specifically intend to follow the data journeys in public health emergencies.

The “Long-term consequences of Zika virus infections during pregnancy for school-aged children and their families in Brazil” project involves different ways of producing science in health, based on administrative data, traditional cohorts and participatory qualitative research with women, families, and caregivers. Integrating people affected by epidemics into different parts of the research process is one of the strategies for producing a type of research that seeks to translate into benefits for the population. This means participating in the discussion about defining research priorities, preparing appropriate dissemination materials, and supporting the use of the data produced to advocate for political change and social support.
Mariana Pitta Lima is a post-doctoral researcher at CIDACS/Fiocruz working on the LIFE Zika project. She is one of the Assistant Editors in the Backchannels Global South team.

Bethânia Almeida is a civil servant with an academic background. She is a sociologist with a PhD in Public Health. She works at the Centre for Data and Knowledge Integration for Health (Cidacs-Fiocruz), which uses large-scale integrated datasets to generate scientific knowledge and provide evidence to support public policymaking. Her work at CIDACS focuses on data governance for public health research purposes, interdisciplinary research into social determinants of health in Brazil, and ELSI surrounding population data science and Open Science.
