Data management plan

The main data collected for this research are public in nature: they were freely disseminated by companies (streaming platforms and content aggregators, radio and TV companies, newspapers and magazines, internet portals) and by individuals (journalists, content producers, politicians).

Data are collected by observing various channels, such as social media sites (Instagram, Facebook, X, Bluesky, LinkedIn, among others), wikis (such as Wikipedia), the Lattes platform, and company employee profile pages.

Reports in different media outlets, interviews with journalists and podcast hosts, and opinions expressed by broadcasters on their own programs are also consulted.

If you notice an error in the collected information, or if you would like information about you to be anonymized and/or deleted, please contact us by email at dgambaro@unicamp.br.

Data that will be collected

Podcast profiles (descriptions of podcast series and episodes, tags used, production and distribution dates, among others).
Public profiles of podcast production companies, including partners, funders, and political alignment, obtained through exploratory research into company websites, public company records in systems such as that of the Federal Revenue Service, and mentions of the companies or their directors in journalistic material.
Audience figures, based on the number of views of a channel or episode, as made available by the streaming service (Spotify, YouTube, etc.) or compiled and published by companies such as Chartable.com.
Public profiles of podcast creators: academic background, main professional occupation, declared political alignment, and statements and opinions they have made public on social media sites, in the podcasts they create, and in widely accessible media (radio or TV programs, other podcasts, interviews with newspapers and magazines, and similar outlets).

Data that will be produced

All collected data will initially be compiled into Microsoft Excel workbooks, where they can be tabulated and analyzed to identify matches and generate metadata. The data and metadata from this initial cross-referencing will form a database readable by both humans and software.

To analyze the podcasts, the audio will be transcribed using software such as Adobe Premiere. The transcripts will not be made public and will be used solely by the researcher for this research. Content analysis of the programs will generate data on the approach taken in each program, in the following categories:

Theme: topics covered, approach, and positions and opinions of the presenters;
Language: script structure, sound elements, use of language, oral performance, etc.;
Extra features: quotes and parodies, insertion of songs or other pre-recorded material, etc.

This information will be linked to the podcasts in a second database in Microsoft Excel format.

In addition to Microsoft Excel, the data will be processed using open-source software such as RStudio (with the FactoMineR package for R), KNIME, or similar tools. These programs support multiple correspondence analysis, generating metadata and, from it, graphs and matrices that aid in interpreting the dataset.

Data manipulation during research

The various databases, both structured and unstructured, including those dedicated to interpretation and publication, will be stored on the researcher's private network and processed on their personal computer and/or a Unicamp computer, with password protection at every point of access. Until each stage of the research is completed, only the researcher and the supervisor will have access to the tables, databases, and other generated materials.

The data will also be stored in a private, password-protected folder on Google's cloud service, linked to the researcher's professional Unicamp account. However, no online software will be used to process the data, which minimizes the risk of leaks and interference from external attacks.

Disclosure of data and results

This website was created to disseminate the partial and final results, as well as the databases produced during the research. This material will remain available for at least five years after the end of the research. Its sections include:

Research diary: a daily report on the research steps, the percentage of completion, comments on each phase, and ongoing updates on the methods used to generate and tabulate data.
Reading recommendations: links to articles and books that form the bibliographic basis of the research.
Data collected: the Excel files containing all the collected data, as well as the graphs generated for interpretation. Besides its ethical importance, this section allows other researchers to use the same dataset.
Results: the reports and articles produced, posted for broad scientific dissemination of the results.

Data preservation

All collected and generated data will be stored in the cloud and accessed through the researcher's professional Unicamp Google account. To prevent data destruction or loss, while keeping access confidential, daily backups will be made to an offline hard drive.

The researcher undertakes to keep the collected data, metadata and analyses stored redundantly in four repositories:

An offline hard drive owned by the researcher, for at least 10 years after the end of the research.
For public access, Microsoft's cloud storage service (OneDrive), contracted by the researcher, for at least 5 years after the end of the research.
For private access, Google's cloud storage service (Google Drive), contracted by the researcher, for at least 5 years after the end of the research.
Redu, Unicamp's official tool for the storage, preservation, sharing, reuse, and reproducibility of all generated research data.

If a cloud service is discontinued, the data will be moved to another, similar storage location for the remainder of the period specified in this plan.
