A dataset of media releases (Twitter, News and Comments, YouTube, Facebook) from Poland related to COVID-19 for open research



The widespread use of traditional and social media on the Internet provides an invaluable source of information on societal dynamics during pandemics. With this dataset, we aim to understand the mechanisms of COVID-19 epidemic-related social behaviour in Poland using methods of computational social science and digital epidemiology. In this study, we ask how understanding communication patterns could support epidemiological harm reduction campaigns:

What? (content: key vocabulary, topics and sentiment, e.g. regarding risk communication and fake news)

Who? (categories of information senders: who are the main actors and communities in the discourse?)

When? (timeline: how does the perception of the disease evolve?)

Where? (geography, cross-regional comparison)

How? (providing new information or blocking existing channels: which factors affect risk perception and adherence to non-pharmaceutical interventions (NPI)?)


Objectives & Challenges

We attempted to systematise knowledge on the Poland-specific social background using digital footprints. Analysing electronic Internet media makes it possible to study the perception of COVID-19 in a given country (here, Poland) and to detect possible behavioural changes associated with the epidemic early, which is crucial for a targeted response and for tailored containment scenarios that minimise public health risks. Moreover, the collected materials have been labelled and important information retrieved through secondary analyses of registration data and of media in Poland in connection with the COVID-19 pandemic. We have delivered a unique non-English database of coded (labelled) tweets, to be used in machine learning models for detecting COVID-19 misinformation and its spread in the European Union. There are plenty of approaches to detecting fake news and misinformation (even a few for the Polish language); however, to our knowledge, our labelled dataset is the only open data available for the eastern flank of the EU.

Main Findings

We collected and analysed material on the perception of COVID-19 on the Polish-language Internet between 15/01/2020 and 31/07/2020, and labelled the data quantitatively (Twitter, YouTube, articles) and qualitatively (Facebook, articles and article comments) using an infodemiological approach:

  • manually labelled 1,449 articles / Facebook posts from Lower Silesia and 111 texts from outside this region;
  • manually labelled the 1,000 most popular tweets with the categories is_fake, topic and sentiment;
  • manually labelled the 500 most popular comments on YouTube during protests with the categories is_fake (categorical and numeric), topic and sentiment;
  • extracted 57,306 representative articles in Polish with the topic "Coronavirus" in the article body, using the Eventregistry.org tool;
  • extracted 1,015,199 tweets in Polish with #Koronawirus using the Twitter API;
  • collected 1,574 YouTube videos matching the keyword "Koronawirus" and 247,575 comments on them using the Google API;
  • supplemented the media observations with an analysis of 244 empirical social studies on COVID-19 in Poland (up to 25/05/2020).
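
As a rough illustration of how the manually labelled tweets could be explored, the Python sketch below computes the share of items flagged as fake per topic and counts sentiment labels. Only the category names is_fake, topic and sentiment come from the description above; the CSV layout, the text column, the label values and the sample rows are assumptions made up for this example and are not taken from the dataset.

```python
import csv
import io
from collections import Counter

# Placeholder rows invented for illustration only; NOT real dataset content.
# Assumed layout: one row per tweet, is_fake coded as "0"/"1".
SAMPLE_CSV = """text,is_fake,topic,sentiment
"example tweet A",0,health,negative
"example tweet B",1,politics,negative
"example tweet C",0,health,positive
"example tweet D",1,health,neutral
"""


def fake_share_by_topic(rows):
    """Return {topic: fraction of rows in that topic with is_fake == "1"}."""
    totals = Counter()
    fakes = Counter()
    for row in rows:
        totals[row["topic"]] += 1
        if row["is_fake"] == "1":
            fakes[row["topic"]] += 1
    return {topic: fakes[topic] / totals[topic] for topic in totals}


# Parse the in-memory sample; a real file would be opened with open(path, newline="").
rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
shares = fake_share_by_topic(rows)
sentiment_counts = Counter(row["sentiment"] for row in rows)

print(shares)
print(sentiment_counts)
```

The same per-topic breakdown could be applied to the labelled YouTube comments, provided the actual column names and label codes in the released files are checked first.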

We identified several potential target subpopulations and propose a possible regional classification and resource allocation. We distinguished clusters of regions in which communication strategies should focus on: 1) extending campaign reach, common social goods and conformism; 2) individual benefits and blocking misinformation.

Main Recommendations

A lack of proper communication and the misallocation of resources caused Poland to lead Europe in excess mortality due to COVID-19. We signal the need for profiling and regionalisation in campaigns, and propose possible starting points for protocols for individual voivodeships/poviats and subpopulations.

The main deliverables: