key: cord-186031-b1f9wtfn
authors: Caldarelli, Guido; Nicola, Rocco de; Petrocchi, Marinella; Pratelli, Manuel; Saracco, Fabio
title: Analysis of online misinformation during the peak of the COVID-19 pandemics in Italy
date: 2020-10-05
journal: nan
DOI: nan
sha:
doc_id: 186031
cord_uid: b1f9wtfn

During the Covid-19 pandemic, we are also experiencing another dangerous pandemic, one based on misinformation. Narratives disconnected from fact-checking on the origin and cure of the disease have become intertwined with pre-existing political fights. We collect a database of Twitter posts, analyse the topology of the networks of retweeters (users who rebroadcast the same elementary piece of information, or tweet) and validate its structure with methods from the statistical physics of networks. Furthermore, by using commonly available fact-checking software, we assess the reputation of the pieces of news exchanged. By combining theoretical and practical tools, we are able to track down the flow of misinformation in a snapshot of the Twitter ecosystem. Thanks to the presence of verified users, we can also assign a polarization to the network nodes (users) and see the impact of low-quality information producers and spreaders in the Twitter ecosystem.

Propaganda and disinformation have a history as long as mankind, and the phenomenon becomes particularly strong in difficult times, such as wars and natural disasters. The advent of the internet and social media has amplified and accelerated the spread of biased and false news, and has made it possible to target specific segments of the population [7]. For this reason the Vice-President of the European Commission with responsibility for policies on values and transparency, Věra Jourová, announced at the beginning of June 2020 a European Democracy Action Plan, expected by the end of 2020, in which the administrators of web platforms will be called to greater accountability and transparency, since 'everything cannot be allowed online' [16].
Manufacturers and spreaders of online disinformation have also been particularly active during the Covid-19 pandemic (e.g., writing about Bill Gates' role in the pandemic or about masks killing children [2, 3]). This, alongside the real pandemic [17], has led to the emergence of a new virtual disease: the Covid-19 infodemic. In this paper, we consider the situation in Italy, one of the most affected countries in Europe, where the virus struck in a devastating way between the end of February and the end of April [1]. In such a sad and uncertain time, propaganda has worked hard: one of the most followed fake news stories was published by Sputnik Italia and received 112,800 likes, shares and comments on the most popular social media. 'The article falsely claimed that Poland had not allowed a Russian plane with humanitarian aid and a team of doctors headed to Italy to fly over its airspace', the EC Vice-President Jourová said.

Studies of dis/misinformation diffusion on social media seldom analyse its effective impact. In the exchange of messages on online platforms, a large share of interactions carry no relevant information for understanding the phenomenon: for example, randomly retweeting viral posts does not contribute insights into the sharing activity of the account. Two main tools can be used to determine how dis/misinformation propagates: the analysis of the content (semantic approach) and the analysis of the communities sharing the same piece of information (topological approach).

[1] In Italy, since the beginning of the pandemic and at the time of writing, almost 310k people have contracted the Covid-19 virus; of these, more than 35k have died. Source: http://www.protezionecivile.gov.it/. Accessed September 28, 2020.
While the content of a message can be analysed on its own, the presence of some troublesome structure in the pattern of news producers and spreaders (i.e., in the topology of contacts) can be detected only through dedicated instruments. Indeed, for real in-depth analyses, the properties of the real system should be compared with a proper null model. Recently, entropy-based null models have been successfully employed to filter out random noise from complex networks and focus attention on non-trivial contributions [10, 26]. Essentially, the method consists in defining a 'network benchmark' that has some of the (topological) properties of the real system, but is completely random in all other respects. Then, every observation that does not agree with the model, i.e., cannot be explained by the topological properties of the benchmark, carries non-trivial information. Notably, being based on the Shannon entropy, the benchmark is unbiased by definition. In the present paper, using entropy-based null models, we analyse a tweet corpus related to the Italian debate on Covid-19 during the two months of maximum crisis in Italy. After cleaning the system of random noise, using the entropy-based null model as a filter, we were able to highlight different communities. Interestingly enough, these groups, besides including several official accounts of ministries, health institutions, and (online and offline) newspapers and newscasts, encompass four main political groups. While at first sight this may sound surprising (the pandemic debate was more on scientific than on political ground, at least in the very first phase of its abrupt diffusion), it might be due to pre-existing echo chambers [18]. The four political groups are found to behave very differently on the platform, to interact differently with each other, and to post and share reputable and non reputable sources of information with very different frequencies.
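As a generic illustration of the entropy-based benchmark described above, the construction can be written in its standard maximum-entropy form. This is a sketch under assumptions, not necessarily the exact model used in this paper: the symbols $P(G)$ (probability of a graph $G$ in the ensemble), $C_i$ (the constrained topological properties), $G^{*}$ (the observed network) and $\theta_i$ (Lagrange multipliers) are notation introduced here for illustration only.

```latex
% Shannon entropy of the ensemble, maximised under normalisation and
% under constraints on the expected values of the chosen properties C_i
S[P] = -\sum_{G} P(G)\,\ln P(G),
\qquad
\sum_{G} P(G) = 1,
\qquad
\sum_{G} P(G)\,C_i(G) = C_i(G^{*}) \quad \forall i .

% Resulting benchmark distribution (exponential random graph form)
P(G \mid \vec{\theta}) = \frac{e^{-\sum_i \theta_i C_i(G)}}{Z(\vec{\theta})},
\qquad
Z(\vec{\theta}) = \sum_{G} e^{-\sum_i \theta_i C_i(G)} .
```

In this form, everything not fixed by the constraints $C_i$ is left maximally random, which is the sense in which the benchmark is unbiased; observations that are statistically incompatible with $P(G \mid \vec{\theta})$ are the ones retained as non-trivial.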
In particular, the accounts from the right wing community interact, mainly in terms of retweets, with the same accounts that interact with the mainstream media. This is probably due to the strong visibility given by the mainstream media to the leaders of that community. Moreover, the right wing community is larger and more active, even relative to the number of accounts involved, than the other communities. Interestingly enough, newly formed political parties, such as that of the former Italian Prime Minister Matteo Renzi, quickly established their presence on Twitter and in the online political debate through intense activity. Furthermore, the different political parties use different sources for getting information on the spread of the pandemic. To detect the impact of dis/misinformation in the debate, we consider the news sources shared among the accounts of the various groups. With a hybrid annotation approach, based on independent fact-checking organisations and human annotation, we categorised such sources as reputable or non reputable (in terms of the credibility of the published news and the transparency of the sources). Notably, we found that one group of accounts spread information from non reputable sources with a frequency almost 10 times higher than that of the other political groups. We are concerned that, given the extent of the online activity of the members of this community, the spread of such a volume of non reputable news could deceive public opinion.

We collected circa 4.5M tweets in Italian, from February 21st to April 20th, 2020 [2]. Details about the political situation in Italy during the period of data collection can be found in the Supplementary Material, Section 1.1: 'Evolution of the Covid-19 pandemics in Italy'. The data collection was keyword-based, with keywords related to the Covid-19 pandemic. Twitter's streaming API returns any tweet containing the keyword(s) in the text of the tweet, as well as in its metadata.
It is worth noting that it is not always necessary to include every variant of a specific keyword in the tracking list. For example, the keyword 'Covid' will return tweets that contain both 'Covid19' and 'Covid-19'. Table 1 lists a subset of the considered keywords and hashtags. Some hashtags overlap because an included keyword is a sub-string of another one, but we included both for completeness.

The left panel of Fig. 1 shows the network obtained by following the projection procedure described in Section 5.1. In the rest of the paper, the network resulting from the projection procedure will be called the validated network. The term validated should not be confused with the term verified, which instead denotes a Twitter user who has passed the formal authentication procedure of the social platform. In order to find the communities of verified Twitter users, we applied the Louvain algorithm [5] to the validated network. This algorithm, despite being one of the most popular, is known to be order dependent [19]. To remove this bias, we apply it N times (N being the number of nodes), reshuffling the order of the nodes each time. Finally, we select the partition with the highest modularity. The network presents a strong community structure, composed of four main subgraphs. When analysing the four emerging communities, we find that they correspond to:
1. Right wing parties and media (in steel blue)
2. Center left wing (in dark red)
3. 5 Stars Movement (M5S) (in dark orange)
4. Institutional accounts (in sky blue)
Details about the political situation in Italy during the period of data collection can be found in the Supplementary Material, Section 1.2: 'Italian political situation during the Covid-19 pandemics'. This partition into four subgroups, once examined in more detail, presents a richer substructure, described in the right panel of Fig. 1.
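The final selection step of the community detection (keeping the partition with the highest modularity among repeated runs) can be sketched as follows. This is a minimal illustration under assumptions, not the authors' code: the Louvain runs on reshuffled node orders are not reproduced here, and `candidates` is a hypothetical list standing in for the partitions those runs would return. The graph is assumed undirected and unweighted, and the function names are introduced for this sketch only.

```python
from collections import defaultdict


def modularity(edges, partition):
    """Newman modularity Q of an undirected, unweighted graph.

    edges: list of (u, v) pairs, each undirected edge listed once.
    partition: dict mapping every node to a community label.
    """
    m = len(edges)
    if m == 0:
        return 0.0
    internal = defaultdict(int)  # edges with both endpoints in the community
    degree = defaultdict(int)    # sum of node degrees inside the community
    for u, v in edges:
        degree[partition[u]] += 1
        degree[partition[v]] += 1
        if partition[u] == partition[v]:
            internal[partition[u]] += 1
    return sum(internal[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


def best_partition(edges, candidates):
    """Return the candidate partition with the highest modularity."""
    return max(candidates, key=lambda p: modularity(edges, p))


# Toy graph: two triangles joined by a single edge.
toy_edges = [("a", "b"), ("b", "c"), ("a", "c"),
             ("d", "e"), ("e", "f"), ("d", "f"),
             ("c", "d")]
two_groups = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
one_group = {node: 0 for node in "abcdef"}

# Hypothetical stand-in for the partitions returned by repeated Louvain runs.
candidates = [one_group, two_groups]
chosen = best_partition(toy_edges, candidates)
```

On the toy graph, the split into the two triangles has modularity 5/14, while putting all nodes in one community gives 0, so the former is selected.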
Starting from the center-left wing, we find a darker red community, including various NGOs and various left-oriented journalists, VIPs and pundits. A slightly lighter red sub-community turns out to be composed of the main politicians of the Italian Democratic Party (PD), as well as representatives of the European Parliament (Italian and others) and some EU commissioners. The violet red group is mostly composed of the representatives of Italia Viva, a new party founded by the former Italian prime minister Matteo Renzi (in office from February 2014 to December 2016). In golden red we find the subcommunity of Catholic and Vatican groups. Finally, the dark violet red and light tomato subcommunities consist mainly of journalists. The orange (M5S) community also shows a clear partition into substructures. In particular, the dark orange subcommunity contains the accounts of politicians, parliament representatives and ministers of the M5S, and journalists. In aquamarine, we find the official accounts of some private and public, national and international, health institutes. Finally, in the light slate blue subcommunity we find various Italian ministers as well as the Italian police and armed forces. Similar considerations apply to the steel blue community. In steel blue, we find the subcommunity of center-right and right wing parties (such as Forza Italia, Lega and Fratelli d'Italia). In the following, this subcommunity is called FI-L-FdI, after the initials of the political parties contributing to this group. The sky blue subcommunity includes the national federations of various sports, the official accounts of athletes and sport players (mostly soccer) and their teams. The teal subcommunity contains the main Italian news agencies, as well as the accounts of many universities. The firebrick subcommunity contains accounts related to the AS Roma football club; analogously, the dark red subcommunity contains the official accounts of AC Milan and its players.
The slate blue subcommunity is mainly composed of the official accounts of radio and TV programs of Mediaset, the main private Italian broadcasting company. Finally, the sky blue community is mainly composed of Italian embassies around the world. For the sake of completeness, a more detailed description of the composition of the subcommunities in the right panel of Figure 1 is reported in the Supplementary Material, Section 1.3: 'Composition of the subcommunities in the validated network of verified Twitter users'.

Here, we report a series of analyses of the domain names, hereafter simply called domains, that appear most often in the tweets of the validated network of verified users. The domains have been tagged according to their degree of credibility and transparency, as indicated by the independent software toolkit NewsGuard (https://www.newsguardtech.com/). The details of this procedure are reported below. As a first step, we considered the network of verified accounts, whose communities and sub-communities are shown in Fig. 1. On this topology, we labelled all domains that had been shared at least 20 times (counting both tweets and retweets). Table 2 shows the tags associated with the domains. In the rest of the paper, we are interested in quantifying the reliability of the news sources publishing during the period of interest. Thus, our analysis does not consider sources corresponding to social networks, marketplaces, search engines, institutional sites, etc. Tags R, ∼R and NR in Table 2 are used only for news sites, be they newspapers, magazines, or TV or radio social channels, and stand for Reputable, Quasi Reputable and Not Reputable, respectively. The label UNC is assigned to domains with fewer than 20 occurrences in our dataset of tweets and retweets. The labeling procedure is in fact a hybrid one.
Table 2: Tags used for labeling the domains.
As mentioned above, we relied on NewsGuard, a plugin resulting from the joint effort of journalists and software developers, which evaluates news sites according to nine criteria concerning credibility and transparency. To evaluate credibility, the metrics consider whether the news source regularly publishes false news, fails to distinguish between facts and opinions, or fails to correct wrongly reported news. For transparency, instead, the tool takes into account whether owners, founders or authors of the news source are publicly known, and whether advertisements are easily recognizable [3]. After combining the individual scores obtained from the nine criteria, the plugin assigns the news source a score from 1 to 100, where 60 is the minimum score for the source to be considered reliable. When reporting the results, the plugin provides details about which criteria were met and which were not. To leave a sort of no-man's land and avoid too abrupt a transition between reputability and non-reputability, we considered a source with a score between 55 and 65 to be quasi reputable, ∼R. It is worth noting that not all the domains in the dataset under investigation had been evaluated by NewsGuard at the time of our analysis. Those not evaluated automatically were annotated by three tech-savvy researchers, who assessed the domains using the same criteria as NewsGuard. Table 3 gives statistics about the number and kind of posts (tw = pure tweet; rt = retweet), the number of urls and distinct urls (dist url), and the number of domains and users in the validated network of verified users. We clarify what we mean by these terms with an example: for us, a domain corresponds to the so-called 'second-level domain' name [4], i.e., the name directly to the left of .com, .net, and any other top-level domain.
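The tagging rule and the notion of domain described above can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the authors' pipeline: the function names are hypothetical, the 55 and 65 boundaries are assumed to be inclusive (the text does not say), only news sites with a numeric score are covered (the manual annotation by the three researchers is outside the sketch), and the domain extraction is a naive "last two labels" rule.

```python
from urllib.parse import urlparse

MIN_OCCURRENCES = 20            # domains shared fewer times stay unclassified (UNC)
QUASI_LOW, QUASI_HIGH = 55, 65  # "no-man's land" around the NewsGuard threshold of 60


def tag_news_domain(score, occurrences):
    """Return 'R', '~R', 'NR' or 'UNC' for a news domain.

    Assumption: the 55 and 65 boundaries are treated as inclusive.
    """
    if occurrences < MIN_OCCURRENCES:
        return "UNC"
    if QUASI_LOW <= score <= QUASI_HIGH:
        return "~R"
    return "R" if score > QUASI_HIGH else "NR"


def second_level_domain(url):
    """Naive extraction: keep the last two labels of the host name.

    Assumption: the suffix is a single label (.it, .com); suffixes such as
    .co.uk would need a public-suffix list, which is not handled here.
    """
    host = urlparse(url).hostname or ""
    labels = host.split(".")
    return ".".join(labels[-2:]) if len(labels) >= 2 else host


print(tag_news_domain(score=62, occurrences=150))                 # ~R
print(second_level_domain("http://www.example.com/index.html"))   # example.com
```

Checking the occurrence count first mirrors the text, where rarely shared domains are left untagged regardless of their score.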
For instance, repubblica.it, corriere.it and nytimes.com are considered domains by us. The url, instead, maintains its standard definition [5]; an example is http://www.example.com/index.html. Table 4 shows the outcome of the domain annotation, according to the scores of NewsGuard or, when no NewsGuard score was available, to those assigned by the three annotators. At first glance, the majority of the news domains belong to the Reputable category. The second highest percentage is that of the untagged domains (UNC). In fact, many domains in our dataset occur only a few times. For example, there are 300 domains that appear in the dataset only once. Fig. 2 shows the trend of the number of tweets and retweets containing urls posted by the verified users of the validated projection during the period of data collection [...] in [9].
[3] NewsGuard rating process: https://www.newsguardtech.com/ratings/rating-process-criteria/
[4] https://en.wikipedia.org/wiki/Domain_name
Table 4: Annotation results over all the domains in the whole dataset (validated network of verified users).
Going on with the analysis, Table 5 shows the percentage of the different types of domains for the 4 communities identified in the left plot of Fig. 1. It is worth observing that the steel blue community (both politicians and media) is the most active one, even though it is not the largest: it has fewer users than the center-left community (the biggest one in terms of users), but the number of its posts containing a valid url is almost double that of the second most active community. Interestingly, the activity of the verified users of the steel blue community is more focused on content production (see the only tweets sub-table) than on sharing (see the only retweets sub-table). In fact, retweets represent almost 14.6% of all posts from the media and right wing community, while for the center-left community the figure is 34.5%.
This effect is also observable in the average number of original posts (only tweets) per verified user: right-wing and media users have an average of 88.75 original posts, against 34.27 for center-left-wing users. These numbers are probably due to the presence in the former community of the most accessed Italian media, which tend to spread their (original) pieces of news on the Twitter platform. Interestingly, in the case of original tweets (only tweets), the presence of urls from non reputable sources in the steel blue community is more than 10 times higher than the second highest score in the same field. It is worth noting that, for the dark orange and sky blue communities, which are smaller in terms of both users and number of posts, the presence of unclassified sources is quite strong (nearly 46% of retweeted posts for both communities), as is the frequency of posts linking to social network content. Interestingly enough, the verified users of both groups seem to focus slightly more on the same domains: there are, on average, 1.59 and 1.80 posts per url domain for the dark orange and sky blue communities, respectively, against 1.26 and 1.34 for the steel blue and dark red communities.

The right plot in Fig. 1 reports a fine-grained division of communities: the four largest communities have been further divided into sub-communities, as mentioned in Subsection 3.1. Here, we focus on the urls shared in the purely political sub-communities, reported in Table 7. Broadly speaking, we examine the contribution of the different political parties, as represented on Twitter, to the spread of mis/disinformation and propaganda. Table 7 clearly shows that the vast majority of the news coming from sources considered scarcely or non reputable is tweeted and retweeted by the steel blue political sub-community (FI-L-FdI).
Notably, the percentage of non reputable sources shared by the FI-L-FdI accounts is more than 4 times the percentage for their community (the steel blue one) and more than 30 times that of the second community in the NR ratio ranking. For all the political sub-communities, the incidence of social network links is much higher than in their original communities. Looking at Table 8, even though the number of users in each political sub-community is much smaller, some peculiar behaviours can still be observed. Again, the center-right and right wing parties, while being the least represented in terms of users, are much more active than the other groups: each (verified) user is responsible, on average, for almost 81.14 messages, while the average is 23.96, 22.12 and 15.29 for M5S, IV and PD, respectively. It is worth noticing that Italia Viva, despite being a recently founded party, is very active; moreover, its frequency of quasi reputable sources is quite high, especially in the case of only tweets posts. The impact of uncategorized sources is almost constant for all communities in the retweeting activity, while it is particularly strong for the M5S. Finally, the posts by the center-left communities (i.e., Italia Viva and the Democratic Party) tend to contain more than one url. Specifically, every post containing at least one url has, on average, 2.05 and 2.73 urls respectively, against 1.31 for Movimento 5 Stelle and 1.20 for the center-right and right wing parties. To conclude the analysis of the validated network of verified users, we report statistics about the most widely used hashtags in the 4 political sub-communities. Fig. 3 shows wordclouds, while Fig. 4 reports the same data as histograms. The hashtags provide important information about the communication of the various political discursive communities and their position on the management of the pandemic.
First, it should be noted that the M5S is the heaviest user of hashtags: its two most used hashtags have been used almost twice as often as the most used hashtag of the PD, for instance. This heavy usage is probably due to the presence in this community of journalists and of the official account of Il Fatto Quotidiano, a newspaper explicitly supporting the M5S: indeed, the first two hashtags are "#ilfattoquotidiano" and "#edicola" (kiosk, in Italian). It is interesting to see the relative importance of hashtags intended to encourage the population during the lockdown, such as "#celafaremo" (we will make it), "#iorestoacasa" (I am staying home) and "#fermiamoloinsieme" (Let's stop it together): "#iorestoacasa" is present in every community, but it ranks 13th in the M5S verified user community, 29th in the FI-L-FdI community, 2nd in the Italia Viva community and 10th in the PD one. Remarkably, "#celafaremo" is present only in the M5S group, while "#fermiamoloinsieme" appears in the top 30 hashtags only in the center-right and right wing cluster. The PD, being present in various European institutions, uses more Europe-related hashtags ("#europeicontrocovid19", Europeans against covid-19), in order to ask for a common reaction from the EU. The center-right and right wing community has other hashtags such as "#forzalombardia" (Go, Lombardy!), ranking 2nd, and "#fermiamoloinsieme", ranking 10th. What is nevertheless astonishing is the presence, among the most used hashtags of all communities, of the names of politicians from the same group (interestingly, "#salvini" is the most used hashtag in the center-right and right wing community, even though he held no government role) and of TV programs ("#mattino5", "#lavitaindiretta", "#ctcf", "#dimartedì"), as if the main use of hashtags were to promote the appearance of politicians in TV programs.
Finally, the hashtags used by FI-L-FdI mainly criticise the actions of the government, e.g., "#contedimettiti" (Conte, resign!).

Fig. 5 shows the structure of the directed validated projection of the retweet activity network, obtained with the procedure recalled in Section 3 of the Supplementary Material. As mentioned in Section 4 of the Supplementary Material, the affiliation of unverified users has been determined by using the tags obtained from the validated projected network of verified users as immutable labels for the label propagation algorithm of [23]. After label propagation, the representation of the political communities in the validated retweet network changes dramatically with respect to the network of verified users: the center-right and right wing community is the most represented in the whole network, with 11063 users (21.1% of all the users in the validated network), followed by Italia Viva with 8035 accounts (15.4% of all the accounts in the validated network). The impact of M5S and PD is much more limited, with 3286 and 564 accounts, respectively. This result is unexpected, given the recent formation of Italia Viva. As in our previous study on online propaganda [8], we observe that the most effective users in terms of hub score [21] are almost exclusively from the center-right and right wing parties: among the first 100 hubs, only 4 are not from this group. Interestingly, 3 out of these 4 are verified users: Roberto Burioni, one of the most famous Italian virologists, ranking 32nd; Agenzia Ansa, a popular Italian news agency, ranking 61st; and Tgcom24, the popular newscast of a private TV channel, ranking 73rd. The fourth account is an online news website, ranking 88th: it is an unverified account that belongs to a non-political community.
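To make the notion of hub score used above concrete, the following is a minimal sketch of the standard hubs-and-authorities (HITS) power iteration on a toy directed graph. It rests on assumptions that are not stated in the text: edges are taken to point from the author of a tweet to the account retweeting it, so that a strong hub is an account retweeted by many well-connected retweeters; the account names and the function name are hypothetical; and the validation step that produces the real network is not reproduced.

```python
def hub_and_authority_scores(edges, iterations=50):
    """Plain HITS power iteration on a directed edge list.

    edges: list of (source, target) pairs; here assumed to be
    (tweet author, retweeting account).
    Returns two dicts: hub scores and authority scores (L2-normalised).
    """
    nodes = {node for edge in edges for node in edge}
    hub = {node: 1.0 for node in nodes}
    auth = {node: 1.0 for node in nodes}
    for _ in range(iterations):
        # Authority of a node: sum of the hub scores of nodes pointing to it.
        auth = {node: 0.0 for node in nodes}
        for source, target in edges:
            auth[target] += hub[source]
        # Hub of a node: sum of the authority scores of nodes it points to.
        hub = {node: 0.0 for node in nodes}
        for source, target in edges:
            hub[source] += auth[target]
        # Normalise both vectors so the scores stay bounded.
        auth_norm = sum(v * v for v in auth.values()) ** 0.5 or 1.0
        hub_norm = sum(v * v for v in hub.values()) ** 0.5 or 1.0
        auth = {n: v / auth_norm for n, v in auth.items()}
        hub = {n: v / hub_norm for n, v in hub.items()}
    return hub, auth


# Toy retweet graph: "alice" is retweeted by three accounts, "bob" by one.
toy_edges = [("alice", "r1"), ("alice", "r2"), ("alice", "r3"), ("bob", "r1")]
hub, auth = hub_and_authority_scores(toy_edges)
ranking = sorted(hub, key=hub.get, reverse=True)  # accounts by decreasing hub score
```

In the toy graph, the account retweeted by the most users comes first in `ranking`, while accounts that are never retweeted have a hub score of zero.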
Remarkably, the top 5 hubs include 3 of the top 5 hubs already found when considering the online debate on migration from northern Africa to Italy [8]: in particular, a journalist of a neo-fascist online newspaper (non verified user), an extreme right activist (non verified user) and the leader of Fratelli d'Italia, Giorgia Meloni (verified user), who ranks 3rd in hub score. Matteo Salvini (verified user), who was the first hub in [8], ranks 9th, surpassed by his party colleague Claudio Borghi, who ranks 6th. The first hub in the present network is an extreme right activist who posts videos against African migrants to Italy, accusing them of being responsible for the contagion and of violating lockdown measures. Table 9 shows the annotation results for all the domains tweeted and retweeted by users in the directed validated network. The numbers are much higher than those shown in Table 2, but the trend confirms the previous results. The majority of urls traceable to news sources are considered reputable. The number of unclassified domains is higher too: in this case, only the domains occurring at least 100 times were annotated.
Table 9: Annotation results over all the domains (directed validated network).
Table 10 reports statistics about posts, urls, distinct urls, users and verified users in the directed validated network. Comparing these numbers with those of Table 3, which reports statistics about the validated network of verified users, we can see that here the number of retweets is much higher and the trend is the opposite: verified users tend to tweet more than retweet (46277 vs 17190), while users in the directed validated network, which also includes non verified users, have a number of retweets 3.5 times higher than the number of their tweets. Fig. 6 shows the trend of the number of tweets containing urls over the period of data collection.
Since we are analysing a bigger network than the one considered in Section 3.2, the numbers are one order of magnitude greater than those shown in Fig. 2: the highest peak, after the discovery of the first cases in Lombardy, corresponds to more than 68,000 posts containing urls, whereas the analogous peak in Fig. 2 corresponds to 2,500 posts. Apart from the order of magnitude, the two plots feature similar trends: higher traffic before the beginning of the Italian lockdown, and a settling down as the quarantine went on [6]. Table 11 shows the core of our analysis, that is, the distribution of reputable and non reputable news sources in the directed validated network, consisting of both verified and non-verified users. Again, we focus directly on the 4 political sub-communities identified in the previous subsection. Two of the sub-communities are part of the center-left wing community, one is associated with the 5 Stars Movement, and the remaining one represents the center-right and right wing communities. In line with previous results on the validated network of verified users, the table clearly shows that the vast majority of the news coming from sources considered scarcely or non reputable is tweeted and retweeted by the center-right and right wing communities; 98% of the domains tagged as NR are shared by them. As shown in Table 12, the activity of FI-L-FdI users is again extremely high: on average there are 89.3 retweets per account in this community, against 66.4 for M5S, 48.4 for IV and 21.8 for PD. The right wing contribution to the debate is extremely high even in absolute numbers, due to the large number of users in this community.
It is worth mentioning that the frequency of non reputable sources in this community is really high (about 30% of the urls in the only tweets) and comparable with that of the reputable ones (see Table 11, only tweets). In the other sub-communities, PD users are more focused on uncategorised sources, while users from both Italia Viva and Movimento 5 Stelle mostly tweet and retweet reputable news sources.
[6] The low peaks for February 27 and March 10 are due to an interruption in the data collection, caused by a connection breakdown.
Table 11: Domains annotation per political sub-communities (directed validated network).
[...] and users, but also in absolute numbers: out of the over 1M tweets, more than 320k tweets refer to a NR url. The political competition still shines through the hashtag usage even for the other communities: this is the case, for instance, of Italia Viva. In its top 30 hashtags we find '#salvini', '#lega', but also '#papeete' [7], '#salvinisciacallo' (Salvini jackal) and '#salvinimmmerda' (Salvini asshole). On the other hand, Italia Viva also uses hashtags supporting the population during the lockdown: '#iorestoacasa', '#restoacasa' (I am staying home), '#restiamoacasa' (let's stay home). Criticism of the management of the Lombardy health system during the pandemic can be inferred from the hashtags '#commissariamtelalombardia' (put Lombardy under receivership) and '#fontana' (the Lega administrator of the Lombardy region). Movimento 5 Stelle has the name of the main leader of the opposition, '#salvini', as its first hashtag and supports criticism of the Lombardy administration with the hashtags '#fontanadimettiti' (Fontana, resign!) and '#gallera', the Health and Welfare Minister of the Lombardy Region, considered mainly responsible for the bad management of the pandemic.
Nevertheless, it is also possible to find some hashtags encouraging the population during the lockdown, such as the above-mentioned '#iorestoacasa', '#restoacasa' and '#restiamoacasa'. It is worth mentioning that the government measures, and the corresponding M5S campaigns, are accompanied by specific hashtags: '#curaitalia' is the name of one of the prime minister's decrees to inject liquidity into the Italian economy, while '#acquistaitaliano' (buy Italian products!) advertises Italian products to support the national economy. As a final task, over the whole set of tweets produced or shared by the users in the directed validated network, we counted the number of times a message containing a url was shared by users belonging to different political communities, without considering the semantics of the tweets. Namely, we ignored whether the urls were shared to support or to oppose the presented arguments. Table 14 shows the most tweeted (and retweeted) NR domains shared by the political communities presented in Table 7; the number of occurrences is reported next to each domain. The first NR domains for FI-L-FdI in Table 14 are related to right, extreme right and neo-fascist propaganda, as is the case for imolaoggi.it, ilprimatonazionale.it and voxnews.info, recognised as disinformation websites by NewsGuard and by the two main Italian debunking websites, bufale.net and BUTAC.it. As shown in the table, some domains, although with different numbers of occurrences, appear in more than one column, and are thus shared by users close to different political communities. For some subgroups of the community, this could mean a retweet with the aim of supporting the opinions expressed in the original tweets. However, since the semantics of the posts in which these domains appear were not investigated, the sharing of the links by more than one political community could be intended to oppose, rather than support, the opinions in the original posts.
Despite the fact that the results were achieved for a specific country, we believe that the applied methodology is of general interest, being able to show trends and peculiarities whenever information is exchanged on social networks. In particular, when analysing the outcome of our investigation, some features attracted our attention:

1 Persistence of clusters wrt different discussion topics: In Caldarelli et al. [8], we focused on tweets concerned with immigration, an issue that has been central in the Italian political debate for years. Here, we discovered that the clusters and the echo chambers that have been detected when analysing tweets about immigration are almost the same as those singled out when considering discussions concerned with Covid-19. This may seem surprising, because a discussion about Covid-19 may not be exclusively political, but also medical, social, economic, etc. From this we can argue that the clusters are political in nature and, even when the topic of discussion changes, users remain in their cluster on Twitter. (Indeed, journalists and politicians use Twitter for information and political propaganda, respectively.) Why political polarisation and a political vision of the world so strongly affect the analysis of what should be an objective phenomenon is still an intriguing question.

2 Persistence of online behavioral characteristics of clusters: We found that the most active, lively and penetrating online communities in the online debate on Covid-19 are the same found in [8], formed in an almost purely political debate such as the one on the right of migrants to land on the Italian territory.

3 (Dis)Similarities amongst offline and online behaviours of members and voters of parties: Maybe less surprisingly, political habits are also reflected in the degree of participation in the online discussions. In particular, among the parties on the centre-left-wing side, a small party (Italia Viva) shows a much more effective social presence than the larger party of the Italian centre-left-wing (Partito Democratico), which has many more active members and more parliamentary representation. More generally, there is a significant difference in social presence among the different political parties, and the amount of activity is not at all proportional to the size of the parties in terms of members and voters.

4 Spread of non reputable news sources: In the online debate about Covid-19, many links to non-reputable news sources are posted and shared (non-reputable as defined by NewsGuard, a toolkit ranking news websites based on criteria of transparency and credibility, led by veteran journalists and news entrepreneurs). Kind and occurrences of the urls vary with respect to the corresponding political community. Furthermore, some of the communities are characterised by a small number of verified users that corresponds to a very large number of acolytes, which are (in turn) very active, three times as much as the acolytes of the opposite communities in the partition. In particular, when considering the amount of retweets from poorly reputable news sites, one of the communities is by far (one order of magnitude) more active than the others. As noted already in our previous publication [8], this extra activity could be explained by a more skilled use of the systems of propaganda: in that case, a massive use of bot accounts and a targeted activity against migrants (as resulted from the analysis of the hub list).

Our work could help in steering the online political discussion around Covid-19 towards an investigation on reputable information, while providing a clear indication of the political inclination of those participating in the debates. More generally, we hope that our work will contribute to finding appropriate strategies to fight online misinformation.
While not completely unexpected, it is striking to see how political polarisation also affects the Covid-19 debate, giving rise to online communities of users that, in number and structure, closely correspond to their political affiliations.

This section recaps the methodology through which we have obtained the communities of verified users (see Section 3.1). This methodology has been designed in Saracco et al. [25] and applied in the field of social networks for the first time in [4, 8]. For the sake of completeness, the Supplementary Material, Section 3, recaps the methodology through which we have obtained the validated retweet activity network shown in Section 3.3. In Section 4 of the Supplementary Material, the detection of the affiliation of unverified users is described. In the Supplementary Material, the interested reader will also find additional details about 1) the definition of the null models (Section 5); 2) a comparison among various label propagation approaches for the political affiliation of unverified users (Section 6); and 3) a brief state of the art on fact-checking organizations and literature on false news detection (Section 7).

Many results in the analysis of online social networks (OSN) show that users are highly clustered in groups of opinion [1, 11-15, 22, 28, 29]; indeed, those groups have some peculiar behaviours, such as the echo chamber effect [14, 15]. Following the example of references [4, 8], we make use of this clustering of users in order to detect discursive communities, i.e. groups of users interacting among themselves by retweeting on the same (covid-related) subjects. Remarkably, our procedure does not rely on the analysis of the text shared by the various users, but is simply based on the retweeting activity among users. In the present subsection we will examine how the discursive communities of verified Twitter users can be extracted. On Twitter there are two distinct categories of accounts: verified and unverified users.
Verified users have a tick close to the screen name: the platform itself, upon request from the user, has a procedure to check the authenticity of the account. Verified accounts are owned by politicians, journalists or VIPs in general, and include the official accounts of ministers, newspapers, newscasts, companies and so on; for these kinds of users, the verification procedure guarantees the identity of their account and reduces the risk of malicious accounts tweeting in their name. Non-verified accounts are for standard users: in this second case, we cannot trust any information provided by the users. The information carried by verified users has been studied extensively in order to have a sort of anchor for the related discussion [4, 6, 8, 20, 27]. To detect the political orientation we consider the bipartite network represented by verified (on one layer) and unverified (on the other layer) accounts: a link connects the verified user v with the unverified one u if v was retweeted by u at least once, or vice versa. To extract the similarity of users, we compare the commonalities with a bipartite entropy-based null model, the Bipartite Configuration Model (BiCM [24]). The rationale is that two verified users that share many links to the same unverified accounts probably have similar visions, as perceived by the audience of unverified accounts. We then apply the method of [25], graphically depicted in Fig. 8, in order to get a statistically validated projection of the bipartite network of verified and unverified users. In a nutshell, the idea is to compare the amount of common linkage measured on the real network with the expectations of an entropy-based null model fixing (on average) the degree sequence: if the associated p-value is so low that the overlaps cannot be explained by the model, i.e. such that it is not compatible with the degree sequence expectations, they carry non-trivial information and we project the related information on the (monopartite) projection of verified users. The interested reader can find the technical details about this validated projection in [25] and in the Supplementary Information.

The data that support the findings of this study are available from Twitter, but restrictions apply to the availability of these data, which were used under license

1 Italian socio-political situation during the period of data collection

In the present subsection we present some crucial facts for the understanding of the social context in which our analysis is set. This subsection is divided into two parts: the contagion evolution and the political situation. These two aspects are closely related.

A first Covid-19 outbreak was detected in Codogno, Lodi, Lombardy region, on February 19th [1]. On the very next day, two cases were detected in Vò, Padua, Veneto region. On February 22nd, in order to contain the contagion, the national government decided to put 11 municipalities in quarantine, 10 in the area around Lodi, plus Vò, near Padua [2]. Nevertheless, the number of infections rose to 79, hitting 5 different regions; one of the infected persons in Vò died, representing the first registered Italian Covid-19 victim [3]. On February 23rd there were already 229 confirmed cases in Italy. The first lockdown should have lasted until the 6th of March, but due to the still increasing number of infections in northern Italy, the Italian Prime Minister Giuseppe Conte intended to extend the quarantine zone to almost all of northern Italy on Sunday, March 8th [4]: travel to and from the quarantine zone was limited to cases of extreme urgency.
A draft of the decree announcing the expansion of the quarantine area appeared on the website of the Italian newspaper Corriere della Sera in the late evening of Saturday, March 7th, causing some panic in the areas concerned [5]: around 1000 people, living in Milan but coming from southern regions, took trains and planes to reach their places of origin [6] [7].

[1] Prima Lodi, "'Paziente 1', il merito della diagnosi va diviso... per due", 8th June 2020.
[2] Italian Gazzetta Ufficiale, "DECRETO-LEGGE 23 Febbraio 2020, n. 6". The date is intended to be the very first day of validity of the decree.
[3] Il Fatto Quotidiano, "Coronavirus, è morto il 78enne ricoverato nel Padovano. 15 contagiati in Lombardia, un altro in Veneto", 22nd February 2020.
[4] BBC News, "Coronavirus: Northern Italy quarantines 16 million people", 8th March 2020.
[5] The Guardian, "Leaked coronavirus plan to quarantine 16m sparks chaos in Italy", 8th March 2020.

In any case, the new quarantine zone covered the entire Lombardy and, partially, 4 other regions. Remarkably, close to Bergamo, Lombardy region, a new outbreak was discovered and the possibility of defining a new quarantine area on March 3rd was considered: this option was later abandoned, due to the new northern Italy quarantine zone of the following days. This delay seems to have caused a strong increase in the number of infections, making the Bergamo area the most affected one, in percentage, of the entire country [8]; at the time of writing, there are investigations regarding the responsibility for this choice. On March 9th, the lockdown was extended to the whole country, making Italy the first country in the world to decide on a national quarantine [9]. Travel was restricted to emergency or work reasons; all business activities not considered essential (unlike, e.g., pharmacies and supermarkets) had to close. Until the 21st of March, lockdown measures became progressively stricter all over the country.
Starting from the 14th of April, some retail activities, such as children's clothing shops, reopened. A first fall in the number of deaths was observed on the 20th of April [10]. A limited reopening started with the so-called "Fase 2" (Phase 2) on the 4th of May [11]. From the very first days of March, the limited capacity of the intensive care departments to take care of covid-infected patients led to the need for a re-organization of Italian hospitals, resulting, e.g., in the opening of new intensive care departments [12]. Moreover, new forms of communication with the relatives of the patients were proposed, new criteria for intubating patients were developed and, at the height of the crisis, in the worst-hit cases, emergency management came to prioritise hospitalisation for patients with a higher probability of recovery [13]. Outbreaks were mainly present in hospitals [19]. Unfortunately, healthcare workers were infected by Covid [14]. This contagion resulted in a relatively high number of fatalities: by the 22nd of April, 145 Covid deaths were registered among doctors. Due to the pressure on the intensive care capacity, the healthcare personnel were also subject to extreme stress, especially in the most affected zones [15].

On August 8th, 2019, the leader of Lega, the main Italian right-wing party, announced the withdrawal of the party's support for the government of Giuseppe Conte, which was formed after a post-election coalition between the Renzi formed a new center-left party, Italia Viva (Italy alive, IV), due to some discord with PD; despite the split, Italia Viva continued to support the current government, having some of its representatives among the ministers and undersecretaries, but often marking its distance with respect to both PD and M5S.
Leveraging the great impact they have on social media, Matteo Salvini and Giorgia Meloni (leader of Fratelli d'Italia, a right-wing party) started a massive campaign against the government the day after its inauguration. The regions of Lombardy, Veneto, Piedmont and Emilia-Romagna experienced the highest number of infections during the pandemic; among those, the first three are administered by right and center-right wing parties, the fourth one by the PD. The disagreement on the management of the pandemic between regions and the central government was an occasion to exacerbate the political debate (in Italy, regions have quite a wide autonomy in healthcare). The regions administered by the right-wing parties criticised the centralisation of the decisions regarding the lockdown, while the national government criticised their health management (in Lombardy the healthcare system has a peculiar organisation, in which the private sector is supported by public funding) and its ineffective measures to reduce the number of infections. The debate was taken up at the national level as well: the opposition criticized the financial origin of the support to the various economic sectors. Moreover, the role of the European Union in providing funding to help the Italian economy recover after the pandemic was debated.

Here, we detail the composition of the communities shown in Figure 1 of the main text. We remind the reader that, after applying the Louvain algorithm to the validated network of verified Twitter users, we could observe 4 main communities, which correspond to:

1 Right wing parties and media (in steel blue)
2 Center left wing (dark red)
3 5 Stars Movement (M5S), in dark orange
4 Institutional accounts (in sky blue)

Starting from the center-left wing, we can find a darker red community, including various NGOs (the Italian chapters of UNICEF, Medecins Sans Frontieres, Action Aid, Emergency, Save the Children, etc.), various left-oriented journalists, VIPs and pundits [16].
Finally, we can find in this group political movements ('6000sardine') and politicians to the left of PD (such as Beppe Civati, Pietro Grasso, Ignazio Marino) or in the left current of the PD (Laura Boldrini, Michele Emiliano, Stefano Bonaccini). A slightly lighter red sub-community turns out to be composed of the main politicians of the Italian Democratic Party (PD), as well as of representatives from the European Parliament (Italian and others) and some EU commissioners. The violet red group is mostly composed of the representatives of the newly founded Italia Viva and of the former Italian prime minister (February 2014 - December 2016) and former secretary of PD, Matteo Renzi. In golden red we can find the subcommunity of Catholic and Vatican groups. Finally, the dark violet red and light tomato subcommunities are composed mainly of journalists. Interestingly enough, the dark violet red one also contains accounts related to the city of Milan (the mayor, the municipality, the public services account) and to the spokesperson of the Chinese Minister of Foreign Affairs. In turn, the orange (M5S) community also shows a clear partition into substructures. In particular, the dark orange subcommunity contains the accounts of politicians, parliament representatives and ministers of the M5S, as well as journalists and the official account of Il Fatto Quotidiano, a newspaper supporting the Movement 5 Stars. Interestingly, since one of the main leaders of the Movement, Luigi Di Maio, is also the Italian Minister of Foreign Affairs, we can also find in this subcommunity the accounts of several Italian embassies around the world, as well as the accounts of the Italian representatives at NATO, OCSE and OAS.
In aquamarine, we can find the official accounts of some private and public, national and international, health institutes (such as the Italian Istituto Superiore di Sanità, literally the Italian National Institute of Health, the World Health Organization, the Fondazione Veronesi), the Minister of Health Roberto Speranza, and some foreign embassies in Italy. Finally, in the Light Slate Blue subcommunity we can find various Italian ministers as well as the Italian police and army forces. Similar considerations apply to the steel blue community. In steel blue, we find the subcommunity of center-right and right-wing parties (such as Forza Italia, Lega and Fratelli d'Italia). The presidents of the regions of Lombardy, Veneto and Liguria, administered by center-right and right-wing parties, can be found here. (In the following, this subcommunity is called FI-L-FdI, recalling the initials of the political parties contributing to this group.) The sky blue subcommunity includes the national federations of various sports, the official accounts of athletes and sport players (mostly soccer) and their teams, as well as sport journals, newscasts and journalists. The teal subcommunity contains the main Italian news agencies, some of the main national and local newspapers, newscasts and their journalists. In this subcommunity there are also the accounts of many universities; interestingly enough, it also includes all the local public-service newscasts. The firebrick subcommunity contains accounts related to the AS Roma football club; analogously, the dark red one contains the official accounts of AC Milan and its players.

[16] Such as the cartoonists Makkox and Vauro, the singers Marracash, FrankieHiNRG, Ligabue and the Il Volo vocal band, and journalists from Repubblica (Ezio Mauro, Carlo Verdelli, Massimo Giannini) and from the La7 TV channel (Ricardo Formigli, Diego Bianchi).
The slate blue subcommunity is mainly composed of the official accounts of radio and TV programs of Mediaset, the main private Italian broadcasting company, together with singers and musicians. Other smaller subcommunities include other sport federations and sports pundits. Finally, the sky blue community is mainly composed of Italian embassies around the world. The navy subpartition also contains the official accounts of the President of the Republic, the Italian Minister of Defense and the Commissioner for Economy at the EU and former prime minister, Paolo Gentiloni.

In the study of every phenomenon, it is of utmost importance to distinguish the relevant information from the noise. Here, we recall a framework to obtain a validated monopartite retweet network of users: the validation accounts for the information carried not only by the activity of the users, but also by the virality of their messages. We represented the method pictorially in Fig. 1. We define a directed bipartite network in which one layer is composed of accounts and the other one of tweets. An arrow connecting a user $u$ to a tweet $t$ represents $u$ writing the message $t$. An arrow in the opposite direction means that the user $u$ is retweeting the message $t$. To filter out the random noise from this network, we make use of the directed version of the BiCM, i.e. the Bipartite Directed Configuration Model (BiDCM [15]). The projection procedure is then analogous to the one presented in the previous subsection: it is pictorially displayed in Fig. 1. Briefly, consider the pair of users $u_0$ and $u_1$ and the number of messages written by $u_0$ and shared by $u_1$. Then, calculate the distribution of the same measure according to the BiDCM: if the related p-value is statistically significant, i.e. if the number of $u_0$'s tweets shared by $u_1$ is much larger than expected by the BiDCM, we project a (directed) link from $u_0$ to $u_1$.
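As a rough illustration of the comparison just described, the sketch below counts, for every ordered pair of users, the posts written by the first and retweeted by the second, and attaches a p-value to each count. It rests on assumptions that are not part of the paragraph above: the null expectation is taken to be (posts written by $u_0$) × (posts retweeted by $u_1$) / (total number of posts), and the tail probability is taken from a Poisson distribution, which is reasonable only for sparse data. The function names and the toy data are hypothetical, and the multiple-testing correction is deliberately left out.

```python
import math
from collections import Counter


def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam), computed as 1 - P(X <= k - 1).

    Adequate for a sketch; for extremely small tails a dedicated
    routine would be numerically safer.
    """
    if k <= 0:
        return 1.0
    term = math.exp(-lam)  # P(X = 0)
    cdf = term
    for n in range(1, k):
        term *= lam / n  # P(X = n) from P(X = n - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


def directed_overlaps(author_of, retweets):
    """Count posts written by u0 and retweeted by u1, for each pair (u0, u1)."""
    overlaps = Counter()
    for user, post in set(retweets):
        author = author_of[post]
        if author != user:
            overlaps[(author, user)] += 1
    return overlaps


def directed_p_values(author_of, retweets):
    """p-value of each observed overlap under the assumed null expectation."""
    n_posts = len(author_of)
    written = Counter(author_of.values())  # out-degree of each user
    retweeted = Counter(user for user, _ in set(retweets))  # in-degree
    p_values = {}
    for (u0, u1), observed in directed_overlaps(author_of, retweets).items():
        expected = written[u0] * retweeted[u1] / n_posts
        p_values[(u0, u1)] = poisson_sf(observed, expected)
    return p_values


# Toy data: user "a" wrote 4 posts, user "b" wrote 6.
author_of = {f"t{n}": ("a" if n < 4 else "b") for n in range(10)}
retweets = [("c", "t0"), ("c", "t1"), ("c", "t2"), ("d", "t4")]
overlaps = directed_overlaps(author_of, retweets)
p_values = directed_p_values(author_of, retweets)
print(overlaps)
print(p_values)
```

A directed link from $u_0$ to $u_1$ would be kept only if its p-value survives a multiple-testing correction over all tested pairs.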
Summarising, the comparison of the observation on the real network with the BiDCM makes it possible to uncover all contributions that cannot originate from the constraints of the null model.

Figure 1: Schematic representation of the projection procedure for a bipartite directed network. a) An example of a real directed bipartite network. For the actual application, the two layers represent Twitter accounts (turquoise) and posts (gray). A link from a turquoise node to a gray one represents that the post has been written by the user; a link in the opposite direction represents a retweet by the considered account. b) The Bipartite Directed Configuration Model (BiDCM) ensemble is defined. The ensemble includes all the link realisations, once the number of nodes per layer has been fixed. c) We focus our attention on nodes i and j and count the number of directed common neighbours (in magenta both the nodes and the links to their common neighbours), i.e., the number of posts written by i and retweeted by j. Subsequently, d) we compare this measure on the real network with the one on the ensemble: if this overlap is statistically significant with respect to the BiDCM, e) we have a link from i to j in the projected network.

Using the technique described in Subsection 5.1 of the main text, we are able to assign a community to almost all verified users, based on the perception of the unverified users. Since the identity of verified users is checked by Twitter, we have the possibility of checking our groups. Indeed, as we will show in the following, the network obtained via the bipartite projection provides a reliable description of the closeness of opinions and roles in the social debate. How can we use this information in order to infer the orientation of non-verified users? In reference [6] we used the tags obtained for both verified and unverified users in the bipartite network described in Subsection 5.1 of the main text and propagated those labels across the network. In a recent analysis, we observed that other approaches are more stable [16]: in the present manuscript we make use of the most stable algorithm. We use the label propagation proposed in [22] on the directed validated network. Indeed, the validated directed network

In the present appendix we recall the main steps for the definition of an entropy-based null model; the interested reader can refer to the review [8]. We start by reviewing the Bipartite Configuration Model [23], which has been used for detecting the network of similarities of verified users. We then examine the extension of this model to bipartite directed networks [15]. Finally, we present the general methodology to project the information contained in a (directed or undirected) bipartite network, as developed in [24].

Let us consider a bipartite network $G^*_{\mathrm{Bi}}$, in which the two layers are $L$ and $\Gamma$. Define $\mathcal{G}_{\mathrm{Bi}}$ as the ensemble of all possible graphs with the same number of nodes per layer as in $G^*_{\mathrm{Bi}}$. It is possible to define the entropy related to the ensemble as [20]:

$$S = -\sum_{G_{\mathrm{Bi}} \in \mathcal{G}_{\mathrm{Bi}}} P(G_{\mathrm{Bi}}) \ln P(G_{\mathrm{Bi}}), \tag{1}$$

where $P(G_{\mathrm{Bi}})$ is the probability associated to the instance $G_{\mathrm{Bi}}$. Now we want to obtain the maximum entropy configuration, constraining some relevant topological information regarding the system. For the bipartite representation of verified and unverified users, a crucial ingredient is the degree sequence, since it is a proxy of the number of interactions (i.e. tweets and retweets) with the other class of accounts. Thus in the present manuscript we focus on the degree sequence. Let us then maximise the entropy (1), constraining the average over the ensemble of the degree sequence. It can be shown [24] that the probability distribution over the ensemble is

$$P(G_{\mathrm{Bi}}) = \prod_{i \in L} \prod_{\alpha \in \Gamma} p_{i\alpha}^{m_{i\alpha}} \left(1 - p_{i\alpha}\right)^{1 - m_{i\alpha}}, \tag{2}$$

where $m_{i\alpha}$ represents the entries of the biadjacency matrix describing the bipartite network under consideration and $p_{i\alpha}$ is the probability of observing a link between the nodes $i \in L$ and $\alpha \in \Gamma$.
The probability $p_{i\alpha}$ can be expressed in terms of the Lagrangian multipliers $x$ and $y$ for nodes on the $L$ and $\Gamma$ layers, respectively, as

$$p_{i\alpha} = \frac{x_i y_\alpha}{1 + x_i y_\alpha}. \tag{3}$$

In order to obtain the values of $x$ and $y$ that maximize the likelihood of observing the real network, we need to impose the following conditions [13, 26]:

$$\begin{cases} \langle k_i \rangle = \sum_{\alpha \in \Gamma} p_{i\alpha} = k_i^*, & \forall i \in L, \\ \langle k_\alpha \rangle = \sum_{i \in L} p_{i\alpha} = k_\alpha^*, & \forall \alpha \in \Gamma, \end{cases}$$

where the $*$ indicates quantities measured on the real network. Actually, the real network is sparse: the bipartite network of verified and unverified users has a connectance $\rho \simeq 3.58 \times 10^{-3}$. In this case the formula (3) can be safely approximated with the Chung-Lu configuration model, i.e.

$$p_{i\alpha} \simeq \frac{k_i^* k_\alpha^*}{m},$$

where $m$ is the total number of links in the bipartite network.

In the present subsection we will consider the extension of the BiCM to directed bipartite networks and highlight the peculiarities of the network under analysis in this representation. The adjacency matrix describing a directed bipartite network of layers $L$ and $\Gamma$ has a peculiar block structure, once nodes are ordered by layer membership (here the nodes on the $L$ layer first):

$$A = \begin{pmatrix} O & M \\ N^{T} & O \end{pmatrix},$$

where the $O$ blocks represent null matrices (indeed they describe links connecting nodes inside the same layer: by construction they are exactly zero) and $M$ and $N$ are non-zero blocks, describing links connecting nodes on layer $L$ with those on layer $\Gamma$ and vice versa. In general $M \neq N$, otherwise the network is not distinguishable from an undirected one. We can apply the same machinery of the section above, but extending the degree sequence to a directed degree sequence, i.e. considering the in- and out-degrees for nodes on the layer $L$,

$$k_i^{\mathrm{out}} = \sum_{\alpha \in \Gamma} m_{i\alpha}, \qquad k_i^{\mathrm{in}} = \sum_{\alpha \in \Gamma} n_{i\alpha},$$

(here $m_{i\alpha}$ and $n_{i\alpha}$ represent respectively the entries of matrices $M$ and $N$) and for nodes on the layer $\Gamma$,

$$k_\alpha^{\mathrm{out}} = \sum_{i \in L} n_{i\alpha}, \qquad k_\alpha^{\mathrm{in}} = \sum_{i \in L} m_{i\alpha}.$$

The definition of the Bipartite Directed Configuration Model (BiDCM, [15]), i.e. the extension of the BiCM above, closely follows the same steps described in the previous subsection.
Interestingly enough, the probabilities relative to the presence of links from $L$ to $\Gamma$ are independent of the probabilities relative to the presence of links from $\Gamma$ to $L$. If $q_{i\alpha}$ is the probability of observing a link from node $i$ to node $\alpha$ and $q'_{i\alpha}$ the probability of observing a link in the opposite direction, we have

$$q_{i\alpha} = \frac{x_i^{\mathrm{out}} y_\alpha^{\mathrm{in}}}{1 + x_i^{\mathrm{out}} y_\alpha^{\mathrm{in}}}, \qquad q'_{i\alpha} = \frac{x_i^{\mathrm{in}} y_\alpha^{\mathrm{out}}}{1 + x_i^{\mathrm{in}} y_\alpha^{\mathrm{out}}},$$

where $x_i^{\mathrm{out}}$ and $x_i^{\mathrm{in}}$ are the Lagrangian multipliers relative to the node $i \in L$, respectively for the out- and the in-degrees, and $y_\alpha^{\mathrm{out}}$ and $y_\alpha^{\mathrm{in}}$ are the analogous quantities for $\alpha \in \Gamma$. In the present application we have some simplifications: the bipartite directed network representation describes users (on one layer) writing and retweeting posts (on the other layer). If users are on the layer $L$ and posts on the opposite layer, and $m_{i\alpha}$ represents the user $i$ writing the post $\alpha$, then $k_\alpha^{\mathrm{in}} = 1 \; \forall \alpha \in \Gamma$, since each message cannot have more than one author. Notice that, since our constraints are conserved on average, in the ensemble of all possible realisations we are also considering instances in which $k_\alpha^{\mathrm{in}} > 1$ or $k_\alpha^{\mathrm{in}} = 0$, i.e. non-physical ones; nevertheless the average is constrained to the right value, i.e. 1. The fact that $k_\alpha^{\mathrm{in}}$ is the same for every $\alpha$ allows for a great simplification of the probability per link on $M$:

$$q_{i\alpha} = \frac{k_i^{\mathrm{out}}}{N_\Gamma}, \tag{9}$$

where $N_\Gamma$ is the total number of nodes on the $\Gamma$ layer. The simplification in (9) is extremely helpful in the projected validation of the bipartite directed network [2].

The information contained in a bipartite network (directed or undirected) can be projected onto one of the two layers. The rationale is to obtain a monopartite network encoding the non-trivial interactions among the two layers of the original bipartite network. The method is quite general, once we have a null model in which probabilities per link are independent, as is the case for both the BiCM and the BiDCM [24].
The first step is the definition of a bipartite motif that may capture the non-trivial similarity (in the case of an undirected bipartite network) or flux of information (in the case of a directed bipartite network). This quantity can be captured by the number of V-motifs between users $i$ and $j$ [11, 23],

$$V_{ij} = \sum_{\alpha \in \Gamma} m_{i\alpha} m_{j\alpha},$$

or by its direct extension

$$\mathcal{V}_{ij} = \sum_{\alpha \in \Gamma} m_{i\alpha} n_{j\alpha}$$

(note that $\mathcal{V}_{ij} \neq \mathcal{V}_{ji}$). We compare the abundance of these motifs with the null models defined above: all motifs that cannot be explained by the null model, i.e. whose p-values are statistically significant, are validated into the projection on one of the layers [24]. In order to assess the statistical significance of the observed motifs, we calculate the distribution associated to the various motifs. For instance, the expected value for the number of V-motifs connecting $i$ and $j$ in an undirected bipartite network is

$$\langle V_{ij} \rangle = \sum_{\alpha \in \Gamma} p_{i\alpha} p_{j\alpha},$$

where the $p_{i\alpha}$ are the probabilities of the BiCM. Analogously,

$$\langle \mathcal{V}_{ij} \rangle = \sum_{\alpha \in \Gamma} q_{i\alpha} q'_{j\alpha} = \frac{k_i^{\mathrm{out}} k_j^{\mathrm{in}}}{N_\Gamma},$$

where $q_{i\alpha}$ and $q'_{j\alpha}$ are the BiDCM probabilities of a link from user $i$ to post $\alpha$ and from post $\alpha$ to user $j$, respectively, and where in the last step we use the simplification of (9) [2]. In both the directed and the undirected case, the distribution of the V-motifs or of their directed extensions is a Poisson-Binomial one, i.e. a binomial distribution in which each event has a different probability. In the present case, due to the sparsity of the analysed networks, we can safely approximate the Poisson-Binomial distribution with a Poisson one [14]. In order to state the statistical significance of the observed value, we calculate the related p-values according to the relative null models. Once we have a p-value for every detected V-motif, the related statistical significance can be established through the False Discovery Rate (FDR) procedure [3]. With respect to other multiple hypothesis testing procedures, FDR controls the expected proportion of false positives among the rejected hypotheses. In our case, all rejected hypotheses identify the V-motifs that cannot be explained only by the ingredients of the null model and thus carry non-trivial information regarding the system.
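The undirected validation steps described above (V-motif count, null expectation, Poisson p-value, FDR) can be strung together in a short sketch. It is a simplified illustration, not the implementation of [24, 25]: link probabilities use the Chung-Lu approximation $k_i k_\alpha / m$ rather than the full BiCM solution, the FDR step is the standard Benjamini-Hochberg procedure applied to all pairs of nodes, and the function names, the threshold `alpha=0.05` and the toy network are assumptions introduced here.

```python
import math
from collections import Counter
from itertools import combinations


def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam), computed as 1 - P(X <= k - 1)."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)
    cdf = term
    for n in range(1, k):
        term *= lam / n
        cdf += term
    return max(0.0, 1.0 - cdf)


def benjamini_hochberg(p_values, alpha=0.05):
    """Keys of the hypotheses rejected by the Benjamini-Hochberg procedure."""
    ranked = sorted(p_values.items(), key=lambda item: item[1])
    n_tests = len(ranked)
    last_rejected = 0
    for rank, (_, p) in enumerate(ranked, start=1):
        if p <= alpha * rank / n_tests:
            last_rejected = rank  # largest rank passing its threshold
    return {key for key, _ in ranked[:last_rejected]}


def validated_projection(neighbours, alpha=0.05):
    """Validated links among the keys of `neighbours`.

    neighbours: dict mapping each node of one layer to the set of its
    neighbours on the opposite layer.
    """
    m = sum(len(s) for s in neighbours.values())  # total number of links
    opposite_degree = Counter(a for s in neighbours.values() for a in s)
    sum_sq = sum(k * k for k in opposite_degree.values())
    p_values = {}
    for i, j in combinations(sorted(neighbours), 2):
        observed = len(neighbours[i] & neighbours[j])  # V-motifs V_ij
        # sum_alpha (k_i k_alpha / m) (k_j k_alpha / m)
        expected = len(neighbours[i]) * len(neighbours[j]) * sum_sq / m**2
        p_values[(i, j)] = poisson_sf(observed, expected)
    return benjamini_hochberg(p_values, alpha), p_values


# Toy network: A and B share all their 10 neighbours, C and D share none.
shared = {f"u{n}" for n in range(10)}
neighbours = {
    "A": set(shared),
    "B": set(shared),
    "C": {f"v{n}" for n in range(20)},
    "D": {f"w{n}" for n in range(20)},
}
validated, p_values = validated_projection(neighbours)
print(validated)
```

In this toy case only the pair sharing all its neighbours is expected to survive the correction; the other pairs have no common neighbour, so their p-value is 1.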
In this sense, the validated projected network includes a link for every rejected hypothesis, connecting the nodes involved in the related motifs.

In the main text, we solved the problem of assigning the orientation to all relevant users in the validated retweet network via a label propagation. The approach is similar to, but different from, the one proposed in [6], the differences being in the starting labels, in the label propagation algorithm and in the network used. In this section we will review the method employed in the present article, compare it to the one in [6] and evaluate the deviations from other approaches. The first step of our methodology is to extract the polarisation of verified users from the bipartite network, as described in Section 5.1 of the main text, in order to use it as seed labels in the label propagation. In reference [6], a measure of the "adherence" of the unverified users towards the various communities of verified users was used in order to infer their orientation, following the approach in [2], in turn based on the polarisation index defined in [4]. This approach performed extremely well when practically all unverified users interact at least once with a verified one, as in [2]. While it still performed well on a different dataset, such as the one studied in [6], we observed isolated deviations: this was the case of users with frequent interactions with other unverified accounts of the same (political) orientation, who randomly retweeted a verified user from a different discursive community. In this case, focusing just on the interaction with verified accounts, those nodes were assigned a wrong orientation. The labels for the polarisation of the unverified users defined in [6] were subsequently used as seed labels in the label propagation.
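To make the role of seed labels concrete, here is a minimal, generic sketch of a seeded label propagation. It is not the specific algorithm of [22] nor the one used in [6]: links are treated as undirected, each non-seed node repeatedly adopts the most frequent label among its labelled neighbours, ties are broken alphabetically to keep the result deterministic, and seed labels are never overwritten. All names and the toy network are hypothetical.

```python
from collections import Counter


def propagate_labels(edges, seeds, max_rounds=100):
    """Spread seed labels over a network by majority vote of the neighbours.

    edges: iterable of (source, target) pairs; direction is ignored here.
    seeds: dict mapping some nodes to their fixed label.
    Returns a dict with the label of every node that could be reached.
    """
    neighbours = {}
    for source, target in edges:
        neighbours.setdefault(source, set()).add(target)
        neighbours.setdefault(target, set()).add(source)

    labels = dict(seeds)
    for _ in range(max_rounds):
        changed = False
        for node in sorted(neighbours):  # fixed order: deterministic updates
            if node in seeds:
                continue  # seed labels stay fixed
            votes = Counter(
                labels[n] for n in neighbours[node] if n in labels
            )
            if not votes:
                continue  # no labelled neighbour yet
            # Most frequent label; alphabetical order breaks ties.
            best = min(votes, key=lambda label: (-votes[label], label))
            if labels.get(node) != best:
                labels[node] = best
                changed = True
        if not changed:
            break
    return labels


# Toy network: two chains, each starting from one seed (verified) account.
seeds = {"verified_1": "community_X", "verified_2": "community_Y"}
edges = [
    ("verified_1", "a"), ("a", "b"),
    ("verified_2", "c"), ("c", "d"),
]
labels = propagate_labels(edges, seeds)
print(labels)
```

With seeds restricted to verified accounts, a wrongly seeded unverified node cannot contaminate its neighbourhood, which is the motivation discussed in the surrounding text.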
Due to the possibility, described above, of assigning wrong labels to unverified accounts, in the present paper we consider only the tags of verified users, since they pass a strict validation procedure and are more stable.

In order to compare the results obtained with the various approaches, we calculated the Variation of Information (VI, [17]). VI measures the difference in the information content captured by two different partitions, as quantified by the Shannon entropy. Results are reported in the matrix in Figure 2 for the 23rd of February (results are similar for other days). Even when taking the weighted retweet network as the "exact" result, the partition found by the label propagation of our approach shows only a small loss of information, comparable to that of using an unweighted approach. Indeed, the results found by the various community detection algorithms show little agreement with the label propagation ones. Nevertheless, we still prefer the label propagation procedure, since the validated projection on the layer of verified users is theoretically sound and has a non-trivial interpretation.

The main result of this work quantifies the level of diffusion on Twitter of news published by sources considered scarcely reputable. Academia, governments, and news agencies are working hard to classify information sources according to criteria of credibility and transparency of published news. This is the case, for example, of NewsGuard, which we used for tagging the most frequent domains in the directed validated network obtained according to the methodology presented in the previous sections. As introduced in Subsection 3.2 of the main text, the NewsGuard browser extension and mobile app [19] offers a reliability rating for the most popular newspapers in the world, summarising with a numerical score the level of credibility and journalistic transparency of the newspaper.
With the same philosophy, but oriented towards US politics, the fact-checking site PolitiFact.com reports with a 'truth meter' the degree of truthfulness of original claims made by politicians, candidates, their staffs, and, more generally, protagonists of US politics. One of the oldest fact-checking websites dates back to 1994: snopes.com, which, in addition to political figures, fact-checks hoaxes and urban legends. Generally speaking, a fact-checking site has behind it a multitude of editors and journalists who, with a great deal of energy, manually check the reliability of a news item, or of its publisher, by evaluating criteria such as the tendency to correct errors, the nature of the newspaper's finances, and whether there is a clear differentiation between opinions and facts. It is therefore worth noting that recent attempts have tried to automatically find articles worthy of being fact-checked. For example, the work in [1] uses a supervised classifier, based on an ensemble of neural networks and Support Vector Machines, to figure out which politicians' claims need to be debunked, and which have already been debunked. Despite the tremendous effort of stakeholders to keep the fact-checking sites up to date and functioning, disinformation resists debunking due to a combination of factors. There are psychological aspects, such as the quest for belonging to a community and for reassuring answers, the adherence to one's own viewpoint, a native reluctance to change opinion [28, 29], and the formation of echo chambers [10], where people polarise their opinions as they are insulated from contrary perspectives: these are key factors in people's contribution to the success of disinformation spreading [7, 9]. Moreover, researchers have demonstrated how the spreading of false news is strategically supported by the massive and organised use of trolls and bots [25].
Despite the need to educate users to a conscious consumption of online information through means other than technological solutions, there is a series of promising works that exploit classifiers based on machine learning or deep learning to tag a news item as credible or not. One interesting approach is based on the analysis of spreading patterns on social platforms. Monti et al. recently provided a deep learning framework for the detection of fake news cascades [18]. A ground truth is acquired by following the example of Vosoughi et al. [27], collecting Twitter cascades of verified false and true rumors. Employing a novel deep learning paradigm for graph-based structures, cascades are classified based on user profile, user activity, network and spreading, and content. The main result of the work is that 'a few hours of propagation are sufficient to distinguish false news from true news with high accuracy'. This result has been confirmed by other studies too. The work in [30], by Zhao et al., examines diffusion cascades on Weibo and Twitter: focusing on topological properties, such as the number of hops from the source and the heterogeneity of the network, the authors demonstrate that networks in which fake news is diffused feature characteristics markedly different from those diffusing genuine information. The investigation of diffusion networks thus appears to be a definitive path to follow for fake news detection. This is also confirmed by Pierri et al. [21]: here too, the goal is to classify news articles pertaining to bad and genuine information 'by solely inspecting their diffusion mechanisms on Twitter'. Even in this case, results are impressive: a simple Logistic Regression model is able to correctly classify news articles with high accuracy (AUROC up to 94%).

[19] https://www.newsguardtech.com/

The political blogosphere and the 2004 U.S. election: divided they blog
(2020) Coronavirus: 'deadly masks' claims debunked
Coronavirus: Bill Gates 'microchip' conspiracy theory and other vaccine claims fact-checked
Extracting significant signal of news consumption from social networks: the case of Twitter in Italian political elections
Fast unfolding of communities in large networks
Influence of fake news in Twitter during the 2016 US presidential election
How does junk news spread so quickly across social media? Algorithms, advertising and exposure in public life
The role of bot squads in the political propaganda on Twitter
Tracking Social Media Discourse About the COVID-19 Pandemic: Development of a Public Coronavirus Twitter Data Set
The Statistical Physics of Real-World Networks
Political polarization on twitter
Predicting the political alignment of twitter users
Partisan asymmetries in online political activity
Echo Chambers: Emotional Contagion and Group Polarization on Facebook
Mapping social dynamics on Facebook: The Brexit debate
(2020) Tackling COVID-19 disinformation - Getting the facts right
Speech of vice president Věra Jourová on countering disinformation amid COVID-19 - from pandemic to infodemic
Filter Bubbles, Echo Chambers, and Online News Consumption
Community detection in graphs
Finding users we trust: Scaling up verified Twitter users using their communication patterns
Opinion dynamics on interacting networks: Media competition and social influence
Near linear time algorithm to detect community structures in large-scale networks
Randomizing bipartite networks: the case of the World Trade Web
Inferring monopartite projections of bipartite networks: An entropy-based approach
Maximum-entropy networks. Pattern detection, network reconstruction and graph combinatorics
Journalists on Twitter: self-branding, audiences, and involvement of bots
Emotional dynamics in the age of misinformation
Debunking in a world of tribes
Coronavirus, a Milano la fuga dalla "zona rossa": folla alla stazione di Porta Garibaldi
Coronavirus, l'illusione della grande fuga da Milano. Ecco i veri numeri degli spostamenti verso sud
Coronavirus: Italian army called in as crematorium struggles to cope with deaths
Coronavirus: Italy extends emergency measures nationwide
Italy sees first fall of active coronavirus cases: Live updates
Coronavirus in Italia, verso primo ok spostamenti dal 4/5, non tra Regioni
Italy's Health Care System Groans Under Coronavirus - a Warning to the World
Negli ospedali siamo come in guerra. A tutti dico: state a casa
Coronavirus: Ordini degli infermieri, 4 mila i contagiati

Automatic fact-checking using context and discourse information
Extracting significant signal of news consumption from social networks: the case of Twitter in Italian political elections
Controlling the false discovery rate: a practical and powerful approach to multiple testing
Users polarization on Facebook and Youtube
Fast unfolding of communities in large networks
The role of bot squads in the political propaganda on Twitter
The psychology behind fake news
The Statistical Physics of Real-World Networks
Fake news: Incorrect, but hard to correct. The role of cognitive ability on the impact of false information on social impressions
Echo Chambers: Emotional Contagion and Group Polarization on Facebook
Graph Theory (Graduate Texts in Mathematics)
Resolution limit in community detection
Maximum likelihood: Extracting unbiased information from complex networks
On computing the distribution function for the Poisson binomial distribution
Reconstructing mesoscale network structures
The contagion of ideas: inferring the political orientations of twitter accounts from their connections
Comparing clusterings by the variation of information
Fake news detection on social media using geometric deep learning
At the Epicenter of the Covid-19 Pandemic and Humanitarian Crises in Italy: Changing Perspectives on Preparation and Mitigation
Near linear time algorithm to detect community structures in large-scale networks
Randomizing bipartite networks: the case of the World Trade Web
Inferring monopartite projections of bipartite networks: An entropy-based approach
The spread of low-credibility content by social bots
Analytical maximum-likelihood method to detect patterns in real networks
A question of belonging: Race, social fit, and achievement
Cognitive and social consequences of the need for cognitive closure
Fake news propagate differently from real news even at early stages of spreading

Analysis of online misinformation during the peak of the COVID-19 pandemics in Italy
Supplementary Material
Guido Caldarelli 1,2,3*†, Rocco De Nicola 3†, Marinella Petrocchi 4†, Manuel Pratelli 3† and Fabio Saracco 3†

There is another difference between the label propagation used here and the one in [6]: in the present paper we used the label propagation of [22], while the one in [6] was an ad hoc implementation. As in reference [22], the seed labels of [6] are fixed, i.e. they are not allowed to change [17]. The main difference is that, in case of a draw among the labels of the first neighbours, in [22] the tie is broken randomly, while in the algorithm of [6] the label is not assigned and the node goes into a new run, with the newly assigned labels. Moreover, the update of labels in [22] is asynchronous, while it is synchronous in [6].
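The ingredients listed above (fixed seed labels, asynchronous updates, random tie-breaking) can be made concrete with a minimal sketch in the spirit of [22]. It is not the implementation used in the paper: the function name, the adjacency-dictionary input, the `max_sweeps` cap and the choice of leaving non-seed nodes unlabelled at the start are assumptions made for illustration.

```python
import random
from collections import Counter


def label_propagation(adj, seeds, rng=None, max_sweeps=100):
    """Asynchronous label propagation with fixed seed labels.

    adj:   dict mapping each node to the set of its neighbours (undirected).
    seeds: dict mapping seed nodes to their label; seeds are never updated.
    Nodes that no label can reach are absent from the returned dict.
    """
    rng = rng or random.Random()
    labels = dict(seeds)
    free = [node for node in adj if node not in seeds]
    for _ in range(max_sweeps):
        rng.shuffle(free)  # asynchronous update in random order
        changed = False
        for node in free:
            counts = Counter(labels[nb] for nb in adj[node] if nb in labels)
            if not counts:
                continue  # no labelled neighbour yet
            top = max(counts.values())
            best = sorted(lab for lab, c in counts.items() if c == top)
            if labels.get(node) in best:
                continue  # current label is already among the most frequent
            labels[node] = rng.choice(best)  # ties are broken at random
            changed = True
        if not changed:
            break  # every labelled node agrees with a majority of neighbours
    return labels


# Toy network (hypothetical): two seeds with opposite labels, one isolated node.
adj = {
    "s1": {"u1", "u2"}, "u1": {"s1", "u2"}, "u2": {"s1", "u1"},
    "s2": {"v1"}, "v1": {"s2"}, "z": set(),
}
seeds = {"s1": "left", "s2": "right"}
labels = label_propagation(adj, seeds, rng=random.Random(0))
print(labels)
```

Because labels are updated in place within a sweep, a node can already see the labels assigned to its neighbours earlier in the same sweep, which is what distinguishes the asynchronous update from the synchronous one.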
We opted for the one in [22] because it is a de facto standard among label propagation algorithms, being stable, more studied, and faster [18]. Finally, differently from the procedure in [6], we applied the label propagation not to the entire (undirected version of the) retweet network, but to the (undirected version of the) validated one. (The reason for choosing the undirected version is that, whenever a generic account significantly retweets, or is significantly retweeted by, another one, the two probably share some vision of the phenomena under analysis; thus, we are not interested in the direction of the links in this situation.) The rationale for using the validated network is to reduce the calculation time (due to the dimensions of the dataset), while obtaining an accurate result. While the previous differences from the procedure of [6] are dictated by conservativeness (the choice of the seed labels) or by adherence to a standard (the choice of [22]), this last one may be debatable: why should choosing the validated network return "better" results than the ones calculated on the entire retweet network? We consider the case of a single day (in order to reduce the calculation time) and study 6 different approaches:
1. a Louvain community detection [5] on the undirected version of the validated network of retweets;
2. a Louvain community detection on the undirected version of the unweighted retweet network;
3. a Louvain community detection on the undirected version of the weighted retweet network, in which the weights are the number of retweets from user to user;
4. a label propagation à la Raghavan et al. [22] on the directed validated network of retweets;
5. a label propagation à la Raghavan et al. on the (unweighted) retweet network;
6. a label propagation à la Raghavan et al. on the weighted retweet network, the weights being the number of retweets from user to user.
Actually, due to the order dependence of Louvain [12], we ran the Louvain algorithm several times after reshuffling the order of the nodes, taking the partition into communities that maximises the modularity. Similarly, the label propagation of [22] has a certain level of randomness: we ran it several times and chose the most frequent label assignment for every node.
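As an illustration, the two bookkeeping steps involved in this comparison, namely taking the most frequent label of each node over several runs and measuring the distance between two partitions with the Variation of Information [17], could be sketched as follows. The function names and the dictionary-based representation of a partition (node to label) are assumptions of this sketch; VI is computed in nats as the sum of the two conditional entropies.

```python
import math
from collections import Counter


def most_frequent_labels(runs):
    """runs: list of dicts node -> label, one per run of the algorithm.

    Returns node -> most frequent label; ties go to the label seen first.
    """
    nodes = set().union(*runs)
    consensus = {}
    for node in nodes:
        votes = Counter(run[node] for run in runs if node in run)
        consensus[node] = votes.most_common(1)[0][0]
    return consensus


def variation_of_information(part_a, part_b):
    """VI(A, B) = H(A|B) + H(B|A), over the nodes present in both partitions."""
    nodes = part_a.keys() & part_b.keys()
    n = len(nodes)
    size_a = Counter(part_a[v] for v in nodes)
    size_b = Counter(part_b[v] for v in nodes)
    joint = Counter((part_a[v], part_b[v]) for v in nodes)
    vi = 0.0
    for (a, b), n_ab in joint.items():
        # contribution -(n_ab / n) * [log(n_ab / n_a) + log(n_ab / n_b)]
        vi -= (n_ab / n) * (math.log(n_ab / size_a[a]) + math.log(n_ab / size_b[b]))
    return vi


# Toy example (hypothetical labels): three runs, then two partitions to compare.
runs = [{"u": "L", "v": "R"}, {"u": "L", "v": "L"}, {"u": "R", "v": "R"}]
consensus = most_frequent_labels(runs)

one_block = {1: "x", 2: "x", 3: "x", 4: "x"}
two_blocks = {1: "p", 2: "p", 3: "q", 4: "q"}
print(consensus, variation_of_information(one_block, two_blocks))
```

VI vanishes when the two partitions coincide up to a renaming of the labels, and grows as they disagree; in the toy comparison above, splitting a single block into two equal halves costs log 2 nats.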