Introduction

Digital platforms have become increasingly central in shaping political discourse [1]–[3]. Political campaigns are often tested online and, if successful, promoted through legacy mass media or covered as native digital events in their own right [4]. This growing societal tendency to communicate in online settings offers researchers unprecedented access to quantitative traces [5], through which social behavior can be studied on different scales [6]. Owing to their early data availability, popularity, and centrality in political discourse, Facebook and X (formerly Twitter) have been the focus of much of the literature. Various studies have shown that political discussion on these platforms often produces tightly clustered user communities, called echo chambers, in which like-minded individuals reinforce their prior beliefs, often in opposition to external views [7]–[12]. However, whether such patterns generalize to other platforms remains an open question, especially since each social media platform features its own form of user interaction [13], [14]. For instance, studies on Reddit show a weaker presence or complete absence of echo chambers [11], [15], [16], possibly depending on the resolution scale of the analysis [17], [18]. This outcome is particularly surprising, as the structural design of Reddit is expected to facilitate, rather than hinder, the formation of echo chambers. Indeed, unlike timeline-based platforms like Facebook and X, Reddit is structured into thematic subreddits – user-defined forums devoted to specific interests or topics [19]. On the other hand, Reddit is technically not a true social network, as it does not allow users to select others as friends and receive information from them, but only to subscribe to subreddits of interest. This means that all users accessing a given subreddit simultaneously are exposed to the same content, unlike what happens on Facebook or X (especially when echo chambers are present).
These specific characteristics of Reddit may render existing analytical approaches inadequate for detecting polarization dynamics on the platform. Here we argue that statistical validation techniques are necessary to identify meaningful structures by filtering out the high level of underlying noise, caused by the multitude of user interactions that produce high-dimensional and densely connected systems [20].

In this work, we study how political discourse evolved on Reddit from 2013 to 2017, using the Reddit Politosphere dataset – which contains all posts and comments from around 500 political-themed subreddits. This comprehensive data allows us to analyze three key aspects of online political behavior of users: i) patterns of ideological alignment, ii) habits of information consumption, and iii) dynamics of political polarization and potential presence of echo chambers. We adopt a dual-layered network approach to uncover such features. First, we build a “user-interaction” network connecting users to the subreddits they contribute to, which allows us to track the time evolution of interactions within communities. Second, we build an “information-diet” network linking subreddits to external domains through news links shared within their posts, which allows us to reveal information dissemination patterns. To extract statistically reliable structures from such complex data, we employ a validation framework based on maximum-entropy null models [21]. This approach allows for the identification of genuine structural features arising from interactions, rather than artifacts of individual heterogeneity and random fluctuations, and has been successfully applied in the past across domains including economic, innovation, and financial systems [22]–[28], as well as online social platforms like Twitter [12], [20], [29]–[33]. We use these techniques in combination with subreddit-level tags assigned from metadata, which enables us to trace users’ alignment shifts over time. In particular, by analyzing both interaction-based validated communities and groups defined via shared thematic tags, we can track patterns of polarization by labeling users according to their participation within and across these groups.

Our analysis focuses on political discourse during Trump’s 2016 campaign and post-election period. We find that overall political polarization on Reddit declines between 2013 and 2017. However, certain communities – especially Democratic and Conservative groups (with the latter term referring to broader center-right movements beyond the Republican Party) – remain consistently polarized. Furthermore, users from banned communities, which initially aligned with far-right groups, shift toward conservative subreddits during the 2016 elections, continuing to influence conservative discourse even after the ban of their original communities. Linguistic analysis reveals that users adopt increasingly similar language patterns within ideological groups over time. Moreover, communities sharing similar information are also those with a similar user base, reinforcing echo chamber formation where users interact within closed ideological environments while consuming identical information sources. Finally, despite increased cross-group interactions during major political events, declining comment scores between opposing groups suggest that greater exposure paradoxically coincides with increased antagonism rather than reduced polarization.

Figure 1: Methodological framework. This figure summarizes the main approaches used in this study, analyzing two dimensions–news domains and user interactions–to reveal the presence of echo chambers. a Structure of Reddit data: conversation trees with root posts and comments underneath, belonging to different politics-related subreddits. b Construction of the “user-interaction” network: from the bipartite network of users interacting on subreddits to the validated network of significant similarities among subreddits (in terms of common user base). c “Information-diet” approach: from the bipartite network of news domain shared by subreddits to the validated network of significant similarities among subreddits (in terms of commonly shared domains). d Matching between the communities of subreddits in the two validated networks reveals echo chambers: closed discussion forums characterized by overlapping user and news source patterns.

Results

The structure of a conversation on Reddit consists of an initial post (often containing a question, opinion, news item, picture, video, etc.), followed by a tree-like series of users’ comments to the original post or to other comments in the thread (Figure 1a). Our dataset comprises 120.4M comments made by 1.9M Reddit users on a set of 498 politically oriented subreddits from January 2013 to December 2017, obtained from the Reddit Politosphere dataset [34], as well as the corresponding 16.8M original submissions (posts) obtained from the Pushshift Reddit Dataset [35], sharing in total a set of 208.3K news domains (e.g., cnn.com, foxnews.com, theguardian.com; see Supplementary Information, S1 for further details and general temporal trend of the data). We use this information to build two types of network, capturing the similarity among subreddits in terms of user interaction and news consumption patterns, respectively (see Methods). The first, the “user-interaction” projection, focuses on how users co-participate in communities, highlighting patterns of ideological alignment. The second, the “information-diet” projection, captures similarities in news consumption across communities.

In the “user-interaction” approach (see Figure 1b), we build a bipartite network of users and subreddits, with weighted links measuring how many times users contributed to each subreddit. We then project it onto the subreddit layer to obtain a monopartite network of subreddits, connected according to how many users participate in both their discussions. In the “information-diet” approach (Figure 1c), we build a bipartite network of news domains and subreddits, with weighted links measuring how many times domains are shared within the posts of the various subreddits. We again project it onto the subreddit layer to obtain a monopartite network of subreddits, this time connected according to how many times they contain posts sharing the same domain. In both cases, raw co-occurrences are biased by user activity and by subreddit and domain popularity. To correct for these effects, we apply statistical filtering based on maximum-entropy null models, which retains only the statistically significant, non-random co-occurrence patterns and filters out random noise (see Methods and Supplementary Information, S2).
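The projection-and-filter idea described above can be illustrated with a small sketch. It is not the maximum-entropy procedure used in this work: as a deliberately simpler stand-in, it treats links as binary and tests each co-occurrence with a hypergeometric null, which fixes only the two subreddit degrees. The function names, the toy memberships, and the choice of null are all assumptions made for illustration.

```python
from itertools import combinations
from math import comb

def project_subreddits(memberships):
    """Count shared users for every pair of subreddits.

    memberships: dict mapping subreddit -> set of users (binary links,
    an illustrative simplification of the weighted bipartite network).
    """
    overlaps = {}
    for a, b in combinations(sorted(memberships), 2):
        shared = len(memberships[a] & memberships[b])
        if shared > 0:
            overlaps[(a, b)] = shared
    return overlaps

def hypergeom_pvalue(n_users, deg_a, deg_b, shared):
    """P(X >= shared) if the deg_b users of b were drawn at random
    from n_users, deg_a of whom are active in a (hypergeometric null,
    swapped in here for the maximum-entropy null of the paper)."""
    total = comb(n_users, deg_b)
    upper = min(deg_a, deg_b)
    tail = sum(
        comb(deg_a, k) * comb(n_users - deg_a, deg_b - k)
        for k in range(shared, upper + 1)
    )
    return tail / total

# Hypothetical toy data: three subreddits, five users.
memberships = {
    "r/a": {"u1", "u2", "u3"},
    "r/b": {"u2", "u3", "u4"},
    "r/c": {"u5"},
}
overlaps = project_subreddits(memberships)
n_users = len(set().union(*memberships.values()))
p_value = hypergeom_pvalue(
    n_users, len(memberships["r/a"]), len(memberships["r/b"]),
    overlaps[("r/a", "r/b")],
)
print(overlaps, p_value)
```

In this toy case the two shared users are not surprising given the degrees involved, so the link would not survive validation. This is the kind of raw co-occurrence the filtering step is meant to discard.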

After such statistical validation, we perform community detection using the Louvain algorithm [36]. The validation process substantially sparsifies the networks, improving the signal-to-noise ratio and allowing for the identification of robust communities of similar subreddits. For instance, in the user–subreddit projection, modularity increases and density decreases by roughly an order of magnitude with respect to the unfiltered network, while only 16% of nodes are removed. Note that communities of the statistically validated networks are robust with respect to the choice of the community detection method [37], [38] (see Supplementary Information, S3), while without validation communities become unstable and highly sensitive to parameter choices (see Supplementary Information, S2). According to Ref. [11], echo chambers emerge when patterns of homophily, i.e. users holding similar opinions, align with patterns of polarization, measured through the similarity in the news domains linked in their shared content. Translating this operational definition to our context, we detect echo chambers as the overlaps between communities detected with the two approaches described above, i.e. the “user-interaction” and “information-diet” ones. In practice, in the present application, echo chambers are groups of subreddits populated by the same set of users who share content from the same set of information sources (see Figure 1d).
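As a rough illustration of this operational definition, the sketch below compares two partitions of the same subreddits and ranks community pairs by how many subreddits they share. The partitions, the community identifiers, and the use of the Jaccard index as an overlap score are assumptions for illustration; the statistical assessment of overlaps used in this work is described in Methods.

```python
def match_communities(user_partition, domain_partition):
    """Rank pairs of communities (one per partition) by overlap.

    Both arguments map a community id to the set of subreddits it
    contains. Returns tuples (id_user, id_domain, n_shared, jaccard),
    strongest overlaps first.
    """
    matches = []
    for id_u, subs_u in user_partition.items():
        for id_d, subs_d in domain_partition.items():
            shared = subs_u & subs_d
            if shared:
                jaccard = len(shared) / len(subs_u | subs_d)
                matches.append((id_u, id_d, len(shared), jaccard))
    # Sort by decreasing Jaccard index, then by ids for determinism.
    return sorted(matches, key=lambda m: (-m[3], m[0], m[1]))

# Hypothetical partitions of five subreddits.
user_partition = {"U0": {"r/a", "r/b", "r/c"}, "U1": {"r/d", "r/e"}}
domain_partition = {"D0": {"r/a", "r/b"}, "D1": {"r/c", "r/d", "r/e"}}

matches = match_communities(user_partition, domain_partition)
for id_u, id_d, n_shared, jaccard in matches:
    print(id_u, id_d, n_shared, round(jaccard, 3))
```

Pairs with a high score, such as U0/D0 and U1/D1 in the toy data, are candidate echo chambers in the sense used here: the same subreddits cluster together both by user base and by shared news sources.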

In Politosphere, subreddits are tagged as “Democratic” or “Conservative” according to their political leaning, while “Banned” subreddits receive a dedicated tag. We extend this scheme using additional tags that we manually assign to subreddits, based on their titles, public descriptions, and moderator notes (see Supplementary Materials S1). These tags further enable group-level analyses, as communities are assigned tags according to the tag frequencies among their constituent subreddits. In the following plots, each tag is assigned a base color; the color of a community is then computed as the average of the colors of its constituent tags (see Supplementary Materials S1).
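A minimal sketch of this tagging and coloring step is given below. The subreddit names, RGB values, and frequency weighting of the color average are assumptions; the actual assignment rules are those reported in the Supplementary Materials.

```python
from collections import Counter

def tag_counts(subreddits, tag_of):
    """Count how often each tag occurs among a community's subreddits."""
    return Counter(tag_of[s] for s in subreddits)

def dominant_tag(counts):
    """Most frequent tag, with ties broken alphabetically."""
    return min(counts, key=lambda t: (-counts[t], t))

def blended_color(counts, base_colors):
    """Average the base RGB colors of the tags, weighted by tag
    frequency (the weighting is an assumption of this sketch)."""
    total = sum(counts.values())
    return tuple(
        round(sum(n * base_colors[t][ch] for t, n in counts.items()) / total)
        for ch in range(3)
    )

# Hypothetical community of four subreddits and assumed base colors.
tag_of = {"r/s1": "Democratic", "r/s2": "Democratic",
          "r/s3": "Democratic", "r/s4": "News"}
base_colors = {"Democratic": (0, 0, 255), "News": (0, 255, 0)}

counts = tag_counts(["r/s1", "r/s2", "r/s3", "r/s4"], tag_of)
label = dominant_tag(counts)
color = blended_color(counts, base_colors)
print(label, color)
```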

User-interaction network: Communities of subreddits with similar user bases

Reddit’s political communities, based on user–interaction analysis, organize into ideologically and thematically aligned clusters, with persistent groups alongside others showing partial mixing or fragmentation over time (see Figure 2a). In 2013 we observe distinct clusters: a Far-Left group (light blue), a Conservative–Libertarian cluster (pink), News and Social Justice subreddits (green and orange), together with thematic communities focused on Geopolitics (khaki), Canada (yellow), and the UK/Europe (brown) (see panel 2b for the corresponding communities highlighted in the network). In 2014, coinciding with the U.S. midterm elections, a similar partition persists, with the first appearance of a joint Ban/Far-Right cluster (dark purple), a distinct Libertarian group (pink), and the consolidation of Social Justice and Political Talk communities. Democratic and Conservative subreddits belong to a mixed cluster. By 2015, the structure becomes richer: the Far-Left cluster remains stable, Geopolitics separates from News communities, and the Canada group persists. Democratic and Conservative subreddits still co-exist within the same cluster, though their separation becomes increasingly visible (see Figure 2c).

The 2016 U.S. presidential election represents a turning point (see Figure 2c). A Democratic cluster (blue) emerges more clearly, closely connected with News and Geopolitics communities. Conservative subreddits (red) form their own cluster, often in proximity to Libertarian, Gun Rights, and Economic groups (e.g., r/Elections, r/Conservative, r/KasichForPresident, r/progun, r/CatholicPolitics, r/climateskeptics). Hybrid communities also appear, merging pro-Trump Conservatives, Far-Right groups, banned subreddits (e.g., r/WhiteRights, r/fascism, r/The_Donald), and even some Social Justice spaces – reflecting the alignment of distinct ideological currents within shared spaces. In 2017, the division into Democratic and Conservative clusters further consolidates. Democrats form a stable community, while Conservatives, already split in 2016, remain divided across clusters: some align with more moderate or mixed political subreddits, while others persist within Trump-centric and hybrid communities that had emerged the year before. Ban and Far-Right groups continue to coalesce, overlapping with both Social Justice and Conservative subreddits. Meanwhile, smaller but persistent clusters focused on Geopolitics, Far-Left politics, and non-U.S. topics (Canada, Australia, UK) maintain continuity throughout the period. These shifts are discussed in further detail in Supplementary Materials S4, and details on community topic labeling are provided in S1.

Figure 2: Temporal evolution of subreddits’ communities with similar user base. a) Sankey diagram where the width of the flows is proportional to the number of subreddits moving among communities. Community colors reflect the composite hues of their underlying subreddit tags, shown in the legend. Label “Not Validated” refers to subreddits that are not connected to others by statistically validated links. b) Evolution of the main communities of the validated subreddits, with several communities exhibiting strong tag homogeneity. c) Focus on Democratic, Conservative, and Banned subreddits. As elections approach, both Democratic and Conservative subreddits increase in number, while Banned subreddits exhibit more frequent co-occurrences with Conservative ones.

Polarization, banned users and cross interactions

Having established the validated community structure of political interactions, we analyze polarization patterns and the evolution of community composition over time. We compute polarization from two complementary perspectives: (i) tag-based communities, i.e., groups of subreddits sharing the same thematic tag; and (ii) network-detected communities, obtained from validated projections of the user–subreddit interaction network. To study the level of polarization of the political debate, we label users with the index of the community where they are relatively more active (see Methods for further details).

Figure 3a shows donut charts for selected tag-based groups (Far-Left, Democratic, Conservative, and Far-Right) in 2014 and 2016, mapping the full composition of user labels within each group. Figure 3b displays the corresponding distributions for a set of detected network communities (composition examined in terms of user tags) that most closely align with the same categories. Additional years and full details on the construction of these charts are reported in Supplementary Information, S4. Both perspectives reveal that most communities are dominated by one or a few user labels, indicating that the same groups of users tend to concentrate activity within specific forums. The comparison between 2014 and 2016 shows that although overall polarization tends to decline over time, certain groups remain persistently polarized – particularly Democratic-aligned communities and banned subreddits, including conservative factions associated with Trump. These patterns point to sustained divisive dynamics among conservative-aligned groups, with banned users playing a disproportionately influential role in shaping their evolution. We also observe a temporal realignment involving Far-Right and Banned users: initially, Far-Right groups overlap mainly with Banned tags, but over time they become increasingly embedded within mixed Conservative–Banned–Social Justice communities (see Supplementary Information, S4 for the full set of donut charts, including Banned-tag distributions).

Additional evidence of polarization is provided in Figure 3c, which analyzes selected communities under both the tag-based and interaction-based approaches. We consider the polarization index \(\rho \in [0,1]\), which measures the extent to which a community’s user base is self-focused, with higher values indicating stronger polarization (see Methods). Polarization emerges from highly mixed communities combining Far-Right, Banned, Conservative, and Social Justice subreddits, with these heterogeneous clusters remaining strongly polarized despite their diverse composition. Banned communities display particularly sharp increases in polarization over time – rising nearly sixfold – while Conservative communities maintain high polarization levels regardless of explicit labeling.
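The exact definitions of the user labels and of \(\rho\) are given in Methods and are not reproduced here. Purely as an illustration of what a "self-focused" user base means, the sketch below uses two stand-in definitions that are assumptions of this example: a user is labeled with the community receiving the largest share of their comments, and the index of a community is the fraction of its comments written by users carrying its own label.

```python
def label_users(activity):
    """Label each user with the community where they comment most.

    activity: dict user -> dict community -> number of comments.
    Ties are broken alphabetically. This rule is an illustrative
    assumption, not necessarily the definition used in Methods.
    """
    return {
        user: max(sorted(counts), key=counts.get)
        for user, counts in activity.items()
    }

def self_focus(community, activity, labels):
    """Fraction of a community's comments written by users labeled
    with that same community (a hypothetical stand-in for rho)."""
    total = sum(c.get(community, 0) for c in activity.values())
    own = sum(
        c.get(community, 0)
        for user, c in activity.items()
        if labels[user] == community
    )
    return own / total if total else 0.0

# Hypothetical comment counts for three users and two communities.
activity = {
    "u1": {"A": 8, "B": 2},
    "u2": {"A": 1, "B": 9},
    "u3": {"A": 5},
}
labels = label_users(activity)
rho_a = self_focus("A", activity, labels)
rho_b = self_focus("B", activity, labels)
print(labels, round(rho_a, 3), round(rho_b, 3))
```

Under these assumed definitions the index lies in \([0,1]\) and approaches 1 when nearly all activity in a community comes from its own labeled users.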

Figure 3d reports average comment scores normalized by subreddit size for 2014 and 2016, computed separately for tag-based communities (top) and interaction-based communities (bottom). Each comment’s score is first normalized by the number of active users in the corresponding subreddit to correct for visibility bias, and then aggregated within each community as the total normalized score divided by the total number of comments. As elections approach, Democratic, Conservative, and Banned communities exhibit declining scores even as cross-community interactions intensify. Groups such as Social Justice and Far-Right show higher scores when clearly distinct but experience sharp drops as ideological boundaries blur, whereas Far-Left communities remain comparatively stable, reflecting a more consistent ideological stance. Overall, these trends suggest that increasing cross-group participation coincides with reduced internal cohesion and diminished group distinctiveness. Figure 3e shows linguistic similarity patterns based on cosine similarity of text embeddings. Subreddits sharing the same tags exhibit high internal similarity, particularly during election years, while in 2016 a convergence in language use between Democratic, Conservative, and Banned subreddits indicates increasingly blurred ideological boundaries despite persistent within-group cohesion. A more rigorous analysis, reported in Supplementary Section S6, confirms these results and reveals a statistically significant rise in linguistic similarity over time, especially among Conservative, Democratic, and Far-Left subreddits.
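The score normalization described above is simple enough to write down directly. The sketch below follows the two stated steps (divide each score by the number of active users of its subreddit, then divide the community total by the number of comments). The input layout and the toy numbers are assumptions.

```python
def community_score(comments, active_users, community):
    """Average normalized comment score of a community.

    comments: iterable of (subreddit, score) pairs.
    active_users: dict subreddit -> number of active users.
    community: set of subreddits belonging to the community.
    """
    total_normalized = 0.0
    n_comments = 0
    for subreddit, score in comments:
        if subreddit in community:
            # Step 1: correct for visibility by subreddit size.
            total_normalized += score / active_users[subreddit]
            n_comments += 1
    # Step 2: total normalized score over total number of comments.
    return total_normalized / n_comments if n_comments else 0.0

# Hypothetical comments and subreddit sizes.
comments = [("r/a", 10), ("r/a", 30), ("r/b", 5), ("r/z", 100)]
active_users = {"r/a": 10, "r/b": 5, "r/z": 50}

score = community_score(comments, active_users, {"r/a", "r/b"})
print(round(score, 4))
```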

Figure 3: Polarization, engagement and linguistic similarity across subreddit communities. a Donut charts showing the distribution of user labels within the communities of the validated “user-interaction” network projections, for 2014 and 2016. b Donut charts for tag-based communities, for 2014 and 2016. In both cases, polarization decreases in time, with some exceptions (e.g., Democrats and Banned). c Bar plots showing annual polarization levels across a selection of communities, with solid bars computed for all users and dashed bars computed excluding "Banned" users. This distinction makes it possible to capture the strong polarizing role of banned subreddits, especially within conservative clusters. Early polarization is most pronounced among Far-Right and Democratic groups, with major shifts observed in the Conservative category. Over time, overall polarization decreases, except for a notable rise among banned communities. d Average scores of comments for 2014 and 2016, revealing a general decline in user engagement, particularly in Democratic, Conservative, and Banned subreddits. e Cosine similarity between posts within topic-based subreddit groups, indicating strong intra-group similarity and increasing inter-group similarity over time, especially between Democratic and Conservative communities.

Information Ecosystem and Echo Chambers

We now take the “information-diet” approach to understand how Reddit communities consume and share news. Similarly to the “user-interaction” network, the validated projection has a marked community structure (see Figure 4a), indicating the existence of groups of subreddits that tend to rely on similar sets of news sources (see Supplementary Information, S7 for network statistics). The flow diagram of Figure 4a shows how domain-sharing communities evolve year by year. For instance, Democratic- and Conservative-aligned communities emerge clearly, particularly after the 2016 election, when they begin to rely on increasingly distinct domains. Communities also form around geographic interests, including Canada, Australia, and the UK. A group of “Banned” subreddits, initially close to far-right spaces, moves closer to Conservative communities in the years following the election. On the left side, different subgroups emerge with specific focuses: one on social justice issues, another on Marxism and Anarchism, and a third centered on meme-sharing. Additional analyses on domain-level polarization and label propagation across domain categories are provided in Supplementary Section S7, with full details on the networks and community structures for both approaches in S8. Notably, several of these groups match those observed in the “user-interaction” networks (see Figure 4b).

Figure 4: Patterns of News Consumption and Echo Chambers. a: Sankey diagram where the width of the flows is proportional to the number of subreddits moving among communities, defined according to similarity of shared news sources among subreddits. Some communities remain highly homogeneous in terms of tags, while a merging of themes occurs in certain years. For example, Democrats and Conservatives initially coexist in the same community, but beginning with the election year, they gradually separate into distinct groups, eventually forming separate Democrats and Conservatives/Banned communities by 2017. b: Chord diagrams depicting the overlap (in terms of common subreddits) between communities identified through the “user-interaction” and “information-diet” analyses, for 2014 and 2016 respectively. The color of the flows represents the average of the colors of the source and destination communities. The coherent structure of the communities defined by the two approaches is indicative of the presence of echo chambers in political debate. Indeed the strongest overlaps tend to occur between communities with similar political or thematic orientations. Additionally, the figure shows the validated networks derived from the chord diagrams.

To further explore the alignment between “user-interaction” and “information-diet” communities at the user level, we construct a bipartite network where each node represents a community from one of the two approaches, while links are weighted by the number of active users (see Methods). We can then assess statistically significant overlaps using the BiWCM null model (see Methods). Although community matches emerge in correspondence with similar topics, even with very low \(p\)-values, few entries remain statistically significant after correcting for multiple hypothesis testing by controlling the false discovery rate (see heatmaps of community matches in Supplementary Materials S9).
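To see why few matches survive the correction, it helps to look at how a false discovery rate procedure works. The sketch below implements the standard Benjamini-Hochberg step-up rule. That this specific FDR procedure and this significance level are the ones used here is an assumption, and the \(p\)-values are made up.

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Return a list of booleans marking which hypotheses are
    rejected under the Benjamini-Hochberg step-up procedure."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Largest rank k such that p_(k) <= k * alpha / m.
    max_rank = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * alpha / m:
            max_rank = rank
    rejected = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= max_rank:
            rejected[i] = True
    return rejected

# Hypothetical p-values for four candidate community matches.
pvalues = [0.001, 0.04, 0.03, 0.5]
naive = [p <= 0.05 for p in pvalues]
validated = benjamini_hochberg(pvalues, alpha=0.05)
print(naive, validated)
```

In this toy example three matches pass an uncorrected 0.05 threshold, but only one remains after the correction, mirroring the qualitative pattern reported above.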

A more fine-grained analysis on sub-communities – obtained by using the Louvain algorithm again on individual communities – reveals clearer patterns. Communities that focus on the same topics (see Supplementary Materials S1), such as gun rights or economic policy, are much more likely to share both users and information sources, as confirmed by the statistically significant overlaps (see Tab. [tab:edgelist] for \(p\)-values). These correspondences suggest that ideological coherence is particularly strong in more focused environments.

These patterns challenge the notion that echo chambers, on Reddit or elsewhere, represent only a marginal phenomenon involving a small fraction of users [12], [17], [18]. Here, we find many groups of subreddits that significantly share both user bases and news sources. Between 40% and 60% of users belong to such overlapping communities, while the proportion of users active within validated sub-community matches increases from about 20% in 2013–2014 to 36% in 2015, remaining stable in 2016 (28%) and 2017 (34%). This pattern highlights a marked consolidation of echo-chamber structures, particularly during politically salient years such as 2014 and 2016. Together, these findings highlight how user engagement and information sharing reinforce each other in shaping Reddit’s political landscape. Full details on community structure, sub-community overlap patterns, and temporal dynamics are available in the Supplementary Information, S8-S9.

Focus on Democratic, Conservative, and Banned Groups

We finally perform a closer investigation of the political discourse between Democratic and Conservative communities, as these factions emerge distinctly across the network approaches we considered. Figure 5a shows yearly distances between the linguistic patterns of these groups, computed from subreddit-level text embeddings (see Methods). We observe a decrease in the distance (from \(0.8\) to \(0.7\)) between pairs of Democrats, Conservatives, and banned users during the election cycle. As benchmarks, we consider a random model (mean similarity of randomly assembled communities of the same size) and a heterogeneous model (mean similarity of all pairwise subreddits within a community), which confirm that the observed convergence exceeds what is expected from random mixing or compositional effects (see Methods). The convergence between Democrats and Conservatives is particularly evident in 2016, likely driven by the shared focus on election-related narratives (see Figure 5a).
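The logic of comparing an observed similarity against a random benchmark can be sketched as follows. The two-dimensional embeddings, the subreddit names, the number of random draws, and the use of mean pairwise cosine similarity as the summary statistic are illustrative assumptions; the actual embeddings and benchmarks are those described in Methods.

```python
import math
import random
from itertools import combinations

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def mean_pairwise_similarity(names, embeddings):
    """Mean cosine similarity over all pairs of the given subreddits."""
    pairs = list(combinations(names, 2))
    return sum(
        cosine_similarity(embeddings[a], embeddings[b]) for a, b in pairs
    ) / len(pairs)

def random_baseline(size, embeddings, n_draws=200, seed=0):
    """Mean similarity of randomly assembled groups of the same size."""
    rng = random.Random(seed)
    names = sorted(embeddings)
    draws = [
        mean_pairwise_similarity(rng.sample(names, size), embeddings)
        for _ in range(n_draws)
    ]
    return sum(draws) / len(draws)

# Hypothetical 2-d embeddings for four subreddits.
embeddings = {
    "r/a": [1.0, 0.0], "r/b": [0.9, 0.1],
    "r/c": [0.0, 1.0], "r/d": [0.1, 0.9],
}
group = ["r/a", "r/b"]
observed = mean_pairwise_similarity(group, embeddings)
baseline = random_baseline(len(group), embeddings)
print(round(observed, 3), round(1.0 - observed, 3), round(baseline, 3))
```

The cosine distance is one minus the similarity. A group counts as linguistically cohesive in this sketch when its observed similarity clearly exceeds the random baseline.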

We then consider the average distance between these communities (see Methods) computed on the statistically validated “user–interaction” network. Figure 5b shows a marked increase in the average network distance between Democratic and Conservative subreddits starting in 2015 (the year preceding Trump’s election) and peaking in 2016. This separation is particularly pronounced for Democratic subreddits, which grow increasingly distant from their Conservative counterparts. Statistical tests confirm the significance of this trend when comparing Democrat–Conservative pairs with previous years and with overall network distances. A similar pattern is reflected in the average distance computed on the “information-diet” validated networks, shown in Figure 5c: Democratic and Conservative communities initially share many domains, but their media ecosystems gradually diverge in the lead-up to the election. Conservatives remain consistently closer to Banned subreddits than Democrats throughout the period.
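A minimal sketch of a group-to-group network distance is shown below. It computes unweighted shortest paths by breadth-first search and aggregates them with a harmonic mean, so that unreachable pairs contribute zero rather than an infinite distance. The toy graph, the treatment of links as unweighted, and the omission of any normalization by the network-wide average are assumptions of this example.

```python
from collections import deque

def bfs_distances(adj, source):
    """Unweighted shortest-path lengths from source to reachable nodes."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adj.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist

def harmonic_mean_distance(adj, group_a, group_b):
    """Harmonic mean of shortest-path lengths between two groups.

    Unreachable pairs add 0 to the sum of inverse distances.
    """
    inverse_sum = 0.0
    n_pairs = 0
    for a in group_a:
        dist = bfs_distances(adj, a)
        for b in group_b:
            if a == b:
                continue
            n_pairs += 1
            if b in dist:
                inverse_sum += 1.0 / dist[b]
    return n_pairs / inverse_sum if inverse_sum > 0 else float("inf")

# Hypothetical validated network: a path d1 - d2 - n - c1 - c2.
adj = {
    "d1": {"d2"}, "d2": {"d1", "n"}, "n": {"d2", "c1"},
    "c1": {"n", "c2"}, "c2": {"c1"},
}
democratic = ["d1", "d2"]
conservative = ["c1", "c2"]

cross = harmonic_mean_distance(adj, democratic, conservative)
within = harmonic_mean_distance(adj, democratic, democratic)
print(round(cross, 4), within)
```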

To better understand cross-group dynamics, we track user-level interactions by filtering the original dataset of comments, assigning labels to users through propagation from subreddit to user activity, and then retaining only comments authored by users tagged as Democratic, Conservative, or Banned, regardless of the subreddit. Figure 5d shows the proportion of comments exchanged within and between these groups (here reported for 2014 and 2016), measured as the normalized number of comments exchanged between groups relative to the total number of comments produced by users of each group. Over the period, the total number of comments by these groups grows substantially, reaching nearly \(10^6\) in 2016, about ten times more than in 2014. Cross-group commenting steadily increases over the same period but is accompanied by a marked decline in comment scores. Average scores drop by 93.4% in Democratic subreddits and by 85.6% in Conservative ones, with the decline most severe in candidate-centered communities such as r/SandersForPresident, r/HillaryClinton, and r/The_Donald, while remaining comparatively stable in less polarizing spaces (e.g., r/KasichForPresident). As shown in Figure 5e, cross-group interactions are evaluated even more negatively than within-group ones. Notably, the number of Democratic and Conservative users engaging in each other’s communities rises over time, independently of overall volume, suggesting growing exposure to opposing viewpoints even as their reception worsens (a trend consistent with prior literature [15], [17], [18]). Full results across all years, including this breakdown, and further analyses are provided in Supplementary Information S10.
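The normalization behind the within- and between-group proportions can be illustrated with a short sketch. It assumes, for illustration only, that each comment has already been reduced to a pair (group of the author, group of the user being replied to). The counts are invented.

```python
from collections import Counter

def exchange_proportions(replies):
    """Share of each author group's comments directed at each group.

    replies: list of (author_group, target_group) pairs, one per comment.
    Each row (author group) of the result sums to 1.
    """
    pair_counts = Counter(replies)
    totals = Counter(author for author, _ in replies)
    return {
        (author, target): n / totals[author]
        for (author, target), n in pair_counts.items()
    }

# Hypothetical comment exchanges between two user groups.
replies = (
    [("Democratic", "Democratic")] * 6
    + [("Democratic", "Conservative")] * 2
    + [("Conservative", "Conservative")] * 3
    + [("Conservative", "Democratic")] * 1
)
proportions = exchange_proportions(replies)
for pair, share in sorted(proportions.items()):
    print(pair, share)
```

Normalizing by the author group's total output makes the proportions comparable across years even though the overall comment volume grows by roughly an order of magnitude.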

Figure 5: Engagement and Temporal Dynamics among Democratic, Conservative, and Banned Subreddits. a: Yearly textual distances between Democratic, Conservative, and Banned subreddits. We report yearly text-embedding cosine distances between Democrats and Conservatives, which decrease in 2016, particularly among politically active users. Both groups also converge linguistically toward Banned subreddits over time, although Democrats remain comparatively more distant. As benchmarks, we include a random model (the mean similarity of randomly assembled communities of the same size) and a heterogeneous model (the mean of all pairwise similarities among a community’s constituent subreddits, thereby controlling for internal composition heterogeneity). b–c: Distances in user- and domain-based subreddit networks. We show average pairwise distances (normalized by yearly network averages) in statistically validated interaction networks, computed using the harmonic mean. A widening gap emerges between Democrats and Conservatives, with Banned subreddits consistently closer to Conservatives. Domain-sharing networks reveal a similar pattern: Democrats and Conservatives diverge sharply after 2016, while Banned subreddits remain aligned with Conservatives. d–e: Heatmaps of yearly proportions of comments and scores exchanged within and between user groups (shown for 2014 and 2016). They indicate increasing cross-faction interaction over time, but consistently lower comment scores for inter-group exchanges compared to intra-group ones.

Discussions↩︎

Online social network data often come with a high level of noise, due to the multifaceted nature of social interactions, their large heterogeneity and the intrinsic randomness of human behavior [20]. To increase the signal-to-noise ratio, we applied a statistical filtering approach to isolate the core structure of political discourse, enabling reliable detection of persistent patterns of political polarization and echo chambers on Reddit [14], [39]. We focused on the US political debate from 2013 to 2017, uncovering a robust architecture of ideological communities that endures over time. Despite widespread interaction across groups, polarization remains deeply embedded – offering a novel perspective on how echo chambers sustain themselves even amid moments of heightened engagement and debate.

Within the observed clusters, users are repeatedly exposed to a narrow set of information sources and engage with the same topics and individuals, often using highly similar language, suggesting that echo chambers are not limited to fringe groups and are present even during periods of apparent openness. Remarkably, partisan communities showed a decline in average comment scores, especially in candidate-focused forums during election periods, a trend also observed in r/The_Donald [40], suggesting increasing internal heterogeneity even within apparently cohesive groups. Finally, during election periods, linguistic convergence across Democrat, mainstream Conservative, and even Banned communities suggested intensified cross-camp debate without genuine integration, as increasingly similar language hinted at shared rhetorical frames across ideological boundaries [41].

Our findings are mirrored in real-world political tensions, as the clustering of communities and semantic distances across topic groups reveal systematic contrasts between Democratic and Conservative spheres. Remarkably, both factions remain equally detached from economics- and social-justice-related topics – echoing post-election critiques [42] and concerns over the marginalization of economically “left-behind” U.S. communities [43].

We tested our procedures on several dimensions to detect possible weaknesses. First, our approach to user classification, based primarily on participation metrics, rather than direct textual ideological content, may introduce some misalignments. Such discrepancies can arise when participation patterns evolve rapidly, thereby reducing their correspondence with users’ underlying political orientations. Tracking these users across subsequent years revealed frequent shifts toward Conservative or Banned communities, followed in some cases by partial returns to Democratic spaces – a pattern likely driven by adversarial interactions rather than genuine ideological shifts (see Supplementary Sections S2, S10). Second, differences in temporal resolution may influence the granularity of observed dynamics, since zooming into shorter periods can reveal transient variations in community composition and polarization intensity. We verified the robustness of our results using four-month aggregation windows instead of yearly ones, obtaining consistent community structures and polarization trends, with minor differences (see Supplementary Information S11). Third, differences in level of aggregation may also account for discrepancies across studies. For instance, while our analysis operates at the level of domains and subreddits – thus, focusing on aggregated community behavior – Ref. [12] examines news consumption and polarization directly at the level of individual users and URLs. This user-centric perspective could capture interaction patterns at a different scale, thereby also leading to variations in the measured strength of echo-chamber effects across systems.

Future developments of our research should enhance classification accuracy by integrating sentiment and textual analyses of user comments, moving beyond participation metrics alone — an increasingly relevant step given the growing presence and human-like behavior of automated and AI-driven accounts in online discussions [44]–[46]. Investigating the mechanisms behind linguistic similarity during election cycles – particularly their links with linguistic complexity and user involvement within echo chambers [47] – represents a promising direction for understanding the evolution of political discourse.

Furthermore, dedicated analyses of highly polarized communities, such as those explicitly linked to Trump, may uncover distinctive interaction patterns. Moreover, additional work is needed to clarify the influence of Banned users within Conservative-aligned clusters and to assess the effectiveness of moderation strategies – particularly subreddit bans – in mitigating long-term polarization [48], [49].

Understanding echo chambers as persistent, self-sustaining structures – rather than transient anomalies – shifts the focus from merely facilitating cross-group exposure to designing interactions that promote reflection over reaction. Whether platforms will embrace such responsibility remains an open question, but our results indicate that current dynamics are unlikely to change spontaneously.

Methods↩︎

Dataset Description↩︎

We used the Reddit Politosphere dataset [34], which collects all comments from a large set of politically oriented subreddits between 2013 and 2017, and complemented it with the corresponding submissions from the Pushshift Reddit dataset [35]. Together, these data comprise millions of comments and posts produced by hundreds of subreddits and domains over the five-year period (see Supplementary Information, Section S1, for details and visualizations).

Each record comes with rich metadata. For comments, available fields include user and submission identifiers, the subreddit in which the comment appears, parent comment ID, timestamps, scores, and the comment text. In addition, the dataset creators provide curated metadata, such as information about the subreddit (e.g., whether it was later banned, or associated with Democratic or Conservative communities), along with other annotations. For submissions, metadata include author and submission IDs, timestamps, scores, textual content, and, when applicable, the external URL and the domain of the linked content. Subreddit-level metadata about political alignment were further extended into a set of topic tags covering 17 categories, allowing for broader classification across all subreddits. The manual tagging procedure was validated using GPT-4. We also used the public description of each subreddit to assist in the manual labeling of subreddits into the corresponding topic tags (see Supplementary Information, S12).

Network validation↩︎

The validation framework starts from two weighted bipartite networks representing Reddit interactions: (i) users commenting on subreddits, and (ii) domains shared within subreddits. Here we discuss the case of the user–subreddit network, but the discussion applies to the domain–subreddit network as well.

We extract statistically meaningful projections onto the subreddit layer by identifying significant co-occurrence patterns [22], [23]. Prior to validation, we binarize networks using the Revealed Comparative Advantage (RCA) [50], which compares the observed weight of each link to the expected activity of the two nodes involved, thus correcting for heterogeneity in activity levels. Only links with \(RCA \ge 1\) are retained in the binarized network. After binarization, we project onto the subreddit layer by counting common users between subreddits. In particular, if \(b_{i\alpha}\) is the generic entry of the (binarized) biadjacency matrix between subreddit \(i\) and user \(\alpha\), the observed number of common users between two subreddits \(i\) and \(j\) is \(V_{ij} = \sum_{\alpha} b_{i\alpha} b_{j\alpha}\). However, this raw overlap is biased by subreddit popularity and user activity, requiring further statistical validation.
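As an illustration of the binarization and projection steps above, the following minimal sketch applies the RCA filter to a small weighted subreddit-by-user matrix and then counts common users. It assumes the standard Balassa form of RCA, \(w_{i\alpha} W / (s_i s_\alpha)\), where \(s_i\) and \(s_\alpha\) are the total comment counts of subreddit \(i\) and user \(\alpha\) and \(W\) is the grand total. The function names and the toy matrix are hypothetical and are not taken from the authors' code.

```python
def rca_binarize(weights):
    """Binarize a weighted biadjacency matrix (rows: subreddits, columns: users).

    A link is kept when RCA >= 1, i.e. when w * W >= s_row * s_col.
    The comparison is done on integers to avoid rounding at the RCA = 1 boundary.
    """
    row_totals = [sum(row) for row in weights]
    col_totals = [sum(col) for col in zip(*weights)]
    grand_total = sum(row_totals)
    binary = []
    for i, row in enumerate(weights):
        binary.append([
            1 if w > 0 and w * grand_total >= row_totals[i] * col_totals[a] else 0
            for a, w in enumerate(row)
        ])
    return binary


def co_occurrences(binary):
    """Return V[i][j], the number of users shared by subreddits i and j."""
    n = len(binary)
    return [
        [sum(x * y for x, y in zip(binary[i], binary[j])) for j in range(n)]
        for i in range(n)
    ]


# Toy example: 3 subreddits (rows) and 3 users (columns), entries are comment counts.
toy_weights = [[10, 0, 1],
               [1, 5, 1],
               [0, 5, 8]]
toy_binary = rca_binarize(toy_weights)
toy_overlap = co_occurrences(toy_binary)
```

In this toy case the single comment of the first user on the second subreddit is dropped, because it falls below what that user's and that subreddit's overall activity would suggest.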

To correct for such biases, we apply statistical validation based on maximum entropy null models [21]. In particular, we use the Bipartite Configuration Model (BiCM) [51], which constrains the degrees of both classes of nodes (subreddits and users). According to the BiCM, co-occurrences \(V_{ij}\) follow a Poisson–Binomial distribution, i.e., the distribution of a sum of independent Bernoulli trials with different success probabilities, which depend on the degrees of the two subreddits and of the individual users involved. The corresponding \(p\)-value is \[p[V_{ij}] = 1 - \sum_{x=0}^{V_{ij}-1} \pi(x \mid i,j),\] where \(\pi(x \mid i,j)\) denotes the probability of observing exactly \(x\) shared neighbors according to the BiCM null model.
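The \(p\)-value above can be sketched as follows. The example assumes that the BiCM link probabilities between each subreddit and each user have already been obtained from a fitted model and are simply passed in as lists; fitting the model itself is not shown. Under that assumption, user \(\alpha\) is shared by subreddits \(i\) and \(j\) with probability equal to the product of the two link probabilities, and the Poisson–Binomial distribution is built by adding one Bernoulli trial at a time. Function and argument names are hypothetical.

```python
def poisson_binomial_pmf(probabilities):
    """Distribution of a sum of independent Bernoulli trials.

    Returns a list pmf where pmf[x] is the probability of exactly x successes.
    """
    pmf = [1.0]  # with zero trials, zero successes has probability one
    for q in probabilities:
        updated = [0.0] * (len(pmf) + 1)
        for successes, mass in enumerate(pmf):
            updated[successes] += mass * (1.0 - q)   # this trial fails
            updated[successes + 1] += mass * q       # this trial succeeds
        pmf = updated
    return pmf


def co_occurrence_p_value(link_probs_i, link_probs_j, observed):
    """P(V >= observed) under the null model, equal to 1 - sum_{x < observed} pi(x).

    link_probs_i[a] and link_probs_j[a] are the (assumed given) null-model
    probabilities that user a is linked to subreddit i and j respectively.
    """
    shared_probs = [p * q for p, q in zip(link_probs_i, link_probs_j)]
    pmf = poisson_binomial_pmf(shared_probs)
    # Summing the upper tail directly avoids cancellation for tiny p-values.
    return sum(pmf[observed:])
```

The loop is quadratic in the number of users, which is fine for a sketch; large networks would typically need a faster evaluation of the same distribution.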

We validate co-occurrences by applying the False Discovery Rate (FDR) correction [52] to the set of computed \(p\)-values. Let \(D\) be the number of hypotheses tested and \(\alpha\) the desired significance level. Sorting the \(p\)-values in increasing order, we select the largest one satisfying \(p\text{-value}_i \le i \alpha / D\), where \(i\) is the rank of the \(p\)-value, and use it as the threshold. All links with \(p\)-values at or below this threshold are retained in the validated projection, filtering out spurious co-occurrences and isolating statistically robust similarities between subreddits [23].
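The FDR selection rule described above can be sketched in a few lines. The dictionary of \(p\)-values keyed by subreddit pair and the default significance level of 0.05 are illustrative assumptions; the text does not state the level used.

```python
def fdr_threshold(p_values, alpha=0.05):
    """Largest p-value satisfying p_(rank) <= rank * alpha / D, or None if none does."""
    ordered = sorted(p_values)
    n_tests = len(ordered)
    threshold = None
    for rank, p in enumerate(ordered, start=1):
        if p <= rank * alpha / n_tests:
            threshold = p  # keep overwriting: the last hit is the largest one
    return threshold


def validated_links(p_by_link, alpha=0.05):
    """Return the set of links whose p-value is at or below the FDR threshold."""
    threshold = fdr_threshold(list(p_by_link.values()), alpha)
    if threshold is None:
        return set()
    return {link for link, p in p_by_link.items() if p <= threshold}


# Toy example with four candidate links between subreddits.
toy_p_values = {("a", "b"): 0.001, ("a", "c"): 0.02, ("b", "c"): 0.03, ("a", "d"): 0.5}
toy_validated = validated_links(toy_p_values, alpha=0.05)
```

Note that a link can be retained even if its own \(p\)-value exceeds its rank-specific bound, as long as a larger \(p\)-value further down the ordering satisfies the condition.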

Polarization index↩︎

For a given partition of subreddits into communities, we label users according to where they are relatively more active, correcting for heterogeneity in community size. Specifically, for each user \(i\) and community \(c\), let \(n_{ic}\) be the number of subreddits in \(c\) where \(i\) has commented and \(S_c\) the total number of subreddits in \(c\). The user label is \(u_i = \arg\max_{c}\!\left(n_{ic}/S_c\right)\) [29].

Let \(U_c\) be the set of users labeled as \(c\), and \(W_c\) the set of users who commented at least once on subreddits in community \(c\). The polarization index of community \(c\) is \(\rho_c = |W_c \cap U_c|/|W_c| \in [0,1]\), i.e., the fraction of active users in community \(c\) whose main activity is concentrated in \(c\). Higher values indicate stronger polarization.
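A minimal sketch of the user labeling and of the polarization index defined above is given below. It assumes that user activity is available as a mapping from each user to the set of subreddits where they commented, and that communities are given as sets of subreddits. Ties in the \(\arg\max\) are broken by dictionary order, which is an arbitrary choice made here for illustration.

```python
def label_users(user_subreddits, communities):
    """Assign each user to the community maximizing n_ic / S_c."""
    labels = {}
    for user, subreddits in user_subreddits.items():
        scores = {
            name: len(subreddits & members) / len(members)
            for name, members in communities.items() if members
        }
        best = max(scores, key=scores.get)
        if scores[best] > 0:  # users with no activity in any community stay unlabeled
            labels[user] = best
    return labels


def polarization_index(user_subreddits, communities, labels):
    """rho_c = |W_c intersect U_c| / |W_c| for every community c."""
    rho = {}
    for name, members in communities.items():
        active = {u for u, subs in user_subreddits.items() if subs & members}  # W_c
        at_home = {u for u in active if labels.get(u) == name}                 # W_c ∩ U_c
        rho[name] = len(at_home) / len(active) if active else float("nan")
    return rho


# Toy example: community B is twice as large as community A.
toy_communities = {"A": {"s1", "s2"}, "B": {"s3", "s4", "s5", "s6"}}
toy_activity = {
    "u1": {"s1", "s2"},
    "u2": {"s1", "s3", "s4", "s5"},
    "u3": {"s3"},
}
toy_labels = label_users(toy_activity, toy_communities)
toy_rho = polarization_index(toy_activity, toy_communities, toy_labels)
```

Here user u2 is active in both communities but is labeled B, so only half of the users active in A are labeled A, while every user active in B is labeled B.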

In addition to this index, we also analyze the full composition of communities, i.e., how users with different labels populate each community. This information is visualized in the donut charts of the main text and corresponds to the full polarization matrices described in Supplementary Information S4.

Text Preprocessing and Similarity Analysis↩︎

To compute textual similarity, the texts of Reddit posts are first preprocessed by converting all content to lowercase, removing stopwords, and applying lemmatization. From this cleaned corpus, we construct a vocabulary of unigrams, bigrams, and trigrams. To ensure comparability across subreddits, we retain only those n-grams that appear more than once in the overall corpus. To represent textual content numerically, we use the standard Term Frequency–Inverse Document Frequency (TF-IDF) representation [53]. This approach weighs terms by how often they appear within a subreddit while reducing the weight of terms that are frequent across many subreddits, thereby highlighting words that are more distinctive and informative for each community. Following vectorization, textual similarity between documents is computed using cosine similarity: \[\text{CosineSim}(A, B) = \frac{A \cdot B}{\|A\| \, \|B\|}\] where \(A\) and \(B\) are the TF-IDF vectors of two documents and \(\|\cdot\|\) denotes the Euclidean norm. This metric, commonly used in information retrieval [53], captures how aligned two documents are in the vector space, independently of length. Note that in Fig. 5 we report cosine distances, defined as \(1 - \text{CosineSim}(A,B)\). This choice was made to obtain a distance-like quantity (strictly speaking a dissimilarity, since it need not satisfy the triangle inequality) that grows as texts diverge, allowing for direct comparison with network-based distances (e.g., path lengths) and thus making the cross-modal analysis more consistent.
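The vectorization and distance computation described above can be sketched with the standard library alone. This is a simplified illustration under stated assumptions: tokenization is a plain whitespace split, stopword removal and lemmatization are omitted, and the IDF uses a smoothed form, \(\ln\frac{1+N}{1+\mathrm{df}} + 1\), chosen here for convenience; the exact weighting variant used in the paper is not specified in the text.

```python
import math
from collections import Counter


def ngrams(tokens, n_max=3):
    """Yield all unigrams, bigrams and trigrams of a token list."""
    for n in range(1, n_max + 1):
        for start in range(len(tokens) - n + 1):
            yield " ".join(tokens[start:start + n])


def tfidf_vectors(documents):
    """Return one sparse TF-IDF vector (dict term -> weight) per document."""
    counts = [Counter(ngrams(doc.lower().split())) for doc in documents]
    corpus_counts = Counter()
    for doc_counts in counts:
        corpus_counts.update(doc_counts)
    # Keep only n-grams appearing more than once in the overall corpus.
    vocabulary = {term for term, n in corpus_counts.items() if n > 1}
    doc_freq = Counter(t for doc_counts in counts for t in doc_counts if t in vocabulary)
    n_docs = len(documents)
    return [
        {
            term: tf * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, tf in doc_counts.items() if term in vocabulary
        }
        for doc_counts in counts
    ]


def cosine_distance(a, b):
    """1 - cosine similarity between two sparse vectors (1.0 if either is empty)."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return 1.0 - dot / norm if norm else 1.0


# Toy example: each string stands for the concatenated posts of one subreddit.
toy_docs = ["tax cut tax cut", "tax cut plan", "border wall border wall"]
toy_vectors = tfidf_vectors(toy_docs)
```

In the toy corpus the first two documents share all their retained n-grams in the same proportions and are therefore at distance close to zero, while the third shares none with them and sits at distance one.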

Analysis and validation of Echo Chambers↩︎

To analyze echo chambers, we compare subreddit communities derived from “user-interaction” networks and those obtained from “information-diet” patterns. Specifically, we construct overlap matrices where rows correspond to interaction-based communities and columns to domain-based communities. Each matrix entry reflects the number of subreddits shared between the corresponding pair of communities. To assess the statistical significance of these overlaps, we constructed a bipartite network in which nodes on each layer represent the communities from the two classifications. A link connects two nodes if the corresponding matrix entry is nonzero, and its weight is given by the number of users active in the set of shared subreddits, capturing user engagement in the shared subreddits, a suitable parameter for studying overlaps between groups in the context of echo chambers.
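The construction of the overlap matrix and of the weighted links described above can be illustrated as follows. The sketch assumes that each partition is given as a mapping from community name to a set of subreddits, and it interprets "number of users active in the set of shared subreddits" as the number of distinct users active in at least one shared subreddit; that interpretation, like all names below, is an assumption made for this example.

```python
def overlap_matrix(interaction_communities, domain_communities):
    """Number of subreddits shared by each (interaction, domain) community pair."""
    return {
        (row, col): len(row_members & col_members)
        for row, row_members in interaction_communities.items()
        for col, col_members in domain_communities.items()
    }


def weighted_links(interaction_communities, domain_communities, users_by_subreddit):
    """Links exist where the overlap is nonzero; the weight counts distinct active users."""
    links = {}
    for row, row_members in interaction_communities.items():
        for col, col_members in domain_communities.items():
            shared = row_members & col_members
            if shared:
                active_users = set()
                for subreddit in shared:
                    active_users |= users_by_subreddit.get(subreddit, set())
                links[(row, col)] = len(active_users)
    return links


# Toy example with two communities per partition.
toy_interaction = {"I1": {"a", "b", "c"}, "I2": {"d"}}
toy_domain = {"D1": {"a", "b"}, "D2": {"c", "d"}}
toy_users = {"a": {"u1", "u2"}, "b": {"u2", "u3"}, "c": {"u4"}, "d": {"u5"}}
toy_overlaps = overlap_matrix(toy_interaction, toy_domain)
toy_links = weighted_links(toy_interaction, toy_domain, toy_users)
```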

This bipartite structure was validated using the Bipartite Weighted Configuration Model (BiWCM) [54], a maximum entropy null model for bipartite weighted networks that constrains node strengths. To control for multiple hypothesis testing, we applied the False Discovery Rate (FDR) correction [52] introduced above to the p-values obtained from the BiWCM validation. To increase the resolution of the analysis and address the limitations imposed by the relatively small number of original communities, we repeated the validation procedure using subcommunities. Specifically, for each community in the validated subreddit networks, we extracted the corresponding subgraph and identified finer-grained substructures by applying the Louvain algorithm again. These subcommunities were then used to construct new bipartite overlap matrices, which were in turn validated using the BiWCM.

The entries found to be statistically significant in these higher-resolution matrices are further tested using the two-sided Mann–Whitney U test, comparing the distribution of values in validated entries against all other matrix entries: the test confirms that validated overlaps follow a significantly different distribution, supporting the robustness of the results.

Network distance between communities↩︎

To define a distance between two communities in the validated subreddit networks, we compute the average pairwise distance between all subreddit pairs belonging to the two communities and present in the statistically validated projection. Since some pairs of subreddits may belong to disconnected components of the validated network, we take the distance \(d_{ij}\) between two nodes (subreddits) \(i\) and \(j\) to be the shortest path length \(\text{sp}(i,j)\) and work with its reciprocal: \(d^{-1}_{ij} = 1/\text{sp}(i,j)\) if a path exists between them, and \(d^{-1}_{ij} = 0\) otherwise. The average distance between two communities \(C\) and \(C'\), defined by sets of subreddits \(S_C\) and \(S_{C'}\), is then computed as the harmonic mean of all pairwise distances: \[D_{C C'} = \frac{|S_{C}| \cdot |S_{C'}|}{\sum_{i \in S_C} \sum_{j \in S_{C'}} d^{-1}_{ij}}\] so that disconnected pairs contribute zero to the denominator rather than an infinite distance.

To account for variations in network structure over years, each distance \(D_{C C'}\) is normalized by the average \(\langle D \rangle\) of all \(d_{ij}\) values (for all pairs of connected nodes) in the validated network for that year: \(\hat{D}_{C C'} = D_{C C'}/\langle D \rangle\). Finally, to assess the statistical significance of changes in distance between communities, we use a two-sided Kolmogorov–Smirnov (KS) test. Specifically, for each year, we compare the distribution of pairwise distances \(d_{ij}\) between two communities with the same distribution in the previous year, and with the full distribution of all pairwise distances in the current network.
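The harmonic-mean community distance and its yearly normalization can be sketched as below. The example assumes an unweighted, undirected validated network stored as an adjacency dictionary in which every node appears as a key, and computes shortest path lengths by breadth-first search. Pairs with no connecting path add nothing to the sum of reciprocal path lengths. The KS test is not shown, and all names are hypothetical.

```python
from collections import deque


def shortest_path_lengths(adjacency, source):
    """Breadth-first search: hop distance from source to every reachable node."""
    lengths = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, ()):
            if neighbor not in lengths:
                lengths[neighbor] = lengths[node] + 1
                queue.append(neighbor)
    return lengths


def community_distance(adjacency, community_a, community_b):
    """Harmonic mean of shortest path lengths over all cross-community pairs."""
    reciprocal_sum = 0.0
    for i in community_a:
        lengths = shortest_path_lengths(adjacency, i)
        for j in community_b:
            if lengths.get(j, 0) > 0:  # unreachable pairs contribute zero
                reciprocal_sum += 1.0 / lengths[j]
    n_pairs = len(community_a) * len(community_b)
    return n_pairs / reciprocal_sum if reciprocal_sum else float("inf")


def mean_connected_distance(adjacency):
    """Average shortest path length over all connected pairs of distinct nodes."""
    total, count = 0, 0
    for i in adjacency:
        for j, length in shortest_path_lengths(adjacency, i).items():
            if j != i:
                total += length
                count += 1
    return total / count if count else float("nan")


# Toy network: a path a-b-c-d plus an isolated node e.
toy_graph = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"], "e": []}
toy_distance = community_distance(toy_graph, {"a", "b"}, {"c", "d"})
toy_normalized = toy_distance / mean_connected_distance(toy_graph)
```

For the toy path the four cross-community path lengths are 2, 3, 1 and 2, giving a harmonic mean of 12/7; dividing by the network-wide average of 5/3 yields a normalized distance slightly above one.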

Declarations↩︎

Data Availability: Reddit conversation data used in this study include comments from the Politosphere dataset [34], originally collected via the Pushshift API (https://www.reddit.com/r/pushshift/), and posts retrieved from the same API in 2022.
Code Availability: The code used to reproduce the analysis is available on GitHub. For inquiries or additional material, please contact D. Cirulli at daniele.cirulli@cref.it.
Acknowledgements: A.D. and D.C. acknowledge the helpful discussions on Reddit analysis with Anna Mancini and Riccardo Di Clemente. G.C. acknowledges support from “Deep ’N Rec” Progetto di Ricerca di Ateneo of University of Rome Tor Vergata. F.S. was partially supported by the project “CODE – Coupling Opinion Dynamics with Epidemics”, funded under PNRR Mission 4 “Education and Research” - Component C2 - Investment 1.1 - Next Generation EU “Fund for National Research Program and Projects of Significant National Interest” PRIN 2022 PNRR, grant code P2022AKRZ9. The work of F.S. is part of the joint initiative CREF-SONY.
Author Contributions: D.C. and A.D. gathered the data. D.C. performed the analysis. D.C. and A.D. made the figures. F.S. and G.C. designed the analysis and supervised the project. All authors discussed the results and contributed to the final manuscript.
Competing Interests: The authors declare no competing interests.
This work makes extensive use of open-source libraries and tools. For data handling and analysis, we used Pandas [55], NumPy [56], SciPy [57], NLTK [58] and Scikit-learn [53]. For network analysis and visualization, we relied on NetworkX [59], iGraph [60], and graph-tool [61], while Gephi [62] was employed for network visualization and manual exploration. Plots and visual representations were generated using Matplotlib [63], D3Blocks [64] for chord diagrams, and Plotly [65] for Sankey diagrams. Finally, we employed NEMtropy [54] to implement Maximum Entropy Null Models for the statistical validation of Reddit’s complex networks, and the weighted network model from [66] for validating echo chambers.
Tag consistency was validated by comparing manual annotations with labels generated by GPT-4 through the OpenAI API [67] (see Supplementary Section S12).

**Supplementary Information**

1 Tag distribution and dataset statistics↩︎

The Reddit activity captured in the Politosphere dataset shows a steady intensification of political engagement over time, with increasing numbers of users, comments, and politically oriented subreddits throughout 2013–2017. This growth is punctuated by clear surges during U.S. election years, most notably in 2016, when activity peaks in correspondence with the presidential campaign (Fig. 6a). The creation and removal of subreddits by tag further indicate that many Democratic- and Conservative-aligned communities emerge around these periods, but often disappear shortly afterward. At the same time, the number of “Banned” groups rises sharply from 2016 onward, many of which vanish in subsequent years (Fig. 6b).

Although Reddit remains predominantly U.S.-based (over 50% of total traffic according to SimilarWeb [68]), European participation displays a gradual upward trend, contributing to a progressively more diversified user base. Together, these temporal dynamics define the empirical foundation for the structural and semantic analyses developed in the following sections.

This dataset constitutes the basis for all subsequent analyses, comprising \(120{,}429{,}663\) comments from \(1{,}889{,}317\) users across \(498\) politically oriented subreddits (January 2013–December 2017), obtained from the Reddit Politosphere dataset [34], together with \(16{,}778{,}792\) original submissions from the Pushshift Reddit dataset [35], covering \(208{,}316\) unique domains.

In our analyses, we rely on a core set of metadata fields. For comments, these include the comment, submission, and parent identifiers, scores, and subreddit-level information such as ban status and inferred political alignment. For submissions, we retain the text, submission ID, user and subreddit identifiers, scores, and domains, which enable us to examine both the linguistic content and the interaction networks underlying political discourse on Reddit.

A subsequent key step involves defining a concise yet informative tagging scheme for subreddit communities, extending subreddit-level metadata — including inferred political alignment — with a set of manually curated topic tags covering 17 categories. Public subreddit descriptions were also used to assist in the manual labeling of communities into topic tags.

We also prepared the textual material for linguistic analysis, in order to make structural (interaction-based) and semantic (content-based) components directly comparable within a unified analytical framework. Figure 7 summarizes this methodological workflow—from the raw conversational structure of Reddit data (panel a), to the extraction of descriptive and textual content used for tag assignment and language processing (panel b), to qualitative examples of subreddits with multiple tags and pairwise textual similarity among them (panel c). This process provides the foundation for both network-level and semantic analyses discussed in later sections.

For clarity of presentation, detected network communities are assigned names and colors that reflect the distribution of tags among their constituent subreddits. Rather than adopting only the most frequent tag, we construct composite labels: the dominant tag is reported first, and additional tags are included whenever their frequency reaches at least half that of the preceding one. In this way, a community can receive a compound name such as “Far-Left/Dem” when both affiliations are strongly represented. The generic tag “Politics” is not allowed as a primary label, as it represents an umbrella category without specific partisan meaning. Each tag is also associated with a base color, and the color of a community is then obtained as the average of the colors of the tags appearing in its label. This procedure ensures that community visualizations capture both the dominant and secondary topical affiliations of the underlying subreddits. The resulting tagging scheme—an extension of the original Politosphere labels—is detailed in Table 1, and the complete list of tagged subreddits is reported in Table 2.
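The naming and coloring rule described above can be sketched as follows. Several details are assumptions made for this illustration rather than statements about the authors' procedure: tags are ranked by how many subreddits in the community carry them, the chain of secondary tags stops at the first tag that fails the half-frequency condition, the generic tag is dropped altogether when it would otherwise come first, and colors are averaged component-wise in RGB. The base colors in the example are invented.

```python
from collections import Counter


def composite_label(tags_per_subreddit, generic_tag="Politic"):
    """Build a compound community name such as 'Far-Left/Dem' from subreddit tags."""
    counts = Counter(tag for tags in tags_per_subreddit for tag in tags)
    ranked = counts.most_common()
    # The generic tag may not be the primary label (assumption: drop it if it leads).
    if len(ranked) > 1 and ranked[0][0] == generic_tag:
        ranked = ranked[1:]
    if not ranked:
        return []
    label = [ranked[0][0]]
    previous_count = ranked[0][1]
    for tag, count in ranked[1:]:
        if count < previous_count / 2:
            break  # assumption: stop at the first tag below half of the preceding one
        label.append(tag)
        previous_count = count
    return label


def community_color(label, base_colors):
    """Component-wise average of the RGB colors of the tags in the label."""
    channels = zip(*(base_colors[tag] for tag in label))
    return tuple(sum(channel) / len(label) for channel in channels)


# Toy community of five subreddits.
toy_tags = [["Far-Left", "Dem"], ["Far-Left", "Dem"], ["Far-Left", "Dem"], ["Far-Left"], ["News"]]
toy_label = composite_label(toy_tags)
toy_name = "/".join(toy_label)
toy_color = community_color(toy_label, {"Far-Left": (200, 0, 0), "Dem": (0, 0, 200)})
```

In the toy community "Dem" appears in three subreddits against four for "Far-Left" and is therefore kept, while "News" appears only once and is left out of the name.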

Figure 7: Semantic analysis and tag assignment pipeline. Panel a: Example of Reddit conversation data, including posts and comment threads. Panel b: Separation between subreddit descriptions (used for manual tag assignment) and post texts (used for linguistic analysis). Panel c: Qualitative examples of tag assignment (e.g., “Far Left”, “News”) and content similarity across subreddits based on cosine distance. This figure summarizes how subreddit metadata and text content were jointly processed to extract thematic tags and compute linguistic similarities. The detailed description of the linguistic analysis is provided in Supplementary Section S6.

Table 1: Color scheme used throughout the figures to represent subreddit categories.
Color	Abbreviation	Full name
	Geop	Geopolitics
	News	News
	Far-Left	Far Left
	Econ	Economics
	PolTalk	Political Talk
	Politic	Politics (general)
	Dem	Democrats
	Lib	Liberals
	Cons	Conservatives
	Gun	Gun Rights
	Far-Right	Far Right
	Ban	Banned
	SocialJustice	Social Justice
	UK	United Kingdom
	Eur	Europe
	Canada	Canada
	Austr	Australia

Table 2: Subreddits and associated tags.
Subreddit	Tags	Subreddit	Tags	Subreddit	Tags
2012Elections	Dem, Cons	2016Elections	Dem, Cons	_elections	Dem, Cons
2ALiberals	Lib, Gun	AOC	Politic, Dem	Abortiondebate	SocialJustice
ActiveMeasures	Geop, News	AgainstHateSubreddits	SocialJustice	AgainstTheChimpire	Ban
Agorism	Econ	Albertapolitics	Canada	AlexandriaOcasio	Dem
AltRightChristian	Ban	AmalaNetwork	Politic	AmericanPolitics	Politic
AnCap101	Far-Right	Anarchism	Far-Left	AnarchismOnline	Far-Left
AnarchistNews	News, Far-Left	Anarcho_Capitalism	Far-Left	Anarchy101	Far-Left
AntiSemitismInReddit	Far-Right	AntiTrumpAlliance	Dem	AnticommieCringe	Far-Right
AntifascistsofReddit	Far-Left	AnybodyButHillary	Dem	AnythingGoesNews	News
AprogressiveParty	Politic, SocialJustice	ArabIsraeliConflict	Geop	ArrestedCanadaBillC16	Canada
AsAGunOwner	Gun	AskALiberal	Lib	AskBernieSupporters	Dem
AskConservatives	Cons	AskDemocrats	Dem	AskEconomics	Econ
AskFeminists	SocialJustice	AskFemmeThoughts	SocialJustice	AskLibertarians	Lib
AskSocialScience	Politic	AskThe_Donald	Cons	AskTrumpSupporters	Cons
Ask_Politics	Politic	Ask_TheDonald	Cons	Askpolitics	Politic
AusPol	Austr	AustraliaLeftPolitics	Austr	AustralianPolitics	Austr
AutoNewspaper	News	BCpolitics	Politic	BDS	Geop
BadSocialScience	Politic	BannedFromThe_Donald	Cons	BasicIncome	Politic
BeardTube	PolTalk, Politic	BerlinTruckAttack	News	BernieSanders	Dem
BernieSandersSucks	Cons	Beto2020	Dem	BitcoinDiscussion	Econ, Politic
BlueMidterm2018	Dem	BreakingNews24hr	News	BritishPolitics	UK
CAVDEF	Politic	CNNmemes	News, Politic	COMPLETEANARCHY	Far-Left
California_Politics	Politic	CanadaPolitics	Canada	CanadianPolitics	Canada
Capitalism	Far-Left	CapitalismVSocialism	Far-Left	Cascadia	Politic
CatholicPolitics	Politic	China_Debate	Geop	Classical_Liberals	Lib
ColoradoPolitics	Politic	Communalists	Far-Left	CommunismWorldwide	Far-Left
Conservative	Cons	ConservativeLounge	Cons	ConservativeMeta	Cons
ConservativeNewsWeb	News, Cons	ConservativesOnly	Cons	Conservatives_R_Us	Cons
DNCleaks	News	DankLeft	Far-Left	DarkEnlightenment	Far-Right
Dave_Rubin	PolTalk	DeathtoAmeriKKKa	Far-Right	DebateAltRight	Ban
DebateAnarchism	Far-Left	DebateCommunism	Far-Left	DebateFascism	Ban
DebateaCommunist	Far-Left	DemocraticSocialism	Far-Left	Democrats2020	Dem
DescentIntoTyranny	Politic	Digital_Manipulation	Politic	DitchMitch	Dem, SocialJustice
DonaldTrumpWhiteHouse	Cons	Donald_Trump	Cons	DrainTheSwamp	Cons
Drumpf	Politic	ENLIGHTENEDCENTRISM	Politic	EarthStrike	SocialJustice
EcoInternet	SocialJustice	Economics	Econ	EducatingLiberals	Lib
Egalitarianism	SocialJustice	ElizabethWarren	Dem	EmergingRisks	News, Econ
EndFPTP	Politic	EndlessWar	Politic	EnoughCapitalistSpam	Far-Left
EnoughCommieSpam	Far-Right	EnoughIDWspam	News, Politic	EnoughLibertarianSpam	Lib
EnoughObamaSpam	Cons	EnoughPaulSpam	Dem	EnoughTrumpSpam	Dem
Enough_AOC_Spam	Dem	Enough_Sanders_Spam	Cons	EuropeMeta	Eur
EuropeanFederalists	Eur	EuropeanSocialists	Eur	ExplainBothSides	Politic
FLgovernment	Politic	FULLCOMMUNISM	Far-Left	FULLDISCOURSE	Far-Left, Politic
FakeProgressives	Far-Left	FeMRADebates	SocialJustice	FemraMeta	SocialJustice
FoxFiction	News, Politic	Foxhidesinfo	News, Politic	FreeEuropeNews	Eur
FreePolDiscussion	PolTalk, Politic	FreeSpeech	News	FreeSpeechWorld	News, Ban
Freethought	News	FriendsofthePod	Dem	Fuckthealtright	Far-Left, Dem
Full_news	News	GAPol	Politic	GBPolitics	UK
GUARDIANauto	News	GaryJohnson	Lib	GenZedong	Far-Left
GeneralStrikeUSA	SocialJustice	GoldandBlack	Far-Left	Government_is_lame	Far-Left
GrassrootsSelect	Dem	GreenAndPleasant	Far-Left, UK	GreenNewDeal	SocialJustice
GreenParty	SocialJustice	GunsAreCool	Gun	HBDstats	Ban
HanAssholeSolo	Far-Right	HateSubsInAction	SocialJustice	HeadlineCorrections	Politic
Hillary	Dem	HillaryForAmerica	Dem	HillaryForPrison	Cons
HongKongProtest	Geop, News	IAMALiberalFeminist	SocialJustice	IDontLikeRPolitics	Ban
IWG	Politic	IWW	SocialJustice	Identitarians	Ban, SocialJustice
ImABlue	Ban	Impeach_Trump	Dem	ImpeachmentWatch	Politic
IndoPakDialogue	Geop, Politic	IntellectualDarkWeb	Politic	Intelligence	Politic
IntelligenceNews	News	InternationalNews	News	IronFrontUSA	Far-Left, SocialJustice
IslamUnveiled	Ban	Israel	Geop	IsraelPalestine	Geop
IsraelSubredditWatch	Geop	Israel_Palestine	Geop	JamesDamore	SocialJustice
JoeBiden	Dem	JordanPeterson	Politic	Jreg	Politic
Kamala	Dem	KasichForPresident	Cons	KeepOurNetFree	Politic
Keep_Track	Politic	KochWatch	Econ	Kossacks_for_Sanders	Dem
Labour	News, UK	LabourUK	News, UK	LateStageCommunism	Far-Left, Politic
LateStageImperialism	Geop, Far-Left	LateStageSocialism	Far-Left	Le_Pen	Far-Right, Eur
LeftWingMaleAdvocates	Politic	LeftWithoutEdge	Far-Left	LeftieZ	Far-Left
LeftistHotTakes	Far-Left	Lessig2016	Dem	LevantineWar	Geop
LibDem	Dem, Lib	Liberal	Lib	Liberalist	Lib
LiberalsvsNazis	Lib	Libertarian	Lib	LibertarianDebates	Lib
LibertarianFreeState	Lib	LibertarianLeft	Far-Left, Lib	LibertarianPartyUSA	Politic, Lib
LibertarianUncensored	Lib	LouderWithCrowder	PolTalk	MENAConflicts	Geop
MarchAgainstNazis	Far-Left, SocialJustice	MarchAgainstTrump	Dem	MarchForNetNeutrality	Politic
MarchForScience	Politic	Marco_Rubio	Cons	MarketAnarchism	Far-Left
Marxism	Far-Left	MassachusettsPolitics	Politic	MedicareForAll	SocialJustice
MensRightsMeta	SocialJustice	MetaRepublican	Cons	MideastPeace	Geop, Politic
MissouriPolitics	Politic	MobilizedMinds	Politic	ModernPropaganda	News, Politic
MoreTankieChapo	Far-Left, Ban	Mueller	Geop	MurderedByAOC	Dem
NATOrussianconflict	Geop	NOWTTYG	Gun	NSALeaks	News
NationalSocialism	Far-Right, Ban	NegaRedditRedux	Politic	NeutralPolitics	Politic
NeutralTalk	Politic	NeverTrump	Dem	NewJerseyuncensored	Politic
NewPatriotism	Politic, Far-Right	New_Jersey_Politics	Politic	NewsWhatever	News
Newsy	News	NoFilterNews	News	NoNetNeutrality	Politic
NonAustrianEconomics	Econ	NorthKoreaNews	Geop, News	Objectivism	Politic
Occupy	News, Far-Left, SocialJustice	OntarioPolitics	Canada	OperationPullRyan	Cons
Oregon_Politics	Politic	OurPresident	Dem	Our_Politics	Politic
POLITIC	Politic	POTUSWatch	Politic	Palestine	Geop
PalestineCircleJerk	Geop	PalestineIntifada	Geop	PanicHistory	Politic
PeoplesPartyofCanada	Canada	Pete_Buttigieg	Dem	Physical_Removal	Far-Right, Ban
Polcompball	Politic	Policy2011	UK	PoliticalCompass	Politic
PoliticalCompassMemes	Politic	PoliticalDiscussion	Politic, SocialJustice	PoliticalHorrorStory	Politic
PoliticalHumor	Politic	PoliticalHumour	Politic	PoliticalMemes	Politic
PoliticalOpinions	Econ, Politic	PoliticalPhilosophy	Politic	PoliticalVideo	Politic
Political_Revolution	Far-Left, Dem, SocialJustice	Political_Tumor	Politic	Political_Tweets	Politic
Postleftanarchism	Far-Left	PragerUrine	Lib	Pragmatism	Politic
PraxAcceptance	Politic, Lib, Ban	PresidentWarren	Dem	PresidentialRaceMemes	Politic
PublicLands	News, Politic	Pyongyang	Geop	QualitySocialism	Far-Left
RedEnsign	UK, Canada	ReddLineNews	News	RedditCensors	Politic
RedsKilledTrillions	Far-Right	Reform_The_DNC	Dem	Republican	Cons
RepublicanValues	Cons	RepublicansForSanders	Cons	Right_Wing_Politics	Cons, Ban
RightwingLGBT	Far-Right, Ban, SocialJustice	RiseUPP	SocialJustice	Romney	Cons
RsocialismMeta	Far-Left	RussiaLago	Geop	SRSDiscussion	SocialJustice
SandersForPresident	Dem	SargonofAkkad	Far-Right, UK	SelfAwarewolves	Politic
SethRich	Ban	ShitLiberalsSay	Far-Left	ShitNeoconsSay	Ban
ShitPoliticsSays	Politic	ShitPoppinKreamSays	Politic	ShitRConservativeSays	Dem
ShitThe_DonaldSays	Dem	Shitstatistssay	Politic	ShittyDebateCommunism	Politic
Sino	News, Far-Left	SmugIdeologyMan	Politic	SocialDemocracy	Politic
SocialismVCapitalism	Far-Left	Socialism_101	Far-Left	Sorosforprison	Ban
SpeechFree	News	StillSandersForPres	Dem	StormfrontorSJW	Politic
SyndiesUnited	SocialJustice	SyrianRebels	Geop	TYT	PolTalk
TedCruzForPresident	Cons	TennesseePolitics	Politic	TexasPolitics	Politic
ThanksObama	Dem	The3rdPosition	Far-Right, Ban	TheColorIsBlue	News, Politic
TheColorIsRed	News, Politic	TheLeftCantMeme	Cons	TheMajorityReport	News
TheMotte	Politic	TheNewRight	Far-Right, Ban	TheNewsFeed	News
TheRecordCorrected	News	The_Cabal	Ban	The_Congress	Politic
The_Donald	Cons, Ban	The_Donald_CA	Cons	The_Europe	Ban, Eur
The_Farage	Far-Right, UK	The_Leftorium	Far-Left	The_Mueller	Politic
The_MuellerMeltdown	Politic	ThisButUnironically	Politic	ThreeArrows	Far-Left
TiADiscussion	News, Politic	TimCanova	Dem	TommyRobinson	Far-Right, UK
TraaButNoCommies	Far-Right, SocialJustice	TrueCatholicPolitics	Politic	TruePoliticalHumor	Politic
TrueReddit	News, Politic	TrueTrueReddit	News, Politic	True_AskAConservative	Cons
TrumpCriticizesTrump	Cons	TrumpForPrison	Dem	Trump_Watch	Cons
Trumpgret	Cons	Trumpgrets	Cons	UK_Politics	UK
UMukhasimAutoNews	News	USNEWS	News	UkrainianConflict	Geop
Ultraleft	Far-Left	UnbiasedCanada	Canada	VaushVidya	Far-Left, SocialJustice
VirginiaPolitics	Politic	VoteBlue	Dem	Vote_Trump	Cons
WatchRedditDie	Politic	WayOfTheAloha	Dem	WayOfTheBern	Dem
WhatsMyIdeology	Politic	WhereAreTheChildren	SocialJustice	WhereIsAssange	SocialJustice, Austr
WhiteRights	Far-Right, Ban	WikiInAction	Politic	WikiLeaks	News
YangForPresident	Dem	YangForPresidentHQ	Dem	YangGang	Dem
YemeniCrisis	Geop	Zionism	Geop	abetterworldnews	Geop, News
accidentallycommunist	Far-Left	acteuropa	Eur	actualconspiracies	Politic
actualliberalgunowner	Lib, Gun	agitation	SocialJustice	alltheleft	Far-Left
altnewz	News	altright	Far-Right, Ban	anarcho_primitivism	Far-Left
anarchocommunism	Far-Left	anarchomemes	Far-Left	anarchy	Far-Left
antifa	Far-Left, Ban	antifapassdenied	Far-Left	antiwar	SocialJustice
arizonapolitics	Politic	askaconservative	Cons	askhillarysupporters	Dem
atheismplus	Politic	austrian_economics	Econ	badeconomics	Econ
badgovnofreedom	Politic	badpolitics	Politic	benshapiro	Cons
bernie	Dem	bernieblindness	Dem	besteurope	Eur
betternews	News	brealism	Far-Left, SocialJustice	brexit	UK
calexit	Politic	canadaleft	Far-Left, Canada	capitalism_in_decay	Far-Left
censorship	Politic, SocialJustice	centerleftpolitics	Far-Left	centrist	Politic
chinareddits	Geop	chomsky	Far-Left, SocialJustice	climateskeptics	Politic
communism	Far-Left	communism101	Far-Left	communists	Far-Left
conservativecartoons	Cons	conservatives	Cons	conspiracyfact	Politic
conspiratocracy	Politic	craftofintelligence	News	cyberlaws	Politic
daverubin	PolTalk, Cons	debateAMR	SocialJustice	debatepoliticalphil	Politic
democraticparty	Dem	democrats	Dem	demsocialist	Far-Left
demsocialists	Far-Left	dirtbagcenter	Politic	distributism	Econ
donaldtrump	Cons, Ban	dsa	Far-Left	econmonitor	Econ
economy	Econ	enoughpetersonspam	UK	enoughsandersspam	Dem
esist	News	europeannationalism	Far-Right, Ban, Eur	europeans	Eur
europeanunion	Eur	europes	Eur	evolutionReddit	Politic
exlibertarian	Lib	fascist	Far-Right, Ban	feminismformen	SocialJustice
fivethirtyeight	Politic	foreignpolicy	Geop	fullstalinism	Far-Left
futuristparty	Politic	geopolitics	Geop	georgism	Politic
globalistshills	Politic	googoogahgah	Politic	government	Politic
gravelforpresident	Dem	greed	Econ, Politic	gue	Politic
gulag	Far-Left	guncontrol	Gun	gunpolitics	Gun
hillaryclinton	Dem	historicalrage	Politic	holocaust	Ban
illinoispolitics	Politic	indianmuslims	Geop	inslee2020	Dem
inthemorning	PolTalk	inthenews	News	iranpolitics	Geop
irishpolitics	Geop, Eur	irredeemables	Cons, Ban	israelexposed	Geop
jillstein	Dem, SocialJustice	jimmydore	PolTalk, Dem	justicedemocrats	Dem
killthosewhodisagree	Politic	labor	Econ, SocialJustice	law	Politic
leftcommunism	Far-Left	liberalgunowners	Lib, Gun	libertarianaustralia	Lib, Austr
libertarianmeme	Lib	libtard	Ban	marxism_101	Far-Left
media_criticism	Politic	metaNL	Lib	metanarchism	Far-Left
militant	Far-Left	mmt_economics	Econ	moderatepolitics	Politic
monarchism	Politic	mormonpolitics	Politic	mutualism	Far-Left
ndp	Dem, Canada	neoconNWO	Cons	neoliberal	Lib
neoprogs	SocialJustice	neutralnews	News	nevadapolitics	Politic
neveragainmovement	Gun	new_right	Far-Right, Ban	nonmorons	Politic
nra	Gun	nrxn	Politic	nyspolitics	Politic
obama	Dem	obamacare	Dem	occupywallstreet	Far-Left
onguardforthee	Canada	overpopulation	Politic	pol	Politic, Ban
politicalcartoons	Politic	politicalfactchecking	News, Politic	politicalhinduism	Geop
politics	Politic	politicsdebate	Politic	postnationalist	SocialJustice
prochoice	SocialJustice	progressive	SocialJustice	progun	Gun
prolife	Cons, SocialJustice	propaganda	News	qualitynews	News
race	Ban	randpaul	Cons	realworldpolitics	News
redacted	News	republicans	Cons	restorethefourth	Far-Left, SocialJustice
revolution	Far-Left	rojava	Geop	ronpaul	Lib, Cons
samharris	Dem	scotus	Politic	secondamendment	Gun
seculartalk	PolTalk	shitfascistssay	Far-Left	shitguncontrollerssay	Gun
shitleftistssay	Cons	shitneoliberalismsays	Far-Left	shittankiessay	Far-Left
slatestarcodex	Politic	slatestarcodex_cw	Politic	smuggies	Ban
socialanarchism	Far-Left	socialism	Far-Left	socialjustice101	Far-Left, SocialJustice
stevencrowder	PolTalk, Cons	stopadvertising	News	stupidpol	Far-Left
syriancivilwar	Geop	syrianconflict	Geop	taxmarch	SocialJustice
terrorism	Politic	the_meltdown	Politic	thedavidpakmanshow	PolTalk
thenewcoldwar	Politic	thenewsrightnow	News	theredpillright	Far-Right
thomasjefferson	Politic	tories	Cons, UK	trollfare	Geop
trump	Cons	trump16	Cons	trumptweets	Cons
tuesday	Cons	tulsi	Dem	tytonreddit	PolTalk
ukipparty	UK	ukpolitics	UK	ukright	Far-Right, UK
uncensorednews	News, Ban	unfilter	News, Politic	union	Far-Left, SocialJustice
usanews	News	uspolitics	Politic	venezuelancivilwar	Geop
wakinguppodcast	PolTalk	walkaway	Cons	wexit	Canada
willis7737_news	News	worldevents	Geop, News	worldpolitics	Geop, News
worldtoday	Geop, News	yimby	Politic

2 Effectiveness and advantages of statistical validation

Our approach employs the bipartite configuration model (BiCM; see Methods) to filter out random co-occurrences and correct for degree bias before projection onto the subreddit layer. As detailed in the main text, the validated projection is obtained after applying the Revealed Comparative Advantage (RCA) filter, retaining only links with \(\text{RCA} \geq 1\) [50]. For each projected link, \(p\)-values are computed analytically using a Poisson approximation of the bipartite connection probability [23], [54]. Statistical significance is then assessed using a threshold \(\alpha = 10^{-4}\), with multiple-hypothesis correction through the False Discovery Rate (FDR) procedure [52].
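
As an illustration, the final stage of the validation described above can be sketched in a few lines of pure Python. The sketch assumes that the BiCM link probabilities (`link_probs`) have already been obtained by fitting the model, which is not shown, and it omits the RCA filter. All function names, variable names, and the toy data are hypothetical and are not taken from our pipeline. For very small tail probabilities a library routine such as `scipy.stats.poisson.sf` would be numerically preferable to the simple complement used here.

```python
import math

def poisson_tail(observed, expected):
    """P(X >= observed) for X ~ Poisson(expected), computed as 1 - CDF."""
    if observed <= 0:
        return 1.0
    term = math.exp(-expected)      # P(X = 0)
    cdf = term
    for k in range(1, observed):
        term *= expected / k        # P(X = k) obtained from P(X = k - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)

def validated_links(biadjacency, link_probs, alpha=1e-4):
    """Return the subreddit pairs whose co-occurrence survives FDR correction.

    biadjacency[i][a] is 1 if subreddit i is linked to user a, else 0.
    link_probs[i][a] is the assumed BiCM probability of that link.
    """
    n = len(biadjacency)
    pairs, pvalues = [], []
    for i in range(n):
        for j in range(i + 1, n):
            # Observed and expected number of shared neighbours
            observed = sum(a * b for a, b in zip(biadjacency[i], biadjacency[j]))
            expected = sum(p * q for p, q in zip(link_probs[i], link_probs[j]))
            pairs.append((i, j))
            pvalues.append(poisson_tail(observed, expected))
    # Benjamini-Hochberg: keep the largest rank r with p_(r) <= r * alpha / m
    m = len(pvalues)
    order = sorted(range(m), key=lambda t: pvalues[t])
    n_keep = 0
    for rank, t in enumerate(order, start=1):
        if pvalues[t] <= rank * alpha / m:
            n_keep = rank
    return sorted(pairs[t] for t in order[:n_keep])

# Toy data (made up): subreddits 0 and 1 share all 20 of their users,
# subreddit 2 has 20 different users; every link has probability 0.1.
n_users = 200
biadjacency = [
    [1] * 20 + [0] * 180,
    [1] * 20 + [0] * 180,
    [0] * 20 + [1] * 20 + [0] * 160,
]
link_probs = [[0.1] * n_users for _ in range(3)]
links = validated_links(biadjacency, link_probs)
print(links)
```

In this toy case only the pair sharing far more users than expected is retained; the two pairs with no overlap are discarded.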

Figure 8 illustrates the effect of validation by comparing bipartite degree with projected degree in 2016. Without validation, degree correlations remain near-perfect and the adjacency matrix (ordered by degree) exhibits a nested, uninformative structure. After validation, the trivial correlation is broken, yielding a structure with clear community separation. A visual comparison of the two adjacency matrices and the corresponding partitions is shown in Fig. 9.

For 2016, we examined the variation of modularity \(Q\) and density \(\delta\) under different filtering strategies. As shown in Fig. 10, validated projections obtained with the BiCM display a smooth, monotonic increase in \(Q\) and a corresponding decrease in \(\delta\) as the significance threshold \(\alpha\) becomes more stringent. Importantly, the resulting community structure remains stable and informative even at relatively low values of \(\alpha\).

By contrast, in the unvalidated case (illustrated in Fig. 11 for the user–subreddit projection), the signal is much weaker. Modularity never reaches values comparable to the validated networks, and reducing density requires discarding large portions of the network. Even at the modularity peak (\(C=6.50\)), the resulting structure remains unstable and fails to provide coherent communities.

The community partitions resulting from these procedures are compared in Fig. 12. Validated networks yield stable partitions across thresholds, whereas in the unvalidated case, even at the threshold corresponding to the modularity peak (\(C=6.50\)), the detected clustering differs substantially from that observed at lower cutoffs. For consistency, we matched validated and unvalidated cases at the same network size, using \(\alpha = 10^{-2},10^{-4},10^{-6},10^{-8}\), corresponding to \(C=1.65,2.57,2.80,3.15\).

Finally, to assess partition stability, we generated pairs of projections (validated vs. non-validated) with the same number of subreddits and compared their Louvain partitions using information-theoretic measures [25], [69]–[71]. Given a partition \(C\) with \(K\) clusters, the Shannon entropy is defined as: \[S(C) = - \sum_{\kappa=1}^{K} P(\kappa) \log P(\kappa), \quad \text{where} \quad P(\kappa) = \frac{n_{\kappa}}{n}.\] Here, \(n_{\kappa}\) is the size of cluster \(\kappa\), and \(n\) is the total number of elements.

The mutual information (MI) quantifies the similarity between two partitions \(C\) and \(C'\) of the same data by measuring the shared information between them: \[MI(C , C') = S(C) + S(C') - S(C, C').\]

To normalize MI, we use the Normalized Mutual Information (NMI): \[NMI(C, C') = \frac{2 \, MI(C, C')}{S(C) + S(C')}.\]

A more refined measure, Adjusted Mutual Information (AMI), accounts for expected MI between random partitions: \[AMI(C, C') = \frac{MI(C, C') - \mathbb{E}[MI(C, C')]}{\max[S(C), S(C')] - \mathbb{E}[MI(C, C')]}.\]

Lastly, we computed the Variation of Information (VI) [71], which quantifies the distance between two partitions: \[VI(C, C') = S(C) + S(C') - 2 \, MI(C, C').\] Since \(VI \le \log(n)\), we normalized the score by \(\log(n)\) for comparability across datasets of different sizes.
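
The entropy-based measures defined above can be sketched as follows. This is a minimal pure-Python illustration with hypothetical function names, where each partition is given as a list of cluster labels (one per element, in the same order). AMI is left out because it additionally requires the expected mutual information between random partitions.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy S(C) of a partition given as a list of labels."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())

def mutual_information(a, b):
    """MI(C, C') = S(C) + S(C') - S(C, C'), using the joint labelling."""
    joint = entropy(list(zip(a, b)))
    return entropy(a) + entropy(b) - joint

def nmi(a, b):
    """Normalized mutual information, 2 MI / (S(C) + S(C'))."""
    denom = entropy(a) + entropy(b)
    # Two trivial one-cluster partitions are treated as identical
    return 2 * mutual_information(a, b) / denom if denom > 0 else 1.0

def normalized_vi(a, b):
    """Variation of information divided by log(n)."""
    vi = entropy(a) + entropy(b) - 2 * mutual_information(a, b)
    return vi / math.log(len(a))

# Toy partitions of four elements (made up for illustration)
part_a = [0, 0, 1, 1]
part_b = [1, 1, 0, 0]   # same grouping as part_a, labels swapped
part_c = [0, 1, 0, 1]   # grouping independent of part_a
print(nmi(part_a, part_b), normalized_vi(part_a, part_b))
print(nmi(part_a, part_c), normalized_vi(part_a, part_c))
```

Relabelling clusters leaves all measures unchanged, so the first pair is reported as identical, while the second pair shares no information and reaches the maximum normalized VI for this toy case.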

To ensure comparability across validated and non-validated cases, we grouped in the validated projections all subreddits that did not survive statistical filtering into a single residual cluster. Summarizing, we computed Shannon entropy \(S(C)\), Normalized Mutual Information (NMI), Adjusted Mutual Information (AMI), and the Variation of Information (VI, normalized by \(\log n\)).

Figure 13 shows that validated networks consistently yield high NMI, AMI, and 1-NVI (one minus the normalized Variation of Information) across thresholds, indicating robust and repeatable partitions. In contrast, unvalidated networks produce unstable results, with large fluctuations across thresholds and generally lower similarity between partitions.

Figure 9: Comparison of subreddit networks in 2016, with the same number of nodes. In the unvalidated case, the adjacency matrix (nodes ordered by bipartite degree) exhibits a nested and uninformative structure, and the resulting partition lacks clear modularity. By contrast, the validated network reveals a block structure in the adjacency matrix and yields clearer, more coherent communities when modularity is maximized.

Figure 10: Effect of statistical validation on projected subreddit networks (2016), derived from both user–subreddit and domain–subreddit analyses. As the significance threshold \(\alpha\) becomes more stringent, validated projections display a monotonic increase in modularity \(Q\) and a corresponding decrease in density \(\delta\).

Figure 11: Effect of weight thresholding without validation for the 2016 user–subreddit projection. Modularity \(Q\) and density \(\delta\) fluctuate erratically as the cutoff \(C = \log(\text{co-occurrence weight})\) increases, and even at the peak of modularity (\(C=6.50\)) the resulting partitions remain unstable.

Figure 12: Flowcharts of subreddit communities in 2016. (a) *Unvalidated*: partitions obtained at increasing thresholds \(C\), defined as the logarithm of the co-occurrence weight in the projected network. For the first four cutoffs, the numbers in parentheses indicate the reference significance levels \(\alpha\) used in the validated case to enable like-for-like comparison. The last partition corresponds to the modularity peak at \(C=6.50\). (b) *Validated*: partitions obtained at different significance thresholds \(\alpha\). Validated communities are more stable across thresholds and achieve higher modularity than their unvalidated counterparts.

3 Community detection algorithm comparison

Communities in our user–subreddit and domain–subreddit networks were first detected by maximizing modularity using the Louvain algorithm [36].

Modularity \(Q\) quantifies the quality of a network division into modules, with higher values indicating dense intra‐module connections and sparse inter‐module links. It is defined as \[Q = \frac{1}{2m} \sum_{i j} \bigl(A_{ij} - \tfrac{k_i k_j}{2m}\bigr)\,\delta(C_i, C_j),\] where \(A_{ij}\) is the weight of the edge between nodes \(i\) and \(j\), \(k_i\) and \(k_j\) are their degrees, \(m\) is the total edge weight, and \(\delta(C_i, C_j)=1\) if nodes \(i\) and \(j\) share the same community, 0 otherwise. The Louvain procedure begins by assigning each node to its own community, then iteratively moves nodes to neighboring communities whenever such moves increase \(Q\). Once no single‐node moves can improve modularity, communities are aggregated into “super‐nodes” and the process repeats until no further increase in \(Q\) is possible. As a greedy heuristic, Louvain converges to a local maximum of modularity, which is not guaranteed to be the global one.
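
To make the definition of \(Q\) concrete, the following sketch evaluates the formula above directly on a small toy graph (two triangles joined by a single edge). It only scores a given partition and does not implement the Louvain optimization itself; the function name and the example graph are hypothetical.

```python
def modularity(adj, community):
    """Evaluate Q for a symmetric weight matrix and a list of community labels."""
    n = len(adj)
    degree = [sum(row) for row in adj]   # k_i
    two_m = sum(degree)                  # 2m, twice the total edge weight
    q = 0.0
    for i in range(n):
        for j in range(n):
            if community[i] == community[j]:   # delta(C_i, C_j) = 1
                q += adj[i][j] - degree[i] * degree[j] / two_m
    return q / two_m

# Toy graph: triangles {0, 1, 2} and {3, 4, 5} joined by the edge (2, 3)
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
adj = [[0] * 6 for _ in range(6)]
for i, j in edges:
    adj[i][j] = adj[j][i] = 1

q_split = modularity(adj, [0, 0, 0, 1, 1, 1])   # one community per triangle
q_single = modularity(adj, [0] * 6)             # everything in one community
print(q_split, q_single)
```

Splitting the graph along the bridge gives \(Q = 5/14 \approx 0.357\), whereas placing all nodes in a single community gives \(Q = 0\), as expected from the definition.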

To assess the robustness of these partitions, we compared Louvain results with those obtained from the Infomap algorithm [37], [72], which frames community detection as an information‐theoretic optimization. Infomap simulates random walks on the network under the assumption that walkers spend more time within communities than between them. It employs the “map equation” to measure the theoretical code length \(L(M)\) required to describe a random‐walk trajectory given a community partition: \[L(M) \;=\; q_{\curvearrowright}\,S(\mathcal{Q}) \;+\; \sum_{i} p_{\circlearrowright}^{i}\,S(P^{i}),\] where \(q_{\curvearrowright}\) is the probability of exiting a community, \(S(\mathcal{Q})\) is the entropy of inter‐community transitions, \(p_{\circlearrowright}^{i}\) is the probability of taking steps within community \(i\), and \(S(P^{i})\) is the entropy of intra‐community movements. Infomap iteratively refines the partition to minimize \(L(M)\), with the best partition yielding the shortest code length.

For the 2016 data, we generated networks at multiple significance thresholds and a non-validated baseline using a fixed interaction cutoff (see Methods and SI Section S3). We then applied both Louvain and Infomap to each network and quantified agreement via normalized mutual information (NMI), adjusted mutual information (AMI), and variation of information (VI). As shown in Fig. 14, although both algorithms produce highly similar partitions, statistically validated networks achieve consistently higher NMI and AMI (and lower VI) than the non-validated baseline, demonstrating stronger consistency between methods. These improvements are accompanied by a systematic increase in modularity and a corresponding decrease in code length as the validation threshold becomes more stringent, further indicating that statistical filtering sharpens community structure.

Finally, to provide a complementary comparison, we also employed a stochastic block model (SBM) approach [38], [61], which infers hierarchical block structure by optimizing a minimum-description-length objective over possible partitions. This model accommodates nested communities and automatically selects the number of blocks. Figure 14 also reports the comparison between Louvain and SBM partitions: here too, validated networks yield higher concordance between algorithms, demonstrating the stability of our community assignments across different detection paradigms.

Figure 14: Agreement between community detection algorithms on validated and non-validated networks. The left panel compares Louvain and Infomap partitions, while the right panel compares Louvain and stochastic block model (SBM) partitions. Agreement is quantified using normalized mutual information (NMI), adjusted mutual information (AMI), and variation of information (VI). In both cases, statistically validated networks yield higher NMI, AMI, and 1-NVI (equivalently, lower VI) than the non-validated baseline, demonstrating stronger consistency across detection paradigms.

4 Polarization and user labels in interaction-based partitions

To analyze polarization in more detail, we constructed polarization matrices that capture how users labeled with a given affiliation populate different subreddit communities. Formally, for each community \(c'\) we consider the set of its active users \(W_{c'}\), and for each possible label \(c\) we compute the fraction of \(W_{c'}\) whose main label is \(c\), dividing by the total number of users in \(W_{c'}\) (see Methods). The resulting entry \(P_{c'c}\) therefore represents the proportion of users active in community \(c'\) that are labeled as \(c\), so that each row of the polarization matrix is normalized to 1. Diagonal entries \(P_{cc}\) define the polarization index of community \(c\), quantifying the extent to which its activity is self-focused, with higher values indicating stronger polarization. These matrices provide a general framework to study polarization, which we apply at different levels of aggregation—from tag-based groups to network-detected communities.
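
A minimal sketch of how such a polarization matrix can be assembled is given below. It assumes two hypothetical inputs: `active_users`, mapping each community to the set of users active in it, and `main_label`, mapping each user to a single main label drawn from `labels`. The data are made up for illustration.

```python
from collections import Counter

def polarization_matrix(active_users, main_label, labels):
    """Return P[c_prime][c]: share of users active in c_prime whose label is c."""
    matrix = {}
    for community, users in active_users.items():
        counts = Counter(main_label[u] for u in users)
        total = len(users)
        # Each row is normalized by the number of active users in the community
        matrix[community] = {
            c: (counts[c] / total if total else 0.0) for c in labels
        }
    return matrix

# Toy example (made up): five users, two communities
labels = ["Dem", "Cons"]
main_label = {"u1": "Dem", "u2": "Dem", "u3": "Dem", "u4": "Cons", "u5": "Cons"}
active_users = {
    "Dem": {"u1", "u2", "u3", "u4"},
    "Cons": {"u3", "u4", "u5"},
}
P = polarization_matrix(active_users, main_label, labels)
polarization_index = {c: P[c][c] for c in labels}   # diagonal entries
print(P)
print(polarization_index)
```

In this toy case the diagonal entries are 0.75 for the first community and about 0.67 for the second, and every row sums to 1 as required by the normalization.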

(i) Tag communities.

First, we constructed polarization matrices where rows correspond to topic tags assigned to subreddits and columns to tags propagated onto users (\(N \times N\); see Methods). Users are labeled a priori according to the main tag of the subreddits in which they comment. The resulting matrices are strongly diagonal, indicating clear within-tag alignment. Annual polarization matrices (Fig. 15) and donut charts (Fig. 16) provide direct visualizations of these distributions. Mann–Whitney \(U\) tests confirm the statistical significance of this diagonal structure, showing robust polarization within tag-defined communities (Table 3).

(ii) Network communities.

Second, we applied the Louvain algorithm to the user–subreddit interaction networks, yielding communities of subreddits (\(C \times C\) matrices). In this case, users are labeled by their assigned network community. The corresponding polarization matrices (Fig. 17) and donut charts (Fig. 18) again reveal highly diagonal structures, validating that network partitions capture strong polarization patterns.

(iii) Community–tag cross analysis.

Finally, we examined the internal makeup of network communities by comparing them with user tags, producing mixed \(C \times N\) matrices. These matrices measure how each detected community is populated by users with different propagated tags. Results are summarized in Fig. 19, while normalized Shannon entropies (Table 4; see also Eq. 1) quantify the degree of topical diversity within each community in terms of user-tag composition.

For each community detected in the subreddit network, we computed the entropy of its user-tag distribution. Each user is assigned a tag, and for a given community we consider the empirical distribution \(\{p_i\}\) over tags \(i\). The normalized Shannon entropy is defined as \[\label{eq:entropy} H_{\text{norm}} = - \frac{1}{\log_2 N_{\text{tags}}} \sum_{i=1}^{N_{\text{tags}}} p_i \, \log_2 p_i \,,\tag{1}\] where \(N_{\text{tags}}\) is the total number of distinct tags. By construction, \(H_{\text{norm}} \in [0,1]\): values close to \(0\) indicate communities dominated by a single tag (low diversity), while values close to \(1\) correspond to communities where tags are more evenly distributed (high diversity).
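
Eq. 1 can be computed directly from the list of user tags observed in a community, as in the following sketch. The function name and the toy inputs are hypothetical, and the total number of distinct tags is passed explicitly (it must be larger than 1 for the normalization to be defined).

```python
import math
from collections import Counter

def normalized_entropy(user_tags, n_tags):
    """Normalized Shannon entropy of the tag distribution within one community."""
    n = len(user_tags)
    counts = Counter(user_tags)
    # Tags that do not occur contribute nothing to the sum
    h = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return h / math.log2(n_tags)

# Toy communities (made up), with four possible tags in total
single_tag = ["Canada"] * 10
balanced = ["Dem", "Cons", "News", "Geop"] * 5
skewed = ["Dem"] * 18 + ["Cons", "News"]
h_single = normalized_entropy(single_tag, 4)
h_balanced = normalized_entropy(balanced, 4)
h_skewed = normalized_entropy(skewed, 4)
print(h_single, h_balanced, h_skewed)
```

A community populated by a single tag yields an entropy of 0, a perfectly balanced one yields 1, and intermediate compositions fall in between.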

To track how user labels evolve over time, we generated annual Sankey diagrams (Fig. 20), illustrating flows of users between label categories. These flows reveal a marked rise in Democratic-labeled users from 2015 onward, with both Conservative and Banned groups expanding significantly around the 2016 elections. In particular, we observe strong transitions into Banned: in 2015–2016 about 11.9% of Conservatives and 15.6% of Democrats moved into Banned, while in 2016–2017 the corresponding shares were 21.9% and 3.9%, respectively. Cross-flows between Democrats and Conservatives are also visible (4.5% and 4.9% in 2015–2016; 8.4% and 1.8% in 2016–2017). The baseline group sizes for Democrats and Conservatives, which serve as reference values for these flows, are reported in Table 5. Multi-label assignment can introduce fluctuations: for instance, a large cohort of Democrats appears in 2015 (58.5k equivalent users), and about 24.4% of those moving into Banned later return to Democrats in 2016–2017.

Together with the entropy scores and flow dynamics, these results highlight the heterogeneity and volatility of political communities on Reddit, with polarization intensifying around election cycles and moderation contributing to the emergence of Banned groups.

Table 3: Mann–Whitney \(U\) test results confirming that within-community similarity (diagonal values) is significantly higher than cross-community similarity. Results are reported for both network communities and tag communities.
Year	Communities \(p\)-value	Tags \(p\)-value
2013	\(1.42 \times 10^{-5}\)	\(3.12 \times 10^{-10}\)
2014	\(5.73 \times 10^{-7}\)	\(2.82 \times 10^{-12}\)
2015	\(5.73 \times 10^{-7}\)	\(2.51 \times 10^{-12}\)
2016	\(7.23 \times 10^{-5}\)	\(3.96 \times 10^{-12}\)
2017	\(6.42 \times 10^{-5}\)	\(3.35 \times 10^{-12}\)

Table 4: Entropy of subreddit communities identified in the subreddit networks, computed according to the distribution of user tags populating each community.
Year	Community	Entropy
2013	Geop	0.432
2013	News/SocialJustice/Politic	0.740
2013	Far-Left	0.711
2013	UK	0.043
2013	Cons/Lib/Politic	0.673
2013	SocialJustice/Politic	0.693
2013	Canada	0.071
2014	Geop/News/Politic	0.656
2014	Far-Left	0.700
2014	Lib	0.623
2014	UK/Eur/Politic	0.608
2014	Cons/Gun/Dem/Econ	0.753
2014	SocialJustice	0.569
2014	Ban/Far-Right	0.375
2014	PolTalk	0.031
2014	Canada	0.045
2015	Geop	0.504
2015	News/Politic	0.653
2015	Far-Left	0.778
2015	Econ/Lib/Far-Left/Politic	0.599
2015	Cons/Dem	0.669
2015	SocialJustice/Politic	0.738
2015	Ban/UK/Far-Right	0.350
2015	Canada	0.017
2015	Austr	0.028
2016	News/Geop/Politic	0.846
2016	Far-Left	0.796
2016	Dem/Politic	0.692
2016	Cons	0.706
2016	Ban/UK/SocialJustice/Cons/Far-Right/Eur	0.511
2016	Austr	0.043
2017	Geop	0.548
2017	News/Politic	0.856
2017	Far-Left	0.766
2017	Dem/Politic	0.671
2017	Cons/Lib/Politic	0.808
2017	Ban/SocialJustice/Far-Right/Cons	0.507

Figure 20: Flows of users across tag-defined groups, 2013–2017. (a) Full flowchart including all transitions; (b) restricted view including only flows representing at least 5% of the source community.

Table 5: Weighted number of users in Democratic and Conservative groups across years. These quantities represent the baseline sizes of the two groups that generate the flows in Fig. 20. Values are fractional (non-integers) because multi-label assignment is possible, i.e., the same user can contribute to multiple groups.
Year	2013	2014	2015	2016	2017
Democrats	3 241.0	3 473.0	58 558.5	130 357.0	126 993.5
Conservatives	12 257.8	12 397.5	12 274.0	52 936.0	46 958.5

5 Impact of tag removal on the polarization index

To uncover interdependencies among tag-based communities, we performed a leave-one-tag-out analysis: each tag was removed in turn from the set of possible subreddit labels and we recomputed the polarization index \(\rho\) for the remaining tags following the procedure described in Results. This approach highlights indirect relationships driven by shared user bases and the mutual reinforcement of community coherence.

For each focal tag \(t\) and year \(y\), we quantified sensitivity by sequentially omitting every other tag \(s \neq t\) and measuring the resulting deviations in \(\rho_t\). These deviations were standardized as \[z_{t,y}(s) \;=\; \frac{\rho_t^{(-s)}(y)\;-\;\mu_t(y)}{\sigma_t(y)}\,,\] where \(\rho_t^{(-s)}(y)\) is the polarization of \(t\) after removing \(s\), while \(\mu_t(y)\) and \(\sigma_t(y)\) are the mean and standard deviation across all such exclusions in year \(y\). Equivalently, the color scale reports how many standard deviations each value deviates from the annual across–tag mean for the focal tag.
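
The standardization above can be sketched as follows for one focal tag and one year. The input dictionary of polarization values after each exclusion is made up, the function name is hypothetical, and the use of the population standard deviation (rather than the sample one) is an assumption of this sketch.

```python
from statistics import mean, pstdev

def exclusion_zscores(rho_without):
    """Standardize rho_t^{(-s)}(y) across all excluded tags s.

    rho_without maps each excluded tag s to the polarization of the focal
    tag t recomputed without s, for a fixed year y.
    """
    values = list(rho_without.values())
    mu = mean(values)
    sigma = pstdev(values)   # assumption: population standard deviation
    if sigma == 0:
        # All exclusions give the same polarization: no deviation to report
        return {s: 0.0 for s in rho_without}
    return {s: (v - mu) / sigma for s, v in rho_without.items()}

# Made-up polarization values of one focal tag after removing each other tag
rho_without = {"Ban": 0.50, "Far-Right": 0.60, "News": 0.70}
z = exclusion_zscores(rho_without)
print(z)
```

Negative scores mark exclusions that lower the focal tag's polarization relative to its across-exclusion mean, and positive scores mark exclusions that raise it.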

In the bar plot of Figure 3 (main text), we show polarization values for each tag together with shaded bars indicating the effect of excluding “Banned,” benchmarked against the across–exclusion baseline.

Figure 21 expands this view, reporting heatmaps for five focal tags: rows correspond to the excluded tag \(s\), columns to years \(y\), and colors encode \(z_{t,y}(s)\) in units of \(\sigma\). Positive values indicate that removing \(s\) increases the focal tag’s polarization relative to its annual mean; negative values indicate a decrease. These patterns reveal how other tags impact the focal tag through overlap and coupling of user populations. Consistent with the main text, omitting “Banned” produces a pronounced drop in the polarization of Conservative and Democratic groups, while excluding “Far-Right” substantially reduces the polarization of Banned subreddits, especially in the period before the 2016 election.

6 Textual patterns and shifts in similarity

In this section, we examine how cosine-based textual similarity evolves within communities defined by both topic tags and network-derived interactions. Each subreddit’s textual corpus was aggregated into a single document, combining all posts and comment texts over the observation period. The resulting documents were represented as vector embeddings of terms, built from the vocabulary described in Methods. Pairwise cosine similarity between these vectors provides a measure of linguistic proximity among subreddits.
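
As a minimal illustration of the similarity measure, the sketch below builds term vectors over a fixed vocabulary and computes their cosine similarity. It assumes raw term counts and whitespace tokenization, which may differ from the weighting and preprocessing described in Methods; the vocabulary and documents are made up.

```python
import math
from collections import Counter

def term_vector(document, vocabulary):
    """Count how often each vocabulary term occurs in a document."""
    counts = Counter(document.lower().split())
    return [counts[term] for term in vocabulary]

def cosine_similarity(u, v):
    """Cosine of the angle between two term vectors (0 if either is empty)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0

# Toy vocabulary and aggregated subreddit documents (made up)
vocabulary = ["election", "vote", "tax", "border"]
doc_a = "election vote vote tax"
doc_b = "vote election vote tax"      # same terms as doc_a, different order
doc_c = "border border"               # no terms in common with doc_a
vec_a, vec_b, vec_c = (term_vector(d, vocabulary) for d in (doc_a, doc_b, doc_c))
sim_ab = cosine_similarity(vec_a, vec_b)
sim_ac = cosine_similarity(vec_a, vec_c)
print(sim_ab, sim_ac)
```

Documents with the same term counts have similarity 1 regardless of word order, while documents with no shared vocabulary terms have similarity 0.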

Similarities were then averaged within tag-based groups and within network-based communities, yielding two complementary perspectives on linguistic cohesion. Figures 22 and 23 report the annual cosine-similarity matrices for 2013–2017, showing pairwise similarities among tag-based and interaction-based communities, respectively. In both cases, diagonal entries (within-community similarity) are systematically higher than off-diagonal values, and their growth over time indicates an overall increase in textual cohesion within groups. At the same time, cross-community distances also tend to decrease, pointing to a progressive convergence in language use across different subreddit clusters.

To assess the robustness of this signal, we conducted Mann–Whitney \(U\) tests comparing each diagonal entry with the distribution of off-diagonal values. The results, summarized in Table 6, confirm that within-community similarity is significantly stronger than cross-community similarity.

We then examined temporal shifts in textual similarity while controlling for the increasing number of subreddits over time. Restricting the analysis to the subset of subreddits active from 2014 through 2016, we observed a significant average increase in cosine similarity for the Far-Left, Democratic, News, and Conservative communities. Finally, we validated these distributional changes using two-sample Kolmogorov–Smirnov tests [57], directly comparing each year with its predecessor and contrasting 2014 with 2016. The resulting distributions, illustrated in Fig. 24, confirm a systematic upward shift in textual cohesion over time.
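
The statistic underlying the two-sample Kolmogorov–Smirnov test is the largest gap between the two empirical cumulative distribution functions, which can be sketched as below. The function name and the sample values are hypothetical, and only the statistic is computed; in practice a library routine such as `scipy.stats.ks_2samp` also returns the corresponding \(p\)-value.

```python
def ks_statistic(x, y):
    """Maximum distance between the empirical CDFs of two samples."""
    xs, ys = sorted(x), sorted(y)
    i = j = 0
    d = 0.0
    while i < len(xs) and j < len(ys):
        v = min(xs[i], ys[j])
        # Advance both pointers past all observations equal to v
        while i < len(xs) and xs[i] <= v:
            i += 1
        while j < len(ys) and ys[j] <= v:
            j += 1
        # Empirical CDFs evaluated at v
        d = max(d, abs(i / len(xs) - j / len(ys)))
    return d

# Made-up cosine-similarity samples for two years
sims_2014 = [0.10, 0.15, 0.20, 0.25]
sims_2016 = [0.30, 0.35, 0.40, 0.45]
d_shift = ks_statistic(sims_2014, sims_2016)
d_same = ks_statistic(sims_2014, sims_2014)
print(d_shift, d_same)
```

Completely separated samples give the maximum value of 1, while identical samples give 0.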

Beyond this general rise in linguistic cohesion, qualitative inspection of the cosine-similarity matrices reveals meaningful cross-group patterns. Before the 2016 elections, Banned subreddits display linguistic proximity to both Far-Right and Social Justice communities, reflecting mixed ideological influences. During the election year, however, the strongest convergence occurs between Democratic, Conservative, and Banned communities, indicating a temporary alignment of discourse around shared political narratives. This convergence is more evident in textual similarity than in interaction-based networks, where these groups remain structurally separated. Together, these results provide additional evidence for a gradual softening of linguistic boundaries, despite the ideological divisions observed in the main text.

Table 6: Mann–Whitney \(U\) test \(p\)-values for diagonal similarity significance, reported by year. Since only one set of tests was performed, the same values apply to both community- and tag-based comparisons.
Year	Communities \(p\)-value	Tags \(p\)-value
2013	\(2.53 \times 10^{-5}\)	\(6.99 \times 10^{-7}\)
2014	\(2.33 \times 10^{-5}\)	\(4.30 \times 10^{-10}\)
2015	\(9.75 \times 10^{-6}\)	\(8.68 \times 10^{-11}\)
2016	\(3.50 \times 10^{-4}\)	\(1.53 \times 10^{-10}\)
2017	\(3.11 \times 10^{-3}\)	\(9.52 \times 10^{-10}\)

Figure 24: Joyplots showing distributional shifts in cosine similarity across years, restricted to subreddits active throughout 2014–2016. Two-sample Kolmogorov–Smirnov tests confirm significant upward shifts in textual similarity, consistent with results from Fig. 5 (main text).

Table 7: Significance of pairwise comparisons across years for each community group. \(p\)-values are truncated to two decimals. Significance thresholds: \(0.05\), \(0.01\), \(0.001\).
Tag	KS \(p\)-value 2014 \(\rightarrow\) 2015	KS \(p\)-value 2015 \(\rightarrow\) 2016	KS \(p\)-value 2014 \(\rightarrow\) 2016
Far-Left	(0.0013) ++	(\(<10^{-14}\)) +++	(\(<10^{-28}\)) ***
Dem	(0.51)	(0.21)	(0.018) *
News	(0.09)	(0.00015) +++	(0.045) *
Cons	(0.041) +	(0.0081) ++	(0.0000042) ***
Far-Right	(1.00)	(0.18)	(0.18)

In this section, we examine the subreddit–domain bipartite networks to study how political orientations propagate through news sharing and how domains contribute to the overall polarization structure. Community tags assigned to subreddits are propagated to the domains they share, allowing us to characterize the political composition of the information ecosystem.

Figure 25 (e) illustrates the distribution of subreddit tags across domains in 2017 through donut charts, highlighting how news outlets reflect the partisan alignment of the communities that circulate them. The overall patterns broadly mirror those observed in the interaction-based analysis: Far-Right participation within Banned communities is visible before the elections but then declines, while Banned participation in Conservative domains grows during the electoral period, though less sharply than in the interaction analysis.

Based on these mappings, we recalculated the polarization index (see Methods in the main text) for selected domain categories. As shown in Figure 26, polarization levels are generally lower than in the interaction-based analysis. Far-Right subreddits are a notable exception: their polarization declines markedly over time when measured through shared domains. Banned subreddits, by contrast, show a more moderate increase than the stronger polarization observed in the interaction-based analysis.

Table 8 complements these results by reporting, for each year, the top 30 news domains most frequently shared by subreddits labeled as Far-Left, Democratic, Conservative, Far-Right, or Banned. Some domains clearly align with the propagated labels, reflecting the expected partisan orientation of the communities.

A few widely shared platforms, such as youtube.com or reddit.com, are instead labeled as Far-Right after normalization. This may occur because these mainstream domains are broadly present across multiple large communities, but their relative weight can appear amplified within smaller groups (particularly Far-Right subreddits), once normalization is applied during the label propagation process (see Methods). This does not substantially distort the partisan composition of Far-Right domains, which remains consistent with the patterns observed in the interaction-based analysis. This overview provides a concise longitudinal perspective on domain-level sharing behavior across 2013–2017.
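The amplification effect described above can be reproduced with a toy example. The sketch assumes that each tag's share counts are divided by that tag's total share volume before the dominant tag is assigned to a domain; the actual normalization is the one given in Methods and may differ. The domain names, counts, and the helper `propagate_labels` are invented for illustration.

```python
from collections import defaultdict

def propagate_labels(shares):
    """Assign each domain the tag with the largest normalized share.

    shares: iterable of (tag, domain, count) triples.
    Assumption: counts are normalized by the total volume of each tag.
    """
    totals = defaultdict(float)
    per_domain = defaultdict(dict)
    for tag, domain, count in shares:
        totals[tag] += count
        per_domain[domain][tag] = per_domain[domain].get(tag, 0.0) + count
    labels = {}
    for domain, by_tag in per_domain.items():
        weights = {t: c / totals[t] for t, c in by_tag.items()}
        labels[domain] = max(weights, key=weights.get)
    return labels

# Invented counts: a large group and a small group both share one
# mainstream domain.
shares = [
    ("Dem", "mainstream.example", 900),
    ("Dem", "left.example", 99100),
    ("Far-Right", "mainstream.example", 50),
    ("Far-Right", "right.example", 450),
]
labels = propagate_labels(shares)
```

Here the mainstream domain receives 900 shares from the large group and only 50 from the small one, yet it accounts for 10% of the small group's volume against 0.9% of the large group's, so it is labeled Far-Right after normalization.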

Figure 26: Polarization indices of news domains from 2013 to 2017. Each panel reports the diagonal elements of the polarization matrix (see Methods), capturing the extent to which domains inherit partisan alignment from the subreddits that share them.

Table 8: Top 30 news domains shared across subreddit communities from 2013 to 2017. Labels are assigned through label propagation from subreddits to domains. The table reports the most frequently shared domains associated with Far-Left, Democratic, Conservative, Far-Right, and Banned communities.
Far-Left	Democrats	Conservatives	Far-Right	Banned
2013
libcom.org	boldprogressives.org	hotair.com	youtube.com	revolutionarycommunist.org
anarchistnews.org	obamacare.healthinsuranceexchangeenvoy.com	frugal-cafe.com	reddit.com	bluevirginia.us
news.infoshop.org	correntewire.com	redstate.com	nytimes.com	atheism.about.com
325.nostate.net	au.org	newser.com	i.imgur.com	news.outlookindia.com
climateandcapitalism.com	genelalor.com	unitedliberty.org	huffingtonpost.com	themuslimissue.wordpress.com
anarkismo.net	egbertowillies.com	fireandreamitchell.com	washingtonpost.com	mobile.bbc.co.uk
crimethinc.com	thepoliticalpragmatic.blogspot.com	cagle.com	imgur.com	thebiglead.com
therealmovement.wordpress.com	wapo.st	lifenews.com	bbc.co.uk	everydayfeminism.com
indybay.org	fdlaction.firedoglake.com	powerlineblog.com	reuters.com	peopleslawoffice.com
spiritofcontradiction.eu	pensitoreview.com	cleveland.com	cnn.com	tealeafnation.com
anti-imperialism.com	politi.co	blogs.rollcall.com	guardian.co.uk	wlrn.org
readingisforsnobs.com	cir.ca	volokh.com	news.yahoo.com	kleinonline.wnd.com
fightbacknews.org	imageshack.us	rollcall.com	thinkprogress.org	careandwashingofthebrain.blogspot.com
rosswolfe.wordpress.com	poy.time.com	randpaulreview.com	rawstory.com	patriotaction.net
labornotes.org	truthernews.wordpress.com	fivethirtyeight.blogs.nytimes.com	youtu.be	freedomportal.net
marxist.com	wegoted.com	yidwithlid.blogspot.com	theguardian.com	ozconservative.blogspot.com
gonzotimes.com	4wheeledlefty.com	jammiewf.com	alternet.org	rinocracy.com
kasamaproject.org	boompopmedia.com	foxbusiness.com	politico.com	anepigone.blogspot.com
signalfire.org	act.boldprogressives.org	rare.us	salon.com	k0nsl.org
voluntaryvirtues.com	allthingsdemocrat.com	publicpolicypolling.com	rt.com	sunnyisright.com
wearemany.org	bobcesca.thedailybanter.com	people-press.org	foxnews.com	yadadarcyyada.wordpress.com
propagandalalaland.blogspot.com	elections.firedoglake.com	us.cnn.com	cbc.ca	alchemyoftheword.net
bqbrew.com	postimg.org	politicalwire.com	usatoday.com	asstr.org
libertarian-labyrinth.blogspot.com	ezkool.com	newslineusa.com	bloomberg.com	crisisrepublic.com
ashevillefm.org	images.politico.com	fox19.com	npr.org	streetcarnage.com
theredplebeian.wordpress.com	act.credoaction.com	mainwashed.com	abcnews.go.com	coastalcoed.wordpress.com
moufawad-paul.blogspot.com	illuminate.newsvine.com	youngcons.com	breitbart.com	gettingworse.co.uk
litostpublishing.org	nashvillescene.com	whitehousedossier.com	telegraph.co.uk	nationstates.net
prisoncensorship.info	eaglerising.com	variety.com	latimes.com	returntothepit.com
democracyatwork.info	theobamadiary.com	freedomworks.org	dailymail.co.uk	buzzle.com
2014
marxists.org	pensitoreview.com	msnbc.com	youtube.com	robhoey127.blogspot.ca
anarchistnews.org	snopes.com	hotair.com	nytimes.com	mrc.org
labornotes.org	ezkool.com	mediaite.com	reddit.com	jsmstateofmind.com
bayareaintifada.wordpress.com	killingthebreeze.com	newsbusters.org	bbc.co.uk	onourselvesandothers.com
pslweb.org	colbertnation.com	politicalticker.blogs.cnn.com	washingtonpost.com	grizzom.blogspot.com.br
ashevillefm.org	aquidneckinquirer.typepad.com	newsmax.com	reuters.com	morningstarnews.org
submedia.tv	messagepresident.com	powerlineblog.com	en.itar-tass.com	premiumtimesng.com
sooperarticles.com	greencarreports.com	tampabay.com	i.imgur.com	contextflorida.com
325.nostate.net	acasignups.net	therightscoop.com	huffingtonpost.com	dnj.com
hq-law.com	pharmpro.com	ace.mu.nu	upi.com	peoplebranch.wordpress.com
socialistworld.net	barackobama.com	twinsopinion.com	npr.org	conservativeinfidel.com
en.internationalism.org	globalnewsnetwork.us	timesdispatch.com	rt.com	kwiksurveys.com
apsense.com	mtcowgirl.com	appalachianareanews.com	online.wsj.com	modiforpm.org
climateandcapitalism.com	postcrescent.com	blog.heritage.org	cnn.com	themedicalbag.com
thepiratebay.se	progresscentral.x10.mx	reviewjournal.com	imgur.com	thenewage.co.za
dwardmac.pitzer.edu	theconservativepundit.net	finance.townhall.com	ibtimes.com	androidhippo.com
blog.designs.codes	allthingsdemocrat.com	m.townhall.com	aljazeera.com	m.france24.com
thetechcult.com	datab.us	unitedliberty.org	np.reddit.com	news.siteintelgroup.com
litostpublishing.org	palmerreport.com	host.madison.com	news.yahoo.com	static3.businessinsider.com
detroitinquiry.org	codebluepolitics.com	conventionofstates.com	breitbart.com	atlantamuslim.com
en.contrainfo.espiv.net	enewspf.com	video.foxnews.com	bbc.com	montgomerynews.com
ic.pics.livejournal.com	greencarcongress.com	minx.cc	youtu.be	newsweekpakistan.com
isreview.org	healthinsurance.org	media.townhall.com	foxnews.com	noliesradio.org
kasamaproject.org	music.yahoo.com	libertymindsbreakfree.com	twitter.com	pushbacknow.net
fightimperialism.org	blog.pfaw.org	nbcnewyork.com	salon.com	vho.org
rdwolff.com	consumerreports.org	m.washingtonexaminer.com	thinkprogress.org	culturewars.com
thenorthstar.info	globegazette.com	jammiewf.com	telegraph.co.uk	oi61.tinypic.com
signalfire.org	jaybookman.blog.ajc.com	buzzpo.com	dailymail.co.uk	pi-news.net
avtonom.org	politicaloutcast.com	stevengoddard.wordpress.com	hosted.ap.org	ushmm.org
revolutionarycommunist.org	softballpolitics.com	joannenova.com.au	theatlantic.com	winstonsmithministryoftruth.blogspot.co.uk
2015
libcom.org	ezkool.com	cnn.com	youtube.com	imgur.com
marxists.org	samuel-warde.com	msnbc.com	bbc.co.uk	israelandstuff.com
earthfirstjournal.org	liberalvaluesblog.com	hotair.com	i.imgur.com	truthrevolt.org
theanarchistlibrary.org	climatecrocks.com	politifact.com	twitter.com	elections.huffingtonpost.com
fireworksbayarea.com	politi.co	realclearpolitics.com	np.reddit.com	dcwhispers.com
crimethinc.com	nationalmemo.com	usnews.com	breitbart.com	trunews.com
workers.org	vtdigger.org	yahoo.com	rt.com	forum.codoh.com
new-compass.net	eoinhiggins.blogspot.com	newsbusters.org	thehill.com	ulstermanbooks.com
revolutionaryds.wordpress.com	i2.cdn.turner.com	redstate.com	news.yahoo.com	mrconservative.com
iww.org	democraticunderground.com	desmoinesregister.com	independent.co.uk	en.shafaqna.com
en.squat.net	reachoutjobsearch.com	twinsopinion.com	zerohedge.com	theconservativepundit.net
havanatimes.org	dandygoat.com	imgflip.com	vox.com	electoral-vote.com
socialistworld.net	monmouth.edu	onpolitics.usatoday.com	thinkprogress.org	gospelherald.com
blackrosefed.org	store.berniesanders.com	bostonherald.com	foxnews.com	theendofzion.com
anti-imperialism.com	bernie2016events.org	lifenews.com	hosted.ap.org	langerresearch.com
thenib.com	electberniesanders2016.blogspot.com.br	publicpolicypolling.com	washingtontimes.com	desertpeace.files.wordpress.com
inter.kke.gr	n.pr	video.foxnews.com	vdare.com	indiafacts.co.in
ashevillefm.org	ontheissues.org	conservativereview.com	abcnews.go.com	iop.harvard.edu
leftvoice.org	2016.democracyforamerica.com	thefiscaltimes.com	slate.com	profootballtalk.nbcsports.com
indybay.org	pac.petitions.moveon.org	spectator.org	nationalreview.com	en.metapedia.org
blog.designs.codes	politickernj.com	media.townhall.com	usatoday.com	nbc.com
mer-rsm.ca	wikipedia-sucks-badly.blogspot.com	hughhewitt.com	nbcnews.com	writing.wikinut.com
fightimperialism.org	act.credoaction.com	wmur.com	bigstory.ap.org	talkoakland.org
isreview.org	gravismarketing.com	republicandojo.com	dailycaller.com	asadmahmudexposingislam.blogspot.ie
thenorthstar.info	kspr.com	politicsandfinance.blogspot.com	reason.com	fundrazr.com
pflp.ps	pplswar.wordpress.com	quinnipiac.edu	forbes.com	projectveritas.com
insurrectionnewsworldwide.wordpress.com	suffolk.edu	laprogressive.com	arstechnica.com	dailynewsservice.co.uk
ocap.ca	bet.com	us.reddit.com	washingtonexaminer.com	humansofjudaism.com
chuangcn.org	declass3.com	rushlimbaugh.com	americanthinker.com	vibe.com
existentialcomics.com	nationalnursesunited.org	steynonline.com	buzzfeed.com	aish.com
2016
jornada.unam.mx	thehill.com	tampabay.com	youtube.com	video.foxnews.com
libcom.org	commondreams.org	themoralofthestory.us	reddit.com	dailywire.com
marxists.org	inquisitr.com	therightscoop.com	i.sli.mg	westernjournalism.com
theanarchistlibrary.org	alternet.org	thepoliticalinsider.com	i.redd.it	redflagnews.com
rodong.rep.kp	democracynow.org	dcwhispers.com	independent.co.uk	ibankcoin.com
submedia.tv	usuncut.com	m.washingtontimes.com	i.imgur.com	refugeeresettlementwatch.wordpress.com
thenorthstar.info	truthdig.com	theresurgent.com	youtu.be	conservativedailypost.com
anarchistnews.org	celebritybabies.people.com	270towin.com	cnn.com	thetruthdivision.com
cpp.ph	elespectador.com	spectator.org	sli.mg	thedailysheeple.com
crimethinc.com	berniesanders.com	rushlimbaugh.com	foxnews.com	christianpost.com
communismgr.blogspot.gr	us.blastingnews.com	newsninja2012.com	np.reddit.com	savemysweden.com
youcaring.com	go.berniesanders.com	endingthefed.com	sputniknews.com	nowtheendbegins.com
earthfirstjournal.org	stylenews.people.com	youngcons.com	usatoday.com	thecommonsenseshow.com
insurrectionnewsworldwide.com	rightwingwatch.org	trumpimg.com	nbcnews.com	trump-conservative.com
links.org.au	rollcall.com	weaselzippers.us	archive.is	milo.yiannopoulos.net
existentialcomics.com	patheos.com	americafans.com	cbsnews.com	rense.com
en.granma.cu	m.dailykos.com	legitgov.org	nypost.com	christiantoday.com
social-ecology.org	scribd.com	nationalmemo.com	zerohedge.com	superstation95.com
cdn.thedailybeast.com	elections.huffingtonpost.com	americasfreedomfighters.com	bbc.com	articles.latimes.com
xenagoguevicene.com	pastemagazine.com	ijreview.com	dailycaller.com	newswithviews.com
bunkermag.org	reverbpress.com	natmonitor.com	rt.com	joeforamerica.com
cpusa.org	trofire.com	downstreampolitics.com	latimes.com	tammybruce.com
new-compass.net	greatideas.people.com	politistick.com	wikileaks.org	barstoolsports.com
ndfp.org	dailynewsbin.com	rightwingnews.com	hosted.ap.org	jookos.com
anarchism.pageabode.com	bluenationreview.com	fox6now.com	wsj.com	discoverthenetworks.org
chomsky.info	secure.actblue.com	wapo.st	yahoo.com	theburningplatform.com
prisoncensorship.info	sfchronicle.com	conservativeread.com	thegatewaypundit.com	americanfreepress.net
s1.webmshare.com	caucus99percent.com	gop.com	thedailybeast.com	beltwaytimes.com
anti-imperialism.com	opednews.com	100percentfedup.com	espn.com	supload.com
blackrosefed.org	cc.com	ace.mu.nu	washingtonexaminer.com	openmedianews.com
2017
libcom.org	fivethirtyeight.com	patriotpost.us	i.redd.it	theguardian.com
earthfirstjournal.org	mediaite.com	therightscoop.com	reddit.com	nytimes.com
itsgoingdown.org	politicususa.com	mynewsguru.com	youtube.com	foxnews.com
anti-imperialism.org	aol.com	theblacksphere.net	twitter.com	imgur.com
theanarchistlibrary.org	medpagetoday.com	conservativestories.com	miamiherald.com	washingtontimes.com
muraselon.com	thinkadvisor.com	ibleedredwhiteblue.com	reuters.com	nypost.com
crimethinc.com	justice.gov	pacificpundit.com	abcnews.go.com	breitbart.com
antifascistnews.net	esquire.com	arcamax.com	independent.co.uk	npr.org
pics.me.me	c-span.org	politichicks.com	cnn.com	i.reddituploads.com
socialistalternative.org	mic.com	theodysseyonline.com	i.imgur.com	i.magaimg.net
mronline.org	gq.com	trump-conservative.com	youtu.be	oann.com
wsm.ie	mediamatters.org	comicincorrect.wpengine.netdna-cdn.com	forbes.com	bangkokpost.com
internationalist.org	civilbeat.org	trumpmovements.com	nbcnews.com	rt.com
viewpointmag.com	billmoyers.com	toledoblade.com	hosted.ap.org	dailymail.co.uk
78.media.tumblr.com	georgiapol.com	lifeandabout.com	philly.com	israelnationalnews.com
philippinerevolution.info	amp.cnn.com	chicksontheright.com	telegraph.co.uk	iol.co.za
unicornriot.ninja	epa.gov	stream.org	washingtonexaminer.com	timesofisrael.com
dsausa.org	socialistworker.org	prageru.com	smh.com.au	thegatewaypundit.com
linksunten.indymedia.org	people.com	westernfreepress.com	archive.is	wnd.com
anarchistnews.org	sierraclub.org	patriotretort.com	dailycaller.com	newsweek.com
kuow.org	dailydot.com	thedailynewscycle.com	japantimes.co.jp	dailywire.com
monthlyreview.org	biologicaldiversity.org	hollywoodintoto.com	bloomberg.com	time.com
raddit.me	palmerreport.com	freshmedianews.com	sciencedaily.com	google.com
leftcom.org	bustle.com	thehayride.com	yahoo.com	lifezette.com
borderedbysilence.noblogs.org	extranewsfeed.com	lidblog.com	nationalreview.com	infowars.com
redguardsaustin.wordpress.com	thenevadaindependent.com	foxsports.com	facebook.com	nydailynews.com
redspark.nu	us.blastingnews.com	usainfonow.com	zerohedge.com	i.sli.mg
pics.mcclatchyinteractive.com	climatecentral.org	donsurber.blogspot.com	apnews.com	freebeacon.com
existentialcomics.com	wnyc.org	energy.gov	wsj.com	insider.foxnews.com
countercurrents.org	washingtonmonthly.com	floppingaces.net	pbs.twimg.com	vdare.com

8 Communities and subcommunities in subreddit networks

In this section, we provide an overview of the validated subreddit networks, starting with basic network statistics. Tables 9 and 10 report the structural properties of the user-based and domain-based subreddit networks (2013–2017), including their bipartite origins and the corresponding validated projections. In both cases, validation leads to higher modularity and lower density than the non-validated networks, consistently sharpening community structure across years (see also SI2).
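As a reading aid for the density columns, the sketch below contrasts two common conventions: the density of a general undirected graph over all nodes, \(2L / (N(N-1))\), and the bipartite density \(L / (N_{\text{top}} N_{\text{bottom}})\). The 2013 and 2014 bipartite rows of Table 9 appear consistent with the first convention; this is an observation from the reported numbers, not a statement taken from the Methods. The function names are hypothetical.

```python
def graph_density(n_nodes, n_links):
    """Density of an undirected simple graph: 2L / (N (N - 1))."""
    return 2.0 * n_links / (n_nodes * (n_nodes - 1))

def bipartite_density(n_top, n_bottom, n_links):
    """Density relative to the maximum number of bipartite links."""
    return n_links / (n_top * n_bottom)

# 2013 row of Table 9: 214 subreddits, 387 691 users, 562 139 links.
d_all = graph_density(214 + 387691, 562139)
d_bip = bipartite_density(214, 387691, 562139)
print(f"{d_all:.2e}")  # general-graph convention
print(f"{d_bip:.2e}")  # bipartite convention
```

The two conventions differ by roughly three orders of magnitude here, so the one in use matters when comparing bipartite densities with those of the projections.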

To further characterize the identified communities, Figures 27 and 28 display radar plots summarizing their internal composition. Each polygon corresponds to one community, with axes representing topical categories. These visualizations highlight differences in purity across communities: while some clusters are dominated by a single topical label—such as the Far-Left community in the user-based networks—others exhibit a more heterogeneous composition, for instance the mixed clusters combining Democratic and Conservative subreddits together with Gun-rights or general Politics forums. The radar plots derived from user-based and domain-based networks show broadly similar patterns, reinforcing the robustness of the detected community structures.

Looking at specific cases, in the early years Democratic and Conservative subreddits often appear within the same community, although Democrats are sometimes closer to News-oriented forums, while Conservatives co-occur more frequently with subreddits on guns, libertarianism, or economics—a pattern especially clear in 2015. Starting from 2016 in the user-based networks, partisan blocs are more sharply defined: one community groups Democrats and several Conservative subreddits, including candidate-specific forums such as r/KasichForPresident and libertarian pages, while another cluster brings together Banned forums and Trump-related subreddits, most notably r/The_Donald, alongside extremist spaces such as r/NationalSocialism, r/WhiteRights, and European nationalist forums like r/Le_Pen. Among such cases, an illustrative hybrid is r/RepublicansForSanders, which consistently clusters with Democratic communities despite its partisan label, highlighting clear overlaps in user activity across partisan boundaries.

In the domain-based networks, Democrats and Conservatives remain within the same large community during the electoral years, albeit positioned at opposite ends of it; they eventually split into distinct clusters in 2017. Several subreddits, such as r/prolife, r/AltRightChristian, or r/climateskeptics, appear within Conservative or Banned communities across both user- and domain-based networks, underlining their bridging role between mainstream conservative and more extremist discourse.

Following the approach outlined in the Results section, we applied the Louvain algorithm a second time to the induced subgraphs of each primary community in order to uncover fine-grained subcommunities for the echo-chamber analysis (see Results and SI8). Figure 39 highlights two representative cases from the user-based networks in 2016 (density = 0.139, average modularity = \(0.44622 \pm 1.9 \times 10^{-4}\)) and 2017 (density = 0.180, average modularity = \(0.35132 \pm 1.2 \times 10^{-4}\)), both characterized by high modularity and strong internal cohesion. These partitions reveal distinct Conservative/Banned and Far-Right/Banned components, as well as SocialJustice-related clusters. Notably, in both years the forum r/SargonOfAkkad emerges as a bridging node, consistently linking otherwise distant communities. These higher-resolution partitions illustrate the presence of well-defined substructures nested within broader partisan clusters.
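The two-pass procedure can be sketched with NetworkX as follows. This is an illustrative sketch under stated assumptions, not the code used for Figure 39: it relies on `louvain_communities` and `modularity` from `networkx.algorithms.community` (NetworkX 2.8 or later), uses a synthetic caveman graph in place of a validated projection, and the helper name `nested_louvain` is hypothetical.

```python
import networkx as nx
from networkx.algorithms.community import louvain_communities, modularity

def nested_louvain(G, seed=0):
    """Run Louvain on G, then again on each induced community subgraph.

    Returns a list of (primary_nodes, subcommunities, sub_modularity).
    """
    primary = louvain_communities(G, weight="weight", seed=seed)
    nested = []
    for nodes in primary:
        sub = G.subgraph(nodes).copy()
        if sub.number_of_edges() == 0:
            # Modularity is undefined without edges; keep the block whole.
            nested.append((set(nodes), [set(nodes)], 0.0))
            continue
        parts = louvain_communities(sub, weight="weight", seed=seed)
        q = modularity(sub, parts, weight="weight")
        nested.append((set(nodes), parts, q))
    return nested

# Synthetic stand-in for a validated projection: 4 cliques of 5 nodes.
G = nx.connected_caveman_graph(4, 5)
result = nested_louvain(G)
```

Because Louvain is stochastic, averaging the modularity over several seeds would be one way to obtain a mean and spread comparable to the values quoted above; whether that matches the original procedure is an assumption.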

Table 9: Network statistics for the bipartite subreddit–user networks and their subreddit projections (2013–2017).
	Bipartite (Subreddit–User)				Projection (unvalidated)			Projection (validated)
Year	Subr. nodes	User nodes	Links	Density	Nodes	Density	Avg. Modularity	Nodes	Density	Avg. Modularity
2013	214	387 691	562 139	7.47e-06	214	0.549	0.048950 (\(\pm\)9.2e-05)	167	0.101	0.37234 (\(\pm\)2.5e-04)
2014	246	308 382	480 322	1.01e-05	246	0.574	0.048520 (\(\pm\)7.0e-05)	203	0.0806	0.43829 (\(\pm\)2.7e-04)
2015	296	424 172	673 957	7.48e-06	296	0.681	0.043548 (\(\pm\)3.2e-05)	246	0.0780	0.45288 (\(\pm\)1.6e-04)
2016	413	888 444	1 941 907	4.92e-06	413	0.728	0.037757 (\(\pm\)3.9e-05)	363	0.0868	0.42752 (\(\pm\)3.7e-04)
2017	481	1 027 388	2 129 530	4.03e-06	479	0.696	0.043539 (\(\pm\)3.1e-05)	417	0.0788	0.41670 (\(\pm\)3.1e-04)

Table 10: Network statistics for the bipartite subreddit–domain networks and their subreddit projections (2013–2017).
	Bipartite (Subreddit–Domain)				Projection (unvalidated)			Projection (validated)
Year	Subr. nodes	Domain nodes	Links	Density	Nodes	Density	Avg. Modularity	Nodes	Density	Avg. Modularity
2013	216	68 809	122 284	5.13e-05	216	0.749	0.045021 (\(\pm\)7.4e-05)	170	0.0731	0.41534 (\(\pm\)5.2e-04)
2014	248	45 594	112 438	1.07e-04	248	0.738	0.051909 (\(\pm\)1.5e-05)	197	0.0711	0.47419 (\(\pm\)3.5e-04)
2015	284	45 443	128 874	1.23e-04	284	0.827	0.034889 (\(\pm\)2.9e-05)	229	0.0868	0.36036 (\(\pm\)3.2e-04)
2016	399	67 837	200 068	8.59e-05	399	0.786	0.041765 (\(\pm\)1.7e-05)	342	0.0806	0.35139 (\(\pm\)2.5e-04)
2017	461	65 877	200 643	9.12e-05	461	0.829	0.027294 (\(\pm\)5.8e-05)	350	0.0656	0.35983 (\(\pm\)2.8e-04)

Figure 27: Radar plots of validated community structures from user-based subreddit networks (2013–2017). Each polygon corresponds to a community, with axes representing topical categories. The plots illustrate the degree of topical purity or heterogeneity across communities.

Figure 28: Radar plots of validated community structures from domain-based subreddit networks (2013–2017). Each polygon corresponds to a community, with axes representing topical categories. Patterns broadly mirror those observed in user-based networks, confirming the robustness of the detected structures.

Figure 29: Validated community structure in the user-based subreddit network, 2013.

Figure 30: Validated community structure in the user-based subreddit network, 2014.

Figure 31: Validated community structure in the user-based subreddit network, 2015.

Figure 32: Validated community structure in the user-based subreddit network, 2016.

Figure 33: Validated community structure in the user-based subreddit network, 2017.

Figure 34: Validated community structure in the domain-based subreddit network, 2013.

Figure 35: Validated community structure in the domain-based subreddit network, 2014.

Figure 36: Validated community structure in the domain-based subreddit network, 2015.

Figure 37: Validated community structure in the domain-based subreddit network, 2016.

Figure 38: Validated community structure in the domain-based subreddit network, 2017.

9 Echo-chamber structures and statistical patterns

As detailed in the Results, we evaluated the alignment between subreddit communities identified in interaction-based and news-sharing networks by constructing overlap matrices, where each entry records the number of subreddits shared between a pair of partitions. Figures 40–44 display chord diagrams of community-overlap networks. Flows are proportional to the number of subreddits shared between partitions; source and target tiles are colored according to their communities, and each flow is shaded with the average of the two. Most flows closely match their endpoints, indicating that overlaps predominantly occur between communities with similar topical or partisan orientations.
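A minimal sketch of the overlap-matrix construction is shown below. It assumes each partition is stored as a mapping from subreddit to community identifier and counts only subreddits present in both partitions; the subreddit names, community identifiers, and the helper `overlap_matrix` are hypothetical.

```python
import numpy as np

def overlap_matrix(part_a, part_b):
    """Count subreddits shared by each pair of communities.

    part_a, part_b: dict mapping subreddit -> community id.
    Rows index communities of part_a, columns those of part_b.
    """
    rows = sorted(set(part_a.values()))
    cols = sorted(set(part_b.values()))
    M = np.zeros((len(rows), len(cols)), dtype=int)
    for sub in set(part_a) & set(part_b):
        M[rows.index(part_a[sub]), cols.index(part_b[sub])] += 1
    return rows, cols, M

# Toy partitions (invented): interaction-based vs. news-sharing communities.
interaction = {"r/a": "U1", "r/b": "U1", "r/c": "U2", "r/d": "U2", "r/e": "U2"}
news = {"r/a": "D1", "r/b": "D1", "r/c": "D1", "r/d": "D2", "r/f": "D2"}
rows, cols, M = overlap_matrix(interaction, news)
```

Each entry of `M` corresponds to the width of one flow in a chord diagram of this kind.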

To further investigate echo-chamber dynamics, we weighted community matches by the number of users active in both partitions, producing the edge-contribution (EC) matrices. These matrices quantify how much each pair of communities contributes to cross-participation, thus capturing echo chambers through user engagement patterns. We validated these matrices using the BiWCM null model [66] (see Methods in the main text), which filters out random noise and retains only statistically significant matches. Figures 45–49 report the validated overlap matrices together with the corresponding \(p\)-values. Each panel shows, on the left, the weighted bipartite network of interactions between user communities and domains and, on the right, the corresponding \(p\)-values obtained from the BiWCM validation.

Although community matches often emerge in correspondence with similar topical areas—even at very low \(p\)-values—few entries remain statistically significant once corrections for multiple hypothesis testing are applied using the false discovery rate (\(\alpha = 0.05\)).
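The effect of the correction can be illustrated with the Benjamini–Hochberg step-up procedure. The text specifies only a false discovery rate at \(\alpha = 0.05\), so the choice of Benjamini–Hochberg is an assumption, and the \(p\)-values below are invented.

```python
import numpy as np

def benjamini_hochberg(pvalues, alpha=0.05):
    """Boolean mask of hypotheses rejected at FDR level alpha."""
    p = np.asarray(pvalues, dtype=float).ravel()
    m = p.size
    order = np.argsort(p)
    thresholds = alpha * np.arange(1, m + 1) / m
    passed = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if passed.any():
        # Reject every hypothesis up to the largest rank passing its threshold.
        k = np.max(np.flatnonzero(passed))
        reject[order[: k + 1]] = True
    return reject.reshape(np.shape(pvalues))

# Invented p-values for ten community matches.
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
reject = benjamini_hochberg(pvals)
n_naive = int((np.array(pvals) < 0.05).sum())
n_fdr = int(reject.sum())
```

In this toy case five matches fall below 0.05 individually, but only two survive the correction, which mirrors the loss of significant entries described above.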

This limitation points to a loss of resolution in the analysis. At the same time, many of the subgraphs identified through validated community matches display high modularity (see Fig. 39) and emerge as near-complete, fully connected components within the interaction-derived subreddit network. This combination of internal structure and external cohesion indicates that the large communities identified in the first partition are meaningful units. It also suggests that further resolution can uncover well-defined substructures nested within them. To address these resolution limits, we conducted a finer-grained study by applying community detection within the validated communities themselves, thereby uncovering subcommunities nested within the larger partisan clusters.

The results are shown in Figures 50–54, which present the analysis at the subcommunity level through adjacency matrices of statistically validated community matches. These figures also highlight the links that remain significant under FDR correction—corresponding to effective echo chambers—and indicate that most validated matches follow shared topical themes. In these plots, subcommunities have been reordered to maximize consistency across years, making topical correspondences clearer. For readability, multi-tag communities are abbreviated with their main components, and full names are provided in Tab. 11.

Table 12 complements this structural evidence by quantifying user participation in validated echo chambers. For each year, it reports the absolute number of users active in communities identified as echo chambers, their fraction relative to the total active users in the same year, and their distribution across partisan subcommunities. In particular, we highlight the share of EC users participating in Democratic/Conservative or Banned communities, expressed both in absolute counts and as fractions relative to all EC users of that year.

Figure 40: Chord diagram of community matches, 2013.

Figure 41: Chord diagram of community matches, 2014.

Figure 42: Chord diagram of community matches, 2015.

Figure 43: Chord diagram of community matches, 2016.

Figure 44: Chord diagram of community matches, 2017.

Figure 45: Echo chambers at the community level, 2013.

Figure 46: Echo chambers at the community level, 2014.

Figure 47: Echo chambers at the community level, 2015.

Figure 48: Echo chambers at the community level, 2016.

Figure 49: Echo chambers at the community level, 2017.

Figure 50: Echo chambers at the sub-community level, 2013.

Figure 51: Echo chambers at the sub-community level, 2014.

Figure 52: Echo chambers at the sub-community level, 2015.

Figure 53: Echo chambers at the sub-community level, 2016.

Figure 54: Echo chambers at the sub-community level, 2017.

Table 11: Full names of multi-tag communities abbreviated in the heatmaps of echo chambers at the sub-community level. Entries marked with an asterisk (*) correspond to the abbreviated forms displayed in the figures.
Year	Abbreviation	Full name
2013	Mixed Dem/Gun/Econ *	Dem/Gun/Econ/Lib/Politic
	Mixed Far-Left/Dem/Lib *	Far-Left/Dem/Lib/Politic
	Mixed Far-Left/News/Econ *	Far-Left/News/Econ/Eur/Politic
	Mixed Far-Left/PolTalk/Dem *	Far-Left/PolTalk/Dem/Ban/Politic
	Mixed Gun/Dem/News *	Gun/Dem/News/Politic
2014	Mixed Geop/News/Far-Left *	Geop/News/Far-Left/Econ/PolTalk/Dem/Lib/Cons/Gun/Far-Right/Ban/SocialJustice/UK/Eur/Canada/Austr
	Mixed Econ/Far-Left/Dem *	Econ/Far-Left/Dem/Politic
	Mixed Lib/Dem/Far-Left *	Lib/Dem/Far-Left/Econ
2015	Mixed Geop/News/Far-Left *	Geop/News/Far-Left/Econ/PolTalk/Dem/Lib/Cons/Gun/Far-Right/Ban/SocialJustice/UK/Eur/Canada/Austr
	Mixed Far-Left/SocialJustice/Econ *	Far-Left/SocialJustice/Econ/Politic
	Mixed SocialJustice/Far-Right/UK *	SocialJustice/Far-Right/UK/Politic
2016	Mixed Far-Left/News/Cons *	Far-Left/News/Cons/Ban/Politic
	Mixed Dem/Ban/News *	Dem/Ban/News/Far-Left/Cons/Politic
	Mixed Gun/Dem/News *	Gun/Dem/News/Lib
	Mixed Far-Right/Far-Left/Dem *	Far-Right/Far-Left/Dem/SocialJustice/Ban/Politic
	Mixed SocialJustice/News/Far-Left *	SocialJustice/News/Far-Left/PolTalk/Dem/Lib/Politic
2017	Mixed Geop/News/Far-Left *	Geop/News/Far-Left/Econ/PolTalk/Dem/Lib/Cons/Gun/Far-Right/Ban/SocialJustice/UK/Eur/Canada/Austr
	Mixed Dem/Cons/Geop *	Dem/Cons/Geop/Politic
	Mixed Far-Left/Dem/Cons *	Far-Left/Dem/Cons/Politic
	Mixed Far-Left/Cons/SocialJustice *	Far-Left/Cons/SocialJustice/Dem/Politic
	Mixed Lib/SocialJustice/Geop *	Lib/SocialJustice/Geop/News/Politic
	Mixed SocialJustice/Far-Left/Dem *	SocialJustice/Far-Left/Dem/News/Politic

Table 12: Validated users in echo chambers (EC). For each year, the table reports the total number of users active in validated echo chambers, their fraction relative to the total active users in the same year (Frac. Users / Tot.), as well as the number and fraction of EC users participating in Democratic/Conservative (Frac. Dem/Cons / EC) and Banned (Frac. Ban / EC) communities.
Year	Valid. Users	Frac. Users / Tot.	Dem/Cons Users in EC	Frac. Dem/Cons / EC	Ban Users in EC	Frac. Ban / EC
2013	76 864	0.1983	34 395	0.4475	2 564	0.0334
2014	61 429	0.1992	31 550	0.5136	2 256	0.0367
2015	152 571	0.3597	16 612	0.1089	4 852	0.0318
2016	247 804	0.2789	88 026	0.3552	62 465	0.2521
2017	348 533	0.3392	177 617	0.5096	40 336	0.1157

10 Insights on Democrats, Conservatives, and Banned communities

To further qualify the findings reported in the main text, we examined user activity within the Democratic, Conservative, and Banned–aligned communities. As shown in Fig. 55, panel (a) highlights a progressive increase in exclusive participation across partisan subreddits: Democratic commenters become increasingly concentrated in Democratic-labeled forums, and the same holds for Conservatives, while the fraction of users active in both spheres steadily decreases over time. Panel (b) shows that, despite this rise in partisan activity, the average score of comments in these communities declines sharply compared to 2013 levels, with decreases of \(93.4\%\) for Democratic subreddits, \(85.6\%\) for Conservative ones, and \(97.7\%\) for Banned forums. The decline is especially pronounced in candidate-specific subreddits: Sanders (\(99.9\%\)), Clinton (\(100.0\%\)), and Trump (\(98.6\%\) compared to 2015, when his forum first appeared), all exhibit substantially lower average scores than their corresponding partisan communities, underscoring the increasingly polarizing dynamics of electoral discussions.

Figure 56 complements this analysis by focusing on the mutual distances between Democratic, Conservative, and Banned communities in the statistically validated subreddit projections. Distances are reported both at the user level (left) and domain level (right), in analogy with Fig. 5 of the main text. Beyond the general trends, here we explicitly test whether yearly changes are statistically significant by comparing each distribution of distances with that of the previous year through Kolmogorov–Smirnov tests. The results highlight significant shifts especially during electoral years, indicating that partisan and banned communities became more clearly separated as elections approached.
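
The year-over-year comparison can be sketched as follows, assuming the distances of each year are available as plain lists. The function name `ks_two_sample` and the sample values are illustrative. The p-value uses the standard asymptotic Kolmogorov series with a common small-sample correction, so it may differ slightly from the routine actually used in the analysis (a library function such as `scipy.stats.ks_2samp` would be a natural alternative).

```python
import math
from bisect import bisect_right


def ks_two_sample(x, y):
    """Two-sample Kolmogorov-Smirnov statistic D and asymptotic p-value."""
    xs, ys = sorted(x), sorted(y)
    n, m = len(xs), len(ys)
    # D is the largest gap between the two empirical CDFs,
    # evaluated at every observed value.
    d = max(
        abs(bisect_right(xs, v) / n - bisect_right(ys, v) / m)
        for v in xs + ys
    )
    # Effective sample size and small-sample correction (assumption:
    # asymptotic approximation, not an exact permutation p-value).
    ne = n * m / (n + m)
    lam = (math.sqrt(ne) + 0.12 + 0.11 / math.sqrt(ne)) * d
    if lam < 0.2:
        # The series converges slowly here and the p-value is ~1.
        return d, 1.0
    p = 2.0 * sum(
        (-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
        for k in range(1, 101)
    )
    return d, min(max(p, 0.0), 1.0)


# Made-up distance samples for two consecutive years (illustration only).
distances_prev = [0.42, 0.45, 0.47, 0.50, 0.52, 0.55]
distances_curr = [0.58, 0.61, 0.63, 0.66, 0.70, 0.72]
d_stat, p_value = ks_two_sample(distances_prev, distances_curr)
print(f"D = {d_stat:.3f}, p = {p_value:.4f}")
```

A yearly change would then be flagged as significant when the p-value falls below the chosen threshold.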

The previous analyses were carried out at the level of subreddits, meaning that measures such as comment scores could not disentangle whether low evaluations arose from interactions between supporters and opponents engaging in cross-posting activity, or from exchanges occurring within the same partisan group. To address this limitation and move to a finer resolution, we conducted a targeted analysis of Democratic-, Conservative-, and Banned-labeled users, as introduced in the main text and partially illustrated in Fig. 5. Specifically, we filtered the original dataset of comments by assigning labels to users through propagation from subreddit to user, and then retaining only comments authored by users tagged as Democrats, Conservatives, or Banned. The size of the resulting dataset is reported in Tab. 13.
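
A minimal sketch of this filtering step is given below. The propagation rule is an assumption made for illustration: each user receives the label of the partisan group in which they commented most often. The subreddit names in `SUBREDDIT_LABEL`, the record fields, and the function names are hypothetical, and the rule used in the actual analysis may differ.

```python
from collections import Counter, defaultdict

# Hypothetical subreddit-to-label map; the real one comes from the annotated tags.
SUBREDDIT_LABEL = {
    "democrats": "Democrat",
    "Conservative": "Conservative",
    "some_banned_sub": "Banned",
}


def propagate_labels(comments, subreddit_label):
    """Give each user the label of the group where they comment most (assumed rule)."""
    counts = defaultdict(Counter)
    for comment in comments:
        label = subreddit_label.get(comment["subreddit"])
        if label is not None:
            counts[comment["author"]][label] += 1
    return {user: c.most_common(1)[0][0] for user, c in counts.items()}


def filter_labeled(comments, user_label):
    """Keep only comments written by users that received a label."""
    return [c for c in comments if c["author"] in user_label]


# Toy comment records (illustration only).
comments = [
    {"author": "u1", "subreddit": "democrats"},
    {"author": "u1", "subreddit": "democrats"},
    {"author": "u1", "subreddit": "Conservative"},
    {"author": "u2", "subreddit": "Conservative"},
    {"author": "u3", "subreddit": "aww"},
]
user_label = propagate_labels(comments, SUBREDDIT_LABEL)
filtered = filter_labeled(comments, user_label)
print(user_label, len(filtered))
```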

Within this filtered set, we further identified comments directed specifically toward posts and comments authored by members of each faction. Figures 57 and 58 report, respectively, the normalized number of comments exchanged between groups and the average reception (scores) of these interactions. The row-normalized comment matrices reveal how communication volumes evolved across factions, while the score heatmaps capture systematic asymmetries in the evaluation of partisan and banned contributions across the 2013–2017 period.
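
The row normalization can be illustrated with a short sketch. It assumes that replies are available as (author, parent author) pairs and that users already carry a group label; `GROUPS`, `interaction_matrix`, and `row_normalize` are names introduced here for illustration.

```python
GROUPS = ("Democrat", "Conservative", "Banned")


def interaction_matrix(replies, user_label):
    """Count replies from each group (rows) to each group (columns)."""
    index = {group: i for i, group in enumerate(GROUPS)}
    matrix = [[0] * len(GROUPS) for _ in GROUPS]
    for author, parent_author in replies:
        src = user_label.get(author)
        dst = user_label.get(parent_author)
        if src in index and dst in index:
            matrix[index[src]][index[dst]] += 1
    return matrix


def row_normalize(matrix):
    """Divide each row by its sum, so each row gives the share of a group's replies."""
    normalized = []
    for row in matrix:
        total = sum(row)
        normalized.append([v / total if total else 0.0 for v in row])
    return normalized


# Toy example (illustration only).
labels = {"a": "Democrat", "b": "Conservative", "c": "Banned"}
replies = [("a", "a"), ("a", "a"), ("a", "a"), ("a", "b"), ("c", "b"), ("c", "c")]
print(row_normalize(interaction_matrix(replies, labels)))
```

Each row of the normalized matrix sums to one (or is all zeros for a group with no replies), so entries can be compared across groups of very different sizes.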

Overall, the analysis shows that cross-group interactions increased over time, even though the majority of comments still occurred within each partisan community. At the same time, while average scores declined across all groups, cross-group exchanges were evaluated even more negatively, a pattern that became especially pronounced between 2015 and 2017.

Figure 56: Average cosine distances between Democratic, Conservative, and Banned subreddits in the statistically validated projections. The left panel reports distances computed at the user level, while the right panel reports distances at the domain level. Shaded areas denote standard errors. Significance markers indicate the results of Kolmogorov–Smirnov tests, showing when distance distributions significantly diverge from those of the previous year, with marked effects during electoral periods.
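
For reference, the quantity plotted in Fig. 56 can be sketched as below. The sketch assumes each subreddit is represented by a numeric vector (for instance its row in a validated projection, which is an assumption here), and reports the mean pairwise cosine distance between two groups together with its standard error. Function names and the toy vectors are illustrative.

```python
import math
from itertools import product


def cosine_distance(u, v):
    """Return 1 - cosine similarity; 1.0 by convention if a vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


def mean_group_distance(vectors, group_a, group_b):
    """Mean and standard error of cosine distances between two groups of subreddits."""
    dists = [
        cosine_distance(vectors[a], vectors[b])
        for a, b in product(group_a, group_b)
        if a != b
    ]
    mean = sum(dists) / len(dists)
    if len(dists) > 1:
        var = sum((d - mean) ** 2 for d in dists) / (len(dists) - 1)
    else:
        var = 0.0
    return mean, math.sqrt(var / len(dists))


# Toy vectors (illustration only).
vectors = {"dem1": [1, 0, 0], "dem2": [1, 1, 0], "con1": [0, 0, 1], "con2": [0, 1, 1]}
print(mean_group_distance(vectors, ["dem1", "dem2"], ["con1", "con2"]))
```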

Table 13: Number of comments in the filtered dataset authored by users labeled as *Democrats*, *Conservatives*, and *Banned* between 2013 and 2017. Labels were assigned through label propagation from subreddits to users, and the total dataset was filtered according to these tags.
Year	Democrats	Conservatives	Banned
2013	208 327	645 817	43 402
2014	219 528	493 419	40 672
2015	1 558 935	400 348	148 262
2016	9 326 035	6 946 677	8 095 653
2017	9 947 710	3 481 146	11 511 021

11 Robustness to temporal resolution↩︎

To probe the robustness of our findings at finer temporal scales, we partitioned the dataset into three four-month windows per year: \(y_1\) (January–April), \(y_2\) (May–August), and \(y_3\) (September–December). The third window systematically captures the electoral cycles, including campaigning, election day, and immediate aftermath for both the 2014 midterms and the 2016 presidential election.
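
The window assignment amounts to a simple mapping from a comment's date to a label of the form `year_k`. The helper below is a minimal sketch; the name `window_label` is introduced here for illustration.

```python
from datetime import date


def window_label(d):
    """Map a date to its four-month window: Jan-Apr -> 1, May-Aug -> 2, Sep-Dec -> 3."""
    return f"{d.year}_{(d.month - 1) // 4 + 1}"


# Election day of the 2016 presidential election falls in the third window.
print(window_label(date(2016, 11, 8)))
```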

As shown in Fig. 59, the communities identified in four-month windows largely mirror those found in the annual analysis, with clusters emerging around the same topics and political orientations. In particular, we observe the formation of large opposing communities, such as Democratic- and Conservative-aligned clusters, especially pronounced during electoral periods, as well as communities dominated by banned and far-right subreddits. Some differences arise due to the finer resolution: for instance, in the early years Democratic-aligned subreddits appear less distinct and are often embedded in larger clusters dominated by other labels, such as News. Around electoral periods, however, the polarization between opposing communities becomes sharper.

Nevertheless, polarization indices and user label compositions (Fig. 60) confirm that the core polarization and echo-chamber patterns are consistent with those observed in the annual analysis, while also revealing finer-grained shifts in the alignment of banned users across electoral periods. In particular, the 2016 windows reveal a decrease in overall polarization but a stronger association of banned users with Conservative subreddits. Echo-chamber analysis further replicates the alignment patterns observed in the annual setting.

Finally, inter-group distance measures (Fig. 61) reinforce these findings. Mean distances across validated networks show trends consistent with the annual analysis: Democratic–Banned distances remain the largest, decreasing slightly before elections, while Conservative and Banned communities remain close. Democratic–Conservative distances increase during electoral periods, with statistically significant shifts observed in selected four-month windows, including 2014_3 and 2016_3. By contrast, peaks in Conservative–Banned distances at the end of 2015 and 2016 do not appear to correspond to particularly significant changes in the distance distributions relative to the rest of the network.

Figure 59: Temporal community flows in interaction-based networks. Flow diagrams illustrate the evolution of statistically validated subreddit communities at four-month resolution (\(y_1\), \(y_2\), \(y_3\)). Arrows indicate transitions of subreddits between communities across consecutive windows. Overall patterns broadly replicate the annual analysis.

Figure 60: Polarization (a) and echo-chamber (b) analysis at finer resolution. (a) Donut charts showing user composition (Democratic, Conservative, Banned) from polarization matrices in the 2014_3 and 2016_3 windows. The 2016 window shows reduced overall polarization, yet a sharper alignment of banned users with Conservative subreddits. (b) Echo-chamber analysis for 2016_3, based on a weighted bipartite projection between subreddit communities from validated user-subreddit and subreddit-domain networks. Edge weights represent user overlap; BiWCM validation highlights significant alignments. Patterns are consistent with those observed in the annual analysis.

Figure 61: Inter-group distances in interaction-based networks. Mean distances across validated networks confirm the annual-level findings. Panels compare: (a) annual vs. (b) four-month averages, (c–d) distances between Democratic and Conservative communities, and (e) all pairwise distances (Democratic, Conservative, Banned). As in the annual analysis, Democratic–Banned distances remain the largest, decreasing slightly before elections, while Conservative and Banned communities remain close. Democratic–Conservative distances increase during electoral periods. Statistical tests highlight significant differences in selected windows, including the 2014 midterms (2014_3) and the 2016 presidential elections (2016_3).

12 GPT TAG-assignment validation↩︎

We manually annotated subreddit tags by extending those in the Politosphere dataset and validated them using GPT-4 Turbo [67]. Initially, we prompted GPT-4 with the full list of subreddit names to suggest a minimal set of representative categories, and its outputs closely matched our manual scheme. Next, each subreddit—along with its public description—was submitted to GPT-4 to assign one or more of the chosen tags, allowing mixed labels. The resulting tag distribution was then refined by overriding assignments based on external metadata: subreddits known to have been prohibited were tagged “Banned,” and those with Democratic or Republican metadata were labeled accordingly. Tag scores were normalized per subreddit to account for multiple labels.
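
The override and normalization steps can be sketched as follows. The sketch assumes that a metadata override replaces the GPT-assigned tags entirely and that a ban takes precedence over party metadata; both are assumptions made here for illustration, as is the function name `finalize_tags`.

```python
def finalize_tags(gpt_scores, banned=False, party=None):
    """Apply metadata overrides (assumed to replace GPT tags), then normalize to sum 1."""
    if banned:
        scores = {"Banned": 1.0}
    elif party is not None:
        scores = {party: 1.0}
    else:
        scores = dict(gpt_scores)
    total = sum(scores.values())
    if total == 0:
        return {}
    return {tag: score / total for tag, score in scores.items()}


# A subreddit with two GPT tags and no metadata gets equal weights.
print(finalize_tags({"News": 1, "Politics": 1}))
# A subreddit known to have been prohibited is tagged "Banned" regardless of GPT output.
print(finalize_tags({"News": 1}, banned=True))
```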

We observed an overrepresentation of the general “Politics” category, prompting a secondary reclassification in which subreddits labeled exclusively as “Banned” or “Politics” were re-evaluated and relabeled. Performance was then assessed through two complementary measures. At the subreddit level, we computed weighted recall-like and precision-like scores, which quantify, respectively, the fraction of true tags correctly recovered and the fraction of predicted tags that match the manual annotation, with weights ensuring that partially correct multi-label cases contributed proportionally. At the aggregate level, we measured the overall distributional agreement between manual and GPT labels using the coefficient of determination (\(R^2\)), which captures how well GPT reproduces the variance of the real distribution. Under this refinement, weighted precision and recall increased from 0.61 and 0.75 to 0.73 and 0.81, while \(R^2\) improved markedly from 0.51 to 0.85.
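
One possible reading of these measures is sketched below; the exact weighting scheme used in the evaluation is not specified here, so the definitions are assumptions. Recall-like scores sum the manual weight of tags that GPT also assigned, precision-like scores sum the GPT weight of tags that also appear in the manual annotation, and both are averaged over subreddits. \(R^2\) is computed in the usual way, \(R^2 = 1 - SS_{\mathrm{res}} / SS_{\mathrm{tot}}\), on per-tag totals.

```python
def weighted_scores(manual, predicted):
    """Return (precision-like, recall-like) averaged over subreddits.

    manual, predicted: dict subreddit -> dict tag -> weight (weights sum to 1).
    """
    precision = recall = 0.0
    for sub, true_tags in manual.items():
        pred_tags = predicted.get(sub, {})
        recall += sum(w for t, w in true_tags.items() if t in pred_tags)
        precision += sum(w for t, w in pred_tags.items() if t in true_tags)
    n = len(manual)
    return precision / n, recall / n


def r_squared(y_true, y_pred):
    """Coefficient of determination between manual and predicted tag totals."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - mean) ** 2 for a in y_true)
    return 1.0 - ss_res / ss_tot


# Toy annotations (illustration only).
manual = {"a": {"News": 0.5, "Politics": 0.5}, "b": {"Banned": 1.0}}
predicted = {"a": {"News": 1.0}, "b": {"Banned": 0.5, "Far-Right": 0.5}}
print(weighted_scores(manual, predicted))
```

With these definitions, a partially correct multi-label prediction contributes only the weight of the tags it gets right, which matches the proportional treatment described above.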

In both classification rounds, despite a slight mismatch in the distribution of Democrats before the elections and of Far Right afterwards, the overall community labels and their population structure remain highly consistent with the patterns reported in the main text (Fig. 2 and Fig. 3). This consistency is further illustrated in Fig. 62, where panels (a)–(c) compare hand-labeled and GPT-corrected tag distributions, show donut charts of 2016 polarization at the user level, and trace the yearly formation of subreddit communities.

Figure 62: Comparison of GPT-corrected classification and community analysis. Panel (a) compares the distribution of hand-labeled and GPT-corrected tags, (b) shows donut charts of 2016 polarization illustrating the composition of selected topic communities at the user level within validated annual subreddit networks, and (c) presents flowcharts tracing the yearly formation of subreddit communities within the same networks.

References↩︎

[1]

Habermas, J. The Structural Transformation of the Public Sphere: An Inquiry Into a Category of Bourgeois Society. Studies in contemporary German social thought (Polity Press, 1989).

[2]

Sylvester, D. E. & McGlynn, A. J. The digital divide, political participation, and place. 28, 64–74, /0894439309335148 (2009).

[3]

Bail, C. Breaking the Social Media Prism: How to Make Our Platforms Less Polarizing (Princeton University Press, 2021).

[4]

Vargo, C. J., Guo, L., McCombs, M. & Shaw, D. L. Network issue agendas on Twitter during the 2012 U.S. presidential election. 64, 296–316, /jcom.12089 (2014).

[5]

Han, B.-C. Infokratie: Digitalisierung und Krise der Demokratie (Matthes & Seitz, Berlin, Deutschland, 2021).

[6]

Pentland, A. Social Physics: How Good Ideas Spread—The Lessons from a New Science (The Penguin Press, New York, NY, 2014).

[7]

Garrett, R. K. Echo chambers online?: Politically motivated selective exposure among Internet news users. 14, 265–285, /J.1083-6101.2009.01440.X (2009).

[8]

Del Vicario, M. et al. Echo chambers: Emotional contagion and group polarization on Facebook. /srep37825 (2016).

[9]

Zollo, F. et al. Debunking in a world of tribes. 7:e0181821 (2017).

[10]

Cossard, A. et al. Falling into the echo chamber: The Italian vaccination debate on Twitter. 14, 130–140, /icwsm.v14i1.7285 (2020).

[11]

Cinelli, M., Morales, G. D. F., Galeazzi, A., Quattrociocchi, W. & Starnini, M. The echo chamber effect on social media. 118, e2023301118, /pnas.2023301118 (2021).

[12]

Pratelli, M., Saracco, F. & Petrocchi, M. Entropy-based detection of Twitter echo chambers. 3, /pnasnexus/pgae177 (2024).

[13]

Budak, C., Nyhan, B., Rothschild, D. M., Thorson, E. & Watts, D. J. Misunderstanding the harms of online misinformation. 630, 45–53, /s41586-024-07417-w (2024).

[14]

Falkenberg, M. et al. Towards global equity in political polarization research, /ARXIV.2504.11090 (2025).

[15]

De Francisci Morales, G., Monti, C. & Starnini, M. No echo in the chambers of political interactions on Reddit. 11, 2818, /s41598-021-81531-x (2021).

[16]

Monti, C., D’Ignazi, J., Starnini, M. & De Francisci Morales, G. Evidence of demographic rather than ideological segregation in news discussion on Reddit. In Proceedings of the ACM Web Conference 2023, WWW ’23, 2777–2786, /3543507.3583468 (ACM, 2023).

[17]

Morini, V., Pollacci, L. & Rossetti, G. Toward a standard approach for echo chamber detection: Reddit case study. 11, /app11125390 (2021).

[18]

Colacrai, E., Cinus, F., Morales, G. D. F. & Starnini, M. Navigating multidimensional ideologies with Reddit’s political compass: Economic conflict and social affinity. abs/2401.13656 (2024).

[19]

Reddit Inc. Reddit content policy. https://www.redditinc.com/policies/content-policy (2024). Accessed: 2024-09-24.

[20]

De Clerck, B., Rocha, L. E. & Van Utterbeeck, F. Maximum entropy networks for large scale social network node analysis. 7, 68 (2022).

[21]

Cimini, G. et al. The statistical physics of real-world networks. 1, 58–71, /s42254-018-0002-6 (2019).

[22]

Gualdi, S., Cimini, G., Primicerio, K., Clemente, R. D. & Challet, D. Statistically validated network of portfolio overlaps and systemic risk. 6, 1–14, /srep39467 (2016).

[23]

Saracco, F. et al. Inferring monopartite projections of bipartite networks: an entropy-based approach. 19, 053022, /1367-2630/aa6b38 (2017).

[24]

Pugliese, E. et al. Unfolding the innovation system for the development of countries: Coevolution of science, technology and production. 9, 16440, /s41598-019-52767-5 (2019).

[25]

Cimini, G., Carra, A., Didomenicantonio, L. & Zaccaria, A. Meta-validation of bipartite network projections. 5, 76, /s42005-022-00856-9 (2022).

[26]

Bruno, M., Mazzilli, D., Patelli, A., Squartini, T. & Saracco, F. Inferring comparative advantage via entropy maximization. 4, 045011, /2632-072x/ad1411 (2023).

[27]

Fessina, M., Zaccaria, A., Cimini, G. & Squartini, T. Pattern-detection in the global automotive industry: A manufacturer-supplier-product network analysis. 181, 114630, ://doi.org/10.1016/j.chaos.2024.114630 (2024).

[28]

Park, D. & Park, J. Evolution of sample-based music authorship network. 14, /epjds/s13688-025-00524-2 (2025).

[29]

Becatti, C., Caldarelli, G., Lambiotte, R. & Saracco, F. Extracting significant signal of news consumption from social networks: the case of Twitter in Italian political elections. 5, 91, /s41599-019-0300-3 (2019).

[30]

Caldarelli, G., Nicola, R. D., Vigna, F. D., Petrocchi, M. & Saracco, F. The role of bot squads in the political propaganda on Twitter. 3, 1–15, /s42005-020-0340-4 (2020).

[31]

Radicioni, T., Saracco, F., Pavan, E. & Squartini, T. Analysing Twitter semantic networks: the case of 2018 Italian elections. 11, 13207, /s41598-021-92337-2 (2021).

[32]

Mattei, M., Pratelli, M., Caldarelli, G., Petrocchi, M. & Saracco, F. Bow-tie structures of Twitter discursive communities. 12, 12944, /s41598-022-16603-7 (2022).

[33]

Guarino, S., Pierri, F., Di Giovanni, M. & Celestini, A. Information disorders during the COVID-19 infodemic: The case of Italian Facebook. 22, 100124, ://doi.org/10.1016/j.osnem.2021.100124 (2021).

[34]

Hofmann, V., Schütze, H. & Pierrehumbert, J. B. The Reddit Politosphere: A large-scale text and network resource of online political discourse. 16, 1259–1267, /icwsm.v16i1.19377 (2022).

[35]

Baumgartner, J., Zannettou, S., Keegan, B., Squire, M. & Blackburn, J. The Pushshift Reddit dataset. https://arxiv.org/abs/2001.08435 (2020).

[36]

Blondel, V. D., Guillaume, J.-L., Lambiotte, R. & Lefebvre, E. Fast unfolding of communities in large networks. 2008, P10008, /1742-5468/2008/10/P10008 (2008).

[37]

Rosvall, M. & Bergstrom, C. T. Maps of random walks on complex networks reveal community structure. 105, 1118–1123 (2008).

[38]

Peixoto, T. P. Hierarchical block structures and high-resolution model selection in large networks. 4, 011047 (2014).

[39]

Falkenberg, M., Zollo, F., Quattrociocchi, W., Pfeffer, J. & Baronchelli, A. Patterns of partisan toxicity and engagement reveal the common structure of online political communication across countries. 15, /s41467-024-53868-0 (2024).

[40]

Subtirelu, N. Donald Trump supporters and the denial of racism: An analysis of online discourse in a pro-Trump community. In Simon, A. & Kerkhoff, S. (eds.) Language Aggression in Public Debates on Immigration, 151–174, /bct.102.08sub (John Benjamins, 2019).

[41]

Desiderio, A., Mancini, A., Cimini, G. & Di Clemente, R. Highly engaging events reveal semantic and temporal compression in online community discourse. 4, pgaf056, /pnasnexus/pgaf056 (2025).

[42]

Newsroom, W. Bernie Sanders tells Berklee students to create political art (2016). Published November 21, 2016.

[43]

Krugman, P. The economics of left-behind regions (2024).

[44]

Robertson, R. E. et al. Users choose to engage with more partisan news than they are exposed to on Google Search. 618, 342–348, /s41586-023-06078-5 (2023).

[45]

Augenstein, I. et al. Factuality challenges in the era of large language models and opportunities for fact-checking. 6, 852–863, /s42256-024-00881-z (2024).

[46]

Cirulli, D., Cimini, G. & Palermo, G. How large language models play humans in online conversations: a simulated study of the 2016 US politics on Reddit (2025). Preprint.

[47]

Amadori, E. et al. Involvement drives complexity of language in online debates (2025).

[48]

Bond, R. M. & Garrett, R. K. Engagement with fact-checked posts on Reddit. 2, /pnasnexus/pgad018 (2023).

[49]

Chandrasekharan, E. et al. You can’t stay here: The efficacy of Reddit’s 2015 ban examined through hate speech. 1, 1–22, /3134666 (2017).

[50]

Balassa, B. Trade liberalization and revealed comparative advantage. 33, 99–123, /j.1467-9957.1965.tb00050.x (1965).

[51]

Saracco, F., Clemente, R. D., Gabrielli, A. & Squartini, T. Randomizing bipartite networks: the case of the World Trade Web. 5, 10595, /srep10595 (2015).

[52]

Benjamini, Y. & Hochberg, Y. Controlling the false discovery rate: A practical and powerful approach to multiple testing. 57, 289–300 (1995).

[53]

Pedregosa, F. et al. Scikit-learn: Machine learning in Python. 12, 2825–2830 (2011).

[54]

Vallarano, N. et al. Fast and scalable likelihood maximization for exponential random graph models with local constraints. 11, 15227, /s41598-021-93830-4 (2021).

[55]

pandas development team, T. pandas-dev/pandas: Pandas, /zenodo.3509134 (2020).

[56]

Harris, C. R. et al. Array programming with NumPy. 585, 357–362, /s41586-020-2649-2 (2020).

[57]

Virtanen, P. et al. SciPy 1.0: fundamental algorithms for scientific computing in Python. 17, 261–272, /s41592-019-0686-2 (2020).

[58]

Loper, E. & Bird, S. NLTK: The natural language toolkit (2002).

[59]

Hagberg, A. A., Schult, D. A. & Swart, P. J. Exploring network structure, dynamics, and function using NetworkX. In Varoquaux, G., Vaught, T. & Millman, J. (eds.) Proceedings of the 7th Python in Science Conference, 11–15 (Pasadena, CA USA, 2008).

[60]

Csardi, G. & Nepusz, T. The igraph software package for complex network research. Complex Systems, 1695 (2006).

[61]

Peixoto, T. P. The graph-tool Python library (2014). https://graph-tool.skewed.de.

[62]

Bastian, M., Heymann, S. & Jacomy, M. Gephi: An open source software for exploring and manipulating networks (2009).

[63]

Hunter, J. D. Matplotlib: A 2D graphics environment. 9, 90–95, /MCSE.2007.55 (2007).

[64]

van der Laken, P. P. d3blocks: Python library for creating interactive visualizations using d3.js. https://github.com/d3blocks/d3blocks (2022). Accessed: 2024-05-04.

[65]

Inc., P. T. Plotly: Collaborative data science. https://plotly.com/python/ (2015). Accessed: 2024-05-04.

[66]

Buffa, L. et al. Maximum entropy modeling of optimal transport: the sub-optimality regime and the transition from dense to sparse networks (2025).

[67]

OpenAI. GPT-4 technical report. https://openai.com/research/gpt-4 (2023). Accessed via OpenAI API (https://platform.openai.com).

[68]

SimilarWeb Ltd. Similarweb. https://www.similarweb.com (2025). Accessed: 2025-07-08.

[69]

Shannon, C. E. A mathematical theory of communication. 27, 379–423, /j.1538-7305.1948.tb01338.x (1948).

[70]

Vinh, N. X., Epps, J. & Bailey, J. Information theoretic measures for clusterings comparison: Variants, properties, normalization and correction for chance. 11, 2837–2854 (2010).

[71]

Meilă, M. Comparing clusterings—an information based distance. 98, 873–895, ://doi.org/10.1016/j.jmva.2006.11.013 (2007).

[72]

Rosvall, M., Axelsson, D. & Bergstrom, C. T. The map equation. 178, 13–23, /epjst/e2010-01179-1 (2009).

Corresponding author: giulio.cimini@roma2.infn.it↩︎
Corresponding author: fabio.saracco@cref.it↩︎

Polarization and echo chambers in Reddit’s political discourse
