The SALURBAL Project. The Salud Urbana en América Latina (SALURBAL) project is a novel international partnership for actionable evidence on urban health in Latin America7. Although formally funded in 2017, the partnership has been evolving since 2015 as a network of researchers and policymakers across different countries who created the Urban Health Network for Latin America and the Caribbean (LAC-Urban Health)7. SALURBAL has a governance and operating structure that facilitates this collaboration, described in more detail elsewhere7 (see supplementary information, Figures S1 and S2). The project team was initially convened by a small but geographically diverse group of researchers who sought to create a diverse and interdisciplinary team31. The project has contributed a range of outputs, including academic and policymaker workshops, forums, academic papers, and policy briefs directly relevant to the research and policy landscape in Latin America (Supplementary information, Table S1). SALURBAL has analyzed and harmonized data relevant to urban health from a range of sources for all cities of 100,000 residents or more across 11 countries (n = 371). Between the beginning of the project and the time of this study (May 2017 to August 2020), 49 papers had been published.
Data collection and study variables. Data collection for this study included two phases. First, we harmonized multiple sources of data from records administered by the SALURBAL Project coordinator regarding the project’s collaboration activities from May 2017 to August 2020. The data spanned multiple SALURBAL activities, including 126 research proposals, three academic training workshops, monthly meetings of three Project Cores (the Data and Methods Core, the Built and Physical Environment Core, and the Social Environment Core) and 12 working groups, seven biannual project meetings, three GMB workshops, 49 papers approved or published, two policy symposia, and one Knowledge-to-Policy Forum. Second, we obtained individual-level information across seven attributes (country, city, discipline, research topic, sector, career stage, gender) from project registration forms and the SALURBAL directory. When the relevant information was not reported, or the information provided did not fit our classification of attributes (see Table S2 in the Supplement), we searched publicly available online records, including professional websites with biographical descriptions, published documents and workplace-specific websites. In cases where a participant worked across multiple sectors or disciplines, we assigned them to the sector or discipline in which they worked most of the time. We verified all information collected about participants with a member of the SALURBAL Project executive committee and a key team member, both of whom were familiar with the participants. Country and city were defined according to the participant’s main work location (Supplementary information, Table S2). Discipline was defined according to the characteristics outlined by Krishnan32 (Supplementary information, Table S2). Research topic was defined as a subcategory of discipline, with categories outlined in Table S2 in the supplementary information document33.
Sector was defined as public and government, academia, civic society, private sector, or intersectoral, according to the sector of each participant’s primary place of employment. Career stage was defined according to the participant’s position title at their primary place of employment and their academic qualification (i.e., junior or senior, where junior was defined as a student or an individual with less than five years of experience). Gender was inferred from available biographical information (women and men).
Data analysis. We characterized the SALURBAL collaboration network using a temporal multilayer network approach. In this network, each layer represented a project activity type, while the temporal window was defined using a Time Windows in Networks algorithm34. Once defined, the same temporal window was used for all measures. Then, we assessed the cohesion of the SALURBAL network over time using four structural property measures of the network, including density, average clustering coefficient, and average shortest path. To characterize diversity, we first created a measure of diversity within layers and validated the diversity scores using a configuration model to ensure that the scores were significantly higher than would be expected by chance. Second, we measured the diversity between layers using an adapted version of the measure of diversity in multiplex networks by Carpi et al.35. We also assessed the contribution of different SALURBAL activities to network diversity between activities using a layer-reduction method. Third, we assessed network diversity within and between the SALURBAL activities over time. We conducted multiple linear regression to determine which project activities were associated with more diversity. Lastly, we used a Louvain community detection algorithm to identify collaboration communities within each temporal window.
Characterization of the SALURBAL collaboration network. We characterized the evolving network of the SALURBAL Project participants and their collaborations from May 2017 to August 2020, using a temporal multilayer network approach. We define participants as individuals engaged in at least one SALURBAL Project activity during the studied time period. The project activities are grouped using a multiplex network structure divided into six undirected layers. Each layer comprised all the SALURBAL Project participants (nodes) engaged in the corresponding activity. Each node was characterized by seven attributes (country, city, discipline, research topic, sector, career stage, gender). The edges (or connections) between nodes represent collaborations between participants. Each edge had a weight that represents the number of collaborations between a pair of connected nodes in that particular layer.
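The layered structure described above can be illustrated with a minimal Python sketch. It assumes, for illustration only, that a collaboration is recorded whenever two participants appear in the same activity record; the function name `build_multiplex`, the layer names and the participant names are hypothetical and do not come from the SALURBAL data.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_multiplex(events):
    """Build one undirected, weighted edge list per layer.

    events: iterable of (layer, participants) tuples, one per joint activity.
    Returns {layer: Counter({(u, v): weight})} with u < v, where the weight
    counts how many times the pair collaborated in that layer.
    """
    layers = defaultdict(Counter)
    for layer, participants in events:
        # Sorting makes (u, v) and (v, u) the same undirected edge.
        for u, v in combinations(sorted(set(participants)), 2):
            layers[layer][(u, v)] += 1
    return layers


# Hypothetical activity records (layer name, people who took part together).
events = [
    ("papers", ["ana", "luis", "mei"]),
    ("papers", ["ana", "luis"]),
    ("workshops", ["luis", "mei"]),
]
multiplex = build_multiplex(events)
print(multiplex["papers"][("ana", "luis")])  # 2
```

Node attributes (country, city, discipline and so on) could be kept in a separate dictionary keyed by participant, so that the same attribute table serves every layer.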
We used a temporal network approach to evaluate the project’s diversity and cohesion over time. To define the appropriate temporal window size \(w\), we aggregated our multiplex network into one layer. Then, we implemented a Time Windows in Networks (TWIN) algorithm that optimizes the trade-off between the noise and the information contained in the data34. We divided the timespan of the project, i.e., 40 months (May 2017 to August 2020), into temporal windows of different sizes \(w\). For each window size \(w\), we calculated a range of network measures to analyze the network’s structural properties and the effect of the window size. These measures included: the number of connected components; the diameter, which is the longest of the shortest-path distances between all possible pairs of nodes; the average shortest path, which is the average number of steps along the shortest path for all possible collaborations between participants; the radius, which is the minimum eccentricity, i.e., the smallest value, over all nodes, of the longest shortest-path distance from that node to any other node; the size of the largest clique, which is the maximum number of participants in a group in which each member is connected to each of the others; and the giant connected component, which is the size of the largest connected component15. We also calculated the diversity within layers (described below) for each attribute to evaluate the diversity of SALURBAL participants’ collaborations with respect to their attributes. Using these measures, we constructed the statistical time series \({F}_{w}\) for each metric and window size. Then, we measured the noise by calculating the variance \(V\left({F}_{w}\right)\) and estimated the loss of information by calculating the compression ratio \(R\left({F}_{w}\right)\). We defined the optimal temporal window \(w\) as the one that minimizes the absolute difference between \(V\left({F}_{w}\right)\) and \(R\left({F}_{w}\right)\) for each network measure.
Finally, we averaged the optimal window of all the measures in order to choose the window size with the largest changes in the diversity and network structure34.
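As an illustration of the structural measures listed above, the following sketch computes them for a single window with NetworkX. It is not the TWIN implementation: the distance-based measures are evaluated on the giant component (an assumption made here because they are undefined on disconnected graphs), and `optimal_window` simply applies the selection rule to variance and compression values that are taken as given. The toy graph and the numbers passed to `optimal_window` are hypothetical.

```python
import networkx as nx


def window_measures(G):
    """Structural measures for the aggregated graph of one temporal window."""
    components = list(nx.connected_components(G))
    giant = G.subgraph(max(components, key=len))
    return {
        "n_components": len(components),
        "giant_component_size": giant.number_of_nodes(),
        # Assumption: distances are measured inside the giant component only.
        "diameter": nx.diameter(giant),
        "radius": nx.radius(giant),
        "avg_shortest_path": nx.average_shortest_path_length(giant),
        "largest_clique": max(len(c) for c in nx.find_cliques(G)),
    }


def optimal_window(variance_by_w, compression_by_w):
    """Window size minimizing |V(F_w) - R(F_w)| for one network measure."""
    return min(
        variance_by_w,
        key=lambda w: abs(variance_by_w[w] - compression_by_w[w]),
    )


# Toy window: a triangle with a pendant node, plus a separate pair.
G = nx.Graph([("a", "b"), ("b", "c"), ("c", "a"), ("c", "d"), ("e", "f")])
print(window_measures(G))

# Hypothetical noise and compression values for window sizes of 1, 3 and 6 months.
print(optimal_window({1: 0.9, 3: 0.4, 6: 0.1}, {1: 0.1, 3: 0.35, 6: 0.8}))
```

Repeating `optimal_window` for every measure and averaging the results would mirror the final averaging step described in the text.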
Measures
Diversity within layers (project activities). Let \({DW}_{l}^{o}\) be the diversity of each activity \(l\) of the project for each attribute \(o\in O=\{\text{discipline},\text{domain},\text{sector},\text{gender},\text{seniority},\text{country},\text{city}\}\) of SALURBAL participant \(i\). First, we evaluated the diversity of each node individually using a two-part process: 1) the intra-nodal process, which examines the dyads (direct connections) of node \(i\), and 2) the inter-nodal process, in which we defined subgraphs representing the working groups in which node \(i\) participates and examined all possible connections among the nodes belonging to each subgraph. The two resulting measures were weighted by \(\beta\) and \(\left(1-\beta \right)\), respectively, and summed:
$${DW}_{i}^{o,l}=\beta \left(\frac{1}{\left|{v}_{i}^{l}\right|}\sum _{j\in {v}_{i}^{l}}\left(1-{\delta \left({x}_{i},{x}_{j}\right)}^{o,l}\right)\right)+\left(1-\beta \right)\left(\frac{1}{\left|{v}_{i}^{l}\right|\left(\left|{v}_{i}^{l}\right|-1\right)}\sum _{j,k\in {v}_{i}^{l}}\left(1-{\delta \left({x}_{j},{x}_{k}\right)}^{o,l}\right)\right) \left(1\right)$$
Where \({v}_{i}^{l}\) is the neighborhood of node \(i\) in layer \(l\), \({v}_{i}^{l}=\{i,j,\dots ,n\}\); \({x}_{i}\), \({x}_{j}\) and \({x}_{k}\) are the values of attribute \(o\) for nodes \(i\), \(j\) and \(k\), respectively; and \({\delta \left({x}_{i},{x}_{j}\right)}^{o,l}\) is a Kronecker delta that equals one if \({x}_{i}={x}_{j}\) and zero otherwise.
Second, we calculated the diversity \({DW}_{l}^{o}\) of each layer \(l\) as the average of \({DW}_{i}^{o,l}\) over all the nodes. The result ranges from 0 to 1; the closer the diversity score is to 1, the higher the diversity of node \(i\) or layer \(l\) of the SALURBAL network.
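The node-level and layer-level scores defined above can be sketched in Python as follows. This is an illustrative reading of Eq. 1, not the authors' code: the neighborhood is taken to be node \(i\) together with its direct neighbors in the layer, isolated nodes are given a score of zero, and the default \(\beta = 0.5\) is a placeholder, since the value used in the study is not stated here. The toy layer and country codes are hypothetical.

```python
def node_diversity(i, adj, attr, beta=0.5):
    """Diversity of node i in one layer for one attribute (Eq. 1).

    adj:  {node: set of neighbours} for the layer.
    attr: {node: attribute value}, e.g. country.
    beta: weight of the intra-nodal term (placeholder default).
    """
    v = {i} | adj[i]  # assumption: closed neighbourhood of i
    n = len(v)
    if n < 2:  # assumption: an isolated node contributes zero diversity
        return 0.0
    # Intra-nodal term: share of the neighbourhood that differs from i.
    intra = sum(attr[i] != attr[j] for j in v) / n
    # Inter-nodal term: share of ordered pairs (j, k) with different values.
    inter = sum(attr[j] != attr[k] for j in v for k in v) / (n * (n - 1))
    return beta * intra + (1 - beta) * inter


def layer_diversity(adj, attr, beta=0.5):
    """Average of the node scores over all nodes in the layer."""
    return sum(node_diversity(i, adj, attr, beta) for i in adj) / len(adj)


# Toy layer: a collaborates with b and c; b is from a different country.
adj = {"a": {"b", "c"}, "b": {"a"}, "c": {"a"}}
country = {"a": "CO", "b": "BR", "c": "CO"}
print(round(node_diversity("a", adj, country), 3))
print(round(layer_diversity(adj, country), 3))
```

Both terms lie between 0 and 1, so any \(\beta\) in that interval keeps the combined score in the range stated in the text.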
Lastly, we constructed a configuration model, which is a random graph that preserves the degree sequence of participants' attributes within the SALURBAL network16. The nodes are randomly connected while ensuring that every attribute category in the network maintains the degree observed in the SALURBAL network, with a tolerance of 10%. We simulated a total of 1000 random graphs and calculated the diversity within layers for the successful simulations. Next, we calculated the ratio of the SALURBAL diversity to the simulated diversity (i.e., the diversity observed in the SALURBAL network divided by the diversity observed across the simulated random graphs) and performed a Fisher p-value test for non-symmetrical distributions. A p-value lower than 0.1 was interpreted as greater diversity than would be expected by chance.
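The comparison against random graphs can be sketched as below, with two substitutions that should be read as assumptions rather than as the procedure used in the study. First, the sketch uses the standard node-degree-preserving configuration model from NetworkX instead of the attribute-category model with a 10% tolerance. Second, it reports an empirical one-sided Monte Carlo p-value instead of the test named in the text, and it uses the share of edges joining different attribute values as a simple stand-in for the diversity score. The function names are hypothetical.

```python
import random

import networkx as nx


def edge_diversity(G, attr):
    """Stand-in diversity score: share of edges joining different values."""
    edges = list(G.edges())
    if not edges:
        return 0.0
    return sum(attr[u] != attr[v] for u, v in edges) / len(edges)


def null_comparison(G, attr, n_sims=200, seed=0):
    """Ratio of observed to mean simulated diversity, and an empirical p-value."""
    rng = random.Random(seed)
    nodes = list(G.nodes())
    degrees = [G.degree(n) for n in nodes]
    observed = edge_diversity(G, attr)
    sims = []
    for _ in range(n_sims):
        M = nx.configuration_model(degrees, seed=rng.randrange(2**32))
        H = nx.Graph(M)  # collapse parallel edges
        H.remove_edges_from(list(nx.selfloop_edges(H)))
        # configuration_model labels nodes 0..n-1 in degree-sequence order.
        H = nx.relabel_nodes(H, dict(enumerate(nodes)))
        sims.append(edge_diversity(H, attr))
    mean_sim = sum(sims) / len(sims)
    ratio = observed / mean_sim if mean_sim > 0 else float("inf")
    # One-sided empirical p-value with the usual +1 correction.
    p_value = (1 + sum(s >= observed for s in sims)) / (1 + n_sims)
    return ratio, p_value


# Toy network in which every edge joins the two attribute groups.
G = nx.complete_bipartite_graph(3, 3)
group = {n: "A" if n < 3 else "B" for n in G}
print(null_comparison(G, group))
```

A ratio above one together with a small p-value would indicate more diversity than the degree structure alone produces, which is the interpretation given in the text.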
Diversity between layers (project activities). While the diversity within layers allows us to quantify the diversity of collaborations among participants in each project activity, it does not provide information on whether the project structure (i.e., the project activities) is creating new diverse collaborations. We therefore assessed the extent to which the diversity in collaborations differs between SALURBAL activities, as characterized within the multiplex network. Each layer (i.e., project activity) is compared to the others to assess the differences between connectivity paths (i.e., the differences in how a participant collaborates with some participants in one activity and with different participants in another activity), which allows us to evaluate the contribution of each activity to creating new and diverse collaborations. First, we calculated for each node \(i\) the Node Distance Distribution \({N}_{i}^{\overline{p}}\), which specifies in a probabilistic manner the shortest-path distance between node \(i\) and all other nodes connected to it in the same layer \(\overline{p}\). The Transition Matrix \({T}_{i}^{\overline{p}}\) gives the probability that each node in layer \(\overline{p}\) is reached in one step by a random walker located at node \(i\)35. Second, using these distributions, we calculated the node difference, which quantifies the differences between the connectivity paths of node \(i\) in layers \(\overline{p}\) and \(\overline{q}\)35, with the following equation:
$${D}_{i}\left(\overline{p},\overline{q}\right)=\min \frac{\sqrt{J\left({N}_{i}^{\overline{p}},{N}_{i}^{\overline{q}}\right)}+\sqrt{J\left({T}_{i}^{\overline{p}},{T}_{i}^{\overline{q}}\right)}}{2\sqrt{\log \left(2\right)}} \left(2\right)$$
Where \(J\) is the Jensen-Shannon (JS) divergence, which measures the distance between two probability distributions. Third, we calculated the difference between layers, \(D\left(\overline{p},\overline{q}\right)\), as the average of \({D}_{i}\left(\overline{p},\overline{q}\right)\) over all the nodes. With this definition, \(D\left(\overline{p},\overline{q}\right)=0\) indicates that layers \(\overline{p}\) and \(\overline{q}\) are identical, while \(D\left(\overline{p},\overline{q}\right)=1\) indicates that one of the layers is fully connected while the other is totally disconnected35.
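A minimal Python sketch of the node and layer differences follows. It evaluates the fraction in Eq. 2 for one pair of layers and rests on several assumptions that are not taken from the text: both layers share the same node set, unreachable nodes are counted in an extra slot of the distance distribution, an isolated node places all of its transition probability in an extra "no move" slot, and the JS divergence uses natural logarithms. All function names are hypothetical.

```python
import math
from collections import deque


def distance_distribution(adj, i):
    """Share of other nodes at distance 1..n-1 from i; last slot = unreachable."""
    n = len(adj)
    dist = {i: 0}
    queue = deque([i])
    while queue:  # breadth-first search from i
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    counts = [0.0] * n
    for node in adj:
        if node != i:
            counts[dist[node] - 1 if node in dist else n - 1] += 1
    return [c / (n - 1) for c in counts]


def transition_distribution(adj, i, nodes):
    """One-step random-walk probabilities from i; last slot = walker cannot move."""
    deg = len(adj[i])
    if deg == 0:
        return [0.0] * len(nodes) + [1.0]
    return [1 / deg if j in adj[i] else 0.0 for j in nodes] + [0.0]


def js_divergence(p, q):
    """Jensen-Shannon divergence with natural logarithms (maximum log 2)."""
    def kl(a, b):
        return sum(x * math.log(x / y) for x, y in zip(a, b) if x > 0)
    m = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def node_difference(adj_p, adj_q, i):
    """Fraction in Eq. 2 for node i and layers p and q."""
    nodes = sorted(adj_p)
    jn = js_divergence(distance_distribution(adj_p, i),
                       distance_distribution(adj_q, i))
    jt = js_divergence(transition_distribution(adj_p, i, nodes),
                       transition_distribution(adj_q, i, nodes))
    return (math.sqrt(max(jn, 0.0)) + math.sqrt(max(jt, 0.0))) / (
        2 * math.sqrt(math.log(2)))


def layer_difference(adj_p, adj_q):
    """Average node difference over all nodes."""
    return sum(node_difference(adj_p, adj_q, i) for i in adj_p) / len(adj_p)


full = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}}
empty = {"a": set(), "b": set(), "c": set()}
print(layer_difference(full, full), layer_difference(full, empty))
```

Under these assumptions the sketch reproduces the two limiting cases stated in the text: identical layers give a difference of zero, and a fully connected layer compared with a totally disconnected one gives a difference of one (up to floating-point error).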
Finally, we defined global diversity recursively as \(U\left(S\right)=\underset{{\overline{s}}_{i}\in S}{\max }\left\{U\left(S\setminus {\overline{s}}_{i}\right)+D\left({\overline{s}}_{i},S\setminus {\overline{s}}_{i}\right)\right\}\) for all \(S\in {S}^{\prime }\) with \(\left|S\right|\ge 2\), where \(\left|S\right|\) represents the cardinality of the set of layers35. This layer-reduction method is implemented through dynamic programming, which can be explained in two steps: first, we selected the smallest \(LD\left(\overline{p},\overline{q}\right)\) (layer difference) value and added it to the global diversity \(U\left(S\right)\); second, we removed the LD combinations that contribute least to the system diversity \(U\left(S\right)\).
To calculate the diversity of the participants’ attributes, we modified the adjacency matrix of the network, \({A\left(i,j\right)}^{o,\overline{p}}=1-{\delta \left({x}_{i},{x}_{j}\right)}^{o,\overline{p}}\), so that only the diverse collaborations for attribute \(o\) in layer \(\overline{p}\) are considered. The standardized diversity between layers, \(DB\), is given by:
$$DB=\frac{2\,U\left(S\right)}{\left|S\right|-1} \left(3\right)$$
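The recursion for \(U\left(S\right)\) and the standardization in Eq. 3 can be sketched with memoized dynamic programming. Two points are assumptions made for this sketch rather than statements from the text: the distance between a layer and a set of layers is taken as the smallest pairwise layer difference to any member of the set, and a single layer is given a global diversity of zero. The layer names and difference values are hypothetical.

```python
from functools import lru_cache


def global_diversity(layers, D):
    """U(S) for the full set of layers.

    layers: list of layer names.
    D: {frozenset({p, q}): layer difference in [0, 1]} for every pair.
    """
    def dist_to_set(s, rest):
        # Assumption: distance to a set = smallest pairwise difference.
        return min(D[frozenset((s, t))] for t in rest)

    @lru_cache(maxsize=None)
    def U(S):
        if len(S) < 2:  # assumption: one layer alone adds no diversity
            return 0.0
        return max(U(S - {s}) + dist_to_set(s, S - {s}) for s in S)

    return U(frozenset(layers))


def diversity_between(layers, D):
    """Standardized diversity between layers (Eq. 3)."""
    return 2 * global_diversity(layers, D) / (len(layers) - 1)


# Hypothetical pairwise layer differences for three activities.
layers = ["papers", "workshops", "meetings"]
D = {
    frozenset(("papers", "workshops")): 1.0,
    frozenset(("papers", "meetings")): 1.0,
    frozenset(("workshops", "meetings")): 0.0,
}
print(diversity_between(layers, D))  # 1.0
```

With six layers the recursion visits at most 64 subsets, so memoization keeps the computation trivial. If every pair of layers had a difference of one, the same code would return the upper bound of two.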
To compare the diversity within layers and between layers over time, we standardized the diversity between layers in two steps. First, we divided the results by the number of layers, because some activities were not performed in all the time windows. Second, so that both measures are maximized toward the same objective, we rescaled the diversity between layers (Eq. 3), since a balance between new and maintained collaborations is desired to ensure the long-term sustainability and growth of the SALURBAL network. With this standardization, the diversity between layers ranges from zero to two. A value close to one for a given attribute indicates that around half of the diverse collaborations are new and unique, while the other half are the same across layers (i.e., the same people collaborate in two or more project activities).
Assessing the diversity of the SALURBAL network over time. With the optimal temporal window defined, we calculated the diversity within and between layers for each temporal window. Then, we analyzed these results by conducting a multiple linear regression to determine which project activities are associated with greater network diversity. The dependent variable is the average of the diversity within and between layers, and the independent variables are binary variables, one per project activity, each equal to one if the activity took place in that time window and zero otherwise. Lastly, we identified the communities (groups of individuals that are strongly connected) for each temporal window using the Louvain community detection algorithm16. This algorithm detects clusters by optimizing modularity. The modularity quantifies the strength of a community by comparing the actual density of edges in a subgraph to the density one would expect to have in the subgraph if the vertices of the graph were attached regardless of community structure16. For each community, we formed a subgraph of the aggregated network of SALURBAL collaborations. Then, we calculated the diversity within layers in each community (subgraph). The aim is to understand whether communities are constituted by diverse participants or whether the greatest connectivity exists among participants with the same attributes.
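The community step can be illustrated with the Louvain implementation available in NetworkX (version 2.8 or later). The sketch below is an assumption-laden illustration rather than the study's pipeline: the share of within-community edges that join different attribute values is used as a simple stand-in for the diversity within layers, and the toy graph and country codes are hypothetical.

```python
import networkx as nx
from networkx.algorithms.community import louvain_communities


def community_diversity(G, attr, seed=0):
    """Louvain communities with a stand-in diversity score for each one.

    Returns a list of (sorted members, share of internal edges that join
    participants with different attribute values).
    """
    communities = louvain_communities(G, weight="weight", seed=seed)
    results = []
    for members in communities:
        sub = G.subgraph(members)  # collaborations inside the community
        edges = list(sub.edges())
        mixed = sum(attr[u] != attr[v] for u, v in edges)
        results.append((sorted(members), mixed / len(edges) if edges else 0.0))
    return results


# Toy aggregated network: two groups of four linked by a single collaboration.
G = nx.barbell_graph(4, 0)
country = {n: "CO" if n % 2 == 0 else "BR" for n in G}
for members, score in community_diversity(G, country):
    print(members, round(score, 2))
```

A score near zero for a community would suggest that its strongest connectivity lies among participants sharing the same attribute value, whereas a score near one would suggest a mixed community.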