We investigated the criteria for promotion to (full) professor used by various institutions across the globe. Between May 2016 and July 2021, we drew on the Global Young Academy membership and alumni network to collect documents describing promotion policies, including their criteria and procedures. This yielded 159 policies set by academic institutions themselves (‘institutional policies’) and 37 policies set by government agencies (‘national policies’), from a total of 55 countries. About 60% of these policies are from countries located outside Western Europe and North America, providing a globally diverse perspective in contrast to other studies [5–9]; altogether, our documents cover countries with a total population of 5.7 billion. With the number of researchers per million inhabitants ranging from 19 in Angola to 8,700 in South Korea, the coverage of our study amounts to over 9 million researchers (UNESCO world data bank, as of June 2026) [34, 35]. Our global coverage and the workflow used to analyse these documents are illustrated in Fig. 1 and the Methods.
A common feature of most documents describing promotion-assessment policies is the consideration of three major domains: research, teaching, and service. We surveyed the documents for specific criteria and mapped them to 18 categories in four groups (research outputs, career development, recognition, and service), with the particularly prevalent “research outputs” group further differentiated into 11 sub-categories. Specifically, we distinguish ‘quantitative’ methods, which apply metrics (e.g. bibliometrics) to measure and/or weigh a candidate’s achievements, from ‘qualitative’ methods, which are generally based on a qualitative description that may be ascertained by the evaluator and/or provided by the applicant.
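To make the coding step concrete, the following minimal sketch (in Python, under our own assumptions) turns a policy document into a row of presence/absence indicators. The criterion names are a few illustrative examples drawn from the text, not the full 18-category scheme, and the documents shown are hypothetical; averaging such rows over all documents yields criterion frequencies of the kind reported below.

```python
import pandas as pd

# Illustrative subset of criteria (not the study's full 18-category scheme).
CRITERIA = [
    "number of publications", "journal impact factor", "patents",  # quantitative research outputs
    "qualitative assessment of outputs",                            # qualitative research outputs
    "research funding", "awards", "teaching", "mentoring",
    "administrative roles", "service to society",
]

def code_document(mentioned_criteria):
    """Dichotomous coding: 1 if the policy document mentions the criterion, 0 otherwise."""
    return pd.Series({c: int(c in mentioned_criteria) for c in CRITERIA})

# Two hypothetical policies, each mentioning a handful of criteria.
coded = pd.DataFrame([
    code_document({"number of publications", "teaching", "mentoring", "awards"}),
    code_document({"journal impact factor", "research funding", "teaching"}),
])
print((coded.mean() * 100).round())  # share of documents mentioning each criterion (%)
```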
The structure and style of the policy documents varied substantially. Ministerial-level policies were often very general or brief, whereas documents describing institutional policies sometimes included an evaluation form with a detailed points system. We restricted our attention to the presence or absence of criteria, without considering their weight in the assessment process. Sometimes the guidelines were less detailed than the application or evaluation form. Many documents leave room for interpretation, which likely depends on the career stage and scholarly background of the assessors; our analysis is therefore based on our best understanding. Any additional criteria applied in practice but not explicitly mentioned in the documents are not captured by our analysis.
Unsurprisingly, all of the documents surveyed mentioned research output, with teaching (94%) and mentoring (72%) also featuring prominently (see Fig. 2). Moreover, 63% of the documents considered obtaining research funding, as well as criteria relating to service to the profession and recognition, such as administrative roles, awards, professional development, and service to society. Criteria related to career development were generally applied less commonly than those related to research outputs, recognition, and service. Quantitative assessment of research outputs was significantly more common (90%) than qualitative assessment (55%). The most popular quantitative indicators were the number of publications (63%), patents (48%), and the journal impact factor (44%), but their adoption was not universal.
Beyond measuring frequencies, we further explored the main trends in policy choices in terms of the co-occurrence of assessment criteria. To this end, we carried out a principal factor analysis on the data extracted from the surveyed documents. Since the criteria are coded as dichotomous variables (present/absent), the analysis was based on the matrix of empirical tetrachoric correlations between them. We obtained four factors explaining 64% of the total variance of the dataset (see Fig. 3, Extended data Fig. 2 and Methods section 5.3). Each factor synthesises a degree of association between the criteria, i.e. it corresponds to a family of assessment criteria that tend to appear in the same policy document. Strikingly, these four factors broadly agree with the conceptualisation of assessment criteria that we had reached by consensus ex ante (Methods section 4.3).
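As a rough illustration of this analysis step, the sketch below estimates pairwise tetrachoric correlations for 0/1 criterion indicators under a latent bivariate-normal model and extracts factors from the resulting correlation matrix by principal axis factoring. It is a minimal sketch under our own assumptions, not the study's actual pipeline: the toy data, the one-step communality estimates, and the absence of factor rotation are simplifications, and all names are illustrative.

```python
import numpy as np
from scipy.stats import norm, multivariate_normal
from scipy.optimize import brentq


def tetrachoric(x, y, eps=1e-6):
    """Tetrachoric correlation between two 0/1 vectors.

    Thresholds are fixed from the marginal proportions; rho is chosen so that the
    upper-quadrant probability of a standard bivariate normal matches the observed
    joint proportion P(x=1, y=1). Near-degenerate tables may need extra clipping.
    """
    x, y = np.asarray(x), np.asarray(y)
    p_x = np.clip(x.mean(), eps, 1 - eps)
    p_y = np.clip(y.mean(), eps, 1 - eps)
    p_11 = np.clip((x * y).mean(), eps, 1 - eps)
    h, k = norm.ppf(1 - p_x), norm.ppf(1 - p_y)  # latent thresholds

    def upper_quadrant(rho):
        # P(Z1 > h, Z2 > k) via inclusion-exclusion on the bivariate normal CDF
        cov = [[1.0, rho], [rho, 1.0]]
        return 1 - norm.cdf(h) - norm.cdf(k) + multivariate_normal.cdf([h, k], mean=[0, 0], cov=cov)

    return brentq(lambda r: upper_quadrant(r) - p_11, -0.999, 0.999)


def tetrachoric_matrix(X):
    """Pairwise tetrachoric correlations for an (n_documents, n_criteria) 0/1 matrix."""
    n = X.shape[1]
    R = np.eye(n)
    for i in range(n):
        for j in range(i + 1, n):
            R[i, j] = R[j, i] = tetrachoric(X[:, i], X[:, j])
    return R


def principal_factors(R, n_factors=4):
    """Principal axis factoring with one-step communality estimates (no rotation)."""
    R = R.copy()
    # squared multiple correlations as communality estimates on the diagonal
    np.fill_diagonal(R, 1 - 1 / np.diag(np.linalg.inv(R)))
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1][:n_factors]
    loadings = eigvecs[:, order] * np.sqrt(np.maximum(eigvals[order], 0))
    explained = np.maximum(eigvals[order], 0) / R.shape[0]  # share of total variance
    return loadings, explained


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = (rng.random((196, 18)) < 0.5).astype(int)  # toy stand-in for 196 coded policies
    loadings, explained = principal_factors(tetrachoric_matrix(X), n_factors=4)
    print("variance explained per factor:", np.round(explained, 2))
```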
Each criterion aligns with at least one of the four factors (listed in order of prominence), which can be interpreted as follows: (F1) “candidate quality”: overall candidate quality, mainly assessed using qualitative criteria and encompassing a wide range of skills and competencies, and particularly including various service roles to the profession or community as well as mentoring, teaching, administration, commercialisation, and consultancy; (F2) “cumulative output”: a researcher’s cumulative output, especially in terms of publications (irrespective of quality), and length of service; (F3) “impact metrics”: several metrics, mostly focusing on publications with regard to productivity, citations, order and role of authors, and journal impact factors, but also including patents, awards, and the ability to attract external funding; (F4) “career development”: the development of the candidate’s career, including experience abroad, identifiable contributions to publications, patents, professional titles, awards, presentations, service to the profession, as well as proficiency in relevant languages.
We found some notable differences between institutional and national policies. As illustrated in Fig. 4, national policies were more narrowly focused on research outputs, teaching, length of service and, to a smaller extent, mentoring. In contrast, institutional policies far more prominently included criteria related to careers and community, particularly those that benefit the institution by enhancing its reputation, bringing in money, or taking on administrative roles. While both institutional and national policies frequently resorted to metric-based criteria for assessing research output (69% of national policies and 76% of institutional policies exhibit at least one quantitative criterion), this appears to be more pronounced for national policies, which less frequently consider qualitative criteria and more frequently base judgement on journal impact factors. However, further analysis revealed that this is mainly a Global North trend, not shared by the Global South. In fact, all national policies include criteria that load onto the “impact metrics” factor, but never very heavily. Among institutional policies, by contrast, there is considerable heterogeneity: some rely heavily on criteria pertaining to the “impact metrics” factor, others hardly at all. For all other factors, we find broad distributions over the full range. Thus, while the promotion criteria tend to group into the characteristic families described by our four factors, policies differ markedly in the weight they give to each factor: a large proportion of policies are broadly similar, but some national or institutional policies take substantially different approaches from others.
We also analysed potential differences between the Global North and Global South [36, 37], and across per-capita income levels [38] (see Fig. 5). Most notably, institutions in the Global South more commonly adopt bibliometric approaches than those in the Global North, which more commonly adopt qualitative approaches for assessing research outputs. The contrast is particularly strong with regard to journal impact factors and the number of publications, as well as most of the qualitative criteria. For governmental policies, however, we see the opposite trend: the Global North favours quantity over quality and the Global South favours quality over quantity. The “career development” factor was more pronounced in the Global South, and the “candidate quality” factor in the Global North. Looking at per-capita income levels, we find that metrics-based assessment was most popular in upper-middle-income countries and least popular in high-income countries. The use of qualitative measures is generally less prominent than that of quantitative measures, but we see striking differences between the Global North and Global South, as well as across per-capita income levels. Qualitative measures are least frequently used in low- and lower-middle-income countries, whereas upper-middle-income and high-income countries show the same pattern as the Global South–North comparison: qualitative measures featured commonly in national policies (but not in institutional policies) in upper-middle-income countries, and in institutional policies (but not in national policies) in high-income countries.
Close to half of the policy documents (43%) are discipline-specific. Based on an OECD subject classification [39], we distinguished (1) ‘Natural Sciences’, (2) ‘Engineering and Technology’, (3) ‘Medicine and Health Sciences’, and (4) ‘Social Sciences and Humanities’, and categorised the remaining documents (57%) as ‘General’ (Extended data Fig. 1). There were broad similarities across all the disciplines, with some trends showing more clearly with regard to the four factors (see Fig. 6). For each factor and discipline we found broad distributions, which overlap substantially across disciplines, i.e. the differences within a discipline are generally larger than those between disciplines. On average, commercialisation features more frequently for Engineering and Technology, and mentoring less frequently. Collaboration showed up most often for the Natural Sciences, and least often for the Social Sciences and Humanities. The “candidate quality” factor was not as popular for the Social Sciences and Humanities as for the other disciplines, while “impact metrics” appear to be more prominent for the Natural Sciences and for Engineering and Technology. One might wonder whether this is related to these disciplines having a stronger affinity for numbers. However, the Natural Sciences also showed the most pronounced rejection of the metric approach, reflected in the bi-modal distribution visible in the violin plot. The “cumulative output” factor was most strongly matched for Engineering and Technology and least matched for the Social Sciences and Humanities. Beyond variation in scoring systems and in the emphasis placed on different types of contributions, disciplines also differed in the prevalent types of research outputs. As a consequence, journal impact factors were not relevant for the humanities, where publications are usually monographs, nor for many fields of engineering, where conference proceedings dominate.