MapAffil is one of the highest-performing bibliographic datasets for mapping author affiliation strings to their respective cities and geocodes, and we used its most recent release of geoparsed PubMed affiliations to generate training and testing data for our NLP model.3 The MapAffil 2018 dataset is based on a snapshot of PubMed taken in December 2018 and maps all PubMed author affiliations within that snapshot to cities and their geocodes worldwide, along with extracted disciplines, inferred GRIDs, and assigned ORCIDs. The complete dataset was downloaded as a single tab-delimited, Latin-1-encoded TSV file (only the City column uses non-ASCII characters) and then converted to a Parquet file for our model, yielding approximately 52 million authorships. These authorships were then reduced to approximately 20 million unique affiliation texts, which served as the training and test data for our NLP geoparsing model.
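As a rough illustration of this conversion step, the following sketch reads the tab-delimited MapAffil file with pandas and writes it to Parquet; the file names and the affiliation column name are illustrative assumptions rather than the exact identifiers used in our pipeline.

```python
import pandas as pd

# Sketch of converting the MapAffil 2018 TSV release to Parquet; the file
# names and the affiliation column name are illustrative assumptions.
df = pd.read_csv(
    "mapaffil2018.tsv",
    sep="\t",
    encoding="latin-1",  # only the City column contains non-ASCII characters
    dtype=str,
)

# Parquet's columnar, compressed layout makes repeated scans of the
# ~52 million authorship rows much faster than re-parsing the TSV.
df.to_parquet("mapaffil2018.parquet", engine="pyarrow", index=False)

# Collapse authorships to unique affiliation strings for model training.
unique_affiliations = df["affiliation"].dropna().drop_duplicates()
```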
To clean our dataset, we took several measures to filter out "noisy" free-text affiliations that were particularly ambiguous (Fig. 1). First, we removed all affiliations that carried minimal information about academic institutions or geographical locations, rendering them impossible to geocode. To classify an affiliation as impossible to resolve, we used spaCy's named entity recognition (NER) to identify all ORGs (companies, agencies, institutions) and GPEs (geopolitical entities, i.e., countries, cities, states) in each free-text affiliation. We used spaCy's English transformer pipeline (en_core_web_trf) with a batch size of 8000, disabling the following components: "tok2vec", "tagger", "parser", "attribute_ruler", and "lemmatizer". Based on these spaCy outputs, affiliations with neither an ORG nor a GPE detected were removed from the full dataset. Affiliations in which the only ORG detected was "Department of…" or "Division of…" and no GPE was detected were removed as well, as these texts carried no identifying information for location inference (a minimal sketch of this NER filtering step is shown after Table 1). Next, affiliations with no country labeled in the MapAffil dataset were also removed. In addition, all affiliations denoted by the prefixes FROMPMC, FROMNIH, and FROMPAT were removed, as these were supplemented with data from PubMed Central, NIH grants, and Microsoft Academic Graph to compensate for affiliations missing from PubMed before 1988 or for non-first authors; these affiliation texts were found to be far less reliable in assigning affiliation data to specific authors.4 To address the issue of certain affiliations containing information for multiple authors, affiliations that exceeded 200 characters (about 7% of MapAffil affiliations) or contained semicolons were also excluded, as these were frequently observed to combine affiliation information for more than one author. Affiliations with incomplete city information were excluded from the training dataset but set aside for later use as a validation set for our model, since MapAffil failed to extract a city for a number of affiliation texts. Table 1 provides examples of each type of affiliation removed from the original dataset, illustrating why it was necessary to filter these out. Ultimately, our dataset was left with approximately 16 million "clean" affiliations.
Table 1
Examples of noisy affiliations that were removed from training and testing data
Removal criterion | Examples |
No ORG, No GPE | • Due to the number of contributing authors, the affiliations are provided in the Supplemental Material. • Authors' affiliations are listed at the end of the article. • Departments of Pediatrics. • Affiliation: unknown; Email: unknown. |
No country labeled by MapAffil | • Department of Psychology. • Division of Cardiology. • Editor. • Associate Professor. |
PMC/NIH/PAT prefix | • FROMPMC: From the Laboratories of The Rockefeller Institute for Medical Research • FROMNIH: BOSTON UNIVERSITY MEDICAL CAMPUS • FROMPAT: Russian Academy of Sciences |
Over 200 characters | • Resource for Biocomputing, Visualization, and Informatics, University of California, San Francisco, CA 94143, USA and National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA. (250 characters) • Department of Electrical Engineering, Stanford University, Stanford, CA 94035, USA, Department of Bioinformatics, Bina Technologies, Redwood City, CA 94065, USA, Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA, Mayo Clinics, Department of Health Sciences Research, Rochester, MN 55902, USA, Department of Statistics, Stanford University, Stanford, CA 94035, USA and Department of Health Research and Policy, Stanford University, Stanford, CA 94035, USA. (500 characters) |
Contains semicolons | • Investigation performed at The Carrell Clinic, Dallas, Texas, USA; Department of Orthopaedics, Washington University School of Medicine, St Louis, Missouri, USA; Department of Orthopaedics and Rehabilitation, Vanderbilt University Medical Center, Nashville, Tennessee, USA; and Reedsburg Area Medical Center, Reedsburg, Wisconsin, USA. • Novartis Institutes for Biomedical Research, Oncology Disease Area, Basel 4002, Switzerland; Cambridge, MA 02139, USA; and Emeryville, CA 94608, USA. |
Incomplete city labeling by MapAffil | • University of Minnesota, USA. (MapAffil “city”: MN, USA) • Department of Psychology, Florida State University. (MapAffil “city”: FL, USA) • Health Care Management Department, Wharton School, University of Pennsylvania. (MapAffil “city”: PA, USA) |
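Below is a minimal sketch of the NER-based filtering step described above, assuming a list of affiliation strings named `affiliations`; the helper function and its exact keep/drop rules are our illustrative reading of the criteria, not the production code.

```python
import spacy

# Load the English transformer pipeline and disable the components listed in
# the text; names absent from this pipeline are simply skipped.
nlp = spacy.load("en_core_web_trf")
for name in ["tok2vec", "tagger", "parser", "attribute_ruler", "lemmatizer"]:
    if name in nlp.pipe_names:
        nlp.disable_pipe(name)

def has_location_signal(doc):
    """Keep an affiliation only if NER finds a GPE, or an ORG that is not a
    bare 'Department of ...' / 'Division of ...' fragment."""
    orgs = [ent.text for ent in doc.ents if ent.label_ == "ORG"]
    gpes = [ent.text for ent in doc.ents if ent.label_ == "GPE"]
    if gpes:
        return True
    informative_orgs = [
        org for org in orgs
        if not org.lower().startswith(("department of", "departments of", "division of"))
    ]
    return bool(informative_orgs)

# `affiliations` is assumed to be an iterable of free-text affiliation strings.
keep_mask = [
    has_location_signal(doc)
    for doc in nlp.pipe(affiliations, batch_size=8000)
]
```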
The dataset was then preprocessed and split into training and testing sets, enabling the NLP model to predict the city/state/country for each free-text affiliation. First, the affiliation texts and their corresponding cities were extracted from a subset of the top 1,000,000 most common authorships in the clean dataset and stored in separate lists. The affiliation texts were converted into numerical features using Term Frequency-Inverse Document Frequency (TF-IDF) vectorization. TF-IDF is a numerical statistic that captures the importance of each word in a text relative to the entire dataset; this vectorization step down-weighted words that were common across all affiliations while up-weighting words that were more specific to a particular affiliation.5 Stop words (common words like "the," "and," etc.) were also removed during this process to reduce noise and improve model performance. In addition to TF-IDF, we experimented with the Bag of Words (BoW) feature engineering technique, which represents a document as an unordered collection of words while keeping track of word frequency.6 However, we observed that TF-IDF vectorization resulted in higher F1 scores across the various text classifiers, and we ultimately selected it as the final vectorizer for our geoinference NLP model. After each affiliation text was transformed into a numerical vector with TF-IDF, these vectors served as the input features for the model, and the list of corresponding cities (formatted as "city, state, country") served as the target labels that the model aimed to predict.
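The sketch below illustrates this featurization and split with scikit-learn; the variable names and the split ratio are illustrative assumptions rather than our exact configuration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split

# `affiliation_texts` and `city_labels` ("city, state, country") are assumed
# to be parallel lists extracted from the clean dataset; the names and the
# split ratio here are illustrative assumptions.
X_train_text, X_test_text, y_train, y_test = train_test_split(
    affiliation_texts, city_labels, test_size=0.1, random_state=42
)

# TF-IDF down-weights terms common to most affiliations (e.g. "university")
# and up-weights location-specific terms; English stop words are dropped.
vectorizer = TfidfVectorizer(stop_words="english")
X_train = vectorizer.fit_transform(X_train_text)  # fit on training data only
X_test = vectorizer.transform(X_test_text)
```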
The text classifier of choice for the model was LinearSVC, a Support Vector Machine known for its effectiveness in high-dimensional spaces and on multi-class data. LinearSVC is similar to SVC with the kernel parameter set to 'linear'; however, it is implemented with the liblinear library instead of libsvm, offering greater flexibility in selecting penalties and loss functions. This implementation also handles large amounts of data more efficiently, making it well suited for scaling to a substantial number of samples. LinearSVC was selected after extensive experimentation with other leading text classification algorithms widely used in natural language processing: Random Forest, an ensemble learning technique that builds numerous decision trees; Logistic Regression, a linear algorithm that models the relationship between the input features and the outcome by estimating probabilities with a sigmoid function; and Multinomial Naive Bayes, a probabilistic classification algorithm based on Bayes' theorem that is particularly suited to text classification tasks with discrete features. Table 2 presents the overall accuracy and F1 scores achieved by the different NLP text classifiers; for each classifier, the table includes results for the two types of feature engineering (TF-IDF and BoW). Figure 2 graphically represents the relative accuracies of these classifiers (LinearSVC, Random Forest, Logistic Regression, and Multinomial Naive Bayes).
Table 2
Overall accuracy and F1 scores achieved by different NLP algorithms and feature types
Classifier | Vectorizer | Accuracy | F1 | Training Size | Test Size |
Linear SVC | TF-IDF | 0.91 | 0.89 | 10,000 | 1,000 |
Random Forest | TF-IDF | 0.87 | 0.84 | 10,000 | 1,000 |
Logistic Regression | TF-IDF | 0.77 | 0.70 | 10,000 | 1,000 |
Multinomial Naive Bayes | TF-IDF | 0.44 | 0.38 | 10,000 | 1,000 |
Linear SVC | BoW | 0.91 | 0.88 | 10,000 | 1,000 |
Random Forest | BoW | 0.86 | 0.84 | 10,000 | 1,000 |
Logistic Regression | BoW | 0.86 | 0.83 | 10,000 | 1,000 |
Multinomial Naive Bayes | BoW | 0.56 | 0.47 | 10,000 | 1,000 |
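As a rough illustration of how this comparison could be run (not our exact benchmarking code), the sketch below fits each classifier with each vectorizer on a 10,000-affiliation training sample and scores it on a held-out 1,000; the weighted F1 averaging and the default hyperparameters are assumptions.

```python
from sklearn.base import clone
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# `train_texts`/`train_labels` (10,000 affiliations) and `test_texts`/`test_labels`
# (1,000 affiliations) are assumed to be sampled from the clean dataset.
classifiers = {
    "Linear SVC": LinearSVC(),
    "Random Forest": RandomForestClassifier(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Multinomial Naive Bayes": MultinomialNB(),
}
vectorizers = {
    "TF-IDF": TfidfVectorizer(stop_words="english"),
    "BoW": CountVectorizer(stop_words="english"),
}

for vec_name, vectorizer in vectorizers.items():
    for clf_name, clf in classifiers.items():
        model = make_pipeline(clone(vectorizer), clone(clf))
        model.fit(train_texts, train_labels)
        preds = model.predict(test_texts)
        acc = accuracy_score(test_labels, preds)
        # Weighted averaging is an assumption about how F1 was aggregated
        # across the many city classes.
        f1 = f1_score(test_labels, preds, average="weighted")
        print(f"{clf_name} + {vec_name}: accuracy={acc:.2f}, F1={f1:.2f}")
```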