To the best of our knowledge, this is the first study to assess the level of bioinformatics capacity in Tanzania. We found that the majority of the respondents were male (59.6%), had a master's degree (48.31%) and were in the age group 26-32 years (52.38%). The mean work experience of the respondents was 6.2 years, indicating a young group of scientists. The highest education level for most respondents (45.8%) was a master's degree, followed by a bachelor's degree. When asked to rate their seniority on a scale of 0-100, the respondents gave a mean rating of 39.1, further indicating that they perceive themselves as junior in bioinformatics practice. Only 21.4% were PhD holders; this is the pool of scientists that can mentor their early-career counterparts. Interestingly, the most common current area of specialization was molecular biology (21.4%), while only a few respondents (8.3%) described their research interest as entirely in genomics and bioinformatics, suggesting that molecular biologists are diversifying their careers into bioinformatics.
This survey showed that the infrastructure and human capacity for conducting bioinformatics-related research in Tanzania are underdeveloped. Specifically, 96.4% of the respondents perform bioinformatics analysis using personal computers or laptops, with only about 10% having access to advanced infrastructure such as high-performance computers, cloud computing and institutional servers. This severely limits the capacity to conduct bioinformatics-related research, which usually involves massive datasets and requires reliable, high computing capacity that personal computers alone cannot provide[14]. More than 67% of the respondents use the Windows operating system (OS), which does not support many genomics and bioinformatics analysis platforms, compared with only about 14.3% who use the Linux OS, which supports a broad range of bioinformatics analysis tools.
For most of the respondents, usage of standard bioinformatics analysis tools was also low; it therefore comes as no surprise that 66.7% of respondents had no bioinformatics-related publication at all. These findings align with the review by Lyantagaye (2013), who noted that bioinformatics research in Tanzania was still in its infancy, with a lack of investment and underdeveloped infrastructure. The review noted the presence of one modern laboratory at SUA, capable of generating molecular biology and genomics data, and the STM-1 SEACOM undersea fibre-optic cable that was expected to increase internet speed and bandwidth[2]. The situation is not unique to Tanzania. Karikari (2015) noted a low level of bioinformatics capacity in terms of personnel and infrastructure in Ghana, with frequent electrical power failures, unreliable internet connections and a lack of high-speed computing power being some of the significant infrastructural challenges[15]. In Africa, three countries are responsible for a large fraction of the bioinformatics output from the continent: South Africa, Kenya and Nigeria. H3ABioNet has, to a large extent, sought to reduce this disparity by empowering other African countries to participate in and contribute to bioinformatics[8, 16].
Bioinformatics is a multidisciplinary field that draws on mathematics, computer science, statistics and other disciplines. Statistics and programming play significant roles in building reproducible methods for biological discovery and validation, especially for complex, high-dimensional data such as those encountered in genomics. Assessing the respondents' knowledge and level of usage of statistics and programming was therefore essential. We found that only a quarter of respondents reported using a computer programming language and 17.9% use a database management system. The most used programming language was Python, used by 8 (9.5%) of the respondents, and the most used database management systems were Microsoft Access and MySQL. Both Python and MySQL are widely applied in bioinformatics[17]. However, a large proportion of respondents lack programming skills. Short training courses may help to improve the skills of these researchers. It was also evident that knowledge and usage of statistical packages are mostly centred on IBM's SPSS package. Many respondents also use R, whereas packages such as WinBUGS and SAS are rarely used by bioinformatics researchers in Tanzania.
Our respondents made heavy use of Microsoft Office products, in increasing order of Microsoft Word, Microsoft PowerPoint and Microsoft Excel. Only a few individuals made occasional use of Microsoft Access and Microsoft Outlook, again indicating limited use of the more advanced products.
Respondents said they could access many bioinformatics tools and resources, with PubMed, which they use to retrieve scientific literature, being the most popular. Other frequently used resources are GenBank and some sequence alignment tools, which is encouraging because it shows that users can access relevant and essential resources. The use of commercial products such as CLC Workbench (a QIAGEN platform for DNA, RNA and protein sequence data analysis) was limited, probably due to a shortage of funding.
More than half of the respondents reported one or more problems that they face in relation to bioinformatics practice in Tanzania. The largest share of respondents (26.2%) reported a lack of training and skills as a significant problem. Only a few respondents (2.4%) reported inadequate electrical power supply and lack of internet access as challenges. The reduced cost of internet connectivity and improved bandwidth have helped other African nations improve their bioinformatics infrastructure and capacity[18]. Tanzania has equally benefited from the bandwidth improvement, and this may be the reason that few respondents cited internet connectivity as a challenge. Capacity building through training and infrastructural support for bioinformatics research remain the major challenges, as noted in other African countries[4, 7, 15, 18].
Regarding the most commonly performed analyses, sequence alignment and phylogenetics were used by 67.9% and 50% of the respondents, respectively. Other methods of analysis, such as genome-wide association studies (GWAS), were less commonly used. Both the most and the least frequently used applications may need to be covered by training modules in long-term or short-term training.
In our study, the largest group of respondents, 40 (47.6%), reported learning bioinformatics at the bachelor's degree level, followed by 27 (32.1%) who learned it during master's training and only 18 (21.4%) during PhD training. Conferences and workshops also serve as essential sources of bioinformatics skills for some respondents (28.6%), while a small percentage (15.5%) used online resources to learn bioinformatics skills. The latter may have benefited from the opportunities provided by H3ABioNet[19] in addition to other training opportunities such as those used in other countries[20–22].
It is possible that most of the surveyed Tanzanian bioinformatics researchers were either trained abroad or learned bioinformatics through postgraduate research projects. Currently, no full bioinformatics or computational biology degree program exists in the country. Bioinformatics courses are offered as part of undergraduate and postgraduate degree programs at the University of Dar es Salaam (UDSM) and Sokoine University of Agriculture (SUA). Two undergraduate courses exist at UDSM according to the UDSM undergraduate prospectus 2018/2019, and seven postgraduate courses are offered at UDSM according to the 2019/2020 postgraduate prospectus. At SUA, three undergraduate and three postgraduate courses are offered (SUA prospectus 2014/15)[2]. It is therefore not surprising that the largest shares of respondents in this study, 16.7% and 14.3%, are from UDSM and SUA, respectively.
There is a long way to go, and an opportunity to fill the expertise gap observed in this survey. For a start, Muhimbili University of Health and Allied Sciences (MUHAS) is preparing to launch a Master of Science in Bioinformatics in collaboration with EANBiT (Eastern Africa Network for Bioinformatics Training) (http://eanbit.icipe.org/). EANBiT has developed a 2-year master's degree curriculum that has been used in training since 2017 and is expected to be adopted by MUHAS in the foreseeable future (personal communication)[23]. This will be important in establishing a critical mass of expertise in bioinformatics and computational biology in Tanzania. Eventually, it may help to attract grants, research projects and collaborations, as well as the development of the infrastructure necessary for research in the field.
In terms of curriculum development and the establishment of training, there are examples to learn from in other countries such as India and South Africa[18, 24]. In the early days of bioinformatics, the discipline was not embedded in undergraduate curricula in South Africa. To address the gap, students registered for postgraduate degrees in bioinformatics at South African universities had to start with short formal bioinformatics training before embarking on their studies. Later, the National Bioinformatics Network (NBN) developed joint courses, compulsory for NBN-funded students, that introduced them to a range of bioinformatics topics, programming and other technical skills[18]. In India, similar initiatives were undertaken by the Biotechnology Information System (BTIS) under the Department of Biotechnology (DBT), Government of India[22].
Equally, in Tanzania there is a need to develop relevant skills by extending undergraduate bioinformatics courses to other universities in the country that offer biomedical, life and computer science courses. This will expose students to the field early on and may spark their interest. It will also equip them with the basic knowledge and skills for postgraduate research and education specializing in bioinformatics[25]. In addition, we advocate the establishment of short programs for professionals who may not have the time to pursue a full-fledged degree. This can go hand in hand with existing programs and infrastructure, and can also be done in collaboration with other organizations in Tanzania, Africa and worldwide. EANBiT, for example, offers a residential training course on bioinformatics for East African students and early-career researchers (http://eanbit.icipe.org/content/2018-trainees). Other successful training models have been reported in Sudan[26].
In the era of digital technologies, bioinformatics capacity in Tanzania could greatly benefit from online learning, which should be prioritized. It is less costly, often self-paced and accessible to many people at the same time. Online learning may be more suitable for professionals who cannot spend time in physical classes. Although a multitude of online learning platforms for bioinformatics exist, relevant organizations and institutions have a critical role in developing appropriate curricula and mobilizing resources to facilitate the learning process and ensure that online learning is effective. Curating the vast range of online courses and resources and providing guidelines to learners are also essential.
Collaborative programs with hybrid virtual-physical models have become especially attractive recently, such as the 3-month Introduction to Bioinformatics (IBT) course offered by H3ABioNet (https://www.h3abionet.org/training/ibt). The annual course, which has been offered since 2016, attracted 364 enrolled participants hosted at 20 institutions across 10 African countries in its inaugural year[19]. In 2020, the course went fully online because of restrictions on physical meetings during the COVID-19 pandemic, but still had over 1000 participants distributed across 40 classrooms in Africa (H3ABioNet newsletter May 2020: https://spark.adobe.com/page/0OVCv7sPapYfa/). H3ABioNet has also hosted a 16S analysis course in a similar manner since 2019.
Bioinformatics and computational biology research is expensive to conduct. Establishing collaborations among relevant institutions and stakeholders in Tanzania and with external partners may help in developing the necessary infrastructure and conducting research. Collaboration between research institutions, academia and civil society organizations with similar objectives regarding bioinformatics research can catalyze the rapid growth of the field. The recent establishment of the Tanzania Society of Human Genetics (TSHG) (http://tshg.or.tz/) indicates both the need for and the interest in furthering this critical biological sub-discipline. Such collaboration will lead to the development of strong programs and improve competitiveness for funding. In addition to joining Pan-African and global networks, Tanzania needs to plan how it can improve and offer streamlined bioinformatics services. Initiatives of this nature have worked in other countries such as Australia[27, 28]. Until it attains full capacity in bioinformatics, Tanzania needs to work closely with existing bioinformatics networks to build its capacity through training. The H3ABioNet help desk can help African countries to quickly obtain the assistance needed to get going with bioinformatics tasks[29].
The Government has a pivotal role to play in supporting basic infrastructure for education and training as well as for research and application. The Government also plays a crucial role in promoting human capacity building in bioinformatics and computational biology by ensuring that graduates are recognized by the government scheme and get job opportunities. A collaborative approach will help to guarantee the sustainability of the initiatives, training, infrastructure and research activities. Tanzania can emulate examples from other countries where government funding has facilitated the growth of bioinformatics[18, 27, 28]. In South Africa, the leader of bioinformatics in Africa, the very early phases of bioinformatics at the South African National Bioinformatics Institute (SANBI) on the University of the Western Cape (UWC) campus were co-funded by the Government through South Africa's National Research Foundation (NRF)[18]. Tanzania and other African countries need to emulate the funding model of SANBI to improve bioinformatics skills and research in their institutions.
The respondents agreed to participate in the bioinformatics network and genomics initiative in Tanzania. The bioinformatics community needs to work with the Government to support a national forum that brings together bioinformaticians and genomics practitioners to discuss issues of common interest. Such a forum can build on existing platforms such as the TGN and the TSHG to facilitate joint meetings and promote the bioinformatics agenda. Similar national platforms have been shown to help build bioinformatics capacity in South Africa, India and Australia[18, 24, 27].