2.1. Big Data
Laney (2001) introduced the most common definition of big data, based on a paradigm of three V’s: Volume, Velocity, and Variety. In terms of volume, it was estimated that by 2020, 1.7 megabytes of new information would be generated every second for every person in the world (Cohen, 2018). Velocity concerns the speed of data creation, which for many applications is even more critical than volume. Tick-by-tick data or near-real-time information allows companies to be much more agile than their competitors (McAfee & Brynjolfsson, 2012). The third V, variety, which is arguably the most interesting of the three, refers to the different natures of the data that big data comprises (Sicular, 2013). For example, big data comes in the form of messages, texts, images, updates, videos, web searches, financial transactions, emails, and posts on Facebook, Twitter, and other social networks. Several scholars have proposed adding two more V’s to the definition of big data: variability and virality. Variability refers to the contextualisation of the data, while virality refers to the requirement that big data grow exponentially (Giacalone & Scippacercola, 2016). While all three original V’s are growing, variety has become the most essential driver of big data investment. This trend is expected to continue as more firms seek to integrate additional sources of information. The potential of using more data from indirect sources that are not closely related to a company’s primary products can be enormous (Lee, 2018). These features give rise to two forms of big data storage: structured data and unstructured data. The data may also be measured on different scales or be qualitative. Thus, big data is not just a massive amount of data but a system for managing different types of data simultaneously (Giacalone & Scippacercola, 2016).
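To give a sense of scale for the volume estimate above, the per-second figure can be converted into a daily one. A short worked calculation, assuming decimal units (1 GB = 1,000 MB) and a day of 86,400 seconds:

```latex
\[
1.7\,\tfrac{\text{MB}}{\text{s}} \times 86\,400\,\tfrac{\text{s}}{\text{day}}
= 146\,880\,\tfrac{\text{MB}}{\text{day}}
\approx 147\,\tfrac{\text{GB}}{\text{day}} \ \text{per person}
\]
```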
2.2. Data-driven banking
Progressive regulation and technological developments, particularly in information technology and big data, have led to digital banking operations and virtual banking systems worldwide (McAfee & Brynjolfsson, 2012; OECD, 2020). Customers also expect service providers to deliver data-driven services in real time (Motammarri et al., 2017); this expectation keeps the market moving fast and sustains its dynamics, which ultimately supports the long-term digital growth of banking operations and better banking systems. However, regulation, transformation, technological advancement, and innovation will not deliver results unless incumbent institutions and new entrants adopt a data-driven approach geared to high financial performance and greater profits (Fosso Wamba et al., 2020; Glass & Callahan, 2014; Hale & Lopez, 2019; Hung et al., 2020; Mohiuddin et al., 2021). Data innovation increases market competition in different aspects of the banking industry (Jagtiani & Lemieux, 2018; Rabhi et al., 2019), opening diversified opportunities across the entire banking system. Data-driven approaches also help banks perform better in mergers and acquisitions (Zhu et al., 2020), predict the success of bank telemarketing (Moro et al., 2014), manage the banking supply chain (Gasser et al., 2017), and improve banking performance (Brynjolfsson et al., 2011). Big data affects banking in different ways, mainly through its core characteristics: volume, variety, and velocity (Adam et al., 2014; Feng & Shanthikumar, 2018; Gutierrez, 2017). Volume refers to the enormous quantities of data that banks try to connect to improve data-based decision-making. Banking data grows mainly from sources such as consumer loans, commercial loans, demand deposits, mortgage loans, teller automation, the credit desk, and ATM transaction files.
This growth also boosts customers' demand for increased competition, better products and services, new technology use, and technology diffusion, among other factors. In addition, banks have already migrated their data systems to digital database management systems (Bedeley & Iyer, 2014) because small databases cannot hold massive data volumes. Keeping up with big data volume allows banks to process their information faster and extract value from it, while avoiding potentially embarrassing scandals caused by a lack of data. Variety relates to banking operations through the maintenance of multiple types of banking data. Banks retain various transactional records and draw on internal and external sources of potential data, such as social networking data, business operations data, and other high-frequency data. Variety also refers to managing structured, semi-structured, and unstructured data (Bedeley & Iyer, 2014; Chen, 2019). Finally, velocity relates to banking operations through the continuous generation of data. Because banking is one of the most crucial industries worldwide, millions of customer transactions are recorded every minute (Skyrius et al., 2018). Traditionally, banks could only capture customers' financial behaviours related to the banking business. They could not obtain emotional or behavioural data on customers' hobbies, living habits, and consumption tendencies in social life, nor could they link such data with business data. With the rapid development of data innovations, the banking industry has gradually strengthened its connection with big data sources, screened out useful information, integrated multi-channel data, and enriched customer profiles to achieve sustainable operations. Sustainable operations in turn support highly secure banking, which leads to safe banking for all insiders and outsiders. These are crucial for social and economic development (Guimaraes & Sato, 1996; M. M. Hasan et al., 2019b; van der Gaast & Begg, 2012; Weinberg, 1998). As an operating model in the banking industry, big data is still at an exploratory stage. The analysis and processing of big data, especially unstructured data, still lack effective software and hardware support. Under these circumstances, commercial banks risk making the wrong big data technology choices: adopting technology that is too advanced, or lagging too far behind. The application and development of big data is a general trend. Still, premature large-scale investment, the selection of software and hardware unsuited to a bank's actual situation, or overly conservative inaction will adversely affect the development of commercial banks (Balachandran & Prasad, 2017; Shamim et al., 2019; Soltani Delgosha et al., 2020).
2.3. The present landscape of big data and business
The rapid development of connected mobile networks, the Internet of Things, and social networks has resulted in exponential growth of diversified data. Semi-structured and unstructured data source channels have become more complex, leading to the modern digital information era. As demand for data innovation increases daily, revenue from big data analytics software is also growing rapidly: it was USD 32.14 billion in 2011 and had almost doubled by 2018 (Liu, 2020). Worldwide revenue from big data and business analytics has also increased year on year. The global big data and business analytics market was valued at $168.8 billion in 2018 and is predicted to grow to nearly $274.3 billion by 2022, with a five-year compound annual growth rate (CAGR) of 13.2 per cent[1]. Global information is also snowballing, with hundreds of billions of pieces of data generated every day (M. M. Hasan, Popp, et al., 2020). These data capture the past and present internal operations and external activities of different industries. This daily flow of data provides numerous opportunities, such as communicating effectively at lower cost, using global information systems to work together from different places, making decisions, monitoring transactions, and implementing control measures. Global information systems also help people overcome differences in distance, time, language, and culture and cooperate effectively. Cooperation can be improved through groupware, group decision support systems, extranets, and electronic meeting facilities. For these reasons, the amount of information is growing rapidly worldwide (see Figure 1).
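The CAGR quoted above follows the standard compound-growth formula, (end / start)^(1 / years) - 1. A minimal Python sketch that applies it to the 2018 and 2022 market figures in the text; the function names are illustrative, not taken from any source:

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate: (end / start) ** (1 / years) - 1."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1 / years) - 1


def project(start_value: float, rate: float, years: int) -> float:
    """Value after `years` of compounding at a constant annual `rate`."""
    return start_value * (1 + rate) ** years


# Figures quoted in the text, in USD billion; 2018 to 2022 spans four years.
market_2018 = 168.8
market_2022_forecast = 274.3

rate = cagr(market_2018, market_2022_forecast, years=4)
print(f"Implied 2018-2022 CAGR: {rate:.1%}")

# Inverse check: grow the 2018 value at the quoted 13.2 per cent for four years.
print(f"168.8 at 13.2% for 4 years: {project(market_2018, 0.132, 4):.1f}")
```

Over the four-year 2018 to 2022 window the two endpoints imply roughly 12.9 per cent a year. The text does not state the base year of the "five-year" window behind the 13.2 per cent figure, so the two rates need not coincide exactly.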
According to Figure 1, the last decade was a boom decade for data innovation. The volume of global information was almost two zettabytes in 2010 and rose remarkably to 59 zettabytes in 2020. Data production has accelerated since 2019, and the volume is expected to grow more than sixfold in the coming decades. Moreover, according to Analytics Insight, 2019 was an important year in the big data landscape. Following the merger of Cloudera and Hortonworks at the beginning of the year, the use of big data rose globally, and organizations began to accept the importance of data operations and orchestration to their business success. The current value of the big data industry is US$189 billion, an increase of US$20 billion over 2018, and it is expected to keep increasing, reaching US$247 billion by 2022. The main big data trends are that data scientists, data officers, and managers will be the centre of attention; big data analytics will significantly affect investment and cloud-based operations; and machine learning will receive greater focus (Dialani, 2020). These trends are expected to attract significant investment: the value of data innovation is projected to reach nearly $18.6B in cognitive computing, almost $11.7B in application infrastructure, and about $7.5B in public safety and homeland security. Real-time data will also be considered a fundamental value proposition across use cases, segments, and solutions. In addition, market-leading companies are rapidly integrating data innovation technologies with IoT infrastructure[3]. Therefore, it is crucial to consider data management in all innovation decisions and corporate activities.
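The growth in Figure 1 and the market forecast above can both be expressed as compound annual growth rates over n years. A worked sketch, assuming a 2010 starting volume of exactly 2 zettabytes (the text says "almost two") and that the US$189 billion figure refers to 2019:

```latex
\[
\mathrm{CAGR} = \left(\frac{V_{\text{end}}}{V_{\text{start}}}\right)^{1/n} - 1
\]
\[
\text{Data volume, 2010 to 2020: } \left(\frac{59}{2}\right)^{1/10} - 1 = 29.5^{0.1} - 1 \approx 0.40
\]
\[
\text{Market value, 2019 to 2022: } \left(\frac{247}{189}\right)^{1/3} - 1 \approx 0.093
\]
```

Under these assumptions, data volume grew by roughly 40 per cent a year over the decade, while the market value is projected to grow by roughly 9 per cent a year.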
By sector, data innovation is most closely associated with the banking industry, discrete manufacturing, professional services, process manufacturing, and federal government activities (Adam et al., 2014; Hassani et al., 2018b; Hussain & Prieto, 2016; Leskovec et al., 2015; Parashar, 2020; Y. Sun et al., 2019; Yadegaridehkordi et al., 2020). According to Figure 2, the banking industry is the largest single source of big data and analytics revenue, accounting for nearly 14% of the total. Discrete manufacturing is the second largest sector, contributing almost 11.3%, while professional services and process manufacturing each account for 8.2%. Because banking is the largest sector, identifying the advantages and challenges of data innovation at every phase is very important.
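To translate the Figure 2 shares into money terms, each share can be multiplied by the total market size. A minimal Python sketch, assuming the shares apply to the USD 189.1 billion 2019 total given in footnote 4 and using the 13.9 per cent banking share from that footnote; the names below are illustrative:

```python
# Sector shares of 2019 big data and business analytics revenue (Figure 2,
# footnote 4). Assumption: all shares refer to the same USD 189.1 bn total.
TOTAL_2019_USD_BN = 189.1
SECTOR_SHARES = {
    "Banking": 0.139,
    "Discrete manufacturing": 0.113,
    "Professional services": 0.082,
    "Process manufacturing": 0.082,
}


def implied_revenue(total: float, shares: dict[str, float]) -> dict[str, float]:
    """Convert fractional shares into revenue in the total's units, to 0.1."""
    return {sector: round(total * share, 1) for sector, share in shares.items()}


if __name__ == "__main__":
    for sector, revenue in implied_revenue(TOTAL_2019_USD_BN, SECTOR_SHARES).items():
        print(f"{sector:<24} USD {revenue:>5.1f} bn")
```

On these assumptions, banking accounts for about USD 26.3 billion, compared with about USD 21.4 billion for discrete manufacturing.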
2.4. Present Literature
This study also presents a brief overview of the existing literature on big data and banking. Figure 3 highlights the issues most discussed in this study area. According to Figure 3, the most prominent keywords in big data and banking research are big data, banking, big data analytics, fraud detection, risk management, data mining, challenges, commercial banks, internet of things, and cloud computing, among others.
Many researchers have discussed the industrial use of big data over different periods; however, the discussion came to the fore mainly from 2014 onwards. These bibliometric findings show that most of the research was published after 2018, so we consider this a rapidly emerging topic of publication. Figure 4 presents the notable contributions on big data and banking.
More specifically, Table 1 presents the most-cited publications in this field; only publications with at least 10 citations in the Web of Science database are included. For Figure 4, in contrast, the minimum number of occurrences was set to 5.
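The occurrence threshold works as a minimum-count filter: a keyword enters the map only if it appears in at least that many publications, much as Table 1 keeps only publications with at least 10 citations. A minimal Python sketch of the keyword filter, assuming each keyword is counted once per publication after lower-casing; the keyword lists are hypothetical, and this illustrates the thresholding idea rather than VOSviewer's own implementation:

```python
from collections import Counter


def keywords_meeting_threshold(documents: list[list[str]], min_occurrences: int) -> dict[str, int]:
    """Count each keyword once per document and keep those at or above the threshold."""
    counts: Counter[str] = Counter()
    for keywords in documents:
        # Normalise case and whitespace; the set ensures one count per document.
        counts.update({kw.strip().lower() for kw in keywords})
    return {kw: n for kw, n in counts.most_common() if n >= min_occurrences}


# Hypothetical author-keyword lists, one per publication (illustrative only).
docs = [
    ["Big data", "Banking", "Fraud detection"],
    ["big data", "Risk management", "Banking"],
    ["Big Data", "Cloud computing"],
    ["Data mining", "Banking", "Big data"],
]

print(keywords_meeting_threshold(docs, min_occurrences=2))
```

Raising `min_occurrences` shrinks the set of retained keywords. This is why a threshold of 5 yields a smaller map than a threshold of 2.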
Table 1: Notable contributions in this study area
Publication | Journal Abbreviation | Citations
(Zhong et al., 2016) | COMPUT IND ENG | 186
(T. M. Choi et al., 2017) | IEEE T CYBERNETICS | 93
(Gepp et al., 2018) | J ACCOUNT LIT | 47
(Hassani et al., 2018a) | J MANAG ANAL | 37
(Herland et al., 2018) | J BIG DATA-GER | 26
(N. Mohamed & Al-Jaroodi, 2014) |  | 25
(N. Sun et al., 2014) | IBM J RES DEV | 25
(Muhammad et al., 2018) | INFORM SYST FRONT | 25
(Hariri et al., 2019) | J BIG DATA-GER | 24
(Sobolevsky et al., 2014) | IEEE INT CONGR BIG | 23
(Li et al., 2018) | ANN OPER RES | 23
(Pérez-Martín et al., 2018) | J BUS RES | 20
(Wenzel & Van Quaquebeke, 2018) | ORGAN RES METHODS | 18
(Goel et al., 2017) | IEEE INT CONF BIG DA | 17
(Cockcroft & Russell, 2018) | AUST ACCOUNT REV | 15
(Gai et al., 2016) |  | 14
(Evangelatos et al., 2016) | PUBLIC HEALTH GENOM | 12
(Bauder & Khoshgoftaar, 2018a) |  | 11
(Bakken & Reame, 2016) | ANNU REV NURS RES-SE | 11
(Bauder & Khoshgoftaar, 2018b) | HEALTH INF SCI SYST | 11
(Bauder et al., 2018) | PROC INT C TOOLS ART | 10
(Calvard & Jeske, 2018) | INT J INFORM MANAGE | 10
(Park et al., 2019) | J RETAIL CONSUM SERV | 10
(M. M. Hasan, Popp, et al., 2020) | J BIG DATA-GER | 10
Source: Authors’ compilation (collected from the WoS database)
Footnote:
[1] https://www.statista.com/statistics/551501/worldwide-big-data-business-analytics-revenue/
[2] The total amount of data created, captured, copied, and consumed in the world is forecast to increase rapidly, reaching 59 zettabytes in 2020. The rapid development of digitalization contributes to the ever-growing global datasphere.
[3] https://www.globenewswire.com/news-release/2020/03/18/2002786/0/en/Global-Big-Data-Market-Insights-2020-2025-Leading-Companies-Solutions-Use-Cases-Business-Cases-Infrastructure-Technology-Integration-Industry-Verticals-Regions-and-Countries.html
[4] Note: The statistic shows the leading industries based on their share of the global big data and analytics market in 2019. That year, banking was expected to account for 13.9 percent of big data and business analytics revenues. In total, the market was forecast to grow to 189.1 billion U.S. dollars in revenue that year (https://www.statista.com/statistics/616225/worldwide-big-data-business-analytics-revenue/).
[5] Note: VOSviewer was used to analyse the bibliometric data. With the minimum number of occurrences of a keyword set to 2, 51 of the 466 keywords meet the threshold.
[6] Note: VOSviewer was used to analyse the bibliometric data. With the minimum number of occurrences set to 5, 38 items meet the threshold.