Work on distributed Resource Description Framework (RDF) data addresses the management of massive, growing RDF datasets. To make this large volume usable, the RDF data is partitioned into small parts called fragments, which are then allocated across the distributed database environment. The focus is usually on reducing the communication cost of query processing tasks, while maintaining data integrity and a good approximation ratio under frequent external access patterns. Attention is also given to balancing the fragments and allocating them to different sites [34].
A heuristic fragmentation approach has been proposed to reduce the transmission cost (TC) of queries in a distributed environment. At the initial stage, fragmentation is based on a cost-effective model in the context of the relational model, and at a later stage on DDBS design. Three replication-based allocation scenarios are distinguished: the mixed replication-based data allocation scenario (MAS), the full replication-based data allocation scenario (FAS), and the non-replication data allocation scenario (NAS) [3].
The modified Bond Energy Algorithm (BEA) is a hierarchical process that creates vertical fragments and allocates them to geographical sites across the network. The algorithm uses attribute affinity to generate clusters of attributes, to calculate cluster allocation costs, and to decide the appropriate allocation site for each cluster. Attributes that are accessed together by the same query are placed in one fragment [35].
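As a concrete illustration, the column-ordering step of the classical BEA can be sketched as follows; the affinity matrix values are invented for the example, and the later step of splitting the ordered matrix into fragments is omitted.

```python
# Sketch of the Bond Energy Algorithm (BEA) column-ordering step, assuming a
# symmetric attribute-affinity matrix AA where AA[i][j] counts how often
# attributes i and j are accessed together by the same query.

def bond(AA, x, y):
    # bond(Ax, Ay) = sum over all attributes z of AA[z][x] * AA[z][y]
    return sum(AA[z][x] * AA[z][y] for z in range(len(AA)))

def contribution(AA, i, k, j):
    # Gain of placing attribute k between i and j; -1 denotes the
    # imaginary boundary attribute, whose bond with anything is 0.
    b = lambda x, y: 0 if (x == -1 or y == -1) else bond(AA, x, y)
    return 2 * b(i, k) + 2 * b(k, j) - 2 * b(i, j)

def bea_order(AA):
    """Return a permutation of attribute indices that places
    strongly-affine attributes next to each other."""
    n = len(AA)
    order = [0, 1]                       # seed with the first two attributes
    for k in range(2, n):
        best_pos, best_cont = 0, float("-inf")
        for pos in range(len(order) + 1):
            left = order[pos - 1] if pos > 0 else -1
            right = order[pos] if pos < len(order) else -1
            c = contribution(AA, left, k, right)
            if c > best_cont:
                best_cont, best_pos = c, pos
        order.insert(best_pos, k)
    return order

# Example: attributes 0 and 2 are frequently co-accessed, as are 1 and 3.
AA = [[45,  0, 45,  0],
      [ 0, 80,  5, 75],
      [45,  5, 53,  3],
      [ 0, 75,  3, 78]]
print(bea_order(AA))  # → [0, 2, 1, 3]: co-accessed attributes end up adjacent
```

A vertical fragmentation would then cut this ordering into clusters, here {0, 2} and {1, 3}, each becoming one fragment.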
A study reviewed and compared the existing algorithms from a design perspective to identify their strengths and weaknesses, with the aim of presenting an effective design for distributing data fragments in a distributed environment [18].
A non-redundant dynamic fragment allocation technique, based on the changing access patterns at different sites, has been proposed to improve performance. Fragment reallocation depends on the data volume accessed on each fragment within a defined time constraint and threshold value. The technique changes the reallocation strategy by modifying the read and write data volume factors and introduces a threshold time volume and a Distance Constraints Algorithm. When more than one site requests a fragment, the write data volume is considered for the reallocation decision. This improves the overall performance of the distributed system [28, 45].
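A minimal sketch of this kind of threshold-based dynamic reallocation is given below. It is not the exact algorithm of [28, 45]: the class, parameter names, and the migrate-to-heaviest-writer rule are illustrative assumptions.

```python
# Sketch of threshold-based dynamic reallocation: within a time window,
# track per-site write volumes for a fragment and migrate it to the site
# with the highest write volume once that volume crosses a threshold.

class FragmentTracker:
    def __init__(self, fragment_id, owner, threshold):
        self.fragment_id = fragment_id
        self.owner = owner              # site currently storing the fragment
        self.threshold = threshold      # data-volume threshold for migration
        self.write_volume = {}          # site -> volume written in this window

    def record_write(self, site, volume):
        self.write_volume[site] = self.write_volume.get(site, 0) + volume

    def end_of_window(self):
        """Called when the time constraint expires: decide reallocation."""
        if not self.write_volume:
            return self.owner
        site, vol = max(self.write_volume.items(), key=lambda kv: kv[1])
        if site != self.owner and vol > self.threshold:
            self.owner = site           # migrate fragment to heaviest writer
        self.write_volume.clear()       # start a fresh observation window
        return self.owner

t = FragmentTracker("F1", owner="S1", threshold=100)
t.record_write("S2", 80)
t.record_write("S2", 50)
t.record_write("S1", 40)
print(t.end_of_window())  # → S2 (130 units written, above threshold)
```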
The primary concerns of distributed database system design are fragmentation of the relations (in relational databases) or classes (in object-oriented databases), allocation and replication of the fragments to the different sites of the distributed system, and local optimization at each site. Fragmentation is a design technique that divides a single relation or class of a database into two or more partitions such that combining the partitions reconstructs the original database without any loss of information. This reduces the amount of irrelevant data accessed by the applications of the database, and thus the number of disk accesses. Fragmentation can be horizontal, vertical, or mixed/hybrid [9].
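The two basic fragmentation forms can be illustrated on a small in-memory relation; the predicate and attribute groups below are hypothetical design choices, and the final checks demonstrate the lossless-reconstruction property.

```python
# Illustrative sketch of horizontal vs. vertical fragmentation on an
# in-memory relation, represented as a list of dicts.

employees = [
    {"id": 1, "name": "Ada",  "dept": "EU", "salary": 5000},
    {"id": 2, "name": "Bob",  "dept": "US", "salary": 4500},
    {"id": 3, "name": "Cleo", "dept": "EU", "salary": 5200},
]

# Horizontal fragmentation: split rows by a minterm predicate (here: dept).
frag_eu = [r for r in employees if r["dept"] == "EU"]
frag_us = [r for r in employees if r["dept"] == "US"]

# Vertical fragmentation: split columns; the key "id" is repeated in every
# fragment so that the original relation can be rebuilt by a join.
frag_public  = [{"id": r["id"], "name": r["name"], "dept": r["dept"]}
                for r in employees]
frag_private = [{"id": r["id"], "salary": r["salary"]} for r in employees]

# Lossless reconstruction: union of horizontal fragments,
# join of vertical fragments on "id".
assert sorted(frag_eu + frag_us, key=lambda r: r["id"]) == employees
rebuilt = [{**a, **b} for a in frag_public for b in frag_private
           if a["id"] == b["id"]]
assert rebuilt == employees
```

A mixed/hybrid scheme would simply apply one form to the fragments produced by the other, e.g. vertically splitting each horizontal fragment.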
A hybrid optimized model uses information on the type and frequency of queries to fragment data both horizontally and vertically, and is based on a supervised machine-learning approach that produces non-overlapping fragments. The fragments are maintained by an archiving process rather than by deletion. They are used to facilitate index-based search operations, so database tables are partitioned horizontally and vertically [32].
Two algorithms, Modified Create Read Update Delete (MCRUD) and Matrix-based Fragmentation (MMF), have been proposed for efficient partitioning of large databases without query statistics. Earlier partitioning approaches were based on the type and frequency of queries, i.e. observed or experimental data, and are therefore unsuitable at the initial design stage of a distributed database, when query statistics are not yet available. An optimal fragmentation technique is instead proposed to partition the global relations of a distributed database when no data access statistics and no query execution frequencies are available: in this case MMF partitions the relations of the distributed database, while MCRUD takes the fragmentation decision without using empirical data [27].
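A hypothetical sketch of an MCRUD-style decision is shown below: without query statistics, the Create/Read/Update/Delete operations that each site's applications are expected to perform on a predicate are weighted, and the corresponding horizontal fragment is placed at the highest-scoring site. The weights and matrix layout are illustrative assumptions, not taken from [27].

```python
# MCRUD-style placement sketch: score each (predicate, site) pair by the
# weighted CRUD operations expected there, then allocate the fragment
# defined by the predicate to its best-scoring site.

WEIGHTS = {"C": 1.0, "R": 0.5, "U": 1.5, "D": 2.0}  # illustrative weights

# mcrud[predicate][site] = operations the site's applications are expected
# to perform on the tuples selected by the predicate.
mcrud = {
    "dept = 'EU'": {"S1": "CRUD", "S2": "R"},
    "dept = 'US'": {"S1": "R",    "S2": "CRU"},
}

def allocate(mcrud):
    placement = {}
    for predicate, sites in mcrud.items():
        score = {s: sum(WEIGHTS[op] for op in ops) for s, ops in sites.items()}
        placement[predicate] = max(score, key=score.get)
    return placement

print(allocate(mcrud))
# → {"dept = 'EU'": 'S1', "dept = 'US'": 'S2'}
```

The point of the technique is that these expectations come from requirement analysis, so the decision can be made before any workload has been observed.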
Work on different replication strategies in MANETs, mobile databases, distributed databases, cellular networks, etc. has also been surveyed. It discusses the replica control protocols ROWA (read-one-write-all), ROWA-A (ROWA-Available), and quorum-based protocols. ROWA serves a read request from the replica site nearest to the location where the request occurs, and replicates every change to all sites. ROWA-A behaves like ROWA for read operations but replicates changes only to the currently available replica copies, ignoring any replication failure. ROWA-A thus maintains the availability of the data but can compromise its correctness: after a failure, users may work with a stale value, i.e. an incorrect or out-of-date replica copy. Quorum-based replication updates only a subset of the replicas rather than replicating the changes to all of them, and helps maintain data consistency [36].
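The quorum idea can be sketched concretely: with N replicas, a read quorum R and a write quorum W chosen so that R + W > N (and 2W > N for write-write conflicts), every read quorum intersects the most recent write quorum, so version numbers identify the freshest copy. The parameter values below are illustrative.

```python
# Minimal quorum-based replica control sketch.

import random

class QuorumStore:
    def __init__(self, n=5, r=3, w=3):
        assert r + w > n and 2 * w > n, "quorums must overlap"
        self.n, self.r, self.w = n, r, w
        self.replicas = [{"version": 0, "value": None} for _ in range(n)]

    def write(self, value):
        # Update only W replicas, tagged with a new version number.
        version = max(rep["version"] for rep in self.replicas) + 1
        for i in random.sample(range(self.n), self.w):
            self.replicas[i] = {"version": version, "value": value}

    def read(self):
        # Contact R replicas; quorum overlap guarantees the newest
        # version is among them.
        picked = random.sample(range(self.n), self.r)
        newest = max(picked, key=lambda i: self.replicas[i]["version"])
        return self.replicas[newest]["value"]

store = QuorumStore()
store.write("x = 1")
store.write("x = 2")
print(store.read())  # → x = 2 (guaranteed by quorum intersection)
```

With R = 1 and W = N this degenerates to ROWA; quorum protocols trade some read cost for cheaper, failure-tolerant writes.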
An integrated approach for DDBMS design has been proposed, covering data fragmentation, clustering of network sites, and allocation of fragments. The work addresses the problems of fragmentation, redundancy in data allocation, and the redistribution problem arising from complexity, while maintaining data availability and consistency [27].
Inconsistency issues faced by mobile users accessing the database while moving between activity centers have also been highlighted. A 5-cube structure with a nearest-neighbors propagation distribution protocol is proposed to build a distributed database system usable by mobile users. It ensures consistent data for all mobile users/sites by dynamically replicating changes from the transactional node to all its adjoining sites [44].
A new dynamic reallocation approach for a given fragment, based on an Update Matrix (UM) and a Distance Cost Matrix (DM), has been proposed. It works on the basis of the changing state of the database system, assuming that fragments are initially allocated to network sites according to the access frequency values of the database data items. Reallocation of data fragments to remote sites is planned using communication and update cost values: each fragment has an update cost value, the fragment with the maximum update cost value is considered for reallocation, and a candidate site is chosen to store it so as to minimize the communication cost. A UM entry is the value obtained after an update query is issued at a particular site for the manipulated fragment. In this approach, when the same query is applied at more than one site, the queries are treated as different from each other, each with its own frequency value [1].
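The selection step can be sketched as follows; the matrix values are invented, and the cost model (update volume times inter-site distance) is an illustrative reading of the approach, not the exact formulation of [1].

```python
# Choosing a reallocation target from an Update Matrix UM and a Distance
# Cost Matrix DM: pick the fragment with the highest total update cost,
# then the site that minimizes the distance-weighted communication cost
# of serving its updates.

UM = [  # UM[site][fragment] = update volume issued at that site
    [10,  2, 0],
    [ 1, 25, 3],
    [ 0,  4, 8],
]
DM = [  # symmetric inter-site distance costs
    [0, 5, 9],
    [5, 0, 4],
    [9, 4, 0],
]

n_sites, n_frags = len(UM), len(UM[0])

# Fragment with the maximum total update cost is the reallocation candidate.
totals = [sum(UM[s][f] for s in range(n_sites)) for f in range(n_frags)]
frag = totals.index(max(totals))

# For each candidate site s: cost = sum over sites u of UM[u][frag] * DM[u][s].
costs = [sum(UM[u][frag] * DM[u][s] for u in range(n_sites))
         for s in range(n_sites)]
target = costs.index(min(costs))

print(frag, target)  # → 1 1  (fragment F1 is placed at site S1)
```

Here fragment F1 is updated most heavily from site S1, so storing it there minimizes the communication cost of propagating its updates.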
An algorithm called Simulated Annealing with Genetic Algorithm (SAGA) has been used for optimal allocation of fragments in a distributed environment. Here, the allocation of data depends on the access patterns for the fragments, and the focus is on reducing the allocation cost incurred when data fragments move from one site to another [2].
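A simplified sketch of the simulated-annealing half of such an approach is given below; the genetic-algorithm component of SAGA [2] is omitted, and the cost model (access frequency times inter-site distance) is an illustrative assumption.

```python
# Simulated annealing for fragment allocation: repeatedly move a random
# fragment to a random site, accepting worse allocations with a probability
# that shrinks as the temperature cools.

import math
import random

random.seed(7)

SITES = 3
ACCESS = [  # ACCESS[site][fragment] = access frequency
    [9, 0, 1],
    [0, 8, 1],
    [1, 1, 7],
]
DIST = [[0, 4, 6], [4, 0, 2], [6, 2, 0]]  # inter-site communication cost

def cost(alloc):
    # Total cost of serving every access from the fragment's current site.
    return sum(ACCESS[s][f] * DIST[s][alloc[f]]
               for s in range(SITES) for f in range(len(alloc)))

def anneal(n_frags, temp=10.0, cooling=0.95, steps=500):
    alloc = [random.randrange(SITES) for _ in range(n_frags)]
    cur = cost(alloc)
    best, best_cost = list(alloc), cur
    for _ in range(steps):
        f = random.randrange(n_frags)
        old = alloc[f]
        alloc[f] = random.randrange(SITES)
        new = cost(alloc)
        # Accept worse moves with probability exp(-(new - cur) / temp).
        if new <= cur or random.random() < math.exp((cur - new) / temp):
            cur = new
        else:
            alloc[f] = old              # reject the move
        if cur < best_cost:
            best, best_cost = list(alloc), cur
        temp *= cooling
    return best, best_cost

print(anneal(3))  # each fragment gravitates to its heaviest-access site
```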
A decentralized approach for dynamic table fragmentation and allocation in distributed database systems has been proposed. It observes and monitors the sites' access patterns to tables and performs refragmentation, replication, and reallocation based on recent access history, aiming to maximize the number of local accesses relative to accesses from remote sites [26].
A technique called Attribute Level Precedence (ALP) partitions global schema/database relations at both the initial and later stages when data access statistics and query execution frequencies are unavailable. ALP can take fragmentation decisions in advance at the initial stage, using knowledge gathered during the requirement analysis phase, without empirical data statistics. ALP is a table that drives horizontal fragmentation of a relation based on the importance of an attribute at a network site [35, 45].
The problem of fragmenting tables so that data is accessed locally has been studied before. It is also related to some of the research in distributed file systems [20].
One important difference between distributed file systems and distributed database systems is the typical granularity of data under consideration (files vs. tuples) and the need for a fragmentation attribute that can be used for partitioning in distributed database systems.
Fragmentation is tightly coupled with fragment allocation. There are methods that perform only fragmentation [5, 37, 39, 49, 50] and methods that perform only allocation of predefined fragments [6, 7, 12, 15, 19, 29, 46]. Some methods integrate both tasks [14, 16, 24, 25, 38, 40, 43]. Replication, however, is typically handled as a separate task [10, 13, 22, 30, 31, 48], although some methods take an integral view of fragmentation, allocation, and replication [16, 40, 43]. Dynamic replication algorithms [10, 22, 30, 31, 48] can optimize for different measures, but the authors argue that refragmentation and reallocation must be considered as alternatives to replication, and DYFRAM chooses among all these options when optimizing for communication costs. The DYFRAM replication scheme is somewhat similar to that of DIBAS [16], but DYFRAM also allows remote reads and writes to the master replica, whereas DIBAS always uses replication for reads and does not allow remote writes to the master replica. This operation shipping is important, since analyses [13] of replication vs. remote reads and writes conclude that the replication costs may in some cases be higher than the gain from local data access. A key difference between DIBAS and DYFRAM is that DIBAS is a static method where replication is based on offline analysis of database accesses, while DYFRAM is dynamic and performs replication online as the workload changes.
Another important categorization of fragmentation, allocation and replication methods is whether they are static or dynamic. Static methods analyze and optimize for an expected database workload. This workload is typically a set of database queries gathered from the live system, but it could also include inserts and updates. Some methods also use more particular information on the data in addition to the query set [39]. This information has to be provided by the user, and is not available in a fully automated system. A form of static method is the design advisor [50] which suggests possible actions to a database administrator.
The static methods are used at major database reconfigurations. Some approaches, such as evolutionary algorithms for fragment allocation [6, 15], lend themselves easily to the static setting.
Static methods look at a set of queries or operations, although it can be argued that the workload should be viewed as a sequence of operations, not as a set [4]. Dynamic methods continuously monitor the database and adapt to the workload as it currently is, and thus view a sequence of operations. Dynamic methods are part of the trend towards fully automatic tuning [47], which has become a popular research direction. Recently, work has appeared aiming at integrating vertical and physical partitioning while also taking other physical design features, such as indices and materialized views, into consideration [5]. Adaptive indexing [4, 11] aims to create indices dynamically when their cost can be amortized over a long sequence of read operations, and to drop them when there is a long sequence of write operations that would suffer from having to update both base tables and indices. In adaptive data placement, the focus has been either on load balancing by data balancing [14, 24] or on query analysis [25].
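The adaptive-indexing policy just described can be sketched as a simple run-length heuristic; the class, thresholds, and decision rule below are illustrative assumptions, not the mechanism of [4, 11].

```python
# Toy adaptive-indexing policy: build an index on a column only once a run
# of consecutive reads would amortize its build cost, and drop it again
# under a write-heavy run that would make index maintenance too costly.

class AdaptiveIndexPolicy:
    def __init__(self, build_cost=10, drop_after=5):
        self.build_cost = build_cost    # reads needed to amortize the build
        self.drop_after = drop_after    # consecutive writes before dropping
        self.indexed = False
        self.read_run = 0
        self.write_run = 0

    def on_read(self):
        self.read_run += 1
        self.write_run = 0
        if not self.indexed and self.read_run >= self.build_cost:
            self.indexed = True         # long read run amortizes the index
        return self.indexed

    def on_write(self):
        self.write_run += 1
        self.read_run = 0
        if self.indexed and self.write_run >= self.drop_after:
            self.indexed = False        # maintenance no longer pays off
        return self.indexed

p = AdaptiveIndexPolicy()
for _ in range(10):
    p.on_read()
print(p.indexed)  # → True: the read run justified building the index
for _ in range(5):
    p.on_write()
print(p.indexed)  # → False: a write run made the index too costly
```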
Closest to modern approaches may be the work of Brunstrom et al. [12], who studied dynamic data allocation in a system with changing workloads. Their approach is based on predefined fragments that are periodically considered for reallocation based on the number of accesses to each fragment.
A third aspect is how the methods deal with distribution. The method can either be centralized, which means that a central site gathers information and decides on the fragmentation, allocation or replication, or it can be decentralized, delegating the decisions to each site. Some methods use a weak form of decentralization where sites are organized in groups, and each group chooses a coordinator site that is charged with making decisions for the whole group [22, 30].
In DYFRAM, fragmentation, allocation and replication decisions are fully decentralized. Each site decides over its own fragments, and decisions are made on the fly based on current operations and recent history of local reads and writes.
Mariposa [40, 41] is a notable exception to the traditional, manually fragmented systems. It provides refragmentation, reallocation and replication based on a bidding protocol: a Mariposa site sells its data to the highest bidder in a bidding process where sites may buy data to execute queries locally, or pay less to access it remotely at the cost of larger access times, thereby optimizing for the queries that have the budget to buy the most data. A DYFRAM site, in contrast, will split off, reallocate or replicate a fragment if doing so optimizes access to that fragment, seen from the fragment's viewpoint. This is performed during query execution as well, not only as part of query planning, as is the case in Mariposa [23].
Finally, the general fragmentation methods are summarized in figure (1), although the figure shows only the horizontal, vertical, and general mixed methods. Dynamic and other approaches usually use mixed methods in combination with a dynamic or other algorithm that changes the fragments depending on time or requests.