3.1. General idea and framework
Federated learning allows data nodes to perform multiple rounds of local model training and then upload only the resulting local models to a central node for parameter aggregation, thus avoiding the transmission of raw data across nodes and protecting data privacy to a certain extent. The core idea of the FedAvg algorithm is to transmit intermediate information, such as model parameters, between nodes in place of the original data [25]. However, this intermediate information is often a "refinement" of the knowledge contained in the original data, so a risk of privacy leakage remains when it is exposed to adversaries. In this paper, privacy leakage is divided into two categories: (i) privacy leakage caused by local information exposure, and (ii) privacy leakage caused by global information exposure.
This paper proposes the following ideas to resist the two types of privacy leakage risks. (i) An adversary may reconstruct a client's local dataset from the data the client uploads in each round. We therefore use secure multi-party computation to hide the data uploaded by each client while still allowing the server to combine the uploads into the correct aggregation result, thereby preventing the leakage of local information. (ii) Since the amount of information in data can only decrease under computation or processing, once the adversary can no longer steal the individual data uploaded by a user, the information closest to the original data that it can obtain is the aggregated model of each round. Based on the idea of local differential privacy, each client therefore adds a calibrated perturbation to the model obtained from local training and uploads the perturbed model to the server, so that each round of aggregation satisfies differential privacy: whether or not any single sample of a client participates in training, the distribution of the aggregated global model does not change significantly [26]. This prevents the aggregated model from being exploited by adversaries.
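To make idea (ii) concrete, the sketch below shows one standard way to realize such a perturbation: clipping the local update to the L2 bound \(C\) and adding Gaussian noise. This is a minimal illustration, not necessarily the paper's exact mechanism; the helper name `perturb_update` and the noise multiplier `sigma` (which would be calibrated from the target \((\epsilon ,\delta )\)) are assumptions.

```python
import numpy as np

rng = np.random.default_rng()

def perturb_update(update, C, sigma):
    """Clip a flattened model update to L2 norm C, then add Gaussian noise.

    `sigma` is an assumed noise multiplier; deriving it from the target
    (epsilon, delta) budget is omitted here.
    """
    norm = np.linalg.norm(update)
    clipped = update * min(1.0, C / max(norm, 1e-12))  # bound the sensitivity
    return clipped + rng.normal(0.0, sigma * C, size=update.shape)
```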
The overall framework of the proposed model is shown in Fig. 1. The participating nodes are (i) \(n\) clients \({C}_{1},{C}_{2},\cdots ,{C}_{n}\), each responsible for locally storing its private dataset, and (ii) \(m\) servers \({S}_{1},{S}_{2},\cdots ,{S}_{m}\) with \(m\ge 2\), responsible for the aggregate computation over data shares. A secure channel exists between each client and each server. Table 1 lists the notations and descriptions used in this paper.
Table 1. Notations and descriptions used in this paper.

| Symbol | Description |
| --- | --- |
| \({S}_{i}\) | \(i\)th server node |
| \({C}_{i}\) | \(i\)th client node |
| \({D}_{i}\) | Local dataset of \({C}_{i}\) |
| \(\lvert {D}_{i}\rvert\) | Number of samples contained in \({D}_{i}\) |
| \(N\) | Minimum of \(\lvert {D}_{i}\rvert\) over all clients |
| \({M}^{r}\) | Global model in the \(r\)th round |
| \({M}_{i}^{r}\) | Local model of client \({C}_{i}\) in the \(r\)th round |
| \({M}_{i,j}^{r}\) | Model share uploaded by \({C}_{i}\) to \({S}_{j}\) in the \(r\)th round |
| \({M}_{\ast ,j}^{r}\) | Aggregate share held by server \({S}_{j}\) in the \(r\)th round |
| \(R\) | Total number of training rounds |
| \(C\) | Upper bound on the L2 norm used for clipping |
| \(K\) | Lower bound on the number of clients uploading per round |
| \(B\) | Mini-batch size |
| \(E\) | Number of passes each client makes over its local dataset (local epochs) |
3.2. Threat model
The system involves three types of roles: clients, servers, and external adversaries. This paper mainly considers the first two, internal adversaries, since they directly participate in the training process and are therefore more threatening.
Server. The server is assumed to be semi-honest: it correctly executes the algorithms and protocols but tries to infer additional private information from the data it collects. It is further assumed that the number of colluding servers is less than the threshold \(t\) of the secret sharing scheme; taking the \((m,m)\)-threshold scheme as an example, this amounts to assuming at least one honest server.
Client. The client is likewise assumed to be semi-honest. An adversarial client's goal is to obtain information about honest clients' training data by observing the interaction, rather than to upload maliciously tampered data that would reduce model accuracy or even prevent training from converging. The number of colluding clients is assumed to be less than \(n-1\) (i.e., at most \(n-2\)); otherwise, for a reversible aggregation function \(F\left({d}_{1},\cdots ,{d}_{n}\right)\), the colluding nodes could infer the input of the only honest node from the output and their \(n-1\) known inputs.
External adversary. After training, the model is deployed to a node or cloud to provide prediction services. The adversary can analyze outputs obtained through limited access to the model interface and attempt to infer a client's local data. Given its limited knowledge and capability, an external adversary cannot obtain intermediate information from the training process, so its attack success rate is generally lower than that of the two types of internal adversaries above.
3.3. Training process
The algorithm involves the following parameters: a set of servers \(S=\left\{{S}_{1},{S}_{2},\cdots ,{S}_{m}\right\}\) with \(m\ge 2\); a set of clients \(C=\left\{{C}_{1},{C}_{2},\cdots ,{C}_{n}\right\}\) with \(n\ge 3\), where each client \({C}_{i}\) holds a local dataset from \(D=\left\{{D}_{1},{D}_{2},\cdots ,{D}_{n}\right\}\); a minimum number \(K\) of clients that upload parameters in each round; a machine learning algorithm \(L\) executed consistently by all clients during local training (in this study, the model \(M\) is a neural network trained by gradient descent, with the architecture declared before training begins); differential privacy parameters \(\epsilon\) and \(\delta\), where smaller values correspond to stronger privacy protection; the maximum number \(t'\) of colluding servers, which the threshold of the secret sharing scheme must exceed; and the total number of training rounds \(R\).
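For reference, these hyperparameters could be gathered into a single configuration object. The following sketch mirrors the notation above; the class and field names are illustrative assumptions, not part of the paper's algorithm.

```python
from dataclasses import dataclass

@dataclass
class FLConfig:
    m: int            # number of servers (m >= 2)
    n: int            # number of clients (n >= 3)
    K: int            # minimum number of clients uploading per round
    R: int            # total number of training rounds
    B: int            # mini-batch size
    E: int            # local epochs over each client's dataset
    C: float          # L2-norm clipping bound
    epsilon: float    # differential privacy budget
    delta: float      # differential privacy failure probability
    t_prime: int      # max colluding servers (< secret sharing threshold)
```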
The procedure for training a privacy-preserving federated learning model, referred to as Algorithm 1, is as follows. Initially, the server \({S}_{1}\) initializes the model parameters. Each client then downloads the model and trains it on its local dataset, obtaining new model parameters. A sequence of operations is performed on the local model to control the sensitivity of the aggregated parameters: the local model is first clipped, and calibrated noise is then added. The resulting model is secret-shared among all the servers. Model parameters are aggregated with the FedAvg weighted average; to keep the presentation simple, each client is assumed to use a dataset of equal size for local training. Once a sufficient number of parameter shares has been received, each server locally averages its shares to obtain an aggregate share. Each client then downloads the aggregate shares from all servers, reconstructs the secret, and obtains the updated model parameters. These steps are repeated until the training objective is achieved.
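A minimal end-to-end sketch of one such round is given below. For readability it uses real-valued additive masking rather than the fixed-point ring the paper actually works over (described next), and it omits clipping and noise addition; the function names `share` and `one_round` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)

def share(vec, m):
    """Split vec into m additive shares: m - 1 random masks plus a correction."""
    masks = [rng.normal(size=vec.shape) for _ in range(m - 1)]
    return masks + [vec - sum(masks)]

def one_round(local_models, m, K):
    """Toy version of one aggregation round: share, average locally, recover."""
    # Client side: each client secret-shares its (already clipped and
    # perturbed) local model and sends the j-th share to server S_j.
    inbox = [[] for _ in range(m)]
    for model in local_models:
        for j, s in enumerate(share(model, m)):
            inbox[j].append(s)
    # Server side: S_j averages the K shares it holds; no interaction needed.
    agg_shares = [sum(box) / K for box in inbox]
    # Download side: summing the m aggregate shares reconstructs the average.
    return sum(agg_shares)

models = [rng.normal(size=4) for _ in range(5)]            # K = 5 local models
print(np.allclose(one_round(models, m=3, K=5), np.mean(models, axis=0)))  # True
```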
Secret sharing and secure computation protocols are typically designed over algebraic structures such as finite fields or commutative rings, which are not directly applicable to real-valued data. It is therefore necessary to encode the data appropriately and establish a mapping to such a structure. In our approach, the model parameters are mapped onto the ring \({Z}_{{2}^{l}}\), representing each real parameter as a fixed-point number of \(l\) bits, of which the lower \(e\) bits are allocated to the fractional part. For example, a floating-point parameter \(x\) of the local model \({M}_{i}^{r}\) in Step 14 is encoded as \({x}'=\text{int}\left(x\cdot {2}^{e}\right)\), where int denotes rounding to the nearest integer. In this study, we set \(l=64\) and \(e=32\), so the encoded data can be stored in the int64 data type. The encoded fixed-point numbers can represent values in the range \(\left[-{2}^{l-e-1}+{2}^{-e},\;{2}^{l-e-1}-{2}^{-e}\right]\). Conversely, given an encoded value \({x}'\), decoding is the simple computation \(x={x}'/{2}^{e}\), which recovers the original parameter.
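The following sketch shows this encoding with \(l=64\) and \(e=32\), as in the paper; the helper names `encode` and `decode` are illustrative.

```python
import numpy as np

L_BITS, E_BITS = 64, 32        # l = 64 total bits, e = 32 fractional bits
SCALE = 2 ** E_BITS

def encode(x):
    """Map floats to fixed-point elements of Z_{2^l}, stored as int64.

    Valid only for |x| < 2**(L_BITS - E_BITS - 1), the range given above.
    """
    return np.rint(np.asarray(x) * SCALE).astype(np.int64)

def decode(x_enc):
    """Invert the encoding: x = x' / 2^e."""
    return np.asarray(x_enc, dtype=np.float64) / SCALE

w = np.array([0.125, -3.5, 0.3])
print(decode(encode(w)))       # [ 0.125  -3.5  0.3000000000465661 ]
```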
Consider two numbers \(x\) and \(y\) shared among \(n\) nodes as \(\left[x\right]=\left\{{x}_{1},{x}_{2},\cdots ,{x}_{n}\right\}\) and \(\left[y\right]=\left\{{y}_{1},{y}_{2},\cdots ,{y}_{n}\right\}\), where node \(i\) holds \({x}_{i}\) and \({y}_{i}\). To compute a share of \(x+y\), each node independently computes \({x}_{i}+{y}_{i}\); accordingly, in Step 19 of Algorithm 1, \({S}_{j}\) adds its local shares to obtain a share of the sum of the local models. Constant multiplication is likewise a local operation: given a share \(\left[x\right]\) and a public constant \(c\), each node computes \(c\cdot {x}_{i}\) to obtain the share \(\left[cx\right]\), since secret recovery yields \({\sum }_{i=1}^{n}c\cdot {x}_{i}=c{\sum }_{i=1}^{n}{x}_{i}=cx\). In particular, because the parameter \(K\) in Step 19 is a constant known to the servers, the average share can also be computed locally.
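The demo below checks both local operations on additive shares over the int64 ring (where overflow deliberately wraps modulo \({2}^{64}\)); the helper names are assumptions. Note that exact division by \(K\) over the ring would additionally require fixed-point truncation, which is omitted here.

```python
import numpy as np

rng = np.random.default_rng(1)
I64 = np.iinfo(np.int64)

def share(x, n):
    """Split an int64 vector x into n additive shares over Z_{2^64}."""
    parts = [rng.integers(I64.min, I64.max, size=x.shape, dtype=np.int64)
             for _ in range(n - 1)]
    return parts + [x - sum(parts)]     # int64 arithmetic wraps mod 2^64

def recover(shares):
    return sum(shares)                  # summation undoes the random masking

x = np.array([7, -2], dtype=np.int64) << 32    # fixed-point encoded values
y = np.array([3,  9], dtype=np.int64) << 32
c = np.int64(5)

sx, sy = share(x, 3), share(y, 3)
# Addition: each node adds its own shares of x and y, purely locally.
print(np.array_equal(recover([a + b for a, b in zip(sx, sy)]), x + y))  # True
# Constant multiplication: each node scales its share by the public c.
print(np.array_equal(recover([c * s for s in sx]), c * x))              # True
```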
Algorithm 1 is also resilient to client disconnection. Since the data nodes participating in training are often unstable mobile edge devices, a privacy protection scheme must keep the training process effective even when nodes go offline for some period. In Algorithm 1, if a client disconnects, its share of the model update is simply never delivered to the servers, and the client is treated as not participating in the current round. Note that the algorithm assumes each client transmits its shares to all servers simultaneously after executing the SecShr algorithm, so no client can selectively send shares to only some servers. If such a situation does occur, the servers can resolve it efficiently with one additional round of communication: they confirm the source client IDs of all received shares and intersect these sets before performing the aggregation.
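A sketch of that reconciliation step might look as follows; the helper name and the data layout are assumptions.

```python
def common_client_ids(received_ids_per_server):
    """Each server reports the client IDs it received shares from; only the
    intersection is aggregated, so every server uses the same client set."""
    return set.intersection(*(set(ids) for ids in received_ids_per_server))

# e.g., one client's share to the second server was lost in transit:
print(common_client_ids([{1, 2, 3}, {1, 3}, {1, 2, 3}]))   # {1, 3}
```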
Additionally, Algorithm 1 is compatible with more intricate custom aggregation functions. FedAvg uses a weighted average as the aggregation operation, which is insufficient for more demanding applications; for instance, to resist Byzantine attacks, some researchers have proposed taking the median of all client update values as the aggregation result. Unlike privacy-preserving schemes based on homomorphic encryption or functional encryption, which support only linear aggregation operations, the proposed algorithm can compute an arbitrary aggregation function \(g({x}_{1},{x}_{2},\cdots ,{x}_{n})\). This is achieved by redefining Step 19 of Algorithm 1 as \({M}_{\ast ,j}^{r}\leftarrow \text{SecComp}\left(g\left({M}_{{i}_{1},j}^{r},\cdots ,{M}_{{i}_{K},j}^{r}\right)\right)\), where the secure multi-party computation protocol \(\text{SecComp}\) may require additional communication among the servers; both the communication volume and the number of communication rounds of \(\text{SecComp}\) depend on the specific aggregation function \(g\).
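For concreteness, the plaintext analogue of one such robust rule, the coordinate-wise median, is sketched below. Evaluating it obliviously on shares is exactly what a \(\text{SecComp}\) protocol would have to do, since the median, unlike the average, is not a local computation on additive shares; the function name `g_median` is an assumption.

```python
import numpy as np

def g_median(updates):
    """Coordinate-wise median of client updates (plaintext analogue of g).

    Byzantine-robust: a minority of arbitrary per-coordinate values cannot
    drag the result to an outlier, unlike the mean.
    """
    return np.median(np.stack(updates), axis=0)

updates = [np.array([1.0, 2.0]), np.array([1.1, 2.2]), np.array([9.9, -50.0])]
print(g_median(updates))   # [1.1 2. ] -- the outlier has bounded influence
```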
Lastly, existing federated learning frameworks often rely on a trusted center. In this paper, the use of secure multi-party computation for parameter aggregation extends naturally to decentralized scenarios, in which every party acts as both a data node and a computation node, performing local model training while also taking responsibility for secure parameter aggregation. Specifically, the parties are \(\left\{{C}_{1},{C}_{2},\cdots ,{C}_{n}\right\}=\left\{{S}_{1},{S}_{2},\cdots ,{S}_{m}\right\}\) with \(m=n\). Adopting an \((n,n)\)-threshold secret sharing scheme ensures that no party needs to trust the other participants and that no party's parameters can be reconstructed.