There were 115 papers selected for the systematic review. Table 1 shows the study characteristics and attack performance from the 111 papers that included quantitative assessments of MIAs; the remaining papers described concepts and theories without experiments. Each paper presented at least one experiment with a performance metric above 50%, with 81.1% of papers achieving attack performance ≥ 75% and 50.5% reaching attack performance ≥ 90%. Of the 115 papers, 42 met the inclusion criteria for the meta-analysis, providing evidence on attack performance from 1,910 experiment scenarios (Fig. 1). The average attack performance across these experiments was: accuracy, 65.0% (95%CI: 64.0%, 66.0%); recall (sensitivity), 74.0% (95%CI: 72.0%, 76.0%); precision, 65.0% (95%CI: 64.0%, 67.0%); F1 score, 63.0% (95%CI: 60.0%, 65.0%); and AUC, 61.0% (95%CI: 60.0%, 63.0%), with variation across training datasets and target model types (Table 2). Health data were used in 32.4% of the papers in the systematic review2–35 and 52.4% of the papers in the meta-analysis3–5,9,11–14,16–28,31. The most frequently analyzed health dataset was the Texas Inpatient Hospital data (18 studies), with performance ranging from an accuracy of 70.0% (95%CI: 68.0%, 73.0%) to a recall (sensitivity) of 88.0% (95%CI: 82.0%, 94.0%)3,5,9,11,12,14,16,17,19–25,31,36,37.
Review of the literature highlighted three overarching concepts related to the vulnerability of ML models to MIA. The first is retention of training data in the ML model35,36,38,40–42,45. The second is the extent to which single data points can influence model decisions, which is most often a consequence of the model architecture36,38. The third concerns the distributions in the training data38,39,41–43. These explanations are not mutually exclusive, as each may contribute to the success of a privacy attack to varying degrees.
3.1 Retention of Training Data in ML Models
A model consists of functions that map a vector of feature values to an output, which is typically a class label in classification models. Model parameters are the variables within these functions that are adjusted during the training process to minimize the difference between the model's output and the true values in the data. This is typically done by iteratively adjusting the parameters with an optimization algorithm until the model achieves maximal performance. Through this data-driven process, the model can become a comprehensive representation of the training data, retaining details that make it vulnerable to privacy breaches2,4,8,31,33,35,36,38,40–42,45–47. Retention of data in ML models is related to three concepts: over-parameterization, overfitting, and memorization33,35,36,38,40–42,45.
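As a concrete illustration of this iterative parameter-fitting process, the sketch below trains a toy logistic classifier by gradient descent. It is a minimal Python example with synthetic data and an illustrative learning rate, not an implementation drawn from any of the reviewed studies.

```python
import numpy as np

def train_logistic(X, y, lr=0.1, n_iter=500):
    """Toy illustration of iterative parameter fitting: gradient descent
    on the cross-entropy loss of a logistic classifier."""
    n, d = X.shape
    w = np.zeros(d)      # model parameters, adjusted during training
    b = 0.0
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # predicted probabilities
        grad_w = X.T @ (p - y) / n               # gradient of the loss
        grad_b = np.mean(p - y)
        w -= lr * grad_w                         # step that reduces the
        b -= lr * grad_b                         # training-set loss
    return w, b

# Toy usage with synthetic data (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X[:, 0] + X[:, 1] > 0).astype(float)
w, b = train_logistic(X, y)
```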
A model is considered over-parameterized if it contains more parameters than necessary to accurately fit the training data for a given task40,42. The more parameters a model has, the greater its capacity to learn the underlying patterns and values in the data, including noise33,35,40–42. In supervised learning, memorization refers to the explicit mapping of inputs to their respective outputs, rather than learning the underlying patterns and dependencies in the data8,31,41,42,45. An input value is memorized if its output is predicted with higher confidence than expected given its frequency in the dataset, and if its correct output cannot be predicted from the distributions in the dataset when the value is excluded from the training data41,42,45. This phenomenon has predominantly been identified in deep neural networks, which are often highly complex, with an unnecessarily high capacity to retain training data8,38. Memorization of input values may be essential for high model performance when the underlying data are heavily skewed, or when there is insufficient data to derive patterns41,42. However, the stark contrast in confidence values for outputs when memorized values are present versus absent from the training data also leaves these models vulnerable to MIAs31,41,42. If the model becomes too large and complex relative to the amount of training data, it can overfit2,4,33,35,38,39,41,42,45. Overfitting occurs when the model performs better on its training data than on unseen data, a discrepancy known as generalization error2,4,33,35,38,39,41,42,45. It is often measured with the generalization gap, calculated as the difference between training and test performance.
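Under this definition, the generalization gap is simply the training-minus-test performance difference; a minimal sketch with placeholder accuracy values:

```python
train_accuracy = 0.98   # performance on the data the model was fit to (placeholder)
test_accuracy = 0.71    # performance on held-out data (placeholder)

# Generalization gap: the difference between training and test performance.
# Larger gaps indicate overfitting and, per the reviewed literature, tend to
# accompany higher membership-inference attack performance.
generalization_gap = train_accuracy - test_accuracy   # 0.27, i.e. 27 percentage points
```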
The number of model parameters was not reported in the reviewed papers. We therefore estimated the effect of the generalization gap for studies (n = 27) and scenarios (n = 1,138) that reported both training and validation accuracy. Through recursive partitioning, the following categories were defined, within which there was minimal variance in attack performance: accuracy < 17.4%, 17.4–34.1%, ≥ 34.1%; recall and precision < 12.4%, 12.4–24.2%, ≥ 24.2%; and AUC < 17.1%, 17.1–34.4%, ≥ 34.4%. Table 3 shows the adjusted attack performance for different magnitudes of generalization gap by target model type, and the adjusted difference in attack performance by generalization gap. The most robust estimates of the effect of generalization gap on attack performance were for attacks on neural networks and for the accuracy of attacks on decision trees (Table 3). While a higher generalization gap was associated with higher attack performance, the magnitude of the effect varied, with a larger effect on neural networks than on other model architectures (Table 3).
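As an illustration of how such category thresholds can be derived, the sketch below fits a shallow regression tree (recursive partitioning) to synthetic scenario-level data using scikit-learn; the data, tree settings, and printed thresholds are illustrative assumptions, not the actual analysis code.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

# Synthetic stand-ins: generalization gap (%) per experiment scenario and
# the corresponding attack performance (%); real values come from the
# extracted study data.
gap = rng.uniform(0, 50, size=500)
attack_perf = 55 + 0.5 * gap + rng.normal(0, 5, size=500)

# A shallow regression tree partitions the gap into ranges within which
# attack performance has minimal variance; the split points become the
# reporting categories (e.g. <17.4, 17.4-34.1, >=34.1 for accuracy).
tree = DecisionTreeRegressor(max_leaf_nodes=3, min_samples_leaf=20)
tree.fit(gap.reshape(-1, 1), attack_perf)

# Leaves carry the sentinel value -2; the remaining entries are the cut points.
thresholds = sorted(t for t in tree.tree_.threshold if t != -2.0)
print(thresholds)
```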
Table 3: Adjusted attack performance for different magnitudes of generalization gap by target model type, and the adjusted difference in attack performance by generalization gap

| Target Model Type | Study References | Generalization Gap (%) | n (%) | Adjusted Attack Performance % (95%CI) | Adjusted Difference in Attack Performance % (95%CI) |
|---|---|---|---|---|---|
| Attack Accuracy | | | | | |
| Decision Tree | 2,21,75 | < 17.4 | 52 (74.9) | 70.5 (64.0, 77.1) | ref |
| | 2,21 | 17.4–34.1 | 11 (15.1) | 75.3 (67.7, 82.8) | 4.7 (-1.3, 10.7) |
| | 2,21 | ≥ 34.1 | 8 (11.0) | 77.2 (69.3, 85.1) | 6.6 (0.4, 12.9) |
| Neural Network | 2,5,11,40,75,78,87,91,93,108,109,112,113 | < 17.4 | 190 (67.9) | 56.3 (53.0, 59.5) | ref |
| | 5,11,40,43,44,80,85,91,108,113 | 17.4–34.1 | 51 (18.2) | 64.7 (61.0, 68.4) | 8.5 (4.7, 12.2) |
| | 2,43,89,91,108,109 | ≥ 34.1 | 39 (13.9) | 78.8 (73.6, 84.0) | 22.5 (17.1, 28.0) |
| Regression | 2 | < 17.4 | 1 (25.0) | 74.7 (58.5, 91.0) | ref |
| | 2 | 17.4–34.1 | 2 (50.0) | 93.0 (80.2, 100) | 18.2 (-0.5, 37.0) |
| | 2 | ≥ 34.1 | 1 (25.0) | 91.1 (74.3, 100) | 16.4 (-6.3, 39.0) |
| k-Nearest Neighbour | 2 | < 17.4 | 1 (25.0) | 55.9 (37.2, 74.7) | ref |
| | 2 | 17.4–34.1 | 1 (25.0) | 59.6 (40.8, 78.3) | 3.7 (-13.4, 20.7) |
| | 2 | ≥ 34.1 | 2 (50.0) | 58.0 (44.7, 71.3) | 2.1 (-20.9, 25.0) |
| Attack Recall (Sensitivity) | | | | | |
| Neural Network | 5,37,81,82,87,113 | < 12.4 | 57 (32.4) | 64.4 (55.0, 73.9) | ref |
| | 5,81–83 | 12.4–24.2 | 82 (46.6) | 70.1 (60.7, 79.4) | 5.6 (-2.3, 13.6) |
| | 77,81–83,113 | ≥ 24.2 | 37 (21.0) | 76.2 (65.9, 86.5) | 11.8 (2.0, 21.5) |
| Random Forest | 87 | < 12.4 | 1 (50.0) | 69.3 (33.4, 100) | ref |
| | 87 | 12.4–24.2 | 1 (50.0) | 80.9 (45.0, 100) | 11.6 (-38.5, 61.6) |
| | | ≥ 24.2 | 0 | - | - |
| Attack Precision | | | | | |
| Decision Tree | 2 | < 12.4 | 1 (33.3) | 81.9 (54.4, 109.5) | ref |
| | 2 | 12.4–24.2 | 1 (33.3) | 73.9 (46.3, 101.4) | -8.1 (-47.0, 30.9) |
| | 2 | ≥ 24.2 | 1 (33.3) | 92.2 (64.7, 119.7) | 10.3 (-28.7, 49.2) |
| Neural Network | 2,22,37,38,82,83,87,113 | < 12.4 | 71 (35.9) | 63.1 (58.5, 67.7) | ref |
| | 5,22,81–83 | 12.4–24.2 | 66 (33.3) | 62.8 (58.1, 67.4) | -0.4 (-5.4, 4.7) |
| | 2,22,38,77,81–83,113 | ≥ 24.2 | 61 (30.8) | 67.7 (62.8, 72.5) | 4.6 (-1.1, 10.2) |
| Regression | 2 | < 12.4 | 1 (33.3) | 83.0 (56.2, 109.7) | ref |
| | 2 | 12.4–24.2 | 1 (33.3) | 65.0 (38.2, 91.8) | -18.0 (-55.2, 19.3) |
| | 2 | ≥ 24.2 | 1 (33.3) | 76.2 (49.5, 103.0) | -6.7 (-44.0, 30.5) |
| Random Forest | 87 | < 12.4 | 1 (50.0) | 65.0 (37.5, 92.5) | ref |
| | 87 | 12.4–24.2 | 1 (50.0) | 56.0 (28.5, 83.5) | -9.0 (-48.0, 30.0) |
| | | ≥ 24.2 | 0 | - | - |
| Attack AUC | | | | | |
| Neural Network | 38,40,44,83,84,93,109 | < 17.1 | 162 (87.6) | 63.9 (58.0, 69.9) | ref |
| | 38,40,44,84 | 17.1–34.4 | 18 (9.7) | 63.9 (56.7, 71.0) | 0.0 (-5.6, 5.5) |
| | 109 | ≥ 34.4 | 5 (2.7) | 91.4 (68.4, 100) | 27.5 (3.8, 51.2) |
Table 4: Adjusted attack performance for different magnitudes of partitioned density by target model type, and the adjusted difference in attack performance by partitioned density

| Target Model Type | Study References | Partitioned Density | n (%) | Adjusted Attack Performance % (95%CI) | Adjusted Difference in Attack Performance % (95%CI) |
|---|---|---|---|---|---|
| Attack Accuracy | | | | | |
| Decision Tree | 2,21,75,86 | < 6.7 | 16 (22.2) | 86.4 (76.9, 95.8) | ref |
| | 2,20,86 | 6.7–16.8 | 23 (31.9) | 77.2 (67.9, 86.5) | -9.2 (-22.4, 4.1) |
| | 2,20,21,86 | 16.8–33.1 | 24 (33.3) | 79.5 (69.9, 89.1) | -6.9 (-20.3, 6.6) |
| | 2,20,21,86,87 | ≥ 33.1 | 9 (12.5) | 63.9 (55.1, 72.6) | -22.5 (-35.3, -9.6) |
| Neural Network | 2,3,5,11,24,43,44,76,79,80,85,88,90,91,93,108,109,114,116 | < 6.7 | 237 (59.9) | 70.9 (67.2, 74.6) | ref |
| | 2,5,20,108 | 6.7–16.8 | 14 (3.5) | 67.4 (57.8, 77.1) | -3.5 (-13.8, 6.8) |
| | 5,20,40,93,113 | 16.8–33.1 | 50 (12.6) | 60.7 (51.7, 69.7) | -10.2 (-19.9, -0.5) |
| | 5,20,87,88,90,93,113,115 | ≥ 33.1 | 95 (24.0) | 53.9 (47.5, 60.3) | -17.0 (-24.4, -9.6) |
| Regression | 2,79 | < 6.7 | 3 (27.3) | 76.1 (65.9, 86.3) | ref |
| | 2,20 | 6.7–16.8 | 3 (27.3) | 76.7 (65.8, 87.6) | 0.6 (-14.3, 15.6) |
| | 2,20 | 16.8–33.1 | 2 (18.2) | 71.3 (59.2, 83.3) | -4.8 (-20.6, 11.0) |
| | 2,20 | ≥ 33.1 | 3 (27.2) | 54.4 (43.7, 65.1) | -21.7 (-36.6, -6.9) |
| Naïve Bayes | 2 | < 6.7 | 2 (28.6) | 60.7 (47.0, 74.4) | ref |
| | 2 | 6.7–16.8 | 3 (42.9) | 45.1 (33.3, 56.9) | -15.6 (-33.7, 2.5) |
| | 2 | 16.8–33.1 | 1 (14.3) | 50.6 (34.0, 67.2) | -10.1 (-31.6, 11.5) |
| | 2 | ≥ 33.1 | 1 (14.3) | 53.8 (37.5, 70.1) | -6.9 (-28.2, 14.4) |
| k-Nearest Neighbour | 2 | < 6.7 | 1 (20.0) | 56.1 (40.0, 72.3) | ref |
| | 2 | 6.7–16.8 | 2 (40.0) | 50.5 (37.6, 63.5) | -5.6 (-26.3, 15.1) |
| | 2 | 16.8–33.1 | 1 (20.0) | 53.4 (36.8, 70.0) | -2.7 (-25.9, 20.4) |
| | 2 | ≥ 33.1 | 1 (20.0) | 54.8 (38.5, 71.1) | -1.3 (-24.3, 21.6) |
| Attack Recall (Sensitivity) | | | | | |
| Decision Tree | | < 4.2 | 0 | - | - |
| | 20 | 4.2–13.7 | 1 (20.0) | 71.7 (36.1, 100) | ref |
| | 20 | 13.7–25.6 | 1 (20.0) | 58.3 (22.6, 94.0) | -13.5 (-58.1, 31.1) |
| | 20,87 | ≥ 25.6 | 3 (60.0) | 62.6 (39.9, 85.3) | -9.1 (-51.5, 33.2) |
| Neural Network | 3,5,41,76,82,83,112,119 | < 4.2 | 180 (65.0) | 79.5 (72.3, 86.7) | ref |
| | 5,20,37,83,112,119 | 4.2–13.7 | 46 (16.6) | 63.5 (51.8, 75.1) | -16.0 (-29.8, -2.3) |
| | 5,20,83 | 13.7–25.6 | 24 (8.7) | 69.0 (51.2, 86.8) | -10.6 (-29.8, 8.6) |
| | 5,20,37,77,82,87,112,113,119 | ≥ 25.6 | 55 (18.0) | 66.0 (56.8, 75.2) | -13.5 (-25.2, -1.9) |
| Regression | 112,119 | < 4.2 | 5 (29.4) | 86.3 (68.0, 100) | ref |
| | 20,112 | 4.2–13.7 | 5 (38.5) | 60.0 (40.5, 79.5) | -26.3 (-53.0, 0.4) |
| | 20 | 13.7–25.6 | 1 (7.7) | 67.0 (31.3, 100) | -19.3 (-59.4, 20.8) |
| | 20,112 | ≥ 25.6 | 2 (15.4) | 74.8 (57.6, 92.0) | -11.5 (-36.6, 13.6) |
| Random Forest | 112,119 | < 4.2 | 5 (45.5) | 66.5 (48.2, 84.8) | ref |
| | 112 | 4.2–13.7 | 4 (36.4) | 66.9 (45.1, 88.8) | 0.4 (-28.0, 28.9) |
| | | 13.7–25.6 | 0 | - | - |
| | 87,112 | ≥ 25.6 | 2 (18.2) | 57.2 (40.3, 74.1) | -9.3 (-34.2, 15.6) |
| Support Vector Machine | 112 | < 4.2 | 4 (33.3) | 88.7 (68.1, 100) | ref |
| | 112 | 4.2–13.7 | 4 (33.3) | 47.0 (25.1, 68.8) | -41.7 (-71.8, -11.7) |
| | | 13.7–25.6 | 0 | - | - |
| | 112 | ≥ 25.6 | 4 (33.3) | 65.8 (45.3, 86.2) | -22.9 (-52.0, 6.1) |
| Attack Precision | | | | | |
| Decision Tree | 2 | < 4.2 | 1 (11.1) | 92.2 (65.2, 100) | ref |
| | 20 | 4.2–13.7 | 1 (11.1) | 85.5 (59.3, 100) | -6.7 (-44.3, 31.0) |
| | 2,20 | 13.7–25.6 | 2 (22.2) | 78.6 (59.5, 97.6) | -13.7 (-46.7, 19.4) |
| | 2,20,87 | ≥ 25.6 | 5 (55.6) | 59.1 (46.4, 71.8) | -33.1 (-62.9, -3.2) |
| Neural Network | 2,5,22,41,76,82,83,112,119 | < 4.2 | 193 (60.3) | 71.9 (67.7, 76.0) | ref |
| | 5,20,22,83,112,119 | 4.2–13.7 | 45 (14.1) | 58.6 (51.7, 65.6) | -13.2 (-21.2, -5.3) |
| | 2,5,20,83 | 13.7–25.6 | 25 (7.8) | 58.5 (48.1, 68.8) | -13.4 (-24.5, -2.3) |
| | 2,5,20,37,77,82,83,87,112,113,119 | ≥ 25.6 | 57 (17.8) | 56.7 (51.1, 62.3) | -15.2 (-22.2, -8.2) |
| Regression | 2,112,119 | < 4.2 | 6 (28.6) | 70.1 (56.6, 83.6) | ref |
| | 20,112 | 4.2–13.7 | 5 (23.8) | 60.0 (44.6, 75.5) | -10.1 (-30.6, 10.4) |
| | 2,20 | 13.7–25.6 | 2 (9.5) | 76.5 (57.7, 95.2) | 6.4 (-16.7, 29.4) |
| | 20,112 | ≥ 25.6 | 8 (38.1) | 57.1 (45.6, 68.6) | -13.0 (-30.7, 4.7) |
| Random Forest | 112,119 | < 4.2 | 5 (33.3) | 65.0 (37.5, 92.5) | ref |
| | 112 | 4.2–13.7 | 4 (26.7) | 56.0 (28.5, 83.5) | -9.0 (-48.0, 30.0) |
| | | 13.7–25.6 | 0 | - | - |
| | 87,112 | ≥ 25.6 | 6 (40.0) | 56.4 (42.9, 69.9) | -9.3 (-29.9, 35.2) |
| Naïve Bayes | 2 | < 4.2 | 1 (25.0) | 50.4 (24.5, 76.3) | ref |
| | 2 | 4.2–13.7 | 0 | - | - |
| | 2 | 13.7–25.6 | 1 (25.0) | 51.0 (24.8, 77.2) | 0.6 (-36.2, 37.4) |
| | 2 | ≥ 25.6 | 2 (50.0) | 50.3 (31.9, 68.8) | -0.1 (-31.8, 31.7) |
| k-Nearest Neighbour | 2 | < 4.2 | 1 (25.0) | 60.1 (33.1, 87.1) | ref |
| | 2 | 4.2–13.7 | 0 | - | - |
| | 2 | 13.7–25.6 | 1 (25.0) | 55.4 (28.4, 82.4) | -4.8 (-42.9, 33.4) |
| | 2 | ≥ 25.6 | 2 (50.0) | 52.5 (33.4, 71.6) | -7.7 (-40.7, 25.4) |
| Support Vector Machine | 112 | < 4.2 | 4 (33.3) | 50.6 (31.5, 69.7) | ref |
| | 112 | 4.2–13.7 | 4 (33.3) | 53.4 (34.3, 72.5) | 2.8 (-24.2, 29.8) |
| | | 13.7–25.6 | 0 | - | - |
| | 112 | ≥ 25.6 | 4 (33.3) | 58.9 (39.9, 78.0) | 8.3 (-18.7, 35.3) |
| Attack AUC | | | | | |
| Neural Network | 44,83,93,109 | < 4.3 | 77 (42.3) | 72.2 (65.4, 79.0) | ref |
| | 83,84 | 4.3–12.6 | 8 (4.4) | 63.6 (49.3, 78.0) | -8.6 (-24.4, 7.3) |
| | 40,83,93 | 12.6–25.1 | 52 (28.6) | 60.8 (49.8, 71.8) | -11.4 (-24.4, 1.5) |
| | 83,84,93 | ≥ 25.1 | 45 (24.7) | 56.2 (46.1, 66.3) | -16.0 (-28.1, -3.8) |
Table 5: Relative and absolute importance of partitioned density and generalization gap by model architecture

| Target Model Architecture | Study References | Scenarios [n] | Relative Importance: Partitioned Density | Relative Importance: Generalization Gap | Absolute Importance: Partitioned Density | Absolute Importance: Generalization Gap |
|---|---|---|---|---|---|---|
| Attack Accuracy | | | | | | |
| Decision Tree | 2,21,75 | 33 | 1.0 | 0.28 | 0.62 | 0.18 |
| Neural Network | 2,5,11,40,43,44,80,85,87,91,93,108,109,112,113 | 260 | 0.30 | 1.0 | 0.39 | 1.29 |
| Regression | 2 | 4 | 1.0 | 0.74 | 0.05 | 0.04 |
| k-Nearest Neighbour | 2 | 4 | 1.0 | 0.52 | 0.08 | 0.04 |
| Naïve Bayes | 2 | 6 | 1.0 | 0.93 | 0.27 | 0.26 |
| Attack Recall (Sensitivity) | | | | | | |
| Neural Network | 5,37,77,82,83,87,113 | 159 | 1.0 | 0.52 | 1.48 | 0.77 |
| Attack Precision | | | | | | |
| Neural Network | 2,5,22,37,77,82,83,87,113 | 173 | 1.0 | 0.35 | 1.46 | 0.51 |
| Attack AUC | | | | | | |
| Neural Network | 40,44,83,84,93,109 | 182 | 0.67 | 1.0 | 0.49 | 0.73 |
3.2 Training Data Attributes
There are several training data attributes that are hypothesized to affect vulnerability to attacks4,23,28,36,38,39,41–43,47,58,59. Overall, the dimensionality of the data matters28,36,38,39,47,58. This is most often discussed in reference to the number of classes in classification tasks, with higher numbers associated with higher privacy risk, but the concept extends to the independent variables23,36,59. There are two main explanations. First, in terms of the dependent variable, a larger number of classes translates into decision boundaries that are positioned closely around the training instances, which increases the probability of a single instance having an observable impact on the model's decision boundary36. More broadly, higher-dimensional data means a larger matrix of potential values over which the training data are distributed36. If the independent variables are too high-dimensional relative to the number of training records, there is a greater likelihood of unique combinations of variable values among members of the training dataset36. Second, a higher number of classes requires the training algorithm to use more information to distinguish between classes, increasing the number of variables that are important in determining the class of a given instance and, subsequently, the amount of information about each individual that can be extracted from the model23.
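The uniqueness argument can be illustrated with a small simulation: as more variables are included, the share of records with a unique combination of values rises quickly. The sketch below uses synthetic categorical data with illustrative parameters, not data from any reviewed study.

```python
import numpy as np

rng = np.random.default_rng(0)
n_records = 1000

# Synthetic categorical data: each added variable multiplies the space of
# possible value combinations, so records become more likely to be unique.
data = rng.integers(0, 5, size=(n_records, 10))   # 10 five-level variables

for n_features in (2, 4, 6, 8, 10):
    subset = data[:, :n_features]
    _, counts = np.unique(subset, axis=0, return_counts=True)
    unique_share = np.sum(counts == 1) / n_records
    print(f"{n_features} features: {unique_share:.1%} of records are unique")
```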
Partitioned density was used to combine the effects of the number of records, classes, and features reported in the literature, accounting for the relationship between these factors. The following categories were defined with recursive partitioning: accuracy: < 6.7, 6.7–16.8, 16.8–33.1, ≥ 33.1; recall and precision: < 4.2, 4.2–13.7, 13.7–25.6, ≥ 25.6; and AUC: < 4.3, 4.3–12.6, 12.6–25.1, ≥ 25.1. Data were insufficient for analysis of the F1 score. Table 4 shows the estimated effect of partitioned density on attack performance, adjusted for target model type and generalization gap, with training dataset and study as random effects. Data were insufficient for precise estimation of the effect of partitioned density on attack performance for most model types; however, the evidence shows reduced attack performance with increasing partitioned density for all model architectures, to varying magnitudes (Table 4). The largest effects were seen for decision trees, followed by regression models and neural networks (Table 4). Within each model architecture, the estimated importance of generalization gap and partitioned density for attack performance is shown in Table 5. Partitioned density was associated with a greater reduction in residual sum of squares (RSS) than generalization gap, indicating greater importance in predicting attack performance for all model architectures and performance metrics, except for neural networks when performance was measured with accuracy or AUC (Table 5).
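For readers interested in the general form of these adjusted models, the sketch below fits a mixed-effects regression of attack performance on partitioned density, generalization gap, and model type with a random intercept per study, using statsmodels on synthetic data. The column names, category labels, and single grouping factor are illustrative assumptions, not the exact specification used in the meta-analysis (which also treated the training dataset as a random effect).

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in for the extracted scenario-level data: one row per
# experiment scenario with attack accuracy (%), the partitioned-density and
# generalization-gap categories, the target model type, and the source study.
rng = np.random.default_rng(0)
n = 300
df = pd.DataFrame({
    "attack_accuracy": rng.normal(65, 10, n),
    "partitioned_density": rng.choice(["<6.7", "6.7-16.8", "16.8-33.1", ">=33.1"], n),
    "generalization_gap": rng.choice(["<17.4", "17.4-34.1", ">=34.1"], n),
    "model_type": rng.choice(["neural network", "decision tree", "regression"], n),
    "study_id": rng.integers(1, 28, n),
})

# Fixed effects for partitioned density, generalization gap, and model type;
# random intercept per study (MixedLM accepts one grouping factor here, so
# this is a simplification of the full random-effects structure).
model = smf.mixedlm(
    "attack_accuracy ~ C(partitioned_density) + C(generalization_gap) + C(model_type)",
    data=df,
    groups=df["study_id"],
)
print(model.fit().summary())
```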
Another attribute of training data hypothesized to influence privacy risk is the distribution of variables4,23,39,41–43,59. Skewed data, characterized by an imbalance of records across classes or categories, or by long-tailed distributions of continuous variables, are associated with increased privacy risk, particularly for records in the minority classes or with rare values4,23,39,41,42,59. The theoretical explanation is that the small number of records with those values leads to overfitting and/or memorization23,39,41,42. The extent to which imbalanced groups or skewed distributions lead to overfitting or memorization depends on the model architecture and can be modified by training parameters39,41,42. Finally, a concept related to both previous hypotheses for classification tasks is the degree of variation within and between classes36. Less variation within classes is thought to be associated with reduced privacy risk, as single instances are less likely to impact decision boundaries, compared with situations where classes are composed of individuals with varied values36. Conversely, a low level of variation between classes may result in higher privacy risk, since tighter decision boundaries may be more influenced by single data points23,36,39. The data required to estimate the effect of these factors in the meta-analysis were not reported in the reviewed literature.
3.3 Model Architecture
Model architecture is the underlying structure of the model, which determines how input data are processed to make predictions or classifications, and it has a strong influence on privacy risk12,23,35,36,41,42,45,60. There are several theories on why model architecture contributes to privacy risk. First, architecture determines the complexity and capacity of the model12,23,41,42,45,60. This theory is particularly applicable to neural networks, and specifically to deep or convolutional neural networks, which are designed to capture highly complex relationships in the training data12,23,41,42,45,60. Second, model architecture influences how the model determines optimal decision boundaries36. Architectures that are more sensitive to specific data points or distributions are more vulnerable to privacy attacks36,38. Conversely, if a model's decisions are unlikely to be affected by the presence or absence of particular inputs, it will be more resilient to attacks36,38. For example, a naïve Bayes algorithm estimates the probability of a given class for each variable independently, so, assuming sufficient sample size, single training data members have only marginal influence on those probabilities36. Conversely, models such as decision trees or support vector machines (SVMs) are highly influenced by specific data points36. An SVM works by identifying the specific data points that define the boundary between classes, known as support vectors, which are therefore intrinsic to the model and can be exploited in privacy attacks36. Similarly, decision trees use the values of input variables and the relationships between them to determine the decision boundaries36. With this architecture, a unique combination of features may lead to a new branch, or otherwise modify the decision of the model36. Third, some model architectures have known assumptions and error distributions, which an adversary can leverage when evaluating model predictions during an MIA35. This applies to models in the regression family, where the error distribution is expected to be Gaussian or Bernoulli.
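The point about support vectors can be made concrete: after fitting, a scikit-learn SVM stores the boundary-defining training records verbatim in the model object. The toy example below is illustrative only; the dataset and kernel choice are assumptions, not drawn from the reviewed studies.

```python
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Toy dataset standing in for a training set containing personal records.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

clf = SVC(kernel="linear").fit(X, y)

# The fitted model retains the boundary-defining training records verbatim:
# anyone with access to the model object can read these rows back out, which
# is one reason SVMs are considered exposed to membership inference.
print(clf.support_vectors_.shape)   # rows of X kept inside the model
print(clf.support_[:5])             # indices of those training records
```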
Evidence from this analysis was consistent with these explanations, with the largest effects observed for decision trees, regression models, and neural networks, modified by partitioned density and generalization gap (Tables 3 and 4). The most robust estimates of model-specific effects were for attack accuracy. Relative to neural networks, decision trees and regression models were associated with adjusted increases in attack accuracy of 15.5% (95%CI: 5.5%, 25.4%) and 18.5% (95%CI: 2.0%, 35.0%), respectively, for sparse data (partitioned density < 6.7). Attack accuracy was also higher on regression models relative to neural networks at generalization gaps ≥ 34.1% (12.3%; 95%CI: -5.2%, 29.9%). Conversely, at generalization gaps ≥ 34.1%, decision trees had lower attack accuracy than neural networks (-1.6%; 95%CI: -11.1%, 7.7%). However, the confidence intervals leave uncertainty about the direction and magnitude of these effects.