In the reviewed literature on defending against poisoning attacks in FL, all approaches, including those in9,43,44, are implemented solely on the server without any assistance from clients. However, it is reasonable to collaborate with clients who are the first beneficiaries of defending against poisoning in FL to achieve this goal. This work proposes SpyShield, a new approach to defend against poisoning attacks in FL inspired by Spyfall, which is a social deduction game of game dynamics similar to that present in the challenge of defending FL against poisoning.
This section is divided into four subsections. The first subsection briefly explains Spyfall and the resemblance between it and the problem of identifying adversaries in FL. The three remaining subsections discuss the details of SpyShield’s implementation.
Analogy
1) Social deduction games: These games involve players who are divided into two categories: an uninformed majority and an informed minority. The minority are informed which players belong to which group. The majority’s goal is to uncover the true identity of the minority members, while the minority aims to deceive the majority into thinking they are also part of the uninformed group. One of the oldest and most well-known examples is Mafia, a classic party game. Detecting poisoning in FL could be modeled as a social deduction game where the first uninformed minority is analogous to honest clients, and the second informed majority could be the set of adversaries.
2) Spyfall: Spyfall is a social deduction party game that shares striking similarities with the FL scenario. Unlike some other social deduction games, all the players are uninformed about who belongs to which group.
The game begins with each player receiving a card face-down. The card either indicates a location (e.g., beach, nightclub, circus, museum) or is simply labeled ”Spy.” The majority of the players receive the cards with the same location, such as ”Beach,” while the minority receive the cards labeled ”Spy.” Since all cards are dealt face-down, none of the players know which group any other player belongs to. The majority aims to identify the spies, while the spies attempt to guess the location. Players then take turns asking each other questions to assess if the other is a spy without giving away the location. For example, if the location is the beach, one could ask, ”Do you typically go to that location in the morning or at night?” as demonstrated in Fig. 1.

Visualization of the Spyfall gameplay, in which player (client) 3 is a spy, hence answering player (client) one’s question wrong.
After asking enough questions, before the timer runs out or the spies guess the location, players can vote on who they believe are the spies. All players get a vote, including the spies, meaning that spies actively participate in the discussion and influence other players’ votes. If the players voted on are indeed the spies, the majority group wins. If the players happen to vote off a client who is not a spy, the minority group of spies wins.
3) SpyShield Question: Analogously, in FL, we need to decide which clients, a minority, are malicious, analogous to the spies. The question that the server needs to ask a client to determine whether another client’s local model is poisoned is what this subsection discusses.
The server needs to ask a client referred to as the tester, ”What is the accuracy of this model?” where ’this model’ refers to a local model trained by another suspected client. The suspected client is referred to as the testee, while the client testing the model is referred to as the tester. The server asks multiple testers to evaluate the same testee’s model and averages their results. This process is repeated for every client as visualized in Fig. 2.

Visualization of the testing process using 4 clients.
The clients are then ordered based on their individual accuracies, and those falling below the median accuracy are isolated and excluded from aggregation, as visualized in Fig. 3. The median is chosen as the filtering threshold rather than the average to enhance robustness against adversarial clients. In scenarios where multiple adversaries exist, using the average can be misleading, as a few low-performing malicious clients can significantly skew the value. In contrast, the median is more resilient to outliers and better reflects the central tendency of the honest clients’ performance. This design aligns with SpyShield’s assumption that the majority of the clients are honest, as in9,10,45.

Visualization showing clients sorted by their average accuracy and filtered using the median of these values as a threshold.
One issue with this approach is a vulnerability in which a tester could infer the testee’s data–analogous to revealing the location in the game Spyfall.
This challenge is mitigated by aggregating the testee’s local model with those of other clients, introducing noise through this aggregation. The testee and these other clients are referred to as a ”group.” The group’s aggregated model is tested instead of the testee’s individual local model, as visualized in Fig. 4. This approach is similar to differential privacy, but instead of adding controlled noise, it disguises individual models by aggregating them with other models. As a result, no single client’s training data can be reconstructed. Furthermore, while this method does not prevent reconstruction attacks on federated learning (FL) as a whole, it ensures that the mechanism itself does not introduce vulnerabilities to such attacks.

Visualization of the updated testing process on client 3, with an average accuracy of 70%.
For the remainder of the methodology, a more comprehensive discussion is provided on the nuances of SpyShield.

Flow chart visualization of the SpyShield -highlighted in red- in the FL pipeline.
As illustrated in Fig. 5, SpyShield can be divided into three steps:
-
1.
Grouping and Mapping: This subsection discusses the criteria used for grouping clients and 1 assigning them to testers to ensure fairness in testing. It also details the algorithms used to enforce these criteria.
-
2.
Aggregation and testing: This subsection discusses how a group’s local models are aggregated and how the aggregated models are sent to their respective testers to be tested.
-
3.
Evaluation: This is the final step in SpyShield. Test results are collected from the testers and evaluated on the server to determine which clients are honest and should be included in the new global model aggregation.”
Grouping and mapping
As discussed earlier in Subsection “Analogy”, each client’s local model is aggregated with the local models of other clients. The resulting aggregated model is then tested by another client. The clients whose local models are aggregated are collectively referred to as a group. In the example previously shown in Fig. 4, notice the following criteria:
-
1.
The client being evaluated (client 3) is a member of all groups in the set of groups evaluating it. The client being evaluated is referred to as the testee, and the set of groups being evaluated is referred to as the grouping.
-
2.
Each client, except the testee, belongs to one and only one group in the grouping.
-
3.
The tester of a group must not be a member of that group.
The fact that each client, except the testee, belongs to one and only one group in the grouping mandates that the number of groups G in a grouping follows the equation:
$$\begin{aligned} G = \frac{T – 1}{c – 1}, \end{aligned}$$
(1)
where \(T\) is the total number of clients and \(c\) is the number of clients in a group.
The following working example of a SpyShield round with 7 clients should elaborate further.
First, groups are formed for each client (testee) according to the discussed criteria. Fig. 6 shows the grouping for the first testee. As per Eq. 1, the number of groups G is 3. Fig. 7 shows all the groups formed for each client.

Visualization of the first testee’s grouping.

Visualization of all groupings.
Second, each group is mapped to a respective tester who is not a member of that group. Fig. 8 shows the first testee’s group mappings.

Visualization of the first testee’s group mappings.
Figure 9 shows all group mappings. Notice that each client (tester) tests exactly three clients in this example. Assigning an equal number of models to each tester ensures fairness, especially during evaluation, discussed later in this section. Also, each tester tests at most one group from a grouping.

Visualization of all group mappings.
The constraints now amount to the following five constraints:
-
1.
The client being evaluated is a member of all groups in its grouping.
-
2.
Each client, except the testee, belongs to one and only one group in a testee’s grouping.
-
3.
The tester of a group must not be a member of that group.
-
4.
All clients (testers) are assigned to test an equal number of groups.
-
5.
A tester tests at most one group from each grouping.
1) Algorithms: This subsection explains the algorithms used to produce the groups and mappings with the aforementioned constraints and their complexities. There are two independent algorithms: one for grouping and one for mapping.
The grouping algorithm, Algorithm 1, is a simple nested loop of time and space complexity of \(O(T \times G \times c)\).
In the outer loop, for each client (testee), create a set containing all the other clients and refer to it as the remaining clients. In the inner loop for each group in the testee’s grouping, create one group by concatenating the current client with \(c-1\) random clients from the set of remaining clients. Then, make sure to remove those clients from the set so that they are not included in the next group.
The time complexity of randomly selecting a client from the remaining set of clients and removing it is O(1), assuming a dynamic array implementation is used where one could swap the selected client with the last client in the array and remove it, since the ordering of the clients is not required.

The grouping Algorithm 1 guarantees meeting the first two constraints defined in Subsection “Grouping and Mapping”. After the grouping algorithm is executed, the groups are mapped to their respective testers using the mapping Algorithm 2.
The mapping algorithm attempts to balance the distribution of assignments. It begins by initializing a set, referred to as frequencies, which stores tuples representing each client (tester) along with a counter tracking how many groups they have been assigned to, initially set to zero. The algorithm then iterates through the given groupings, which consist of multiple groups. For each group, it selects a client with the smallest assignment count from a subset of frequencies, ensuring that the chosen client is not already part of the group and that no other group in the same grouping is assigned to them. This approach helps distribute group assignments among clients while avoiding conflicts. Although the algorithm does not strictly guarantee perfectly equal distribution, it minimizes imbalances by always prioritizing clients with the lowest frequency.

Moreover, the mapping algorithm, Algorithm 2, ensures that the third and fifth constraints are always satisfied. However, the fourth constraint–ensuring that each tester is assigned an equal number of groups–is not strictly enforced. For instance, when applying this algorithm to the working example in Fig. 9, where each tester should ideally be assigned exactly three groups, the outcome may vary. Specifically, five testers might receive three groups as expected, while one tester could receive only two groups and another tester could receive four. Nevertheless, this level of imbalance is acceptable for SpyShield.
In terms of complexity, the algorithm has a space complexity of \(O(T \times G)\). The time complexity varies depending on the priority queue implementation used for managing the frequencies set. It ranges from \(O(T \times M)\) for a basic implementation to \(O(T \times M \times \log (T))\) when using a binary heap or a Fibonacci heap.
Aggregation and testing
Following the completion of the grouping and mapping phases, this section discusses the aggregation algorithm and how the aggregated models are prepared to be tested at the respective testing clients, as shown in Subsection “Grouping and Mapping”.
1) Aggregation Bias: The aggregation of the clients’ weights in the groups is not carried out using FedAvg, as the goal of this aggregation originally is not fairness. The objective of this aggregation is to camouflage the weights of the testee client with the weights of other clients to safeguard from inferring an individual client’s training data. To ensure that the testee’s weights still meaningfully influence the test’s results, a bias towards the testee’s weights is introduced during the aggregation. This subsection elaborates on the specific implementation of this biased aggregation technique.

Visualization of a group with no bias, \(B=1\).

Visualization of a group with bias, \(B=2\).
As presented in the working example earlier in this section, each group is constructed from 3 clients’ weights, as shown in Fig. 10, including the testee. Assume a bias of a factor of two towards the testee. This assumes that this client’s weights are included twice in the aggregation, as shown in Fig. 11. Generally, for any bias greater than zero, the weights are aggregated as follows:
$$\begin{aligned} W = \frac{1}{c + B – 1} (w_{1} \times B + \sum _{i=2}^{c} w_{i} ), \end{aligned}$$
(2)
where \(W\) is the aggregated weights, \(c\) is the number of clients being aggregated, \(w_i\) are the weights of the locally trained model of the \(i^{th}\) client in the group, and \(B\) is the bias towards the client intended to be tested, which is the first client in the group, \(w_{1}\).
2) Model Clustering: Once the weights of each group have been aggregated, the resulting aggregates are downloaded to their corresponding testing clients for evaluation. Prior to this, the aggregates are clustered according to the testing clients Fig. 12, enabling the server to efficiently deliver all assigned aggregates to a tester in a single batch. This streamlined approach minimizes communication overhead between the server and the participating clients, allowing SpyShield to initiate the evaluation process with a single communication round.

Visualization of model clustering.
The second challenge SpyShield faces is that the testers themselves could be compromised as well. Moreover, if the tester is compromised, they could falsely suggest that the model they are testing is inaccurate, misleading the server into believing that the testee is the compromised one. The following subsection presents the details of evaluating the honesty of the clients as both testers and testees.
For the remainder of this work, the clients who were compromised during training are referred to as dishonest clients, while clients who were compromised during testing (during their duty as testers) are referred to as unreliable clients. This work assumes that an unreliable client is not necessarily a dishonest one and vice versa.
Evaluation
Following testing the aggregated models on their respective testers and uploading the results to the server, the testees are evaluated to differentiate reliable testers from unreliable testers. The results of the reliable testers are then harnessed to evaluate the testees’ honesty. This subsection first discusses evaluating the reliability of the testers and then evaluates the honesty of the testees.
Referring to the example in Fig. 8, assume that client (testee) number 1 is honest and that client (tester) 7’s test data is poisoned. Therefore, assuming all other clients in the group are honest as well, the accuracy of testing the first group is below the median accuracy because of the tester’s unreliability. Now, assume that all the clients assigned to test models 7, 6, and 5 are unreliable. Then, all of the tests would result in accuracies below the median, in turn deceiving the server into unfairly flagging client (testee) number 1 as poisoned.
To overcome this challenge, we follow the same intuition as the one we have introduced in Subsubsection “Analogy”. The server averages the results received from each tester and sorts them. Testers whose average results fall below the median are deemed unreliable, and their results are not considered when evaluating clients’ honesty. In the working example shown in Fig. 13, clients (testers) are going to be ordered as follows: 5, 3, 1, 4, 7, 6, and 5. Hence, clients 5, 3, and 1 are going to be considered unreliable, and their results won’t be used to evaluate the testees.

Visualization of testers’ average results in which client (tester) 5 is an unreliable tester.
After filtering out unreliable clients, the results received from reliable testers are grouped by testee and averaged, as shown in Fig. 14. The averages are sorted, and the testees whose average is below the median are flagged as dishonest and removed from the aggregation used to update the global model. In the working example, clients (testees) are going to be ordered as follows: 2, 1, 3, 4, 5, 6, and 7.

Visualization of testees’ average results.
Observe Fig. 15, in which the average results of poisoned clients are distributed below the median.

The distribution of mean accuracy per testee in a simulation with 100 clients, 40 of which are poisoned with random weights.
The end output is the set of honest clients to be aggregated and used to construct the new global model used in the next run of the Federated Training round.
Code availability
SpyShield is implemented as a single Jupyter Notebook simulation, publicly available on GitHub (https://doi.org/10.5281/zenodo.16785226). This single Notebook file contains the full algorithm implemented in PyTorch and everything required to run the simulation, and it was also used to generate the results in the following section.
This concludes SpyShield’s methodology. The following section discusses the experiments conducted to observe how the number of groups per client, as discussed in Subsection “Grouping and Mapping”, affects SpyShield’s behavior and proves the robustness of SpyShield against poisoning attacks.
