A repository for implementing and simulating decentralized Graph Neural Network algorithms for node classification in peer-to-peer networks. The code supports the publication p2pGNN: A Decentralized Graph Neural Network for Node Classification in Peer-to-Peer Networks.
To generate a local instance of a decentralized learning device:
```python
from decentralized.devices import GossipDevice
from decentralized.mergers import SlowMerge
from learning.nn import MLP

node = ...  # a node identifier (can be any object)
features = ...  # feature vector; should have the same length on every device
labels = ...  # one-hot encoding of class labels, all zeroes if no label is known
predictor = MLP(features.shape[0], labels.shape[0])  # or load a pretrained model
device = GossipDevice(node, predictor, features, labels, gossip_merge=SlowMerge)
```
In this code, the device type (`GossipDevice`) and the variable merge protocol (`SlowMerge`) work together to define a decentralized learning setting for a Graph Neural Network that runs over unstructured peer-to-peer links of uncertain availability. The communication network itself is the graph being analysed, under the assumption that communicating peers are related (e.g., they could be friends in a decentralized social network).
Whenever possible (at worst, whenever devices message each other for unrelated reasons), perform the following information exchange between linked devices `u` and `v`:

```python
send = u.send()
receive = v.receive(u.name, send)
u.ack(v.name, receive)
```
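As an illustration only (not part of the library's API), the following sketch sweeps a hypothetical list of linked device pairs and performs the exchange over whichever links happen to be available in a round:

```python
import random

links = ...  # hypothetical list of (u, v) pairs of linked GossipDevice instances

# One simulated round: each link happens to be available with some chance,
# and available links run the three-step exchange shown above.
for u, v in links:
    if random.random() < 0.5:  # assumed availability rate for this sketch
        send = u.send()
        receive = v.receive(u.name, send)
        u.ack(v.name, receive)
```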
Clone the repository and install all dependencies with:
```
pip install -r requirements.txt
```
This also installs the infrastructure that torch geometric needs to automatically download data. Set up and run simulations on many devices, automatically generated from existing datasets, with the following code:
```python
from decentralized.devices import GossipDevice
from decentralized.mergers import AvgMerge
from decentralized.simulation import create_network

dataset_name = ...  # "cora", "citeseer" or "pubmed"
network, test_labels = create_network(dataset_name,
                                      GossipDevice,
                                      pretrained=False,
                                      gossip_merge=AvgMerge,
                                      gossip_pull=False,
                                      seed=0,
                                      min_communication_rate=0,
                                      max_communication_rate=0.1)

for epoch in range(800):
    network.round()
    accuracy_base = sum(1. if network.devices[u].predict(False) == label else 0
                        for u, label in test_labels.items()) / len(test_labels)
    accuracy = sum(1. if network.devices[u].predict() == label else 0
                   for u, label in test_labels.items()) / len(test_labels)
    print(f"Epoch {epoch} \t Acc {accuracy:.3f} \t Base acc {accuracy_base:.3f}")
```
In the above snippet, datasets are downloaded automatically and devices are instantiated with the desired settings. In each round, every pair of linked devices communicates with a probability sampled uniformly from [min_communication_rate, max_communication_rate]; the probability differs between pairs but stays fixed over time.
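The sketch below mirrors that communication model (it is illustrative only, not the library's internals; the link identifiers are placeholders):

```python
import random

min_rate, max_rate = 0, 0.1  # as in the snippet above
links = [("a", "b"), ("b", "c")]  # placeholder link identifiers

# Each link draws a fixed probability once from [min_rate, max_rate]...
link_probability = {link: random.uniform(min_rate, max_rate) for link in links}

# ...and communicates with that same probability in every round.
for epoch in range(3):
    active = [link for link, p in link_probability.items() if random.random() < p]
    print(f"Round {epoch}: communicating links {active}")
```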
Everything runs on numpy because, at the time of the first implementation, adequate GPU memory was hard to come by. Find runnable implementations in the file experiments.py, with centralized equivalents in centralized_experiments.py. Publication experiments used the dgl downloader; since it no longer works properly, we have switched to torch geometric. Other than this, default settings match the publication.
Some merge schemes need a lot of memory to simulate. Reduce consumption to a fraction of the original by moving from the default `np.float64` numeric format to less precise ones. Do so with the pattern demonstrated below, where the datatype is passed to the `dtype` argument of numpy array creation:
```python
learning.optimizers.Variable.datatype = np.float16  # do this before calling `create_network`
```
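As a rough guide, `np.float16` stores 2 bytes per value instead of `np.float64`'s 8, so parameter memory drops to roughly a quarter. A minimal usage sketch, assuming `Variable` can be imported from `learning.optimizers` as the attribute path above suggests:

```python
import numpy as np
from learning.optimizers import Variable  # assumed import path

Variable.datatype = np.float16  # must be set before create_network
assert np.dtype(Variable.datatype).itemsize == 2  # vs. 8 bytes for np.float64
```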
The following parameters are provided for experimentation in the decentralized simulation. Several of the options are experimental.
Parameter | Option | Description |
---|---|---|
device_type | decentralized.devices.GossipDevice | A device that shares predictions with the devices it communicates with. |
 | decentralized.devices.EstimationDevice | A device that preserves anonymity by sharing predictions that its local model generates for synthetic data, emulating its own. This is experimental and unstable. DO NOT USE. |
 | decentralized.devices.EstimationDevice | A device that shares a corpus of synthetically generated predictions, following a strategy similar to the above. This is experimental and unstable. DO NOT USE. |
classifier | learning.nn.MLP | Uses a multilayer perceptron as the base classifier. |
 | learning.nn.LR | Uses logistic regression as the base classifier. |
gossip_merge | decentralized.mergers.AvgMerge | (Default) When performing gossip learning (i.e., when not pretrained), averages each device's trained parameters with those of its communicating neighbors. This is the standard gossip averaging algorithm. |
 | decentralized.mergers.FairMerge | A variation of the above that tries to converge to a quantity that best estimates the true average. This is experimental. |
 | decentralized.mergers.TopologicalMerge | Another variation with the same goal as FairMerge. This is experimental. |
 | decentralized.mergers.SlowMerge | Similar to AvgMerge, but converges more slowly by retaining a greater fraction of each node's learned parameters. |
smoother | decentralized.mergers.NoSMooth | Default implementation of decentralized graph signals, without any improvements. |
 | decentralized.mergers.Smoothen | Reduces the statistical bias of decentralized graph signals. This is experimental but promising. |
 | decentralized.mergers.DecoupleNormalization | Decouples the order of magnitude from diffused values while diffusing decentralized graph signals. |
pretrained | true/false | Whether the classifier's parameters should be pre-trained and shared, with the p2p architecture providing only a refinement (true). Otherwise, decentralized training protocols are employed. |
gossip_pull | true/false | Whether the gossip training strategy should retrieve model parameters from a random device in the network (if true and not pretrained). This requires communication-on-request guarantees and is not very realistic for social media networks. |
Internally, decentralized graph signals are diffused through the decoupled GNN by the `decentralized.mergers.PPRVariable` class. The `smoother` argument may improve this diffusion.
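For instance, a configuration combining several of the options above could look like the sketch below; it assumes that `classifier` and `smoother` are accepted by `create_network` as keyword arguments named after the table's parameters:

```python
from decentralized.devices import GossipDevice
from decentralized.mergers import SlowMerge, Smoothen
from decentralized.simulation import create_network
from learning.nn import LR

# Hypothetical configuration; the `classifier` and `smoother` keyword
# names are assumed to match the table's parameter names.
network, test_labels = create_network("citeseer",
                                      GossipDevice,
                                      classifier=LR,
                                      pretrained=False,
                                      gossip_merge=SlowMerge,
                                      smoother=Smoothen,
                                      gossip_pull=False,
                                      seed=0,
                                      min_communication_rate=0,
                                      max_communication_rate=0.1)
```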
To cite this work:

```bibtex
@article{krasanakis2022p2pgnn,
  title={p2pGNN: A Decentralized Graph Neural Network for Node Classification in Peer-to-Peer Networks},
  author={Krasanakis, Emmanouil and Papadopoulos, Symeon and Kompatsiaris, Ioannis},
  journal={IEEE Access},
  volume={10},
  pages={34755--34765},
  year={2022},
  publisher={IEEE}
}
```