v0.2.3
Key Changes
- New approaches: FedDyn, Oort, FedBalancer, synchronous hierarchical FL with coordinator, coordinated asyncfl, and differential privacy
- Metrics collection support: collect CPU/GPU utilization, RAM and VRAM usage, bytes sent and received, and function execution time
- Local registry support: models and metrics are saved in a local directory, which allows running experiments without MLflow
- Misc improvements:
  - support for channel leave functionality in the p2p backend
  - support for simplifying tasklet composition primitives
  - protobuf package version upgrade for compatibility with TensorFlow v2.12.0
  - deployer: fix a bug that updated pod status incorrectly
  - fix a bug that removed model updates from the disk cache when the total cached data exceeded the cache size
  - fix an OpenAPI specification mismatch between a released image and the source
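
To illustrate the kind of function-timing metric mentioned under metrics collection, here is a minimal Python sketch. It is not the project's implementation: `track_runtime`, `runtime_metrics`, and `train_step` are hypothetical names introduced only for this example, and the metrics are kept in a plain in-memory dictionary.

```python
import time
from collections import defaultdict
from functools import wraps

# Hypothetical in-memory store keyed by function name (an assumption for
# this sketch, not part of the released SDK).
runtime_metrics = defaultdict(list)


def track_runtime(func):
    """Record the wall-clock duration of every call to func."""

    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # Record the duration even if func raises.
            runtime_metrics[func.__name__].append(time.perf_counter() - start)

    return wrapper


@track_runtime
def train_step(n):
    # Stand-in workload for a real training function.
    return sum(i * i for i in range(n))
```

Calling `train_step(1000)` returns the usual result and appends one duration, in seconds, to `runtime_metrics["train_step"]`.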
What's Changed
- example/implementation for Oort by @jaemin-shin in #369
- feddyn implementation pytorch by @GustavBaumgart in #370
- update for Oort: overcommitment support by @jaemin-shin in #372
- fix: remove non-orchestration mode by @openwithcode in #374
- update for Oort: accurate calculation of round duration with timestamp by @jaemin-shin in #375
- feat: replica support for roles by @openwithcode in #376
- fix: remove errors regarding group association to SDK config files by @jaemin-shin in #378
- doc: update supported algorithms/mechanisms by @myungjin in #377
- fix: missing delta weights computation in middle aggregator by @myungjin in #382
- fix: incorrect groupAssociation in lib example configs by @myungjin in #383
- refactor: move syncfl files by @myungjin in #384
- Synchronous orchestrator architecture by @elqurio in #379
- feat: per-channel backend support by @myungjin in #381
- fix: hardening coordinated syncfl by @myungjin in #385
- refactor: auto-formatting sdk files with black by @myungjin in #386
- misc: example for synchronous hierarchical FL with coordinator by @myungjin in #388
- feat+refactor+fix: per-channel backend support in SDK by @myungjin in #387
- fix: zero weight upload from trainer in hybrid mode by @jaemin-shin in #389
- feat: optimize composer's extensibility by @myungjin in #390
- Remove unnecessary config examples by @elqurio in #391
- example/implementation for FedBalancer, with a new sampler category by @jaemin-shin in #380
- refactor: remove legacy code by @myungjin in #393
- feat: coordination revision by @myungjin in #394
- updated feddyn implementation pytorch by @GustavBaumgart in #392
- chore: syncfl_hier_coord_mnist schema update by @myungjin in #395
- feat: channel leave functionality for p2p backend by @myungjin in #398
- feat: additional optimization for channel leave by @myungjin in #399
- Improve tasklet composition extensibility by @elqurio in #397
- Move examples to the root of the lib/python dir by @elqurio in #400
- feddyn algorithm correction by @GustavBaumgart in #401
- fix: slow tx task termination by @myungjin in #402
- Fix alias already exists for coord_syncfl by @elqurio in #406
- Monitor deployed tasks and update status in case of crashes by @openwithcode in #404
- new feature: add differential privacy by @jaemin-shin in #403
- feat: coordinated asyncfl by @myungjin in #407
- fix: Resolve the get tasks command by job ID to return the expected task details of a job by @openwithcode in #412
- fix: refactor examples tests by @openwithcode in #409
- bump: upgrade version of go to 1.18 by @openwithcode in #411
- feat: track tasklet runtime by @GustavBaumgart in #405
- feat: communication cost metric collection by @GustavBaumgart in #414
- documentation: fix typos in README.md and docs by @jaemin-shin in #416
- doc: add citation for flame arxiv paper by @myungjin in #417
- refactor: optimize create job task of control plane by @jaemin-shin in #415
- fix: disable automatic eviction of diskcache at aggregation by @jaemin-shin in #419
- feat: utilization/memory usage on CPU and GPU by @GustavBaumgart in #420
- fix: GPU/CPU monitoring termination by @GustavBaumgart in #424
- fix: protobuf version error by @GustavBaumgart in #425
- feat: local registry by @GustavBaumgart in #421
- fix: fix oort utility calculation and add medmnist example for oort by @jaemin-shin in #423
- refactor: monitoring scope adjustment in deployer by @myungjin in #427
- feat: local registry timestamp by @GustavBaumgart in #426
- misc: add debug message in deployer by @myungjin in #428
- fix: incorrect task status update by deployer by @myungjin in #429
- fix: check payload before accumulating bytes received by @GustavBaumgart in #430
- fix: deployer pod state update by @myungjin in #431
New Contributors
- @jaemin-shin made their first contribution in #369
Full Changelog: v0.2.2...v0.2.3