dmlc/dgl v2.4.0


Highlights

  • DGL 2.4 documentation can be found here: https://www.dgl.ai/dgl_docs/index.html
  • The distributed module is no longer imported by default with import dgl. Users need to import it manually: import dgl.distributed.
  • DistNodeDataLoader and DistEdgeDataLoader have moved from dgl.dataloading to dgl.distributed. Users are recommended to call dgl.distributed.DistNode/EdgeDataLoader; dgl.dataloading.DistNode/EdgeDataLoader is still available for backward compatibility, but will be removed in the next release.
  • GraphBolt examples are now in examples/graphbolt.
  • Users are now required to install GraphBolt's CUDA wheels if they have a CUDA-enabled torch installation.
  • numpy 2.x is now supported.
  • torch 2.4 & CUDA 12.4 are now supported by @pyynb in #7629
  • Importing DGL does not cause an import of GraphBolt anymore. by @Rhett-Ying in #7676, #7756
  • GraphBolt does not depend on the deprecated torchdata package anymore and this release is incompatible with the torchdata package. by @frozenbugs in #7638, #7609, #7667, #7688
  • [GraphBolt][CUDA] Use better memory allocation algorithm to avoid OOM. by @mfbalin in #7618
  • [GraphBolt] GPU utilization has been maximized by eliminating all (known) GPU synchronizations: #7528, #7682, #7709, #7707, #7712, #7705, #7602, #7603, #7634, #7757 by @mfbalin.
  • [GraphBolt][io_uring] gb.DiskBasedFeature is now ready to use for out-of-core training: #7506, #7713, #7562, #7515, #7530, #7518 by @mfbalin.
  • [GraphBolt] Users are now recommended to use gb.numpy_save_aligned instead of numpy.save to save their features for out-of-core training. by @mfbalin in #7524
  • [GraphBolt] gb.CPUCachedFeature was added to speedup out-of-core training: #7492, #7508, #7520, #7526, #7525, #7531, #7537, #7538, #7581, #7723, #7644, #7731 and more by @mfbalin.
  • [GraphBolt] The feature fetching pipeline is fully parallelized by enabling all hardware components to run concurrently: #7546, #7547, #7548, #7550, #7549, #7553, #7551, #7552, #7554, #7555, #7559, #7540 and more by @mfbalin.
  • [GraphBolt][Temporal] Temporal sampling support is extended with more samplers and GPU support: #7500, #7503, #7677, #7678 by @mfbalin.
  • [GraphBolt][CUDA] Sampling pipeline parallelism optimizations in #7714, #7665 and example use in #7702, #7669, #7664, #7662 by @mfbalin.
  • [GraphBolt][PyG] Add to_pyg for layer input conversion. by @mfbalin in #7745 and #7747.
  • [Feature] Fixed sampler with limit on sampled nodes/edges in batch subgraph by @ayushnoori in #6668
  • [GraphBolt] Refactor and extend FeatureStore. by @mfbalin in #7558
  • [dev] Several build and setup improvements by @Rhett-Ying in #7565, #7567, #7570, #7571, #7574, #7684
  • [GraphBolt][CUDA] gb.indptr_edge_ids. by @mfbalin in #7592, #7593
  • [GraphBolt] Allow using multiple processes for GraphBolt partition conversion by @thvasilo in #7497
  • [GraphBolt][CUDA] Update CCCL to 2.6.0. by @mfbalin in #7636
  • [Performance] Change hash table for performance. by @mfbalin in #7658, #7631
  • [GraphBolt][CUDA] Refactor overlap_graph_fetch, simplify gb.DataLoader. by @mfbalin in #7681, #7732
  • [Build] Organize cmake file by @mfbalin in #7715
  • [GraphBolt] Feature.count(). by @mfbalin in #7730
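The dataloader relocation noted above (DistNodeDataLoader moving from dgl.dataloading to dgl.distributed) can be bridged during migration with a small compatibility shim. This is a sketch, not part of DGL: resolve_dist_node_dataloader is a hypothetical helper that prefers the new location, falls back to the deprecated one, and returns None when DGL itself is not installed, so the sketch stays runnable anywhere.

```python
import importlib


def resolve_dist_node_dataloader():
    """Return the DistNodeDataLoader class, preferring its new home.

    Tries dgl.distributed first (DGL >= 2.4), then the deprecated
    dgl.dataloading location (removed in the release after 2.4).
    Returns None if DGL is not installed at all.
    """
    for mod_name in ("dgl.distributed", "dgl.dataloading"):
        try:
            mod = importlib.import_module(mod_name)
        except ImportError:
            continue
        cls = getattr(mod, "DistNodeDataLoader", None)
        if cls is not None:
            return cls
    return None


loader_cls = resolve_dist_node_dataloader()
```

In code that must run against both pre- and post-2.4 DGL, construct loaders through loader_cls instead of hard-coding either module path; once the deprecated alias is dropped, only the dgl.distributed branch will resolve.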
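The gb.numpy_save_aligned recommendation above exists because out-of-core training with io_uring/direct I/O wants the raw feature bytes to begin at a block-aligned file offset, which plain numpy.save does not guarantee. The sketch below is not DGL's implementation: save_aligned is a hypothetical writer that pads the .npy header with spaces so the data section starts at a 4096-byte boundary (the alignment value is an assumption), while the file remains loadable with ordinary numpy.load.

```python
import os
import tempfile

import numpy as np

ALIGN = 4096  # assumed alignment; a typical direct-I/O block size


def save_aligned(path, arr, align=ALIGN):
    """Write a .npy file whose raw data starts at an ``align``-byte offset.

    The .npy header dict is padded with spaces so that the data section
    begins on the requested boundary; numpy.load can still read the file.
    """
    arr = np.ascontiguousarray(arr)
    header = (
        "{'descr': %r, 'fortran_order': False, 'shape': %r, }"
        % (np.lib.format.dtype_to_descr(arr.dtype), arr.shape)
    )
    magic = b"\x93NUMPY\x01\x00"             # magic string + format version 1.0
    base = len(magic) + 2 + len(header) + 1  # +2 length field, +1 trailing newline
    pad = -base % align                      # spaces needed to reach the boundary
    header_bytes = (header + " " * pad + "\n").encode("latin1")
    with open(path, "wb") as f:
        f.write(magic)
        f.write(len(header_bytes).to_bytes(2, "little"))
        f.write(header_bytes)
        f.write(arr.tobytes())


# Round-trip a small feature array and record where its data starts.
path = os.path.join(tempfile.mkdtemp(), "feat.npy")
arr = np.arange(12, dtype=np.float32).reshape(3, 4)
save_aligned(path, arr)
loaded = np.load(path)
with open(path, "rb") as f:
    f.read(8)                                     # skip magic + version
    hlen = int.from_bytes(f.read(2), "little")    # header length field
data_offset = 10 + hlen                           # where the raw data begins
```

With an aligned data offset, a direct-I/O reader can mmap or pread the feature region in whole blocks without copying; numpy.save only aligns the header to 64 bytes, which is too small for that purpose.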

New Examples

  • [GraphBolt] Add DiskBasedFeature example for DGL model by @Liu-rj in #7624
  • [GraphBolt][PyG] Heterogeneous example. by @mfbalin in #7722
  • [GraphBolt][PyG] Link prediction example. by @mfbalin in #7752

New Built-in Datasets

  • [GraphBolt] igb-hom-[tiny|small|medium] variants of IGB datasets are added. by @BowenYao18 in #7717

Full Changelog: v2.3.0...v2.4.0
