github hpcaitech/ColossalAI v0.3.5
Version v0.3.5 Release Today!

2 months ago

What's Changed

Checkpointio

  • [checkpointio] fix gemini and hybrid parallel optim checkpoint (#5347) by Hongxin Liu

Shardformer

  • [shardformer] HybridParallelPlugin supports gradient accumulation (#5246) by flybird11111
  • [shardformer] llama support DistCrossEntropy (#5176) by flybird11111
  • [shardformer]: support gpt-j, falcon, Mistral and add interleaved pipeline for bert (#5088) by Wenhao Chen
  • [shardformer] fix flash attention: when the mask is causal, skip unpadding (#5084) by flybird11111
  • [shardformer] fix llama error when transformers upgraded. (#5055) by flybird11111
  • [shardformer] Fix serialization error with Tensor Parallel state saving (#5018) by Jun Gao
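
The gradient-accumulation support added in #5246 follows the standard pattern: scale each micro-batch gradient by 1/k and apply the optimizer step only once every k micro-batches. A minimal framework-free sketch of the idea (all names are illustrative, not ColossalAI's API):

```python
# Sketch of gradient accumulation: average gradients over `accum_steps`
# micro-batches before applying one optimizer step. Illustrative only;
# this is not the HybridParallelPlugin interface.

def sgd_with_accumulation(w, micro_batches, grad_fn, lr=0.1, accum_steps=4):
    """Apply one SGD step per group of `accum_steps` micro-batches."""
    acc = 0.0
    for i, batch in enumerate(micro_batches, start=1):
        acc += grad_fn(w, batch) / accum_steps  # scale so the sum is a mean
        if i % accum_steps == 0:
            w -= lr * acc  # one optimizer step for the whole group
            acc = 0.0
    return w

# Toy loss L(w) = (w - target)^2 per sample, so grad = 2 * (w - target).
grad = lambda w, t: 2.0 * (w - t)
w = sgd_with_accumulation(0.0, [1.0, 1.0, 1.0, 1.0], grad, lr=0.5, accum_steps=4)
print(w)  # → 1.0 (one step with the mean gradient over 4 micro-batches)
```

This makes the effective batch size k times the micro-batch size without increasing peak activation memory, which is why it pairs naturally with pipeline-parallel plugins.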

Pipeline

  • [pipeline] A more general _communicate in p2p (#5062) by Elsa Granger
  • [pipeline]: add p2p fallback order and fix interleaved pp deadlock (#5214) by Wenhao Chen
  • [pipeline]: support arbitrary batch size in forward_only mode (#5201) by Wenhao Chen
  • [pipeline]: fix p2p comm, add metadata cache and support llama interleaved pp (#5134) by Wenhao Chen
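
The metadata cache mentioned in #5134 exploits the fact that pipeline stages exchange tensors of the same shape and dtype at every step, so the (shape, dtype) header only needs to be transmitted when it changes. A hedged sketch of the idea (names are illustrative, not the actual ColossalAI p2p implementation):

```python
# Sketch of a p2p metadata cache: send the tensor header once, then skip
# it on subsequent sends while the metadata is unchanged. Illustrative
# only; not ColossalAI's _communicate API.

class MetadataCachingSender:
    def __init__(self):
        self._last_meta = None
        self.messages = []  # stands in for the real comm channel

    def send(self, shape, dtype, payload):
        meta = (tuple(shape), dtype)
        if meta != self._last_meta:
            # First send, or metadata changed: transmit the header.
            self.messages.append(("meta", meta))
            self._last_meta = meta
        # The payload itself is always transmitted.
        self.messages.append(("data", payload))

sender = MetadataCachingSender()
sender.send((4, 8), "fp16", b"step0")
sender.send((4, 8), "fp16", b"step1")  # same metadata: header skipped
sender.send((2, 8), "fp16", b"step2")  # shape changed: header resent
header_count = sum(1 for kind, _ in sender.messages if kind == "meta")
print(header_count)  # → 2
```

Skipping the header removes an extra round of blocking communication per step, which is where the efficiency gain in steady-state pipeline execution comes from.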

Colossal-LLaMA-2

  • [Colossal-LLaMA-2] Release Colossal-LLaMA-2-13b-base model (#5224) by Tong Li
  • [Colossal-Llama-2] Add finetuning Colossal-Llama-2 example (#4878) by Yuanchen

ColossalEval

  • [ColossalEval] Support GSM, Data Leakage Evaluation and Tensor Parallel (#5169) by Yuanchen

Plugin

  • [plugin]fix 3d checkpoint load when booster boost without optimizer. (#5135) by flybird11111

Inference

  • [inference] refactor examples and fix schedule (#5077) by Hongxin Liu
  • [inference] update examples and engine (#5073) by Xu Kai
  • [inference] Refactor inference architecture (#5057) by Xu Kai
  • [Inference] Fix bug in ChatGLM2 Tensor Parallelism (#5014) by Jianghai

Hotfix/hybridengine

  • [hotfix/hybridengine] Fix init model with random parameters in benchmark (#5074) by Bin Jia
  • [hotfix/hybridengine] fix bug when tp*pp size = 1 (#5069) by Bin Jia

Example

  • [example] fix llama example's loss error when using the gemini plugin (#5060) by flybird11111

Pipeline, Shardformer

  • [pipeline,shardformer] Fix p2p efficiency in pipeline, allow skipping loading weight not in weight_map when strict=False, fix llama flash attention forward, add flop estimation by megatron in llama benchmark (#5017) by Elsa Granger
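
The "flop estimation by megatron" added to the llama benchmark in #5017 presumably follows the well-known Megatron-LM estimate for decoder-only transformers. Without activation recomputation, forward plus backward FLOPs per iteration are approximately 72·B·L·s·h²·(1 + s/(6h) + V/(12·L·h)). A hedged sketch (the exact formula used in the benchmark may differ):

```python
# Megatron-style training-FLOP estimate for a GPT/LLaMA-like decoder,
# without activation recomputation. B=batch, s=sequence length,
# L=layers, h=hidden size, V=vocab size. Illustrative sketch only.

def megatron_flops_per_iteration(batch, seq_len, layers, hidden, vocab):
    """Approximate forward + backward FLOPs for one training iteration."""
    return (
        72 * batch * layers * seq_len * hidden**2
        * (1 + seq_len / (6 * hidden) + vocab / (12 * layers * hidden))
    )

# Rough numbers for a LLaMA-7B-like configuration (illustrative only).
flops = megatron_flops_per_iteration(
    batch=1, seq_len=4096, layers=32, hidden=4096, vocab=32000
)
print(f"{flops / 1e12:.1f} TFLOPs per iteration")  # → 187.9 TFLOPs per iteration
```

Dividing this estimate by the measured iteration time gives the achieved TFLOPs/s figure that benchmarks typically report.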

Full Changelog: v0.3.4...v0.3.5
