Release Note
重要声明
- 此版本为测试版,还在迭代中,目前还没有稳定,后续API会根据反馈有可能进行不兼容的升级。对于想要体验飞桨最新特性的开发者,欢迎试用此版本;对稳定性要求高的工业级应用场景推荐使用Paddle 1.8稳定版本。
- 此版本主推命令式编程模式(动态图)的开发方式,并提供了高层API的封装。命令式编程模式具有很好的灵活性,高层API可以大幅减少重复代码。对于初学者或基础的任务场景,推荐使用高层API的开发方式,简单易用;对于资深开发者想要实现复杂的功能,推荐使用命令式编程模式的API,灵活高效。
- 此版本同时对飞桨的API目录体系做了优化,原目录下API会建立alias仍然可用,但建议新的程序使用新目录结构。
基础框架
基础API
-
组网类API实现动静统一,支持在命令式编程模式和声明式编程模式(静态图)两种模式下运行
-
API目录结构调整,Paddle 1.x 版本的API主要位于paddle.fluid目录,本版本对API目录结构进行调整,使得分类更为合理,具体调整规则如下:
- 原fluid.layers下跟tensor操作相关的API移动到paddle.tensor目录
- 原fluid.layers下跟组网相关的操作移动到paddle.nn目录,带有参数的类型放到paddle.nn.layers目录,函数式的API放到paddle.nn.functional目录
- 原fluid.dygraph下命令式编程模式专用API移动到paddle.imperative目录
- 创建paddle.framework目录,用来存放跟框架相关的Program, Executor等API
- 创建paddle.distributed目录,用来存放分布式相关的API
- 创建paddle.optimizer目录,用来存放优化算法相关的API
- 创建paddle.metric目录,用来创建评估指标计算相关的API
- 创建paddle.incubate目录,用来存放孵化中的代码,其中的API有可能会发生调整,该目录存放了复数计算complex和高层API相关的代码
- 所有在paddle.tensor和paddle.framework目录下的API,在paddle目录下创建别名,比如:paddle.tensor.creation.ones可以使用paddle.ones别名
-
新增API如下:
- 在paddle.nn目录新增8个组网类的API: interpolate, LogSoftmax, ReLU, Sigmoid, loss.BCELoss, loss.L1Loss, loss.MSELoss, loss.NLLLoss
- 在paddle.tensor目录新增59个Tensor相关API:add, addcmul, addmm, allclose, arange, argmax, atan, bmm, cholesky, clamp, cross, diag_embed, dist, div, dot, elementwise_equal, elementwise_sum, equal, eye, flip, full, full_like, gather, index_sample, index_select, linspace, log1p, logsumexp, matmul, max, meshgrid, min, mm, mul, nonzero, norm, ones, ones_like, pow, randint, randn, randperm, roll, sin, sort, split, sqrt, squeeze, stack, std, sum, t, tanh, tril, triu, unsqueeze, where, zeros, zeros_like
- 新增device_guard用来指定设备,新增manual_seed用来初始化随机数种子
-
部分原fluid目录下API,并没有迁移到paddle目录下
- 原fluid.contrib目录下API,保留在原位置,未迁移:BasicGRUUnit, BasicLSTMUnit, BeamSearchDecoder, Compressor, HDFSClient, InitState, QuantizeTranspiler, StateCell, TrainingDecoder, basic_gru, basic_lstm, convert_dist_to_sparse_program, ctr_metric_bundle, extend_with_decoupled_weight_decay, fused_elemwise_activation, fused_embedding_seq_pool, load_persistables_for_increment, load_persistables_for_inference, match_matrix_tensor, memory_usage, mixed_precision.AutoMixedPrecisionLists, mixed_precision.decorate, multi_download, multi_upload, multiclass_nms2, op_freq_statistic, search_pyramid_hash, sequence_topk_avg_pooling, shuffle_batch, tree_conv, var_conv_2d
- 原LodTensor相关的API,目前还在开发中,暂未迁移:LoDTensor, LoDTensorArray, create_lod_tensor, create_random_int_lodtensor, DynamicRNN, array_length, array_read, array_write, create_array, ctc_greedy_decoder, dynamic_gru, dynamic_lstm, dynamic_lstmp, im2sequence, linear_chain_crf, lod_append, lod_reset, sequence_concat, sequence_conv, sequence_enumerate, sequence_expand, sequence_expand_as, sequence_first_step, sequence_last_step, sequence_mask, sequence_pad, sequence_pool, sequence_reshape, sequence_reverse, sequence_scatter, sequence_slice, sequence_softmax, sequence_unpad, tensor_array_to_tensor
- 原fluid下分布式相关API,目前还在开发中,暂未迁移
- 原fluid目录以下API,将在高层API中重新实现,未迁移:nets.glu, nets.img_conv_group, nets.scaled_dot_product_attention, nets.sequence_conv_pool, nets.simple_img_conv_pool
- 原fluid目录以下API,有待进一步完善,暂未迁移:dygraph.GRUUnit, layers.DecodeHelper, layers.GreedyEmbeddingHelper, layers.SampleEmbeddingHelper, layers.TrainingHelper, layers.autoincreased_step_counter, profiler.cuda_profiler, profiler.profiler, profiler.reset_profiler, profiler.start_profiler, profiler.stop_profiler
- 原fluid目录以下API不再推荐使用,未迁移:DataFeedDesc, DataFeeder, clip.ErrorClipByValue, clip.set_gradient_clip, dygraph_grad_clip.GradClipByGlobalNorm, dygraph_grad_clip.GradClipByNorm, dygraph_grad_clip.GradClipByValue, initializer.force_init_on_cpu, initializer.init_on_cpu, io.ComposeNotAligned.with_traceback, io.PyReader, io.load_params, io.load_persistables, io.load_vars, io.map_readers, io.multiprocess_reader, io.save_params, io.save_persistables, io.save_vars, io.xmap_readers, layers.BasicDecoder, layers.BeamSearchDecoder, layers.Decoder, layers.GRUCell, layers.IfElse, layers.LSTMCell, layers.RNNCell, layers.StaticRNN, layers.Switch, layers.While, layers.create_py_reader_by_data, layers.crop, layers.data, layers.double_buffer, layers.embedding, layers.fill_constant_batch_size_like, layers.gaussian_random_batch_size_like, layers.get_tensor_from_selected_rows, layers.load, layers.merge_selected_rows, layers.one_hot, layers.py_reader, layers.read_file, layers.reorder_lod_tensor_by_rank, layers.rnn, layers.uniform_random_batch_size_like, memory_optimize, release_memory, transpiler.memory_optimize, transpiler.release_memory
高层API
- 新增paddle.incubate.hapi目录,对模型开发过程中常见的组网、训练、评估、预测、存取等操作进行封装,实现低代码开发,MNIST手写数字识别任务对比命令式编程模式实现方式,高层API可减少80%执行类代码。
- 新增Model类封装,继承Layer类,封装模型开发过程中常用的基础功能,包括:
- 提供prepare接口,用于指定损失函数和优化算法
- 提供fit接口,实现训练和评估,可通过callback方式实现训练过程中执行自定义功能,比如模型存储等
- 提供evaluate接口,实现评估集上的预测和评估指标计算
- 提供predict接口,实现特定的测试数据推理预测
- 提供train_batch接口,实现单batch数据的训练
- 新增Dataset接口,对常用数据集进行封装,支持数据的随机访问
- 新增常见Loss和Metric类型的封装
- 新增CV领域Resize, Normalize等16种常见的数据处理接口
- 新增CV领域lenet, vgg, resnet, mobilenetv1, mobilenetv2图像分类骨干网络
- 新增NLP领域MultiHeadAttention, BeamSearchDecoder, TransformerEncoder, TransformerDecoder , DynamicDecode API接口
- 发布基于高层API实现的12个模型,Transformer,Seq2seq, LAC,BMN, ResNet, YOLOv3, , VGG, MobileNet, TSM, CycleGAN, Bert, OCR
性能优化
- 新增
reshape+transpose+matmul
fuse,使得Ernie量化后 INT8 模型在原来基础上性能提升约4%(6271机器),量化后INT8模型相比未经过DNNL优化(包括fuses等)和量化的FP32模型提速约6.58倍。
调试分析
- 针对Program打印内容过于冗长,在调试中利用效率不高的问题,大幅简化Program、Block、Operator、Variable等对象的打印字符串,不损失有效信息的同时提升调试效率
- 针对第三方库接口
boost::get
不安全,运行中抛出异常难以调试的问题,增加BOOST_GET
系列宏替换了Paddle中600余处存在风险的boost::get
,丰富出现boost::bad_get
异常时的报错信息,具体增加了C++报错信息栈,出错文件及行号、期望输出类型和实际类型等,提升调试体验
Bug修复
- 修复while loop中存在slice操作时计算结果错误的bug
- 修复inplace ops引起的transformer 模型性能下降问题
- 通过完善cache key, 解决Ernie精度测试最后一个batch运行失败的问题
- 修复fluid.dygraph.guard等context中出现异常时无法正确退出的问题
2.0 alpha Release Note
Important Statements
- This version is a beta version. It is still in iteration and is not stable at present. Incompatible upgrade may be subsequently performed on APIs based on the feedback. For developers who want to experience the latest features of Paddle, welcome to this version. For industrial application scenarios requiring high stability, the stable Paddle Version 1.8 is recommended.
- This version mainly popularizes the imperative programming development method and provides the encapsulation of high-level APIs. The imperative programming(dynamic graph) mode has great flexibility and high-level APIs can greatly reduces duplicated codes. For beginners or basic task scenarios, the high-level API development method is recommended because it is simple and easy to use. For senior developers who want to implement complex functions, the imperative programming API is commended because it is flexible and efficient.
- This version also optimizes the Paddle API directory system. The APIs in the original directory can create an alias and are still available, but it is recommended that new programs use the new directory structure.
Basic Framework
Basic APIs
-
Networking APIs achieve dynamic and static unity and support operation in imperative programming and declarative programming modes(static graph)
-
The API directory structure is adjusted. In the Paddle Version 1.x, the APIs are mainly located in the paddle.fluid directory. This version adjusts the API directory structure so that the classification is more reasonable. The specific adjustment rules are as follows:
- Moves the APIs related to the tensor operations in the original fluid.layers directory to the paddle.tensor directory
- Moves the networking-related operations in the original fluid.layers directory to the paddle.nn directory. Puts the types with parameters in the paddle.nn.layers directory and the functional APIs in the paddle.nn.functional directory
- Moves the special API for imperative programming in the original fluid.dygraph directory to the paddle.imperative directory
- Creates a paddle.framework directory that is used to store framework-related program, executor, and other APIs
- Creates a paddle.distributed directory that is used to store distributed related APIs
- Creates a paddle.optimizer directory that is used to store APIs related to optimization algorithms
- Creates a paddle.metric directory that is used to create APIs related to evaluation index calculation
- Creates a paddle.incubate directory that is used to store incubating codes. APIs may be adjusted. This directory stores codes related to complex number computation and high-level APIs
- Creates an alias in the paddle directory for all APIs in the paddle.tensor and paddle.framework directories. For example, paddle.tensor.creation.ones can use paddle.ones as an alias
-
The added APIs are as follows:
- Adds eight networking APIs in the paddle.nn directory: interpolate, LogSoftmax, ReLU, Sigmoid, loss.BCELoss, loss.L1Loss, loss.MSELoss, and loss.NLLLoss
- Adds 59 tensor-related APIs in the paddle.tensor directory: add, addcmul, addmm, allclose, arange, argmax, atan, bmm, cholesky, clamp, cross, diag_embed, dist, div, dot, elementwise_equal, elementwise_sum, equal, eye, flip, full, full_like, gather, index_sample, index_select, linspace, log1p, logsumexp, matmul, max, meshgrid, min, mm, mul, nonzero, norm, ones, ones_like, pow, randint, randn, randperm, roll, sin, sort, split, sqrt, squeeze, stack, std, sum, t, tanh, tril, triu, unsqueeze, where, zeros, and zeros_like
- Adds device_guard that is used to specify a device. Adds manual_seed that is used to initialize a random number seed
-
Some of the APIs in the original fluid directory have not been migrated to the paddle directory
- The following API under fluid.contrib directory are kept in the original location, not migrated:BasicGRUUnit, BasicLSTMUnit, BeamSearchDecoder, Compressor, HDFSClient, InitState, QuantizeTranspiler, StateCell, TrainingDecoder, basic_gru, basic_lstm, convert_dist_to_sparse_program, ctr_metric_bundle, extend_with_decoupled_weight_decay, fused_elemwise_activation, fused_embedding_seq_pool, load_persistables_for_increment, load_persistables_for_inference, match_matrix_tensor, memory_usage, mixed_precision.AutoMixedPrecisionLists, mixed_precision.decorate, multi_download, multi_upload, multiclass_nms2, op_freq_statistic, search_pyramid_hash, sequence_topk_avg_pooling, shuffle_batch, tree_conv, var_conv_2d
- The following APIs related to LodTensor are still under development and have not been migrated yet:LoDTensor, LoDTensorArray, create_lod_tensor, create_random_int_lodtensor, DynamicRNN, array_length, array_read, array_write, create_array, ctc_greedy_decoder, dynamic_gru, dynamic_lstm, dynamic_lstmp, im2sequence, linear_chain_crf, lod_append, lod_reset, sequence_concat, sequence_conv, sequence_enumerate, sequence_expand, sequence_expand_as, sequence_first_step, sequence_last_step, sequence_mask, sequence_pad, sequence_pool, sequence_reshape, sequence_reverse, sequence_scatter, sequence_slice, sequence_softmax, sequence_unpad, tensor_array_to_tensor
- The following APIs related to distributed training are still under development, not migrated yet
- The following APIs in fluid.nets directory will be implemented with high level API, not migrated:nets.glu, nets.img_conv_group, nets.scaled_dot_product_attention, nets.sequence_conv_pool, nets.simple_img_conv_pool
- The following APIs are to be improved, not migrated:dygraph.GRUUnit, layers.DecodeHelper, layers.GreedyEmbeddingHelper, layers.SampleEmbeddingHelper, layers.TrainingHelper, layers.autoincreased_step_counter, profiler.cuda_profiler, profiler.profiler, profiler.reset_profiler, profiler.start_profiler, profiler.stop_profiler
- The following APIs are no longer recommended and are not migrated:DataFeedDesc, DataFeeder, clip.ErrorClipByValue, clip.set_gradient_clip, dygraph_grad_clip.GradClipByGlobalNorm, dygraph_grad_clip.GradClipByNorm, dygraph_grad_clip.GradClipByValue, initializer.force_init_on_cpu, initializer.init_on_cpu, io.ComposeNotAligned.with_traceback, io.PyReader, io.load_params, io.load_persistables, io.load_vars, io.map_readers, io.multiprocess_reader, io.save_params, io.save_persistables, io.save_vars, io.xmap_readers, layers.BasicDecoder, layers.BeamSearchDecoder, layers.Decoder, layers.GRUCell, layers.IfElse, layers.LSTMCell, layers.RNNCell, layers.StaticRNN, layers.Switch, layers.While, layers.create_py_reader_by_data, layers.crop, layers.data, layers.double_buffer, layers.embedding, layers.fill_constant_batch_size_like, layers.gaussian_random_batch_size_like, layers.get_tensor_from_selected_rows, layers.load, layers.merge_selected_rows, layers.one_hot, layers.py_reader, layers.read_file, layers.reorder_lod_tensor_by_rank, layers.rnn, layers.uniform_random_batch_size_like, memory_optimize, release_memory, transpiler.memory_optimize, transpiler.release_memory
High-level APIs
- Adds a paddle.incubate.hapi directory. Encapsulates common operations such as networking, training, evaluation, inference, and access during the model development process. Implements low-code development. Uses the imperative programming implementation mode of MNIST task comparison. High-level APIs can reduce 80% of executable codes.
- Adds model-type encapsulation. Inherits the layer type. Encapsulates common basic functions during the model development process, including:
- Provides a prepare API that is used to specify a loss function and an optimization algorithm
- Provides a fit API to implement training and evaluation. Implements the execution of model storage and other user-defined functions during the training process by means of callback
- Provides an evaluate interface to implement the inference and evaluation index calculation on the evaluation set
- Provides a predict interface to implement specific test data inference
- Provides a train_batch interface to implement the training of single-batch data
- Adds a dataset interface to encapsulate commonly-used data sets and supports random access to data
- Adds encapsulation of common Loss and Metric types
- Adds 16 common data processing interfaces including Resize and Normalize in the CV field
- Adds lenet, vgg, resnet, mobilenetv1, and mobilenetv2 image classification backbone networks in the CV field
- Adds MultiHeadAttention, BeamSearchDecoder, TransformerEncoder, TransformerDecoder, and DynamicDecode APIs in the NLP field
- Releases 12 models based on high-level API implementation, including Transformer, Seq2seq, LAC, BMN, ResNet, YOLOv3, VGG, MobileNet, TSM, CycleGAN, Bert, and OCR
Performance Optimization
- Adds a
reshape+transpose+matmul
fuse so that the performance of the INT8 model is improved by about 4% (on the 6271 machine) after Ernie quantization. After the quantization, the speed of the INT8 model is increased by about 6.58 times compared with the FP32 model on which DNNL optimization (including fuses) and quantization are not performed
Debugging Analysis
- To solve the problem of program printing contents being too lengthy and low utilization efficiency during debugging, considerably simplifies the printing strings of objects such as programs, blocks, operators, and variables, thus improving the debugging efficiency without losing effective information
- To solve the problem of insecure third-party library APIs
boost::get
and difficulty in debugging due to exceptions during running, adds theBOOST_GET
series of macros to replace over 600 riskyboost::get
in Paddle. Richens error message duringboost::bad_get
exceptions. Specifically, adds the C++ error message stack, error file and line No., expected output type, and actual type, thus improving the debugging experience
Bug Fixes
- Fix the bug of wrong computation results when any slice operation exists in the while loop
- Fix the problem of degradation of the transformer model caused by inplace ops
- Fix the problem of running failure of the last batch in the Ernie precision test
- Fix the problem of failure to correctly exit when exceptions occur in context of fluid.dygraph.guard