Improvements include:
Add GatherND operator (#1089)
Add lane reduction (#1180)
Expose get_queue method for context in API (#1161)
ReverseSequence op (#1177)
Refactor Pooling and implement ONNX LpPool and GlobalLpPool (#1152)
Reduce with runtime compilation (#1150)
Half2 overloads (#1157)
Fix file download for resnet50 example (#1164)
Fix problem with incomplete types with older clang versions (#1174)
Fix out-of-bounds access when generate uses nonpacked tensors (#1160)
Parallelize the ref implementation of the gemm operator (#1142)
Refactor scatter operator to include reduction (#1124)
Fix a bug when creating a tensor_view with a vec data type (#1155)
Fix comparisons in migraphx::value class (#1146)
Python binding for manual graph building (#1143)
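
For readers unfamiliar with the GatherND operator listed above, the following is a minimal sketch of its semantics as defined by ONNX, assuming batch_dims = 0 and plain nested Python lists. The function name gather_nd and its helpers are hypothetical and chosen for illustration only; this is not the MIGraphX implementation or its API.

```python
def gather_nd(data, indices):
    """Reference-style GatherND on nested lists (assumes batch_dims = 0).

    Each innermost list in `indices` is an index tuple that selects an
    element, or a sub-list (slice) when the tuple is shorter than the
    rank of `data`.
    """
    def pick(index_tuple):
        # Follow the index tuple into `data`, one dimension at a time.
        item = data
        for i in index_tuple:
            item = item[i]
        return item

    def walk(node):
        # An innermost list of ints is a single index tuple.
        if node and all(isinstance(v, int) for v in node):
            return pick(node)
        # Otherwise keep descending; the outer shape of `indices` is preserved.
        return [walk(child) for child in node]

    return walk(indices)


data = [[0, 1], [2, 3]]
elements = gather_nd(data, [[0, 0], [1, 1]])  # full index tuples select scalars
rows = gather_nd(data, [[1], [0]])            # shorter tuples select whole rows
print(elements, rows)
```

With full-length index tuples the result contains individual elements; with shorter tuples each entry is a slice of the input, which is why the output rank depends on the last dimension of the indices.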