Action Required
- KFServing has added an object selector to the pod mutator webhook configuration, which requires Kubernetes 1.15 or later to take effect.
- The generated KFServing InferenceService OpenAPI schema validation now includes markers such as `x-kubernetes-list-map-keys` and `x-kubernetes-map-type`, which require Kubernetes 1.16 or later. If you are on Kubernetes 1.15 or lower, install KFServing with the `--validate=false` flag.
- TensorRT Inference Server has been renamed to Triton Inference Server. If you use the `tensorrt` predictor in your InferenceService YAML, rename it to `triton`.
- KFServing has removed the default percentage-based queue proxy resource limit due to #844. Set the queue proxy requests/limits in the Knative `config-deployment.yaml` config map, which is introduced in Knative 0.16. If you are on a lower version and your cluster has resource quota turned on, add the queue proxy resource limit annotation instead. We highly recommend upgrading the Linux kernel if you are hitting the same CPU throttling issue.
- The default S3 credential names have been updated to follow the common convention: `awsAccessKeyID` and `awsSecretAccessKey` are now `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`. If you have secrets configured the old way, update them accordingly.
- KFServing has stopped maintaining the model server image versions in the configmap. You can now set the model server version in the `runtimeVersion` field if you need a version different from the default.
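
The two renames listed under Action Required (the predictor name and the S3 secret keys) can be applied mechanically to manifests that have already been loaded into Python dictionaries. The sketch below is illustrative only and is not part of KFServing: the helper names `migrate_s3_secret` and `migrate_predictor` are hypothetical, and the code assumes the manifest keeps its predictors under `spec.default` and `spec.canary`.

```python
import copy

# Old -> new S3 secret key names, as listed under Action Required.
S3_KEY_RENAMES = {
    "awsAccessKeyID": "AWS_ACCESS_KEY_ID",
    "awsSecretAccessKey": "AWS_SECRET_ACCESS_KEY",
}


def migrate_s3_secret(secret):
    """Return a copy of a Secret manifest with the old S3 key names renamed.

    Hypothetical helper; the input dictionary is left unmodified.
    """
    migrated = copy.deepcopy(secret)
    for field in ("data", "stringData"):
        values = migrated.get(field)
        if not values:
            continue
        for old, new in S3_KEY_RENAMES.items():
            # Never overwrite a key that already uses the new name.
            if old in values and new not in values:
                values[new] = values.pop(old)
    return migrated


def migrate_predictor(isvc):
    """Return a copy of an InferenceService manifest with `tensorrt` renamed to `triton`.

    Hypothetical helper. Assumption: predictors live under spec.default
    and spec.canary; other layouts are returned unchanged.
    """
    migrated = copy.deepcopy(isvc)
    spec = migrated.get("spec") or {}
    for endpoint in ("default", "canary"):
        predictor = (spec.get(endpoint) or {}).get("predictor") or {}
        if "tensorrt" in predictor and "triton" not in predictor:
            predictor["triton"] = predictor.pop("tensorrt")
    return migrated
```

Both helpers return a modified copy, so the result can be compared with the original manifest before anything is applied to a cluster.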
New features
- Add batcher module as sidecar #847 @zhangrongguo
- Add Default LivenessProbe to Tensorflow Predictor #925 @salanki
- Remove framework image version list from configmap #917 @yuzisun
- Record Events when InferenceService goes in and out of readiness state #876 @ifilonenko
- Triton inference server rename and integrations #747 @deadeyegoodwin
- Alibi explainer upgrade to 0.4.0 #803 @cliveseldon
- Make default request logger url more flexible #837 @ryandawsonuk
- Allow customized url paths on data plane #907 @Iamlovingit
- Add object selector for KFServing pod mutator webhook configuration #893 @yuzisun
- Update logger to CloudEvents V1 protocol #886 @cliveseldon
- Set ContainerConcurrency to Parallelism #806 @salanki
Bug Fixes
- Disable retries in Istio VirtualService #807 @salanki
- Remove default queue proxy resource limit and Add KFServing benchmarking #894 @yuzisun
- Enhance SDK watch API to avoid traceback #889 @jinchihe
- Update KNative annotation when modifying minReplicas to 0 #963 @salanki
- Allow configurable region name when creating minio client #823 @harshavardhana
- Return 503 from healthhandler when model is not ready #818 @kolasanichaitanya
- Updated S3 credential variable names to commonly used env var names #704 @karlschriek
- Fix duplicated volume issue when attaching GCS secret #766 @kangwoo
Documentation
- Add BERT example for triton inference server integration #750 @yuzisun
- Add KFServing Debugging guide #829 @yuzisun
- Add new KFServing sample for GCP IAP #853 @owennewo
- Add KFServing on Kubeflow with Istio-Dex Example #821 #822 @sachua
- Add Outlier Detection and Drift Detection Examples #764 @cliveseldon
- Update pipeline sample to point to mnist e2e one #926 @animeshsingh
- Add custom gRPC sample #921 @Iamlovingit
- Add custom inference example using BentoML #800 @yubozhao
- Update KFServing roadmap for Q3/Q4 #861 @yuzisun