New algorithms or improvements
- Improved the single-node SAR implementation for top-k recommendations. Users can now choose whether the recommended top-k items are returned sorted.
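
To illustrate the sorted versus unsorted top-k option, here is a minimal sketch in plain Python. It is not the SAR implementation itself: the function name `top_k_items` and the `sort_top_k` flag are hypothetical names chosen for this example. The idea is that selecting the k best items only needs a size-k heap, and ordering them by score is an optional extra step that callers can skip when order does not matter.

```python
import heapq


def top_k_items(scores, k, sort_top_k=True):
    """Return the indices of the k highest scores.

    Hypothetical helper for illustration only (not the library's API).
    If sort_top_k is False, the k indices come back in arbitrary order.
    """
    if k <= 0:
        return []

    heap = []  # min-heap of (score, item_index) holding at most k entries
    for idx, score in enumerate(scores):
        if len(heap) < k:
            heapq.heappush(heap, (score, idx))
        elif score > heap[0][0]:
            # Current item beats the weakest kept item, so swap it in.
            heapq.heapreplace(heap, (score, idx))

    if sort_top_k:
        # Optional step: order the selected items from best to worst.
        heap.sort(key=lambda pair: -pair[0])

    return [idx for _, idx in heap]


scores = [0.1, 0.9, 0.4, 0.7, 0.2]
ranked = top_k_items(scores, k=3, sort_top_k=True)
unranked = top_k_items(scores, k=3, sort_top_k=False)
```

Both calls select the same three items; only the sorted call guarantees that they are ordered by descending score.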
New utilities or improvements
- Added data-related utility functions, such as MovieLens data download, in Python and PySpark.
- Added a new data split method (timestamp-based split).
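
As a rough illustration of what a timestamp-based split does, the sketch below orders each user's interactions by time and keeps the earliest share for training and the rest for testing. This is an assumption about the general technique, not the utility shipped in this release: the function name `chrono_split`, the `(user, item, rating, timestamp)` tuple layout, and the per-user `ratio` parameter are all hypothetical.

```python
from collections import defaultdict


def chrono_split(interactions, ratio=0.75):
    """Split (user, item, rating, timestamp) tuples chronologically per user.

    Hypothetical helper for illustration only. For each user, the earliest
    `ratio` share of interactions goes to train and the remainder to test.
    """
    by_user = defaultdict(list)
    for row in interactions:
        by_user[row[0]].append(row)

    train, test = [], []
    for rows in by_user.values():
        rows.sort(key=lambda r: r[3])   # oldest interaction first
        cut = int(len(rows) * ratio)    # number of rows kept for training
        train.extend(rows[:cut])
        test.extend(rows[cut:])
    return train, test


data = [
    (1, "a", 4.0, 40),
    (1, "b", 3.0, 10),
    (1, "c", 5.0, 30),
    (1, "d", 2.0, 20),
    (2, "a", 1.0, 7),
    (2, "c", 4.0, 3),
]
train, test = chrono_split(data, ratio=0.75)
```

Splitting by time rather than at random keeps every test interaction later than the same user's training interactions, which is closer to how a deployed recommender is evaluated.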
New notebooks or improvements
- Added an operationalization (O16N) notebook for a Spark ALS movie recommender on Azure production services such as Databricks, Cosmos DB, and Kubernetes Services.
- Added a SAR deep dive notebook that demonstrates the single-node implementation.
- Added a Surprise SVD deep dive notebook.
- Added a Surprise SVD integration test.
- Added ranking metrics evaluation for Surprise SVD.
- Made the quick-start notebooks consistent in their run settings, i.e., experiment protocols (e.g., data split and evaluation metrics) and algorithm parameters (e.g., hyperparameters and removal of seen items).
- Added a comparison notebook for easily benchmarking different algorithms.
Other features
- Updated SETUP with Azure Databricks.
- Added SETUP troubleshooting for Azure DSVM and Databricks.
- Updated READMEs under each notebook directory to provide comprehensive guidelines.
- Added smoke/integration tests on the large MovieLens datasets (10M and 20M).
- Updated the Spark settings of the CI/CD machine to eliminate unexpected build failures such as "no space left" errors.