github salesforce/TransmogrifAI 0.7.0

3 years ago

Bug fixes:

  • Fix flaky ModelInsight tests #407
  • Stop logging the tokens of text fields #420, #438, #447, #474
  • Add validation prepare call before model selection when no DAG is passed #424, #429
  • Fix Days.daysBetween int overflow #471

New features / updates:

  • Downsample the number of training samples to maxTrainingSample for regression #413 and multi-class classification #414
  • Refactor InsightLOCOTest #412
  • Enable more loss types for OpLinearRegression #421
  • Add property-based tests for regression model selection #427
  • Add option to calculate LOCO for dates/texts by leaving out their entire vector #418
  • Add Chinese and Korean examples to TextTokenizerTest #442
  • Add support for ignoring text that looks like IDs in SmartTextVectorizer #448, #455
  • Add a unary estimator that detects names in text fields and transforms them to a likely gender #445
  • Allow result features to be removed by raw feature filter #458
  • Metadata changes for sensitive feature information #457
  • Add MinVarianceFilter which checks that computed features have a minimum variance #463, #465
  • Allow TextStats length distribution to be token-based and refactor for testability #464
  • Use Spark job grouping to distinguish steps of the machine learning flow #467, #468, #470
  • Make categorical detection coverage-based in addition to unique-count-based #473
  • Remove duplicate features using sanity checker feature to feature correlations #476, #479
  • Lift the upper bound on number of hash features #477
  • Enable HTML stripping on text-like features #478
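Two of the items above, MinVarianceFilter (#463, #465) and duplicate-feature removal based on feature-to-feature correlations (#476, #479), both drop computed features using simple column statistics. The sketch below only illustrates that general idea in plain Python. It is not TransmogrifAI's Scala/Spark implementation. The column layout, the function names, the use of Pearson correlation, and the thresholds (`min_variance=1e-5`, `threshold=0.99`) are all hypothetical and are not the library's defaults.

```python
from statistics import mean, pvariance


def min_variance_filter(columns, min_variance=1e-5):
    """Keep only columns whose population variance is at least min_variance.

    columns: hypothetical layout, a dict of feature name -> list of floats.
    """
    return {name: vals for name, vals in columns.items()
            if pvariance(vals) >= min_variance}


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def drop_duplicate_features(columns, threshold=0.99):
    """Greedily keep a feature unless it is almost perfectly correlated
    (|r| >= threshold) with a feature that has already been kept."""
    kept = {}
    for name, vals in columns.items():
        if all(abs(pearson(vals, other)) < threshold for other in kept.values()):
            kept[name] = vals
    return kept


# Usage: "c" is constant (zero variance), and "b" duplicates "a" (b = 2a).
columns = {
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [2.0, 4.0, 6.0, 8.0],
    "c": [5.0, 5.0, 5.0, 5.0],
    "d": [4.0, 1.0, 3.0, 2.0],
}
selected = drop_duplicate_features(min_variance_filter(columns))
print(list(selected))  # ['a', 'd']
```

The variance filter runs first so the correlation step never has to compare against constant columns. The greedy pass keeps whichever feature of a correlated pair appears first. A real implementation might instead prefer the feature that is more strongly related to the label.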

Dependency updates (#402, #466):

  • Update Apache Spark version to 2.4.5
  • Avro is a built-in data source in Spark 2.4, so the separate spark-avro package is no longer used
  • Avro to 1.8.2
  • XGBoost to 0.90
  • MLeap to 0.14.0
  • json4s to 3.5.3
  • JUnit to 4.12
  • chill to 0.9.3
  • gradle-avro-plugin to 0.16.0

Miscellaneous:

  • Add ROADMAP.md #394
