pypi DataProfiler 0.4.3
v0.4.3

4 years ago

Runtime Changes

Migrating from v0.4.2 to v0.4.3 should reduce profiling time by 30-90%. The exact reduction depends largely on system resources and data size.

Notes

  • Removed the requirement for tensorflow-addons
  • Library now works with TensorFlow nightly (Python 3.9)
  • Added example on generating a new data labeler

Profiler

  • Multiprocessing for data preprocessing
  • Improved histogram accuracy
  • Reduced histogram generation runtime
  • Option to set the bin count for histograms
  • Expanded precision statistics and switched to estimating precision (as opposed to calculating it exactly)
  • Limited the process pool size based on CPU and memory limits
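
The pool-size limit can be illustrated with a minimal sketch. It assumes each worker process needs a roughly known amount of memory and caps the worker count by whichever is tighter, CPU count or available memory. The function `compute_pool_size` and its parameters are hypothetical illustrations, not DataProfiler's API.

```python
import os
from typing import Optional


def compute_pool_size(available_memory_bytes: int,
                      memory_per_process_bytes: int,
                      cpu_count: Optional[int] = None) -> int:
    """Hypothetical helper: cap worker processes by CPU count and memory.

    Always returns at least 1 so work can still proceed serially.
    """
    if memory_per_process_bytes <= 0:
        raise ValueError("memory_per_process_bytes must be positive")
    if cpu_count is None:
        # os.cpu_count() may return None on some platforms; fall back to 1.
        cpu_count = os.cpu_count() or 1
    # How many workers fit in memory if each needs memory_per_process_bytes.
    memory_limited = available_memory_bytes // memory_per_process_bytes
    # Take the tighter of the two limits, but never fewer than one worker.
    return max(1, min(cpu_count, memory_limited))


# 8 CPUs, 4 GiB free, ~1 GiB per worker: memory is the bottleneck.
print(compute_pool_size(4 * 1024**3, 1024**3, cpu_count=8))  # 4
```

The result could then be passed as `processes` to `multiprocessing.Pool`. Taking the minimum of both limits avoids oversubscribing either resource on small machines.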

Data

  • Improved JSON detection method
    • Default option loads metadata and data separately (data.meta and data.data)
    • data.meta holds the parts of the JSON that contain no records
    • data.data holds the parts of the JSON that contain records
    • Added an option to select the keys that represent records
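
The metadata/data split can be sketched as follows. This assumes the JSON is already parsed into a Python dict, that only top-level keys are inspected, and that "records" means a list of JSON objects. `split_json_records` and its `record_keys` parameter are hypothetical and do not reproduce DataProfiler's implementation.

```python
import json
from typing import Any, Dict, Iterable, List, Optional, Tuple


def _is_record_list(value: Any) -> bool:
    # Treat a non-empty list whose items are all JSON objects as records.
    return (isinstance(value, list) and len(value) > 0
            and all(isinstance(item, dict) for item in value))


def split_json_records(
    obj: Dict[str, Any],
    record_keys: Optional[Iterable[str]] = None,
) -> Tuple[Dict[str, Any], List[Dict[str, Any]]]:
    """Hypothetical helper: split a parsed JSON object into (meta, data)."""
    if record_keys is None:
        # Auto-detect: every top-level key holding a list of objects.
        selected = [k for k, v in obj.items() if _is_record_list(v)]
    else:
        # Explicit selection: only the requested keys that hold a list.
        wanted = set(record_keys)
        selected = [k for k, v in obj.items()
                    if k in wanted and isinstance(v, list)]
    # Everything not selected as records is treated as metadata.
    meta = {k: v for k, v in obj.items() if k not in selected}
    # Concatenate records from the selected keys, in document order.
    data = [record for k in selected for record in obj[k]]
    return meta, data


raw = json.loads('{"source": "api", "count": 2, "items": [{"id": 1}, {"id": 2}]}')
meta, data = split_json_records(raw)
print(meta)  # {'source': 'api', 'count': 2}
print(data)  # [{'id': 1}, {'id': 2}]
```

Passing `record_keys=["items"]` mirrors the idea of selecting record keys explicitly, which helps when several top-level lists could be mistaken for records.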

Report

  • Precision report now contains additional details
"precision": {
    "min": int,
    "max": int,
    "mean": float,
    "var": float,
    "std": float,
    "sample_size": int,
    "margin_of_error": float,
    "confidence_level": float
},
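
Sample-based precision estimation and the report fields above can be illustrated with a minimal sketch. It assumes "precision" means the number of significant digits in a value's string form. It also assumes the margin of error is a normal-approximation interval on the mean, z · std / √n. Both are assumptions, and `significant_digits` and `estimate_precision` are hypothetical names, not DataProfiler's API.

```python
import math
import random
import re
import statistics
from typing import Dict, Sequence, Union


def significant_digits(text: str) -> int:
    """Hypothetical precision measure: count significant digits in a numeric string.

    Ignores the sign, decimal point, exponent, and leading zeros; "0" counts as 1.
    """
    mantissa = re.split(r"[eE]", text.strip().lstrip("+-"))[0]
    digits = mantissa.replace(".", "").lstrip("0")
    return max(1, len(digits))


def estimate_precision(
    values: Sequence[str],
    sample_size: int,
    confidence_level: float = 0.95,
    seed: int = 0,
) -> Dict[str, Union[int, float]]:
    """Summarize precision over a random sample instead of every value."""
    n = min(sample_size, len(values))
    if n == 0:
        raise ValueError("need at least one value to sample")
    sample = random.Random(seed).sample(list(values), n)
    precisions = [significant_digits(v) for v in sample]
    # Sample variance needs two points; report zero spread for a single value.
    var = float(statistics.variance(precisions)) if n > 1 else 0.0
    std = math.sqrt(var)
    # Two-sided z-score for the requested confidence level (normal approximation).
    z = statistics.NormalDist().inv_cdf(0.5 + confidence_level / 2)
    return {
        "min": min(precisions),
        "max": max(precisions),
        "mean": float(statistics.mean(precisions)),
        "var": var,
        "std": std,
        "sample_size": n,
        "margin_of_error": z * std / math.sqrt(n),
        "confidence_level": confidence_level,
    }


values = ["1.5", "2.25", "0.001", "100", "3.14159"]
report = estimate_precision(values, sample_size=5)
print(report["min"], report["max"], report["mean"], report["sample_size"])  # 1 6 3.0 5
```

Sampling trades exactness for speed on large columns. `margin_of_error` and `confidence_level` quantify how far the sampled mean may be from the full-column value.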

Bug fixes

  • Fixed an error when merging options
  • Fixed an issue with merging DateTimeColumns
  • Fixed multiprocessing on OSX
  • Fixed row calculations when min_true_samples is greater than zero
