github GoogleCloudPlatform/gcsfuse v3.3.0
Gcsfuse v3.3.0

one day ago

Buffered Read: Accelerate Large Sequential Reads

This release introduces Buffered Read, a feature that accelerates applications performing large, sequential reads of files stored in Google Cloud Storage. It helps workloads that read data sequentially, such as loading model weights during inference or large-scale media processing.

This feature improves throughput by 2-5x. It asynchronously prefetches parts of a GCS object in parallel into an in-memory buffer and serves subsequent reads from that buffer instead of making network calls. Because the buffering is both asynchronous and parallel, it can saturate network bandwidth without requiring additional application-side parallelism.
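
The prefetching idea described above can be illustrated with a small, self-contained Python sketch. It is not the gcsfuse implementation: `fetch_range`, `PrefetchingReader`, the tiny block size, and the window size are all hypothetical names and values chosen for illustration, and an in-memory byte string stands in for a GCS object.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_range(source: bytes, start: int, end: int) -> bytes:
    """Hypothetical stand-in for a ranged network read of an object."""
    return source[start:end]


class PrefetchingReader:
    """Reads blocks in order while downloading the next few blocks in parallel."""

    def __init__(self, source: bytes, block_size: int = 4, window: int = 3):
        self._source = source
        self._block_size = block_size
        self._window = window  # how many blocks may be in flight or buffered
        self._pool = ThreadPoolExecutor(max_workers=window)
        self._pending = {}  # block index -> Future holding that block's bytes
        self._next_block = 0
        self._num_blocks = -(-len(source) // block_size)  # ceiling division

    def _schedule_window(self) -> None:
        # Start downloads for every block in the window that is not yet requested.
        last = min(self._next_block + self._window, self._num_blocks)
        for idx in range(self._next_block, last):
            if idx not in self._pending:
                start = idx * self._block_size
                self._pending[idx] = self._pool.submit(
                    fetch_range, self._source, start, start + self._block_size
                )

    def read_block(self) -> bytes:
        """Return the next sequential block, or b"" at end of object."""
        if self._next_block >= self._num_blocks:
            return b""
        self._schedule_window()
        # Blocks ahead of this one keep downloading while we wait here.
        data = self._pending.pop(self._next_block).result()
        self._next_block += 1
        return data

    def close(self) -> None:
        self._pool.shutdown(wait=True)


def read_all(reader: PrefetchingReader) -> bytes:
    """Drain the reader sequentially, as a single-threaded application would."""
    chunks = []
    while True:
        block = reader.read_block()
        if not block:
            break
        chunks.append(block)
    return b"".join(chunks)
```

The caller reads with a single thread, and the parallelism lives entirely inside the reader. This is the property the release notes attribute to buffered reads.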

  • Feature Enablement: This feature is disabled by default and can be enabled with the --enable-buffered-read flag or the read:enable-buffered-read config option. Buffered reads are ignored if the file cache is enabled. Over time, we will work towards enabling this by default.
  • Use Cases: Single-threaded applications reading large (> 100 MB) files sequentially, e.g. reading model files during inference (prediction) with an AI/ML model.
  • Memory usage:
    • Buffered readers will use CPU memory for storing the prefetched blocks.
    • Memory usage is capped at 320 MB per file handle and 640 MB (40 x 16 MB) globally. The global memory limit is configurable via the --read-global-max-blocks flag (default: 40).
    • This memory is automatically released when a file handle is closed or a random read access pattern is detected.
  • CPU Usage: The CPU overhead is typically proportional to the performance gain achieved.
  • Known Limitations:
    • Workloads that combine sequential and random reads may not benefit and could automatically fall back to default reads. This includes some model serving techniques, such as pipeline parallelism, that perform random reads at the start before switching to large sequential reads. We plan to improve buffered read for such scenarios in future releases. Please reach out to us if you need better performance in these scenarios.
    • Consider available system memory before enabling buffered reads, since they can use up to 640 MB by default. To avoid out-of-memory (OOM) issues, reduce --read-global-max-blocks (default: 40).
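
The memory arithmetic above (blocks x 16 MB) can be turned into a small sizing helper. The sketch below is a rough aid, not an official tool. It takes the 16 MB block size from the "40 x 16MB" figure in these notes and assumes the flag accepts the common `--flag=value` form. The bucket name and mount point are hypothetical placeholders.

```python
BLOCK_SIZE_MB = 16  # block size implied by "40 x 16MB" in the release notes
DEFAULT_GLOBAL_MAX_BLOCKS = 40  # default of --read-global-max-blocks per the notes


def global_buffer_mb(max_blocks: int, block_size_mb: int = BLOCK_SIZE_MB) -> int:
    """Upper bound on buffered-read memory for a given global block count."""
    return max_blocks * block_size_mb


def max_blocks_for_budget(budget_mb: int, block_size_mb: int = BLOCK_SIZE_MB) -> int:
    """Largest block count whose total size fits inside the memory budget."""
    blocks = budget_mb // block_size_mb
    if blocks < 1:
        raise ValueError("memory budget is smaller than a single block")
    return blocks


def mount_args(bucket: str, mount_point: str, budget_mb: int) -> list:
    """Build a gcsfuse argument list with buffered reads sized to the budget."""
    blocks = max_blocks_for_budget(budget_mb)
    return [
        "gcsfuse",
        "--enable-buffered-read",
        f"--read-global-max-blocks={blocks}",  # assumes --flag=value syntax
        bucket,       # hypothetical bucket name supplied by the caller
        mount_point,  # hypothetical mount directory supplied by the caller
    ]
```

For example, `global_buffer_mb(DEFAULT_GLOBAL_MAX_BLOCKS)` reproduces the 640 MB default ceiling, and a 256 MB budget maps to 16 blocks.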

Bug Fixes

  • Resolved sporadic mount failures and enhanced stability by improving the retry mechanism for stalled API calls to the backend. (#3561, #3684)
  • Fixed a bug introduced in 3.2.0 where GCSFuse did not return errors (e.g. a resource busy error) when unmounting a bucket that was mounted with gcsfuse. (#3768)

Improvements

  • Streaming Writes now support retrying stalled write operations.
  • Improved stability for writes. (#3710)
  • Logging/monitoring improvements
    1. Efficient log collection: Metrics are now more efficient. CPU usage is reduced from 15% to <2% for small single-threaded reads, and memory allocation is reduced from 34% to 0%.
    2. Mount logs display block size used in streaming writes in MiB instead of bytes for improved readability.
    3. Added an error log for unsupported values of the log-format flag. (#3751)
    4. Made logging more efficient by downgrading some unimportant logs to the trace level. (#3746, #3749)

Dependency Upgrades / CVE fixes

  • Dependency upgrades (#3740)

What's Changed

  • docs: Add information regarding ESTALE error in troubleshooting guide by @vipnydav in #3654
  • refactor(metrics): use optimized metrics implementation by @anushka567 in #3638
  • chore(shadow-reviewer): Delete shadow-review workflow. by @kislaykishore in #3657
  • chore(metrics): disable exemplar filter for metrics by @anushka567 in #3656
  • chore: clean up timeout from MRD by @ashmeenkaur in #3646
  • test: Move proxy server out of integration tests folder by @Tulsishah in #3650
  • fix(buffered Read): Adding missing fields while creating read manager by @Tulsishah in #3660
  • fix: Kokoro tests failure by @Tulsishah in #3662
  • fix(metrics): Prevent panic from unrecognized histogram attributes by @anushka567 in #3666
  • chore(metrics): add missing metric attribute values by @anushka567 in #3671
  • fix(Buffered Read): Unexpected EOF from buffered reader causes reads to fail before completion by @Tulsishah in #3674
  • ci: Fix broken e2e test in unfinalized_object package by @gargnitingoogle in #3670
  • refactor: explicit_dir test package migration [GKE-GCSFuse Test migration] by @ashmeenkaur in #3669
  • feat(Migrate auth TPC): Make default value of flag true by @Tulsishah in #3610
  • feat(buffered-read): adding hidden flag to control min-block to start buffered-read by @raj-prince in #3679
  • feat(bufferedread): Handle edge cases for non-blocking TryGet in Buffered Reader by @vipnydav in #3680
  • fix(Buffered read): Make buffered read path thread safe by @Tulsishah in #3677
  • feat(bufferedread): Update block reservation logic by @vipnydav in #3689
  • test(bufferedread): Add setup and helper methods for Buffered Read integration tests by @vipnydav in #3685
  • test: Stabilize mount_timeout test by waiting for unmount by @vipnydav in #3664
  • fix(bash installation): use latest bash 5.3 and improve logging in case of error. by @meet2mky in #3691
  • fix(buffered read): Intermittent nil pointer crash in downloadTask during Reads by @Tulsishah in #3694
  • feat(bufferedread): Add integration tests for fallback mechanism in buffered reader by @vipnydav in #3693
  • docs(OnDuty): Add note on gcs-fuse-gke-sidecar service on troubleshooting by @charith87 in #3698
  • test(Buffered Read): Add e2e test for Header, Footer and Body Read by @Tulsishah in #3692
  • feat(bufferedread): Limit the total workers in buffered reader by readGlobalMaxBlocks by @vipnydav in #3697
  • refactor: implicit_dir test package migration [GKE-GCSFuse Test migration] by @ashmeenkaur in #3701
  • fix: fix implicit and explicit dir tests [GKE-GCSFuse test migration] by @ashmeenkaur in #3706
  • fix: Disable e2 test for renaming unfinalized objects by @gargnitingoogle in #3707
  • refactor: list large dir test package migration [GKE-GCSFuse Test migration] by @PranjalC100 in #3708
  • chore(deps): Upgrade go sdk dependency to storage@v1.56.1 by @meet2mky in #3710
  • fix(metrics): prevent panic when recording histograms with no attributes by @vipnydav in #3713
  • revert(enable-write-stall): enable write stall in streaming writes by @meet2mky in #3712
  • chore(bufferedread): Add required metrics for buffered reader by @vipnydav in #3715
  • refactor: write large files test package migration [GKE-GCSFuse Test migration] by @PranjalC100 in #3711
  • test(write_large_files): Add slow writer integration tests. by @meet2mky in #3699
  • perf(metrics): Don't create an unnecessary histogramRecord by @kislaykishore in #3716
  • fix(review reminder): fix auto reminder issue when PR is raised from fork gcsfuse. by @meet2mky in #3719
  • Adding concurrent read test by @raj-prince in #3695
  • feat(tracing): Add tracing for HTTP/1 and gRPC protocols by @kislaykishore in #3722
  • test(bufferedread): support mounted directory in buffered read tests by @vipnydav in #3725
  • fix: re-use read handle from previous read. by @ashmeenkaur in #3714
  • feat: Retry stalled GetStorageLayout API call for zonal buckets by @gargnitingoogle in #3561
  • test: Add buffered read integration tests in mounted directory script by @vipnydav in #3726
  • feat(bufferedread): Implement the logic for newly added metrics for buffered read by @vipnydav in #3731
  • feat: Retry stalled Folder API calls for zonal buckets by @gargnitingoogle in #3684
  • feat(bufferedread): Handle clobbered error for buffered reader by @vipnydav in #3738
  • feat(bufferedread): Modify default mount config for buffered read as per experiments by @vipnydav in #3733
  • style(control client): Remove unused method parameter by @kislaykishore in #3744
  • feat(bufferedread): Enabling both file cache and buffered read should disable buffered read by @vipnydav in #3737
  • feat(bufferedread): Update severity of log "scheduleNextBlock: could not get block from pool" from warning to trace by @vipnydav in #3749
  • fix(log level): Downgrade info log to trace. by @kislaykishore in #3746
  • feat(Buffered Read): Make flag for Random seek count by @Tulsishah in #3750
  • refactor: update parsing the yaml file & Zonal bucket case handling [GKE-GCSFuse Test migration] by @PranjalC100 in #3753
  • refactor: gzip test package migration [GKE-GCSFuse Test migration] by @PranjalC100 in #3754
  • fix: Hide flag create-empty-file on gcsfuse --help by @gargnitingoogle in #3755
  • chore(logging): Add log for unsupported log formats by @alleaditya in #3751
  • feat(bufferedread): Update default value of readGlobalMaxBlocks to 40 by @vipnydav in #3760
  • refactor: readonly test package migration [GKE-GCSFuse Test migration] by @ashmeenkaur in #3759
  • feat(bufferedread): Add buffered read config to useragent by @vipnydav in #3762
  • ci: Reenable e2e test to move unfinalized object by @gargnitingoogle in #3757
  • chore(zonal buckets): Revert zero byte reader logic to get size of unfinalized object by @anushka567 in #3736
  • ci: Preserve gcsfuse log file for e2e package concurrent_operations on failure by @gargnitingoogle in #3763
  • fix(unmount): Fixing false unmount success bug by @raj-prince in #3768
  • test: Additional UTs for file handle by @abhishek10004 in #3659
  • ci(debug e2e failure): Enable trace logs in e2e package mount-timeout by @gargnitingoogle in #3770
  • refactor: rename dir limit test package migration [GKE-GCSFuse Test migration] by @PranjalC100 in #3730
  • chore(dep upgrade): Dependency upgrades by @kislaykishore in #3740
  • fix(control client retry): Fix handling of gax-retries for non-ZB by @gargnitingoogle in #3769
  • refactor: migrate packages to use stretchr testify and clean up test_setup package by @ashmeenkaur in #3772
  • fix(config): Use BlockSizeMb value in form of MB instead of converting it to bytes during rationalization. by @meet2mky in #3773
  • docs: update troubleshooting guide with not implemented functions by @ashmeenkaur in #3771
  • fix(e2e): changing the e2e client from http2 to http1 by @raj-prince in #3775
  • revert: disabling google lib auth flag by @Tulsishah in #3777

Full Changelog: v3.2.0...v3.3.0
