Streaming Writes:
Streaming writes is a new write path that uploads data directly to Google Cloud Storage (GCS) as it is written. The previous write path, which is still the default, stages the entire write in a local temporary directory and uploads it to GCS on close or fsync. Streaming writes reduce both latency and disk space usage, which makes them particularly beneficial for large, sequential writes such as checkpoints. To enable streaming writes, use one of the following:
Command-line flag: --enable-streaming-writes
Configuration file: set enable-streaming-writes: true under the write: section
Streaming writes will become the default write path in the future.
Memory Usage: Each file opened for streaming writes consumes approximately 64 MB of RAM while it is being uploaded. This memory is released when the file handle is closed. Account for it when planning resource allocation for applications that use streaming writes, especially those that write many files concurrently.
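As a rough planning aid, the per-file figure above can be turned into a back-of-the-envelope estimate. The sketch below is illustrative only: the function and constant names are hypothetical and not part of GCSFuse, and it assumes the approximate 64 MB per open file scales linearly with the number of files being written at once.

```python
# Approximate per-file buffer size quoted in the note above (an estimate, not a hard limit).
STREAMING_WRITE_MB_PER_FILE = 64


def estimate_streaming_write_memory_mb(open_files: int,
                                       mb_per_file: int = STREAMING_WRITE_MB_PER_FILE) -> int:
    """Return the approximate RAM (in MB) held by `open_files` concurrent streaming writes."""
    if open_files < 0:
        raise ValueError("open_files must be non-negative")
    return open_files * mb_per_file


def max_concurrent_streaming_writes(memory_budget_mb: int,
                                    mb_per_file: int = STREAMING_WRITE_MB_PER_FILE) -> int:
    """Return how many files can be written at once within `memory_budget_mb`."""
    if memory_budget_mb < 0:
        raise ValueError("memory_budget_mb must be non-negative")
    return memory_budget_mb // mb_per_file
```

For example, eight checkpoint shards written in parallel would need roughly 8 x 64 = 512 MB on top of the application's own memory.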
Important Considerations and Caveats:
- New Files, Sequential Writes: Streaming writes support only sequential writes to a single new file. Modifying an existing file or writing out of order (whether from the same file handle or concurrently from multiple file handles) causes GCSFuse to automatically fall back to the existing behavior of staging writes in a temporary file on disk. An informational log message is emitted when this fallback occurs.
- File System Semantics Change:
  - fsync does not finalize the object: When streaming writes are enabled, fsync does not finalize the object on GCS. The object is finalized only when the file is closed. This is a key difference from the previous behavior: do not rely on fsync for data durability with streaming writes enabled, because data is only guaranteed to be on GCS after the file is closed.
  - Read operations during write: With the existing staged write path, an application can read a file's data while writes to that file are in progress. With streaming writes, the application cannot read the data until the object is finalized, i.e., until the file is closed (for example, with fclose()). Applications should not attempt to read a file while it is being written with streaming writes.
- Write Stalls and Chunk Uploads: Streaming writes do not currently implement chunk-level timeouts or retries. Write operations may stall, and chunk uploads that encounter errors will eventually fail after the default 32-second deadline.
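The caveats above suggest a write pattern for applications: create a new file, append to it strictly in order from a single handle, and treat the data as durable (and readable) only after the file has been closed. The following is a minimal sketch of that pattern using only the Python standard library. The helper name is hypothetical, a temporary directory stands in for a GCSFuse mount point, and it assumes exclusive-create mode behaves on the mount as it does on a local file system.

```python
import os
import tempfile
from typing import Iterable


def write_new_file_sequentially(path: str, chunks: Iterable[bytes]) -> int:
    """Write `chunks` in order to a file that must not exist yet; return bytes written."""
    total = 0
    # "xb" creates the file and raises FileExistsError if it already exists,
    # so this helper never modifies an existing file.
    with open(path, "xb") as f:
        for chunk in chunks:
            f.write(chunk)  # append-only: no seek() and no second writer
            total += len(chunk)
    # Leaving the `with` block closes the file. Per the notes above, only now
    # is the object finalized on GCS; an earlier fsync would not guarantee that.
    return total


if __name__ == "__main__":
    # Hypothetical stand-in for a GCSFuse mount point such as /mnt/gcs.
    with tempfile.TemporaryDirectory() as mount_dir:
        target = os.path.join(mount_dir, "checkpoint.bin")
        written = write_new_file_sequentially(target, [b"header", b"payload"])
        # Read back only after the writer has closed the file.
        with open(target, "rb") as f:
            assert f.read() == b"headerpayload"
        print(f"wrote {written} bytes")
```

Because chunk uploads are not retried individually, a caller may also want to catch OSError around this helper and decide at the application level whether to retry the whole file.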
Bug Fixes & Improvements:
- Fixes a bug that caused the rename operation to fail when a file was open with staged writes and streaming writes were enabled. (#2975)