github openziti/ziti v0.19.0

4 years ago

Release 0.19.0

Breaking Changes

  • Edge session validation is now handled at the controller, not the edge router
  • Routing across the overlay is now handled in parallel, rather than serially. This changes the
    syntax and semantics of a couple of control plane messages between the controller and the
    connected routers. See the section below on Parallel Routing for additional details.
  • API Session synchronization improvements and pluggability

Bug fixes

  • ziti ps now supports router-disconnect and router-reconnect, which disconnect the router from,
    and reconnect it to, the controller. This makes it easier to test various failure states.
    Requires --debug-ops to be passed to ziti-router on startup.
  • Golang SDK hosted service listeners are now properly closed when they receive close notifications
  • Golang SDK now recovers if the session is gone
  • Golang SDK now stops some goroutines that were previously left running after the SDK context was
    closed
  • Fix session leak caused by using half close when tunneling UDP connections
  • Fix connection leak caused by not closing the UDP connection when its activity timer expires

API Changes

  • Fabric Xctrl instances are now notified when the control channel reconnects
  • Fabric Xctrl instances may now provide message decoders for the trace infrastructure so that
    custom messages will be properly displayed in trace logs

Edge Session Validation

Before 0.19, edge sessions (note: network sessions, not API sessions) would be sent to edge routers
after they were created. When the edge router received a dial or bind request it would verify that
the session was valid, then request the controller to create a fabric session.

This approach has two downsides.

  1. There is a race condition where the edge router may receive a dial/bind request before it has
    received the session from the controller. It therefore has to wait a while before declaring the
    session invalid.
  2. Sessions need to be managed across multiple edge routers, since we don't know where the client
    will connect. This adds a lot of control channel traffic.

Since the edge router makes a request to the controller anyway, we can pass the session token and
fingerprints up to the controller and do the verification there. This allows us to minimize the
amount of state the edge router needs to keep synchronized with the controller and removes the race
condition.
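The controller-side check can be pictured with the minimal sketch below. It assumes that the edge router forwards the session token, the request type (Dial or Bind), and the client's certificate fingerprint. Every type, field, and error name here (Session, Controller, ValidateSession, ErrInvalidSession, and so on) is hypothetical and is not the actual Ziti controller code. The key point it shows is that an unknown token can be rejected immediately, because the controller holds the authoritative state and there is no propagation race to wait out.

```go
package main

import (
	"errors"
	"fmt"
)

// Session is a hypothetical, simplified view of an edge (network) session
// as stored by the controller.
type Session struct {
	Token        string
	Type         string   // "Dial" or "Bind"
	Fingerprints []string // certificate fingerprints tied to the client's API session
}

// Hypothetical validation errors reported back to the edge router.
var (
	ErrInvalidSession = errors.New("invalid session")
	ErrWrongType      = errors.New("session type does not match request")
	ErrFingerprint    = errors.New("client fingerprint does not match session")
)

// Controller keeps sessions in memory here; a real controller uses its store.
type Controller struct {
	sessions map[string]Session
}

// ValidateSession checks the token, request type, and fingerprint that the
// edge router forwarded with its request. Fabric session creation, which
// would follow a successful check, is omitted.
func (c *Controller) ValidateSession(token, requestType, fingerprint string) error {
	s, ok := c.sessions[token]
	if !ok {
		// The controller is the source of truth, so an unknown token is
		// rejected immediately; there is no propagation delay to wait out.
		return ErrInvalidSession
	}
	if s.Type != requestType {
		return ErrWrongType
	}
	for _, fp := range s.Fingerprints {
		if fp == fingerprint {
			return nil
		}
	}
	return ErrFingerprint
}

func main() {
	c := &Controller{sessions: map[string]Session{
		"tok-1": {Token: "tok-1", Type: "Dial", Fingerprints: []string{"ab:cd"}},
	}}
	fmt.Println(c.ValidateSession("tok-1", "Dial", "ab:cd")) // <nil>
	fmt.Println(c.ValidateSession("tok-2", "Dial", "ab:cd")) // invalid session
}
```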

Parallel Routing

Prior to 0.19, the Ziti controller would first send a Route message to the terminating router to establish terminator endpoint connectivity. If the destination endpoint was unreachable, the entire session setup was abandoned. If the terminator responded successfully, the controller would then work through the chain of routers, sending Route messages and creating the appropriate forwarding table entries. All of this happened sequentially.

In 0.19, route setup for session creation happens in parallel. The controller sends Route commands to all of the routers in the chain (including the terminating router) and waits for their responses until a timeout expires. If all of the participating routers respond affirmatively within the timeout period, session creation succeeds. If any participating router responds negatively, or the timeout period elapses, the session creation attempt fails and the configured termination weights are updated. Session creation is retried up to a configured number of attempts. Each attempt performs a fresh path selection so that failed terminators can be excluded from subsequent attempts.

Configuration of Parallel Routing

The terminationTimeoutSeconds parameter has been removed and will be ignored if present. The routeTimeoutSeconds parameter controls the timeout for each route attempt.

#network:
  #
  # routeTimeoutSeconds controls the number of seconds the controller will wait for a route attempt to succeed.
  #
  #routeTimeoutSeconds:  10

You'll want to ensure that your participating routers' getSessionTimeout in the Xgress options is set to a value large enough to cover the configured number of routing attempts at the configured routing attempt timeout. In the router configuration, getSessionTimeout is set on your Xgress listeners like this:

listeners:
  # basic ssh proxy
  - binding:            proxy
    address:            tcp:0.0.0.0:1122
    service:            ssh
    options:
      getSessionTimeout: 120s

The new parallel routing implementation also supports a configurable number of session creation attempts. Prior to 0.19, the number of attempts was hard-coded at 3. In 0.19, the number of retries is controlled by the createSessionRetries parameter, which defaults to 3.

network:
  #
  # createSessionRetries controls the number of retries that will be attempted to create a circuit (and terminate it)
  # for new sessions.
  #
  createSessionRetries: 5

API Session Synchronization

Prior to 0.19, API Sessions could be synchronized with connecting/reconnecting edge routers in only
one way. Starting with 0.19, multiple strategies can be defined within the same code base. Future
releases will be able to introduce configurable and negotiable strategies.

The default strategy from prior releases, now named 'instant', has been improved to fix issues that
could arise during edge router reconnects, where API Sessions would become invalid on the
reconnecting edge router. In addition, the instant strategy now supports detection of invalid
synchronization, resync requests, enhanced logging, and synchronization statuses for edge routers.
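One way to picture "multiple strategies within the same code base" is an interface plus a registry keyed by strategy name. The sketch below is illustrative only. SyncStrategy, instantStrategy, registry, and NewStrategy are hypothetical names rather than Ziti's actual types. The assumption that the instant strategy performs a full synchronization on every hello and every resync request is a simplification for this example.

```go
package main

import "fmt"

// SyncStrategy is a hypothetical interface for API Session synchronization
// strategies; it illustrates pluggability and is not Ziti's actual interface.
type SyncStrategy interface {
	Type() string
	ReceiveHello(routerID string) error  // a router connected or reconnected
	ReceiveResync(routerID string) error // a router detected it is out of sync
}

// instantStrategy loosely models the 'instant' strategy under the assumption
// that each hello or resync request triggers a full sync, counted per router.
type instantStrategy struct {
	fullSyncs map[string]int
}

func (s *instantStrategy) Type() string { return "instant" }

func (s *instantStrategy) ReceiveHello(routerID string) error {
	s.fullSyncs[routerID]++
	return nil
}

func (s *instantStrategy) ReceiveResync(routerID string) error {
	s.fullSyncs[routerID]++
	return nil
}

// registry lets several strategies coexist in one code base, keyed by name,
// which is what would later make them configurable or negotiable.
var registry = map[string]func() SyncStrategy{
	"instant": func() SyncStrategy { return &instantStrategy{fullSyncs: map[string]int{}} },
}

// NewStrategy builds the strategy registered under name.
func NewStrategy(name string) (SyncStrategy, error) {
	factory, ok := registry[name]
	if !ok {
		return nil, fmt.Errorf("unknown sync strategy %q", name)
	}
	return factory(), nil
}

func main() {
	s, err := NewStrategy("instant")
	if err != nil {
		fmt.Println(err)
		return
	}
	_ = s.ReceiveHello("er-1")
	_ = s.ReceiveResync("er-1")
	fmt.Println(s.Type()) // instant
}
```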

Edge Router Synchronization Status

The GET /edge-routers list and GET /edge-routers/<id> detail responses now include a syncStatus
field. This value is updated throughout the lifetime of the edge router's connection to the
controller and provides insight into its synchronization status.

The possible syncStatus values are as follows:

  • "SYNC_NEW" - connection accepted but no strategy actions have been taken
  • "SYNC_QUEUED" - connection handed to a strategy and waiting for processing
  • "SYNC_HELLO_TIMEOUT" - sync failed due to a hello timeout, requeued for hello
  • "SYNC_HELLO" - controller edge hello being sent
  • "SYNC_HELLO_WAIT" - hello received from router and queued for processing
  • "SYNC_RESYNC_WAIT" - router requested a resync and queued for processing
  • "SYNC_IN_PROGRESS" - synchronization processing
  • "SYNC_DONE" - synchronization completed, router is now in maintenance updates
  • "SYNC_UNKNOWN" - state is unknown, edge router misbehaved, error state
  • "SYNC_DISCONNECTED" - strategy was disconnected before finishing, error state
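A client consuming the syncStatus field might model the documented values as typed constants and flag routers that are not yet in SYNC_DONE. This is a minimal sketch. The status strings come from the list above, and IsError covers only the two values the list marks as error states. The JSON shape (a top-level "data" array whose entries carry "id" and "name") is an assumption about the list response, and routersNotSynced is a hypothetical helper.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// SyncStatus holds the documented syncStatus values.
type SyncStatus string

const (
	SyncNew          SyncStatus = "SYNC_NEW"
	SyncQueued       SyncStatus = "SYNC_QUEUED"
	SyncHelloTimeout SyncStatus = "SYNC_HELLO_TIMEOUT"
	SyncHello        SyncStatus = "SYNC_HELLO"
	SyncHelloWait    SyncStatus = "SYNC_HELLO_WAIT"
	SyncResyncWait   SyncStatus = "SYNC_RESYNC_WAIT"
	SyncInProgress   SyncStatus = "SYNC_IN_PROGRESS"
	SyncDone         SyncStatus = "SYNC_DONE"
	SyncUnknown      SyncStatus = "SYNC_UNKNOWN"
	SyncDisconnected SyncStatus = "SYNC_DISCONNECTED"
)

// IsError reports whether the status is one of those documented as an error state.
func (s SyncStatus) IsError() bool { return s == SyncUnknown || s == SyncDisconnected }

// edgeRouterList models only the fields used here; the "data" envelope and
// the "id"/"name" field names are assumptions about the list response.
type edgeRouterList struct {
	Data []struct {
		ID         string     `json:"id"`
		Name       string     `json:"name"`
		SyncStatus SyncStatus `json:"syncStatus"`
	} `json:"data"`
}

// routersNotSynced returns the names of routers whose status is not SYNC_DONE.
func routersNotSynced(body []byte) ([]string, error) {
	var list edgeRouterList
	if err := json.Unmarshal(body, &list); err != nil {
		return nil, err
	}
	var names []string
	for _, r := range list.Data {
		if r.SyncStatus != SyncDone {
			names = append(names, r.Name)
		}
	}
	return names, nil
}

func main() {
	body := []byte(`{"data":[{"id":"a","name":"er-a","syncStatus":"SYNC_DONE"},` +
		`{"id":"b","name":"er-b","syncStatus":"SYNC_DISCONNECTED"}]}`)
	names, err := routersNotSynced(body)
	fmt.Println(names, err)                 // [er-b] <nil>
	fmt.Println(SyncDisconnected.IsError()) // true
}
```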
