Retrying failed flows - Introduction

dynamic insight has automatic retry logic for flows that fail, for example because of timeouts, systems being down, or the rate limit of an API or service being reached. This is very convenient: if something goes wrong along the way, your data and any temporarily generated data are stored and can be reused to continue the Flow execution where it was unintentionally stopped.
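The idea of storing intermediate data so a run can continue where it stopped is essentially a checkpoint. The following is a minimal Python sketch of that idea under stated assumptions: `Checkpoint`, `StepFailed` and `run_upload_step` are hypothetical names, not part of dynamic insight, and a real system would persist the checkpoint in a queue or database instead of keeping it in memory.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional


@dataclass
class Checkpoint:
    """Hypothetical record of where a flow step stopped and what it had computed."""
    step: str                    # hypothetical step name, e.g. "upload_contacts"
    next_index: int = 0          # first item that has NOT been processed yet
    variables: Dict[str, Any] = field(default_factory=dict)  # intermediate results


class StepFailed(Exception):
    """Hypothetical error that carries the checkpoint to resume from."""

    def __init__(self, checkpoint: Checkpoint, cause: Exception) -> None:
        super().__init__(f"{checkpoint.step} failed at item {checkpoint.next_index}: {cause}")
        self.checkpoint = checkpoint


def run_upload_step(items: List[dict], upload: Callable[[dict], None],
                    checkpoint: Optional[Checkpoint] = None) -> Checkpoint:
    """Upload items in order, skipping those a previous attempt already finished."""
    cp = checkpoint if checkpoint is not None else Checkpoint(step="upload_contacts")
    for i in range(cp.next_index, len(items)):
        try:
            upload(items[i])
        except Exception as exc:
            cp.next_index = i  # remember exactly where this attempt stopped
            raise StepFailed(cp, exc) from exc
        cp.variables["uploaded"] = cp.variables.get("uploaded", 0) + 1
    cp.next_index = len(items)
    return cp


if __name__ == "__main__":
    contacts = [{"id": n} for n in range(1000)]
    uploaded_ids: List[int] = []
    erp_down = True

    def upload_to_erp(contact: dict) -> None:
        # Simulated ERP that becomes unavailable when contact 990 is reached.
        if erp_down and contact["id"] == 990:
            raise TimeoutError("ERP not responding")
        uploaded_ids.append(contact["id"])

    try:
        run_upload_step(contacts, upload_to_erp)
    except StepFailed as failure:
        saved = failure.checkpoint
        print(saved.next_index)          # 990
        erp_down = False                 # the outage is resolved
        run_upload_step(contacts, upload_to_erp, saved)
        print(len(uploaded_ids))         # 1000
```

Because the checkpoint stores the index of the first unprocessed item, the second call only touches the remaining 10 contacts instead of starting again from the beginning.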


Reasons for failure

  • The flow is set up or built incorrectly.
  • The flow is built to process data in a certain way, but the data may occasionally be inconsistent, as is usual in data handling.
  • Systems or APIs that flows interact with may be down or temporarily out of service.
  • Systems or APIs may have rate limits that allow only a certain number of API calls per time unit; 100 calls per minute, for example, is a common limit.
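To make the rate-limit case concrete, here is a minimal Python sketch of a fixed-window counter that allows at most 100 calls per 60 seconds. It is a hypothetical illustration of how an API might enforce such a limit, not how any particular API or dynamic insight works; `FixedWindowLimiter` and `RateLimitExceeded` are made-up names.

```python
import time
from typing import Callable, Optional


class RateLimitExceeded(Exception):
    """Hypothetical error for a call that would exceed the allowed rate."""


class FixedWindowLimiter:
    """Allows at most `limit` calls per `window` seconds (e.g. 100 per 60 s)."""

    def __init__(self, limit: int = 100, window: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window = window
        self.clock = clock                      # injectable clock, handy for testing
        self.window_start: Optional[float] = None
        self.count = 0

    def acquire(self) -> None:
        now = self.clock()
        # Start a fresh window once the current one has fully elapsed.
        if self.window_start is None or now - self.window_start >= self.window:
            self.window_start = now
            self.count = 0
        if self.count >= self.limit:
            raise RateLimitExceeded(f"more than {self.limit} calls in {self.window} s")
        self.count += 1
```

With these settings the 101st call inside the same minute is rejected, while the same call succeeds once the window has passed. That is why a failure caused by a rate limit is usually temporary and a delayed retry has a good chance of succeeding.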


How flow failure is handled - smart retries

Whenever a flow run fails, the following steps happen:

  1. The failed Flow Run is entered in the Flow Runs reporting table with the respective failure status ("error" by default).

  2. The failed Flow Run (and its ID) is placed in the queue of a queuing system, where it is stored temporarily until it is retried.

  3. The queued information describes the exact location where the flow run failed, whether inside a connector, inside a looper, etc., along with the intermediate variable results. This is important because it allows the following: if a flow fails after uploading 990 of 1000 contacts to, for example, your ERP, you don't have to re-run the whole flow once the error is resolved; only the remaining 10 contacts are processed.

  4. A periodically executed retry handler then tries to continue the execution of the failed Flow at the exact place where it failed, using best-practice backoff delays of 1, 3, 10 and 20 minutes.

  5. After 4 unsuccessful retries, the flow run is finally marked as "error" and is not retried anymore. In addition, the Flow's status is changed from "live" to "paused", which means it no longer runs, and an email informing about the failure is sent to the user who created the flow.
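The retry steps can be sketched as one pass of a periodic handler in Python. Everything here is hypothetical: `QueuedRun`, `Flow`, `handle_due_retries` and the `resume`/`notify` callbacks are illustrative names, not dynamic insight's API. The sketch also assumes that each backoff delay is counted from the previous failed attempt; the delays could just as well be measured from the original failure.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Delays in minutes before retry 1, 2, 3 and 4, as listed in step 4.
BACKOFF_MINUTES = [1, 3, 10, 20]


@dataclass
class QueuedRun:
    """Hypothetical queue entry for a failed flow run."""
    run_id: str
    flow_id: str
    last_failure: float     # minutes on some clock when the last attempt failed
    retries_done: int = 0


@dataclass
class Flow:
    """Hypothetical flow record with the status shown to the user."""
    flow_id: str
    creator_email: str
    status: str = "live"


def handle_due_retries(queue: List[QueuedRun], flows: Dict[str, Flow], now: float,
                       resume: Callable[[QueuedRun], bool],
                       notify: Callable[[str, str], None]) -> List[str]:
    """One pass of a periodic retry handler.

    `resume` continues the run from its stored failure point and returns True on
    success. Returns the ids of runs that used up all retries (final status "error").
    """
    gave_up: List[str] = []
    for entry in list(queue):  # iterate over a copy so entries can be removed
        if now < entry.last_failure + BACKOFF_MINUTES[entry.retries_done]:
            continue  # backoff delay has not elapsed yet
        entry.retries_done += 1
        if resume(entry):
            queue.remove(entry)  # continued successfully, nothing left to retry
            continue
        entry.last_failure = now  # assumption: next delay counts from this failure
        if entry.retries_done == len(BACKOFF_MINUTES):
            queue.remove(entry)
            flow = flows[entry.flow_id]
            flow.status = "paused"  # the flow no longer runs
            notify(flow.creator_email,
                   f"Flow run {entry.run_id} failed after {entry.retries_done} retries.")
            gave_up.append(entry.run_id)
    return gave_up
```

Calling this once per minute for a run that keeps failing, starting with `last_failure=0`, produces retry attempts at minutes 1, 4, 14 and 34; after the fourth failure the run is dropped from the queue, its flow is paused and the creator is notified.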