
Observe and improve

Tower's observability features help you monitor the health of your data system, quickly identify issues, and take targeted action where needed. This guide explains how to use them to check app status, find apps that need attention, and investigate failed runs.

Understanding system health

In Tower, you assess the health of your data system by checking which of your apps are healthy and which need attention.

Tower uses a simple status model to help you quickly assess system health.

| App Status | Meaning | Action Required |
| --- | --- | --- |
| Running | Run in progress | None - normal operation |
| Successful | Latest run completed without errors | None - normal operation |
| Failed | Latest run encountered errors | Investigate and fix |
| Disabled | App manually deactivated (coming soon) | None - intended state |

The Running and Disabled statuses reflect the current state of the app itself. The other two statuses, Successful and Failed, are derived from the status of the app's last run:

| Last Run Status | Resulting App Status | Description |
| --- | --- | --- |
| Exited | Successful | Last Run completed successfully |
| Scheduled | Successful | A new Run is planned for future execution |
| Errored | Failed | Last Run failed due to system-level issues |
| Crashed | Failed | Last Run failed due to issues in user code |
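
The status model above can be sketched as a small lookup. This is an illustrative sketch only, not Tower's implementation: the enum names, the `derive_app_status` helper, and the rule that Disabled takes precedence over Running are all assumptions made for the example.

```python
from enum import Enum


class RunStatus(Enum):
    EXITED = "Exited"
    SCHEDULED = "Scheduled"
    ERRORED = "Errored"
    CRASHED = "Crashed"


class AppStatus(Enum):
    RUNNING = "Running"
    SUCCESSFUL = "Successful"
    FAILED = "Failed"
    DISABLED = "Disabled"


# Mapping copied from the table above: last run status -> app status.
LAST_RUN_TO_APP_STATUS = {
    RunStatus.EXITED: AppStatus.SUCCESSFUL,
    RunStatus.SCHEDULED: AppStatus.SUCCESSFUL,
    RunStatus.ERRORED: AppStatus.FAILED,
    RunStatus.CRASHED: AppStatus.FAILED,
}


def derive_app_status(last_run, run_in_progress=False, disabled=False):
    """Hypothetical helper: derive an app status from its last run.

    Assumption: Disabled wins over Running, and both win over the
    status derived from the last run.
    """
    if disabled:
        return AppStatus.DISABLED
    if run_in_progress:
        return AppStatus.RUNNING
    return LAST_RUN_TO_APP_STATUS[last_run]
```

For example, `derive_app_status(RunStatus.CRASHED)` returns `AppStatus.FAILED`, while the same call with `run_in_progress=True` returns `AppStatus.RUNNING`.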

Monitoring system health in the Tower UI

The Tower Home Page serves as your observability dashboard, providing multiple ways to assess system health.


The Home Page includes:

  1. App State Summary - Shows counts of apps in each state, allowing you to quickly identify if any apps need attention

  2. Run History Chart - Displays successfully exited vs. errored/crashed runs over time, helping you spot trends or sudden increases in failures

  3. App Cards - Provide detailed status for individual apps, including their latest status and run history
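
To make the first two views concrete, the sketch below computes the same kind of aggregates from sample data. The data structures and the `app_state_summary` and `run_history` helpers are hypothetical and exist only for illustration; Tower's own data model may differ.

```python
from collections import Counter
from datetime import date

# Hypothetical sample data: app name -> current app status.
apps = {
    "ingest": "Successful",
    "transform": "Failed",
    "report": "Running",
    "export": "Failed",
}

# Hypothetical sample data: (run date, run status).
runs = [
    (date(2025, 1, 1), "Exited"),
    (date(2025, 1, 1), "Crashed"),
    (date(2025, 1, 2), "Exited"),
    (date(2025, 1, 2), "Errored"),
    (date(2025, 1, 2), "Crashed"),
]


def app_state_summary(app_statuses):
    """Count apps per status, like the App State Summary."""
    return Counter(app_statuses.values())


def run_history(run_records):
    """Per day, count Exited runs vs. Errored or Crashed runs."""
    history = {}
    for day, status in run_records:
        bucket = history.setdefault(day, {"succeeded": 0, "failed": 0})
        if status == "Exited":
            bucket["succeeded"] += 1
        elif status in ("Errored", "Crashed"):
            bucket["failed"] += 1
    return history


summary = app_state_summary(apps)
history = run_history(runs)
```

With this sample data, `summary["Failed"]` is 2, and the entry for 2 January shows one succeeded run against two failed runs, the kind of jump the Run History Chart is meant to surface.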

Using the CLI to monitor system health

The observability capabilities available in the Tower UI will also be available in the Tower CLI.

Coming soon

Finding apps that need attention

To identify and fix problematic apps:

  1. Click the "Failed" app status in the App State Summary to filter the list to only failed apps
  2. Select a specific date range from the dropdown in the Run History Chart to focus on recent failures
  3. Click any app card to view detailed run information
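
The first two steps amount to a filter on status and date. As a rough sketch, assuming hypothetical app records of the form (name, app status, date of latest run), the same selection looks like this:

```python
from datetime import date, timedelta

# Hypothetical records: (app name, app status, date of latest run).
app_records = [
    ("ingest", "Successful", date(2025, 1, 10)),
    ("transform", "Failed", date(2025, 1, 9)),
    ("export", "Failed", date(2024, 12, 1)),
]


def failed_apps_since(records, today, days):
    """Return names of Failed apps whose latest run is within `days` of `today`."""
    cutoff = today - timedelta(days=days)
    return [
        name
        for name, status, last_run in records
        if status == "Failed" and last_run >= cutoff
    ]


recent_failures = failed_apps_since(app_records, today=date(2025, 1, 10), days=7)
```

Here `recent_failures` contains only `"transform"`: `"export"` is also Failed, but its latest run falls outside the seven-day window.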

Investigating and fixing issues

When you identify a failed app:

  1. Navigate to the App Details page by clicking the app card

  2. Review the list of runs, focusing on the most recent failed run

  3. Click on the failed run to open the Run Details page

  4. Analyze the logs to identify the specific error:

    • For Crashed runs: Focus on exceptions in user code, input data issues, or configuration problems
    • For Errored runs: Note the error and contact Tower administrators if it appears to be a platform issue
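
The split between Crashed and Errored runs can be sketched as a small triage helper. This is an illustration under assumptions: it presumes the run logs contain Python-style traceback lines such as `KeyError: 'customer_id'`, and the `last_exception` and `triage` functions are hypothetical, not part of Tower.

```python
import re

# Matches Python-style exception lines, e.g. "KeyError: 'customer_id'".
# Assumption: the app's logs include such lines; other log formats need other patterns.
EXCEPTION_LINE = re.compile(
    r"^(?P<type>[A-Za-z_][\w.]*(?:Error|Exception)): (?P<message>.*)$"
)


def last_exception(log_text):
    """Return (type, message) for the last exception line in the log, or None."""
    found = None
    for line in log_text.splitlines():
        match = EXCEPTION_LINE.match(line.strip())
        if match:
            found = (match.group("type"), match.group("message"))
    return found


def triage(run_status, log_text):
    """Suggest where to look first, based on the run status."""
    if run_status == "Crashed":
        # User code problem: surface the last exception found in the logs.
        return ("user code", last_exception(log_text))
    if run_status == "Errored":
        # Possible platform problem: note the error and escalate if needed.
        return ("platform", None)
    return ("none", None)


sample_log = (
    "Traceback (most recent call last):\n"
    '  File "task.py", line 3, in <module>\n'
    "KeyError: 'customer_id'\n"
)
result = triage("Crashed", sample_log)
```

For the sample log, `result` is `("user code", ("KeyError", "'customer_id'"))`, which points at a missing field in the input data rather than at the platform.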

Common Issues and Solutions

| Issue Type | Common Signs | Typical Solutions |
| --- | --- | --- |
| Data format changes | Schema validation errors | Update code to handle new formats |
| Resource limits | Memory/CPU errors | Optimize code or request increased resources |
| Authentication | Permission denied errors | Update or rotate credentials |
| System dependencies | Import or library errors | Update dependencies or contact admin |
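
As a rough aid when reading logs, the table above can be turned into a keyword lookup. The keyword lists below are illustrative assumptions, not messages Tower is known to emit, and the `classify` helper is hypothetical; adjust the keywords to the errors your apps actually produce.

```python
# (issue type, keywords to look for, typical solution from the table above).
# The keywords are assumptions chosen for illustration.
ISSUE_HINTS = [
    ("Data format changes", ("schema", "validation"),
     "Update code to handle new formats"),
    ("Resource limits", ("memoryerror", "out of memory"),
     "Optimize code or request increased resources"),
    ("Authentication", ("permission denied", "unauthorized"),
     "Update or rotate credentials"),
    ("System dependencies", ("importerror", "modulenotfounderror"),
     "Update dependencies or contact admin"),
]


def classify(error_text):
    """Return (issue type, typical solution) for the first matching hint, or None."""
    lowered = error_text.lower()
    for issue, keywords, solution in ISSUE_HINTS:
        if any(keyword in lowered for keyword in keywords):
            return issue, solution
    return None


hint = classify("ModuleNotFoundError: No module named 'polars'")
```

Here `hint` is `("System dependencies", "Update dependencies or contact admin")`. A result of `None` means no keyword matched and the log needs a manual read.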

After making code changes, test locally using the guidance in our Test guide before redeploying the app to Tower.