Skip to main content

Working with Tables

This guide demonstrates Tower's table capabilities using two example apps:

  1. Writing Ticker Data to Iceberg - Shows how to create and write to tables
  2. Trimming Ticker Table - Shows how to read and delete data from tables

Creating and Writing to Tables

The first example app uses Yahoo Finance's public API and the yfinance library to:

  • Download major indicators for stock tickers (e.g., 'AAPL')
  • Track Open, Close, and Volume data
  • Save the data in an Iceberg table named "daily_ticker_data"

Creating Tables

To create a table, follow these steps:

  1. Define the schema in Arrow Schema format
  2. Call the create_if_not_exists() method
SCHEMA = pa.schema([
("ticker", pa.string()),
("date", pa.string()),
("open", pa.float64()),
("close", pa.float64()),
("volume", pa.int64()),
])

table = tower
.tables("daily_ticker_data")
.create_if_not_exists(SCHEMA)

Using Different Catalogs and Namespaces

To create a table in a specific catalog and namespace:

mytable = tower
.tables('mytable', catalog='mycatalog', namespace='mynamespace')
.create_if_not_exists(SCHEMA)

If you're certain the table doesn't exist, use the create() method:

mytable = tower
.tables('mytable', catalog='mycatalog', namespace='mynamespace')
.create(SCHEMA)

Writing to Tables

To write data to a table:

  1. Get a reference to the table using load()
  2. Use either upsert() or insert()
  3. Provide data as an Arrow Table
# Get table reference
table = tower.tables("daily_ticker_data").load()

Upserting Data

Use upsert() to:

  • Update existing rows that match join columns
  • Insert new rows that don't have matches
table = table.upsert(data, join_cols=['ticker','date'])

Inserting Data

Use insert() to simply add new rows:

table = table.insert(data)

Reading and Deleting Data

The second example app demonstrates how to:

  • Inspect an Iceberg table
  • Remove records older than a specified time window

Reading Data

For best performance and memory efficiency:

  1. Create a Polars LazyFrame using to_polars()
  2. Build your query plan
  3. Execute with collect() when needed
# Create LazyFrame
df = table.to_polars()

# Build and execute query
ticker_stats = df.group_by("ticker").agg([
pl.count().alias("row_count"),
pl.col("date").str.to_date().max().alias("latest_date")
]).collect()

Eager Reading

For smaller tables or when you need all data in memory:

# Read entire table into memory
df = table.read()

Deleting Data

Use the delete() method with a predicate:

cutoff_date_str = calculate_cutoff_date(df, time_window_days)
table.delete(f"date < '{cutoff_date_str}'")

You can provide predicates as:

  • String expressions
  • Lists of Arrow Expressions