Tables
Tower Tables make it easy for users to onboard to Apache Iceberg. They provide methods for accessing and processing tabular and semi-structured data (e.g. tables with nested fields, via the VARIANT data type).
Overview
Tower offers two main components for working with tables:
- The Table class: A wrapper around Iceberg tables that provides methods for reading and writing data
- The tables helper function: A convenient way to create and access tables
Creating Tables
To create a table, you need to:
- Define its schema in Arrow Schema format
- Use either the create_if_not_exists() or the create() method
Here's a basic example:
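import pyarrow as pa
import tower

# Define the table schema in Arrow Schema format
SCHEMA = pa.schema([
    ("col1", pa.string()),
    ("col2", pa.float64()),
    # ...
])

# Create the table, or get the existing one if it is already there
mytable = tower.tables('mytable').create_if_not_exists(SCHEMA)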
The returned mytable object is an instance of the Tower Table class, which provides a unified interface for working with different types of tables. Currently, Tower supports Apache Iceberg tables.
Catalogs and Namespaces
Tower Tables are aware of the catalogs defined in Tower. Using the tables helper saves you from writing boilerplate code to set environment variables.
The example above assumes you're creating tables in the 'default' namespace of the 'default' catalog. For more examples of table creation with different catalogs and namespaces, see our Working with Tables guide.
Table Operations
TableReference Methods
The tables helper returns a TableReference object with these methods:
Table Creation
- create_if_not_exists() - Creates a table with the specified schema if it doesn't already exist
- create() - Creates a table with the specified schema (fails if the table already exists)
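The difference between the two creation methods can be illustrated with a minimal sketch. The InMemoryCatalog class and the TableAlreadyExists exception below are hypothetical stand-ins written for this example, not part of Tower, and the sketch assumes that create_if_not_exists() hands back an existing table unchanged:

```python
class TableAlreadyExists(Exception):
    """Raised by the stand-in create() when the table name is taken."""


class InMemoryCatalog:
    """Hypothetical stand-in for a catalog; NOT Tower's implementation."""

    def __init__(self):
        self._tables = {}  # table name -> schema

    def create(self, name, schema):
        # Strict creation: refuse to touch an existing table.
        if name in self._tables:
            raise TableAlreadyExists(name)
        self._tables[name] = schema
        return self._tables[name]

    def create_if_not_exists(self, name, schema):
        # Idempotent creation: return the existing table as-is (assumption).
        if name in self._tables:
            return self._tables[name]
        return self.create(name, schema)


catalog = InMemoryCatalog()
first = catalog.create_if_not_exists("mytable", {"col1": "string"})
# A second call, even with a different schema, returns the original table.
second = catalog.create_if_not_exists("mytable", {"col1": "float64"})

try:
    catalog.create("mytable", {"col1": "string"})
    create_failed = False
except TableAlreadyExists:
    create_failed = True
```

In practice this suggests create_if_not_exists() for code that may run repeatedly, and create() when an existing table would indicate a mistake.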
Table Access
- load() - Gets a reference to an existing table and loads its metadata
Table Methods
Once you have a table reference, you can perform these operations:
Schema Operations
- schema() - Gets the table's schema
Reading Data
- to_polars() - Returns a Polars LazyFrame for efficient data processing
- read() - Reads the entire table into memory as a Polars DataFrame
Writing Data
Data Management
- delete() - Removes data from the table based on specified conditions