Ingest data from Apache Iceberg

This guide describes how to batch ingest data from Apache Iceberg to RisingWave using the Iceberg source in RisingWave. Apache Iceberg is a table format designed to support huge tables. For more information, see Apache Iceberg.

Beta feature

The Iceberg source connector in RisingWave is currently in Beta. Please contact us if you encounter any issues or have feedback.

Syntax

CREATE SOURCE [ IF NOT EXISTS ] source_name 
WITH (
connector='iceberg',
connector_parameter='value', ...
);

Note

You don't need to specify column names for the Iceberg source, because RisingWave derives them directly from the Iceberg table metadata. Use the DESCRIBE statement to view the column names and data types.
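
As a quick illustration of the note above, the following sketch inspects the derived schema. It assumes a source named iceberg_source has already been created; the name is only a placeholder.

```sql
-- Show the column names and data types that RisingWave derived
-- from the Iceberg table metadata (iceberg_source is a placeholder name).
DESCRIBE iceberg_source;
```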

Parameters

Field | Notes
--- | ---
type | Required. Allowed values: appendonly and upsert.
s3.endpoint | Optional. Endpoint of the S3. For MinIO object store backend, it should be http://${MINIO_HOST}:${MINIO_PORT}. For AWS S3, refer to S3.
s3.region | Optional. The region where the S3 bucket is hosted. Either s3.endpoint or s3.region must be specified.
s3.access.key | Required. Access key of the S3 compatible object store.
s3.secret.key | Required. Secret key of the S3 compatible object store.
database.name | Required. Name of the database that you want to ingest data from.
table.name | Required. Name of the table that you want to ingest data from.
catalog.name | Conditional. The name of the Iceberg catalog. It can be omitted for storage catalog but required for other catalogs.
catalog.type | Optional. The catalog type used in this table. Currently, the supported values are storage, rest, hive and jdbc. If not specified, storage is used. For details, see Catalogs.
warehouse.path | Conditional. The path of the Iceberg warehouse. Currently, only S3-compatible object storage systems, such as AWS S3 and MinIO, are supported. It's required if the catalog.type is not rest.
catalog.url | Conditional. The URL of the catalog. It is required when catalog.type is not storage.
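
To show how the parameters above fit together, here is a minimal sketch of a source backed by MinIO with a storage catalog. It mirrors the parameter set used in the Examples section of this guide. The source name, bucket path, database name, and table name are hypothetical, and the ${...} values are placeholders to substitute with your own.

```sql
CREATE SOURCE IF NOT EXISTS iceberg_minio_source
WITH (
    connector = 'iceberg',
    -- storage is also the default; catalog.name and catalog.url can be omitted for it.
    catalog.type = 'storage',
    -- Required because catalog.type is not rest (hypothetical bucket and path).
    warehouse.path = 's3a://my-iceberg-bucket/path/to/warehouse',
    -- MinIO endpoint; this satisfies the "s3.endpoint or s3.region" requirement.
    s3.endpoint = 'http://${MINIO_HOST}:${MINIO_PORT}',
    s3.access.key = '${ACCESS_KEY}',
    s3.secret.key = '${SECRET_KEY}',
    -- Hypothetical database and table names.
    database.name = 'dev',
    table.name = 'my_table'
);
```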

Data type mapping

RisingWave converts data types from Iceberg to RisingWave according to the following data type mapping table.

Iceberg Type | RisingWave Type
--- | ---
boolean | boolean
integer | int
long | bigint
float | real
double | double
string | varchar
date | date
timestamptz | timestamptz
timestamp | timestamp
decimal | decimal

Catalogs

Iceberg supports these types of catalogs:

  • Storage catalog: The Storage catalog stores all metadata in the underlying file system, such as Hadoop or S3. Currently, we only support S3 as the underlying file system.

  • REST catalog: RisingWave supports the REST catalog, which acts as a proxy to other catalogs like Hive, JDBC, and Nessie catalog. This is the recommended approach to use RisingWave with Iceberg tables.

  • Hive catalog: RisingWave supports the Hive catalog. You need to set catalog.type to hive to use it. See the full example in this configuration file.

  • JDBC catalog: RisingWave supports the JDBC catalog. See the full example in this configuration file.
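
As a rough sketch of the REST catalog option, the statement below combines catalog.type, catalog.name, and catalog.url as described in the Parameters section. The source name, catalog URL, region, database name, and table name are all hypothetical, and a real REST catalog deployment may need settings beyond those listed in this guide.

```sql
CREATE SOURCE IF NOT EXISTS iceberg_rest_source
WITH (
    connector = 'iceberg',
    catalog.type = 'rest',
    -- catalog.name and catalog.url are required for non-storage catalogs
    -- (hypothetical name and URL).
    catalog.name = 'demo',
    catalog.url = 'http://rest-catalog.example.com:8181',
    -- warehouse.path is left out here because it is only required
    -- when catalog.type is not rest.
    s3.region = 'ap-southeast-1',
    s3.access.key = '${ACCESS_KEY}',
    s3.secret.key = '${SECRET_KEY}',
    -- Hypothetical database and table names.
    database.name = 'dev',
    table.name = 'my_table'
);
```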

Examples

First, create an append-only Iceberg table. See Append-only sink from upsert source for details.

Second, create an Iceberg source:

CREATE SOURCE iceberg_source 
WITH (
connector = 'iceberg',
warehouse.path = 's3a://my-iceberg-bucket/path/to/warehouse',
s3.endpoint = 'https://s3.ap-southeast-1.amazonaws.com',
s3.access.key = '${ACCESS_KEY}',
s3.secret.key = '${SECRET_KEY}',
catalog.name='demo',
database.name='dev',
table.name='table'
);

Then, you can query the Iceberg source:

SELECT * FROM iceberg_source;
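
Other ordinary batch queries should work against the source in the same way. The sketch below deliberately avoids column names, since those depend on your Iceberg table.

```sql
-- Count the rows currently visible in the Iceberg table.
SELECT COUNT(*) FROM iceberg_source;

-- Preview a handful of rows instead of scanning the full result.
SELECT * FROM iceberg_source LIMIT 10;
```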
