Tune Performance for Iceberg Topics

This guide covers strategies for optimizing the performance of Iceberg topics in Redpanda, including improving downstream query performance, tuning the Iceberg translation pipeline, and monitoring translation throughput.

After reading this page, you will be able to:

- Apply partitioning and compaction strategies to improve query performance
- Choose appropriate lag target values for your workload
- Identify translation performance signals using Iceberg metrics

Prerequisites

You must be familiar with how Iceberg topics work in Redpanda. See About Iceberg Topics.

Optimize query performance

Query engines read Parquet files from object storage to process Iceberg table data. Partitioning, compaction, and schema design affect how efficiently those reads perform.

Use custom partitioning

To improve query performance, consider implementing custom partitioning for the Iceberg topic. Use the redpanda.iceberg.partition.spec topic property to define the partitioning scheme:

    # Create new topic with five topic partitions, replication factor 3, and custom table partitioning for Iceberg
    rpk topic create <new-topic-name> -p5 -r3 -c redpanda.iceberg.mode=value_schema_id_prefix -c "redpanda.iceberg.partition.spec=(<partition-key1>, <partition-key2>, ...)"

Valid <partition-key> values include a source column name or a transformation of a column. The columns referenced can be Redpanda-defined (such as redpanda.timestamp) or user-defined based on a schema that you register for the topic. The Iceberg table stores records with different partition key values in separate files based on this specification.

For example:

- To partition the table by a single key, such as a column col1, use: redpanda.iceberg.partition.spec=(col1).
- To partition by multiple columns, use a comma-separated list: redpanda.iceberg.partition.spec=(col1, col2).
- To partition by the year of a timestamp column ts1, and a string column col1, use: redpanda.iceberg.partition.spec=(year(ts1), col1).

To learn more about how partitioning schemes can affect query performance, and for details on the partitioning specification such as allowed transforms, see the Apache Iceberg documentation.

Partition by columns that you frequently use in queries. Columns with relatively few unique values (low cardinality) are good candidates for partitioning. If you must partition based on columns with high cardinality, for example timestamps, use Iceberg's available transforms such as extracting the year, month, or day to avoid creating too many partitions. Too many partitions can be detrimental to performance because more files need to be scanned and managed.

Compact Iceberg tables

Over time, Iceberg translation can produce many small Parquet files, especially with low-throughput topics or short lag targets. Compaction merges small files into larger ones, reducing the number of metadata operations query engines must perform and improving read performance.

- Automatic compaction: Some catalog and data platform services, such as AWS Glue and Databricks, automatically compact Iceberg tables.
- Manual or scheduled compaction: Tools like Apache Spark can run compaction jobs on a schedule. This is useful if your catalog or platform does not compact automatically.

If you observe degraded read performance or a high number of small files, investigate whether your catalog or platform supports automatic compaction, or schedule periodic compaction jobs.

Avoid high column count

A high column count or schema field count results in more overhead when translating topics to the Iceberg table format. Small message sizes can also increase CPU utilization.
To minimize the performance impact on your cluster, keep the column count low and the message size large for Iceberg topics.

Tune translation performance

Translation is the process in which Redpanda converts topic data into Parquet files for the Iceberg table. Each round of translation processes one topic partition at a time.

Under typical conditions, Iceberg translation has the following performance characteristics:

- Throughput: Approximately 5 MiB/s per core.
- Flush threshold: 32 MiB. Each translation process uploads its on-disk data when accumulated data reaches this threshold. This is the primary control for Parquet file size, and is managed by Redpanda Cloud.
- Lag target: Controlled by iceberg_target_lag_ms (default: 1 minute). Redpanda tries to commit all data produced to an Iceberg-enabled topic within this window.

The flush threshold and lag target together determine the size of the Parquet files written to object storage. Larger Parquet files generally improve downstream query performance by reducing the number of metadata operations query engines must perform.

Tune the lag target

The datalake_translator_flush_bytes property is managed by Redpanda Cloud and is not user-tunable. To increase the size of Parquet files written to object storage, increase the lag target. A larger lag target gives translators more time to accumulate data before committing, resulting in larger Parquet files with more records per file.

You can configure the lag target at the cluster level or per topic:

- Cluster-wide: edit iceberg_target_lag_ms in the Redpanda Cloud Console. For instructions, see Configure Cluster Properties.
- Per-topic: set the redpanda.iceberg.target.lag.ms topic property. The topic property overrides the cluster default for that topic.

Increasing the lag target means Iceberg tables receive new data less frequently. Choose a lag value that balances file efficiency against how current your downstream data must be.
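
To get a rough feel for the trade-off between lag target and file size, the following Python sketch estimates how much raw input one topic partition accumulates before an upload. It is a simplified model, not a description of Redpanda internals: it assumes data is uploaded when either the 32 MiB flush threshold or the lag window is reached, whichever comes first, and it ignores Parquet encoding and compression. The function name and the 100 KiB/s ingest rate are hypothetical.

```python
MIB = 1024 * 1024

# Values stated on this page: 32 MiB flush threshold, 1 minute default lag target.
FLUSH_THRESHOLD_BYTES = 32 * MIB
DEFAULT_LAG_TARGET_MS = 60_000


def estimate_raw_bytes_per_file(
    partition_bytes_per_sec: float,
    lag_target_ms: int = DEFAULT_LAG_TARGET_MS,
    flush_threshold_bytes: int = FLUSH_THRESHOLD_BYTES,
) -> float:
    """Estimate raw input bytes accumulated per upload for one partition.

    Assumption (illustrative only): an upload happens when the flush
    threshold is reached or the lag window closes, whichever comes first.
    """
    if partition_bytes_per_sec < 0:
        raise ValueError("partition_bytes_per_sec must be non-negative")
    if lag_target_ms <= 0:
        raise ValueError("lag_target_ms must be positive")
    accumulated = partition_bytes_per_sec * (lag_target_ms / 1000)
    return min(accumulated, flush_threshold_bytes)


if __name__ == "__main__":
    rate = 100 * 1024  # hypothetical ingest rate: 100 KiB/s on one partition
    for lag_ms in (60_000, 300_000, 600_000):
        size = estimate_raw_bytes_per_file(rate, lag_ms)
        print(f"lag={lag_ms // 1000}s -> ~{size / MIB:.1f} MiB of raw input per file")
```

In this model, a low-throughput partition never reaches the flush threshold within the default window, so raising the lag target is the only lever that increases file size, up to the point where the threshold takes over.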

To check the current cluster-wide value:

    rpk cluster config get iceberg_target_lag_ms

To check topic-level overrides:

    rpk topic describe <topic-name> -c

Optimize message size

Redpanda has validated 32 MiB as the maximum recommended message size for Iceberg-enabled topics. With large messages, each Parquet file contains fewer records because the flush threshold is reached sooner. This can reduce the efficiency of analytical queries that need to scan many records.

If query latency is a concern and your workload produces large messages, consider:

- Reducing individual message sizes if your data model allows it.
- Increasing iceberg_target_lag_ms to produce Parquet files with more records per file. See Tune the lag target.

Size clusters for Iceberg workloads

When you enable Iceberg for any substantial workload and start translating topic data to the Iceberg format, you may see your cluster's CPU utilization increase. If this additional workload overwhelms the brokers and causes the Iceberg table lag to exceed the configured target lag, Redpanda automatically increases the scheduling priority of Iceberg translation to help it catch up with incoming data. However, this does not substitute for adequate cluster resources. You may need to increase the size of your Redpanda cluster to accommodate the additional workload. To ensure that your cluster is sized appropriately, contact the Redpanda Customer Success team.

Monitor translation performance

Use the following Iceberg metrics to understand whether translation is keeping pace with incoming data:

- redpanda_iceberg_translation_raw_bytes_processed: Total raw bytes consumed for translation input. Use this to monitor input throughput and compare against the expected 5 MiB/s per core baseline.
- redpanda_iceberg_translation_parquet_bytes_added: Total bytes written to Parquet files. Divide by redpanda_iceberg_translation_files_created to estimate the average file size produced by your workload.
- redpanda_iceberg_translation_files_created: Number of Parquet files created. A high file creation rate relative to bytes added indicates many small files. Consider increasing iceberg_target_lag_ms.
- redpanda_iceberg_translation_parquet_rows_added: Total rows written to Parquet files. Useful for understanding record-level throughput.
- redpanda_iceberg_translation_translations_finished: Number of completed translator executions. A stalled or zero rate indicates that translation has stopped.

For metrics related to DLQ files, invalid records, and catalog commit failures, see Troubleshooting metrics.

If translation consistently lags despite available CPU headroom, the workload may be partition-bound. Each core translates its assigned partitions independently, so distributing data across more partitions allows more cores to contribute to translation and can improve total throughput.
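
The average file size and input throughput calculations described in the metrics list can be sketched in Python as follows. This is a minimal illustration, assuming the metrics are cumulative counters sampled at two points in time; the Sample class, the function names, and the example values are hypothetical, and the sketch does not handle counter resets or scrape the metrics endpoint for you.

```python
from dataclasses import dataclass
from typing import Optional

MIB = 1024 * 1024
# "Approximately 5 MiB/s per core" is the baseline stated on this page.
BASELINE_BYTES_PER_SEC_PER_CORE = 5 * MIB


@dataclass(frozen=True)
class Sample:
    """One reading of the cumulative Iceberg translation counters (hypothetical type)."""
    timestamp_s: float
    raw_bytes_processed: float   # redpanda_iceberg_translation_raw_bytes_processed
    parquet_bytes_added: float   # redpanda_iceberg_translation_parquet_bytes_added
    files_created: float         # redpanda_iceberg_translation_files_created


def average_file_size(prev: Sample, curr: Sample) -> Optional[float]:
    """Average Parquet file size in bytes between two samples, or None if no files were created."""
    files = curr.files_created - prev.files_created
    if files <= 0:
        return None
    return (curr.parquet_bytes_added - prev.parquet_bytes_added) / files


def input_throughput(prev: Sample, curr: Sample) -> float:
    """Raw input bytes per second consumed by translation between two samples."""
    elapsed = curr.timestamp_s - prev.timestamp_s
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    return (curr.raw_bytes_processed - prev.raw_bytes_processed) / elapsed


if __name__ == "__main__":
    # Hypothetical readings taken 60 seconds apart.
    prev = Sample(timestamp_s=0, raw_bytes_processed=0, parquet_bytes_added=0, files_created=0)
    curr = Sample(timestamp_s=60, raw_bytes_processed=120 * MIB,
                  parquet_bytes_added=40 * MIB, files_created=10)
    print(f"average file size: {average_file_size(prev, curr) / MIB:.1f} MiB")
    print(f"input throughput:  {input_throughput(prev, curr) / MIB:.1f} MiB/s "
          f"(baseline per core: {BASELINE_BYTES_PER_SEC_PER_CORE / MIB:.0f} MiB/s)")
```

Working from the difference between two samples rather than lifetime totals reflects recent behavior, which is usually more useful when you have just changed iceberg_target_lag_ms and want to see its effect.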