
This Week in Databend #113

PsiACE, Oct 1, 2023

Databend is a modern cloud data warehouse that serves your massive-scale analytics needs at low cost and with low complexity. It is an open-source alternative to Snowflake and is also available in the cloud: https://app.databend.com.

What's On In Databend

Stay connected with the latest news about Databend.

Loading to Table with Extra Columns

By default, COPY INTO loads data into a table by matching the order of fields in the file to the corresponding columns in the table. It's essential to ensure that the data aligns correctly between the file and the table.


If your table has more columns than the file, you can specify the columns into which you want to load data.
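To make the difference between positional matching and an explicit column list concrete, here is a small Rust sketch that models it in memory. It is not how COPY INTO is implemented: the Column struct, the map_record function, and the assumption that unlisted columns receive a per-column default value are all hypothetical and only serve as illustration.

```rust
/// Hypothetical in-memory model of a table column and its default value.
struct Column {
    name: &'static str,
    default: &'static str,
}

/// Maps the fields of one file record onto the table's columns.
/// - `targets == None`: fields match table columns by position, so the record
///   must have exactly one field per column (the default behavior).
/// - `targets == Some(names)`: fields match the listed columns in order, and
///   every unlisted column keeps its (assumed) default value.
fn map_record(
    table: &[Column],
    targets: Option<&[&str]>,
    fields: &[&str],
) -> Result<Vec<String>, String> {
    // Resolve the table column index that each field is written to.
    let target_idx: Vec<usize> = match targets {
        None => (0..table.len()).collect(),
        Some(names) => names
            .iter()
            .map(|name| {
                table
                    .iter()
                    .position(|c| c.name == *name)
                    .ok_or_else(|| format!("unknown column: {name}"))
            })
            .collect::<Result<Vec<usize>, String>>()?,
    };
    if target_idx.len() != fields.len() {
        return Err(format!(
            "expected {} fields, got {}",
            target_idx.len(),
            fields.len()
        ));
    }
    // Start every column at its default, then overwrite the targeted ones.
    let mut row: Vec<String> = table.iter().map(|c| c.default.to_string()).collect();
    for (&idx, value) in target_idx.iter().zip(fields) {
        row[idx] = value.to_string();
    }
    Ok(row)
}
```

With a three-column table, a two-field record fails under positional matching but loads when its two target columns are named explicitly.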

When working with CSV format, if your table has more columns than the file and the additional columns are at the end of the table, you can load the data by setting the FILE_FORMAT option ERROR_ON_COLUMN_COUNT_MISMATCH to false.
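The following Rust sketch illustrates the intended effect of this option on a single CSV record, assuming that disabling the check lets the trailing table columns fall back to their default values. The fit_csv_record function and its defaults argument are hypothetical, and the sketch deliberately still rejects records that are wider than the table, because the case described above only concerns tables with extra columns.

```rust
/// Hypothetical model: fit one CSV record to a table whose extra columns sit
/// at the end. `defaults` holds one (assumed) default value per table column.
fn fit_csv_record(
    defaults: &[&str],
    record: &[&str],
    error_on_column_count_mismatch: bool,
) -> Result<Vec<String>, String> {
    // Same number of fields and columns: load the record as-is.
    if record.len() == defaults.len() {
        return Ok(record.iter().map(|s| s.to_string()).collect());
    }
    // Strict mode rejects any mismatch. This sketch also rejects records that
    // are wider than the table, since only the narrower-file case is relaxed.
    if error_on_column_count_mismatch || record.len() > defaults.len() {
        return Err(format!(
            "column count mismatch: table has {} columns, record has {} fields",
            defaults.len(),
            record.len()
        ));
    }
    // Lenient mode: keep the leading fields and fill the trailing columns
    // with their defaults.
    Ok(record
        .iter()
        .chain(&defaults[record.len()..])
        .map(|s| s.to_string())
        .collect())
}
```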

If you are interested in learning more, please check out the resources below:

Docs | Example 5: Loading to Table with Extra Columns

Code Corner

Discover some fascinating code snippets or projects that showcase our work or learning journey.

Introducing Read Policies to Parquet Reader

There is a drawback to using the arrow-rs APIs: when we prefetch data for prewhere and topk pushdowns, we can't reuse the deserialized blocks.

To improve the row group reading logic and reuse the prefetched data blocks at the final output stage, we carried out a substantial refactoring and introduced the following read policies:

NoPrefetchPolicy

There is no prefetch stage. Databend reads, deserializes, and outputs the data blocks you need directly.

PredicateAndTopkPolicy

Databend prefetches the columns needed by prewhere and topk at the prefetch stage. It deserializes them into a DataBlock and evaluates them into a RowSelection. It then slices the DataBlock by batch size and keeps the resulting slices in memory in a VecDeque.

At the final stage, Databend reads the remaining columns for the rows selected by the RowSelection and outputs DataBlocks in batches. It then merges them with the prefetched blocks and projects the resulting blocks according to the output_schema.
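As a rough illustration of the PredicateAndTopkPolicy flow, the Rust sketch below models a row group as plain i64 columns and walks through the prefetch and final stages. It covers only the prewhere (predicate) half; topk evaluation is omitted. The Columns and Block types, the Vec&lt;bool&gt; standing in for RowSelection, and the function names are hypothetical simplifications, not Databend's or arrow-rs's actual types.

```rust
use std::collections::VecDeque;

/// Hypothetical row group: (column name, values). Real Parquet row groups hold
/// typed, encoded column chunks; plain i64 vectors keep the sketch small.
type Columns = Vec<(&'static str, Vec<i64>)>;
/// One output block: (column name, values) in output_schema order.
type Block = Vec<(&'static str, Vec<i64>)>;

/// Looks up a column by name.
fn column<'a>(cols: &'a Columns, name: &str) -> &'a [i64] {
    &cols.iter().find(|(n, _)| *n == name).expect("unknown column").1
}

/// Keeps only the values whose selection bit is set.
fn filter(values: &[i64], selection: &[bool]) -> Vec<i64> {
    values
        .iter()
        .zip(selection)
        .filter(|&(_, &keep)| keep)
        .map(|(&v, _)| v)
        .collect()
}

/// Slices a column into consecutive batches of at most `batch_size` rows.
fn slice(values: &[i64], batch_size: usize) -> VecDeque<Vec<i64>> {
    values.chunks(batch_size).map(|c| c.to_vec()).collect()
}

fn read_with_prefetch(
    row_group: &Columns,
    predicate_col: &'static str,
    predicate: impl Fn(i64) -> bool,
    output_schema: &[&'static str],
    batch_size: usize,
) -> Vec<Block> {
    assert!(batch_size > 0, "batch_size must be positive");

    // Prefetch stage: read only the predicate column and evaluate it into a
    // row selection (one bool per row).
    let prefetched = column(row_group, predicate_col);
    let selection: Vec<bool> = prefetched.iter().map(|&v| predicate(v)).collect();

    // Keep the selected prefetched rows in memory, sliced by batch size.
    let mut prefetched_batches = slice(&filter(prefetched, &selection), batch_size);

    // Final stage: read the remaining columns, but only for the selected rows.
    let mut remaining: Vec<(&'static str, VecDeque<Vec<i64>>)> = row_group
        .iter()
        .filter(|(name, _)| *name != predicate_col)
        .map(|(name, values)| (*name, slice(&filter(values, &selection), batch_size)))
        .collect();

    let mut output = Vec::new();
    // Merge each prefetched batch with the matching batch of the remaining
    // columns, then project the merged block onto output_schema.
    while let Some(pre) = prefetched_batches.pop_front() {
        let mut merged: Block = vec![(predicate_col, pre)];
        for (name, batches) in remaining.iter_mut() {
            merged.push((*name, batches.pop_front().expect("batches stay aligned")));
        }
        let projected: Block = output_schema
            .iter()
            .map(|&name| {
                let (_, values) = merged
                    .iter()
                    .find(|(n, _)| *n == name)
                    .expect("output column not in row group");
                (name, values.clone())
            })
            .collect();
        output.push(projected);
    }
    output
}
```

Because the prefetched column and the remaining columns are filtered by the same selection and sliced with the same batch size, their batches line up one-to-one, which is what lets the prefetched data be reused instead of read again.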

TopkOnlyPolicy

It's similar to PredicateAndTopkPolicy, but Databend only evaluates the topk at the prefetch stage.
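The sketch below shows one way to express the three policies and a topk-only prefetch step in Rust. The ReadPolicy enum, the rule in choose_policy for picking a policy, and the threshold-based topk_selection (for a query shaped like ORDER BY col DESC LIMIT k) are assumptions made for illustration; the post does not spell out these details.

```rust
/// Hypothetical enum mirroring the three policies described above.
#[derive(Debug, PartialEq)]
enum ReadPolicy {
    NoPrefetch,
    PredicateAndTopk,
    TopkOnly,
}

/// Assumed selection rule: prefetch only when something can be evaluated early.
fn choose_policy(has_prewhere: bool, has_topk: bool) -> ReadPolicy {
    match (has_prewhere, has_topk) {
        (false, false) => ReadPolicy::NoPrefetch,
        (true, _) => ReadPolicy::PredicateAndTopk,
        (false, true) => ReadPolicy::TopkOnly,
    }
}

/// Prefetch-stage topk evaluation for `ORDER BY col DESC LIMIT k` within one
/// row group: select every row whose value is at least the k-th largest value.
/// Ties may select more than k rows; a final sort + limit would trim them.
fn topk_selection(values: &[i64], k: usize) -> Vec<bool> {
    if k == 0 {
        return vec![false; values.len()];
    }
    if k >= values.len() {
        return vec![true; values.len()];
    }
    // Partially order a copy (descending) so index k - 1 holds the k-th largest.
    let mut scratch = values.to_vec();
    let (_, kth, _) = scratch.select_nth_unstable_by(k - 1, |a, b| b.cmp(a));
    let threshold = *kth;
    values.iter().map(|&v| v >= threshold).collect()
}
```

Only the sort column is touched to build the selection, so the remaining columns can then be read for the selected rows alone, as in the predicate case.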

If you are interested in learning more, please check out the resources below:

PR #13020 | refactor: introduce ReadPolicy to parquet reader

Highlights

We have also made these improvements to Databend that we hope you will find helpful:

Added spill information to the query log.
Added support for unloading data into a compressed file with COPY INTO.
Introduced the GET /v1/background/:tenant/background_tasks HTTP API for querying background tasks.
Read Example 4: Filtering Files with Pattern to understand how to use Pattern to filter files.
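For the new background tasks endpoint, here is a minimal Rust sketch of calling it with only the standard library. The post gives only the method and path; the server address, the absence of authentication, the Connection: close header, and the assumption that the tenant name needs no URL encoding are all hypothetical, and the response format is not described here.

```rust
use std::io::{Read, Write};
use std::net::TcpStream;

/// Fills the `:tenant` segment of the endpoint path (tenant assumed URL-safe).
fn background_tasks_path(tenant: &str) -> String {
    format!("/v1/background/{tenant}/background_tasks")
}

/// Formats a minimal raw HTTP/1.1 GET request for the endpoint.
fn background_tasks_request(host: &str, tenant: &str) -> String {
    format!(
        "GET {} HTTP/1.1\r\nHost: {}\r\nConnection: close\r\n\r\n",
        background_tasks_path(tenant),
        host
    )
}

/// Sends the request to `addr` (a host:port you supply; which service and
/// port expose this API is not stated in the post) and returns the raw
/// response, headers included. No authentication is sent.
fn fetch_background_tasks(addr: &str, tenant: &str) -> std::io::Result<String> {
    let mut stream = TcpStream::connect(addr)?;
    stream.write_all(background_tasks_request(addr, tenant).as_bytes())?;
    let mut response = String::new();
    stream.read_to_string(&mut response)?;
    Ok(response)
}
```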

What's Up Next

We're always open to cutting-edge technologies and innovative ideas. You're more than welcome to join the community and bring them to Databend.

Fixing issues detected by SQLsmith

In the last month, SQLsmith has discovered around 40 bugs in Databend. Databend Labs is actively working to fix these issues and improve system stability, even in uncommon scenarios. We encourage you to get involved: many of these fixes involve tasks such as type conversion or handling special values, and past fixes are a useful reference for getting started.

Issues | Found by SQLsmith

Please let us know if you're interested in contributing to this feature, or pick up a good first issue at https://link.databend.com/i-m-feeling-lucky to get started.

New Contributors

We always welcome everyone with open arms and can't wait to see how you'll help our community grow and thrive.

@zenus fixed an issue where a schema mismatch was not detected during the execution of COPY INTO in #13010.

Changelog

You can check the changelog of Databend Nightly for details about our latest developments.

Full Changelog: https://github.com/databendlabs/databend/compare/v1.2.128-nightly...v1.2.137-nightly
