This Week in Databend #126
PsiACE, Jan 1, 2024
Databend is a modern cloud data warehouse that serves your massive-scale analytics needs at low cost and with low complexity. It is an open-source alternative to Snowflake and is also available in the cloud: https://app.databend.com.
What's New
Stay informed about the latest features of Databend.
New Filter Execution Framework
In the new filter execution framework, Databend introduces a new concept called the "Immutable Index".
🚀 The Immutable Index enables us to avoid generating temporary selection buffers when evaluating AND and OR operations. This not only reduces memory fragmentation but also eliminates the repeated copying from temporary selections to the final selection.
Tests show that this optimization reduces query time.
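To make the idea of "no temporary selection buffers" more concrete, here is a minimal Python sketch of one way to evaluate AND/OR filters over a single, reused selection array. It is an illustration of the general selection-vector technique under our own assumptions, not Databend's Rust implementation; the names `Leaf`, `And`, `Or`, `partition`, and `filter_block` are hypothetical. Each predicate reorders a slice of the selection in place so passing rows form a prefix. For OR, the rows that already passed the left side are never touched again, which loosely mirrors the "immutable" part of the idea.

```python
from dataclasses import dataclass
from typing import Callable, List, Union


@dataclass
class Leaf:
    # Hypothetical predicate over a row index, e.g. lambda i: ages[i] > 18
    pred: Callable[[int], bool]


@dataclass
class And:
    left: "Expr"
    right: "Expr"


@dataclass
class Or:
    left: "Expr"
    right: "Expr"


Expr = Union[Leaf, And, Or]


def partition(expr: Expr, sel: List[int], start: int, end: int) -> int:
    """Reorder sel[start:end] in place so that rows satisfying expr occupy
    sel[start:mid] and the rest occupy sel[mid:end]; return mid."""
    if isinstance(expr, Leaf):
        mid = start
        for j in range(start, end):
            if expr.pred(sel[j]):
                # Swap the passing row into the "true" prefix (no extra buffer).
                sel[mid], sel[j] = sel[j], sel[mid]
                mid += 1
        return mid
    if isinstance(expr, And):
        # The right side only sees rows that already passed the left side.
        mid = partition(expr.left, sel, start, end)
        return partition(expr.right, sel, start, mid)
    if isinstance(expr, Or):
        # Rows in sel[start:mid] already passed and are never touched again;
        # the right side only sees rows that failed the left side.
        mid = partition(expr.left, sel, start, end)
        return partition(expr.right, sel, mid, end)
    raise TypeError(f"unsupported expression: {expr!r}")


def filter_block(expr: Expr, num_rows: int) -> List[int]:
    """Return the indices of rows that satisfy expr (in no particular order)."""
    sel = list(range(num_rows))  # the single selection buffer for the block
    mid = partition(expr, sel, 0, num_rows)
    return sel[:mid]


if __name__ == "__main__":
    ages = [15, 42, 67, 30, 8]
    cities = ["a", "b", "a", "c", "b"]
    # age > 18 AND (city = 'a' OR city = 'c')
    expr = And(
        Leaf(lambda i: ages[i] > 18),
        Or(Leaf(lambda i: cities[i] == "a"), Leaf(lambda i: cities[i] == "c")),
    )
    print(sorted(filter_block(expr, len(ages))))  # [2, 3]
```

Because rows are swapped rather than copied, the order of the selected indices is not preserved; an engine that needs stable row order would have to track true and false selections differently.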
If you would like to learn more, please contact the Databend team or refer to the resources listed below:
Code Corner
Discover some fascinating code snippets or projects that showcase our work or learning journey.
Optimize Query Performance
Databend enhances query performance by providing Aggregate Index, Cluster Key, and Virtual Column, allowing users to optimize for specific query scenarios.
- Aggregate Index pre-aggregates data to speed up aggregation queries such as sum, average, max, and min. It is especially useful in scenarios that require frequent aggregate calculations.
- Cluster Key guides Databend on how to organize data at the storage level. Rows with similar key values are physically stored together, which reduces the number of reads during queries and thus speeds them up.
- Virtual Column extracts nested fields from Variant data and stores them in separate storage files. This is very useful for optimizing complex computations and conditional queries, as it reduces the computational load at runtime.
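The Cluster Key item can be illustrated with a small model of block pruning. The Python sketch below is a simplified model of the general technique, not Databend's storage format. It assumes that each block keeps min/max statistics for the key column and that a filter can skip any block whose range does not overlap the query. `Block`, `build_blocks`, and `blocks_to_read` are hypothetical names. Sorting rows by the key before cutting them into blocks plays the role of clustering.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Block:
    keys: List[int]  # values of the (hypothetical) cluster key column
    min_key: int     # per-block statistics kept alongside the data
    max_key: int


def build_blocks(keys: List[int], block_size: int, clustered: bool) -> List[Block]:
    """Cut rows into fixed-size blocks; if clustered, sort by key first so that
    rows with similar key values land in the same block."""
    data = sorted(keys) if clustered else list(keys)
    blocks = []
    for i in range(0, len(data), block_size):
        chunk = data[i:i + block_size]
        blocks.append(Block(chunk, min(chunk), max(chunk)))
    return blocks


def blocks_to_read(blocks: List[Block], lo: int, hi: int) -> int:
    """Count blocks whose [min_key, max_key] range overlaps the filter
    lo <= key <= hi; every other block can be skipped without reading it."""
    return sum(1 for b in blocks if b.max_key >= lo and b.min_key <= hi)


if __name__ == "__main__":
    keys = [i % 10 for i in range(100)]  # 100 rows, key values 0..9 interleaved
    # WHERE key = 3
    print(blocks_to_read(build_blocks(keys, 10, clustered=False), 3, 3))  # 10
    print(blocks_to_read(build_blocks(keys, 10, clustered=True), 3, 3))   # 1
```

Without clustering, every block spans the full key range, so the filter cannot skip any of them. With clustering, only one block overlaps the query.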
By properly applying these tools, Databend can significantly improve the speed and efficiency of data retrieval, providing users with fast and flexible options for query performance optimization.
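To make the Aggregate Index idea more concrete, here is a minimal Python sketch of pre-aggregation, assuming a hypothetical table of `(region, amount)` rows. It is not Databend's implementation; `build_aggregate_index` and `avg_by_region` are illustrative names. The index stores mergeable partial results per group (sum, count, max, min), so an aggregation query can be answered without scanning the base rows again.

```python
from typing import Dict, List, Tuple

# Hypothetical base table rows: (region, amount)
Row = Tuple[str, float]
# Per-group partial aggregates kept by the index: (sum, count, max, min)
Partial = Tuple[float, int, float, float]


def build_aggregate_index(rows: List[Row]) -> Dict[str, Partial]:
    """Scan the base rows once and keep mergeable partial aggregates per region."""
    index: Dict[str, Partial] = {}
    for region, amount in rows:
        if region in index:
            s, c, mx, mn = index[region]
            index[region] = (s + amount, c + 1, max(mx, amount), min(mn, amount))
        else:
            index[region] = (amount, 1, amount, amount)
    return index


def avg_by_region(index: Dict[str, Partial]) -> Dict[str, float]:
    """Answer AVG(amount) ... GROUP BY region from the index alone.
    AVG is derived from SUM and COUNT because averages cannot be merged directly."""
    return {region: s / c for region, (s, c, _, _) in index.items()}


if __name__ == "__main__":
    rows = [("east", 10.0), ("west", 4.0), ("east", 30.0)]
    index = build_aggregate_index(rows)
    print(index["east"])         # (40.0, 2, 30.0, 10.0)
    print(avg_by_region(index))  # {'east': 20.0, 'west': 4.0}
```

Storing sum and count instead of the average is a deliberate choice: partial sums and counts from new data can simply be added to the existing entries, whereas two averages cannot be combined without knowing their counts.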
Highlights
We have also made these improvements to Databend that we hope you will find helpful:
- Added support for spilling Top-N sorting.
- Added support for using conditional statements to build directed acyclic graphs (DAGs) when creating background tasks.
- Added a new Binary data type.
- Added a new stream_status HTTP API to check the status of streams.
- Added support for using MISSING_FIELD_AS to define the default behavior for missing fields during Parquet loading.
- Read Docs | Continuous Data Pipelines to learn how to use Stream and Pipeline for continuous data ingestion.
What's Up Next
We're always open to cutting-edge technologies and innovative ideas. You're more than welcome to join the community and bring them to Databend.
Databend Roadmap for 2024 - Come & Join the Discussion!
In 2023, Databend scaled significantly. The largest single table in Databend handled hundreds of thousands of segments, tens of millions of blocks, and tens of trillions of records, encompassing 7 PB of raw data and over 300 TB of index data.
In 2024, our vision is Compute Where Data Lives: Swift, Smart, Seamless. Explore our ongoing journey and future plans for Databend. Join the discussion and contribute your ideas!
| Task | Status | Comments |
|---|---|---|
| Enhancements to Concurrency and Scheduler | Planned | Aiming for faster, more efficient task handling and improved system responsiveness. |
| GEOMETRY Data Type | Planned | |
| TPC-DS Performance | In Progress | Continuously optimizing for better performance benchmarks. |
| Multi-Statement Transactions | Not Specified | |
| Stored Procedures (Python) | Not Specified | Adding Python support for versatile data analysis alongside SQL. |
| Unify Storage, Warehouse, and Compute | Not Specified | Creating a cohesive data platform for AI and cloud computing, provisioning CPU & GPU resources. |
Issue #14167 | Databend Roadmap for 2024 (Discussion)
Please let us know if you're interested in contributing to any of these features, or pick up a good first issue at https://link.databend.com/i-m-feeling-lucky to get started.
Changelog
You can check the changelog of Databend Nightly for details about our latest developments.
Full Changelog: https://github.com/datafuselabs/databend/compare/v1.2.268-nightly...v1.2.277-nightly