This Week in Databend #91

PsiACE, Apr 30, 2023

Databend is a modern cloud data warehouse, serving your massive-scale analytics needs at low cost and complexity. It is an open source alternative to Snowflake and is also available in the cloud: https://app.databend.com.

What's On In Databend

Stay connected with the latest news about Databend.

New datatype: BITMAP

Databend has added support for the bitmap datatype.

BITMAP is a type of compressed data structure that can be used to efficiently store and manipulate sets of boolean values. It is often used to accelerate count distinct.

> CREATE TABLE IF NOT EXISTS t1(id Int, v Bitmap) Engine = Fuse;
> INSERT INTO t1 (id, v) VALUES(1, to_bitmap('0, 1')),(2, to_bitmap('1, 2')),(3, to_bitmap('3, 4'));
> SELECT id, to_string(v) FROM t1;

┌──────────────────────┐
│  id   │ to_string(v) │
│ Int32 │ String       │
├───────┼──────────────┤
│     1 │ 0,1          │
│     2 │ 1,2          │
│     3 │ 3,4          │
└──────────────────────┘

Our implementation of the BITMAP data type utilizes RoaringTreemap, a compressed bitmap with u64 values. Using this data structure brought us improved performance and decreased memory usage in comparison to alternative bitmap implementations.
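To make the link between bitmaps and count distinct concrete, here is a minimal Rust sketch. It is a plain, uncompressed bitmap and not RoaringTreemap or Databend's actual code; the `SimpleBitmap` type and its methods are hypothetical names used only for illustration.

```rust
/// A plain, uncompressed bitmap over small u64 values.
/// Illustration only: this is NOT RoaringTreemap and not Databend's implementation.
#[derive(Clone, Default, Debug, PartialEq)]
pub struct SimpleBitmap {
    // Bit `i` of `words[w]` is set when the value `w * 64 + i` is present.
    words: Vec<u64>,
}

impl SimpleBitmap {
    /// Build a bitmap from a slice of values; duplicates collapse into one bit.
    pub fn from_values(values: &[u64]) -> Self {
        let mut bitmap = Self::default();
        for &v in values {
            bitmap.insert(v);
        }
        bitmap
    }

    /// Set the bit for `value`, growing the word vector if needed.
    pub fn insert(&mut self, value: u64) {
        let word = (value / 64) as usize;
        if word >= self.words.len() {
            self.words.resize(word + 1, 0);
        }
        self.words[word] |= 1u64 << (value % 64);
    }

    /// Merge another bitmap into this one with a word-wise OR.
    pub fn union_with(&mut self, other: &Self) {
        if other.words.len() > self.words.len() {
            self.words.resize(other.words.len(), 0);
        }
        for (w, o) in self.words.iter_mut().zip(&other.words) {
            *w |= *o;
        }
    }

    /// Number of distinct values, computed by counting set bits.
    pub fn cardinality(&self) -> u64 {
        self.words.iter().map(|w| u64::from(w.count_ones())).sum()
    }
}
```

Merging the three bitmaps from the table above (`0,1`, `1,2`, and `3,4`) with `union_with` and then calling `cardinality` yields the number of distinct values across all rows without sorting or hashing them. A compressed structure such as RoaringTreemap follows the same idea while avoiding the memory cost of sparse, large values.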

If you are interested in learning more, please check out the resources listed below.

Improving Hash Join Performance with New Hash Table Design

Our previous hash table implementation was optimized for aggregation functions, which significantly limited the performance of hash join operations. To improve hash join performance, we implemented a dedicated hash table optimized for it. The new table is allocated with a fixed size based on the number of rows in the build stage, and its value type is replaced with a pointer that supports CAS (compare-and-swap) operations, which keeps memory usage under control without the need for Vec growth. The new implementation significantly improved performance. Check out the resources below for more information.
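The design described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not Databend's actual code: it uses row indices in place of raw pointers, `u64` join keys, and the standard library's `DefaultHasher`, and the names `JoinHashTable`, `insert`, `probe`, and `build` are hypothetical.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Sentinel meaning "no row": the bucket is empty or the chain has ended.
const EMPTY: usize = usize::MAX;

/// Hypothetical build-side table: sized once from the row count, never grows.
pub struct JoinHashTable {
    keys: Vec<u64>,          // build-side join keys, one per row
    heads: Vec<AtomicUsize>, // per bucket: index of the most recently linked row
    next: Vec<AtomicUsize>,  // per row: index of the next row in the same chain
}

fn bucket_of(key: u64, buckets: usize) -> usize {
    let mut hasher = DefaultHasher::new();
    key.hash(&mut hasher);
    // `buckets` is a power of two, so masking selects a valid bucket.
    (hasher.finish() as usize) & (buckets - 1)
}

impl JoinHashTable {
    /// Allocate all memory up front, based on the number of build rows.
    pub fn new(keys: Vec<u64>) -> Self {
        let rows = keys.len();
        let buckets = rows.max(1).next_power_of_two();
        JoinHashTable {
            keys,
            heads: (0..buckets).map(|_| AtomicUsize::new(EMPTY)).collect(),
            next: (0..rows).map(|_| AtomicUsize::new(EMPTY)).collect(),
        }
    }

    /// Link `row` at the front of its bucket chain with a CAS loop.
    /// Takes `&self`, so several build threads can insert disjoint rows.
    pub fn insert(&self, row: usize) {
        let bucket = bucket_of(self.keys[row], self.heads.len());
        let mut head = self.heads[bucket].load(Ordering::Acquire);
        loop {
            // Point this row at the current head before publishing it.
            self.next[row].store(head, Ordering::Relaxed);
            match self.heads[bucket].compare_exchange_weak(
                head,
                row,
                Ordering::AcqRel,
                Ordering::Acquire,
            ) {
                Ok(_) => return,
                Err(current) => head = current, // another thread won; retry
            }
        }
    }

    /// Return the indices of all build rows whose key equals `key`.
    pub fn probe(&self, key: u64) -> Vec<usize> {
        let bucket = bucket_of(key, self.heads.len());
        let mut matches = Vec::new();
        let mut row = self.heads[bucket].load(Ordering::Acquire);
        while row != EMPTY {
            if self.keys[row] == key {
                matches.push(row);
            }
            row = self.next[row].load(Ordering::Relaxed);
        }
        matches
    }
}

/// Single-threaded convenience: build the table by inserting every row.
pub fn build(keys: Vec<u64>) -> JoinHashTable {
    let table = JoinHashTable::new(keys);
    for row in 0..table.keys.len() {
        table.insert(row);
    }
    table
}
```

Because every bucket head and chain link is a fixed-size atomic slot allocated in `new`, inserting a row never reallocates; duplicate keys simply extend a chain instead of growing a per-key `Vec`.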

Code Corner

Discover some fascinating code snippets or projects that showcase our work or learning journey.

Rust Compilation Challenges and Solutions

Compiling a medium to large Rust program is not a breeze due to the accumulation of complex project dependencies and boilerplate code.

To address these challenges, the Databend team implemented several measures, including observability tools, configuration adjustments, caching, linker optimization, compile-related profiles, and refactoring.

If you are interested in learning more, please check out the resources listed below.

Highlights

Here are some noteworthy items. Perhaps you can find something that interests you.

What's Up Next

We're always open to cutting-edge technologies and innovative ideas. You're more than welcome to join the community and bring them to Databend.

Contributors Wanted for Function Development

We are currently working on improving our functions, and we need your help!

We have identified four areas that require attention, and we would be extremely grateful for any assistance that you can provide.

If you are interested in contributing to any of these areas, please refer to the following resources to learn more about how to write scalar and aggregate functions:

We appreciate any help that you can provide, and we look forward to working with you.

Issue #11220 | Tracking: functions

Please let us know if you're interested in contributing to this issue, or pick up a good first issue at https://link.databend.com/i-m-feeling-lucky to get started.

Changelog

You can check the changelog of Databend Nightly for details about our latest developments.

Full Changelog: https://github.com/datafuselabs/databend/compare/v1.1.14-nightly...v1.1.23-nightly
