Optimizing Databend Binary Builds with Profile-guided Optimization
PsiACE, Feb 13, 2023
Recently someone in the community suggested that we try profile-guided optimization (#9387). Let's see how we can use Rust to build a PGO-optimized Databend!
Background
Profile-guided optimization (PGO) is a compiler optimization technique that collects data from typical runs of a program (for example, which branches are taken) and then uses that data to guide inlining, conditional branches, machine code layout, register allocation, etc.
The technique exists because static analysis tries to improve code without ever executing the program. Without runtime information, the compiler cannot take the program's actual behavior into account, so these optimizations may not be fully effective.
PGO allows data to be collected based on application scenarios in a production environment, so the optimizer can optimize the speed for hot code paths and size for cold code paths and produce faster and smaller code for applications.
rustc supports PGO by building data collection into the binaries, then collecting perf data during runtime to prepare for the final compilation optimization. The implementation relies entirely on LLVM.
Workflow
Follow the workflow below to generate a PGO-optimized program:
- Compile the program with instrumentation enabled.
- Run the instrumented program to generate a .profraw file.
- Convert the .profraw file into a .profdata file using LLVM's llvm-profdata tool.
- Compile the program again with the profiling data.
Preparations
The data collected during the run will eventually be converted with llvm-profdata. The tool ships with the llvm-tools-preview component, which can be installed via rustup:
rustup component add llvm-tools-preview
After the installation, llvm-profdata may need to be added to PATH manually. It is located in:
~/.rustup/toolchains/<toolchain>/lib/rustlib/<target-triple>/bin/
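Rather than typing the toolchain path by hand, the directory can be derived from rustc itself. The sketch below assumes rustc comes from a rustup-managed toolchain and that the host triple is the one used for the <target-triple> directory; llvm_tools_dir is a helper name made up for this example.

```shell
# Build the directory path where rustup places llvm-profdata.
# llvm_tools_dir is a hypothetical helper, not part of rustup or Databend.
llvm_tools_dir() {
  sysroot="$1"
  host="$2"
  printf '%s/lib/rustlib/%s/bin\n' "$sysroot" "$host"
}

if command -v rustc >/dev/null 2>&1; then
  # `rustc --print sysroot` prints the active toolchain directory,
  # and `rustc -vV` includes a "host: <triple>" line.
  sysroot="$(rustc --print sysroot)"
  host="$(rustc -vV | sed -n 's/^host: //p')"
  tools_dir="$(llvm_tools_dir "$sysroot" "$host")"
  export PATH="$tools_dir:$PATH"
  command -v llvm-profdata || echo "llvm-profdata not found in $tools_dir" >&2
fi
```

If the final line prints a path, llvm-profdata is ready to use in the current shell session.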
Step-By-Step
The following procedure uses Databend's SQL logic tests for demonstration purposes only to help us understand how it works, so you may not get positive results for performance. Use a typical workload for your production environment.
The caveat, however, is that the sample of data fed to the program during the profiling stage must be statistically representative of the typical usage scenarios; otherwise, profile-guided feedback has the potential to harm the overall performance of the final build instead of improving it.
- Make sure there is no left-over profiling data from previous runs.
rm -rf /tmp/pgo-data
- Build the instrumented binaries (with the release profile), using the RUSTFLAGS environment variable in order to pass the PGO compiler flags to the compilation of all crates in the program.
RUSTFLAGS="-Cprofile-generate=/tmp/pgo-data" \
cargo build --release --target=x86_64-unknown-linux-gnu
- Run the instrumented binaries with a typical workload. We strongly recommend using a workload that is statistically representative of the real scenario. This example runs SQL logic tests for reference only.
- Start a stand-alone Databend via a script, or a Databend cluster. Note that a production environment is more likely to run in cluster mode.
- Import the dataset and run a typical query workload.
BUILD_PROFILE=release ./scripts/ci/deploy/databend-query-standalone.sh
ulimit -n 10000;ulimit -s 16384; cargo run -p sqllogictests --release -- --enable_sandbox --parallel 16 --no-fail-fast
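Before merging, it can help to confirm that the run actually produced raw profiles. By default the LLVM profiling runtime writes its counters when the instrumented process exits, so the Databend processes most likely need to be stopped normally first. The check below is a minimal sketch; PGO_DIR and count_profraw are names chosen for this example.

```shell
# Directory passed to -Cprofile-generate (assumed to match the build step).
PGO_DIR="${PGO_DIR:-/tmp/pgo-data}"

# Count the .profraw files under a directory; prints 0 if it does not exist.
count_profraw() {
  find "$1" -type f -name '*.profraw' 2>/dev/null | wc -l | tr -d ' '
}

n="$(count_profraw "$PGO_DIR")"
if [ "$n" -gt 0 ]; then
  echo "found $n .profraw file(s) in $PGO_DIR"
else
  echo "no .profraw files in $PGO_DIR; was the instrumented binary stopped normally?" >&2
fi
```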
- Merge the .profraw files into a .profdata file with llvm-profdata.
llvm-profdata merge -o /tmp/pgo-data/merged.profdata /tmp/pgo-data
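A quick sanity check on the merged file can catch an empty or missing profile before the second build. This is only a sketch: check_profile and PROFDATA are hypothetical names, and the summary relies on the show subcommand of llvm-profdata being on PATH.

```shell
# Path written by the merge step (assumed unchanged from the command above).
PROFDATA="${PROFDATA:-/tmp/pgo-data/merged.profdata}"

# Return 1 if the profile is missing or empty; otherwise print a summary.
check_profile() {
  file="$1"
  # -s is true only for an existing, non-empty file.
  if [ ! -s "$file" ]; then
    echo "missing or empty profile: $file" >&2
    return 1
  fi
  if command -v llvm-profdata >/dev/null 2>&1; then
    # Show overall counts plus the ten hottest functions.
    llvm-profdata show --topn=10 "$file"
  else
    echo "llvm-profdata is not on PATH; skipping summary" >&2
  fi
}

check_profile "$PROFDATA" || echo "profile check failed" >&2
```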
- Use the .profdata file to guide optimizations. Note that both builds use the --release flag, because in a real deployment we always run the release build binary.
RUSTFLAGS="-Cprofile-use=/tmp/pgo-data/merged.profdata -Cllvm-args=-pgo-warn-missing-function" \
cargo build --release --target=x86_64-unknown-linux-gnu
- Run the compiled program again with the previous workload and check the performance:
BUILD_PROFILE=release ./scripts/ci/deploy/databend-query-standalone.sh
ulimit -n 10000;ulimit -s 16384; cargo run -p sqllogictests --release -- --enable_sandbox --parallel 16 --no-fail-fast
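The steps above can be collected into a single script. The following is a sketch under stated assumptions rather than an official Databend script: PGO_DIR, TARGET, DRY_RUN, WORKLOAD, run and pgo_pipeline are all names invented for this example, and by default the script only prints the commands it would execute.

```shell
#!/bin/sh
# Sketch of the PGO steps as one pipeline. All variable and function
# names here are hypothetical and chosen for illustration.
PGO_DIR="${PGO_DIR:-/tmp/pgo-data}"
TARGET="${TARGET:-x86_64-unknown-linux-gnu}"
DRY_RUN="${DRY_RUN:-1}"      # 1 = print commands only, 0 = execute them
WORKLOAD="${WORKLOAD:-}"     # shell command that drives a representative workload

# Print a command; execute it only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "$DRY_RUN" = "0" ]; then
    "$@"
  fi
}

pgo_pipeline() {
  # A real run without a workload would produce no profile data.
  if [ "$DRY_RUN" = "0" ] && [ -z "$WORKLOAD" ]; then
    echo "set WORKLOAD to a command that exercises the instrumented binaries" >&2
    return 1
  fi

  # 1. Remove left-over profiling data.
  run rm -rf "$PGO_DIR"
  # 2. Build the instrumented binaries.
  run env "RUSTFLAGS=-Cprofile-generate=$PGO_DIR" \
    cargo build --release "--target=$TARGET"
  # 3. Run the workload against the instrumented binaries.
  if [ -n "$WORKLOAD" ]; then
    run sh -c "$WORKLOAD"
  else
    echo "# WORKLOAD is unset: run a representative workload here"
  fi
  # 4. Merge the raw profiles.
  run llvm-profdata merge -o "$PGO_DIR/merged.profdata" "$PGO_DIR"
  # 5. Rebuild using the merged profile.
  run env "RUSTFLAGS=-Cprofile-use=$PGO_DIR/merged.profdata -Cllvm-args=-pgo-warn-missing-function" \
    cargo build --release "--target=$TARGET"
}

pgo_pipeline
```

Saved as, say, pgo.sh (a hypothetical file name), it prints the plan when run as is, and executes it when run with DRY_RUN=0 and WORKLOAD set to a command that starts Databend, runs the workload, and stops Databend again.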
References
- https://en.wikipedia.org/wiki/Profile-guided_optimization
- https://doc.rust-lang.org/rustc/profile-guided-optimization.html
- https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization
- https://learn.microsoft.com/en-us/cpp/build/profile-guided-optimizations?view=msvc-170