KantoDB user manual
Welcome to the KantoDB user manual.
KantoDB is an OLTP¹-oriented hybrid SQL database server and programming library. It is written in the Rust programming language and uses RocksDB for underlying data storage.
You can use it as a library like SQLite, or as a database server like Postgres.
KantoDB is still very early in development. See development roadmap for our plans.
This document is intended to serve two audiences:
- software developers writing applications that use KantoDB for data storage and processing, with SQL or abstractions such as an ORM
- administrators (/devops/SRE) managing servers and services that run KantoDB
For the sake of brevity, this manual assumes general knowledge of computers, SQL, and Linux.
If you find a bug in this manual, or in KantoDB, please tell us about it on the community forums.
¹ Online Transaction Processing; as opposed to OLAP (Online Analytical Processing). OLTP workloads have more writes, more point queries (one or few affected rows), and heavier index use; OLAP workloads are dominated by read-heavy range scans that touch a significant portion of the total data on every run.
Motivation
Why a new database system?
The major open source general-purpose SQL databases as of 2025 are Postgres, SQLite, and MariaDB (a community fork of MySQL). They are all awe-inspiring projects with strong abilities in their areas of focus. However, we feel that each of them also has significant limitations, oddities, historical baggage, and architectural decisions that have become outdated.
Instead of criticizing these projects in detail, each of which we hold in high respect, we'd rather simply ask: "Can we (the community) do better?" -- and if so, what would "better" look like? We'll only draw comparisons where we specifically aim to do things differently, and we hope to justify those decisions well enough not to incite a flame war.
We invite you to join us on this journey of hopefully building the next 40 years of databases.
How can I trust my data to be safe?
KantoDB is new, still in development, and has no significant adoption yet. As of 2025Q1, you should not trust your data to it.
We do not wish to make claims about durability and functionality without first having test coverage sufficient to back those claims. We intend for the evidence to speak for itself. We aim to use formal methods as much as we can muster, and good testable design (sans I/O¹, fuzzing, etc.) for the rest.
Yes, we know we are crazy.
Version v0 will be primarily intended for bootstrapping the development, getting all the tooling in place, and for making major architectural decisions. This is where we worry about row formats, schema evolution, and such.
Version v1 will focus on being a viable production-ready Postgres/SQLite replacement, using RocksDB for data storage. RocksDB is an industry-standard library for ordered key-value storage and provides the production-ready functionality a SQL database engine needs.
Version v2 is far in the future, but will hopefully implement a novel disk storage format tuned for modern NVMe storage, written in pure Rust.
¹ Sans I/O (popularized by https://sans-io.readthedocs.io/) refers to the idea of separating logic from input/output calls; for KantoDB it is mostly applicable to the from-scratch v2 storage format. (We could abstract out the whole RocksDB API, but it is large, complicated, and much more work than "read 2 MiB from disk at this offset".)
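To make the sans-I/O idea concrete, here is a minimal, purely illustrative Rust sketch. None of these names are KantoDB APIs, and the record format is invented for the example: the point is that locating and decoding records is pure logic, while the caller owns the actual read.

```rust
// Illustrative sans-I/O sketch (hypothetical names, not KantoDB code):
// the logic below never touches the disk; the caller performs the actual
// read and hands the bytes in. In tests, "I/O" can be an in-memory buffer.

/// A read request the logic hands back to the I/O layer:
/// "read `len` bytes at `offset`".
struct ReadRequest {
    offset: u64,
    len: usize,
}

/// Pure logic: given a record id, say which bytes are needed.
fn locate_record(record_id: u64) -> ReadRequest {
    const RECORD_SIZE: usize = 128; // assumed fixed-size records
    ReadRequest {
        offset: record_id * RECORD_SIZE as u64,
        len: RECORD_SIZE,
    }
}

/// Pure logic: decode a record from bytes the caller already read.
/// Example format: the first 8 bytes are a little-endian key.
fn decode_record(bytes: &[u8]) -> u64 {
    u64::from_le_bytes(bytes[..8].try_into().unwrap())
}

fn main() {
    // The I/O layer (here simulated with an in-memory buffer) is the
    // only part that would differ between disk, network, or tests.
    let mut storage = vec![0u8; 1024];
    storage[256..264].copy_from_slice(&42u64.to_le_bytes());

    let req = locate_record(2);
    let bytes = &storage[req.offset as usize..req.offset as usize + req.len];
    assert_eq!(decode_record(bytes), 42);
    println!("record 2 decodes to {}", decode_record(bytes));
}
```

Because the logic takes and returns plain values, it can be exercised exhaustively (or fuzzed) without ever opening a file.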
Is KantoDB Open Source?
Emphatically yes. Apache 2 licensed, about as permissive as it gets.
We have no intent of rug-pulling the community with a non-open-source license later, but naturally nobody trusts statements like that anymore.
Software architecture
KantoDB is programmed in Rust. (For v0 and v1, we're using the RocksDB library for data storage, which is written in C++.)
KantoDB stands on the shoulders of giants (or at least some fairly tall people):
- DataFusion
- Apache Arrow
- sqlparser
- pgwire
- RocksDB
KantoDB uses the DataFusion library for query planning and execution. DataFusion in turn uses the Apache Arrow in-memory data exchange format. Arrow is a column-oriented format designed with SIMD optimizations in mind. This makes KantoDB essentially a hybrid row/columnar database; with RocksDB, the on-disk storage is still row-based, but as we scan a table or index and load data into memory, it is made columnar.
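To illustrate that row-to-columnar step, here is a plain-Rust sketch (not KantoDB or Arrow code; the types are invented for the example). Rows come off row-oriented storage one record at a time, and a scan transposes them so each field becomes one contiguous buffer, which is what makes SIMD-friendly columnar processing possible.

```rust
// Illustrative sketch of transposing row-oriented records into a
// columnar in-memory layout, as a table scan does when loading
// on-disk rows into an Arrow-style batch. Hypothetical types.

/// A row as it might come off row-oriented on-disk storage.
struct Row {
    id: u64,
    score: f64,
}

/// Columnar form: one contiguous vector per field.
struct Columns {
    ids: Vec<u64>,
    scores: Vec<f64>,
}

/// Transpose rows into columns.
fn to_columns(rows: &[Row]) -> Columns {
    Columns {
        ids: rows.iter().map(|r| r.id).collect(),
        scores: rows.iter().map(|r| r.score).collect(),
    }
}

fn main() {
    let rows = vec![
        Row { id: 1, score: 0.5 },
        Row { id: 2, score: 1.5 },
        Row { id: 3, score: 2.5 },
    ];
    let cols = to_columns(&rows);
    // A column-wise aggregate now walks one contiguous buffer instead
    // of skipping over the other fields of every row.
    let total: f64 = cols.scores.iter().sum();
    assert_eq!(cols.ids, vec![1, 2, 3]);
    assert_eq!(total, 4.5);
    println!("sum of scores = {total}");
}
```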
Single node only
KantoDB is not a distributed database. It does not use Raft, Paxos, or other distributed consensus mechanisms; it does not coordinate writes across multiple server machines; and it does not "scale out". We intend to focus on easy disaster recovery over scale-out or high availability. This limitation keeps the software much simpler.
We do not care about scaling OLTP beyond a 448-core 24 TiB RAM server with a dozen 2 TiB NVMe drives (2025 numbers). That is a huge amount of data and processing ability most companies will never grow beyond.
We aim to compete with Postgres and SQLite¹, not with CockroachDB, Spanner, or the like.
We care deeply about running as a library (like SQLite) with relatively low resource requirements. Think cheap cloud VM, not embedded hardware, for now. This will also be a core part of our developer experience, allowing you to run application tests against the library even when production uses a standalone server.
We care about easy, automatic and streaming backups to cloud object storage, such as S3 -- and disaster recovery from it.
We intend to support easy-to-run read-only asynchronous replicas, if that comes naturally.
We intend to support easy OLAP integration by exposing archived tables directly from object storage, as Delta Lake or such, when reads can lag and throughput is king. This may even be our chosen backup format.
¹ Both at the same time; we are building both a database server and a library.
Server
The database server provides a Postgres wire protocol compatible TCP server, implemented with the pgwire library.
This allows multiple client machines (for example, a pool of web servers) to use the database server for storing data and coordinating activity.
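Because the server speaks the Postgres wire protocol, stock Postgres clients should be able to connect to it unchanged. A hypothetical session (the host, port, and database name below are assumptions for illustration, not documented KantoDB defaults) might look like:

```shell
# Connect with the standard Postgres client and run a query.
# host/port/dbname are placeholders; use your server's actual values.
psql "host=127.0.0.1 port=5432 dbname=mydb" -c 'SELECT 1;'
```

The same applies to Postgres client libraries in other languages, to the extent the protocol features they rely on are implemented.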
A notable difference from Postgres is that KantoDB's per-connection and per-query overheads are smaller (note: pending benchmark results).
Postgres spawns a relatively heavyweight process per connection; in KantoDB, connections are just async contexts in Tokio, and live queries currently use a worker in a thread pool.
(Current limitations: no TLS, no UNIX domain sockets.)
Library
The Rust crate (library) kanto is a programming library for application development, much in the style of SQLite.
It stores the database in a directory of local files.
Differences from SQLite:
- The kanto library is written in Rust, not C, and thus can be used natively only from Rust programs. Bindings for other programming languages can be made later, for example for C, C++, JavaScript (NodeJS, Deno), and Python.
- RocksDB databases can only be opened by one process at a time; SQLite lets multiple processes open the same database, managing locking at a finer-grained level. (Note that multi-process SQLite databases come with performance caveats and should generally be avoided, but are useful as an admin side channel; we do not currently have a solution like that.)