About Projects

Introducing OrangeDB: A Distributed Key-Doc Database

A couple of months ago, I started building a small LSM tree-based storage engine, which I call Parrot.

If you’re unfamiliar with LSM trees—unlike B-trees, which require frequent reorganizations—LSM (Log-Structured Merge) trees write data as sequential logs. One of the main problems with B/B+ trees is write amplification. Inserting a single row might trigger multiple rearrangements across various index trees, leading to costly random disk writes, especially in persistent systems.

LSM trees avoid this by leveraging sequential writes, making write operations extremely fast. But there’s a tradeoff—read performance. Since records are spread across multiple levels, a read operation might need to scan through several files until it finds the desired record.

Read more about LSM here.


What Makes Parrot Different?

Parrot isn’t just another LSM tree engine. It includes several optimizations aimed at speeding up I/O and improving concurrency.

zero disk architecture

Introducing OrangeDB

Orange is a distributed key-document database built on top of Parrot. It provides:


Sharding, Consistency, and Replication

OrangeDB combines the architectures of MongoDB and Cassandra, striking a balance between high availability and consistency.

An Orange cluster consists of three main components:

Read Consistency Levels

OrangeDB supports configurable read strategies:

Deployment Options

OrangeDB is available in multiple flavors.

Standalone Mode

Run a single OrangeDB instance with gRPC endpoints. Use the Go client to send OQL or gRPC-based queries.

app % orange server --port 8000
  ___                             
 / _ \ _ __ __ _ _ __   __ _  ___ 
| | | | '__/ _' | '_ \ / _' |/ _ \
| |_| | | | (_| | | | | (_| |  __/
 \___/|_|  \__,_|_| |_|\__, |\___|
                       |___/      
Running project: `orange`
Starting server at 127.0.0.1:8000 

You can interact with it using the REPL:

app % orange repl --port 8000 --address localhost
# Welcome to orange
> SELECT * FROM users WHERE _ID = 10

Sharded Cluster on Kubernetes

Step 1: Deploy OrangeDB Operator

kubectl apply -f https://raw.githubusercontent.com/nagarajRPoojari/orangectl/main/dist/install.yaml

Step 2: Define Orange Cluster

# orange.yaml
apiVersion: ctl.orangectl.orange.db/v1alpha1
kind: OrangeCtl
metadata:
  name: orangectl-sample
  labels:
    app.kubernetes.io/name: orangectl
    app.kubernetes.io/managed-by: kustomize
spec:
  namespace: default

  router:
    name: router
    labels:
      app: orange-router
      tier: control
    image: np137270/gateway:latest
    port: 8000
    config:
      ROUTING_MODE: "hash"
      LOG_LEVEL: "info"

  shard:
    name: shard
    labels:
      app: orange-shard
      tier: data
    image: np137270/orange:latest
    count: 2                # Number of shards
    replicas: 2             # Replicas per shard
    port: 52001
    config:
      STORAGE_PATH: "/app/data"
      CACHE_ENABLED: "true"
kubectl apply -f orange.yaml
kubectl port-forward svc/router 8000:8000

Step 3: Try Some Queries

echo "Creating schema..."
curl -X POST http://localhost:8000/  \
     -H "Content-Type: application/json" \
     -d '{"query": "create document test {\"name\":\"STRING\"}"}'

echo "Inserting documents from ID 1 to 20..."
for i in $(seq 1 20); do
  curl -s -X POST http://localhost:8000/ \
       -H "Content-Type: application/json" \
       -d "{\"query\": \"insert value into test  {\\\"_ID\\\": $i, \\\"name\\\": \\\"hello-$i\\\"}\"}"
done

echo "Searching for document with _ID = 13..."
curl -X POST http://localhost:8000/  \
     -H "Content-Type: application/json" \
     -d '{"query": "select * from test where _ID = 13"}'

What’s Next?

OrangeDB is still evolving. We’re working on:

Feedback, contributions, and critiques are welcome.