
Connectors

A connector tells Honeyframe how to reach an external system — a database, an object store, a vector index, an LLM API, or a webhook target. Connectors are configured once at the project level and reused across datasets, recipes, and agent tools.

The connector implementations live under paas/backend/connectors/. Each connector is registered in a central registry; the catalog endpoint (GET /api/connectors/catalog) returns every type the running platform supports along with its config schema.
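
For example, a quick Python sketch that pulls the catalog and prints the supported types (the field names on the catalog entries are assumptions based on the terms used elsewhere on this page; inspect the real response on your install):

import requests

BASE = "https://platform.your-domain.com"
TOKEN = "..."  # a platform JWT; see Authentication under the Developer property

resp = requests.get(
    f"{BASE}/api/connectors/catalog",
    headers={"Authorization": f"Bearer {TOKEN}"},
)
resp.raise_for_status()

# Each entry describes one connector type; "type" and "is_queryable" mirror
# the vocabulary on this page, but verify the exact keys against your install.
for entry in resp.json():
    print(entry.get("type"), "(queryable)" if entry.get("is_queryable") else "(non-queryable)")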

Queryable vs non-queryable

Connectors split into two groups based on the is_queryable flag.

Queryable — can be the source of a dataset or a SQL query:

  • PostgreSQL (postgresql) — also the platform's own metadata store.
  • MySQL (mysql) and MariaDB (mariadb).
  • Microsoft SQL Server (mssql).
  • Oracle (oracle) — schema/table names must be uppercase.
  • Snowflake (snowflake).
  • BigQuery (bigquery).
  • MongoDB (mongodb) — document store, queryable through the dataset surface.
  • Elasticsearch (elasticsearch) — full-text search and aggregations.
  • REST API (api_rest) — generic HTTP connector for sources without a first-class driver.
  • CSV upload (csv) — accepts user-uploaded CSV/Excel and persists rows as a managed dataset. Subject to the nginx client_max_body_size (default 200 MB).
  • Object storage — s3_storage, gcs_storage, oss_storage. Queryable through the Lakehouse layer (DuckDB over Delta/Parquet), not direct SQL.

Non-queryable — cannot back a dataset; used by other surfaces (agents, knowledge bases, automation):

  • LLM providers — openai_llm, anthropic_llm, ollama_llm. Used by Agent Builder, the SQL chat surface, and Knowledge Base retrieval.
  • Vector stores — chroma, faiss. Persist embeddings for the Knowledge Base.
  • Orchestration — n8n_webhook (fire events at an n8n workflow).
  • Messaging — twilio_messaging (used by the send_whatsapp agent tool).

Legacy type aliases (e.g. rds → postgresql) are recognised by the registry for backward compatibility with older installs. New connectors should use the canonical type names listed above.

Configuring a connector

In the platform UI:

  1. Open the Connectors page in the sidebar (org admins only).
  2. Click + New Connector and pick a type from the catalog.
  3. Fill in the connection parameters. Sensitive fields (passwords, API keys, service-account JSON) are encrypted at rest using the org's license signing key.
  4. Test verifies the connection without saving. The result lands in the test history pane.
  5. Save writes the connector to the data_connectors table. It becomes selectable when creating datasets, recipes, or agent tools.

Programmatically:

curl -X POST https://platform.your-domain.com/api/connectors \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Production Postgres",
    "type": "postgresql",
    "config": {
      "host": "db.example.com",
      "port": 5432,
      "database": "analytics",
      "username": "honeyframe_ro",
      "password": "<redacted>",
      "sslmode": "require"
    },
    "output_schemas": ["public", "marts"]
  }'

output_schemas is the list of schemas the platform may discover and read from. Leave it empty to default to the connector's "owned" schemas; set it explicitly to restrict what shows up in the dataset browser.

Auto-sync schedule

Queryable connectors can be created with an attached schedule via the bootstrap endpoint:

curl -X POST https://platform.your-domain.com/api/connectors/bootstrap \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Production Postgres",
    "type": "postgresql",
    "config": {...},
    "schedule_cron": "0 2 * * *",
    "schedule_mode": "incremental"
  }'

schedule_mode is incremental (only changed rows since the last successful run, requires a watermark column) or full (truncate + reload). The scheduler runs every minute; the cron expression decides which connectors fire. Skipped runs are logged but not retried — the next cron tick is the next chance.
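
To illustrate the per-tick firing decision, here is a minimal sketch using the croniter library (the platform's actual scheduler implementation is not shown in this doc; this only demonstrates the matching semantics described above):

from datetime import datetime
from croniter import croniter

def due_connectors(connectors, tick=None):
    """Return the connectors whose cron expression matches this minute tick.

    `connectors` is assumed to be an iterable of dicts carrying a
    `schedule_cron` key, mirroring the bootstrap payload above.
    """
    tick = (tick or datetime.now()).replace(second=0, microsecond=0)
    return [
        c for c in connectors
        if c.get("schedule_cron") and croniter.match(c["schedule_cron"], tick)
    ]

# "0 2 * * *" matches exactly one tick per day, at 02:00.
print(due_connectors(
    [{"name": "Production Postgres", "schedule_cron": "0 2 * * *"}],
    tick=datetime(2024, 1, 1, 2, 0),
))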

Permission model

Connectors are project-level resources with no per-connector ACL. Anyone with project access can see the list of connectors and reference them when building datasets. Data sharing flows through datasets — see Users & Groups — not through connectors.

The connector is the credential. The dataset is the unit of access control.

API reference

  • GET /api/connectors/catalog — Available connector types with config schemas. Use this to render a creation form.
  • GET /api/connectors — List active connectors in the project.
  • GET /api/connectors/{id} — One connector's full config (sensitive fields are returned masked).
  • POST /api/connectors — Create a connector.
  • POST /api/connectors/bootstrap — Create a connector and attach a schedule in one call.
  • PATCH /api/connectors/{id} — Update name, description, config, or schedule.
  • DELETE /api/connectors/{id} — Remove a connector. Datasets that reference it keep their cached schemas but can no longer sync.
  • POST /api/connectors/{id}/test — Verify connectivity without saving. Returns {ok, latency_ms, error?}.
  • GET /api/connectors/{id}/models — Discover the tables/collections the connector exposes; used by the dataset browser.

Authentication uses the same JWT format as the rest of the API — see Authentication under the Developer property.
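
As a usage sketch, here is a connectivity check against an existing connector from Python (the connector id is a placeholder; take a real one from GET /api/connectors):

import requests

BASE = "https://platform.your-domain.com"
TOKEN = "..."       # a platform JWT
CONNECTOR_ID = 42   # hypothetical id

resp = requests.post(
    f"{BASE}/api/connectors/{CONNECTOR_ID}/test",
    headers={"Authorization": f"Bearer {TOKEN}"},
)
result = resp.json()  # {ok, latency_ms, error?} per the table above

if result["ok"]:
    print(f"reachable in {result['latency_ms']} ms")
else:
    print(f"test failed: {result.get('error')}")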

Adding a new connector type

  1. Create paas/backend/connectors/<name>.py and subclass the appropriate base (SQLBase, StorageBase, LLMBase, VectorStoreBase, or BaseConnector for a one-off).
  2. Implement the required methods (test, read_schema, read_rows, etc. — pattern off the existing connectors; a minimal sketch follows this list).
  3. Register the class in paas/backend/connectors/registry.py. Set is_queryable=False if it should not appear in the dataset browser.
  4. Add a config schema entry; the catalog endpoint serves it directly to the frontend, so no separate frontend form code is needed.
  5. Add a unit test covering test() and one read or write path.
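
A minimal sketch of steps 1–3, using the method names listed above (the import path and the registration helper are assumptions; pattern the real thing off an existing connector in the same directory):

# paas/backend/connectors/mydb.py: hypothetical new SQL connector
from .base import SQLBase  # assumed location of the SQL base class

class MyDBConnector(SQLBase):
    """Connector for a hypothetical 'mydb' SQL engine."""

    def test(self):
        # Open a connection, run a trivial query, and report the outcome
        # per the base-class contract.
        ...

    def read_schema(self, schema_name):
        # Return table and column metadata for the dataset browser.
        ...

    def read_rows(self, table, limit=None):
        # Stream rows for dataset sync.
        ...

# In paas/backend/connectors/registry.py (step 3), add an entry such as:
#   register("mydb", MyDBConnector, is_queryable=True)
# The exact registration call is an assumption; is_queryable=False keeps
# the type out of the dataset browser.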

The connector framework needs no registration step beyond the code change itself: the registry is populated at module import time, so the next process start picks the new type up.

Connection pooling

SQL connectors maintain per-process connection pools. The defaults are conservative (5 idle, 20 max) and tuned for the platform's mostly-read workload. Tune on a per-connector basis via the connector config:

{
  "pool_size": 10,
  "max_overflow": 30,
  "pool_recycle": 1800
}
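
These key names match SQLAlchemy's pool arguments. Assuming the SQL connectors wrap a SQLAlchemy engine (plausible given the naming, but not stated in this doc), the config above corresponds to something like:

from sqlalchemy import create_engine

# Sketch only: assumes the connector passes its pool keys straight
# through to SQLAlchemy's engine factory.
engine = create_engine(
    "postgresql://honeyframe_ro:***@db.example.com:5432/analytics",
    pool_size=10,       # connections kept open in the pool
    max_overflow=30,    # extra connections allowed beyond pool_size under load
    pool_recycle=1800,  # recycle connections older than 1800 seconds
)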

Object-storage and HTTP connectors use the underlying SDK's pooling — typically a per-thread session.