
Rosetta Server (RAGFlow)

Who is this for? Engineers and maintainers working with the RAGFlow-based knowledge retrieval layer.

When should I read this? When you need to understand, configure, or debug the Rosetta Server API. For deployment, see Deployment.


RAGFlow Documentation (Tested)

Maintenance Rule

You MUST update this document whenever a new feature or capability is discovered and tested. Give exact specs, but keep each entry brief. Document both what works and what does not, also briefly.

Source of Truth for This Section

Derived from code in:

Metadata Condition (Public API Shape)

For public API payloads/params, metadata_condition uses:

{
  "logic": "and",
  "conditions": [
    {
      "name": "tags",
      "comparison_operator": "contains",
      "value": "bootstrap"
    }
  ]
}
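A minimal sketch of building this payload in Python instead of writing the JSON by hand. The helper names `make_condition` and `make_metadata_condition` are hypothetical and are not part of RAGFlow or ragflow-sdk; they only reproduce the shape shown above.

```python
from typing import Any


def make_condition(name: str, comparison_operator: str, value: Any) -> dict:
    """Build one entry for metadata_condition["conditions"] (hypothetical helper)."""
    return {
        "name": name,
        "comparison_operator": comparison_operator,
        "value": value,
    }


def make_metadata_condition(conditions: list[dict], logic: str = "and") -> dict:
    """Wrap condition entries in the public metadata_condition envelope."""
    return {"logic": logic, "conditions": list(conditions)}


# Reproduces the example payload from this section.
metadata_condition = make_metadata_condition(
    [make_condition("tags", "contains", "bootstrap")]
)
```

Building the dict in one place keeps the public field names (name, comparison_operator, value) consistent across list and retrieval calls.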

Notes:

Supported operators (from meta_filter):

Query and Filter Capabilities (Code-Derived)

GET /datasets/{dataset_id}/documents (sdk/doc.py) supports:

POST /retrieval (sdk/doc.py) supports:

POST /dify/retrieval (sdk/dify_retrieval.py) supports:

POST /document/list (document_app.py) supports:

Known Issue (Observed): Filter by Non-Existing Document Name Returns a False “You Don’t Own” Error

Known Issue (Observed): Metadata Update Fails with a “You Don’t Own” Error

How to Call It (REST)

Use named parameters exactly as shown below.

Canonical list endpoint contract:

Canonical retrieval endpoint contract:

List documents with metadata filter:

curl -sS -X GET "$RAGFLOW_BASE_URL/api/v1/datasets/$DATASET_ID/documents" \
  -H "Authorization: Bearer $RAGFLOW_API_KEY" \
  --get \
  --data-urlencode "page=1" \
  --data-urlencode "page_size=50" \
  --data-urlencode "run=FAIL" \
  --data-urlencode "run=UNSTART" \
  --data-urlencode "suffix=md" \
  --data-urlencode "metadata_condition={\"logic\":\"and\",\"conditions\":[{\"name\":\"tags\",\"comparison_operator\":\"contains\",\"value\":\"bootstrap\"}]}"
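The same query string can be assembled in Python. This is a sketch using only the standard library: the function name `build_list_documents_url` and the base URL `https://ragflow.example` are placeholders, not part of RAGFlow. It shows the two details the curl call relies on: repeated `run` parameters and a JSON-encoded `metadata_condition` value.

```python
import json
from urllib.parse import parse_qs, urlencode, urlsplit


def build_list_documents_url(base_url, dataset_id, metadata_condition,
                             run=(), suffix=(), page=1, page_size=50):
    """Return the GET URL for the list-documents call (hypothetical helper)."""
    query = [("page", page), ("page_size", page_size)]
    # Multi-value filters are sent as repeated keys, as in the curl example.
    query += [("run", value) for value in run]
    query += [("suffix", value) for value in suffix]
    # metadata_condition travels as a single JSON string in the query.
    query.append(("metadata_condition", json.dumps(metadata_condition)))
    return f"{base_url}/api/v1/datasets/{dataset_id}/documents?{urlencode(query)}"


condition = {
    "logic": "and",
    "conditions": [
        {"name": "tags", "comparison_operator": "contains", "value": "bootstrap"}
    ],
}
url = build_list_documents_url(
    "https://ragflow.example",  # placeholder for $RAGFLOW_BASE_URL
    "DATASET_ID",               # placeholder for $DATASET_ID
    condition,
    run=["FAIL", "UNSTART"],
    suffix=["md"],
)

# Round-trip check: decode the query string back into its parameters.
decoded = parse_qs(urlsplit(url).query)
```

Send the resulting URL with the same `Authorization: Bearer` header used in the curl call.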

Retrieval with metadata filter:

curl -sS -X POST "$RAGFLOW_BASE_URL/api/v1/retrieval" \
  -H "Authorization: Bearer $RAGFLOW_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "dataset_ids": ["'"$DATASET_ID"'"],
    "question": "bootstrap rules",
    "top_k": 20,
    "similarity_threshold": 0.2,
    "vector_similarity_weight": 0.3,
    "metadata_condition": {
      "logic": "and",
      "conditions": [
        {"name": "tags", "comparison_operator": "contains", "value": "bootstrap"}
      ]
    }
  }'
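For callers that do not use curl or the SDK, the request above can be prepared with the Python standard library. This is a sketch only: `build_retrieval_request` is a hypothetical helper, and the base URL and API key are placeholders. The request object is built but not sent, so it can be inspected first.

```python
import json
import urllib.request


def build_retrieval_request(base_url, api_key, dataset_id, question, metadata_condition):
    """Prepare (but do not send) the POST /api/v1/retrieval request."""
    payload = {
        "dataset_ids": [dataset_id],
        "question": question,
        "top_k": 20,
        "similarity_threshold": 0.2,
        "vector_similarity_weight": 0.3,
        "metadata_condition": metadata_condition,
    }
    return urllib.request.Request(
        f"{base_url}/api/v1/retrieval",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_retrieval_request(
    "https://ragflow.example",  # placeholder for $RAGFLOW_BASE_URL
    "API_KEY",                  # placeholder for $RAGFLOW_API_KEY
    "DATASET_ID",               # placeholder for $DATASET_ID
    "bootstrap rules",
    {
        "logic": "and",
        "conditions": [
            {"name": "tags", "comparison_operator": "contains", "value": "bootstrap"}
        ],
    },
)
# To send it: urllib.request.urlopen(request)
```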

Compatibility Note: key/op/value on Public APIs

Public API metadata_condition.conditions[*] expects name/comparison_operator/value. Directly sending key/op/value in metadata_condition is not accepted by those endpoints.

Example (public payload that fails):

{
  "logic": "and",
  "conditions": [
    {
      "key": "tags",
      "op": "contains",
      "value": "bootstrap"
    }
  ]
}
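If a caller already holds conditions in the key/op/value shape, one option is to rename the fields before calling a public endpoint. The sketch below is illustrative: `to_public_condition` and `to_public_metadata_condition` are hypothetical helpers, and it assumes that operator names are identical in both shapes, which this document does not confirm.

```python
def to_public_condition(condition: dict) -> dict:
    """Rename key/op/value to name/comparison_operator/value (hypothetical helper)."""
    if "name" in condition and "comparison_operator" in condition:
        # Already in the public shape: return an unchanged copy.
        return dict(condition)
    return {
        "name": condition["key"],
        # Assumption: the operator string needs no translation.
        "comparison_operator": condition["op"],
        "value": condition.get("value"),
    }


def to_public_metadata_condition(metadata_condition: dict) -> dict:
    """Convert a whole metadata_condition payload to the public shape."""
    return {
        "logic": metadata_condition.get("logic", "and"),
        "conditions": [
            to_public_condition(c) for c in metadata_condition.get("conditions", [])
        ],
    }


failing_payload = {
    "logic": "and",
    "conditions": [{"key": "tags", "op": "contains", "value": "bootstrap"}],
}
public_payload = to_public_metadata_condition(failing_payload)
```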

Observed behavior:

Verified Behaviors

Works:

Does not work:

Python SDK Usage

Use cases in ragflow-sdk (from sdk/python/ragflow_sdk):

1) Standard list (high-level SDK, exposed):

Exact signature:

docs = dataset.list_documents(
    page=1,
    page_size=30,
    orderby="create_time",
    desc=True,
    keywords="bootstrap",
    create_time_from=0,
    create_time_to=0,
)

2) Retrieval (high-level SDK, metadata_condition exposed):

Exact signature:

chunks = rag.retrieve(
    dataset_ids=[dataset.id],
    question="bootstrap rules",
    top_k=20,
    similarity_threshold=0.2,
    vector_similarity_weight=0.3,
    metadata_condition={
        "logic": "and",
        "conditions": [
            {"name": "tags", "comparison_operator": "contains", "value": "bootstrap"}
        ],
    },
)

3) Advanced list filters not exposed in DataSet.list_documents():

import json

params = {
    "page": 1,
    "page_size": 50,
    "run": ["FAIL"],
    "suffix": ["md"],
    "metadata_condition": json.dumps({
        "logic": "and",
        "conditions": [
            {"name": "tags", "comparison_operator": "contains", "value": "bootstrap"}
        ],
    }),
}
res = dataset.get(f"/datasets/{dataset.id}/documents", params=params).json()
docs = res["data"]["docs"]
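The call above returns a single page. A minimal sketch of walking all pages follows; `iter_documents` and `fake_fetch` are hypothetical names, and the sketch assumes pages are 1-indexed and that a page shorter than `page_size` marks the end of the listing. Neither assumption is confirmed by this document.

```python
from typing import Callable, Iterator


def iter_documents(fetch_page: Callable[[int, int], list],
                   page_size: int = 50) -> Iterator[dict]:
    """Yield documents page by page until a short page is returned."""
    page = 1
    while True:
        docs = fetch_page(page, page_size)
        yield from docs
        if len(docs) < page_size:
            break  # assumption: a short (or empty) page means no more results
        page += 1


# Stand-in for the real call so the sketch runs without a server. With the SDK,
# fetch_page would wrap the dataset.get(...) request and return res["data"]["docs"].
fake_docs = [{"name": f"doc-{i}.md"} for i in range(5)]


def fake_fetch(page: int, page_size: int) -> list:
    start = (page - 1) * page_size
    return fake_docs[start:start + page_size]


names = [doc["name"] for doc in iter_documents(fake_fetch, page_size=2)]
```

Passing the fetch function in as a parameter keeps the paging loop independent of the filter parameters used for each request.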

4) Dify retrieval endpoint is not wrapped by a dedicated high-level SDK method in this codebase:

Is Everything Exposed?

Short answer: no.

Ready-to-Use metadata_condition Template

{
  "logic": "and",
  "conditions": [
    {
      "name": "<metadata_field>",
      "comparison_operator": "<operator>",
      "value": "<value>"
    }
  ]
}

Rules:

RAGFlow Filter References

List operation filters: see refsrc/ragflow-*/agent/component/list_operations.py

Metadata filters: see refsrc/ragflow-*/common/metadata_utils.py

APIs (note that the method documentation does not reflect the actual implementation): refsrc/ragflow-*/api/apps/sdk/doc.py, refsrc/ragflow-*/api/apps/sdk/dify_retrieval.py, refsrc/ragflow-*/api/apps/document_app.py, and others.