
Building a RAG Pipeline

Add your knowledge base to Liya Engine and tune retrieval for your use case.

Liya Engine includes a built-in RAG (Retrieval-Augmented Generation) pipeline. When enabled, the engine retrieves relevant content from your tenant knowledge base before generating a response, grounding answers in your proprietary data.


Enable RAG

RAG is controlled by the enable_rag feature flag. Update it via PATCH /dashboard/account/config:

PATCH https://api.liyaengine.ai/dashboard/account/config
Authorization: Bearer <jwt>
Content-Type: application/json
 
{
  "featureConfig": {
    "enable_rag": true
  }
}

Once enabled, all intents in your enabled domains will query the knowledge base.
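As a sketch, the same flag flip from Python. The helper name is illustrative (not part of any Liya SDK); the endpoint, header, and body come from the request above:

```python
API_BASE = "https://api.liyaengine.ai"

def build_enable_rag_request(jwt: str):
    """Return (url, headers, body) for the PATCH that enables RAG."""
    url = f"{API_BASE}/dashboard/account/config"
    headers = {
        "Authorization": f"Bearer {jwt}",
        "Content-Type": "application/json",
    }
    body = {"featureConfig": {"enable_rag": True}}
    return url, headers, body

# To send it with any HTTP client, e.g. requests:
#   url, headers, body = build_enable_rag_request(my_jwt)
#   requests.patch(url, headers=headers, json=body).raise_for_status()
```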


Knowledge sources

Knowledge is organised by domain and source type. Built-in source types for the hiring domain:

| Source type | Description |
| --- | --- |
| job_descriptions | Role specifications, requirements, responsibilities |
| company_profiles | Company overviews, values, culture notes |
| resume_templates | Ideal resume formats and standards |
| interview_questions | Question banks per role or level |
| career_frameworks | Competency levels, progression paths |
| industry_standards | Skills taxonomies, certifications, benchmarks |

Upload and manage knowledge sources in the Knowledge section of the dashboard, or via the Knowledge API (see Knowledge API).
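If you manage uploads programmatically, it can help to validate a source type client-side before calling the Knowledge API. A minimal sketch (the helper is hypothetical; the set mirrors the built-in hiring types above):

```python
# Built-in knowledge source types for the hiring domain.
HIRING_SOURCE_TYPES = {
    "job_descriptions",
    "company_profiles",
    "resume_templates",
    "interview_questions",
    "career_frameworks",
    "industry_standards",
}

def is_valid_hiring_source(source_type: str) -> bool:
    """Check a source type against the built-in hiring list before uploading."""
    return source_type in HIRING_SOURCE_TYPES
```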


RAG configuration

Configure retrieval behaviour via ragConfig in PATCH /dashboard/account/config:

{
  "ragConfig": {
    "retrieval_mode": "hybrid",
    "top_k": 5,
    "score_threshold": 0.72,
    "rerank": true,
    "rerank_top_n": 3,
    "chunk_size": 512,
    "chunk_overlap": 64,
    "chunking_strategy": "semantic"
  }
}

Retrieval modes

| Mode | Description |
| --- | --- |
| semantic | Dense vector search only — best for conceptual queries |
| keyword | BM25 full-text search only — best for exact term matching |
| hybrid | Combines semantic + BM25 with score fusion — recommended default |
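The fusion method hybrid mode uses internally isn't specified here; reciprocal rank fusion (RRF) is one common way to combine a dense ranking with a BM25 ranking, shown purely for intuition:

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked result lists (e.g. semantic + BM25) with RRF.

    Each document's fused score is the sum of 1 / (k + rank) over
    every list it appears in; higher is better. k dampens the
    influence of top ranks.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# A chunk ranked well by both retrievers rises above a chunk ranked
# first by only one of them:
fused = reciprocal_rank_fusion([["a", "b", "c"], ["b", "c", "d"]])
```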

Key parameters

| Parameter | Default | Description |
| --- | --- | --- |
| top_k | 5 | Number of chunks retrieved before reranking |
| score_threshold | 0.72 | Minimum similarity score — chunks below this are discarded |
| rerank | false | Enable cross-encoder reranking of retrieved chunks |
| rerank_top_n | 3 | Number of chunks kept after reranking |
| chunk_size | 512 | Target tokens per chunk |
| chunk_overlap | 64 | Token overlap between adjacent chunks |
| chunking_strategy | "fixed" | "fixed" or "semantic" — semantic splits on sentence boundaries |
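Before PATCHing a ragConfig, a client-side sanity check can catch inconsistent values. The constraints below are inferred from the parameter descriptions (defaults match the table) and are assumptions, not documented limits:

```python
def check_rag_config(cfg: dict) -> list[str]:
    """Return a list of problems with a ragConfig dict.

    Assumed invariants: reranking selects from the retrieved set,
    so rerank_top_n <= top_k; overlap must leave room for new
    tokens, so chunk_overlap < chunk_size.
    """
    problems = []
    if cfg.get("retrieval_mode", "hybrid") not in ("semantic", "keyword", "hybrid"):
        problems.append("retrieval_mode must be semantic, keyword, or hybrid")
    if cfg.get("rerank", False) and cfg.get("rerank_top_n", 3) > cfg.get("top_k", 5):
        problems.append("rerank_top_n cannot exceed top_k")
    if cfg.get("chunk_overlap", 64) >= cfg.get("chunk_size", 512):
        problems.append("chunk_overlap must be smaller than chunk_size")
    if not 0.0 <= cfg.get("score_threshold", 0.72) <= 1.0:
        problems.append("score_threshold must be between 0 and 1")
    return problems
```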

Tuning retrieval quality

Start with hybrid + reranking

For most use cases:

{
  "retrieval_mode": "hybrid",
  "top_k": 8,
  "rerank": true,
  "rerank_top_n": 4,
  "score_threshold": 0.65
}

Retrieve more candidates (top_k: 8), then let the reranker keep the 4 most relevant. This trades a small amount of latency for significantly better precision.

Lower the threshold for recall-heavy use cases

If retrieval is missing relevant content, lower score_threshold:

{ "score_threshold": 0.55 }

Use semantic chunking for long documents

For documents like job descriptions or policy manuals, semantic chunking respects natural boundaries:

{
  "chunking_strategy": "semantic",
  "chunk_size": 768,
  "chunk_overlap": 128
}
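To build intuition for how chunk_size and chunk_overlap interact, here is a simplified fixed-size chunker. It is not the engine's actual implementation, and the semantic strategy (which splits on sentence boundaries) behaves differently:

```python
def fixed_chunks(tokens: list[str], chunk_size: int, chunk_overlap: int) -> list[list[str]]:
    """Split a token sequence into fixed-size chunks with overlap.

    Each chunk starts chunk_size - chunk_overlap tokens after the
    previous one, so the tail of one chunk repeats at the head of
    the next, preserving context across boundaries.
    """
    step = chunk_size - chunk_overlap
    return [tokens[i:i + chunk_size] for i in range(0, len(tokens), step)]
```

With chunk_size 4 and overlap 1, a 10-token document yields chunks whose boundaries share one token, which is why larger overlaps help when answers span a split point.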

RAG per domain

RAG is scoped to the domain of each request. A hiring intent only retrieves from hiring knowledge sources — it never queries fintech or other domain sources. This prevents cross-domain contamination.

For custom domains, RAG requires enable_rag: true in the domain's feature config and at least one knowledge source uploaded. See Custom Domain Knowledge Sources.


Disabling RAG per request

Pass enable_rag: false in the request options to bypass retrieval for a single call:

{
  "input": { ... },
  "options": {
    "enable_rag": false
  }
}

Useful when the query doesn't benefit from your knowledge base (e.g. general chat).
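If you build request bodies in code, a small helper (illustrative, not part of any SDK) can apply the per-call override without mutating the base request:

```python
def with_rag_disabled(request_body: dict) -> dict:
    """Return a copy of a request body with RAG bypassed for this call."""
    body = dict(request_body)
    options = dict(body.get("options", {}))
    options["enable_rag"] = False
    body["options"] = options
    return body
```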
