Building a RAG Pipeline
Add your knowledge base to Liya Engine and tune retrieval for your use case.
Liya Engine includes a built-in RAG (Retrieval-Augmented Generation) pipeline. When enabled, the engine retrieves relevant content from your tenant knowledge base before generating a response, grounding answers in your proprietary data.
Enable RAG
RAG is controlled by the enable_rag feature flag. Update it via PATCH /dashboard/account/config:
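A minimal request body might look like the following (the flat top-level placement of the flag is an assumption; check the account config schema for the exact payload shape):

```json
{
  "enable_rag": true
}
```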
Once enabled, all intents in your enabled domains will query the knowledge base.
Knowledge sources
Knowledge is organised by domain and source type. Built-in source types for the hiring domain:
| Source type | Description |
|---|---|
| job_descriptions | Role specifications, requirements, responsibilities |
| company_profiles | Company overviews, values, culture notes |
| resume_templates | Ideal resume formats and standards |
| interview_questions | Question banks per role or level |
| career_frameworks | Competency levels, progression paths |
| industry_standards | Skills taxonomies, certifications, benchmarks |
Upload and manage knowledge sources in the Knowledge section of the dashboard, or programmatically via the Knowledge API.
RAG configuration
Configure retrieval behaviour via ragConfig in PATCH /dashboard/account/config:
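A sketch of a ragConfig payload using the parameters documented below (the `mode` key name is an assumption; the parameter names and defaults come from the tables that follow):

```json
{
  "ragConfig": {
    "mode": "hybrid",
    "top_k": 5,
    "score_threshold": 0.72,
    "rerank": false,
    "chunk_size": 512,
    "chunk_overlap": 64,
    "chunking_strategy": "fixed"
  }
}
```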
Retrieval modes
| Mode | Description |
|---|---|
| semantic | Dense vector search only; best for conceptual queries |
| keyword | BM25 full-text search only; best for exact term matching |
| hybrid | Combines semantic and BM25 with score fusion; recommended default |
Key parameters
| Parameter | Default | Description |
|---|---|---|
| top_k | 5 | Number of chunks retrieved before reranking |
| score_threshold | 0.72 | Minimum similarity score; chunks below this are discarded |
| rerank | false | Enable cross-encoder reranking of retrieved chunks |
| rerank_top_n | 3 | Number of chunks kept after reranking |
| chunk_size | 512 | Target tokens per chunk |
| chunk_overlap | 64 | Token overlap between adjacent chunks |
| chunking_strategy | "fixed" | "fixed" or "semantic"; semantic splits on sentence boundaries |
Tuning retrieval quality
Start with hybrid + reranking
For most use cases:
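A reasonable starting configuration might be (the `mode` key name is an assumption; the other parameter names come from the tables above):

```json
{
  "ragConfig": {
    "mode": "hybrid",
    "rerank": true,
    "top_k": 8,
    "rerank_top_n": 4
  }
}
```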
Retrieve more candidates (top_k: 8), then let the reranker select the most relevant four (rerank_top_n: 4). This trades a small amount of latency for significantly better precision.
Lower the threshold for recall-heavy use cases
If retrieval is missing relevant content, lower score_threshold:
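For example (the value 0.6 is illustrative, not a recommended setting for every workload):

```json
{
  "ragConfig": {
    "score_threshold": 0.6
  }
}
```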
Use semantic chunking for long documents
For documents like job descriptions or policy manuals, semantic chunking respects natural boundaries:
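A sketch of the relevant settings (parameter names and defaults are taken from the table above):

```json
{
  "ragConfig": {
    "chunking_strategy": "semantic",
    "chunk_size": 512,
    "chunk_overlap": 64
  }
}
```

Semantic chunking keeps sentences intact, so chunk sizes are targets rather than exact token counts.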
RAG per domain
RAG is scoped to the domain of each request. A hiring intent only retrieves from hiring knowledge sources — it never queries fintech or other domain sources. This prevents cross-domain contamination.
For custom domains, RAG requires enable_rag: true in the domain's feature config and at least one knowledge source uploaded. See Custom Domain Knowledge Sources.
Disabling RAG per request
Pass enable_rag: false in the request options to bypass retrieval for a single call:
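A minimal sketch, assuming the request body carries an `options` object alongside your usual payload (the exact request shape is an assumption; check the request schema):

```json
{
  "options": {
    "enable_rag": false
  }
}
```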
This is useful when the query doesn't benefit from your knowledge base (e.g. general chat).