Methodology

Built from a chat, line by line.

RAGTAG (Retrieval Augmented Graph Tax Answer Generator) takes a tax question, retrieves the relevant Finnish statutes, court rulings and Vero guidance, and answers with citations. Every decision below is tied either to a paper or to a numbered point from Taxxa’s team chat on 23.05. We will present the demo live.

Chat · 23.05

  • Pricing #01

    €60 per user per month. Queries have to be cheap.

  • Corpus #02

    Connect Finlex, Vero, case law. EU-lex out of scope.

  • Authority #03

    Case law refers to Finlex. Vero is just an interpreter. Case law can override Vero.

  • Retrieval #04

    Can't bring 1,000 chunks per question. 25M-page DB.

  • Timeline #05

    Active now, not then, not later.

  • Stack #06

    Run RAG locally first. DeepSeek is good and cheap.

  • Extraction #07

    Reference-extraction by regex / NLP is a good idea. We aren't doing it.

Part I

What we tried, and which chat point killed it.

01

Bitemporal graph on Neo4j: every edge with its own validity window.

Schema from SAT-Graph RAG (JURIX 2025). The whole ontology in a graph database.

  • SAT-Graph RAG · arXiv:2505.00039
  • LRMoo · IFLA
Killed by #01

Two days into the ontology, still zero answers. We kept the bitemporal idea and rebuilt it in SQLite as version_chain (see RAGTAG #08).

02

BGE-M3 hybrid retrieval: dense + sparse + ColBERT, fused at k=60.

BAAI's multilingual model with three signal heads. RRF for fusion, HyDE for English-to-Finnish query expansion.

  • BGE-M3 · BAAI
  • ColBERT · SIGIR 2020
  • RRF · SIGIR 2009
  • HyDE · Gao et al. 2022
Killed by #01

Voyage voyage-3-large already ranked the right chunk in the top 30 on every eval question. Adding a second stack failed the cost test.
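For reference, the fusion step this card describes can be sketched in a few lines. This is a minimal illustration of reciprocal rank fusion (RRF) with the k=60 constant named above, not code from the repository; the function name `rrf_fuse` and the toy document ids are assumptions.

```python
from collections import defaultdict

def rrf_fuse(rankings, k=60):
    """Fuse several ranked id lists with reciprocal rank fusion.

    Each list contributes 1 / (k + rank) per document, rank starting at 1.
    Hypothetical helper for illustration only.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))

# Toy rankings standing in for the dense, sparse and ColBERT heads.
dense = ["a", "b", "c"]
sparse = ["b", "a", "d"]
colbert = ["b", "c", "a"]
fused = rrf_fuse([dense, sparse, colbert])
```

Three rankers mean three index passes per question, which is the cost the paragraph above refers to.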

03

Courtroom-style debate: prosecutor, defense, judge.

Three LLMs arguing per conflict, pattern from AgenticSimLaw.

  • AgenticSimLaw · arXiv:2601.21936
  • Multi-Agent Debate · NeurIPS 2023
Killed by #03, #01

Chat #03 handed us the rule directly. One integer comparison resolves every conflict in our eval set; seven LLM turns cost real money.

04

63,660-node constellation: WebGL force layout of the whole corpus.

Attribution view inspired by Anthropic Circuit Tracer.

  • @cosmos.gl/graph
  • Anthropic Circuit Tracing · 2025
Killed by #04

Judges need one answer's reasoning, not the corpus shape. The reasoning panel animates the 5 to 10 nodes that actually mattered.

05

Live SPARQL fallback: CRAG escalation into Semantic Finlex.

Self-RAG reflection tokens to enforce citation coverage at draft time.

  • CRAG · arXiv:2401.15884
  • Self-RAG · ICLR 2024
  • data.finlex.fi/sparql
Killed by #01

Cold SPARQL hit 9 seconds on the public endpoint. We can surface 'unsure' offline via AmendmentCaveat instead (RAGTAG #08).

06

EUR-Lex contradictions: primacy of EU law over national acts.

Ingest EUR-Lex directives, model a transposes edge, surface EU vs national conflicts.

  • EUR-Lex · hierarchy of norms
  • Lex superior · UN OLA
Killed by #02

Chat #02 declared this out of scope. Adding it later is not an architectural rewrite: the `transposes` edge type is already in the EdgeType enum, the authority-rank lattice extends to an EU tier with one number, and the strategy router treats it as another `cross_source` route. New corpus, not new infrastructure.

Part II

RAGTAG. Ten pieces. Each cites file paths or papers.

01

RAGTAG

Three deterministic extraction passes: structural, anchor, regex.

Edges are emitted by three rule-based passes over HTML and the document tree: structural (parent_of from the heading hierarchy), anchor (cross-references inside an <a href> attribute), and regex (text citations like '§ 102 AVL', 'KHO 2025:46'). No model in the batch graph build.

  • src/extraction/structural_edges.py
  • src/extraction/anchor_edges.py
  • src/extraction/citations_regex.py
  • src/extraction/definition_edges.py
Answers #07
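A minimal sketch of what the regex pass could look like for the two citation shapes quoted above ('§ 102 AVL' and 'KHO 2025:46'). The patterns, the `extract_citations` name and the string format of the edge target are assumptions for illustration; the real rules live in src/extraction/citations_regex.py and are likely broader.

```python
import re

# Statute citation such as "§ 102 AVL": section number, then act abbreviation.
SECTION_RE = re.compile(r"§\s*(?P<section>\d+)\s+(?P<act>[A-ZÅÄÖ]{2,})\b")
# Supreme Administrative Court yearbook citation such as "KHO 2025:46".
CASE_RE = re.compile(r"\bKHO\s+(?P<year>\d{4}):(?P<number>\d+)\b")

def extract_citations(section_id, text):
    """Return 'cites' edges found in text, in order of appearance."""
    found = []
    for m in SECTION_RE.finditer(text):
        target = f"{m['act']} § {m['section']}"  # hypothetical target key format
        found.append((m.start(), {"src": section_id, "edge_type": "cites", "dst": target}))
    for m in CASE_RE.finditer(text):
        target = f"KHO {m['year']}:{m['number']}"
        found.append((m.start(), {"src": section_id, "edge_type": "cites", "dst": target}))
    return [edge for _, edge in sorted(found, key=lambda pair: pair[0])]

edges = extract_citations("vero:ohje-1", "Ks. § 102 AVL sekä KHO 2025:46.")
```

Because the pass is pure pattern matching, running it twice over the same text yields the same edges, which keeps the batch graph build reproducible.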
02

RAGTAG

Two SQLite tables plus a local LanceDB: nodes, edges, chunks. Joined by section_id.

1,967,776 nodes and 2,180,769 typed edges live in two SQLite tables (nodes, edges). 402,088 embedded chunks live in LanceDB on the filesystem (no remote service). Everything joins on section_id with an O(1) lookup.

  • scripts/load_graph.py (nodes, edges CREATE TABLE)
  • findings/04a_index_sanity.md (402,088 chunks)
  • findings/04b_load_report.md (1.97M nodes, 2.18M edges)
  • src/indexing/vector_store.py (LanceDB)
Answers #01
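The two-table layout can be pictured with an in-memory SQLite database. This is a sketch under assumptions: the column names, the index and the sample ids are hypothetical (the actual CREATE TABLE statements are in scripts/load_graph.py), and the LanceDB side is left out.

```python
import sqlite3

def build_demo_graph():
    """Create a tiny nodes/edges pair keyed by section_id (assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE nodes (
            section_id     TEXT PRIMARY KEY,   -- primary-key lookup per node
            source         TEXT NOT NULL,
            authority_rank INTEGER NOT NULL
        );
        CREATE TABLE edges (
            src       TEXT NOT NULL,
            dst       TEXT NOT NULL,
            edge_type TEXT NOT NULL
        );
        CREATE INDEX idx_edges_src ON edges (src, edge_type);
        """
    )
    conn.executemany(
        "INSERT INTO nodes VALUES (?, ?, ?)",
        [("finlex:avl-102", "finlex", 100), ("vero:ohje-1", "vero", 60)],
    )
    conn.execute(
        "INSERT INTO edges VALUES (?, ?, ?)",
        ("vero:ohje-1", "finlex:avl-102", "interprets"),
    )
    return conn

def neighbours(conn, section_id, edge_type):
    """Follow one edge type out of a node and join back to the node table."""
    return conn.execute(
        """
        SELECT n.section_id, n.authority_rank
        FROM edges AS e
        JOIN nodes AS n ON n.section_id = e.dst
        WHERE e.src = ? AND e.edge_type = ?
        ORDER BY n.section_id
        """,
        (section_id, edge_type),
    ).fetchall()

conn = build_demo_graph()
hits = neighbours(conn, "vero:ohje-1", "interprets")
```

A chunk retrieved from the vector store would carry the same section_id, so the hop from a vector hit to its graph neighbours is a single keyed query.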
03

RAGTAG

Section-anchored chunking: 800 to 1,500 tokens, 2,000 hard max, never split mid-citation.

The chunk unit is the SECTION (§). Children are greedily packed under their § head and never split across sentence, item, or citation boundaries. The result: every chunk carries its own legal anchor.

  • pipeline/chunks.py
Answers #04
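The greedy packing described above can be sketched as follows. Assumptions, stated loudly: token counts are approximated by whitespace splitting, the units passed in are already indivisible (sentences, items or citations), and the 800-token lower target is not enforced. `pack_section` is a hypothetical name, not the function in pipeline/chunks.py.

```python
def count_tokens(text):
    """Crude stand-in for a real tokenizer: whitespace-separated words."""
    return len(text.split())

def pack_section(head, units, target_max=1500, hard_max=2000):
    """Greedily pack indivisible units under their section head.

    Every emitted chunk starts with the head, so each chunk carries its anchor.
    Units are never split; a unit too large for one chunk is rejected.
    """
    head_size = count_tokens(head)
    chunks, current, size = [], [], head_size
    for unit in units:
        n = count_tokens(unit)
        if head_size + n > hard_max:
            raise ValueError("unit exceeds the hard max; split it upstream")
        if current and size + n > target_max:
            chunks.append("\n".join([head, *current]))  # flush the full chunk
            current, size = [], head_size
        current.append(unit)
        size += n
    if current:
        chunks.append("\n".join([head, *current]))
    return chunks

# Tiny limits so the behaviour is visible on a toy section.
demo = pack_section("102 §", ["a b c", "d e f", "g h i"], target_max=8, hard_max=10)
```

With the toy limits, the first two units share a chunk and the third starts a new one, each repeating the '102 §' head.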
04

RAGTAG

Multilingual embeddings via Voyage voyage-3-large, 1,024-dim, asymmetric query / document.

Hosted but cheap. Asymmetric (input_type='query' vs 'document') to avoid the quality cliff Voyage warns about. Same embedding space carries Finnish, Swedish, and English.

  • src/indexing/voyage_client.py (MODEL = voyage-3-large)
  • voyageai.com
Answers #02
05

RAGTAG

Strategy router, six presets: case_law, recency, definition, cross_source, multi_hop, default.

A keyword and regex classifier on the question text picks one ExpansionStrategy. Each preset sets seed depth, edge types, BFS direction, max hops, and per-edge degree caps. Default falls back to vector-only retrieval.

  • src/retrieval/strategy.py
Answers #04
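A rough sketch of a keyword and regex router over the six preset names listed above. The trigger patterns, the edge-type names `defines` and `amends`, and the reduced field set on `ExpansionStrategy` are all assumptions; the real presets in src/retrieval/strategy.py also carry seed depth, BFS direction and degree caps.

```python
import re
from dataclasses import dataclass

@dataclass(frozen=True)
class ExpansionStrategy:
    name: str
    edge_types: tuple   # edge types the expansion may follow (assumed names)
    max_hops: int

PRESETS = {
    "case_law": ExpansionStrategy("case_law", ("cites", "interprets"), 1),
    "recency": ExpansionStrategy("recency", ("amends",), 1),
    "definition": ExpansionStrategy("definition", ("defines",), 1),
    "cross_source": ExpansionStrategy("cross_source", ("interprets", "cites"), 1),
    "multi_hop": ExpansionStrategy("multi_hop", ("cites",), 2),
    "default": ExpansionStrategy("default", (), 0),  # vector-only, no expansion
}

# First matching rule wins; the patterns are illustrative guesses.
RULES = [
    (re.compile(r"\b(KHO|KVL)\b|case law|oikeuskäytäntö", re.IGNORECASE), "case_law"),
    (re.compile(r"\b(currently|latest|in force)\b|voimassa", re.IGNORECASE), "recency"),
    (re.compile(r"\b(definition|defined|mean|means)\b|tarkoittaa", re.IGNORECASE), "definition"),
    (re.compile(r"\b(contradict\w*|conflict\w*)\b|ristiriita", re.IGNORECASE), "cross_source"),
    (re.compile(r"\b(refers to|in turn)\b|viittaa", re.IGNORECASE), "multi_hop"),
]

def route(question):
    """Pick exactly one preset from the question text."""
    for pattern, name in RULES:
        if pattern.search(question):
            return PRESETS[name]
    return PRESETS["default"]

picked = route("Mitä KHO 2025:46 linjaa lounasseteleistä?")
```

Routing costs a handful of regex searches per question and no model call, which fits the cheap-query constraint from chat #01.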
06

RAGTAG

Bounded BFS with hub-skip: interprets_in > 30, cites_out > 15, parent_of_in > 50.

Default max_hops = 1. Hub nodes (widely cited statutes) are not expanded through. Final candidate set is truncated to fit a 25k-token context.

  • src/retrieval/graph_expand.py
Answers #04
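The expansion can be sketched as a breadth-first search that refuses to expand through a node whose degree on a given edge type and direction exceeds the cap. The three caps are the ones quoted above; the edge-list representation and the `bounded_bfs` name are assumptions, and the final truncation to the 25k-token context is omitted.

```python
from collections import deque

# (edge_type, direction) -> maximum degree still expanded, from the card above.
HUB_CAPS = {("interprets", "in"): 30, ("cites", "out"): 15, ("parent_of", "in"): 50}

def bounded_bfs(edges, seeds, max_hops=1, caps=HUB_CAPS):
    """Expand seeds over (src, edge_type, dst) triples, skipping hubs.

    Only the edge types and directions listed in caps are followed here.
    """
    out_adj, in_adj = {}, {}
    for src, etype, dst in edges:
        out_adj.setdefault((src, etype), []).append(dst)
        in_adj.setdefault((dst, etype), []).append(src)

    order = list(dict.fromkeys(seeds))          # de-duplicated, order kept
    seen = set(order)
    frontier = deque((seed, 0) for seed in order)
    while frontier:
        node, depth = frontier.popleft()
        if depth >= max_hops:
            continue
        for (etype, direction), cap in caps.items():
            adj = out_adj if direction == "out" else in_adj
            neighbours = adj.get((node, etype), [])
            if len(neighbours) > cap:
                continue                         # hub on this edge type: skip
            for nxt in neighbours:
                if nxt not in seen:
                    seen.add(nxt)
                    order.append(nxt)
                    frontier.append((nxt, depth + 1))
    return order

demo_edges = [("s1", "cites", "s2"), ("s2", "cites", "s3")]
demo_edges += [("h", "cites", f"t{i}") for i in range(16)]  # 16 out-edges: a hub
one_hop = bounded_bfs(demo_edges, ["s1"])
```

In the toy graph, node `h` has 16 outgoing `cites` edges, one over the cap, so a seed landing on it pulls in nothing extra.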
07

RAGTAG

Two reranking paths, one cross-encoder: v2 uses bge-reranker-v2-m3, v1 uses metadata signals.

v2 runs BAAI/bge-reranker-v2-m3 (a multilingual cross-encoder) over 30 to 40 candidates and combines 0.6 cross-encoder + 0.3 cosine + 0.1 metadata. v1 (the default path in the API sidecar today) uses a metadata reranker: authority_rank, recency, term overlap.

  • src/retrieval/cross_encoder_rerank.py (bge-reranker-v2-m3)
  • src/retrieval/rerank.py (metadata reranker)
Answers #04
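The v2 blend is a weighted sum, which a short sketch makes concrete. The 0.6 / 0.3 / 0.1 weights come from the text above; squashing the cross-encoder output with a sigmoid, and treating cosine and metadata as already scaled to 0..1, are assumptions about normalisation that the source does not state.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def blended_score(ce_logit, cosine, metadata, weights=(0.6, 0.3, 0.1)):
    """Combine cross-encoder, cosine and metadata signals into one score."""
    w_ce, w_cos, w_meta = weights
    return w_ce * sigmoid(ce_logit) + w_cos * cosine + w_meta * metadata

def rerank(candidates):
    """Sort candidate dicts (keys: id, ce, cos, meta) by blended score, best first."""
    return sorted(
        candidates,
        key=lambda c: blended_score(c["ce"], c["cos"], c["meta"]),
        reverse=True,
    )

ranked = rerank([
    {"id": "a", "ce": -2.0, "cos": 0.9, "meta": 1.0},  # close in vector space, weak CE
    {"id": "b", "ce": 3.0, "cos": 0.5, "meta": 0.6},   # strong CE wins the blend
])
```

Since the cross-encoder carries 60% of the weight, a chunk with a lower cosine score can still move to the top when the cross-encoder rates it highly.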
08

RAGTAG

Temporal correctness: version_chain, as_of, AmendmentCaveat. All deterministic.

Every SECTION carries a chronological version_chain of muutetaan, kumotaan, lisätään steps. GraphStore.text_at(section_id, as_of) plays it forward. Every cited chunk on a stale ancestor emits an AmendmentCaveat (suspect, stale, repealed). A separate check_temporal_mismatches function compares the drafted answer against the section's chain via difflib; no LLM in this check.

  • src/indexing/graph_store.py (text_at)
  • src/models.py (VersionStep, AmendmentCaveat)
  • src/agents/verifier.py (check_temporal_mismatches)
  • src/retrieval/pipeline_v2.py (wires both)
Answers #05
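Playing a version chain forward to a date can be sketched like this. The field names on `VersionStep` are assumed for illustration and are not copied from src/models.py; the semantics assumed here are that lisätään and muutetaan set the text in force and kumotaan removes it. The `similarity` helper only shows the kind of difflib comparison the text mentions.

```python
import difflib
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass(frozen=True)
class VersionStep:
    effective: date
    kind: str                    # "lisätään" | "muutetaan" | "kumotaan"
    text: Optional[str] = None   # None for a repeal

def text_at(chain, as_of):
    """Return the section text in force on as_of, or None if absent or repealed."""
    current = None
    for step in sorted(chain, key=lambda s: s.effective):
        if step.effective > as_of:
            break                # later amendments are not in force yet
        current = None if step.kind == "kumotaan" else step.text
    return current

def similarity(a, b):
    """Deterministic text similarity in [0, 1]; no model involved."""
    return difflib.SequenceMatcher(None, a, b).ratio()

chain = [
    VersionStep(date(2010, 1, 1), "lisätään", "v1 text"),
    VersionStep(date(2020, 1, 1), "muutetaan", "v2 text"),
    VersionStep(date(2024, 1, 1), "kumotaan"),
]
in_force_2021 = text_at(chain, date(2021, 6, 1))
```

Asking the same chain for a date before the first step or after the repeal returns None, which is the case a caveat would need to surface.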
09

RAGTAG

Authority is one integer: Finlex 100, Treaty 90, KHO 80, Vero 60.

Ranks are assigned at ingestion from source / source_subcorpus and stored on every node. Conflict surfacing compares the integer; the team's lattice (Finlex over Vero, KHO can override Vero) drops out of this directly.

  • src/extraction/authority.py
  • findings/03_authority_ranks.md
  • Lex superior · UN OLA
Answers #03
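The integer comparison can be shown in a few lines. The four rank values are the ones in the heading above; the source keys, the `Node` shape and the `resolve_conflict` name are hypothetical stand-ins for whatever src/extraction/authority.py actually defines.

```python
from dataclasses import dataclass

# Rank values from the card above; the lowercase keys are assumed.
AUTHORITY_RANK = {"finlex": 100, "treaty": 90, "kho": 80, "vero": 60}

@dataclass(frozen=True)
class Node:
    node_id: str
    source: str
    authority_rank: int   # stored on the node at ingestion

def make_node(node_id, source):
    """Assign the rank once, at ingestion time."""
    return Node(node_id, source, AUTHORITY_RANK[source])

def resolve_conflict(a, b):
    """Return the higher-authority node, or None when ranks tie."""
    if a.authority_rank == b.authority_rank:
        return None
    return a if a.authority_rank > b.authority_rank else b

statute = make_node("finlex:tvl-124", "finlex")
guidance = make_node("vero:kannanotto-1", "vero")
ruling = make_node("kho:2025-46", "kho")
winner = resolve_conflict(statute, guidance)
```

A tie returns None on purpose in this sketch: two sources of equal rank need a different tiebreak (for example recency), which the integer alone cannot decide.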
10

RAGTAG

Generation via DeepSeek-V4-Flash hosted on Featherless. Query-rewrite is cached.

The drafter is deepseek-ai/DeepSeek-V4-Flash served via Featherless. Per-question query rewrites are cached in process, which lowers cost on repeated framings. Per-query answers are not cached today; a localStorage history in the Next.js UI lets the user recall past questions but does not skip the call.

  • src/retrieval/generate.py (MODEL = deepseek-ai/DeepSeek-V4-Flash)
  • src/retrieval/query_rewrite.py (in-process cache)
  • Featherless · featherless.ai
Answers #06, #01
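An in-process rewrite cache can be as small as `functools.lru_cache` over a normalised question. This is a sketch under assumptions: `_call_rewriter` is a stand-in that does no network call, and the normalisation rule and cache size are invented for illustration rather than taken from src/retrieval/query_rewrite.py.

```python
from functools import lru_cache

CALLS = {"count": 0}   # counts how often the (stand-in) model is hit

def _call_rewriter(question):
    """Stand-in for the hosted rewrite call; a real one would query the model."""
    CALLS["count"] += 1
    return question.rstrip("?") + " (Finnish tax law)"

def normalise(question):
    """Collapse case and whitespace so equivalent framings share a cache key."""
    return " ".join(question.lower().split())

@lru_cache(maxsize=1024)
def _cached_rewrite(normalised_question):
    return _call_rewriter(normalised_question)

def rewrite_query(question):
    return _cached_rewrite(normalise(question))

first = rewrite_query("Meal voucher VAT?")
second = rewrite_query("  meal  voucher vat? ")   # served from the cache
```

The cache lives in the process memory, so it is lost on restart; that matches the "in process" wording above and keeps the sketch free of any storage dependency.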

Receipts

Q1
Capital income > €30k

Single cite, TVL § 124. No graph hop. The baseline case.

Q12
Meal voucher VAT

Three cites: KHO 2025:46, KVL:004/2024, Vero ohje. Two graph hops via cites and interprets.

Q41
Avainhenkilö 48 vs 84 months

Rank-100 Finlex statute outranks rank-60 Vero kannanotto. Verifier picks Finlex.

Cost
Local · hosted · brief cap

€0.005 local · €0.04 hosted · €1 brief cap. The cost meter UI is a char-count heuristic, not API billing.

From the corpus

two things we found

Mojibake recovered through the graph

About 1.7% of chunks were double-encoded: the HTML sniffer mis-detected UTF-8 as Latin-1 and produced 'päätös → pรครคtรถs'-style chunks in LanceDB. We caught it by tracing RAG hits back to source files, fixed the parse layer to force UTF-8, and re-embedded the affected slice. The graph spine made the recovery surgical, not corpus-wide.

source · scripts/reingest_corrupted_chunks.py · pipeline/html_utils.py
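A common way to detect and undo this kind of double encoding is to reverse the wrong decode and see whether valid UTF-8 comes out. The sketch below assumes the Latin-1 misread described above; `repair_double_encoding` and its `wrong_codec` parameter are hypothetical and are not the logic in scripts/reingest_corrupted_chunks.py.

```python
def repair_double_encoding(text, wrong_codec="latin-1"):
    """Undo 'UTF-8 bytes decoded with the wrong codec', when that is what happened.

    Clean text is returned unchanged: re-encoding it either fails or yields
    bytes that are not valid UTF-8.
    """
    try:
        return text.encode(wrong_codec).decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text

clean = "päätös"
# Reproduce the corruption: UTF-8 bytes misread as Latin-1.
broken = clean.encode("utf-8").decode("latin-1")
fixed = repair_double_encoding(broken)
```

Running the check per chunk and collecting the section_ids of the ones that change gives the affected slice to re-embed, rather than the whole corpus.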

Not every tax question is in the law

Eval question N49 asks about the account-number range commonly used for trade receivables and payables (myyntisaamiset / ostovelat) in the Finnish chart of accounts. Our system returned the correct legal answer (no universally binding range exists), which did not match the question-bank reference. The reference traces to KILA practice and platform-specific defaults, not Finlex. Honest UX would surface that the law is silent here and the convention lives elsewhere.

source · eval/questions.json · question N49

References

  • SAT-Graph RAG · de Martim, JURIX 2025 · arXiv:2505.00039
  • TG-RAG · Han et al., 2025 · arXiv:2510.13590
  • LRMoo v1.1.1 · IFLA, 2026 · cidoc-crm.org/LRMoo
  • Semantic Finlex · SeCo Aalto + MoJ · data.finlex.fi/sparql
  • Self-RAG · Asai et al., 2024 · ICLR 2024
  • CRAG · Yan et al., 2024 · arXiv:2401.15884
  • HyDE · Gao et al., 2022 · Precise Zero-Shot Dense Retrieval
  • AgenticSimLaw · Jan 2026 · arXiv:2601.21936
  • Multi-Agent Debate · Du et al., NeurIPS 2023 · arXiv:2305.14325
  • BGE-M3 · Chen et al., BAAI · bge-model.com
  • bge-reranker-v2-m3 · BAAI · bge-model.com
  • ColBERT · Khattab + Zaharia · SIGIR 2020
  • RRF · Cormack et al. · SIGIR 2009
  • Voyage voyage-3-large · Voyage AI · voyageai.com
  • Anthropic Circuit Tracing · Anthropic, 2025 · transformer-circuits.pub
  • UN OLA · lex superior · UN Office of Legal Affairs · a_cn4_l682.pdf

Timeline · phone-friendly

The same story as a two-minute scroll on your phone. ragtag-timeline.vercel.app