TruthAnchor - Technical Documentation

4-Layer Architecture in Detail

TruthAnchor guards against hallucinations by passing every request through four independent layers in sequence. Each layer operates independently, and if a layer fails, a Circuit Breaker triggers Graceful Degradation.

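Layer 1 (Input Governance) preprocesses user input and validates that it is safe to process.

Intent Classification - classifies the user's intent (financial query, general conversation, illegal request, etc.)
PII Masking - automatically detects and masks personal information such as resident registration numbers, phone numbers, and account numbers
Prompt Injection Detection - detects and blocks malicious prompt injection attacks
Input Validation - validates input length, format, and encoding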

services/input-governance/
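
The PII Masking step can be illustrated with a minimal regex-based sketch. The two patterns below (resident registration number and mobile phone number) and the placeholder format are assumptions made for illustration, not the patterns the service actually uses; account numbers are left out because their format differs from bank to bank.

```python
import re

# Hypothetical patterns for illustration; real-world formats vary.
# Digit lookarounds are used instead of \b so that a match still works
# when Korean particles are attached directly to the number.
PII_PATTERNS = {
    "RRN": re.compile(r"(?<!\d)\d{6}-\d{7}(?!\d)"),                 # resident registration number
    "PHONE": re.compile(r"(?<!\d)01[016789]-\d{3,4}-\d{4}(?!\d)"),  # mobile phone number
}

def mask_pii(text: str) -> str:
    """Replace each detected PII span with a typed placeholder such as [RRN]."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

masked = mask_pii("Call 010-1234-5678, RRN 900101-1234567")
print(masked)
```

Typed placeholders keep the sentence readable for the downstream layers while removing the sensitive value itself.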

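Layer 2 (Evidence Generation) retrieves relevant evidence through RAG and generates a response with an LLM.

Hybrid RAG - combined BM25 + Vector Search (Qdrant) retrieval
LLM Generation - response generation via Claude/GPT (multi-model fallback)
Citation Linker - links each sentence of the response to its source (citation)
Domain Adapter - applies domain-specific prompts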

services/evidence-generation/
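
This document does not say how the BM25 and vector rankings are combined. One common option is Reciprocal Rank Fusion (RRF), sketched below purely as an illustration; the function name, the constant k=60, and the document IDs are assumptions, not part of TruthAnchor.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one list, best first."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # A document earns more credit the higher it ranks in each list.
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)

bm25_hits = ["doc_a", "doc_b", "doc_c"]    # keyword ranking (hypothetical)
vector_hits = ["doc_b", "doc_d", "doc_a"]  # embedding ranking (hypothetical)
fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
print(fused)
```

RRF only needs rank positions, so it avoids having to normalize BM25 scores and cosine similarities onto a common scale.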

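Layer 3 (Output Verification) verifies the accuracy and regulatory compliance of the generated response.

Compliance Guardrails - verifies compliance with financial regulations (8 rules)
Numeric Verification - verifies the accuracy of numeric data such as interest rates and exchange rates
Uncertainty Scoring - computes the uncertainty score of the response
Citation Coverage - measures how much of the response is backed by cited evidence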

services/output-verification/

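Layer 4 (Escalation) escalates risky responses to human experts and records audit logs.

Audit Logger - records an audit trail of every request and response
Escalation Router - routes to HITL (Human-in-the-Loop) when the uncertainty threshold is exceeded
Escalation Policy - manages per-domain escalation policies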

services/escalation/

API Reference

Base URL: https://your-domain.com

Every API call requires the X-API-Key header (except Health and Metrics).

POST /api/v1/chat

Processes a chat request. Both synchronous and streaming modes are supported.

Request Body

Field	Type	Required	Description
`message`	string	Yes	User message
`domain`	string	No	Domain (banking, insurance, securities, etc.). Default: banking
`conversation_id`	string	No	Conversation session ID. Auto-generated if omitted
`stream`	boolean	No	Enables streaming mode. Default: false
`metadata`	object	No	Additional metadata (custom fields)

Response (200 OK)

{
  "answer": "Response text...",
  "confidence": 0.94,
  "uncertainty_score": 0.06,
  "citations": [
    {
      "source": "document.pdf",
      "page": 3,
      "text": "Quoted source text...",
      "relevance_score": 0.92
    }
  ],
  "guardrail_passed": true,
  "guardrail_details": {
    "triggered_rules": [],
    "blocked": false
  },
  "model": "claude-sonnet-4-20250514",
  "processing_time_ms": 142,
  "conversation_id": "conv-001",
  "message_id": "msg-abc123",
  "layers": {
    "l1_input_governance": {"status": "pass", "duration_ms": 12},
    "l2_evidence_generation": {"status": "pass", "duration_ms": 98},
    "l3_output_verification": {"status": "pass", "duration_ms": 28},
    "l4_escalation": {"status": "none", "duration_ms": 4}
  }
}
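
For callers that do not use the SDK, the request can be assembled with the Python standard library. This is a minimal sketch: the helper name is hypothetical, and the Content-Type header is an assumption, since this document only states that X-API-Key is required.

```python
import json
import urllib.request

def build_chat_request(base_url, api_key, message, domain="banking", stream=False):
    """Build (but do not send) a POST /api/v1/chat request."""
    body = {"message": message, "domain": domain, "stream": stream}
    return urllib.request.Request(
        url=f"{base_url}/api/v1/chat",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",  # assumed, not stated in this document
            "X-API-Key": api_key,
        },
        method="POST",
    )

req = build_chat_request("https://your-domain.com", "ta_xxxxxxxxxxxx", "Compare deposit rates")
# To send it: urllib.request.urlopen(req), then json.load() the response body.
```

Building the request separately from sending it makes the payload easy to inspect and log before any network call is made.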

Error Codes

Status	Code	Description
400	INVALID_REQUEST	Malformed request
401	UNAUTHORIZED	Invalid API Key
429	RATE_LIMITED	Per-minute request limit exceeded
500	INTERNAL_ERROR	Internal server error
503	SERVICE_DEGRADED	Service degraded (Circuit Breaker)

POST /api/v1/chat/feedback

Submits user feedback on a response.

Request Body

Field	Type	Required	Description
`conversation_id`	string	Yes	Conversation session ID
`message_id`	string	Yes	Response message ID
`rating`	string	Yes	"good" or "bad"
`comment`	string	No	Free-form comment

Response (200 OK)

{"status": "ok", "feedback_id": "fb-xyz789"}

GET /api/v1/health

Reports system status per layer. No authentication required.

Response (200 OK)

{
  "status": "healthy",
  "version": "1.0.0",
  "layers": {
    "input_governance": {"status": "healthy", "latency_ms": 2},
    "evidence_generation": {"status": "healthy", "latency_ms": 5},
    "output_verification": {"status": "healthy", "latency_ms": 3},
    "escalation": {"status": "healthy", "latency_ms": 1}
  },
  "components": {
    "qdrant": "connected",
    "neo4j": "connected",
    "postgresql": "connected",
    "redis": "connected",
    "kafka": "connected"
  },
  "timestamp": "2026-03-11T09:00:00Z"
}
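
A monitoring script might reduce this response to a list of problem areas. The sketch below assumes that any layer status other than "healthy" and any component state other than "connected" counts as a problem; the "degraded" and "disconnected" values in the sample are hypothetical, since only the healthy case appears in this document.

```python
def find_unhealthy(health: dict) -> list[str]:
    """Return the names of layers and components that are not in a good state."""
    problems = []
    for name, info in health.get("layers", {}).items():
        if info.get("status") != "healthy":
            problems.append(f"layer:{name}")
    for name, state in health.get("components", {}).items():
        if state != "connected":
            problems.append(f"component:{name}")
    return problems

health = {
    "status": "degraded",  # hypothetical value
    "layers": {
        "input_governance": {"status": "healthy", "latency_ms": 2},
        "evidence_generation": {"status": "healthy", "latency_ms": 5},
    },
    "components": {"qdrant": "connected", "redis": "disconnected"},  # hypothetical value
}
problems = find_unhealthy(health)
print(problems)
```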

GET /api/v1/metrics

Returns metrics in Prometheus format. No authentication required.

Response (200 OK, text/plain)

# HELP truthanchor_requests_total Total number of requests
# TYPE truthanchor_requests_total counter
truthanchor_requests_total{method="POST",endpoint="/api/v1/chat"} 12847

# HELP truthanchor_latency_seconds Request latency in seconds
# TYPE truthanchor_latency_seconds histogram
truthanchor_latency_seconds_bucket{le="0.1"} 9821
truthanchor_latency_seconds_bucket{le="0.2"} 12102
truthanchor_latency_seconds_bucket{le="0.5"} 12780

# HELP truthanchor_hallucination_detected_total Hallucinations detected
# TYPE truthanchor_hallucination_detected_total counter
truthanchor_hallucination_detected_total 47

# HELP truthanchor_guardrail_triggered_total Guardrail rule triggers
# TYPE truthanchor_guardrail_triggered_total counter
truthanchor_guardrail_triggered_total{rule="CG-001"} 23
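
The metrics text can be read with a few lines of Python when a full Prometheus client is not available. The parser below is a simplified sketch: it skips comment lines, ignores optional timestamps, and keys each sample by its raw name-plus-labels string. The derived ratio is only an illustration of combining two counters.

```python
import re

# metric name, optional {labels}, then a single value token
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(\{.*\})?\s+(\S+)$')

def parse_samples(text):
    """Map 'name{labels}' to a float value, skipping comments and blank lines."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_RE.match(line)
        if match:
            name, labels, value = match.groups()
            samples[name + (labels or "")] = float(value)
    return samples

sample = """\
# HELP truthanchor_requests_total Total number of requests
# TYPE truthanchor_requests_total counter
truthanchor_requests_total{method="POST",endpoint="/api/v1/chat"} 12847
truthanchor_hallucination_detected_total 47
"""
metrics = parse_samples(sample)
chat_total = metrics['truthanchor_requests_total{method="POST",endpoint="/api/v1/chat"}']
rate = metrics["truthanchor_hallucination_detected_total"] / chat_total
print(f"hallucinations per chat request: {rate:.4%}")
```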

Uncertainty Score

The uncertainty score quantifies how far an LLM response can be trusted, on a scale from 0.0 to 1.0. A lower score means a more trustworthy response.

Formula

U = w₁ * (1 - RAG_Relevance) + w₂ * (1 - Citation_Coverage) + w₃ * Semantic_Entropy + w₄ * Domain_Penalty

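Component	Weight	Description
`RAG_Relevance`	w₁ = 0.30	Mean query-relevance score of the retrieved documents
`Citation_Coverage`	w₂ = 0.25	Fraction of response sentences that carry a supporting citation
`Semantic_Entropy`	w₃ = 0.25	Semantic dispersion across multiple candidate responses
`Domain_Penalty`	w₄ = 0.20	Penalty for out-of-domain questions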

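Thresholds

Range	Action
0.0 to 0.3	High confidence - return the response as-is
0.3 to 0.5	Medium confidence - add a disclaimer
0.5 to 0.7	Low confidence - show a warning and recommend further verification
0.7 to 1.0	Very low confidence - escalate to an expert (HITL)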
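
The formula and threshold table above can be sketched as follows. Two points are assumptions: every input is taken to be normalized to [0, 1], and each band's upper bound is treated as exclusive, because the table does not say which band a boundary value such as 0.3 belongs to. The action labels are hypothetical names.

```python
WEIGHTS = {"rag": 0.30, "citation": 0.25, "entropy": 0.25, "domain": 0.20}

def uncertainty_score(rag_relevance, citation_coverage, semantic_entropy, domain_penalty):
    """Weighted sum U from the formula; inputs are assumed to lie in [0, 1]."""
    return (
        WEIGHTS["rag"] * (1 - rag_relevance)
        + WEIGHTS["citation"] * (1 - citation_coverage)
        + WEIGHTS["entropy"] * semantic_entropy
        + WEIGHTS["domain"] * domain_penalty
    )

def action_for(score):
    """Map a score to an action; upper bounds are exclusive (assumption)."""
    if score < 0.3:
        return "return_as_is"
    if score < 0.5:
        return "add_disclaimer"
    if score < 0.7:
        return "warn_and_recommend_verification"
    return "escalate_hitl"

# Well-grounded answer: relevant documents, full citation coverage, low entropy.
u = uncertainty_score(0.9, 1.0, 0.1, 0.0)
print(round(u, 3), action_for(u))
```

Because the four weights sum to 1.0, U stays within [0, 1] whenever the inputs do.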

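Citation Coverage Mechanism

Citation Coverage measures whether each sentence of the LLM response is supported by the RAG retrieval results.

Process

Split the LLM response into sentences.
Compare each sentence against the document chunks retrieved by RAG using semantic similarity.
Link every chunk whose similarity is at or above the threshold (0.75) to that sentence as a citation.
Compute the Citation Coverage ratio: number of sentences with a linked citation / total number of sentences.

Citation Coverage = (number of sentences with a citation) / (total number of response sentences)

Target Citation Coverage: 98% or higher (per the Phase 3 criteria)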
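
The linking and ratio steps can be sketched as below. The sentence-to-chunk similarity scores are passed in precomputed, because this document does not name the embedding model or similarity function; returning 0.0 for an empty response is also an assumption.

```python
def citation_coverage(similarities, threshold=0.75):
    """similarities[i][j] is the similarity of sentence i to chunk j.

    Returns (coverage, links), where links[i] lists the chunk indices
    linked to sentence i as citations.
    """
    links = [
        [j for j, sim in enumerate(row) if sim >= threshold]
        for row in similarities
    ]
    if not links:
        return 0.0, links  # assumed behavior for an empty response
    cited = sum(1 for chunk_ids in links if chunk_ids)
    return cited / len(links), links

sims = [
    [0.91, 0.40],  # sentence 0 is supported by chunk 0
    [0.30, 0.78],  # sentence 1 is supported by chunk 1
    [0.50, 0.60],  # sentence 2 has no chunk at or above the threshold
    [0.75, 0.10],  # sentence 3 sits exactly at the threshold and still counts
]
coverage, links = citation_coverage(sims)
print(coverage, links)
```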

Guardrail Rules (CG-001 ~ CG-008)

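Guardrail rules verify that LLM responses meet financial regulations and safety standards. The rules are defined as YAML under config/guardrails/.

ID	Rule	Severity	Description
`CG-001`	Return guarantee blocking	Critical	Blocks expressions that guarantee returns, such as "확정 수익" (fixed returns) and "보장된 이익" (guaranteed profit).
`CG-002`	Investment disclaimer	High	Checks that investment-related responses include a disclaimer.
`CG-003`	Interest rate accuracy	High	Verifies that any stated interest rate is within the allowed tolerance (±0.5 percentage points).
`CG-004`	PII leak prevention	Critical	Ensures that responses contain no unmasked personal information.
`CG-005`	Non-financial query handling	Medium	Returns an appropriate guidance message for questions outside the financial domain.
`CG-006`	Prompt injection defense	Critical	Blocks attempts to leak the system prompt, change the assistant's role, and similar attacks.
`CG-007`	High-value transaction escalation	High	Routes questions about transactions at or above a set amount (KRW 100 million) to an expert.
`CG-008`	Exchange rate accuracy	Medium	Verifies the timestamp and accuracy of exchange rate data.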
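
Two of these rules are simple enough to sketch in a few lines. The functions below illustrate the checks described for CG-001 and CG-003; the function names are hypothetical, and the YAML schema under config/guardrails/ is not shown in this document, so no attempt is made to load it.

```python
# The two example phrases quoted in the CG-001 description.
GUARANTEE_PHRASES = ("확정 수익", "보장된 이익")

def violates_cg001(answer: str) -> bool:
    """Return True if the answer contains a return-guarantee phrase."""
    return any(phrase in answer for phrase in GUARANTEE_PHRASES)

def passes_cg003(stated_rate: float, reference_rate: float, tolerance_pp: float = 0.5) -> bool:
    """Return True if the stated rate is within the tolerance, in percentage points."""
    return abs(stated_rate - reference_rate) <= tolerance_pp

blocked = violates_cg001("이 상품은 확정 수익을 제공합니다")
rate_ok = passes_cg003(stated_rate=3.5, reference_rate=3.2)
print(blocked, rate_ok)
```

A production rule would likely need normalization (spacing, synonyms) on top of plain substring matching; this sketch only shows the shape of the check.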

Circuit Breaker & Graceful Degradation

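TruthAnchor applies the Circuit Breaker pattern when an external service fails, to preserve service stability.

Circuit Breaker States

State	Description	Behavior
CLOSED	Normal operation	All requests are processed normally.
HALF-OPEN	Attempting recovery	A limited number of requests are passed through as probes to check whether the service has recovered.
OPEN	Failure detected	Calls to the external service are suspended and a fallback response is returned.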
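
The three states can be sketched as a small state machine. The failure threshold and recovery timeout are assumed values, and for brevity this sketch does not cap the number of probe requests while HALF-OPEN, which a real implementation would do.

```python
import time

class CircuitBreaker:
    """Minimal three-state breaker: CLOSED -> OPEN -> HALF-OPEN -> CLOSED."""

    def __init__(self, failure_threshold=3, recovery_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold  # assumed value
        self.recovery_timeout = recovery_timeout    # assumed value, in seconds
        self.clock = clock                          # injectable for testing
        self.state = "CLOSED"
        self.failures = 0
        self.opened_at = 0.0

    def allow_request(self):
        """Decide whether a call to the external service may proceed."""
        if self.state == "OPEN":
            if self.clock() - self.opened_at >= self.recovery_timeout:
                self.state = "HALF-OPEN"  # let a probe request through
                return True
            return False  # caller should return a fallback response
        return True

    def record_success(self):
        """A successful call (including a probe) closes the breaker."""
        self.state = "CLOSED"
        self.failures = 0

    def record_failure(self):
        """Open the breaker on a failed probe or too many failures."""
        self.failures += 1
        if self.state == "HALF-OPEN" or self.failures >= self.failure_threshold:
            self.state = "OPEN"
            self.opened_at = self.clock()

breaker = CircuitBreaker(failure_threshold=2)
breaker.record_failure()
breaker.record_failure()
print(breaker.state, breaker.allow_request())
```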

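Graceful Degradation Levels

Level 0 (Normal) - all features operate normally
Level 1 (Primary LLM Fallback) - switches to the secondary LLM when the primary LLM fails
Level 2 (Cache-Only) - returns only cached responses when all LLMs are unavailable
Level 3 (Static Response) - returns a static guidance message when all external services are down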
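
A simplified way to pick the level from dependency health is shown below. The three boolean inputs are hypothetical; the real system may consider more services, and this sketch treats the cache as the last external dependency before falling back to a static message.

```python
def degradation_level(primary_llm_ok: bool, secondary_llm_ok: bool, cache_ok: bool) -> int:
    """Return the degradation level (0 to 3) for the given dependency health."""
    if primary_llm_ok:
        return 0  # Normal
    if secondary_llm_ok:
        return 1  # Primary LLM Fallback
    if cache_ok:
        return 2  # Cache-Only
    return 3      # Static Response

level = degradation_level(primary_llm_ok=False, secondary_llm_ok=True, cache_ok=True)
print(level)
```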

SDK Integration Guide (Python)

Installation

pip install truthanchor-sdk

Basic Usage

from truthanchor import TruthAnchorClient

client = TruthAnchorClient(
    api_key="ta_xxxxxxxxxxxx",
    base_url="https://your-domain.com"
)

# Synchronous request
response = client.chat(
    message="Compare deposit interest rates",
    domain="banking"
)
print(response.answer)
print(f"Confidence: {response.confidence}")
print(f"Uncertainty: {response.uncertainty_score}")

# Streaming request
for chunk in client.chat_stream(message="Check my loan limit"):
    print(chunk.text, end="", flush=True)

# Send feedback
client.send_feedback(
    conversation_id=response.conversation_id,
    message_id=response.message_id,
    rating="good"
)

Async Usage

import asyncio
from truthanchor import AsyncTruthAnchorClient

async def main():
    client = AsyncTruthAnchorClient(api_key="ta_xxxxxxxxxxxx")
    response = await client.chat(message="Recommend an insurance product")
    print(response.answer)

asyncio.run(main())

Error Handling

from truthanchor.exceptions import (
    RateLimitError,
    AuthenticationError,
    ServiceDegradedError
)

try:
    # `client` is the TruthAnchorClient instance created earlier
    response = client.chat(message="question")
except RateLimitError as e:
    print(f"Rate limit exceeded. Retry after {e.retry_after}s")
except AuthenticationError:
    print("Invalid API key")
except ServiceDegradedError as e:
    print(f"Service degraded: {e.degradation_level}")

Webhook & Event Configuration

TruthAnchor publishes internal events through Kafka and can send Webhook notifications to external systems.

Event Types

Event	Topic	Description
`escalation.created`	truthanchor.escalation	When an escalation is created
`guardrail.triggered`	truthanchor.guardrail	When a guardrail rule is triggered
`circuit_breaker.opened`	truthanchor.system	When a Circuit Breaker opens
`usage.threshold`	truthanchor.billing	When usage reaches a threshold (80%, 100%)

Webhook Payload Example

{
  "event": "escalation.created",
  "timestamp": "2026-03-11T09:15:30Z",
  "data": {
    "case_id": "esc-001",
    "reason": "uncertainty_threshold_exceeded",
    "uncertainty_score": 0.82,
    "conversation_id": "conv-123",
    "priority": "high"
  }
}
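
A receiver might parse and validate the payload before acting on it. The sketch below assumes the payload always carries event, timestamp, and data fields as in the example; the dataclass and function names are hypothetical.

```python
import json
from dataclasses import dataclass
from datetime import datetime

# Event names taken from the event types table.
KNOWN_EVENTS = {
    "escalation.created",
    "guardrail.triggered",
    "circuit_breaker.opened",
    "usage.threshold",
}

@dataclass
class WebhookEvent:
    event: str
    timestamp: datetime
    data: dict

def parse_webhook(raw: str) -> WebhookEvent:
    """Parse a webhook body and reject unknown event types."""
    payload = json.loads(raw)
    if payload["event"] not in KNOWN_EVENTS:
        raise ValueError(f"unknown event type: {payload['event']}")
    # Replace the trailing "Z" so fromisoformat also works before Python 3.11.
    ts = datetime.fromisoformat(payload["timestamp"].replace("Z", "+00:00"))
    return WebhookEvent(payload["event"], ts, payload.get("data", {}))

raw = json.dumps({
    "event": "escalation.created",
    "timestamp": "2026-03-11T09:15:30Z",
    "data": {"case_id": "esc-001", "uncertainty_score": 0.82, "priority": "high"},
})
evt = parse_webhook(raw)
print(evt.event, evt.timestamp.isoformat(), evt.data["priority"])
```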

Rate Limiting & Security

Rate Limiting

Plan	RPM (requests per minute)	Monthly request limit	API Keys
Free	10	500	2
Starter	60	5,000	5
Enterprise	300	50,000	20

Response Headers

Rate-limit response headers:

X-RateLimit-Limit: 60          # requests allowed per minute
X-RateLimit-Remaining: 45      # requests remaining
X-RateLimit-Reset: 1710147600  # reset timestamp (Unix epoch seconds)
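
A client can use these headers to decide how long to pause before the next call. This is a minimal sketch: the helper name is hypothetical, and it assumes the headers arrive as a plain dict with exactly the casing shown above.

```python
import time

def seconds_until_reset(headers, now=None):
    """Return how long to wait before sending again, based on the rate-limit headers."""
    now = time.time() if now is None else now
    remaining = int(headers.get("X-RateLimit-Remaining", "1"))
    if remaining > 0:
        return 0.0  # quota left, no need to wait
    reset_at = int(headers["X-RateLimit-Reset"])
    return max(0.0, float(reset_at - now))

headers = {
    "X-RateLimit-Limit": "60",
    "X-RateLimit-Remaining": "0",
    "X-RateLimit-Reset": "1710147600",
}
wait = seconds_until_reset(headers, now=1710147570)
print(wait)
```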

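Security Measures

All traffic is encrypted with TLS 1.3.
API Keys are stored as SHA-256 hashes.
PII in input data is masked before processing.
Audit logs are retained in PostgreSQL for 30 days.
Prompt injection attacks are detected and blocked in real time at Layer 1.
Repeated authentication failures trigger a temporary IP-based block.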
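
The API Key hashing item above can be illustrated with the standard library. This is a sketch under assumptions: the function names are hypothetical, and the constant-time comparison is an added precaution not described in this document.

```python
import hashlib
import hmac

def hash_api_key(api_key: str) -> str:
    """Return the SHA-256 hex digest that would be stored instead of the raw key."""
    return hashlib.sha256(api_key.encode("utf-8")).hexdigest()

def verify_api_key(presented: str, stored_hash: str) -> bool:
    """Compare hashes in constant time to avoid leaking information via timing."""
    return hmac.compare_digest(hash_api_key(presented), stored_hash)

stored = hash_api_key("ta_xxxxxxxxxxxx")
print(len(stored), verify_api_key("ta_xxxxxxxxxxxx", stored))
```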