Verifying Skill Coverage with an NLI Framework: From Fuzzy Judgment to Structured Reasoning

21 January 2026
NLI,
Skill Verification,
Pydantic,
LLM,
RAG

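"Does this resume material actually prove the skill?" That is a Natural Language Inference problem. This post shows how to combine a Pydantic-enforced schema, Chain-of-Thought reasoning, and an embedding pre-filter to build a reliable skill verification system.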

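What Is NLI (Natural Language Inference)?

NLI is the task of deciding the logical relationship between two pieces of text, a premise and a hypothesis:

Premise:    "I deployed 50 microservices to production with Kubernetes"
Hypothesis: "This person has Kubernetes experience"

Judgment: Entailment → COVERED

There are three possible relationships:

Entailment: the premise supports the hypothesis → COVERED
Neutral: the premise neither confirms nor rules out the hypothesis → IMPLIED/WEAK
Contradiction: the premise contradicts the hypothesis → MISSING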

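Why Resume Verification Is an NLI Problem

Premise = your resume material
Hypothesis = "The candidate has skill X"
Task = decide Entailment / Neutral / Contradiction

This is more reliable than plain keyword matching:

Method	Strength	Weakness
Keyword matching	Fast	"Mentions K8s" ≠ "can use K8s"
Embedding similarity	Captures semantics	Measures similarity only, not the logical relationship
NLI	Judges logical support	Requires a stronger model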
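To make the first row of the table concrete, here is a minimal sketch of a naive keyword matcher. The function name `keyword_match` and the sample sentences are made up for illustration and are not taken from Career Knowledge Base.

```python
import re


def keyword_match(skill: str, text: str) -> bool:
    """Return True if the skill name appears in the text as a whole word (case-insensitive)."""
    pattern = r"\b" + re.escape(skill) + r"\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


samples = [
    "Deployed 50 microservices to production with Kubernetes",  # real usage
    "Attended a one-hour intro talk about Kubernetes",          # mention only
    "I have never used Kubernetes in production",               # explicit negation
]

# All three sentences match, although only the first one entails the skill.
matches = [keyword_match("Kubernetes", s) for s in samples]
print(matches)
```

The matcher cannot tell usage from a passing mention or a negation, which is the gap the NLI step is meant to close.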

Enhanced NLI Architecture

Input: skill + evidence_texts
              ↓
┌─────────────────────────────────────┐
│ 1. Embedding Pre-filter             │
│    cosine(skill, evidence) >= 0.8   │
│    → drop irrelevant evidence,      │
│      and cut token usage            │
└────────────────┬────────────────────┘
                 ↓
┌─────────────────────────────────────┐
│ 2. Chain-of-Thought Analysis        │
│    Analyze each evidence item:      │
│    - supports_skill: bool           │
│    - reasoning: str                 │
└────────────────┬────────────────────┘
                 ↓
┌─────────────────────────────────────┐
│ 3. Final Judgment                   │
│    Aggregate all evidence analyses  │
│    into a final judgment:           │
│    - status: COVERED/IMPLIED/WEAK/  │
│              MISSING                │
│    - confidence: 0-1                │
└─────────────────────────────────────┘

Enforcing a Schema with Pydantic

Define the output structure with Pydantic so the LLM's response always comes back in the same format:

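from enum import Enum

from pydantic import BaseModel, Field


class EntailmentStatus(str, Enum):
    """The four verdicts (definition assumed; the original snippet omits it)"""
    COVERED = "COVERED"
    IMPLIED = "IMPLIED"
    WEAK = "WEAK"
    MISSING = "MISSING"


class EvidenceAnalysis(BaseModel):
    """Analysis of a single piece of evidence"""
    evidence_text: str = Field(description="The evidence being analyzed")
    supports_skill: bool = Field(description="Whether it supports the skill")
    reasoning: str = Field(description="Why it does or does not support the skill")


class NLIVerificationResult(BaseModel):
    """Complete NLI verification result"""
    # Chain-of-thought: analyze each piece of evidence first
    evidence_analysis: list[EvidenceAnalysis]

    # Final judgment
    supports: bool
    status: EntailmentStatus
    confidence: float = Field(ge=0.0, le=1.0)
    reason: str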

Why Pydantic Instead of a JSON String

# ❌ Old approach: parse a JSON string, which may be malformed
result = json.loads(response)
status = result.get("status", "MISSING")  # a typo such as "COVERD" slips through

# ✅ New approach: Pydantic enforces validation
structured_llm = llm.with_structured_output(NLIVerificationResult)
result = structured_llm.invoke(prompt)  # parsed and validated as NLIVerificationResult
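The difference is easy to see without calling an LLM at all. The sketch below feeds hand-written payloads to a trimmed-down model. `Verdict` and `try_parse` are hypothetical names used only for this illustration, and the `EntailmentStatus` enum is an assumed definition built from the four status names used in this post.

```python
from enum import Enum

from pydantic import BaseModel, Field, ValidationError


class EntailmentStatus(str, Enum):
    # Assumed definition: the four status names come from the post,
    # but the enum itself is not shown there.
    COVERED = "COVERED"
    IMPLIED = "IMPLIED"
    WEAK = "WEAK"
    MISSING = "MISSING"


class Verdict(BaseModel):
    # Hypothetical, trimmed-down stand-in for NLIVerificationResult.
    status: EntailmentStatus
    confidence: float = Field(ge=0.0, le=1.0)


def try_parse(payload: dict) -> str:
    """Return 'ok' when the payload fits the schema, else 'rejected'."""
    try:
        Verdict(**payload)
        return "ok"
    except ValidationError:
        return "rejected"


print(try_parse({"status": "COVERED", "confidence": 0.9}))  # valid payload
print(try_parse({"status": "COVERD", "confidence": 0.9}))   # typo in the status
print(try_parse({"status": "WEAK", "confidence": 1.7}))     # out-of-range confidence
```

With plain `json.loads`, the second and third payloads would pass through silently. With the schema, they fail at the boundary, where the caller can retry or log the error.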

Chain-of-Thought Prompt

Have the LLM analyze each piece of evidence first, then make the final judgment:

NLI_PROMPT_TEMPLATE = """You are performing Natural Language Inference (NLI) 
to determine if resume evidence supports a skill hypothesis.

## Hypothesis
The candidate has demonstrated the skill: "{skill}"

## Evidence (Premise)
{evidence}

## Instructions
1. Analyze each piece of evidence separately
2. For each, determine if it supports the hypothesis
3. Aggregate to make a final judgment

## Status Definitions
- COVERED: Direct, explicit evidence (entailment)
- IMPLIED: Indirect evidence, can be inferred (weak entailment)
- WEAK: Mentioned but not demonstrated (neutral)
- MISSING: No evidence (contradiction/no evidence)
"""

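The Effect of Chain-of-Thought

Approach	Output	Explainability
Direct judgment	`{"status": "COVERED"}`	❌
Chain-of-Thought	Per-evidence analysis + final judgment	✅

When the result is WEAK, you can see which piece of evidence was not strong enough: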

{
  "evidence_analysis": [
    {
      "evidence_text": "Familiar with container technologies",
      "supports_skill": false,
      "reasoning": "Only mentions familiarity, not actual usage"
    },
    {
      "evidence_text": "Deployed services using Docker",
      "supports_skill": true,
      "reasoning": "Docker experience, but Kubernetes not mentioned"
    }
  ],
  "status": "WEAK",
  "reason": "Docker experience implies container knowledge, but no direct K8s evidence"
}
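Because the per-evidence analysis is structured, the weak spots can also be pulled out programmatically. A minimal sketch, using an abbreviated copy of the result above as a raw JSON string (in the real pipeline the result would already be a parsed object):

```python
import json

# Abbreviated copy of the example result above.
raw = """
{
  "evidence_analysis": [
    {"evidence_text": "Familiar with container technologies",
     "supports_skill": false,
     "reasoning": "Only mentions familiarity, not actual usage"},
    {"evidence_text": "Deployed services using Docker",
     "supports_skill": true,
     "reasoning": "Docker experience, but Kubernetes not mentioned"}
  ],
  "status": "WEAK"
}
"""

result = json.loads(raw)

# Collect the evidence items the model judged as not supporting the skill.
unsupported = [
    item for item in result["evidence_analysis"] if not item["supports_skill"]
]

for item in unsupported:
    print(f'{item["evidence_text"]!r}: {item["reasoning"]}')
```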

Embedding Pre-filter

When there is a lot of evidence, filter it by embedding similarity first:

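import numpy as np


def prefilter_by_similarity(skill, evidence_texts, threshold=0.8):
    """Keep only the evidence whose embedding is close enough to the skill's."""
    # embed_query / embed_document are the project's embedding helpers (not shown here)
    skill_embedding = np.asarray(embed_query(skill))

    filtered = []
    for text in evidence_texts:
        text_embedding = np.asarray(embed_document(text))

        # Cosine similarity
        similarity = np.dot(skill_embedding, text_embedding) / (
            np.linalg.norm(skill_embedding) * np.linalg.norm(text_embedding)
        )

        if similarity >= threshold:
            filtered.append(text)

    return filtered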
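For intuition about what the threshold does, here is a small worked example in pure Python. The 3-dimensional vectors are made-up numbers standing in for real embeddings, and `cosine` is a helper written for this sketch (it also guards against zero-length vectors, which the snippet above does not).

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Toy 3-d vectors standing in for real embeddings (made-up numbers).
skill_vec = [1.0, 0.0, 0.0]
evidence_vecs = {
    "Deployed services on Kubernetes": [0.9, 0.1, 0.0],
    "Organized the team offsite": [0.1, 0.9, 0.2],
}

threshold = 0.8
kept = [
    text for text, vec in evidence_vecs.items()
    if cosine(skill_vec, vec) >= threshold
]
print(kept)
```

Only the first item points in roughly the same direction as the skill vector, so only it survives the filter.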

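Why the Pre-filter Works

No pre-filter (10 evidence items)	With pre-filter (5 relevant items)
LLM processes 10 items	LLM processes 5 items
~8000 tokens	~4000 tokens
2x cost	1x cost

It also cuts noise, which makes the NLI judgment more accurate.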

LangChain Integration

from langchain_openai import ChatOpenAI

def verify_skill_with_nli(skill, evidence_texts):
    # Get the LangChain chat model
    llm = ChatOpenAI(model="gpt-4o-mini", temperature=0.1)

    # Use the Pydantic schema for structured output
    structured_llm = llm.with_structured_output(NLIVerificationResult)

    # Build the prompt (format_evidence is a helper not shown in this post)
    prompt = NLI_PROMPT_TEMPLATE.format(
        skill=skill,
        evidence=format_evidence(evidence_texts)
    )

    # Call the LLM
    result = structured_llm.invoke(prompt)

    return result
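The `format_evidence` helper is not shown in this post. One plausible shape, offered only as an assumption, is a numbered list with one item per line, so that the model can refer to each piece of evidence separately. The short `template` below is a stand-in for `NLI_PROMPT_TEMPLATE`, used just to show the substitution.

```python
def format_evidence(evidence_texts: list[str]) -> str:
    """Render evidence as a numbered list, one item per line (assumed format)."""
    cleaned = [t.strip() for t in evidence_texts if t and t.strip()]
    if not cleaned:
        return "(no evidence provided)"
    return "\n".join(f"{i}. {text}" for i, text in enumerate(cleaned, start=1))


# Shortened stand-in for NLI_PROMPT_TEMPLATE, just to show the substitution.
template = 'Skill: "{skill}"\nEvidence:\n{evidence}'

prompt = template.format(
    skill="Kubernetes",
    evidence=format_evidence([
        "Deployed 50 microservices with Kubernetes",
        "  ",  # blank entries are dropped
        "Wrote Helm charts for staging",
    ]),
)
print(prompt)
```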

CLI Usage

# Standard NLI (fast)
uv run career-kb verify --skills "Python,Kubernetes"

# Enhanced NLI (Chain-of-Thought)
uv run career-kb verify --skills "Python,Kubernetes" --enhanced-nli

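Integration with the Verification Pipeline

Step 1: Skill Graph quick inference
        ↓ (if the result is not IMPLIED)
Step 2: Hybrid Search to find evidence
        ↓
Step 3: Context Compression
        ↓
Step 4: NLI judgment ← the focus of this post
        ↓
Step 5: Self-RAG Retry (for WEAK/MISSING)

Enhanced NLI makes Step 4 more reliable, which reduces the number of retries in Step 5.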
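The control flow of these five steps can be sketched as follows. This is only an outline under stated assumptions: every callable passed in (`infer_from_graph`, `search`, `compress`, `nli`) is a hypothetical stand-in, as are the `attempt` argument to `search` and the `max_retries` parameter. The actual Career Knowledge Base implementation may differ.

```python
from typing import Callable, Optional


def verify_skill(
    skill: str,
    infer_from_graph: Callable[[str], Optional[str]],
    search: Callable[[str, int], list[str]],
    compress: Callable[[list[str]], list[str]],
    nli: Callable[[str, list[str]], str],
    max_retries: int = 1,
) -> str:
    """Control flow only; every callable is a hypothetical stand-in."""
    # Step 1: a skill-graph hit short-circuits the rest of the pipeline.
    if infer_from_graph(skill) == "IMPLIED":
        return "IMPLIED"

    status = "MISSING"
    for attempt in range(max_retries + 1):
        evidence = compress(search(skill, attempt))  # Steps 2 and 3
        status = nli(skill, evidence)                # Step 4
        if status not in ("WEAK", "MISSING"):        # Step 5: retry only these
            break
    return status


# Stubbed run: the first search finds too little, the retry finds more.
calls = []


def fake_nli(skill: str, evidence: list[str]) -> str:
    calls.append(len(evidence))
    return "COVERED" if len(evidence) >= 2 else "WEAK"


status = verify_skill(
    "Kubernetes",
    infer_from_graph=lambda s: None,
    search=lambda s, attempt: ["e1"] * (attempt + 1),  # wider search on retry
    compress=lambda ev: ev,
    nli=fake_nli,
)
print(status, calls)
```

In the stubbed run the first NLI call returns WEAK, which triggers one retry. A more reliable Step 4 means fewer passes through this loop.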

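Performance Considerations

Mode	Tokens	Latency
Standard NLI	~500	~1s
Enhanced NLI (Chain-of-Thought)	~1500	~2s
+ Embedding Pre-filter	40% fewer	+0.2s

Recommendations:

Use standard NLI for quick scans
Use enhanced NLI for final confirmation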

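FAQ

1. Why not use a dedicated NLI model?

Traditional NLI models (such as DeBERTa-NLI):

Pros: fast, cheap
Cons: geared toward short sentences and English, no reasoning trace

LLM-based NLI:

Pros: handles long text, multilingual, explainable
Cons: slower, API cost

For resume verification, explainability matters more than speed.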

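2. Where does the 0.8 threshold come from?

It is an empirical value. Set it too low and too much irrelevant evidence gets through; set it too high and borderline-relevant evidence gets filtered out.

0.7: keeps more evidence, heavier load on NLI
0.8: the balance point ← recommended
0.9: may filter out borderline-relevant evidence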
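Since the value is empirical, it is worth checking against your own data. A minimal sketch of such a check: count how many evidence items pass at each candidate threshold. The similarity scores below are made-up numbers, and `kept_at` is a helper name invented for this example.

```python
# Made-up similarity scores for one skill against eight evidence items.
scores = [0.93, 0.88, 0.84, 0.81, 0.78, 0.74, 0.55, 0.31]


def kept_at(threshold: float, similarities: list[float]) -> int:
    """Count how many evidence items would pass the pre-filter."""
    return sum(1 for s in similarities if s >= threshold)


for threshold in (0.7, 0.8, 0.9):
    print(threshold, kept_at(threshold, scores))
```

With these numbers the filter keeps 6, 4, and 1 items respectively. Real similarity distributions depend heavily on the embedding model, so the right cutoff may differ from 0.8.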

3. What if all the evidence gets filtered out?

if not filtered_evidence:
    return NLIVerificationResult(
        evidence_analysis=[],
        supports=False,
        status=EntailmentStatus.MISSING,
        confidence=0.9,
        reason="No semantically similar evidence found"
    )

This is a signal in itself: the material library contains no relevant experience.

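Summary

Technique	Purpose
Pydantic Schema	Enforces the output format
Chain-of-Thought	Explainable per-evidence analysis
Embedding Pre-filter	Lowers cost and noise
LangChain Integration	Simplifies switching models

Enhanced NLI turns skill verification from a "black-box judgment" into "traceable reasoning".

Career Knowledge Base is a local-first resume knowledge base system built with Python + LanceDB + LangChain.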
