muratcankoylan/Agent-Skills-for-Context-Engineering v1.1.0
v1.1.0 - LLM-as-a-Judge Skills & Advanced Evaluation


🎉 What's New

New Skill: Advanced Evaluation

A comprehensive skill for mastering LLM-as-a-Judge evaluation techniques, based on Eugene Yan's LLM-Evaluators research.

Covers:

  • Direct scoring vs. pairwise comparison selection
  • Position, length, and verbosity bias mitigation
  • Metric selection (Cohen's κ, Spearman's ρ, Kendall's τ)
  • Production evaluation pipeline design
  • 10 actionable guidelines for reliable evaluation
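To make the metric-selection point concrete, here is a minimal, dependency-free sketch of Cohen's κ (chance-corrected agreement between two judges over categorical verdicts). The function name and label values are illustrative, not taken from the skill itself.

```typescript
// Cohen's kappa: chance-corrected agreement between two judges.
// Labels are categorical (e.g. "pass" / "fail" verdicts per sample).
function cohensKappa(judgeA: string[], judgeB: string[]): number {
  if (judgeA.length !== judgeB.length || judgeA.length === 0) {
    throw new Error("judgments must be non-empty and aligned");
  }
  const n = judgeA.length;
  const labels = [...new Set([...judgeA, ...judgeB])];

  // Observed agreement: fraction of samples where the judges match.
  const po = judgeA.filter((a, i) => a === judgeB[i]).length / n;

  // Expected agreement: probability both judges pick the same label by chance.
  let pe = 0;
  for (const label of labels) {
    const pa = judgeA.filter((a) => a === label).length / n;
    const pb = judgeB.filter((b) => b === label).length / n;
    pe += pa * pb;
  }
  return (po - pe) / (1 - pe);
}
```

A κ near 0 means the judges agree no more than chance would predict, which is why raw percent agreement alone can be misleading when one label dominates.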

📁 skills/advanced-evaluation/

New Example: LLM-as-Judge Skills

A complete TypeScript implementation, built on AI SDK 6, demonstrating the Advanced Evaluation skill in practice.

Includes:

  • 3 evaluation tools: directScore, pairwiseCompare, generateRubric
  • EvaluatorAgent class with full evaluation workflows
  • 19 passing tests with real OpenAI API calls
  • Position bias mitigation with automatic position swapping
  • Zod schemas for type-safe inputs/outputs
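The position-swapping idea above can be sketched as follows: judge each pair twice with the order reversed, and only declare a winner when both orderings agree. This is a simplified, dependency-free sketch; the function signature and the `judge` callback are illustrative stand-ins for the repo's actual `pairwiseCompare` tool, which calls an LLM via AI SDK.

```typescript
// Pairwise comparison with position-bias mitigation: judge each pair twice,
// swapping the order, and only declare a winner when both orderings agree.
type Verdict = "A" | "B" | "tie";

// The judge callback stands in for an LLM call returning a structured verdict.
async function pairwiseCompare(
  responseA: string,
  responseB: string,
  judge: (first: string, second: string) => Promise<"first" | "second">
): Promise<Verdict> {
  // Pass 1: A occupies the first slot.
  const pass1 = await judge(responseA, responseB);
  // Pass 2: positions swapped, so a position-biased judge flips its answer.
  const pass2 = await judge(responseB, responseA);

  const winner1 = pass1 === "first" ? "A" : "B";
  const winner2 = pass2 === "first" ? "B" : "A";

  // Agreement across both orderings yields a position-robust verdict.
  return winner1 === winner2 ? winner1 : "tie";
}
```

A judge that always prefers whichever response appears first will contradict itself across the two passes and produce a tie, which is exactly the failure mode this pattern is designed to surface.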

📁 examples/llm-as-judge-skills/

Quick Start

cd examples/llm-as-judge-skills
npm install
cp env.example .env # Add your OPENAI_API_KEY
npm test

Skills Applied

This example demonstrates how multiple skills work together:

  • advanced-evaluation - Core evaluation patterns
  • tool-design - Zod schemas and error handling
  • context-fundamentals - Structured evaluation prompts
  • evaluation - Foundational evaluation concepts

Full Changelog: https://github.com/muratcankoylan/Agent-Skills-for-Context-Engineering/commits/v1.1.0
