## What's New

### New Skill: Advanced Evaluation
A comprehensive skill for mastering LLM-as-a-Judge evaluation techniques, based on Eugene Yan's LLM-Evaluators research.
Covers:
- Direct scoring vs. pairwise comparison selection
- Position, length, and verbosity bias mitigation
- Metric selection (Cohen's κ, Spearman's ρ, Kendall's τ)
- Production evaluation pipeline design
- 10 actionable guidelines for reliable evaluation
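To make the metric-selection point concrete, Cohen's κ measures chance-corrected agreement between two raters over categorical labels, which is why it suits classification-style judge outputs. A minimal sketch (a hypothetical helper, not code from the skill itself):

```typescript
// Cohen's kappa: chance-corrected agreement between two raters.
// kappa = (p_o - p_e) / (1 - p_e), where p_o is observed agreement and
// p_e is the agreement expected by chance from each rater's label frequencies.
function cohensKappa(a: string[], b: string[]): number {
  if (a.length !== b.length || a.length === 0) {
    throw new Error("both raters must label the same items");
  }
  const n = a.length;
  const labels = [...new Set([...a, ...b])];
  // Observed agreement: fraction of items both raters labeled identically.
  const po = a.filter((x, i) => x === b[i]).length / n;
  // Expected chance agreement from each rater's marginal label frequencies.
  let pe = 0;
  for (const label of labels) {
    const pa = a.filter((x) => x === label).length / n;
    const pb = b.filter((x) => x === label).length / n;
    pe += pa * pb;
  }
  return (po - pe) / (1 - pe);
}
```

Spearman's ρ and Kendall's τ are the rank-correlation counterparts, appropriate when the judge emits ordinal scores rather than discrete labels.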
`skills/advanced-evaluation/`

### New Example: LLM-as-Judge Skills
A complete TypeScript implementation built on AI SDK 6, demonstrating the Advanced Evaluation skill in practice.
Includes:
- 3 evaluation tools: `directScore`, `pairwiseCompare`, `generateRubric`
- `EvaluatorAgent` class with full evaluation workflows
- 19 passing tests with real OpenAI API calls
- Position bias mitigation with automatic position swapping
- Zod schemas for type-safe inputs/outputs
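Position bias means a pairwise judge tends to favor whichever answer is shown first. The automatic-swapping mitigation can be sketched as follows; `judge` here is a hypothetical stand-in for the example's `pairwiseCompare` tool, not its actual signature:

```typescript
type Verdict = "A" | "B" | "tie";

// Sketch of position-swap mitigation: run the pairwise judge twice with the
// candidates in both orders, and only accept a verdict that survives the swap.
async function unbiasedCompare(
  judge: (first: string, second: string) => Promise<Verdict>,
  a: string,
  b: string,
): Promise<Verdict> {
  const forward = await judge(a, b);   // A shown in the first position
  const reversed = await judge(b, a);  // B shown in the first position
  // Flip the reversed verdict back into A/B terms.
  const flipped: Verdict =
    reversed === "A" ? "B" : reversed === "B" ? "A" : "tie";
  // A stable verdict is position-independent; disagreement signals
  // position bias, so it is conservatively reported as a tie.
  return forward === flipped ? forward : "tie";
}
```

The cost is one extra judge call per comparison, which is the usual trade-off for this mitigation.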
`examples/llm-as-judge-skills/`

### Quick Start
```sh
cd examples/llm-as-judge-skills
npm install
cp env.example .env  # Add your OPENAI_API_KEY
npm test
```
### Skills Applied
This example demonstrates how multiple skills work together:
- `advanced-evaluation`: Core evaluation patterns
- `tool-design`: Zod schemas and error handling
- `context-fundamentals`: Structured evaluation prompts
- `evaluation`: Foundational evaluation concepts
### Contributors
**Full Changelog**: v1.0.0...v1.1.0 (https://github.com/muratcankoylan/Agent-Skills-for-Context-Engineering/commits/v1.1.0)