Using LLM-as-a-Judge For Evaluation: A Complete Guide (Hamel's Blog, Hamel Husain)

Details
Publisher: Hamel Husain
Domain: Engineering & Architecture
Category: Evaluation, Testing & Observability
Type Group: Benchmarks & Datasets
Type: Benchmark
Best For: Developer
Skill Level: Intermediate
Access: Free
Topic: Agent evaluation

Related in Evaluation, Testing & Observability
WebArena: Realistic Web Environment (Emergentmind)
WebArena: A Realistic Web Environment for Building Autonomous Agents - ADS (Harvard)
Published in Transactions on Machine Learning Research (05/2025) (Openreview)
GAIA: A Benchmark for General AI Assistants - ar5iv - arXiv (arXiv)
Synthesizing Agent Trajectories via Test-Time Exploration ... (arXiv)
Agent Data Protocol: Unifying Datasets for Diverse, Effective Fine-tuning of LLM Agents (arXiv)