Generating Benchmarks for Factuality Evaluation of Language Models Paper • 2307.06908 • Published Jul 13, 2023 • 8
A Real-World WebAgent with Planning, Long Context Understanding, and Program Synthesis Paper • 2307.12856 • Published Jul 24, 2023 • 36
WebArena: A Realistic Web Environment for Building Autonomous Agents Paper • 2307.13854 • Published Jul 25, 2023 • 24