McEval Collection McEval: Massively Multilingual Code Evaluation • 2 items • Updated Nov 11, 2024
HelloBench: Evaluating Long Text Generation Capabilities of Large Language Models Paper • 2409.16191 • Published Sep 24, 2024
TableBench Collection TableBench: A Comprehensive and Complex Benchmark for Table Question Answering • 8 items • Updated Nov 11, 2024
FuzzCoder: Byte-level Fuzzing Test via Large Language Model Paper • 2409.01944 • Published Sep 3, 2024