====== Perplexity statistics ======
Mean PPL(Q)                   :   9.061287 ±   0.057476
Mean PPL(base)                :   7.534124 ±   0.048206
Cor(ln(PPL(Q)), ln(PPL(base))):  96.76%
Mean ln(PPL(Q)/PPL(base))     :   0.184569 ±   0.001623
Mean PPL(Q)/PPL(base)         :   1.202700 ±   0.001952
Mean PPL(Q)-PPL(base)         :   1.527163 ±   0.016298

====== KL divergence statistics ======
Mean    KLD:   0.164457 ±   0.000676
Maximum KLD:   9.411176
99.9%   KLD:   2.855778
99.0%   KLD:   1.234465
Median  KLD:   0.104662
10.0%   KLD:   0.009321
 5.0%   KLD:   0.002701
 1.0%   KLD:   0.000294
Minimum KLD:   0.000001

====== Token probability statistics ======
Mean    Δp:  -5.112 ± 0.033 %
Maximum Δp:  77.844%
99.9%   Δp:  38.199%
99.0%   Δp:  18.699%
95.0%   Δp:   6.729%
90.0%   Δp:   2.723%
75.0%   Δp:   0.066%
Median  Δp:  -1.070%
25.0%   Δp:  -7.642%
10.0%   Δp: -18.119%
 5.0%   Δp: -27.580%
 1.0%   Δp: -59.212%
 0.1%   Δp: -82.003%
Minimum Δp: -99.539%
RMS Δp    :  13.476 ± 0.061 %
Same top p:  81.696 ± 0.102 %
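
The perplexity figures above can be cross-checked with a few lines of arithmetic. The following is a minimal sketch, not the tool's own code: it only tests whether the reported ratio, log-ratio, and difference agree with the two mean perplexities. The `kl_divergence` helper and its toy distributions are hypothetical, added only to illustrate the standard per-token definition KL(base || Q) that the KLD rows summarize; they are not data from this run.

```python
import math

# Mean perplexities copied from the report above.
ppl_q = 9.061287      # Mean PPL(Q)
ppl_base = 7.534124   # Mean PPL(base)

# Derived quantities, to compare against the reported rows.
ratio = ppl_q / ppl_base        # reported: 1.202700
log_ratio = math.log(ratio)     # reported: 0.184569
diff = ppl_q - ppl_base         # reported: 1.527163

print(f"PPL(Q)/PPL(base)     = {ratio:.6f}")
print(f"ln(PPL(Q)/PPL(base)) = {log_ratio:.6f}")
print(f"PPL(Q)-PPL(base)     = {diff:.6f}")


def kl_divergence(p_base, p_q):
    """KL(base || Q) in nats for one token position.

    Hypothetical helper: both arguments are probability distributions
    over the same vocabulary, with p_q strictly positive wherever
    p_base is positive.
    """
    return sum(p * math.log(p / q) for p, q in zip(p_base, p_q) if p > 0.0)


# Toy three-token vocabulary (illustrative values only).
toy_base = [0.7, 0.2, 0.1]
toy_q = [0.5, 0.3, 0.2]
print(f"toy KLD              = {kl_divergence(toy_base, toy_q):.6f}")
```

The three derived values match the reported rows to the printed precision, so the summary is internally consistent. The percentile rows (99.9%, Median, 1.0%, and so on) would then be quantiles of such per-token values taken over every evaluated position.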