====== Perplexity statistics ======
Mean PPL(Q)                   : 7.730788 ± 0.049731
Mean PPL(base)                : 7.534124 ± 0.048206
Cor(ln(PPL(Q)), ln(PPL(base))): 99.61%
Mean ln(PPL(Q)/PPL(base))     : 0.025768 ± 0.000567
Mean PPL(Q)/PPL(base)         : 1.026103 ± 0.000582
Mean PPL(Q)-PPL(base)         : 0.196664 ± 0.004583

====== KL divergence statistics ======
Mean    KLD:  0.019576 ± 0.000122
Maximum KLD:  7.128579
99.9%   KLD:  0.530688
99.0%   KLD:  0.163194
Median  KLD:  0.011057
10.0%   KLD:  0.000745
 5.0%   KLD:  0.000248
 1.0%   KLD:  0.000038
Minimum KLD: -0.000096

====== Token probability statistics ======
Mean    Δp:  -0.494 ± 0.010 %
Maximum Δp:  60.888%
99.9%   Δp:  19.533%
99.0%   Δp:   8.890%
95.0%   Δp:   4.150%
90.0%   Δp:   2.365%
75.0%   Δp:   0.435%
Median  Δp:  -0.028%
25.0%   Δp:  -1.070%
10.0%   Δp:  -3.708%
 5.0%   Δp:  -6.092%
 1.0%   Δp: -14.010%
 0.1%   Δp: -33.372%
Minimum Δp: -79.450%
RMS Δp    :   3.873 ± 0.030 %
Same top p:  93.423 ± 0.065 %
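
The perplexity figures above can be cross-checked against each other. The following is a minimal sketch, not part of the tool's output: it assumes the reported ratio is the exponential of the mean log ratio and that the difference is a plain subtraction of the two means. The variable names are illustrative, and the values are copied from the statistics above.

```python
import math

# Values copied from the "Perplexity statistics" block above.
ppl_q = 7.730788           # Mean PPL(Q)
ppl_base = 7.534124        # Mean PPL(base)
mean_log_ratio = 0.025768  # Mean ln(PPL(Q)/PPL(base))

# Assumption: the reported ratio is exp() of the mean log ratio.
ratio_from_log = math.exp(mean_log_ratio)

# The same ratio computed directly from the two mean perplexities.
ratio_from_means = ppl_q / ppl_base

# Absolute perplexity increase of the quantized model over the base model.
ppl_diff = ppl_q - ppl_base

print(f"exp(mean log ratio): {ratio_from_log:.6f}")
print(f"PPL(Q)/PPL(base)   : {ratio_from_means:.6f}")
print(f"PPL(Q)-PPL(base)   : {ppl_diff:.6f}")
```

Both ratio estimates should agree with the reported 1.026103 to within rounding, which corresponds to a perplexity increase of roughly 2.6% for the quantized model.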