====== Perplexity statistics ======
Mean PPL(Q)                   :   8.131662 ±   0.051148
Mean PPL(base)                :   7.534124 ±   0.048206
Cor(ln(PPL(Q)), ln(PPL(base))):  98.65%
Mean ln(PPL(Q)/PPL(base))     :   0.076323 ±   0.001049
Mean PPL(Q)/PPL(base)         :   1.079311 ±   0.001132
Mean PPL(Q)-PPL(base)         :   0.597538 ±   0.008682

====== KL divergence statistics ======
Mean    KLD:   0.071229 ±   0.000357
Maximum KLD:   8.778745
99.9%   KLD:   1.718901
99.0%   KLD:   0.556955
Median  KLD:   0.043398
10.0%   KLD:   0.004498
 5.0%   KLD:   0.001689
 1.0%   KLD:   0.000269
Minimum KLD:   0.000002

====== Token probability statistics ======
Mean    Δp: -2.728 ± 0.019 %
Maximum Δp:  77.285%
99.9%   Δp:  27.656%
99.0%   Δp:  12.070%
95.0%   Δp:   4.429%
90.0%   Δp:   1.963%
75.0%   Δp:   0.065%
Median  Δp:  -0.636%
25.0%   Δp:  -4.334%
10.0%   Δp: -10.131%
 5.0%   Δp: -14.930%
 1.0%   Δp: -30.238%
 0.1%   Δp: -67.515%
Minimum Δp: -97.118%
RMS Δp    :  7.864 ± 0.049 %
Same top p: 88.252 ± 0.085 %
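
For readers unfamiliar with the three per-token quantities reported above, the following is a minimal sketch of how they are commonly defined. It is not taken from the tool that produced this log. The function names `log_softmax` and `token_stats` are hypothetical, and the definitions are assumptions: KLD as KL(base || Q) over the vocabulary, Δp as the change in probability assigned to the correct token, and "same top p" as agreement of the most likely token.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over a list of raw logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]

def token_stats(base_logits, q_logits, correct_id):
    """Per-token statistics for one position (assumed definitions).

    Returns (kld, delta_p, same_top) where
      kld      = sum_i p_base[i] * (ln p_base[i] - ln p_q[i])
      delta_p  = p_q[correct_id] - p_base[correct_id], as a fraction (not %)
      same_top = True if both models rank the same token highest
    """
    lb = log_softmax(base_logits)
    lq = log_softmax(q_logits)
    kld = sum(math.exp(b) * (b - q) for b, q in zip(lb, lq))
    delta_p = math.exp(lq[correct_id]) - math.exp(lb[correct_id])
    top_base = max(range(len(lb)), key=lb.__getitem__)
    top_q = max(range(len(lq)), key=lq.__getitem__)
    return kld, delta_p, top_base == top_q

if __name__ == "__main__":
    # Toy three-token vocabulary; the logits are made up for illustration.
    kld, delta_p, same_top = token_stats([2.0, 1.0, 0.0], [1.0, 2.0, 0.0], correct_id=0)
    print(f"KLD={kld:.6f}  delta_p={100 * delta_p:.3f}%  same_top={same_top}")
```

Under these assumptions, the "Mean", percentile, and "RMS" rows would be aggregates of such per-token values over the whole evaluation text, and "Same top p" the percentage of positions where `same_top` is true.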