====== Perplexity statistics ======
Mean PPL(Q)                   :   7.704878 ±   0.049402
Mean PPL(base)                :   7.534124 ±   0.048206
Cor(ln(PPL(Q)), ln(PPL(base))):  99.66%
Mean ln(PPL(Q)/PPL(base))     :   0.022411 ±   0.000529
Mean PPL(Q)/PPL(base)         :   1.022664 ±   0.000541
Mean PPL(Q)-PPL(base)         :   0.170754 ±   0.004204

====== KL divergence statistics ======
Mean    KLD:   0.017657 ±   0.000103
Maximum KLD:   3.435874
99.9%   KLD:   0.503178
99.0%   KLD:   0.146417
Median  KLD:   0.010035
10.0%   KLD:   0.000649
 5.0%   KLD:   0.000208
 1.0%   KLD:   0.000029
Minimum KLD:  -0.000210

====== Token probability statistics ======
Mean    Δp:  -0.512 ± 0.010 %
Maximum Δp:  62.473%
99.9%   Δp:  19.128%
99.0%   Δp:   8.391%
95.0%   Δp:   3.809%
90.0%   Δp:   2.168%
75.0%   Δp:   0.373%
Median  Δp:  -0.032%
25.0%   Δp:  -1.110%
10.0%   Δp:  -3.609%
 5.0%   Δp:  -5.771%
 1.0%   Δp: -12.795%
 0.1%   Δp: -33.912%
Minimum Δp: -77.756%
RMS Δp    :   3.707 ± 0.031 %
Same top p:  93.767 ± 0.064 %
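
The perplexity figures above can be cross-checked against each other. The short Python sketch below is not part of the tool's output; it assumes that the reported ratio is the exponential of the mean log-ratio and that the reported difference is a plain subtraction of the two mean perplexities. The variable and function names are hypothetical, and the numbers are copied by hand from the statistics above.

```python
import math

# Values copied by hand from the perplexity statistics (illustrative names).
ppl_q = 7.704878            # Mean PPL(Q)
ppl_base = 7.534124         # Mean PPL(base)
mean_ln_ratio = 0.022411    # Mean ln(PPL(Q)/PPL(base))
reported_ratio = 1.022664   # Mean PPL(Q)/PPL(base)
reported_diff = 0.170754    # Mean PPL(Q)-PPL(base)


def ratio_from_log(mean_log_ratio: float) -> float:
    """Turn a mean log-ratio back into a ratio (assumed relationship)."""
    return math.exp(mean_log_ratio)


# Recompute the derived quantities from the primary ones.
ratio = ratio_from_log(mean_ln_ratio)
diff = ppl_q - ppl_base

# Gaps between recomputed and reported values; small gaps suggest
# the rows are mutually consistent up to rounding.
ratio_gap = abs(ratio - reported_ratio)
diff_gap = abs(diff - reported_diff)

print(f"recomputed ratio      = {ratio:.6f} (gap {ratio_gap:.1e})")
print(f"recomputed difference = {diff:.6f} (gap {diff_gap:.1e})")
```

Both gaps fall within the rounding of the printed six decimal places, which is consistent with the quantized model's perplexity being roughly 2.3% higher than the base model's.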