====== Perplexity statistics ======
Mean PPL(Q)                   :   8.029877 ± 0.051646
Mean PPL(base)                :   7.534124 ± 0.048206
Cor(ln(PPL(Q)), ln(PPL(base))):  98.94%
Mean ln(PPL(Q)/PPL(base))     :   0.063727 ± 0.000933
Mean PPL(Q)/PPL(base)         :   1.065801 ± 0.000994
Mean PPL(Q)-PPL(base)         :   0.495753 ± 0.008026

====== KL divergence statistics ======
Mean    KLD:   0.051126 ± 0.000296
Maximum KLD:   6.852483
99.9%   KLD:   1.491291
99.0%   KLD:   0.440246
Median  KLD:   0.028683
10.0%   KLD:   0.002148
 5.0%   KLD:   0.000733
 1.0%   KLD:   0.000110
Minimum KLD:  -0.000059

====== Token probability statistics ======
Mean    Δp:  -1.501 ± 0.017 %
Maximum Δp:  70.128%
99.9%   Δp:  25.463%
99.0%   Δp:  12.370%
95.0%   Δp:   5.593%
90.0%   Δp:   2.967%
75.0%   Δp:   0.321%
Median  Δp:  -0.188%
25.0%   Δp:  -2.480%
10.0%   Δp:  -7.159%
 5.0%   Δp: -11.373%
 1.0%   Δp: -25.561%
 0.1%   Δp: -59.737%
Minimum Δp: -96.409%
RMS Δp    :   6.613 ± 0.047 %
Same top p:  89.569 ± 0.081 %
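
The perplexity rows can be cross-checked against each other. The following is a minimal sketch in Python, not part of the tool's output. It assumes that the ratio, log-ratio, and difference rows are derived from the same per-token log-probabilities as the two mean PPL values, so that they should agree up to the rounding of the printed digits. The variable names are illustrative only.

```python
import math

# Values copied from the "Perplexity statistics" block above.
ppl_q = 8.029877      # Mean PPL(Q)
ppl_base = 7.534124   # Mean PPL(base)

# Derived quantities, to compare with the reported rows.
ratio = ppl_q / ppl_base     # reported: Mean PPL(Q)/PPL(base)     = 1.065801
log_ratio = math.log(ratio)  # reported: Mean ln(PPL(Q)/PPL(base)) = 0.063727
diff = ppl_q - ppl_base      # reported: Mean PPL(Q)-PPL(base)     = 0.495753

print(f"ratio     = {ratio:.6f}")
print(f"log ratio = {log_ratio:.6f}")
print(f"diff      = {diff:.6f}")
```

Under that assumption the three derived values match the reported rows to within rounding, which suggests the block was transcribed without digit errors.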