Performance
The following table summarizes the performance results (perplexity, model size, and run time for single-token prediction). It is modeled after the corresponding table on the main page.

Model | Measure | F16 | Q2_K | Q3_K_S | Q3_K_M | Q3_K_L | Q4_K_S | Q4_K_M | Q5_K_S | Q5_K_M | Q6_K |
---|---|---|---|---|---|---|---|---|---|---|---|
7B | perplexity | 5.9066 | 6.7764 | 6.4571 | 6.1503 | 6.0869 | 6.0215 | 5.9601 | 5.9419 | 5.9208 | 5.9110 |
7B | file size | 13.0G | 2.67G | 2.75G | 3.06G | 3.35G | 3.56G | 3.80G | 4.33G | 4.45G | 5.15G |
7B | ms/tok@4th, M2 Max | 116 | 56 | 81 | 69 | 76 | 50 | 55 | 70 | 71 | 75 |
7B | ms/tok@8th, M2 Max | 111 | 36 | 46 | 36 | 46 | 36 | 40 | 44 | 46 | 51 |
7B | ms/tok@4th, RTX-4080 | 60 | 15.5 | 18.6 | 17.0 | 17.7 | 15.5 | 16.0 | 16.7 | 16.9 | 18.3 |
7B | ms/tok@4th, Ryzen7950X | 214 | 57 | 58 | 61 | 67 | 68 | 71 | 81 | 82 | 93 |
13B | perplexity | 5.2543 | 5.8545 | 5.6033 | 5.4498 | 5.4063 | 5.3404 | 5.3002 | 5.2785 | 5.2638 | 5.2568 |
13B | file size | 25.0G | 5.13G | 5.27G | 5.88G | 6.45G | 6.80G | 7.32G | 8.36G | 8.60G | 9.95G |
13B | ms/tok@4th, M2 Max | 216 | 103 | 156 | 148 | 144 | 95 | 102 | 132 | 134 | 142 |
13B | ms/tok@8th, M2 Max | 213 | 67 | 83 | 77 | 83 | 68 | 73 | 81 | 84 | 95 |
13B | ms/tok@4th, RTX-4080 | - | 25.3 | 29.2 | 29.3 | 25.5 | 26.2 | 26.2 | 28.6 | 28.9 | 30.0 |
13B | ms/tok@4th, Ryzen7950X | 414 | 109 | 113 | 118 | 129 | 130 | 137 | 156 | 161 | 180 |

The same results for a subset of the quantization types:

Model | Measure | F16 | Q2_K | Q3_K_M | Q4_K_S | Q5_K_S | Q6_K |
---|---|---|---|---|---|---|---|
7B | perplexity | 5.9066 | 6.7764 | 6.1503 | 6.0215 | 5.9419 | 5.9110 |
7B | file size | 13.0G | 2.67G | 3.06G | 3.56G | 4.33G | 5.15G |
7B | ms/tok @ 4th, M2 Max | 116 | 56 | 69 | 50 | 70 | 75 |
7B | ms/tok @ 8th, M2 Max | 111 | 36 | 36 | 36 | 44 | 51 |
7B | ms/tok @ 4th, RTX-4080 | 60 | 15.5 | 17.0 | 15.5 | 16.7 | 18.3 |
7B | ms/tok @ 4th, Ryzen | 214 | 57 | 61 | 68 | 81 | 93 |
13B | perplexity | 5.2543 | 5.8545 | 5.4498 | 5.3404 | 5.2785 | 5.2568 |
13B | file size | 25.0G | 5.13G | 5.88G | 6.80G | 8.36G | 9.95G |
13B | ms/tok @ 4th, M2 Max | 216 | 103 | 148 | 95 | 132 | 142 |
13B | ms/tok @ 8th, M2 Max | 213 | 67 | 77 | 68 | 81 | 95 |
13B | ms/tok @ 4th, RTX-4080 | - | 25.3 | 29.3 | 26.2 | 28.6 | 30.0 |
13B | ms/tok @ 4th, Ryzen | 414 | 109 | 118 | 130 | 156 | 180 |
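
To make the trade-offs in the table easier to compare, the following is a minimal sketch that derives three relative figures from the 7B rows: the perplexity increase over F16, the file size as a fraction of F16, and tokens per second (1000 divided by ms/tok) on the RTX-4080. The numbers are copied from the table above; the function name `summarize` and the dictionary keys are illustrative choices, not part of any existing tool.

```python
# Values copied from the 7B rows of the table above (subset of quantization types).
QUANTS = ["F16", "Q2_K", "Q3_K_M", "Q4_K_S", "Q5_K_S", "Q6_K"]
PERPLEXITY = [5.9066, 6.7764, 6.1503, 6.0215, 5.9419, 5.9110]
FILE_SIZE_G = [13.0, 2.67, 3.06, 3.56, 4.33, 5.15]
MS_PER_TOKEN_RTX4080 = [60, 15.5, 17.0, 15.5, 16.7, 18.3]


def summarize(quants, perplexity, size_g, ms_per_tok):
    """Return per-quantization metrics relative to the first entry (the F16 baseline).

    `summarize` is a hypothetical helper written for this illustration.
    """
    base_ppl = perplexity[0]
    base_size = size_g[0]
    rows = []
    for name, ppl, size, ms in zip(quants, perplexity, size_g, ms_per_tok):
        rows.append({
            "quant": name,
            # Relative perplexity increase over F16, in percent.
            "ppl_increase_pct": 100.0 * (ppl - base_ppl) / base_ppl,
            # File size as a fraction of the F16 file size.
            "size_ratio": size / base_size,
            # ms/tok converted to tokens per second.
            "tokens_per_s": 1000.0 / ms,
        })
    return rows


if __name__ == "__main__":
    for row in summarize(QUANTS, PERPLEXITY, FILE_SIZE_G, MS_PER_TOKEN_RTX4080):
        print(
            f"{row['quant']:<7} "
            f"ppl +{row['ppl_increase_pct']:.2f}%  "
            f"size x{row['size_ratio']:.2f}  "
            f"{row['tokens_per_s']:.1f} tok/s"
        )
```

The same function can be applied to the 13B rows or to another hardware row by swapping in the corresponding lists, as long as F16 stays the first entry and has a measured value (the 13B RTX-4080 row has none for F16).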