Performance scores on KRIS-Bench are grouped by three knowledge types (Factual, Conceptual, and Procedural), each broken down into its reasoning dimensions.
| # | Model | Organization | Factual: Attribute Perception | Factual: Spatial Perception | Factual: Temporal Perception | Factual: Average | Conceptual: Social Science | Conceptual: Natural Science | Conceptual: Average | Procedural: Logical Reasoning | Procedural: Instruction Decomposition | Procedural: Average | Overall Score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | GPT-4o | OpenAI | 83.17 | 79.08 | 68.25 | 79.80 | 85.50 | 80.06 | 81.37 | 71.56 | 85.08 | 78.32 | 80.09 |
| 2 | Gemini 2.0 | | 66.33 | 63.33 | 63.92 | 65.26 | 68.19 | 56.94 | 59.65 | 54.13 | 71.67 | 62.90 | 62.41 |
| 3 | Doubao | ByteDance Doubao | 70.92 | 59.17 | 40.58 | 63.30 | 65.50 | 61.19 | 62.23 | 47.75 | 60.58 | 54.17 | 60.70 |
| 4 | BAGEL-Think | ByteDance Seed | 66.42 | 67.75 | 0.00 | 55.77 | 59.63 | 59.38 | 59.44 | 51.19 | 27.33 | 39.26 | 53.36 |
| 5 | BAGEL | ByteDance Seed | 58.08 | 54.50 | 0.00 | 47.71 | 52.69 | 52.00 | 52.17 | 49.63 | 30.83 | 40.23 | 47.76 |
| 6 | Step1X-Edit | StepFun | 55.50 | 51.75 | 0.00 | 45.52 | 44.69 | 49.06 | 48.01 | 40.88 | 22.75 | 31.82 | 43.29 |
| 7 | Emu2 | BAAI | 51.50 | 48.83 | 22.17 | 45.40 | 34.69 | 38.44 | 37.54 | 24.81 | 45.00 | 34.91 | 39.70 |
| 8 | AnyEdit | ZJU | 47.67 | 45.17 | 0.00 | 39.26 | 38.56 | 42.94 | 41.88 | 36.56 | 26.92 | 31.74 | 38.55 |
| 9 | MagicBrush | OSU | 53.92 | 39.58 | 0.00 | 41.84 | 42.94 | 38.06 | 39.24 | 30.00 | 23.08 | 26.54 | 37.15 |
| 10 | OmniGen | BAAI | 37.92 | 28.25 | 21.83 | 33.11 | 30.63 | 27.19 | 28.02 | 11.94 | 35.83 | 23.89 | 28.85 |
| 11 | InstructPix2Pix | UCB | 30.33 | 21.33 | 0.00 | 23.33 | 22.56 | 26.56 | 25.59 | 19.81 | 14.75 | 17.28 | 22.82 |
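The per-type averages in the table are not all computed the same way. For every model, the Procedural average equals the plain mean of Logical Reasoning and Instruction Decomposition (up to two-decimal rounding), whereas the Factual and Conceptual averages differ from the plain mean of their sub-dimensions and are presumably weighted; the weights are not stated on this page. The sketch below checks this using values copied by hand from the table. The helper names (`unweighted_mean`, `matches_reported`, `procedural_mismatches`) are hypothetical and are not part of any official KRIS-Bench tooling.

```python
"""Check how KRIS-Bench per-type averages relate to their sub-dimension scores.

Values are copied by hand from the leaderboard table; this is an illustrative
sketch, not official KRIS-Bench evaluation code.
"""

# (Logical Reasoning, Instruction Decomposition, reported Procedural Average)
PROCEDURAL = {
    "GPT-4o": (71.56, 85.08, 78.32),
    "Gemini 2.0": (54.13, 71.67, 62.90),
    "Doubao": (47.75, 60.58, 54.17),
    "BAGEL-Think": (51.19, 27.33, 39.26),
    "BAGEL": (49.63, 30.83, 40.23),
    "Step1X-Edit": (40.88, 22.75, 31.82),
    "Emu2": (24.81, 45.00, 34.91),
    "AnyEdit": (36.56, 26.92, 31.74),
    "MagicBrush": (30.00, 23.08, 26.54),
    "OmniGen": (11.94, 35.83, 23.89),
    "InstructPix2Pix": (19.81, 14.75, 17.28),
}

# GPT-4o factual scores: (Attribute, Spatial, Temporal, reported Factual Average)
GPT4O_FACTUAL = (83.17, 79.08, 68.25, 79.80)


def unweighted_mean(scores):
    """Arithmetic mean that counts every sub-dimension equally."""
    return sum(scores) / len(scores)


def matches_reported(computed, reported, tol=0.01):
    """True if `computed` agrees with a two-decimal reported value within `tol`."""
    return abs(computed - reported) <= tol


def procedural_mismatches():
    """Models whose reported Procedural average is NOT the plain mean of its parts."""
    return [
        name
        for name, (logic, decomp, reported) in PROCEDURAL.items()
        if not matches_reported(unweighted_mean((logic, decomp)), reported)
    ]


if __name__ == "__main__":
    # Every model's Procedural average matches the plain mean, so this list is empty.
    print("Procedural mismatches:", procedural_mismatches())
    # The Factual average does not match the plain mean of its three dimensions.
    *parts, reported = GPT4O_FACTUAL
    plain = unweighted_mean(parts)
    print(f"GPT-4o factual: plain mean {plain:.2f} vs reported {reported:.2f}")
```

For GPT-4o, the plain mean of the three Factual dimensions is about 76.83, roughly three points below the reported 79.80, so comparing models on the Factual or Conceptual average is not the same as comparing them on an equal-weight mean of the sub-dimensions.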
Task-Level Performance

Note: If you would like to submit your results, please contact us at yongliang0223@gmail.com.