ARC-AGI-2 seems to be solved
https://supergok.com/arc-agi-2-benchmark-gpt-5-2-poetiq-results/
so scaffolding on top of a base model can increase its effective intelligence a lot
and a stronger base model also leads to better performance
inputs are the problem, sample data, a scorer, a base model, and a code evolution algorithm that calls the base model.
output is code that can solve the puzzles
the stronger the model, the more efficient and accurate the code evolution component
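
a rough sketch of the loop described above, to make the inputs and output concrete. this is not the actual system from the link, just a minimal illustration under assumptions: `score`, `evolve`, and `make_stub_model` are hypothetical names, the "base model" is a stub that cycles through a fixed pool of candidate programs instead of calling an LLM, and the evolution step is plain greedy hill climbing.

```python
import itertools
from typing import Callable, List, Tuple

Grid = List[List[int]]
Sample = Tuple[Grid, Grid]  # (input grid, expected output grid)
Propose = Callable[[str, str, float], str]


def score(source: str, samples: List[Sample]) -> float:
    """Scorer: fraction of sample pairs that the candidate's solve() reproduces.

    Any error while loading or running the candidate scores 0.0.
    """
    namespace: dict = {}
    try:
        # exec is unsafe for untrusted code; a real system would sandbox this.
        exec(source, namespace)
        solve = namespace["solve"]
        hits = sum(1 for grid_in, grid_out in samples if solve(grid_in) == grid_out)
    except Exception:
        return 0.0
    return hits / len(samples)


def make_stub_model(pool: List[str]) -> Propose:
    """Stand-in for the base model (assumption): cycles through fixed candidates.

    A real proposer would prompt an LLM with the problem, the parent code,
    and the parent's score, and return new source code.
    """
    candidates = itertools.cycle(pool)

    def propose(problem: str, parent_source: str, parent_score: float) -> str:
        return next(candidates)

    return propose


def evolve(problem: str, samples: List[Sample], propose: Propose,
           generations: int = 10) -> Tuple[str, float]:
    """Greedy code evolution: keep a candidate only if it scores strictly higher."""
    best_source = "def solve(grid):\n    return grid\n"  # start from identity
    best_score = score(best_source, samples)
    for _ in range(generations):
        if best_score == 1.0:
            break  # all samples solved
        candidate = propose(problem, best_source, best_score)
        candidate_score = score(candidate, samples)
        if candidate_score > best_score:
            best_source, best_score = candidate, candidate_score
    return best_source, best_score


# Toy puzzle (made up for illustration): mirror every row of the grid.
SAMPLES: List[Sample] = [
    ([[1, 2], [3, 4]], [[2, 1], [4, 3]]),
    ([[0, 5, 6]], [[6, 5, 0]]),
]
POOL = [
    "def solve(grid):\n    return [list(r) for r in zip(*grid)]\n",  # transpose
    "def solve(grid):\n    return grid[::-1]\n",                     # reverse row order
    "def solve(grid):\n    return [row[::-1] for row in grid]\n",    # mirror each row
]

best_source, best_score = evolve("mirror each row", SAMPLES, make_stub_model(POOL))
print(best_score)
print(best_source)
```

in this sketch a stronger model corresponds to a `propose` that returns a correct program in fewer generations, so fewer scorer calls are spent on failed candidates.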