Test-driven agent development?

So the question is: how do we ensure the agent consistently behaves as intended?

For that, we need tests.

How do we test?

Write each test case as a prompt (plus whatever context the agent needs) and the expected final output.
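A minimal sketch of what one such test case could look like, assuming the agent can be wrapped as a plain function from (prompt, context) to its final output text. `AgentTestCase`, `check`, and `fake_agent` are hypothetical names for illustration, not an existing library, and the normalized string comparison is just one possible way to decide "matches the expected output".

```python
from dataclasses import dataclass, field
from typing import Callable


# Hypothetical shape of one test case: what the agent sees, and what it should end with.
@dataclass
class AgentTestCase:
    name: str
    prompt: str
    context: dict = field(default_factory=dict)  # e.g. prior messages, retrieved docs, tool state
    expected_output: str = ""


# Assumption: an "agent" is any callable taking (prompt, context) and returning final output text.
Agent = Callable[[str, dict], str]


def normalize(text: str) -> str:
    # Collapse whitespace and lowercase so trivial formatting differences don't fail a test.
    return " ".join(text.split()).lower()


def check(case: AgentTestCase, agent: Agent) -> bool:
    """Run the agent once on the case and compare its final output to the expected one."""
    actual = agent(case.prompt, case.context)
    return normalize(actual) == normalize(case.expected_output)


# Toy stand-in agent so the sketch runs without a model: it answers from a lookup table in the context.
def fake_agent(prompt: str, context: dict) -> str:
    return context.get("answers", {}).get(prompt, "I don't know")


case = AgentTestCase(
    name="capital_of_france",
    prompt="What is the capital of France?",
    context={"answers": {"What is the capital of France?": "Paris"}},
    expected_output="paris",
)
print(check(case, fake_agent))  # True
```

Exact (normalized) matching is the simplest check; for free-form answers it may be too strict, and a looser comparison would slot into `check` without changing the test-case shape.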

How should test cases be organized, and how should the tests be run?
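One possible answer, sketched under stated assumptions: keep cases as data (here one JSON object per line, as if read from a hypothetical `tests/cases.jsonl`) so they can be grouped by file and reviewed in version control, and run each case several times, since the goal is *consistent* behavior and a single pass says little about a non-deterministic agent. `load_cases`, `run_suite`, `toy_agent`, and `runs_per_case` are illustrative names, not an existing tool.

```python
import json
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class AgentTestCase:
    name: str
    prompt: str
    context: dict = field(default_factory=dict)
    expected_output: str = ""


# Assumed layout: JSONL, one test case per line (inlined here so the sketch is self-contained).
CASES_JSONL = """
{"name": "greeting", "prompt": "hi", "expected_output": "hello"}
{"name": "farewell", "prompt": "bye", "expected_output": "goodbye"}
"""


def load_cases(text: str) -> list[AgentTestCase]:
    # Skip blank lines; each remaining line becomes one test case.
    return [AgentTestCase(**json.loads(line)) for line in text.splitlines() if line.strip()]


def run_suite(
    cases: list[AgentTestCase],
    agent: Callable[[str, dict], str],
    runs_per_case: int = 5,
) -> dict[str, float]:
    """Run every case several times and return its pass rate (0.0 to 1.0) by case name."""
    results = {}
    for case in cases:
        passes = sum(
            agent(case.prompt, case.context).strip() == case.expected_output.strip()
            for _ in range(runs_per_case)
        )
        results[case.name] = passes / runs_per_case
    return results


# Toy deterministic agent standing in for a real one; it gets "farewell" wrong on purpose.
def toy_agent(prompt: str, context: dict) -> str:
    return {"hi": "hello", "bye": "see you"}.get(prompt, "")


report = run_suite(load_cases(CASES_JSONL), toy_agent)
for name, rate in report.items():
    print(f"{name}: {rate:.0%} pass")
# greeting: 100% pass
# farewell: 0% pass
```

Reporting a pass rate rather than a single pass/fail lets a suite set a threshold per case (e.g. "must pass 5/5" vs. "at least 4/5"); the same cases could equally be fed into an existing test runner's parametrization instead of a custom loop.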