data + code is already intelligence

but that’s digital intelligence

now multimodal llms bring two more kinds: recognition intelligence and instruction-following intelligence

1) recognition means it can understand text, images and video, and extract the key information from that raw input. so now it can digitize the world.

2) instruction following means it understands a text prompt and translates it into detailed, accurate execution steps.

so when you combine an llm with tools (code execution, function calls, etc.)

it can generate accurate code from ambiguous, high-level natural language

it can generate intelligence on the fly.

and then it can run the code

so today it’s already intelligent.

the main gaps today are self-learning and sophistication

self-learning means it can act like a student who knows where the information and knowledge are, proactively reads them to gain understanding and knowledge, and then gains experience through exercises.

sophistication means it can quickly recognize when it is on the wrong path and correct itself, so it can carry out sophisticated tasks end to end.
