Despite the advances in artificial intelligence across many areas, these technologies still show significant limitations. According to a recent study, vision language models continue to fail at basic visual acuity tasks. Researchers from Auburn University and the University of Alberta have shown that models such as GPT-4o, Gemini, and Claude Sonnet fall well short of human performance on tests that are trivial for most people.

The study, pointedly titled "Vision Language Models Are Blind," highlights the deficiencies of these systems. The researchers designed simple visual acuity tests, such as counting how many times two colored lines intersect or identifying which letter in a word has been circled. Surprisingly, these tasks, which even a young child could perform with ease, prove to be a significant challenge for artificial intelligence.
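To make the flavor of these tests concrete, here is a minimal sketch, not the authors' code, that generates the kind of stimulus the paper describes: two colored polylines on a blank canvas, plus a geometrically computed ground-truth intersection count against which a model's answer could be checked. The point counts, colors, and canvas size are arbitrary illustrative choices.

```python
# Illustrative sketch (not the study's code): draw two colored polylines and
# compute the true number of crossings, the kind of question the models failed.
import random
import matplotlib.pyplot as plt


def segments_intersect(p1, p2, q1, q2):
    """Return True if segment p1-p2 properly crosses segment q1-q2."""
    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])
    d1 = cross(q1, q2, p1)
    d2 = cross(q1, q2, p2)
    d3 = cross(p1, p2, q1)
    d4 = cross(p1, p2, q2)
    return d1 * d2 < 0 and d3 * d4 < 0


def count_intersections(line_a, line_b):
    """Count crossings between two polylines given as lists of (x, y) points."""
    return sum(
        segments_intersect(line_a[i], line_a[i + 1], line_b[j], line_b[j + 1])
        for i in range(len(line_a) - 1)
        for j in range(len(line_b) - 1)
    )


random.seed(0)
# Two random polylines, each defined by four points across a 0-10 canvas.
red = [(x, random.uniform(0, 10)) for x in (0, 3.3, 6.6, 10)]
blue = [(x, random.uniform(0, 10)) for x in (0, 3.3, 6.6, 10)]

print("Ground-truth intersections:", count_intersections(red, blue))

plt.plot(*zip(*red), color="red", linewidth=2)
plt.plot(*zip(*blue), color="blue", linewidth=2)
plt.axis("off")
plt.savefig("intersection_test.png", dpi=150)
```

Because the ground truth follows from simple geometry, images like this can be generated and scored automatically at scale.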

Limitations

The study's results are clear: none of the evaluated AI models achieved 100% accuracy on the proposed tests, even though a sighted person would be expected to answer them almost without error. This shows that, although AI capabilities have improved significantly in many areas, visual acuity remains a weak point.

The study emphasizes that the proposed tasks were not especially complex. Even so, models such as GPT-4o, Gemini, and Claude Sonnet struggled with these basic problems. This suggests that, despite advances in natural language processing and text generation, AI systems still have a long way to go in visual comprehension and analysis.
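For a sense of how such a question is actually put to one of these models, the following sketch sends the image generated above to GPT-4o through the OpenAI Python SDK and asks for the crossing count. The prompt wording, image path, and scoring are illustrative assumptions rather than the study's exact protocol.

```python
# Hypothetical sketch: posing the intersection-counting question to GPT-4o via
# the OpenAI Python SDK. Prompt and file path are illustrative; the study's
# exact prompts and evaluation pipeline may differ.
import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

with open("intersection_test.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "How many times do the red and blue lines intersect? "
                     "Answer with a single number."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
        ],
    }],
)

print(response.choices[0].message.content)  # compare against the ground truth above
```

Comparing the model's single-number answer with the geometric ground truth from the previous sketch is all that is needed to score this kind of test.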

The researchers indicate that this limitation may be related to how the models are trained. They are currently trained on large amounts of textual and visual data, yet they may lack the ability to interpret and understand visual context the way humans do. This gap in perception and understanding could explain why AI fails at visual acuity tests that are trivial for people.

Future Implications

The implications of these findings are significant. In a world where AI is increasingly integrated into our daily lives, from virtual assistants to security systems and medical diagnostics, understanding its limitations is crucial. Visual acuity is a fundamental skill in many practical applications, and the inability of AI models to match human capacity in this area could limit their effectiveness in certain contexts.

Researchers from Auburn University and the University of Alberta suggest that new methodologies should be explored to train AI models to improve their visual comprehension abilities. This could involve developing more sophisticated algorithms that more closely mimic how humans process and understand visual information.

Moreover, this study highlights the importance of interdisciplinary collaboration in the development of AI technologies. Integrating knowledge from fields such as neuroscience and cognitive psychology could provide new perspectives and approaches to overcome the current limitations of visual language models.