Recent research has unveiled a surprising set of challenges that artificial intelligence (AI) faces in performing tasks that most humans can accomplish with ease, such as reading an analogue clock and determining the day of the week for a specific date. Despite AI's remarkable capabilities in areas like coding, generating lifelike images, and crafting human-like text, it consistently falters when it comes to interpreting the position of clock hands and performing basic arithmetic for calendar dates.
The study was presented at the 2025 International Conference on Learning Representations (ICLR) and posted March 18 on the preprint server arXiv; it has not yet been peer reviewed. Led by researcher Rohit Saxena from the University of Edinburgh, the work emphasizes a significant disparity between human and AI capabilities on tasks that most people find straightforward. "Our findings highlight a significant gap in the ability of AI to carry out what are quite basic skills for people," Saxena stated.
These shortcomings in AI's timekeeping abilities pose critical concerns for its integration into real-world applications, particularly in time-sensitive environments such as scheduling, automation, and assistive technologies. The research team conducted an investigation into AI's proficiency in reading clocks and calendars by feeding a custom dataset of images into various multimodal large language models (MLLMs) capable of processing both visual and textual information. The models analyzed included Meta's Llama 3.2-Vision, Anthropic's Claude-3.5 Sonnet, Google's Gemini 2.0, and OpenAI's GPT-4o.
The results were disappointing: the AI models failed to read the time from clock images or to name the day of the week for a given date more often than not. Across the test set, the systems identified the correct time only 38.7% of the time and the correct calendar date a mere 26.3% of the time.
The researchers attribute AI's poor performance on timekeeping tasks to the nature of its training. Unlike tasks that can be learned directly from labeled examples, clock reading demands a different skill set, namely spatial reasoning. "The model has to detect overlapping hands, measure angles, and navigate diverse designs like Roman numerals or stylized dials," Saxena explained. While AI can recognize that an image depicts a clock, understanding how to read it remains a challenge.
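The geometry involved is not itself difficult; once the hand angles are known, converting them to a time is a few lines of arithmetic. As a point of contrast with what the models struggle to do implicitly, here is a minimal sketch (the function name and the assumption that angles are already extracted from the image are illustrative, not from the study):

```python
def time_from_angles(hour_angle_deg: float, minute_angle_deg: float) -> tuple[int, int]:
    """Convert clockwise hand angles (degrees from 12 o'clock) to (hour, minute).

    Assumes the angles have already been extracted from the image --
    the perception step that is precisely where MLLMs falter.
    """
    # The minute hand sweeps 360 degrees per hour: 6 degrees per minute.
    minute = round(minute_angle_deg / 6) % 60
    # The hour hand sweeps 30 degrees per hour and drifts 0.5 degrees
    # per minute, so subtract that drift before rounding.
    hour = round((hour_angle_deg - minute * 0.5) / 30) % 12
    return (hour if hour != 0 else 12, minute)
```

For example, a minute hand at 90 degrees and an hour hand at 187.5 degrees decodes to 6:15. The deterministic half of the task is trivial; the failure lies in turning pixels into those two angles.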
The difficulties extend to calendar-related tasks as well. When presented with challenges such as determining the day for the 153rd day of the year, AI systems displayed a similarly high failure rate. Saxena pointed out that although arithmetic is a fundamental aspect of computing, AI models utilize a different approach. "AI doesn't run math algorithms; it predicts outputs based on patterns it sees in training data," he noted. This inconsistency in reasoning highlights a significant gap in AI's capabilities.
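The date arithmetic the models fail to emulate is likewise easy to state as an algorithm. A minimal Python sketch of the "153rd day of the year" task (the function name is illustrative; the study's prompts were posed in natural language, not code):

```python
from datetime import date, timedelta

def weekday_of_day_of_year(year: int, day_of_year: int) -> str:
    # Start from January 1 and step forward; timedelta handles
    # month lengths and leap years deterministically.
    d = date(year, 1, 1) + timedelta(days=day_of_year - 1)
    return d.strftime("%A")

print(weekday_of_day_of_year(2025, 153))  # June 2, 2025 -> "Monday"
```

A calculator running this algorithm gets the answer right every time; a language model predicting the most likely next token does not.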
This research adds to a growing body of literature that underscores the differences between human and AI understanding. While AI models excel when trained on ample examples, they struggle to generalize or apply abstract reasoning. What may seem like a simple task for humans—such as reading a clock—can be exceedingly difficult for AI systems.
Moreover, the research indicates that AI's limitations are exacerbated when it is trained on limited data, particularly regarding rare events like leap years or obscure calendar calculations. Even though large language models (LLMs) might have access to numerous explanations related to leap years, they often fail to make the necessary connections for completing visual tasks effectively.
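The leap-year rule itself illustrates the gap: the Gregorian rule fits in one line of code, yet a model that has read many prose explanations of it may still fail to apply it within a visual or multi-step task. A standard formulation (not code from the study):

```python
def is_leap(year: int) -> bool:
    # Gregorian rule: every 4th year is a leap year, except
    # century years, unless the century is divisible by 400.
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)
```

So 2024 is a leap year, 1900 is not, and 2000 is, exactly the edge cases where pattern-matching on sparse training examples tends to break down.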
Ultimately, this research highlights the necessity for more targeted examples in AI training datasets and calls for a reevaluation of how AI systems handle tasks that require a combination of logical and spatial reasoning. As Saxena cautions, "AI is powerful, but when tasks mix perception with precise reasoning, we still need rigorous testing, fallback logic, and in many cases, a human in the loop."