Why teams miss system signals

Hanisha Arora Jun 3, 2026 bug-stories

Signal ignorance is not when your system stops talking. It is when your team stops listening to it without realising.

It does not announce itself. It builds gradually behind green pipelines, merged PRs, and teams that feel productive. By the time you notice it, it has usually been happening for weeks. This is what I watched happen. The strange part was that nothing looked obviously broken at first. Most metrics were improving. Delivery looked faster. Coverage looked higher. Productivity looked healthy.

In this article, I will explain how I learned that AI (artificial intelligence) can quietly change team behaviour, how organisations start missing important signals, and what we changed once we realised that speed was hiding how little we understood our problems.

How AI changed how we work

I was building an autonomous setup for my team. The idea was straightforward. Cover the obvious ground early so later-stage testing becomes faster, more focused, and less painful. 

So I built an agent for the testing team to identify biases and cases they might miss, and then used that list of cases with agent.md. I also attached existing test cases to an agent trained on business scenarios and let it help developers write test cases as they built. The goal was to make things faster and better from the start.

It worked at first. Developers were shipping faster, test cases were getting generated automatically, and regression cycles looked healthier on paper. Then, a production issue escaped that should have been caught easily. The strange part was that nothing around it looked broken. Pipelines were green. Coverage had increased. Reviews had happened. The AI-generated scenarios had all passed. The issue itself was very basic. The new feature was expected to work without a login, but nobody had tested that flow. Sounds like a simple case of bad testing, right? That’s exactly what caught my attention.

In a later discussion, someone raised the concern that testing was slowing development. That was when I started noticing the real issue. Testing was not the problem. Our development system had quietly stopped questioning itself. It had become good at validating known behaviour, while unknown behaviour was going unexplored. 

The problem wasn’t that the agent was poorly trained or outdated. We were continuously improving prompts, attaching business context, and expanding scenarios over time. The real limitation was that the system could only reason within behaviours that had already been identified. It could identify new cross-linked flows. What it could not naturally do was question assumptions nobody had surfaced yet, explore interactions nobody had explicitly mapped, or understand behavioural changes that existed only implicitly inside someone's head. When you add a completely new feature, for example, the system won’t know about it.

Over time, generated test cases stopped being a support and became the boundary of what people explored. Teams were not deliberately handing decision-making to AI, but attention naturally drifted toward the scenarios it generated. If the AI generated a flow, it got tested. If a scenario or risk wasn't covered, we didn't notice, because all our attention was on what was in front of us. The team was becoming overly dependent on AI to do the work. Coverage increased continuously as more known scenarios were validated. Yet production bugs, repeated regressions, and unexpected side effects continued to appear. We were getting better at testing what we already understood, but not necessarily better at discovering what we had missed.

Shortly afterwards, I was called to a meeting where someone raised a concern that testing had become slow. But when I checked the system, the usual indicators were healthy.

  • PRs were merging
  • Pipelines were green
  • Release timelines were still within acceptable ranges
  • Deployments were moving regularly

Yet production bugs that should never have escaped validation were reaching users.

One example stayed with me. A new feature had been developed as a completely isolated workflow. The generated test cases covered the intended behaviour, regression testing passed, reviews were completed, and no risks were suggested. A few days after release, an unrelated workflow started failing in production. The issue was not in the feature itself. It was an assumption shared between the new workflow and an existing part of the system that nobody had considered during validation. The generated tests had validated exactly what they were asked to validate. The missing question was whether the change could affect something nobody was looking at or thinking about.

I checked with the team. Someone was using AI to generate significantly more test scenarios than they normally would. Someone else was using it to speed up regression cycles. PR changes were being summarised through AI to accelerate reviews. Developers were shipping code much faster than before. Every layer looked like it was improving.

That was the problem.

AI had been adopted everywhere to improve speed, and because the results were consistently useful, the organisation gradually stopped questioning the output deeply. That’s a basic human instinct, and the system simply made the effort of questioning feel less necessary over time. This is where we started losing signals that the system was already sending very clearly.

Most signals were not exactly failures. They were repeated regressions, lengthening validation cycles, fixes that created unrelated bugs, and questions that teams slowly stopped asking because the generated output already appeared confident enough.

Healthy pre-production metrics can be misleading

I was told testing was slowing things down. It was not. The amount of uncertainty entering each release had increased faster than the team’s ability to understand it.

At first, nobody noticed because everything looked productive. AI was helping developers move faster, reviews were happening more quickly, and more code was being merged continuously. From the outside, the system looked healthier than before.

Production disagreed.

The failures were not invisible. Testing cycles were increasing, bugs were returning faster after fixes, regressions were appearing in unrelated areas, testers were exhausted, and production issues were becoming harder to predict. The signals existed everywhere, but the interpretation was skewed toward success.

A faster fix looked like an improvement. Higher throughput looked like efficiency. Increased coverage looked like confidence. The problem was that the success metrics had never been updated to account for what was actually changing beneath the system.

Teams continued to measure delivery speed, PR throughput, and coverage growth as indicators of engineering health, but none of those metrics indicated whether the system was still understandable.

What AI confidence actually costs

Generated test cases were usually reasonable enough that teams stopped exploring outside them. That had changed developer and tester behaviour in subtle ways.

Testing is not just scenario execution. A large part of the job is questioning assumptions, mapping unknowns, exploring strange interactions, and finding things nobody explicitly thought to validate. Earlier, testers spent significant time understanding systems, tracing impacts, and exploring edge cases because that way of thinking catches problems that no generated system would naturally look for. Now the system expected them to validate far more change in nearly the same amount of time.

AI helped, but with a hidden tradeoff. It had reduced execution effort, not uncertainty.

This new way of testing had been optimised around known patterns. Production systems were rarely failing within the known patterns. Meanwhile, testers were cycling through more releases, more fixes, more regressions and more retesting. Fatigue increased, which reduced exploration. The work gradually became operational rather than investigative.

Another problem quietly appeared within this cycle. When AI is used to test faster, it also changes what gets missed. AI confidently validated expected behaviour, and over time, exhausted teams started trusting that coverage more than they realised. This is what happens when someone tries something and it works, then tries it again and it still works: you start to trust its output. The scenarios testers would normally chase got skipped unconsciously, not because people stopped caring, but because the amount of change had increased while the available thinking time had decreased.

Over time, generated confidence reduced the amount of doubt entering validation. This did not just affect testers. The same behaviour appeared across the entire development lifecycle. Once exploration was reduced, teams stopped noticing an entire category of signals. It wasn’t just bugs that went unnoticed, but patterns.

Signal ignorance: The signals were always there

Testing cycles were increasing, regressions were spreading, testing scope was becoming unclear, fixes were creating unrelated failures, and production incidents were becoming harder to trace. Our existing metrics and workflows had trained teams to interpret these as speed problems instead of understanding problems.

That led to a predictable response. I started adding more automation. At the time, it felt obvious. If testing was becoming slower, then removing more human effort should help. More scenarios could be generated, more validation could happen earlier, and more obvious issues could be caught before they reached testers.

Some things did improve. The regression cycles became faster. Fewer straightforward issues were reaching production, and the amount of testing we could perform per release increased significantly. Yet the same complaints kept returning. Users were still frustrated. Unexpected issues kept appearing after releases. Fixes continued creating problems in places nobody had associated with the original change.

That was when I started questioning the assumption itself. Maybe the system was not struggling because we were testing too little. Maybe it was struggling because we were understanding too little.

I had misdiagnosed the problem.

The actual problem wasn’t the code

Most discussions around AI-assisted development focus on code quality. I think the bigger issue is organisational misinterpretation.

The hardest production issues I have seen recently were not caused by obviously broken systems. They came from systems that looked operationally healthy from the outside. Code was written with partial AI assistance, reviews were approved using AI summaries, test scenarios were generated from prior context, and pipelines remained green throughout the process. Meanwhile, side effects were poorly mapped, dependencies were weakly understood, assumptions survived unchallenged, and state transitions were barely explored.

The problem was that AI accelerated output faster than teams evolved. Developers were frustrated by side effects in workflows they thought were isolated. Testers were spending more time trying to understand what should be tested than actually testing it. Users were still reporting issues despite healthier release metrics. Meanwhile, the surrounding metrics, from coverage to release speed, continued to show improvement. It was the metrics themselves that were not changing fast enough. They measured activity and throughput well, but they captured very little about shared understanding.

Finding the balance between human and AI thinking 

I restructured things to have AI function as an acceleration layer rather than a replacement layer. The tools mostly stayed the same. What changed was where human thinking was required to remain inside the workflow.

  • The agent now asks before generating: Before writing a single test case, developers explain what they are building, what business behaviour changes, and which existing flows could be impacted. Interestingly, this improved human understanding more than AI output. Forcing clarity of intent before generation significantly improved downstream testing quality because the generated cases became grounded in business behaviour rather than only in code paths.
  • A pre-push hook runs an AI code review: The review is not focused on syntax. It checks whether test cases exist, whether generated context matches what was actually built, whether there are untested branches or side effects, and whether existing flows could get impacted. The goal is not to replace thinking. It is to interrupt blind momentum long enough for thinking to happen again.
  • Testers got their time back: By the time something reaches testers, the obvious validation paths are already covered. That gives testers time for the work that still depends heavily on human reasoning, including exploration, impact analysis, pattern recognition, questioning assumptions, and connecting unrelated failures. We also added one behavioural rule that significantly changed things. AI could help after the initial test plan was in place, not before. That one change brought investigative thinking back into the process. Testers stopped being validators of generated output and went back to being the people asking uncomfortable questions.
  • Changed what I consider successful metrics: One of the biggest problems was that our metrics, such as throughput, release speed, and coverage rate, still belonged to an older workflow. Alongside these, I added:
      • How often bugs returned after fixes
      • How many unrelated areas broke after a release
      • How much retesting was happening per release
      • Whether the same category of regressions kept reappearing
  • Added a release template for developers: With each release, developers fill in a template that answers what they built, why they built it, and how it can be validated in production.
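
As a rough illustration of the "explain before generating" step and the release template in the list above, the questions could be captured as a small structured record, with test generation held back until every answer is filled in. This is a minimal Python sketch under assumptions, not the setup I actually run: ChangeIntent, its field names, missing_answers and ready_for_generation are hypothetical names invented for the example, and the sample data is made up.

```python
from dataclasses import dataclass, fields


@dataclass
class ChangeIntent:
    """Answers a developer gives before any test case is generated.

    Hypothetical shape: the field names are assumptions for this sketch.
    """
    what_is_being_built: str = ""
    business_behaviour_change: str = ""
    impacted_existing_flows: str = ""
    production_validation: str = ""


def missing_answers(intent: ChangeIntent) -> list[str]:
    """Return the names of fields that are empty or whitespace only."""
    return [f.name for f in fields(intent) if not getattr(intent, f.name).strip()]


def ready_for_generation(intent: ChangeIntent) -> bool:
    """Allow generation only when every question has an answer."""
    return not missing_answers(intent)


# Illustrative data only: a draft that skips the two hardest questions.
draft = ChangeIntent(
    what_is_being_built="Public share link for reports",
    business_behaviour_change="Reports become viewable without a login",
)
print(missing_answers(draft))
```

With the draft above, this prints ['impacted_existing_flows', 'production_validation']. Those are the two questions that are easiest to skip and that matter most for spotting side effects, so an empty answer there is a prompt for a conversation rather than a reason to generate more tests.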

The goal was not to slow delivery down. The goal was to notice when system complexity began to grow faster than shared understanding.
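
To make the added metrics a little more concrete, here is a minimal Python sketch of how three of them could be computed from bug records: how often bugs returned after fixes, how many unrelated areas broke, and which regression categories kept reappearing. Everything in it is an assumption for illustration, including the Bug record, its fields, the function names and the sample data. A real version would read from whatever bug tracker the team uses.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Bug:
    """One bug record (hypothetical shape for this sketch)."""
    release: str           # release in which the bug was found
    area: str              # part of the system that broke
    changed_areas: tuple   # areas the release intentionally touched
    category: str          # team-defined regression category
    reopened: bool         # True if the bug came back after a fix


def reopen_rate(bugs):
    """Share of bugs that returned after a fix (0.0 for an empty list)."""
    return sum(b.reopened for b in bugs) / len(bugs) if bugs else 0.0


def unrelated_breakages(bugs):
    """Bugs found in areas the release did not intentionally touch."""
    return [b for b in bugs if b.area not in b.changed_areas]


def recurring_categories(bugs, min_releases=2):
    """Categories seen in at least min_releases distinct releases."""
    seen = Counter()
    # De-duplicate so a category counts once per release.
    for _release, category in {(b.release, b.category) for b in bugs}:
        seen[category] += 1
    return sorted(c for c, n in seen.items() if n >= min_releases)


# Illustrative data only.
bugs = [
    Bug("r1", "checkout", ("checkout",), "auth", True),
    Bug("r1", "reports", ("checkout",), "state", False),
    Bug("r2", "login", ("profile",), "auth", False),
    Bug("r2", "profile", ("profile",), "ui", True),
]

print(reopen_rate(bugs))               # 0.5
print(len(unrelated_breakages(bugs)))  # 2
print(recurring_categories(bugs))      # ['auth']
```

None of these numbers says anything about speed. They only move when a change reaches somewhere nobody expected, which is why they sit more comfortably next to throughput and coverage than in place of them.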

Lessons learned 

The mindset shift is harder than the tooling shift. Some days, the team still slips back because velocity feels urgent and thinking feels slow. And I do not think this gets fully solved. Teams just become better at noticing when they are drifting toward signal ignorance again.

Healthy metrics do not necessarily indicate a healthy understanding. Coverage can increase, releases can move faster, and pipelines can stay green. Yet teams can still lose sight of how changes ripple through the system.

If you and your team are trying to spot this early, here are a few warning signals:

  • The same problems keep appearing 
  • Metrics are improving, but confidence in the releases is not
  • People can explain what changed, but struggle to explain what else might be affected
  • Teams respond to uncertainty by adding more process, automation or tooling without first understanding the source of the uncertainty
  • Teams are surprised by outcomes that should have been predictable in hindsight 

In my case, the backlog was a signal. The repeated regressions were signals. The exhausted testers were a signal. The unclear scope of testing was a signal.

The signals were always there. The team just kept interpreting them as speed problems instead of understanding problems.

Hanisha Arora

Hanisha Arora is a product advocate. She works where product thinking meets technical reality. She writes about the often-overlooked gap between how products are planned and how they are actually built. You can find her on LinkedIn or on her blog.