For AI users in highly regulated sectors such as finance and accounting, ‘trusting something partially is not adequate’, with high levels of double checking and worries about the lack of an audit trail
The trust ceiling for artificial intelligence (AI) remains a challenge, with high risks for regulated sectors such as finance and accounting, where ‘trusting something partially is not adequate’, warn experts at UnlikelyAI. Untrustworthy results heighten the risk of governance failures, poor decision making and a lack of accountability.
Nearly two thirds of senior finance and accounting leaders do not completely trust AI-generated results, highlighting concerns among users in regulated industries such as accounting and finance, where compliance requirements mean inadequate or inaccurate responses carry high risk.
Despite widespread adoption, AI’s promise to make tasks quicker and better is failing to materialise because of the time employees spend checking, verifying or redoing AI-generated work.
These are the findings of The AI Trust Report, a major piece of cross-industry research by consultancy UnlikelyAI, which identified a significant gap between stated confidence in workplace AI and actual behaviour.
Launching the research at an exclusive event in London, William Tunstall-Pedoe, CEO and founder of UnlikelyAI, said: ‘If systems are not trustworthy and make the wrong decisions there is a huge economic impact – 57% do not trust them completely, and in regulated industries, trusting something partially is not adequate. If you don’t trust it completely, you cannot use it.’
Trust levels are relatively low, with the finance sector trusting AI systems only 43% of the time, and Tunstall-Pedoe warned of ‘high friction with constant checking and uncertainty about outputs’. Compliance friction was highest in the finance sector at 70%, compared with 54% in the public sector, while security worries and a lack of explainability, for example for an audit trail, were also cited as significant issues.
‘Trust is still only partial and companies are facing astronomic costs, but 30% said they would increase their AI budgets if they could trust it,’ he added.
‘Humans have an error rate, but hallucination, that is not a human error. There is a higher bar for automated systems and I think that higher bar is legitimate. There are intrinsic limitations of the technology, even after many years it is still a problem.’
An expert in the AI field, Tunstall-Pedoe warned there are ‘four negative effects of dependence on AI – AI blindness, dependency, burnout and analysis paralysis’.
A lack of trust in the results produced by AI tools is shaping usage, with many hours a week spent on verification. The research found employees spend on average two hours and 41 minutes using AI each week, but nearly as long checking or redoing the results, at two hours and 30 minutes.
A third of respondents said they experienced ‘AI burnout’ from repeatedly verifying outputs, while 31% experienced ‘analysis paralysis’, unsure whether to trust the AI result or their own judgment.
Almost all respondents (99%) spend at least some time checking AI outputs each week – citing everything from quick sense checks and minor verification (18%) to redoing some or all of the task manually in order to verify it (20%), or even ignoring the output entirely (18%).
So far, AI is not making work better, either, according to the research. Just 57% of respondents reported any kind of return on investment (ROI) on their organisation’s current AI investments, while 13% are yet to see a clear positive ROI and do not think they will, and some even say ‘it has been actively bad for [their] organisation so far’.
Tunstall-Pedoe added: ‘These findings highlight a critical challenge: there has to be a better way to use AI. Large language models (LLMs) have strengths in specific, limited areas, but there’s a huge lack of understanding about when to use them and when to look to other, less-fallible models. That’s where this trust gap is coming from.’
The best models score only 39% on complex, real-world finance questions written by experienced professionals, according to data from PRBench Finance, validating why finance leaders are so wary about the outcomes.
Another issue for businesses is unauthorised use of AI by employees, who may rely on off-the-shelf systems rather than advanced LLMs, for example; these too must be used with adequate guardrails in place.
‘It is important to set ground rules within teams for when AI is and isn’t appropriate,’ he added. ‘Most people don’t realise that not all AI is built the same. LLMs are great for creativity and summarisation, but they are weak at accuracy and explainability. Training staff on the strengths and weaknesses of different systems builds confidence and prevents misuse.’
The panel discussion also focused on accountability, raising important governance and compliance issues in regulated sectors, where security is a priority, especially when handling confidential client data and financial information. Critically, there is the question of who is accountable if AI goes wrong.
It also raises the serious issue of how an accurate, reliable audit trail can be produced when data is AI-generated, an area that will be a major factor in the speed of future rollout and in building trust.
When asked whether now was the right time to ramp up AI investment if only 41% trust the output, Tunstall-Pedoe said: ‘There is a simple trade-off: what is your appetite for spending money, what is your appetite for risk? Some of the mistakes being made are less defensible now, but I think that will change quickly.’
Editor, Business & Accountancy Daily