Everyone is talking about AI, but few people are talking about the humans who use AI and the data that fuels it. In this post, I aim to correct that by discussing:

- A (very) brief history of AI, from sci-fi to ChatGPT
- How AI and humans work together
- AI's data problem
- Why humans need data literacy
A (Very) Brief History of AI; From Sci-fi to ChatGPT…

I am not an AI expert, so this will not be a discussion of the technology behind AI; instead, I will be considering AI from a layman's perspective. Until recently, a layman's understanding of AI largely came from science fiction and ideas about dystopian realities in which humans struggle to live alongside superintelligences and killer robots. These stories often involve a 'general AI' exhibiting human-like intelligence that somehow becomes conscious, starts setting its own goals and inevitably wreaks havoc.

We have not seen the emergence of 'general AI'. We have instead seen the steady development of 'specific AI': tools focussed on solving specific and clearly defined problems. Until recently, the only actual AI that we were aware of was developed to play games like chess, Go or Jeopardy. The Turing Test encouraged the development of AI tools that processed language in a way that appeared human, but this was still a closed problem with a well-defined goal.

Over the last couple of decades, the development of machine learning has led to the emergence of algorithmic decision-making and the possibility of rapid, efficient processes that no longer require direct human input. As impressive as these tools were, they were far from perfect. Poor-quality data feeding poorly understood algorithms has led to error and injustice in various corners of society, as outlined in Cathy O'Neil's 'Weapons of Math Destruction'.

On 30th November 2022, ChatGPT was unleashed on the world. Since then, we've seen an explosion of interest in the potential for AI to solve our problems and make us more efficient. However, ChatGPT is still a 'specific AI' tool. It is a large language model designed to process huge volumes of text and output useful content based upon the prompt we have provided. ChatGPT and the similar generative AI tools that have followed (e.g. Gemini, Copilot, Apple Intelligence) are changing the world, but they do not work in isolation. To work well, they need good data, and they need humans who understand them and their limitations.

AI & Humans

AI tools support humans in achieving their goals. A primary goal of all human endeavour is the pursuit of wisdom. Wisdom is developed from our knowledge of the world; our knowledge comes from the information that we consume; and that information is the product of the data on which it is based. This DIKW (Data > Information > Knowledge > Wisdom) pyramid is a helpful reminder that our goals do not lie within the data itself but in the process of turning that data into wisdom.

However, a key challenge in this process is turning information, which exists as pixels on a screen or ink on a page, into knowledge, which exists as neurons firing inside our brains. As a data visualisation expert, I like to explain that the human visual system offers this bridge from information to knowledge, and I encourage everyone to develop their understanding of how our brains process information that is presented to us through graphs, tables, images or text. We'll now consider how humans can interact with AI to develop their collective knowledge through a process known as distributed cognition.
The concept of distributed cognition invites us to consider a technology and its user as a single system. We cannot evaluate the efficiency and effectiveness of AI tools without also evaluating the humans who are using these tools. Technology-assisted humans are more powerful at processing information and making decisions than humans, or technology, alone. As an example, take a moment to consider how a person's power to process numbers grows as the technology available to them increases: from unaided mental arithmetic, to pen and paper, to a calculator, to a spreadsheet, and now to AI-assisted analysis.
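As a small, purely illustrative sketch of that last point, here is the kind of question that is effectively impossible to answer unaided but trivial for a human working with a few lines of Python (the question itself is arbitrary and chosen only to make the point):

```python
# How many of the first million whole numbers are divisible by neither 3 nor 7?
# Unaided, this is hopeless; for a human plus a computer, it is a one-liner.
count = sum(1 for n in range(1, 1_000_001) if n % 3 != 0 and n % 7 != 0)
print(count)  # 571429
```

The human supplies the question and judges whether the answer makes sense; the tool supplies the raw processing power. Neither is much use here without the other.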
We can consider AI as a metatool. It doesn't do anything useful in isolation, but it can make all the other tools we use more powerful. For this to be the case, we must understand the role of the human as well as the AI.

Prompt engineering is emerging as an important topic in supporting humans in using GenAI tools. People need to understand what these Large Language Models can and can't do. They also need to understand how to write prompts that produce outputs which genuinely meet their goals. As an example of this, Microsoft uses the acronym GCSE to guide us in writing effective prompts for its GenAI tool, Copilot; this encourages us to ensure that our prompts define our Goal, Context, Source & Expectations.

A key question at the heart of the interaction between humans and AI is "Should we trust AI?". Humans are very inconsistent in the levels of trust they place in algorithms. We will blindly follow a sat-nav that risks taking us down one-way streets or into canals, yet we struggle to entrust our journey to a driverless car (even though they can be shown to be safer than human drivers). However, to understand whether we should trust AI, we need to consider whether we can trust the data it is based on.

AI's Data Problem

The importance of the phrase "Garbage in, garbage out" cannot be overstated. Any tool, system or process must be considered as involving three distinct steps: input, process and output.
AI offers an incredibly powerful process; however, this process can only feed from the data available to it, and the quality of its outputs can only ever be as good as (or worse than) these inputs. With this in mind, we clearly need our AI to be built upon good data, but how do we define 'good' data?

'Good' data is not always easy to identify, by either humans or AI. Take this example, highlighted in a tweet from Jon Schwabish in June 2020. Donald Trump had tweeted this:

At first, there doesn't appear to be anything wrong or misleading about it. It's a simple dataset of only 5 data points, and Trump is correctly claiming that May-2020, towards the end of his first term, was a record month, with the most job gains in recorded history. It is only when you reflect on what was happening in the world at the time that you can begin to see the issue with this data. Here's the same data but with the previous month included too:

Given the context of the Covid-19 pandemic, and the fact that more than 20m jobs were lost from the US economy in Apr-2020, Trump's gain of ~2.5m jobs no longer seems as impressive as he is claiming. This is an example of 'cherry-picking' data to make a point that is not the truth of the broader dataset.

Another example of misleading data comes from the fact that correlation does not necessarily equate to causation. As this example from Tyler Vigen's Spurious Correlations shows, just because two datasets correlate does not mean that one causes the other.

These are just two examples of ways in which seemingly 'good' data can be misleading, and we cannot expect AI to identify the problems with these datasets. AI cannot access 'truth'; it can only access data, and this means that it will inevitably output the same bias that is contained within the data it is processing. As concluded by this UN article on Racism & AI, "Bias from the past leads to bias in the future". However, as shown by the notorious incident in which Google Gemini began generating images of black and Asian Nazi soldiers, there is no easy fix, and attempts to correct the AI algorithms rather than correcting the data can easily lead to an overcorrection.

This leads us to the challenge of Availability Bias. For humans, what you see is all there is, or "WYSIATI" as Daniel Kahneman described it. This rather obvious point highlights the fact that we can only make decisions based on the information available to us and are blind to any perspectives of which we are not aware. This is why it is important to refer to a broad range of information, opinions and perspectives before making any significant decision. For AI, the data is all there is. AI tools do not have access to any 'truth' beyond the data available to them. Availability bias can take various forms, but in every case the AI's view of the world extends no further than the data it has been given.
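To make the cherry-picking example concrete, here is a minimal sketch of how the slice of data we hand to a tool determines the story it can tell. The monthly figures are approximate and purely illustrative, loosely based on the 2020 numbers discussed above:

```python
# Approximate monthly change in US jobs, in millions (illustrative figures only).
jobs_change = {
    "Jan-2020": 0.2, "Feb-2020": 0.3, "Mar-2020": -1.4,
    "Apr-2020": -20.5, "May-2020": 2.5,
}

def summarise(months):
    """Summarise only the months we have been shown."""
    visible = {m: jobs_change[m] for m in months}
    best = max(visible, key=visible.get)
    net = sum(visible.values())
    return f"best month: {best} ({visible[best]:+.1f}m), net change: {net:+.1f}m"

# The cherry-picked view: May-2020 looks like an unqualified triumph.
print(summarise(["May-2020"]))
# The fuller view: the 'record' gain recovers only a fraction of April's losses.
print(summarise(["Mar-2020", "Apr-2020", "May-2020"]))
```

An analysis fed only the first slice literally cannot see April's losses; for AI, as for us, the data it is shown is all there is.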
Countering bias is very difficult. However, the first step towards overcoming it is to be aware of our own biases. When making decisions, all humans should reflect on their biases, the biases of the people they interact with and the biases contained within the data they work with. 'Good' data reflects the reality of the world around us and is free from bias. However, this will not happen organically; it requires a significant financial and cultural investment by any organisation that utilises data (i.e. all organisations!). As a precursor to the successful utilisation of AI, an organisation must have a Data Strategy that incorporates and embeds established good practice in how it collects, manages and uses its data.
All modern businesses understand the value of data as an asset. Technology giants and recent startups have developed within a context in which these technical requirements for 'good' data are becoming the norm. However, for 'legacy' organisations, this presents a massive hurdle. Much of the data held in 'legacy' systems by traditional organisations is not machine readable and so cannot be utilised by AI. This may be because the data is incomplete, inconsistent and/or siloed. If data is not held in a complete and consistent format, with keys that are common across tables and databases that are accessible to one another, then it cannot be utilised by AI. Addressing this challenge presents huge costs to 'legacy' organisations; however, it cannot be ignored and must be tackled before any attempt at enabling AI.

A further challenge for all organisations is the knowledge gap that exists between AI/Data Professionals and Business Leaders. Business Leaders are not aware of, and cannot be expected to be fully cognizant of, the technical challenges of working with data to enable AI. As a result, organisations can fall prey to "Magpie Syndrome", where the pursuit of shiny new AI toys ignores the data transformation that is required to enable these tools. To address these concerns we obviously need better data, but we also need more data-literate humans.

Humans need Data Literacy

There is a significant challenge facing humans when utilising data. Put simply, we are lazy, and while data can reflect the world as the incomprehensibly complex place that it is, we just want our decisions to be simple. Modern human history has seen exponential growth in technology, but evolution has not allowed our minds to develop at the same pace. As Steven Pinker states in 'How the Mind Works', "The mind is a system of organs of computation, designed by natural selection to solve the kinds of problem our ancestors faced in the foraging way of life, in particular, understanding and outmanoeuvring objects, animals, plants, and other people". Our minds, which are primarily adapted to a hunter-gatherer way of life, have been thrust into a complex world of technology, data and AI. We cannot be expected to know how to successfully navigate this world without help, and this is why we must all work to develop our data literacy.

Data literacy is a concept that lends itself to multiple interpretations. However, put simply, it is the ability to read, understand, create and communicate data as information. Developing this ability draws on a combination of:
- Numerical skills
- IT skills
- Analytical skills
- Subject matter knowledge

The importance of this has been stressed by Piyanka Jain, who said that "everybody needs data literacy, because data is everywhere. It's the new currency, it's the language of the business. We need to be able to speak that."

A core component of data literacy involves nurturing the 3 Cs of:

· Curiosity
· Creativity
· Critical thinking

These are necessary tools that all humans possess and that we must fully exercise when utilising data. A key part of this is asking questions. When it is time to make a decision, always ask… What does the data tell us? And then ask:

· Who provided this data and/or created this visualisation?
· What was the intention of the data provider/report designer?
· What is the data source?
· Is the data complete?
· Is the quality of the input data sufficient for the decision being made?
· What analysis/transformation has been applied to the data?

While data literacy is accessible to all humans, it is not intuitive and must be learnt. For just one example of the way in which numeracy, a core component of data literacy, is not intuitive, consider the bat and ball problem. A bat and ball cost £1.10 in total, and the bat costs £1 more than the ball. How much does the ball cost? In considering the answer to this question, your gut instinct is likely to offer an answer of 10p, as that is what feels right. It's only when you engage some higher-level critical thinking that you realise the answer is in fact 5p.

For more examples of the limitations of intuition when applied to decision-making, the book 'Thinking, Fast & Slow' by Daniel Kahneman offers an invaluable resource. Kahneman outlines two types of thinking: System 1, which is fast, automatic and intuitive, and System 2, which is slow, deliberate and effortful.
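As a quick System 2 check on the bat and ball answer, here is a tiny sketch of my own (not from Kahneman's book) that does the sum our intuition skips:

```python
# System 2 check of the bat and ball problem: the ball costs 5p, not 10p.
ball = 0.05                            # price of the ball in pounds
bat = ball + 1.00                      # the bat costs exactly £1 more than the ball
assert round(bat + ball, 2) == 1.10    # together they cost £1.10, so 5p is consistent

# The intuitive answer fails the same check:
# a 10p ball implies a £1.10 bat, and 0.10 + 1.10 = £1.20, not £1.10.
```

Writing the check down takes seconds; the problem is that our fast, intuitive System 1 rarely volunteers to do it.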
The challenge facing human beings in making decisions is that we are lazy and will try to avoid using System 2 thinking wherever possible. You should now be able to see the problems that this can cause. To become data literate, humans must embark on a training and development programme that builds the skills and habits described above.
In my role as the University of Exeter's Data Literacy & Training Lead, my time is now focussed on developing the tools and resources needed to facilitate this.

Conclusion - Towards AI Literacy?

It is important that we acknowledge the risks of AI because, as outlined by the French philosopher Paul Virilio, "The invention of the ship was also the invention of the shipwreck… Every technology carries its own negativity, which is invented at the same time as technical progress." However, the risks of AI are not just about killer robots, and they are not risks that can only be solved by technological experts. The main risk we face in using AI is that poor-quality data and untrained humans will encourage AI to produce biased and ineffective outputs that result in unjust and inappropriate outcomes.

In the face of these risks, utilising AI requires a paradigm shift for all of us. We need to fundamentally change the way we work and the questions we ask of the tools we use. More specifically, we need better data and we need more data-literate humans.

Epilogue - AI Vs Human

This blog post has been developed from a presentation I delivered at the University of Exeter's AI Symposium on 20th March 2025. The 45-minute presentation was entitled "Garbage In, Garbage Out: The Importance of Data Literacy in Utilizing AI", and I'd created 34 PowerPoint slides and a 766-word document of rough notes outlining the structure, themes and messages that I aimed to deliver through the spoken presentation.

The blog post above took approximately 5 hours to write based on the materials described. I also shared my notes with Microsoft's Copilot using the following prompt, and it created its own version of my blog post in about 5 seconds. I'm sharing this as a direct comparison of the capabilities of AI and humans in developing the same output, and I invite you to consider the value added by each approach.

Prompt:
Rewrite the content in <<20250320 Rough Notes from AI Symposium Talk>> as a blog post. It should be 1500-2000 words long and written in a positive, friendly and human voice. The overall goal of the blog is to highlight the importance of (a) understanding a human's role when evaluating AI, (b) highlighting AI's data problem and (c) humans' need for data literacy. It should conclude with a simple summary highlighting the main points raised.

Output:

Garbage In, Garbage Out: The Importance of Data Literacy in Utilizing AI

Introduction

Welcome to our discussion on the crucial role of data literacy in the age of artificial intelligence (AI). While AI has become a buzzword, it's essential to remember that behind every AI system is a foundation of data. As a data expert, I want to emphasize the importance of understanding the human role in evaluating AI, addressing AI's data problem, and fostering data literacy among individuals. Let's dive into these topics and explore why they matter.

The Layman's Perspective of AI

AI has come a long way from the realms of science fiction to becoming an integral part of our daily lives. In the past, AI was often depicted as a terrifying superintelligence, as seen in Nick Bostrom's "Superintelligence." These stories usually involved general AI, which is conscious and capable of setting its own goals. However, the reality of AI today is more grounded in specific AI, designed to solve particular problems.

Until recently, AI was relatively sedate, with applications like chatbots, chess-playing programs, and game show contestants. The rise of machine learning (ML) and algorithmic decision-making brought about warning signs, but it was the explosion of interest in generative AI, like ChatGPT, that truly captured the public's imagination. ChatGPT, a large language model (LLM), excels at analyzing text and generating human-like responses. In this blog, we'll focus on why AI needs good humans and good data to function effectively.

AI & Human Goals

To understand AI's role, we can refer to the DIKW pyramid, which stands for Data, Information, Knowledge, and Wisdom. AI operates at the data and information levels, but it requires human input to reach knowledge and wisdom. This concept of distributed cognition highlights the importance of human involvement in AI processes.

AI is a metatool, meaning it enhances the capabilities of other tools. However, to harness its full potential, users must be trained to construct good inputs for generative AI. These AI systems are designed to analyze language, not perform quantitative analysis. Therefore, prompt engineering becomes crucial in guiding AI to produce meaningful and accurate outputs. Microsoft's prompt guidance for Copilot, for example, includes defining the goal, providing context, citing sources, and setting expectations. This structured approach helps users interact effectively with AI.

Should We Trust AI?

Trust in AI is a complex issue. While we may blindly trust a satnav to guide us, we find it challenging to trust driverless cars. This discrepancy highlights the need for a nuanced understanding of AI's capabilities and limitations.

AI's Data Problem

The phrase "Garbage in, garbage out" perfectly encapsulates AI's data problem. AI systems rely on data to function, and the quality of the input data directly impacts the output. But what constitutes "good" data? Legacy systems often contain data that is not machine-readable, posing significant costs to convert existing data and processes into formats usable by AI.
Additionally, AI is trained to provide answers based on the "truth" present in the data. If the data is biased, the AI will also be biased. Bias in AI can manifest in various ways, such as racist AI or overcorrection, like Google's infamous "black Nazis" incident. AI cannot access the absolute truth; it can only work with the data provided, which may contain biases like cherry-picking, correlation versus causation, and various cognitive biases. To counter bias, we must be aware of our own biases as authors, data providers, and readers. A robust data strategy is essential for developing "good" data, bridging the knowledge gap between AI/data professionals and business leaders.

Humans Need Data Literacy

Data literacy is the ability to read, understand, create, and communicate data as information. It involves curiosity, creativity, and critical thinking—skills that are not AI's strengths. When making decisions, it's crucial to ask:
Data literacy is not intuitive. For example, consider the classic bat and ball problem: A bat and ball cost £1.10, and the bat costs £1 more than the ball. How much does the ball cost? The answer is 5p, but many people instinctively answer 10p. Another example is the chessboard problem, where the number of grains of rice doubles on each square. By the 64th square, the total number of grains is an astronomical figure—18 quintillion. These examples illustrate the importance of numeracy and the ability to think critically about data.

Conclusion - Towards AI Literacy

In conclusion, we must acknowledge the risks associated with AI, particularly those related to poor data and human biases. Utilizing AI effectively requires a paradigm shift. We need better data and more data-literate humans to navigate the complexities of AI. By fostering data literacy, we empower individuals to make informed decisions, understand the limitations of AI, and contribute to the development of fair and unbiased AI systems. Let's embrace this journey towards AI literacy and ensure that we harness the power of AI responsibly and ethically.