
Overcoming AI Bias: Understanding Bias in Machine Learning and Humans



While I was answering questions on a panel on AI this week, one attendee bravely asked the question ‘Why is this an all-male panel?’ The answer, of course, was “I have no idea” and “it isn’t right”: an oversight by the event team, and one that is also likely indicative of the gender gap in senior positions and in technology. However, the fact that no one, including me, had spotted this obvious issue deeply bothered me. The incident served as a stark reminder of the biases that permeate our society and, given the subject of the panel, reminded me of the problems with bias in the technologies we create and interact with.


Understanding Machine Learning and Bias

This isn’t just about who sits on panels. The subtler, often overlooked biases in our everyday lives affect our AI and machine learning outputs. Firstly, most people I speak to don’t really understand how machine learning works. AI systems like ChatGPT are algorithms that are ‘trained’ on a back-book of relevant data; by which I mean the algorithm itself learns the patterns in that data and uses those learned patterns to make predictions from the new inputs you give it. I guess one of the simplest examples is the good old linear regression, or as you are more likely to know it, the ‘line of best fit’. At school you plotted the points on the scatter chart, and then drew a line of best fit. The straight line you drew was as close to as many points as possible. You then worked out the equation of that line from the point where it crosses the y-axis and the gradient of the line. And voila, you had a formula you could estimate future values from. That’s machine learning. The machine, a computer, takes the historic data, works out the best mathematical representation of all that data, and uses it to predict future values. Those predicted values are (using a word that hopefully makes more sense to the reader, but I am sure will make statisticians jump up and down in anger telling me I’m wrong) essentially just the forward representation of the ‘average’.
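For anyone who wants to see the school exercise as code, here is a minimal sketch of a least-squares line of best fit in plain Python. The data points (hours of revision against a test score) are made up purely for illustration, and the function names are my own.

```python
# Made-up example data: hours of revision (x) against test score (y).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [52.0, 55.0, 61.0, 64.0, 68.0]


def fit_line(xs, ys):
    """Return (gradient, intercept) of the least-squares line of best fit."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How x and y vary together, divided by how much x varies on its own.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    gradient = sxy / sxx
    # The best-fit line always passes through the point (mean_x, mean_y).
    intercept = mean_y - gradient * mean_x
    return gradient, intercept


def predict(x, gradient, intercept):
    """Estimate y for a new x using the fitted line."""
    return gradient * x + intercept


gradient, intercept = fit_line(xs, ys)
print(f"y = {gradient:.2f}x + {intercept:.2f}")
print(f"Estimated score after 6 hours: {predict(6.0, gradient, intercept):.1f}")
```

Notice that the line is anchored on the averages of the data it has seen, which is why its predictions are, loosely speaking, a forward projection of the ‘average’.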


Average is easy

There are some obvious issues arising. The line of best fit is based on the ‘average’, and the outputs of all generative models are based on probability. ChatGPT works by identifying the most probable next word in context. There is no real intelligence, no understanding, just the impression of it. It’s seen so much data, and it’s learned from so many examples, that it’s highly likely to be able to pick the next most relevant word. Now LLM creators know this, and in models like ChatGPT there’s a tuneable setting (or ‘hyperparameter’, as we call it) called ‘temperature’, which allows the model to produce outcomes that apply degrees of randomness either side of the most probable outcome. So instead of producing the most probable word every time, it may sometimes produce the second or fourth most likely one.
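To make ‘temperature’ a little more concrete, here is a rough sketch of how it is commonly applied when picking the next word. This is not ChatGPT’s actual code: the candidate words and their scores are invented, and real models choose between many thousands of word fragments rather than four whole words.

```python
import math
import random


def softmax_with_temperature(scores, temperature):
    """Turn raw scores into probabilities; a lower temperature sharpens them."""
    scaled = [s / temperature for s in scores]
    biggest = max(scaled)  # subtract the max so exp() cannot overflow
    exps = [math.exp(s - biggest) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_next_word(words, scores, temperature, rng):
    """Pick one word at random, weighted by its temperature-adjusted probability."""
    probs = softmax_with_temperature(scores, temperature)
    return rng.choices(words, weights=probs, k=1)[0]


# Hypothetical candidates and scores for the next word in "The cat sat on the ..."
words = ["mat", "sofa", "roof", "moon"]
scores = [4.0, 3.0, 2.0, 0.5]

rng = random.Random(0)  # fixed seed so repeated runs give the same picks
for temperature in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(scores, temperature)
    picks = [sample_next_word(words, scores, temperature, rng) for _ in range(10)]
    print(temperature, [round(p, 2) for p in probs], picks)
```

At a low temperature almost all of the probability sits on the top-scoring word, so the output is nearly always the same. Turning the temperature up spreads the probability out, so the less likely words get picked more often.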

The difference is humans aren’t always average, and we also understand that average often isn’t the best answer.

LLMs may perform billions of calculations, or more, to work out the next most likely word, or series of words, in a sentence. For all of that though, the machine only ever knows what it has learned from its training data, just like humans only know what they have learned from experience. If the data we train algorithms on contains our subtle biases, then the outputs will too.


Of VAT and AI

I had started my week rather tired. I had been up until 1am on the Sunday creating a machine learning model to demonstrate how we could auto-classify invoice lines, that is the individual line items on an invoice, to check the right VAT (tax) treatment had been applied. The training data was representative of an organisation in the insurance industry. Therefore, my algorithm only really knew insurance-related terms, or I suppose terms that related to the management of office-based businesses. I realised it was time for bed when I started overthinking this. What if I gave my algorithm terms from, say, the medical or policing domains? Sure, it would get many of them right, but not all of them. Insurers probably don’t get many invoices for ‘handcuffs’ or ‘CS gas’, for example. This showed me how both machines and humans can become biased based on their ‘experiences’.
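The effect is easy to reproduce in miniature. The sketch below is not the actual VAT classifier; it is a deliberately tiny word-counting stand-in, with invented invoice lines and labels that are illustrative only and not VAT advice. What it shows is that a model trained on one domain will still give an answer for terms it has never seen.

```python
from collections import Counter, defaultdict

# Invented training lines standing in for an insurer's invoices.
# The labels are illustrative only and are not VAT advice.
training = [
    ("office printer paper", "standard"),
    ("laptop docking station", "standard"),
    ("office cleaning services", "standard"),
    ("desk chairs", "standard"),
    ("reinsurance premium", "exempt"),
    ("insurance broker commission", "exempt"),
    ("claims handling services", "exempt"),
]


def train(examples):
    """Count how often each word appears under each label."""
    word_votes = defaultdict(Counter)
    label_totals = Counter()
    for text, label in examples:
        label_totals[label] += 1
        for word in text.lower().split():
            word_votes[word][label] += 1
    return word_votes, label_totals


def classify(text, word_votes, label_totals):
    """Return (label, known_word_count). It always answers, even with no evidence."""
    votes = Counter()
    known = 0
    for word in text.lower().split():
        if word in word_votes:
            votes.update(word_votes[word])
            known += 1
    if not votes:
        # Nothing recognised: fall back to the most common training label.
        return label_totals.most_common(1)[0][0], 0
    return votes.most_common(1)[0][0], known


model = train(training)
for line in ["reinsurance premium", "handcuffs", "CS gas canisters"]:
    label, known = classify(line, *model)
    print(f"{line!r}: {label} (recognised words: {known})")
```

For the policing terms the count of recognised words is zero, yet the model returns a label anyway. The answer is simply its most familiar habit, which is a fair description of bias.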


Like the model, if we’re only exposed to one type of information, we’re likely to excel in that area but might falter when faced with something new. By nature we are all biased, but it becomes a problem when we don’t know we are. That’s unconscious bias.


Bias, bias everywhere….

The difficult question at the MK50 event reminded me that bias in machine learning isn’t just a tech problem; it reflects a much broader human condition. We’re all shaped by our environments, leading to a deep understanding of familiar topics while potentially leaving us puzzled by unfamiliar ones. I could, after all, have given police invoices to my algorithm. It would have processed them and spat out results confidently. But many of the results wouldn’t be right. I think this is like humans and our propensity for unconscious bias. Just as my algorithm could churn through data without understanding its nuances, humans often rely on surface-level impressions or stereotypes to make judgments. This parallel underscores the importance of recognising and addressing biases, both in machine learning models and in ourselves.


Making one model that can do everything?

Just in case you were wondering, the VAT classifier demonstration on the Monday went very well indeed. Lots of organisations struggle to make sure they account for (and reclaim) the right level of VAT on their purchases, and implementing an algorithm like the one I had built can often save many days of manual effort every week.


However, the combination of a lack of sleep and the adrenaline from a well-received demonstration got me thinking. Could I create a model that could handle every kind of transaction and get it 100% right every time?


The answer is no.


Imagine forcing one simple model to cover every kind of transaction. This issue, known as underfitting, means the model is too simple to catch the complex patterns in the data, resulting in poor performance. It is like an overconfident trainee. An answer, said confidently enough, is not always the right one. In humans, the nearest parallel is the Dunning-Kruger effect, a cognitive bias in which people with limited competence in a particular domain overestimate their abilities. Put simply, they ‘blag’ it.


On the other hand, if we make the model flexible enough to absorb every detail of every transaction, the problem of overfitting arises. Our model, burdened with an excess of details, learns too much from the training data, to the point where it can’t perform well on new, unseen data. It starts reacting to random fluctuations as if they were important patterns, failing to distinguish between meaningful information and mere noise. It suffers, like humans, from ‘analysis paralysis’. Think of it like learning to play football (soccer) by only watching professional matches on TV. You might pick up on some advanced techniques and strategies used by professional players, and try to imitate their style in every game you play with friends at the local park. However, focusing on these complex moves without mastering the basic skills, like passing and dribbling, will leave you performing poorly. You’ve tailored your play style too closely to the professional level without a solid foundation in the basics, making it hard to adapt to playing football with your friends.
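One way to see the two failure modes side by side is to fit three models of different complexity to the same small data set and compare how each does on the points it was trained on against points it has never seen. The sketch below uses made-up numbers where the true pattern is roughly y = 2x plus some noise. The three models are my own choices for illustration: a flat average (too simple), a line of best fit, and a curve forced through every training point (too flexible).

```python
# Made-up data: the underlying pattern is y = 2x, plus some noise.
train_x = [0.0, 2.0, 4.0, 6.0, 8.0]
train_y = [1.0, 3.0, 9.0, 11.0, 17.0]
test_x = [1.0, 3.0, 5.0, 7.0]      # unseen points
test_y = [2.5, 5.5, 10.5, 13.5]


def fit_constant(xs, ys):
    """Too simple: always predicts the average y, whatever x is."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y


def fit_line(xs, ys):
    """Least-squares line of best fit."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    gradient = sxy / sxx
    intercept = mean_y - gradient * mean_x
    return lambda x: gradient * x + intercept


def fit_curve_through_every_point(xs, ys):
    """Too flexible: Lagrange interpolation hits every training point exactly."""
    def predict(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            weight = 1.0
            for j, xj in enumerate(xs):
                if j != i:
                    weight *= (x - xj) / (xi - xj)
            total += yi * weight
        return total
    return predict


def mean_squared_error(model, xs, ys):
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


models = {
    "constant": fit_constant(train_x, train_y),
    "line": fit_line(train_x, train_y),
    "curve": fit_curve_through_every_point(train_x, train_y),
}
errors = {
    name: (mean_squared_error(m, train_x, train_y),
           mean_squared_error(m, test_x, test_y))
    for name, m in models.items()
}
for name, (train_err, test_err) in errors.items():
    print(f"{name:>8}: training error {train_err:.2f}, unseen-data error {test_err:.2f}")
```

The flat average does badly everywhere, which is underfitting. The curve through every point scores perfectly on the training data but worse than the plain line on the unseen points, because it has faithfully learned the noise. That is overfitting.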


The Unseen Value of Edge Cases in Data and Society

In creating statistical analyses, or training data sets, we often remove the outliers to make sure we get an answer closest to the average. Outliers or ‘edge cases’ are often deemed to be ‘wrong’ or ‘erroneous’. These unusual data points that don’t fit into clear categories remind me of the human propensity for diversity. Just like these edge cases, people don’t always fit neatly into societal boxes, and it’s this diversity that makes the world a better place.


Let’s consider the high-school scatter chart again. Imagine your chart was ‘academic talent score’ on the y-axis, and age on the x-axis. You remove the genius young talent, or the incredibly talented middle-aged artist, who maybe didn’t have the best education. Removing these edge cases to get a ‘more average answer’ isn’t the right thing to do in life, so we should make sure they are accounted for in our AI too.
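A few lines of code show what that trimming does. The scores below are invented, and the ‘more than two standard deviations from the mean’ rule is just one common rule of thumb that I have assumed here, not a recommendation.

```python
from statistics import mean, stdev

# Invented 'academic talent scores'; the last one is the exceptional talent.
scores = [48, 52, 50, 47, 53, 51, 49, 95]


def drop_outliers(values, z_limit=2.0):
    """Keep only values within z_limit standard deviations of the mean."""
    mu = mean(values)
    sigma = stdev(values)
    return [v for v in values if abs(v - mu) <= z_limit * sigma]


trimmed = drop_outliers(scores)

print("All data:     mean", round(mean(scores), 1), "highest", max(scores))
print("Outliers cut: mean", round(mean(trimmed), 1), "highest", max(trimmed))
```

Once the exceptional score has been trimmed away, anything built on the remaining data has no evidence that such a person can exist at all.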


The lesson here from data extends far beyond numbers. Ignoring edge cases deepens inequalities. It is often said that necessity drives innovation, and the many stories of innovation driven by outlier events remind us of the true value of outliers. In my eyes, the path to a better world is paved with the extraordinary, not the average.


For data scientists, incorporating edge cases means building models robust enough to handle the unexpected. For policymakers, it means crafting laws that consider everyone’s needs.


This may be getting a little deep, but given all the above I have concluded that if AI continues to report the average, or something a few steps away from the average, humanity doesn’t have quite as much to worry about as many think, because current algorithms will never be exceptional. Sure, hell yeah, AI is going to change everything. But hopefully this will drive us to a better world: one where training AI on data that isn’t biased gives us fairer outcomes, and one where we place more value on what is important in life. Indeed, Barack Obama said:

“I think we’re going to have to start having conversations about: how do we pay those jobs that can’t be done by AI? How do we pay those better — healthcare, nursing, teaching, childcare, art, things that are really important to our lives but maybe commercially historically have not paid as well?”

The Big Takeaway

Reflecting on this week, from the awkwardness of an all-male panel to wrestling with machine learning models, I’ve learned a lot about bias: the importance of diversity, and how our human thinking mirrors the behaviour of algorithms. Biased machine learning models don’t work, and bias in the real world doesn’t work either. Innovation doesn’t just come from crunching data or following established paths. It’s often the unexpected, the outliers, that lead to breakthroughs. Like that outlier panel question that got me thinking.

And about those late nights? They taught me something about balance. Sure, dedication is key, but so is knowing when to take a break. Great ideas don’t often strike at 1 AM. Mostly, they come when you step back, rest, and give yourself space to think and be human.


Let’s try to eliminate our own biases by learning, growing, and opening ourselves to new perspectives, driven by curiosity and a genuine appreciation for what makes us different. We need to understand our limitations, because generative models are not yet routinely saying ‘I don’t know’. Different is better. Let’s use this wisdom to get the best out of AI in the future, a future that isn’t predicated on the average.


After all, where’s the fun in average?


 

Jamie is Founder at Bloch.ai, and a Visiting Fellow in Enterprise AI at Manchester Metropolitan University. He prefers cheese toasties.


Follow Jamie here and on LinkedIn: Jamie Crossman-Smith | LinkedIn
