The use of machine learning is becoming pervasive within decision-making systems. As computing power and storage have increased dramatically in recent years, the capabilities of machine learning have become clearer, and it has been applied to a wide range of problems across many business sectors.
Before we can realise the full potential of machine learning, we first need to be able to trust it. Most complex algorithms work in an extremely opaque fashion, making it hard to understand how they reach their decisions and, in turn, difficult for them to earn trust and confidence.
The “Black-Box Problem” refers to a challenge at the very core of machine learning – that whilst we can see the data we input to build models using machine learning, and we can see the output, more often than not we cannot see how the model has produced the output that it did.
When your favourite online shop recommends a product that you would never consider buying, there is no harm to you. But if the failure in the AI is more critical, for example deciding whether or not someone gets parole, it matters that we can explain that decision. If a person made this decision, we could ask them to explain their decision making process. To be able to trust models, we need to understand their decision making process.
The performance of any system created using machine learning depends on the dataset it was trained on. If that dataset is incomplete, biased or out of date, then the decisions that system produces are unlikely to be correct.
A system could perform correctly in most cases, but fail when it encountered a certain set of inputs.
In “AI is not just learning our biases; it is amplifying them”, Laura Douglas explains what can happen when we rely on a biased dataset:
“Imagine there is an algorithm which aims to predict from an image of a person whether they are a man or a woman. Let’s say it’s 75% accurate. Assume we have a biased dataset which is not reflective of true society (perhaps because, for example, it’s made of images from Hollywood movies). In this dataset, 80% of the time somebody is in the kitchen, it is a woman. The algorithm can then increase its accuracy from 75% to 80% by simply predicting that everyone who is ever in a kitchen is a woman. Since the algorithm simply optimises for accuracy, this is what it will most likely do. So the bias here has been amplified from 80% in the dataset, to 100% in the algorithm.”
In this example, the system using the algorithm might appear reliable, until the input is an image of a man in a kitchen – and then it fails every time.
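Douglas’s scenario can be sketched with a small simulation. The numbers, the synthetic dataset and the decision rule below are all hypothetical, chosen only to mirror the figures in the quote:

```python
# A toy version of the bias-amplification example (all numbers hypothetical,
# chosen to mirror the quote: 80% of people pictured in a kitchen are women).
import random

random.seed(0)

# Synthetic biased dataset: (in_kitchen, true_label) pairs.
data = []
for _ in range(10_000):
    in_kitchen = random.random() < 0.3
    if in_kitchen:
        label = "woman" if random.random() < 0.8 else "man"  # the dataset bias
    else:
        label = random.choice(["woman", "man"])               # balanced elsewhere
    data.append((in_kitchen, label))

def predict(in_kitchen):
    """An accuracy-maximising rule: in a kitchen, always guess 'woman'."""
    if in_kitchen:
        return "woman"  # the 80% bias becomes a 100% rule
    return random.choice(["woman", "man"])  # stand-in for a weaker classifier

kitchen = [(k, lbl) for k, lbl in data if k]
correct_in_kitchen = sum(predict(k) == lbl for k, lbl in kitchen)
print(f"Kitchen accuracy of the biased rule: {correct_in_kitchen / len(kitchen):.0%}")
print("Kitchen images labelled 'woman':     100% (by construction)")
```

The rule is never uncertain in a kitchen, so every man photographed in one is misclassified – exactly the amplification from 80% in the data to 100% in the algorithm that Douglas describes.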
Having a system that is unreliable given a specific set of inputs isn’t necessarily a problem, providing that users understand that the system cannot be relied on under those circumstances. If you know the limitations of your system, you can take action: in circumstances where the system is unreliable, you could refer the matter to a real person, or work to improve the system for those circumstances.
In the case above, if you were confident in the system’s ability to correctly identify a kitchen, but knew it couldn’t tell if the person in the kitchen was a man or a woman, you could flag that picture for human review.
If you want to remove all instances of human review from your process, then you need to fix the algorithm. Because we are aware of the bias that has been introduced, we know we need to obtain a more representative dataset of photos of people in kitchens.
It would be better, however, if we could trust that the dataset we started with was accurate and unbiased in the first place. This is where some experts are now anticipating we will see a convergence between machine learning and blockchain technology.
Brian Kuhn, Co-Founder of the Watson Legal Practice at IBM believes that whilst AI is proving its benefits across a wide range of sectors and business cases, it could be so much more beneficial if we could trust it.
“No one wants to rely on black box technology and if they have the freedom not to they won’t – they will demand insight.”
Kuhn believes that the black-box problem is something that all vendors offering AI based solutions are going to have to address because the wider adoption of blockchain technology will mean that customers will expect transparency as standard.
In Kuhn’s view, we cannot trust machine learning models because it is often not clear who trained a model or what data was used, and that produces a risk for businesses – a risk that they are often reluctant to take on.
Kuhn believes that blockchain can help to overcome these issues:
“How do I defend myself if a machine learning algorithm produces a recommendation that’s simply too complex for me to parse? There is a way and that’s by using blockchain to create a forensic audit trail that can be entered into evidence.”
“It might not be a perfect approach but it’s the best approach we have right now to capturing something much more than directional, about how a decision is produced by a machine learning algorithm.”
Blockchain technologies can be employed to improve the trustworthiness of the data, as well as potentially record and step through the stages in a machine based decision. Furthermore, the potential for blockchain to offer secure data sharing could mean more data becomes available.
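As a sketch of what such a forensic audit trail might look like, the chain of decision records below links each entry to the hash of the one before it, so any after-the-fact edit to history is detectable. The record fields and function names are illustrative, not a real blockchain API:

```python
# Minimal hash-chained audit trail for model decisions (illustrative only).
import hashlib
import json

def record(chain, payload):
    """Append a decision record whose hash also covers the previous entry."""
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    digest = hashlib.sha256(
        json.dumps({"payload": payload, "prev_hash": prev_hash},
                   sort_keys=True).encode()  # canonical serialisation
    ).hexdigest()
    chain.append({"payload": payload, "prev_hash": prev_hash, "hash": digest})

def verify(chain):
    """Re-hash every entry; any edit to an earlier record breaks the chain."""
    prev_hash = "0" * 64
    for entry in chain:
        expected = hashlib.sha256(
            json.dumps({"payload": entry["payload"], "prev_hash": prev_hash},
                       sort_keys=True).encode()
        ).hexdigest()
        if entry["hash"] != expected or entry["prev_hash"] != prev_hash:
            return False
        prev_hash = entry["hash"]
    return True

audit = []
record(audit, {"input_id": "case-42", "model": "parole-v1", "output": "deny"})
record(audit, {"input_id": "case-43", "model": "parole-v1", "output": "grant"})
print(verify(audit))                       # True
audit[0]["payload"]["output"] = "grant"    # tamper with history
print(verify(audit))                       # False
```

On a real blockchain the same effect comes from distributing the chain across many parties, so no single operator can quietly rewrite the record.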
For some businesses, the barrier to entry when it comes to making use of machine learning isn’t access to the technology – it’s access to the datasets they need to train the algorithm. If datasets can be shared without privacy concerns, then access to data should become easier. Recent work such as “CryptoDL: Deep Neural Networks over Encrypted Data” has looked at whether data can be fed into machine learning algorithms in an encrypted state, rather than providing access to raw data, in order to address privacy concerns. If data could be held on a blockchain in an encrypted state and fed into an algorithm directly without any need for decryption, then the risk of loss of data through a misplaced decryption key is mitigated.
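CryptoDL relies on homomorphic encryption, which is considerably more involved, but the underlying idea – a party computing over data it cannot read – can be shown with a much simpler additive-masking toy. Everything here is illustrative and offers none of the security guarantees of real homomorphic schemes:

```python
# Toy additive masking: a server sums values it never sees in the clear.
import random

MOD = 2**61 - 1  # a large modulus for the arithmetic

def mask(value, key):
    """Hide a value by adding a random key modulo MOD."""
    return (value + key) % MOD

# Each data holder masks its value with its own random key.
values = [12, 7, 30]
keys = [random.randrange(MOD) for _ in values]
masked = [mask(v, k) for v, k in zip(values, keys)]

# The server sums the masked values without learning any individual value.
masked_sum = sum(masked) % MOD

# Removing the combined keys reveals only the aggregate, never the parts.
total = (masked_sum - sum(keys)) % MOD
print(total)  # 49
```

The server learns the sum (49) but not that the inputs were 12, 7 and 30 – the same compute-without-seeing property, in miniature, that homomorphic approaches like CryptoDL aim to provide for full neural networks.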
Blockchain and AI are in some ways polar opposites when it comes to their ethos. Those who have developed machine learning algorithms that solve a given problem tend to keep them closely guarded, and we have to have faith that they work as expected. Blockchain, on the other hand, is very much about being open and transparent about what’s going on.
Whilst machine learning may have come of age, the potential for blockchain to bring trust into the equation could be the missing piece in solving the black-box problem.