Machine learning, and deep learning in particular, is already widely used in back-end enterprise systems to analyse large and complex datasets. But once it is combined with speech, text and facial recognition in assistants like Siri, and with smart gadgets that automatically collect and analyse data and behaviour, the AI ‘genie is out of the bottle’ and in our hands.
Artificial intelligence and ‘machine learning’ will be used in areas such as radiology, dermatology and pathology to ‘improve clinical care’, and to get through more time-consuming work than health professionals can.
A previous post looked at how chess-playing computers are at the forefront of development. Though designed to play strategy games, the champion AlphaZero is more versatile and is considered a “broad” AI: it learns by reinforcement, playing against itself rather than studying human games. Besides games, the same lab, DeepMind, has used deep learning to model complex proteins with AlphaFold.
Machine learning and big data are quickly becoming widespread, and though AIs now have only around the same number of neural connections as an earthworm, they’re still capable of doing things that couldn’t be done before, e.g. planning the daily routes for Amazon’s 158k delivery drivers, predicting attendance, identifying target demographics, weather forecasting, market and retail analysis, facial recognition, natural language processing, virtual assistants, advertising, genomics, optimising online…
When processes are automated, existing constraints need no longer apply, but new challenges and risks can also emerge. As far as possible these need to be considered early on, as deep learning solutions can be opaque and difficult to debug. For example, in 2016 a fatal accident involving a Tesla car in auto mode was reportedly attributed to the car misidentifying a truck pulling out as a road sign, because it had been trained on data that only showed trucks from behind. The codebase needs not only to accord with the view of the organisation and developer, but also to work in the messy and often unpredictable real world.
Some challenges –
- The problem’s too hard or too broad – even a simple sentence with double negatives can cause confusion
- AI cares nothing about UX (or anything else for that matter)
- Training data may be imbalanced leading to biased outputs – a smart system that assessed the CVs of job applicants inherited the human biases in the training data
- Bad, insufficient or messy data leads to flawed, incomplete or useless solutions – outputs can deviate from those intended
- The problem is not what was thought
- Shortcuts (hacks) taken by AI can be hard to understand, e.g. https://youtu.be/TRzBk_KuIaM?t=164
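The point about imbalanced training data can be checked before any model is trained. The snippet below is only a minimal sketch: the record layout, the group labels "A" and "B" and the function names are assumptions made for illustration, not taken from any real recruitment system. It reports how much of the data each group contributes and how often each group receives a positive outcome.

```python
from collections import Counter


def class_balance(labels):
    """Return each label's share of the dataset as a fraction of the total."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


def selection_rates(records, group_key, outcome_key):
    """Return the share of positive outcomes within each group."""
    totals, positives = Counter(), Counter()
    for record in records:
        group = record[group_key]
        totals[group] += 1
        if record[outcome_key]:
            positives[group] += 1
    return {group: positives[group] / totals[group] for group in totals}


# Hypothetical, hand-made records; not real recruitment data.
cvs = [
    {"group": "A", "shortlisted": True},
    {"group": "A", "shortlisted": True},
    {"group": "A", "shortlisted": False},
    {"group": "A", "shortlisted": True},
    {"group": "B", "shortlisted": False},
    {"group": "B", "shortlisted": True},
]

balance = class_balance([cv["group"] for cv in cvs])
rates = selection_rates(cvs, "group", "shortlisted")
print(balance)
print(rates)
```

In this toy data, group A makes up two thirds of the records and is shortlisted 75% of the time against 50% for group B. A gap like that does not prove bias on its own, but it is the kind of signal worth investigating before the data is used for training.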
Connecting smart technology with end users and the real world?
Full automation is often inappropriate; consider your experience of automated telephone lines. But robotic process automation (RPA) can facilitate human work, e.g. ordering and finding patterns in data for a human decision maker, or Google’s integration of search and analytics. In any case, the development and application of AI is facilitated by ongoing feedback that connects it to the “real world”. A big-design-up-front approach risks adding complexity and neglecting both usefulness and usability.
Below are some research methodologies currently used to inform the development, training and testing of automated and intelligent decision support systems (IDSS).
Formative and summative user research
(aka generative and evaluative)
- Formative explores and defines requirements and context
- Summative evaluates the efficacy and efficiency of a design or build
- Formative research typically gives way to summative as development progresses and there’s something to test
Methods and outputs
How do users/customers view automation and smart technologies? e.g. automated systems might be less accessible, trusted or tolerated.
- Methods: qualitative research (ethnographic and attitudinal studies, user and stakeholder interviews), contextual enquiry (observing interactions), quantitative research (surveys)
- Outputs: insight reports, anecdotal evidence
What might the benefits and consequences of introducing smart technologies be, for both them and the provider?
- Methods: formative interviews with stakeholders and users, technology capability, impact assessment, cost benefit analysis
- Outputs: experience map across service touch points
What might the challenges of introducing smart technologies be, for both?
- Methods: review of existing technologies, systems and data; formative interviews with stakeholders and users (though a summative technique, usability testing can also inform this question)
- Outputs: swim lane service map showing touch points and handovers
Will the system be accessible to people with disabilities or communication requirements?
- Methods: accessibility audit and gap analysis, ensuring research demographics represent the GOV.UK digital inclusion scale
- Outputs: accessibility report, alternative user journeys
Is a controlled natural language (CNL) required and if so, what is its vocabulary?
- Methods: analysis of service call logs using bag-of-words (BoW), interviews and observing interactions
- Outputs: lexicon, or a more formal controlled vocabulary, e.g. regular expressions (regex)
What are the conditions and choices for the user, and the available paths in the system? e.g. most simply, the menu options for a website’s navigation (decision trees)
- Methods: card sorting, system logs, analytics
- Outputs: scripts, questions and phrases, tree diagram of navigational hierarchy, customer journey maps
What will be helpful feedback for both the system and the user (positive and negative)? e.g. “From how you have described your enquiry the HMRC tax helpline might be more relevant.”
- Methods: interviewing and observing frontline operations, customer service logs
- Outputs: scripts for system response, model questions and responses
More generally, what are the use cases, devices, connectivity, user stories, flows and navigation?
- Outputs: use cases and user stories suitable for a product backlog, process (user) flow diagram
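The bag-of-words analysis of call logs and the regex-based controlled vocabulary mentioned above can be tried out in a few lines of Python. This is a rough sketch only: the log snippets, the stop-word list and the two vocabulary patterns are invented for illustration, and a real study would derive them from actual service data.

```python
import re
from collections import Counter

# Hypothetical call-log snippets, invented for illustration.
call_logs = [
    "I need help with my tax return",
    "How do I change my address on my tax account?",
    "Tax refund has not arrived",
]

# A very small, assumed stop-word list; a real study would use a fuller one.
STOP_WORDS = {"i", "my", "with", "do", "on", "how", "has", "not", "the", "a"}


def bag_of_words(texts):
    """Count word frequencies across all texts, ignoring order and stop words."""
    counts = Counter()
    for text in texts:
        words = re.findall(r"[a-z']+", text.lower())
        counts.update(word for word in words if word not in STOP_WORDS)
    return counts


# Assumed controlled vocabulary: each term is defined by a regular expression.
VOCABULARY = {
    "tax": re.compile(r"\btax(es)?\b", re.IGNORECASE),
    "address_change": re.compile(r"\bchange\b.*\baddress\b", re.IGNORECASE),
}


def match_terms(text):
    """Return the names of every vocabulary pattern found in the text."""
    return [name for name, pattern in VOCABULARY.items() if pattern.search(text)]


counts = bag_of_words(call_logs)
print(counts.most_common(3))
print(match_terms(call_logs[1]))
```

The word counts suggest which terms belong in the lexicon, while the patterns show how a more formal vocabulary could route an enquiry such as the HMRC example above. Bag-of-words ignores word order, so it will not catch the double negatives mentioned earlier.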
Summative research – testing
Methods and outputs
Evaluating the “smoothness” of stepping through the process across service touch points and in realistic timeframes
- Methods: usability testing, diary studies, service logs and anecdotal evidence
- Outputs: usability report, completion times and success rates
Evaluating the transitions (handovers) between different contexts/modalities, e.g. a shopping list compiled on a phone, order placed with a PC, the “shop” calculated and receipt issued at point of sale, stock systems updated, and delivery signed for on a PDA.
- Methods: usability testing, Amazon Mechanical Turk (MTurk, also used to evaluate use cases), analysis of drop-off rates at boundaries
- Outputs: drop-off and bounce rates
Evaluating the efficacy and efficiency of automated processes for users
- Methods: usability testing, preprocessing training data, rigorous testing for biases in outputs, interviews, questionnaires, observing frontline operations
- Outputs: analysis of server logs, bounce rates, completion rates, anecdotal reports
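Drop-off rates at the boundaries between touch points are simple to calculate once the number of users reaching each stage is known. The sketch below assumes those counts are already available, e.g. from analytics or server logs; the stage names loosely follow the shopping example above and the figures are made up.

```python
def drop_off_rates(stage_counts):
    """Return the fraction of users lost at each handover between stages."""
    rates = {}
    for (prev_name, prev_n), (next_name, next_n) in zip(stage_counts, stage_counts[1:]):
        # Guard against an empty stage to avoid dividing by zero.
        lost = (prev_n - next_n) / prev_n if prev_n else 0.0
        rates[f"{prev_name} -> {next_name}"] = lost
    return rates


def completion_rate(stage_counts):
    """Return the share of users at the first stage who reach the last one."""
    first, last = stage_counts[0][1], stage_counts[-1][1]
    return last / first if first else 0.0


# Hypothetical counts of users reaching each stage; not real data.
journey = [
    ("list on phone", 200),
    ("order on PC", 150),
    ("point of sale", 120),
    ("delivery signed", 90),
]

rates = drop_off_rates(journey)
overall = completion_rate(journey)
for handover, rate in rates.items():
    print(f"{handover}: {rate:.0%} drop-off")
print(f"completion: {overall:.0%}")
```

With these made-up numbers 45% of users complete the journey, and the first and last handovers each lose a quarter of the users who reach them. Numbers like these say where people leave, not why, which is where the usability testing and interviews come in.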
“Life, the Universe, and Everything. There is an answer. But, I’ll have to think about it.
The Answer to the Great Question… Of Life, the Universe and Everything… Is…
‘Forty-two,’ said Deep Thought, with infinite majesty and calm.”
Douglas Adams, The Hitchhiker’s Guide to the Galaxy