Multiparadigm data science: AI, analytics and answers

Huge amounts of data have come on stream over recent decades. Has this availability of data led to better decision making?

Overall, it is hard to believe it has. That leads to a follow-up question: why has decision making not improved, and is there something we can do about it?

We are all suffering from information overload. We may have much more data, but we are struggling to see the wood for the trees. It is the opposite of the pre-internet problem, when we did not have enough data to make systematic decisions. Accessibility is one challenge, and it operates at many levels, as broad as you want to make it. There is the challenge of disparate data: can you bring different data sources together to tell a bigger story? Can you make average and aggregate data personal, for example by determining an individual's risk of heart disease? These are the key challenges for data methodologies today.

Many see computation as the key solution. High-powered computation for everyone will allow better access to, and personalisation of, data. This requires good automation to empower end users. Individuals may not know exactly how machine learning works, but that does not matter. Extracting the best from data, however, also requires some reorientation of humans: unless people can think computationally, it will be harder for them to interact with smart automation. Only a greater ability to understand automated processes will lead to the highest level of insightful computation.

One way to understand these processes is to look at our track record. Computation has long worked well with data in structured form, such as data from physics. Other domains, including health and medical data, have not proved as successful. The rapid deployment of new ideas repeatedly changes the parameters of models; it can promote fuzzy questioning, and it may become difficult to communicate with people in meaningful ways. Moreover, not all data fits within the parameters of a spreadsheet.

The best solution is a multiparadigm approach to computation. Individuals should not lock into a single tool set, but should apply the widest possible tool set to the problem. Although the current buzz is around machine learning, it does not always deliver good results, and it is only one tool among many.

A crucial starting point when looking at data is the interface: it determines whether something is usable or not. Data scientists need to think about what sort of data they need to acquire and work backwards towards the interface. Linguistic queries are one way of doing this, and they are accurate for a specific data set, such as life expectancy. Such a query could be expanded to compare life expectancy in Switzerland and the UK. Here you can see the underlying code cleanly, as a programmer might set it up. These inquiries are quite specific. We could ask what might happen if we went for a workout: how many calories would we burn, and how would that compare with what we had for breakfast? However, if you say eggs and smoked salmon, that information alone might not be enough; how the egg was cooked, for example, could make a calorific difference. The same difficulties emerge if you ask whether you are drunk: there are many factors to consider, including weight and gender.
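A minimal sketch of how a linguistic query might map onto a structured data set. The country figures and the `compare` function below are hypothetical placeholders for illustration, not real data or any Wolfram API:

```python
# Hypothetical sketch: turning a narrow linguistic query into a lookup
# over a small structured data set. Figures are illustrative placeholders.
LIFE_EXPECTANCY = {
    "Switzerland": 83.8,
    "UK": 81.3,
}

def compare(query: str) -> str:
    """Answer a very narrow class of 'compare life expectancy' queries."""
    countries = [c for c in LIFE_EXPECTANCY if c.lower() in query.lower()]
    if len(countries) != 2:
        return "Query not understood"
    a, b = countries
    diff = LIFE_EXPECTANCY[a] - LIFE_EXPECTANCY[b]
    higher = a if diff > 0 else b
    return f"{higher} is higher by {abs(diff):.1f} years"
```

The point of the sketch is the narrowness: the query works cleanly only because the data set behind it is specific and well structured.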

There are other forms in which to present data. One is a knowledge app: a repository of data that the user can question in different ways, perhaps surfacing correlations never explored before. Another is a narrative document: rather than a dead report, one with some interactivity built into it.

The term 'search' is very useful for extracting existing information, but it is not that effective at generating new knowledge. Artificial intelligence tries to overcome this. You can churn through data with AI and you will get results. However, AI has yet to realise its full promise: there are some large holes where it does not work so well, and it frequently requires users to go over the results again by hand. Another means of analysing data is to think of it in computable terms. The good news is that this is better at providing results.

The bad news is that you need to have your data in shape to make it work. A computational data layer is not fully automated, and it requires a human expert to interpret results.

Traditionally the computer has been there to crunch the data for its human boss. Increasingly the computer will make suggestions or offer insights while crunching the data, and the human will manage the project towards the point they wish to reach. It will be akin to giving a task to a human subordinate and asking them to see how far they can get. In the pre-computer age, humans managed largely by intuition. The advent of the computer saw targets set, but against relatively crude metrics that do not capture the complexities of an organisation. Future management will mean managing the quantification of numbers into computable data sets, while at the same time providing the latitude of intuition to assemble data in a way that can ultimately tell a story. Targets become agile instead of fixed: sliders become moveable, management can decide what constitutes a sensible metric, and answers are not reduced to being binary.
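The idea of moveable sliders can be sketched as a composite score whose weights management can adjust. Everything here, including the metric names and values, is a made-up illustration of the principle, not a real management system:

```python
# Hypothetical sketch of 'agile targets': a composite score whose weights
# act like sliders that management can move, instead of one fixed metric.
def composite_score(metrics: dict, weights: dict) -> float:
    """Weighted average of named metrics; the weights are the 'sliders'."""
    total = sum(weights.values())
    return sum(metrics[name] * w for name, w in weights.items()) / total

# Illustrative (made-up) metric values, each normalised to 0..1.
metrics = {"revenue_growth": 0.6, "staff_retention": 0.9, "customer_nps": 0.7}

# Moving the sliders shifts priorities without redefining the metrics.
growth_focus = composite_score(
    metrics, {"revenue_growth": 2, "staff_retention": 1, "customer_nps": 1})
people_focus = composite_score(
    metrics, {"revenue_growth": 1, "staff_retention": 2, "customer_nps": 1})
```

Re-weighting changes the answer without forcing a binary pass/fail target, which is the sense in which the targets stay agile.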

Multiparadigm data science is bringing benefits across a range of industries. A British team attempting the land speed record, having handed their data to the Wolfram team, started to see results and correlations they had not seen before. Similarly, the data produced by Biovotion sensors can be summarised in a confusion matrix, showing how reliably we can tell when an individual is walking, running, biking, or swimming.
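A confusion matrix of this kind can be sketched in a few lines. The activity labels and predictions below are invented for illustration, not actual Biovotion sensor output:

```python
# Illustrative sketch of a confusion matrix for activity classification
# from wearable-sensor data. Labels and predictions are made up.
from collections import Counter

ACTIVITIES = ["walking", "running", "biking", "swimming"]

def confusion_matrix(actual, predicted):
    """Count (actual, predicted) pairs into a nested dict."""
    counts = Counter(zip(actual, predicted))
    return {a: {p: counts[(a, p)] for p in ACTIVITIES} for a in ACTIVITIES}

actual    = ["walking", "walking", "running", "biking", "swimming", "running"]
predicted = ["walking", "running", "running", "biking", "swimming", "running"]
cm = confusion_matrix(actual, predicted)
# Diagonal entries are correct classifications; off-diagonal entries show
# which activities the classifier confuses with one another.
```

Reading the matrix row by row shows, for each true activity, where the classifier's mistakes go, which is exactly the "demonstrating when an individual is walking, running, biking, or swimming" use described above.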

Realising the full potential of data will require humans who fully understand the potential and limitations of data analytics. In education, the human does all the calculating; in the real world, the computers do. We have yet to rebuild our curriculum around the assumption that computers exist. Get that right, and we will start to build computational ability into our everyday lives. Estonia is preparing to bring the beginnings of such an approach into its education system.


Summary of Conrad Wolfram's presentation at the Centre's Health monitoring event in December 2017. Conrad is Strategic Director and European Co-Founder/CEO of Wolfram Research. Summary by Simon Woodward.