To help you get going, mishmash io follows three guiding principles that make algorithm development easy and accessible, despite increasing data sizes and complexity.
To extract value from patterns in data you need algorithms. Learn what makes mishmash io the database for your app’s complex predictive logic.
How it works:
There are many things that are in short supply in today’s world – privacy, clean air, good manners – but one aspect of modern life that is not lacking is data. There are mountains of data being generated every second of every day. We are overrun with the stuff. Companies collect and store data on anything and everything they can think of, knowing that it must be useful to help them solve problems, gain market advantage and increase profits. But in reality, all that data can only be useful if we can analyse and understand what the information is telling us. This need is a major reason for the increased focus on machine learning and artificial intelligence that we’ve seen in recent years, as only computers have any hope of processing all the data being collected so that we can understand what it all means.
The trend so far has been to create applications (apps) that comprise a ‘front end’ for interacting with the data, a ‘back end’ to connect to the data, and a database to store the data.
A lot of work has been done to develop large-scale computational methods to assist with extracting and then analysing the data, but these are often treated separately from the database.
In other words, you have to remove the data, analyse it, then go back to the database and repeat.
All this can seem daunting to companies without a history of handling and analysing data. While their software developers understand the apps inside out, they may not know where to start in creating an efficient algorithm that can be rapidly applied to the database to derive some useful trends or targeted information on customers, etc.
Bringing in expertise in traditional databases is one way forward, but as we shall see, these have serious limitations when it comes to handling the ‘big data’ that is increasingly showing itself to be valuable in solving all sorts of problems.
mishmash io presents an alternative approach that unites databases and machine learning but does so in a way that developers will find easy to understand and work with. We describe it as a ‘distributed database system’ because it looks and feels like a normal database but is designed to make it easy to take computational methods – algorithms – written in your app and apply them to the database itself.
This approach has several advantages. It’s much faster and easier to move an algorithm from your app into a database rather than move the data into your app, especially with the quantity of data typically being handled by apps these days.
Also, because mishmash io is set up to organise the data into useful chunks and then optimise how the algorithm is run (more on this later), you only need a couple of hundred lines of code to perform some pretty sophisticated computations on the data – something that a developer can easily create in a short time as a new feature for your app.
This small chunk of code is also made easier to write for two reasons. Firstly, mishmash io does not have a query language; it will accept and apply your algorithm in the language in which you have chosen to write it – Python, Java, Ruby, whatever. Secondly, mishmash io does not have a data schema; it can use the data model – the arrays and objects – that you have set up in your app and apply it to the data as is. There is no need to use special frameworks or specific lines of source code.
The ease with which algorithms can be written and run using mishmash io helps make the principles of machine learning more accessible.
For example, it is relatively straightforward for a developer to write an algorithm which splits the data, compares the results and looks for a gain in information relevant to the query. Each split is scored for its information gain, and then the split with the highest score is analysed further until all the key parameters needed to give the highest scores are identified by the algorithm.
This process of scoring splits in the data is one way (there are others) in which mishmash io enables machine learning within a system – it’s nothing more magical than that!
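To make the idea of scoring splits concrete, here is a minimal sketch in plain Python. It is not mishmash io code: the function names (`entropy`, `information_gain`, `best_split`) and the toy records are our own assumptions, invented purely for illustration. It scores each candidate split by how much it reduces the uncertainty (entropy) of the outcome and picks the highest-scoring one.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def information_gain(records, feature, target):
    """Entropy reduction obtained by splitting `records` on `feature`."""
    before = entropy([r[target] for r in records])
    groups = {}
    for r in records:
        groups.setdefault(r[feature], []).append(r[target])
    # Weighted average entropy of the groups produced by the split.
    after = sum(len(g) / len(records) * entropy(g) for g in groups.values())
    return before - after


def best_split(records, features, target):
    """Score every candidate split and return (feature, gain) for the best."""
    scores = {f: information_gain(records, f, target) for f in features}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Made-up toy records, for illustration only (not real match data).
matches = [
    {"venue": "home", "weather": "dry", "result": "win"},
    {"venue": "home", "weather": "wet", "result": "win"},
    {"venue": "away", "weather": "dry", "result": "loss"},
    {"venue": "away", "weather": "wet", "result": "loss"},
]

feature, gain = best_split(matches, ["venue", "weather"], "result")
print(feature, gain)
```

In this toy data the venue separates wins from losses perfectly, so it scores highest, while the weather tells us nothing. A fuller version would repeat the same scoring inside each resulting group, which is the ‘analysed further’ step described above.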
However, this way of analysing the data iteratively has the great advantage that you (the person posing the question) don’t have to pick a place to start.
In other words, you don’t have to select the parameters that you think might be important in answering the question, which tends to skew the answer in favour of the chosen starting point (this almost inevitably reflects some bias on your part). It also risks missing other parameters that you, in your wildest dreams, might not have expected to be important.
So far, we’ve seen that mishmash io offers some key advantages to help those who are new to machine learning and to analysing large amounts of data.
But don’t be fooled – there’s a lot of very clever stuff going on ‘under the hood’. In fact, mishmash io has a dual function when operating, as illustrated in the diagram.
In the first, shown on the left of the diagram, mishmash io is ‘digesting’ the data in the database.
This involves applying a proprietary algorithm which decomposes the data into convenient chunks that contain the relevant information for answering the question posed by an algorithm.
We call these chunks mishmashes, and they are stored in a distributed file system which allows them to be processed simultaneously at separate nodes within a cluster of computing locations.
Performing computations in parallel in this way can significantly increase the speed with which the algorithm can be run across all the relevant data.
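The benefit of working on chunks in parallel can be sketched in a few lines of Python. This is only an illustration under our own assumptions: the chunks are made-up lists of records, the helper names (`count_wins`, `win_rate`) are hypothetical, and a local thread pool stands in for the separate nodes of a cluster. It says nothing about how mishmash io actually stores or schedules its mishmashes.

```python
from concurrent.futures import ThreadPoolExecutor


def count_wins(chunk):
    """Partial result computed on one chunk: (wins, number of matches)."""
    wins = sum(1 for r in chunk if r["result"] == "win")
    return wins, len(chunk)


def win_rate(chunks, max_workers=4):
    """Run count_wins on every chunk concurrently, then merge the partials."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partials = list(pool.map(count_wins, chunks))
    wins = sum(w for w, _ in partials)
    total = sum(n for _, n in partials)
    return wins / total if total else 0.0


# Made-up chunks, for illustration only; each list plays the role of
# one independently processed piece of the data.
chunks = [
    [{"result": "win"}, {"result": "loss"}],
    [{"result": "win"}, {"result": "win"}],
    [{"result": "draw"}],
]

rate = win_rate(chunks)
print(rate)
```

The point of the design is that each chunk produces a small partial result, and only those partial results need to be brought together at the end, rather than the data itself.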
The second function, illustrated on the right, is the digesting of the submitted query (or algorithm), which mishmash io transforms into equivalent algorithms which will run more efficiently across the clusters in which the mishmashes are stored.
The software looks at how the information is stored, including the splits in the data that have already been made to create the mishmash, then looks at the query that the algorithm is designed to answer, and works out a way of efficiently applying one to the other to minimise the amount of computation involved.
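One simple way to picture ‘applying one to the other’ is to skip any chunk that cannot possibly contain an answer. The sketch below is a hypothetical illustration, not the proprietary method: the `Chunk` class, the `plan` and `run` functions and the toy records are all assumptions of ours. Each chunk remembers the split that created it, and the planner uses that to avoid scanning chunks the query cannot match.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    split_key: str      # field the data was split on (hypothetical metadata)
    split_value: str    # value shared by every record in this chunk
    records: list = field(default_factory=list)


def plan(chunks, predicate):
    """Keep only the chunks that could contain records matching `predicate`.

    `predicate` maps field names to the values the query requires.
    """
    selected = []
    for chunk in chunks:
        wanted = predicate.get(chunk.split_key)
        if wanted is None or wanted == chunk.split_value:
            selected.append(chunk)
    return selected


def run(chunks, predicate):
    """Evaluate the predicate only on chunks that survive planning."""
    hits = []
    for chunk in plan(chunks, predicate):
        hits.extend(
            r for r in chunk.records
            if all(r.get(k) == v for k, v in predicate.items())
        )
    return hits


# Made-up data, for illustration only: records already split by venue.
chunks = [
    Chunk("venue", "home", [{"venue": "home", "result": "win"},
                            {"venue": "home", "result": "draw"}]),
    Chunk("venue", "away", [{"venue": "away", "result": "win"},
                            {"venue": "away", "result": "loss"}]),
]

query = {"venue": "home", "result": "win"}
print(len(plan(chunks, query)), run(chunks, query))
```

A query that mentions the venue touches only one chunk here, whereas a query on the result alone still has to visit both. Real optimisation is far richer than this, but the principle of using existing splits to reduce computation is the same.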
Diagram: representation of the functions of mishmash io used to analyse a database to answer a query posed by an algorithm.
And it doesn’t stop there, because the situation is in a constant state of flux. New data is regularly added by the app, and new queries are also received.
So mishmash io works to re-order the way the data is split into mishmashes to optimise the running of each algorithm, as well as transforming each algorithm to best suit the way the data is stored.
This is the linking gearwheel in the centre of the diagram. It’s a synchronous process where one side is continually adjusting in response to changes in the other, similar to a chemical reaction approaching equilibrium except that a balance is almost never achieved because new information is continuously being added by the app in today’s data-hungry world.
The strength of this approach can be illustrated using a light-hearted example: generating interesting statistics about a football match to help a commentator. Historical data on football fixtures can be purchased, giving information on the teams, venue, referee, scores, players, etc.
This is stored in mishmash io exactly as received, where it forms a tree structure.
If you then want to discover what factors lead to England beating Bulgaria, you can create an algorithm in about 200 lines of code that directs mishmash io to do the following:
For example, this will identify, amongst other things, that Bulgaria hasn’t beaten England at Wembley since 1967 – a serious home advantage!
We discuss this example in much more detail in a separate article.
Conceptually, this approach is familiar to software developers, and mishmash io gives them a tool to quickly start applying it to data of any size to extract the maximum value that it can provide.
By writing a short algorithm that uses their choice of language and data model, they can use mishmash io to perform data analysis across multiple nodes in a cluster simultaneously without moving large amounts of data around or using special frameworks or specific query languages.
In other words, mishmash io demystifies and democratises machine learning and opens up the endless possibilities that can be conceived when it is applied to understand data and solve problems.
Check out how mishmash io speeds up complex Machine Learning algorithms, by combining sophisticated code analysis and deep understanding of input data.
See how we use an algorithm to find structure in this smart football commentator example app.