Machine learning is hard. Programming for embedded environments, where processing cycles, memory, and power are all in short supply, is really hard. Deploying machine learning to embedded targets, well… That’s been pretty close to impossible for all but the simplest scenarios… Until now.
Modern tools, including ours from Reality AI, are making it possible to use highly sophisticated machine learning models in embedded solutions as never before. But there are some important considerations.
Has the machine learning module been thoroughly tested and validated?
It goes without saying to experienced embedded engineers that the decision to commit any code module to an embedded product must be careful and deliberate.
Firmware pushed to products must be solid and reliable. Usually, products are expected to have a long lifetime between updates, if updates are even feasible at all. And they are expected to work out of the box.
In the case of a machine learning-based classifier or detector module, the usual validation concerns apply, but the caveats we've talked about elsewhere apply in spades. In particular, has the system been trained and tested on a large enough variety of real-world data to give confidence that it will hold up unsupervised in the wild?
What are the risks and trade-offs of incorrect results? If you know that the machine learning model's results are useful, but not perfect, can customers be educated to work within known limitations? Is the use case safety-critical, and if so, what are the fail-safe backups?
Testing and validation are much better done in an easy-to-use sandbox environment. Our tools, for example, let users experiment, retrain, and test with as much data as they choose before the classifier ever leaves the safety of the cloud.
We even support "live" testing through cloud APIs, so that customers can be confident they have tested and characterized a classifier or detector module before ever committing the resources to push it to firmware and customer devices.
Will it translate to your embedded hardware?
Processing speed, memory, energy use, physical size, cost, time to deployment: all of these are critically balanced elements in an embedded processing system design. The resources required to deploy the classifier and to execute operations - often within critical real-time latency constraints - will make or break your design.
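A quick back-of-the-envelope check can show early whether a candidate classifier fits the target at all. The sketch below is only an illustration: the function names and every number in it (operation counts, clock rate, window size) are hypothetical placeholders, not measurements from any real device or tool.

```python
# Rough feasibility arithmetic for a classifier on a microcontroller.
# All figures below are hypothetical; substitute your own measurements.

def cpu_utilization(macs_per_window, cycles_per_mac, windows_per_second, clock_hz):
    """Fraction of available CPU cycles the classifier would consume."""
    cycles_needed = macs_per_window * cycles_per_mac * windows_per_second
    return cycles_needed / clock_hz


def buffer_bytes(window_samples, bytes_per_sample, channels=1, double_buffered=True):
    """RAM needed to hold one input window (two if double-buffered)."""
    size = window_samples * bytes_per_sample * channels
    return 2 * size if double_buffered else size


# Assumed example: 50,000 multiply-accumulates per window, 2 cycles each,
# 20 windows per second, on a 48 MHz core.
util = cpu_utilization(50_000, 2, 20, 48_000_000)

# Assumed example: 512-sample windows of 16-bit data from a 3-axis sensor.
ram = buffer_bytes(512, 2, channels=3)

print(f"CPU load: {util:.1%}, input buffers: {ram} bytes")
```

If the load comes out near or above 100 percent, or the buffers alone eat most of the available RAM, that is worth knowing before any training effort is spent.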
Machine learning approaches like Deep Learning are very powerful, but only if you've got the space, power, and budget in your design to support their rather intensive requirements. Other methods that can be reduced at execution time to sequences of standard signal processing or linear-equation-type operations, like ours, may be less general but can comfortably run in real time on an inexpensive microcontroller.
Don't forget the math: it is easy and natural these days to develop detectors and classifiers in high-precision, floating-point tools on large desktop or cloud-based systems. Ensuring that dynamic range and precision are sufficient to reproduce similar performance on a low-bit-depth, fixed-point platform can also put notable constraints on embeddable algorithms.
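One way to see the effect is to run the same operation in floating point and in a simulated fixed-point format, then compare. The sketch below assumes a signed 16-bit Q15 format and a simple dot product; both are illustrative choices, and the weights and inputs are made-up values rather than anything from a real classifier.

```python
# Compare a floating-point dot product with a simulated Q15 fixed-point one.
# Q15 stores values in [-1, 1) as signed 16-bit integers scaled by 2**15.

Q15_SCALE = 1 << 15
Q15_MIN, Q15_MAX = -32768, 32767


def to_q15(x):
    """Round a float to Q15, saturating at the 16-bit limits."""
    return max(Q15_MIN, min(Q15_MAX, int(round(x * Q15_SCALE))))


def dot_float(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def dot_q15(w_q, x_q):
    """Q15 dot product: accumulate Q30 products wide, then shift back to Q15."""
    acc = sum(wi * xi for wi, xi in zip(w_q, x_q))
    return max(Q15_MIN, min(Q15_MAX, acc >> 15))


# Made-up example values.
weights = [0.25, -0.5, 0.125, 0.0625]
inputs = [0.5, 0.25, -0.75, 0.1]

ref = dot_float(weights, inputs)
fixed = dot_q15([to_q15(w) for w in weights],
                [to_q15(v) for v in inputs]) / Q15_SCALE
error = abs(ref - fixed)

print(f"float={ref:.6f} q15={fixed:.6f} error={error:.2e}")
```

Here the error is small because every value sits comfortably inside the Q15 range. Values near or beyond plus or minus one saturate, and long accumulations can overflow a narrow accumulator, which is where dynamic range starts to constrain the algorithm.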
It is important to know what the requirements will be, as there is no sense spending expensive R&D efforts on a project that cannot possibly be deployed in your target system.
Is there an early feedback mechanism?
Your detector looks good enough to release, but is it perfect? Iteration is the key to perfecting AI classifiers and detectors.
Will the designers have access to what really happens in the wild? Can you set up a pilot with one or more friendly customers to give you both quantified validation of the system in a real setting, and, even better, access to data and signals that may have caused problems so that they can be incorporated into future training sets?
Remember that most machine learning is data driven. Though you have analyzed the problem and synthesized a wide variety of test vectors back at the lab, you'll never cover all possible variations. Hopefully, you'll find the first-cut classifier or detector is robust and does its job. But few good plans survive first contact with the enemy, and having data from even one or two real customers will quickly introduce situations, noises, systematic errors, and edge cases for which no one fully planned.
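One possible shape for such a feedback mechanism is to keep the input windows the classifier was least sure about, so they can be retrieved from a pilot device and added to later training sets. The sketch below is a hypothetical design, not a feature of any particular tool: the class name, the confidence threshold, and the buffer capacity are all assumptions.

```python
from collections import deque


class FeedbackLog:
    """Keep the most recent low-confidence windows for later upload and retraining."""

    def __init__(self, capacity=8, threshold=0.6):
        self.threshold = threshold
        # A bounded deque drops the oldest entry once capacity is reached,
        # so memory use stays fixed on a constrained device.
        self.samples = deque(maxlen=capacity)

    def observe(self, window, label, confidence):
        """Record the window if the classifier was unsure; return True if kept."""
        if confidence < self.threshold:
            self.samples.append((list(window), label, confidence))
            return True
        return False


# Made-up classifier outputs: (input window, predicted label, confidence).
log = FeedbackLog(capacity=2, threshold=0.6)
log.observe([0.1, 0.2], "idle", 0.95)   # confident, not kept
log.observe([0.9, 0.8], "fault", 0.55)  # kept
log.observe([0.4, 0.5], "idle", 0.40)   # kept
log.observe([0.7, 0.1], "fault", 0.30)  # kept, evicts the oldest entry

for window, label, confidence in log.samples:
    print(label, confidence, window)
```

Low confidence is only one trigger. A pilot customer flagging a wrong result, or a disagreement with a backup sensor, could feed the same buffer.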
How fast can you respond with improvements?
Most companies that ship products with embedded software have a long and (deliberately) onerous review and quality control process for getting new code into production. You can't afford to mess up, so validation, walk-throughs, and other process steps necessarily take time and money to ensure that only quality goes out the door. Of course, this is exactly the opposite of what you need for fast, iterative product improvements.
But there are compromises available. Perhaps the classifier design can be reduced to a standard module that is fully code-validated, but whose operation is defined in part by a changeable set of coefficient data (something we support with Reality AI Tools).
In a design like this, updates to just the operational coefficients can be pushed, in many cases, without requiring a major code revision or triggering revalidation procedures. Hot updates are comparatively quick and safe, and the results can be revalidated in vivo, so to speak.
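To make the idea concrete, here is a minimal sketch of a coefficient update that is checked before the fixed, validated code ever uses it. The file layout is invented for this example (a "COEF" tag, a layout version, a count, and a CRC32 over 16-bit coefficients) and is not the format used by Reality AI Tools or any other product.

```python
import struct
import zlib

MAGIC = b"COEF"                   # hypothetical file tag
HEADER = struct.Struct("<4sHHI")  # tag, layout version, coefficient count, CRC32


def pack_coefficients(coeffs, layout_version=1):
    """Build an update blob from signed 16-bit coefficients (cloud/tooling side)."""
    payload = struct.pack(f"<{len(coeffs)}h", *coeffs)
    header = HEADER.pack(MAGIC, layout_version, len(coeffs), zlib.crc32(payload))
    return header + payload


def load_coefficients(blob, expected_version=1):
    """Validate an update blob and return its coefficients, or raise ValueError."""
    if len(blob) < HEADER.size:
        raise ValueError("blob shorter than header")
    magic, version, count, crc = HEADER.unpack_from(blob)
    payload = blob[HEADER.size:]
    if magic != MAGIC:
        raise ValueError("not a coefficient file")
    if version != expected_version:
        raise ValueError("layout version not supported by this firmware")
    if len(payload) != 2 * count:
        raise ValueError("coefficient count does not match payload size")
    if zlib.crc32(payload) != crc:
        raise ValueError("checksum mismatch")
    return list(struct.unpack(f"<{count}h", payload))


blob = pack_coefficients([8192, -16384, 4096])
print(load_coefficients(blob))
```

The device would keep running its current coefficients whenever a check fails, so a corrupted or mismatched update never reaches the classifier. On real hardware the same checks would be written in the firmware's own language.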
So what does it take to embed my classifier?
In the old-school world of signal processing and machine learning R&D, the distance between a laboratory test prototype and an embedded deployment is substantial. The successful custom prototype was only the start of a long process of requirements analysis and complete re-coding for the target device.
But that's changing. If you know you are targeting an embedded platform, and you have some idea of the requirements, modern AI tools can help you plan for it from the start. They can be configured to choose from among proven, highly deployable AI module designs, and provide for thorough end-to-end testing and validation in the cloud.
With these tools, generating an embedded classifier module may be as simple as linking a small library function into your code base, and an update as simple as pushing a small binary data file to the end device.