Stuart Feffer


Embedded AI – Delivering results, managing constraints

“It’s not who I am underneath, but what I do that defines me.”

– Batman

Over the last few years, as sensor and MCU prices have plummeted and shipped volumes have gone through the roof, more and more companies have tried to take advantage by adding sensor-driven embedded AI to their products.

Automotive is leading the trend – the average non-autonomous vehicle now has 100 sensors, sending data to 30-50 microcontrollers that run about 1 million lines of code and generate 1 TB of data per car per day. Luxury vehicles may have twice as many, and autonomous vehicles increase the sensor count even more dramatically.

But it's not just an automotive trend. Industrial equipment is becoming increasingly “smart” as makers of rotating, reciprocating and other types of equipment rush to add functionality for condition monitoring and predictive maintenance, and a slew of new consumer products, from toothbrushes to vacuum cleaners to fitness monitors, add instrumentation and “smarts”.

Real-time, at the edge, and a reasonable price point

What these applications have in common is the need to use real-time, streaming, complex sensor data – accelerometer, vibration, sound, electrical and biometric signals – to find signatures of specific events and conditions, or detect anomalies, and do it locally on the device: with code that runs in firmware on a microcontroller that fits the product’s price point.

When setting out to build a product with these kinds of sensor-driven smarts, there are three main challenges that need to be overcome:

Simultaneous Challenges when using Sensors with Embedded AI:

  • Variation in target and background

  • Real-time detection

  • Constraints – size, weight, power consumption, price

The first is Variation.
Real-world data is noisy and full of variation – meaning that the things you’re looking for may look different in different circumstances. You will face variation in your targets (want to detect sit-ups in a wearable device? The first thing you will hit is that people do them all slightly differently, with myriad variations). But you will also face variation in backgrounds (vibration sensors on industrial equipment will also pick up vibrations transmitted through the structure from nearby equipment). Background variation can sometimes be as important as target variation, so you’ll want to collect both examples and counter-examples in as many backgrounds as possible.

The second is Real-time detection in firmware.
The need to be able to accomplish detections locally that provide a user with a “real-time” experience, or to provoke a time-sensitive control response in a machine, adds complexity to the problem.

The third is Constraints – physical, power, and economic.
With infinite computing power, lots of problems would be a lot easier. But real-world products have to deliver within a combination of form factor, weight, power consumption and cost constraints.

Traditional Engineering vs Machine Learning

Doing all of this simultaneously (overcoming variation to accomplish difficult detections in real time, at the edge, within the necessary constraints) is not at all easy. But with modern tools, including new options for machine learning on signals (like Reality AI), it is becoming easier.

Certainly, traditional engineering models constructed with tools like Matlab are a viable option for creating locally embeddable detection code. Matlab has a very powerful signal processing toolbox which, in the hands of an engineer who really knows what she is doing, can be used to create highly sophisticated, yet computationally compact, models for detection. 

Why use Machine Learning?

But machine learning is increasingly a tool of choice. Why? 

For starters, the more sophisticated machine learning tools that are optimized for signal problems and embedded deployment (like Reality AI) can cut months, or even years, from an R&D cycle. They can get to answers quickly, generating embeddable code fast, allowing product developers to focus on their functionality rather than on the mathematics of detection.

But more importantly, they can often accomplish detections that elude traditional engineering models. They do this by making much more efficient and effective use of data to overcome variation. Where traditional engineering approaches will typically be based on a physical model, using data to estimate parameters, machine learning approaches can learn independently of those models. They learn how to detect signatures directly from the raw data and use the mechanics of machine learning (mathematics) to separate targets from non-targets without falling back on physics.

Different Approaches for Different Problems

It is also important to know that there are several different approaches to machine learning for this kind of complex data. The one getting most of the press is “Deep Learning”, a machine learning method that uses layers of convolutional and/or recurrent neural networks to learn how to predict accurately from large amounts of data. Deep Learning has been very successful in many use cases, but it also has drawbacks – in particular, that it requires very large data sets on which to train, and that for deployment it typically requires specialized hardware ($$$). Other approaches, like the one we take at Reality AI, may be more appropriate if your deployment faces cost, size or power constraints. 

Three things to keep in mind when building products with embedded AI

If you’re thinking of using machine learning for embedded product development, there are three things you should understand:

1. Use rich data, not poor data. The best machine learning approaches work best with information-rich data. Make sure you are capturing what you need.

2. It's all about the features. Once you have good data, the features you choose to employ as inputs to the machine learning model will be far more important than which algorithm you use.

3. Be prepared for compromises and tradeoffs. The sample rate at which you collect data and the size of the decision window will drive much of the requirements for memory and clock-speed on the controller you select. But they will also affect detection accuracy. Be sure to experiment with the relationship between accuracy, sample rate, window size, and computational intensity. The best tools in the market will make it easy for you to do this.
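That experimentation can be sketched in a few lines. The toy below is only an illustration of the tradeoff, not Reality AI's method: synthetic tone-plus-noise "sensor" captures, a crude dominant-frequency feature, and a trivial nearest-frequency classifier, swept over a grid of sample rates and window sizes (every rate, tone, and threshold here is a hypothetical choice):

```python
import numpy as np

rng = np.random.default_rng(0)
BASE_RATE = 1000     # Hz -- hypothetical maximum capture rate
CLASSES = (50, 120)  # two hypothetical target tones, in Hz

def make_signal(freq_hz, noise=0.5):
    """Synthetic stand-in for a 1-second sensor capture: a tone plus noise."""
    t = np.arange(BASE_RATE) / BASE_RATE
    return np.sin(2 * np.pi * freq_hz * t) + noise * rng.standard_normal(t.size)

def dominant_freqs(sig, rate, window_s):
    """One crude feature per decision window: the dominant frequency (Hz)."""
    n = int(rate * window_s)
    wins = [sig[i:i + n] for i in range(0, len(sig) - n + 1, n)]
    return [np.argmax(np.abs(np.fft.rfft(w))) * rate / n for w in wins]

def accuracy(rate_divisor, window_s, trials=40):
    """Classify each window by whichever class tone is closer in frequency."""
    rate = BASE_RATE // rate_divisor
    hits = total = 0
    for _ in range(trials):
        true = CLASSES[rng.integers(2)]
        sig = make_signal(true)[::rate_divisor]  # naive downsample, toy only
        for f in dominant_freqs(sig, rate, window_s):
            hits += int(min(CLASSES, key=lambda c: abs(f - c)) == true)
            total += 1
    return hits / total

# Sweep the sample-rate / window-size grid, as suggested above.
for div in (1, 2, 4):
    for win in (0.5, 0.25):
        print(f"rate={BASE_RATE // div:4d} Hz  window={win:4}s  "
              f"acc={accuracy(div, win):.2f}")
```

Lowering the rate or shrinking the window cuts the memory and clock-speed budget, and a sweep like this shows where accuracy starts to pay the price.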


5 tips for collecting Machine Learning data from high-sample-rate sensors

Machine learning on high-sample-rate sensor data is different. For a lot of reasons. The outcomes can be very powerful – just look at the proliferation of “smart” devices and the things they can do. But the process that creates the “smarts” is fundamentally different than the way most engineers are used to working. It’s by necessity more iterative, and it requires different analytical techniques than either the traditional engineering methods they’re used to or the methods that work on machine logs and slower time series.

What is high-sample-rate data?

But let’s start by being clear about what we’re talking about: high-sample-rate sensor data includes things like sound (8 kHz – 44 kHz), accelerometry (25 Hz and up), vibration (100 Hz on up to MHz), voltage and current, biometrics, and any other kind of physical-world data that you might think of as a waveform. With this kind of data, you are generally out of the realm of the statistician, and firmly in the territory of the signal processing engineer.

Machine logs and slower time series (e.g. pressure and temperature once per minute) can be analyzed effectively using both statistical and machine learning methods intended for time series data. But these higher-sample-rate datasets are much more complex, and those basic tools just won’t work. One second of sound captured at 14.4 kHz contains 14,400 data points, and the information it contains is more than just a statistical time series of pressure readings. It’s a physical wave, with all of the properties that come along with physical waves, including oscillations, envelopes, phase, jitter, transients, and so on.


1 second of data captured at 300Hz - at this speed, it becomes possible to see the underlying vibration (a fan turning at around 70 revs per second).


The same 1 second of data sampled at 50Hz


The same 1 second of data captured at 10Hz

It’s all about the features

For machine learning, this kind of data also presents another problem – high dimensionality.  That one second of sound with 14,400 points, if used raw, is treated by most machine learning methods as a single vector with 14,400 columns. With thousands, let alone tens of thousands of observations, most machine learning algorithms will choke. Deep Learning (DL) methods offer a way of dealing with this high dimensionality, but the need to stream real-time, high-sample-rate data to a deep learning cloud service leaves DL impractical for many applications.

So to apply machine learning, we compute “features” from the incoming data that reduce the large number of incoming data points to something more tractable. For data like sound or vibration, most engineers would probably try selecting peaks from a Fast Fourier Transform (FFT) – a process that reduces raw waveform data to a set of coefficients, each representing the amount of energy contained in a slice of the frequency spectrum. But there is a wide array of options available for feature selection, each more effective in different circumstances. For more on features, see our blog called "It's all about the features".
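As a concrete sketch of that go-to approach, here is a minimal FFT-peak feature extractor in NumPy. The top-k peak picking and the 440 Hz test tone are illustrative assumptions, not a recommended feature set:

```python
import numpy as np

def fft_peak_features(window, sample_rate, k=8):
    """Reduce a raw waveform window to k (frequency, magnitude) pairs
    by keeping the k largest FFT coefficients."""
    mags = np.abs(np.fft.rfft(window))
    freqs = np.fft.rfftfreq(len(window), d=1.0 / sample_rate)
    top = np.argsort(mags)[-k:][::-1]   # indices of the k biggest peaks
    return np.column_stack([freqs[top], mags[top]])

# One second of a hypothetical 14.4 kHz capture: 14,400 raw points in,
# 16 numbers out.
rate = 14_400
t = np.arange(rate) / rate
window = np.sin(2 * np.pi * 440 * t) + 0.3 * np.sin(2 * np.pi * 1200 * t)

feats = fft_peak_features(window, rate)
print(feats.shape)   # (8, 2)
print(feats[0, 0])   # 440.0 -- the strongest peak
```

The dimensionality drop is the whole point: 14,400 raw values become a feature vector small enough for almost any algorithm to digest.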

But this is about collecting data

But this post is really about collecting data – in particular about collecting data from high-sample-rate sensors for use with machine learning. In our experience, data collection is the most expensive, most time-consuming part of any project. So, it makes sense to do it right, right from the beginning.

Here are our five top suggestions for data collection to make your project successful:

1. Collect rich data

Though it may be difficult to work with directly, raw, fully-detailed, time-domain input collected by your sensor is extremely valuable. Don’t discard it after you’ve computed an FFT and RMS – keep the original sampled signal. The best machine learning tools available (like our Reality AI) can make sense out of it and extract the maximum information content.  For more on why this is important, see our blog post on “Rich Data, Poor Data”.

2.  Use the maximum sample rate available, at least at first

It takes more bandwidth to transmit and more space to store, but it's much easier to downsample in software than to go back and re-collect data to see if a higher sample rate will help improve accuracy. Really great tools for working with sensor data (like our Reality AI) will let you use software to downsample repeatedly and explore the relationship between sample-rate and model accuracy. If you do this with your early data, once you have a preliminary model in place you can optimize your rig and design the most cost-effective solution for broader deployment later, knowing that you’re making the right call.
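A crude way to run that downsampling experiment in software. Proper decimation applies an anti-aliasing low-pass filter first (e.g. scipy.signal.decimate); in this sketch, simple block-averaging stands in for that filter:

```python
import numpy as np

def downsample(signal, factor):
    """Naive software decimation: block-average (a crude low-pass),
    then keep one point per block. Production code should use a real
    anti-aliasing filter instead."""
    n = (len(signal) // factor) * factor   # trim to whole blocks
    return signal[:n].reshape(-1, factor).mean(axis=1)

# Collect once at the maximum rate; explore lower rates in software.
raw = np.random.default_rng(1).standard_normal(44_100)  # 1 s at 44.1 kHz
for factor in (2, 4, 8):
    ds = downsample(raw, factor)
    print(f"{44_100 // factor:6d} Hz -> {ds.size} samples")
```

Retraining the same model on each downsampled copy of your early data tells you the cheapest sample rate that still holds accuracy, before you commit to production hardware.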

3.  Don’t over-engineer your rig

Do what’s easiest first, and what’s more expensive once you know it’s worth it. If one is available to support your use case, try a prototyping device for your early data collects to explore both project feasibility and the real requirements for your data collection rig before you commit. There are a number of kits available for IoT device prototypes, but for machine learning projects you might want to consider something like the Reality AI Starter Kit.


The Reality AI Starter Kit is an all-inclusive kit for getting started with an accelerometry project. A new version that also supports sound data is coming soon. 
Learn more about Reality AI Starter Kit

4. Plan your data collect to cover all sources of variation

Successful real-world machine learning is an exercise in overcoming variation with data. Variation can be related both to the target (what you are trying to detect) and to the background (noise, different environments and conditions) as well as to the collection equipment (different sensors, placement, variations in mounting). Minimize any unnecessary variation – usually variation in the equipment is the easiest to eliminate or control – and make sure you capture data that gets as much of the likely real-world target variation in as many different backgrounds as possible. The better you are at covering the gamut of background, target and equipment variation, the more successful your machine learning project will be – meaning the better it will be able to make accurate predictions in the real world.

5. Collect iteratively

Machine learning works best as an iterative process. Start off by collecting just enough data to build a bare-bones model that proves the effectiveness of the technique, even if not yet for the full range of variation expected in the real world, and then use those results to fine-tune your approach. Engage with the analytical tools early – right from the beginning – and use them to judge your progress. Take the next data you get from the field and test your bare-bones model against it to get an accuracy benchmark. Take note of specific areas where it performs well and performs poorly. Retrain using the new data and test again. Use this to chart your progress and also to guide your data collection – circumstances where the model performs poorly are circumstances where you’ll want to collect more data. When you get to the point where you’re getting acceptable accuracy on new data coming in – what we call “generalizing” – you’re just about done. Now you can focus on model optimization and tweaking to get the best possible performance.
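The iterative loop above can be sketched with any trainable model. Here a nearest-centroid classifier on synthetic 2-D features stands in for the real thing (everything below is a hypothetical illustration, not Reality AI's algorithm; the `shift` parameter mimics new real-world variation absent from the first collect):

```python
import numpy as np

rng = np.random.default_rng(2)

def collect(n_per_class, shift=0.0):
    """Stand-in for a field data collect: 2-D features, two classes."""
    X0 = rng.normal([0, 0], 1, (n_per_class, 2)) + shift
    X1 = rng.normal([3, 3], 1, (n_per_class, 2)) + shift
    return np.vstack([X0, X1]), np.array([0] * n_per_class + [1] * n_per_class)

def train(X, y):
    """Toy model: one centroid per class."""
    return np.array([X[y == c].mean(axis=0) for c in (0, 1)])

def accuracy(centroids, X, y):
    pred = np.argmin(((X[:, None] - centroids) ** 2).sum(-1), axis=1)
    return (pred == y).mean()

X0, y0 = collect(30)                # bare-bones first collect
model = train(X0, y0)

X1, y1 = collect(30, shift=0.8)     # next field data: benchmark, then retrain
print(f"benchmark on new data: {accuracy(model, X1, y1):.2f}")
model = train(np.vstack([X0, X1]), np.hstack([y0, y1]))
print(f"after retraining:      {accuracy(model, X1, y1):.2f}")
```

The benchmark-then-retrain rhythm is the point: each new batch first measures generalization, then becomes training data.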


Rich Data, Poor Data: Getting the most out of sensors

“People who advocate simplicity have money in the bank;

the money came first, not the simplicity.” 

― Douglas Coupland, The Gum Thief

Accelerometers and vibration sensors are having their day. As prices have come down drastically, we are seeing more and more companies instrumenting all kinds of devices and equipment.  Industrial, automotive and consumer products use cases are proliferating almost as fast as startups with “AI” in their names.

In many cases, particularly in industrial applications, the purpose of the new instrumentation is to monitor machines in new ways to improve uptime and reduce cost by predicting maintenance problems before they occur. Vibration sensors are an obvious go-to here, as  vibration analysis has a long history in industrial circles for machine diagnosis. 

At Reality AI, we see our industrial customers trying to get results from all kinds of sensor implementations. Many of these implementations are carefully engineered to provide reliable, controlled, ground-truthed, rich data.   And many are not.

Working with accelerometers and vibrations

In vibration data, there are certainly things you can detect by just looking at how much something shakes.  To see how much something is shaking, one generally looks at the amplitudes of the movement and calculates the amount of energy in the movement.   Most often, this means using measures of vibration intensity such as RMS and “Peak-to-Peak”.  Looking at changes in these kinds of measures can usually determine whether the machine is seriously out of balance, for instance, or whether it has been subject to an impact. 
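Both of those intensity measures are one-liners on the raw samples. A quick sketch (the 50 Hz test tone is illustrative):

```python
import numpy as np

def rms(x):
    """Root-mean-square: overall vibration energy."""
    return np.sqrt(np.mean(np.square(x)))

def peak_to_peak(x):
    """Max minus min: the full swing of the movement."""
    return np.max(x) - np.min(x)

# For a pure sine of amplitude A: RMS -> A/sqrt(2), peak-to-peak -> 2A.
t = np.linspace(0, 1, 10_000, endpoint=False)
x = 2.0 * np.sin(2 * np.pi * 50 * t)
print(round(rms(x), 3))           # 1.414
print(round(peak_to_peak(x), 3))  # 4.0
```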

For more subtle kinds of conditions, like identifying wear and maintenance issues, just knowing that a machine is shaking more isn’t enough. You need to know whether it’s shaking differently. That requires much richer information than a simple RMS energy: higher sample rates, and different measures. Trained vibration analysts generally go to the Fast Fourier Transform (FFT) to calculate how much energy is present in different frequency bands, typically looking for spectral peaks at multiples of the rotational frequency of the machine (for rotating equipment, that is; other kinds of equipment are harder to analyze with Fourier methods). Other tools, like Reality AI, do more complex transforms based on the actual multidimensional time-waveforms captured directly from the accelerometer.
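That analyst's habit of checking energy at multiples of the running speed can be sketched as a simple feature extractor. The 30 Hz machine speed, the tolerance band, and the test signal with an elevated 2x line (a classic misalignment signature) are all hypothetical:

```python
import numpy as np

def harmonic_energies(window, sample_rate, run_hz, n_harmonics=5, tol_hz=2.0):
    """Spectral energy near each integer multiple of the rotation frequency."""
    power = np.abs(np.fft.rfft(window)) ** 2
    freqs = np.fft.rfftfreq(len(window), d=1.0 / sample_rate)
    return np.array([
        power[np.abs(freqs - h * run_hz) <= tol_hz].sum()
        for h in range(1, n_harmonics + 1)
    ])

# A machine turning at 30 Hz, with an elevated 2x line.
rate = 4096
t = np.arange(rate) / rate
sig = np.sin(2 * np.pi * 30 * t) + 0.8 * np.sin(2 * np.pi * 60 * t)

e = harmonic_energies(sig, rate, run_hz=30)
print(round(e[1] / e[0], 2))   # 0.64 -- relative strength of the 2x line
```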


Figure 1 - This example shows a time series of data from an accelerometer attached to a machine in a manufacturing facility. X, Y and Z components of the acceleration vector are averaged over one second. There is very little information in this data – in fact, just about all it can tell us is which direction is gravity. This data came from an actual customer implementation, and is basically useless for anomaly detection, condition monitoring, or predictive maintenance.


Figure 2 - This example shows vibration data pre-processed through a Fast Fourier Transform (FFT) at high frequency resolution. The X-axis is frequency and the Y-axis is intensity. This data is much more useful than Figure 1 – the spikes occurring at multiples of the base rotation frequency give important information about what’s happening in the machine, and are most useful for rotating equipment. FFT data can be good for many applications, but it discards a great deal of information from the time domain. It shows only a snapshot in time – this entire chart is an expansion of a single data point from Figure 1.


Figure 3 - Raw time-waveform data as sampled directly from the accelerometer. This data is information-dense, being the raw data from which both the simple averages in Figure 1 and the FFT in Figure 2 were computed. Here we have frequency information at much higher resolution than the FFT, coupled with important time-domain information such as transients and phase. We also see all of the noise, however, which can make it more difficult for human analysts to use. But data-driven algorithms like those used by Reality AI extract maximum value from this kind of data. It holds important signatures of conditions, maintenance issues, and anomalous behavior.

But rich data brings rich problems – more expensive sensors, difficulty in interrupting the line to install instrumentation, bandwidth requirements for getting data off the local node. Many companies just go with the cheapest possible sensor packages, limit themselves to simple metrics like RMS and Peak-to-Peak, and basically discard almost all of the information contained in those vibrations. Others use sensor packages that sample at higher rates and compute FFTs locally with good frequency resolution, and tools like Reality AI can make good use of this kind of data. Some, however, make the investment in sensors that can capture the original time-waveform itself at high sample rates, and work with tools like Reality AI to get as much out of their data as possible.

It’s not overkill

But I hear you asking “Isn’t that overkill?"   

Do I really need high sample rates and time-waveforms, or at least high-resolution FFT? Maybe you do.

Are you trying to predict bearing wear in advance of a failure?   Then you do.

Are you trying to identify subtle anomalies that aren’t manifested by large movements and heavy shaking?   Then you do too.

Is the environment noisy?  With a good bit of variation both in target and background?  Then you really, really do.

Rich data, Poor data 

Time waveform and high-resolution FFT are what we describe as “rich data.” There’s a lot of information in there, and they give analytical tools like ours, which look for signatures and detect anomalies, a great deal to work with. They make it possible to tell that, even though a machine is not vibrating “more” than it used to, it is vibrating “differently.”

RMS and Peak-to-Peak kinds of measures, on the other hand, are “poor data.” They don’t tell you much, and discard much of the information necessary to make the judgements that you most want to make. They’re basically just high-level descriptive statistics that discard almost all the essential signature information you need to find granular events and conditions that justify the value of the sensor implementation in the first place.  And as this excellent example from another domain shows, descriptive statistics just don’t let you see the most interesting things.   


Figure 4 – Why basic statistics are never enough. All of these plots have the same X and Y means, the same X and Y standard deviations, and the same X:Y correlation. With just the averages, you’d never see any of these patterns in your data.

In practical terms for vibration analysis, what does that mean?  It means that by relying only on high-level descriptive statistics (poor data) rather than the time and frequency domains (rich data), you will miss anomalies, fail to detect signatures, and basically sacrifice most of the value that your implementation could potentially deliver. Yes, it may be more complicated to implement. It may be more expensive. But it can deliver exponentially higher value.  


Try Reality AI Starter Kit

Includes Bosch XDK sensor modules 

+ 2 months access to Reality AI Tools™

> Collect accelerometer and vibration data 

> Detect anomalies, or create labeled classes

> Create detectors for specific events and conditions

> Explore relationship between sample rate, detection window, computational complexity, and detection accuracy

> Determine hardware requirements for embedded AI solutions

Learn more on Starter Kit

It’s all about the features

We’re an AI company, so people always ask about our algorithms. If we could get a dollar for every time we’re asked which flavor of machine learning we use – convolutional neural nets, K-means, or whatever – we would never need another dollar of VC investment ever again.

But the truth is that algorithms are not the most important thing for building AI solutions – data is. Algorithms aren’t even #2. People in the trenches of machine learning know that once you have the data, it’s really all about “features.”

In machine learning parlance, features are the specific variables that are used as input to an algorithm. Features can be selections of raw values from input data, or can be values derived from that data. With the right features, almost any machine learning algorithm will find what you’re looking for. Without good features, none will. And that's especially true for real-world problems where data comes with lots of inherent noise and variation.

With the right features, almost any machine learning algorithm will find what you’re looking for.

Without good features, none will.
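As a concrete illustration of "derived values", here is a toy three-element feature vector computed from a raw window. These particular features (RMS energy, zero-crossing rate, dominant frequency) are a generic textbook choice, not the features Reality AI discovers:

```python
import numpy as np

def basic_features(window, sample_rate):
    """A toy feature vector: RMS energy, zero-crossing rate,
    and dominant frequency."""
    rms = np.sqrt(np.mean(window ** 2))
    zcr = np.mean(np.abs(np.diff(np.sign(window)))) / 2
    dom = np.argmax(np.abs(np.fft.rfft(window))) * sample_rate / len(window)
    return np.array([rms, zcr, dom])

# A 1-second, 1 kHz window containing a 25 Hz tone.
rate = 1000
t = np.arange(rate) / rate
feats = basic_features(np.sin(2 * np.pi * 25 * t), rate)
print(feats[2])   # 25.0 -- the dominant frequency, recovered
```

One thousand raw samples in, three numbers out: that compression is what lets a downstream algorithm do its job.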

My colleague Jeff (the other Reality AI co-founder) likes to use this example: Suppose I’m trying to detect when my wife comes home. I’ll take a sensor, point it at the doorway and collect data. To use machine learning on that data, I’ll need to identify a set of features that help distinguish my wife from anything else that the sensor might see. What would be the best feature to use? One that indicates, “There she is!” It would be perfect -- one bit with complete predictive power. The machine learning task would be rendered trivial.

If only we could figure out how to compute better features directly from the underlying data… Deep Learning accomplishes this trick with layers of convolutional neural nets, but that carries a great deal of computational overhead. There are other ways.

At Reality AI, where our tools create classifiers and detectors based on high sample rate signal inputs (accelerometry, vibration, sound, electrical signals, etc) that often have high levels of noise and natural variation, we focus on discovering features that deliver the greatest predictive power with the lowest computational overhead. Our tools follow a mathematical process for discovering optimized features from the data before worrying about the particulars of algorithms that will make decisions with those features. The closer our tools get to perfect features, the better end results become. We need less data, use less training time, are more accurate, and require less processing power. It's a very powerful method.

Features for signal classification

For an example, let’s look at feature selection in high-sample-rate (50 Hz on up) IoT signal data, like vibration or sound. In the signal processing world, the engineer’s go-to for feature selection is usually frequency analysis. The usual approach to machine learning on this kind of data would be to take a signal input, run a Fast Fourier Transform (FFT) on it, and consider the peaks in those frequency coefficients as inputs for a neural network or some other algorithm.

Why this approach? Probably because it’s convenient, since all the tools these engineers use support it. Probably because they understand it, since everyone learns the FFT in engineering school. And probably because it’s easy to explain, since the results are easily relatable back to the underlying physics. But the FFT rarely provides an optimal feature set, and it often blurs important time information that could be extremely useful for classification or detection in the underlying signals.

Take for example this early test comparing our optimized features to the FFT on a moderately complex, noisy group of signals. In the first graph below we show a time-frequency plot of FFT results on this particular signal input (this type of plot is called a spectrogram). The vertical axis is frequency, and the horizontal axis is time, over which the FFT is repeatedly computed for a specified window on the streaming signal. The colors are a heat-map, with the warmer colors indicating more energy in that particular frequency range.
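A spectrogram like the one described is just the FFT recomputed over a sliding window. A minimal sketch in NumPy (scipy.signal.spectrogram does this properly; the window length, hop, and chirp test signal below are arbitrary choices):

```python
import numpy as np

def spectrogram(signal, sample_rate, win=256, hop=128):
    """Magnitude of the FFT over overlapping Hann-tapered windows:
    rows = frequency bins, columns = time steps."""
    frames = [signal[i:i + win] * np.hanning(win)
              for i in range(0, len(signal) - win + 1, hop)]
    return np.abs(np.fft.rfft(frames, axis=1)).T   # (win//2 + 1, n_frames)

rate = 8000
t = np.arange(rate) / rate
chirp = np.sin(2 * np.pi * (200 + 400 * t) * t)    # a rising tone

S = spectrogram(chirp, rate)
print(S.shape)   # (129, 61): frequency bins x time windows
```

Each column of `S` is one FFT snapshot; stacking them is what restores the time axis that a single FFT discards.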


Time-frequency plot showing features based on FFT


Time-frequency plot showing features based on Reality AI

Compare that chart to one showing optimized features for this particular classification problem generated using our methods.  On this plot you can see what is happening with much greater resolution, and the facts become much easier to visualize.  Looking at this chart it’s crystal clear that the underlying signal consists of a multi-tone low background hum accompanied by a series of escalating chirps, with a couple of other transient things going on.   The information is de-blurred, noise is suppressed, and you don’t need to be a signal processing engineer to understand that the detection problem has just been made a whole lot easier.


There’s another key benefit to optimizing features from the get-go: the resulting classifier will be significantly more computationally efficient. Why is that important? It may not be if you have unlimited, free computing power at your disposal. But if you are looking to minimize processing charges, or are trying to embed your solution on the cheapest possible hardware target, it is critical. For embedded solutions, memory and clock cycles are likely to be your most precious resources, and spending time to get the features right is your best way to conserve them.

Deep Learning and Feature Discovery

At Reality AI, we have our own methods for discovering optimized features in signal data (read more about our Technology), but ours are not the only way.

As mentioned above, Deep Learning (DL) also discovers features, though they are rarely optimized. Still, DL approaches have been very successful with certain kinds of problems using signal data, including object recognition in images and speech recognition in sound. It can be a highly effective approach for a wide range of problems, but DL requires a great deal of training data, is not very computationally efficient, and can be difficult for a non-expert to use. There is often a sensitive dependence of classifier accuracy on a large number of configuration parameters, leading many of those who work with DL to focus heavily on tweaking previously used networks rather than focusing on finding the best features for each new problem. Learning happens “automatically”, so why worry about it?

My co-founder Jeff (the mathematician) explains that DL is basically “a generalized non-linear function mapping – cool mathematics, but with a ridiculously slow convergence rate compared to almost any other method.” Our approach, on the other hand, is tuned to signals but delivers much faster convergence with less data. On applications for which Reality AI is a good fit, this kind of approach will be orders of magnitude more efficient than DL.

The very public successes of Deep Learning in products like Apple’s Siri, the Amazon Echo, and the image tagging features available on Google and Facebook have led the community to over-focus a little on the algorithm side of things. There has been a tremendous amount of exciting innovation in ML algorithms in and around Deep Learning. But let's not forget the fundamentals.

It’s really all about the features.


A Really Good Week at Reality AI

Usually, we try to keep our blog focused on substance related to the intersection of sensors, signal processing, and artificial intelligence.

But this time we’re making an exception and taking the opportunity to toot our horn a little.

We had a really good week at Reality AI last week, and we want to tell you about it.

$1.7 million Seed Round

Last Monday morning, we announced that our Seed investment round had closed and that we had raised just over $1.7 million. The round was oversubscribed in the end, and we extended it twice to accommodate some terrific investors who wanted to participate.

One of those was the TechNexus Venture Collaborative, a firm out of Chicago that works with both startups and corporate innovation groups. Far more important to us than the investment dollars will be engagement with the corporate network that TechNexus brings to the table. We’re very excited to be working with them. 

You can read coverage of our round in Alley Watch and Sensors Magazine.

Under the spotlights at TechCrunch Disrupt

Later on Monday, we were exhibiting at TechCrunch Disrupt NY in the AI Pavilion. If you don’t know about Disrupt, it’s the premier event for showcasing startup activity, and as New Yorkers we were very happy to be on home turf. The TechCrunch media team visited our booth and did an interview, released on Wednesday, which you can see below.

We have a winner!

Fantastic as that was, what happened on Thursday was better still.

While the New Yorkers were at Disrupt NY, Nalin Balan (our guy on the ground and Head of Business Development for Silicon Valley) was exhibiting at Internet of Things World in Santa Clara – one of the main events for all things IoT.

IoT World sponsors a startup pitch competition in conjunction with Project Kairos, looking for the most innovative startups active in the Internet of Things. Nalin presented Reality AI at the Project Kairos competition, made the finals out of more than 100 startups, and WON!

Reality AI was awarded the “Innovation of Things” award at IoT World.


“We were looking for the innovative and potentially most valuable (company),” said Eric Winsborrow, spokesperson for the panel of judges and CEO of Distrix Networks, the 2016 Innovation of Things Award winner.

“What we liked about (Reality AI) the most is that it had a huge impact. It was really game-changing, yet the approach was simple enough. It essentially allowed all the blue sky without boiling the ocean.”

We’re so proud! Proud of the hard work by our team, and proud of the incredible customer engagement that has allowed us to build such a great product.

Thanks for letting us brag a little. Back to the substance in our next blog post. We promise.


Machine Learning: the Lab vs the Real World

"In theory there's no difference between theory and practice. In practice there is." 

-- Yogi Berra

Not long ago, TechCrunch ran a story reporting on Carnegie Mellon research showing that an “overclocked smartwatch sensor uses vibrations to sense gestures, objects and locations.” These folks at the CMU Human-Computer Interaction Institute had apparently modified a smartwatch OS to capture 4 kHz accelerometer waveforms (most wearable devices capture at rates up to 0.1 kHz), and discovered that with more data you could detect a lot more things. They could detect specific hand gestures, and could even tell what kind of thing a person was touching or holding based on vibrations communicated thru the human body. (Is that an electric toothbrush, a stapler, or the steering wheel of a running automobile?)


To those of us working in the field, including those at Carnegie Mellon, this was no great revelation. “Duh! Of course you can!” It was a nice-but-limited academic confirmation of what many people already know and are working on. TechCrunch, however, in typical breathless fashion, reported it as if it were news. Apparently, the reporter was unaware of the many commercially available products that perform gesture recognition (among them Myo from Thalmic Labs, using its proprietary hardware, or some 20 others offering smartwatch tools). It seems he was also completely unaware of commercially available toolkits for identifying very subtle vibrations and accelerometry to detect machine conditions in noisy, complex environments (like our own Reality AI for Industrial Equipment Monitoring), or to detect user activity and environment in wearables (Reality AI for Consumer Products).

But my purpose is not to air sour grapes over lazy reporting.   Rather, I’d like to use this case to illustrate some key issues about using machine learning to make products for the real world: Generalization vs Overtraining, and the difference between a laboratory trial (like that study) and a real-world deployment.


Generalization and Overtraining

Generalization refers to the ability of a classifier or detector, built using machine learning, to correctly identify examples that were not included in the original training set.   Overtraining refers to a classifier that has learned to identify with high accuracy the specific examples on which it was trained, but does poorly on similar examples it hasn't seen before.  An overtrained classifier has learned its training set “too well” – in effect memorizing the specifics of the training examples without the ability to spot similar examples again in the wild.  That’s ok in the lab when you’re trying to determine whether something is detectable at all, but an overtrained classifier will never be useful out in the real world.


Illustration from the CMU study using vibrations captured with an overclocked smartwatch to detect what object a person is holding.

Typically, the best guard against overtraining is to use a training set that captures as much of the expected variation in target and environment as possible. If you want to detect when a type of machine is exhibiting a particular condition, for example, include in your training data many examples of that type of machine exhibiting that condition, and exhibiting it under a range of operating conditions, loads, etc.  

It also helps to be very skeptical of “perfect” results. Accuracy nearing 100% on small sample sets is a classic symptom of overtraining.
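To see why near-perfect accuracy on a small sample should raise eyebrows, here is a minimal sketch (hypothetical data, NumPy only, not any particular study's setup): a 1-nearest-neighbor classifier trained on a handful of purely random examples scores 100% on its own training set, yet only coin-flip accuracy on fresh data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Purely random features and labels: there is nothing real to learn.
X_train = rng.normal(size=(20, 8))       # tiny training set, as in a lab demo
y_train = rng.integers(0, 2, size=20)
X_test = rng.normal(size=(400, 8))       # fresh, never-seen "field" data
y_test = rng.integers(0, 2, size=400)

def nn_predict(X_ref, y_ref, X):
    """1-nearest-neighbor: label each row of X with its closest reference row's label."""
    d = ((X[:, None, :] - X_ref[None, :, :]) ** 2).sum(axis=2)
    return y_ref[d.argmin(axis=1)]

train_acc = (nn_predict(X_train, y_train, X_train) == y_train).mean()
test_acc = (nn_predict(X_train, y_train, X_test) == y_test).mean()

print(f"training accuracy: {train_acc:.2f}")  # 1.00 -- looks "perfect"
print(f"held-out accuracy: {test_acc:.2f}")   # near 0.50 -- coin-flip
```

The "perfect" number on the training set tells you nothing about the field; only held-out data does.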

It’s impossible to be sure without looking more closely at the underlying data, model, and validation results, but this CMU study shows classic signs of overtraining. Both the training and validation sets contain a single example of each target machine, collected under carefully controlled conditions. And to validate, they appear to use a group of 17 subjects holding the same single examples of each machine. In a nod to capturing variation, they have each subject stand in different rooms when holding the example machines, but it's a far cry from the full extent of real-world variability. Their result has most objects hitting 100% accuracy, with a couple coming in slightly lower.

Small sample sizes.  Reuse of training objects for validation.  Limited variation.  Very high accuracy... Classic overtraining.

Detect overtraining and predict generalization

It is possible to detect overtraining and estimate how well a machine learning classifier or detector will generalize.  At Reality AI, our go-to diagnostic is the K-fold Validation, generated routinely by our tools.

K-fold validation involves repeatedly 1) holding out a randomly selected portion of the training data (say 10%), 2) training on the remainder (90%), 3) classifying the holdout data using the 90% trained model, and 4) recording the results.   Generally, hold-outs do not overlap, so, for example, 10 independent trials would be completed for a 10% holdout.  Holdouts may be balanced across groups and validation may be averaged over multiple runs, but the key is that in each iteration the classifier is tested on data that was not part of its training.  The accuracy will almost certainly be lower than what you compute by applying the model to its training data (a stat we refer to as “class separation”, rather than accuracy), but it will be a much better predictor of how well the classifier will perform in the wild – at least to the degree that your training set resembles the real world.
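The four steps above can be sketched in a few lines. This is a simplified illustration, not Reality AI's implementation; the synthetic two-class data and the nearest-centroid classifier are stand-ins for a real signal feature set and model.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical two-class "signal feature" data with real, learnable structure.
X = np.vstack([rng.normal(0.0, 1.0, size=(100, 4)),
               rng.normal(2.0, 1.0, size=(100, 4))])
y = np.array([0] * 100 + [1] * 100)

def centroid_predict(X_tr, y_tr, X_te):
    """Nearest-centroid classifier: assign each point to the closer class mean."""
    c0 = X_tr[y_tr == 0].mean(axis=0)
    c1 = X_tr[y_tr == 1].mean(axis=0)
    d0 = ((X_te - c0) ** 2).sum(axis=1)
    d1 = ((X_te - c1) ** 2).sum(axis=1)
    return (d1 < d0).astype(int)

k = 10
folds = np.array_split(rng.permutation(len(X)), k)       # non-overlapping hold-outs

accs = []
for i in range(k):
    hold = folds[i]                                      # 1) hold out ~10%
    keep = np.concatenate(folds[:i] + folds[i + 1:])     # 2) train on the remainder
    preds = centroid_predict(X[keep], y[keep], X[hold])  # 3) classify the hold-out
    accs.append((preds == y[hold]).mean())               # 4) record the result

print(f"K-fold accuracy: {np.mean(accs):.2f}")
```

Each of the k accuracies comes from data the model never trained on, which is what makes the average a useful predictor of field performance.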

Counter-intuitively, classifiers with weaker class separation often hold up better in K-fold.  It is not uncommon that a near perfect accuracy on the training data drops precipitously in K-fold while a slightly weaker classifier maintains excellent generalization performance.  And isn’t that what you’re really after?  Better performance in the real world on new observations?

Getting high class separation but a low K-fold? You have a model that has been overtrained, with poor ability to generalize. Back to the drawing board. Maybe select a less aggressive machine learning model, or revisit your feature selection. Reality AI does this automatically.
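As a rough illustration of that trade-off (synthetic overlapping data, not a real sensor set): an aggressive 1-nearest-neighbor model achieves perfect class separation on its own training data, yet scores worse in K-fold than a weaker nearest-centroid model.

```python
import numpy as np

rng = np.random.default_rng(2)

# Overlapping classes: no model can be perfect on genuinely new data.
n = 200
X = np.vstack([rng.normal(0.0, 1.0, size=(n, 2)),
               rng.normal(1.0, 1.0, size=(n, 2))])
y = np.array([0] * n + [1] * n)

def nn_predict(X_tr, y_tr, X_te):
    """1-nearest-neighbor: aggressive, effectively memorizes the training set."""
    d = ((X_te[:, None, :] - X_tr[None, :, :]) ** 2).sum(axis=2)
    return y_tr[d.argmin(axis=1)]

def centroid_predict(X_tr, y_tr, X_te):
    """Nearest centroid: a weaker model with a much smoother decision boundary."""
    c = np.array([X_tr[y_tr == label].mean(axis=0) for label in (0, 1)])
    d = ((X_te[:, None, :] - c[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)

def kfold_acc(predict, X, y, k=10, seed=0):
    """Average hold-out accuracy over k non-overlapping folds."""
    folds = np.array_split(np.random.default_rng(seed).permutation(len(X)), k)
    accs = []
    for i in range(k):
        keep = np.concatenate(folds[:i] + folds[i + 1:])
        preds = predict(X[keep], y[keep], X[folds[i]])
        accs.append((preds == y[folds[i]]).mean())
    return float(np.mean(accs))

nn_sep = (nn_predict(X, y, X) == y).mean()         # "class separation" on training data
cen_sep = (centroid_predict(X, y, X) == y).mean()
nn_cv = kfold_acc(nn_predict, X, y)
cen_cv = kfold_acc(centroid_predict, X, y)

print(f"1-NN:     separation {nn_sep:.2f}, K-fold {nn_cv:.2f}")
print(f"centroid: separation {cen_sep:.2f}, K-fold {cen_cv:.2f}")
```

The 1-NN separation is a perfect 1.00, but on held-out folds the smoother centroid model wins; the model with the weaker training-set numbers is the one you would actually want to ship.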

Be careful, though, because the converse is not true:  A good K-fold does not guarantee a deployable classifier.  The only way to know for sure what you've missed in the lab is to test in the wild.  Not perfect?  No problem: collect more training data capturing more examples of underrepresented variation.  A good development tool (like ours) will make it easy to support rapid, iterative improvements of your classifiers.

Lab Experiments vs Real World Products

Lab experiments like this CMU study don’t need to care much about generalization – they are constructed to illustrate a very specific point, prove a concept, and move on. Real-world products, on the other hand, must perform a useful function in a variety of unforeseen circumstances. For machine learning classifiers used in real-world products, the ability to generalize is critical.

But it's not the only thing. Deployment considerations matter too. Can it run in the cloud, or is it destined for a processor-, memory- and/or power-constrained environment?  (To the CMU guys – good luck getting acceptable battery life out of an overclocked smartwatch!) How computationally intensive is the solution, and can it be run in the target environment with the memory and processing cycles available to it?  What response-time or latency is acceptable?  These issues must be factored into a product design, and into the choice of machine-learning model supporting that product.

Tools like Reality AI can help. R&D engineers use Reality AI Tools to create machine learning-based signal classifiers and detectors for real-world products, including wearables and industrial machines, and can explore connections between sample rate, computational intensity, and accuracy. They can train new models and run K-fold diagnostics (among others) to guard against overtraining and predict the ability to generalize. And when they’re done, they can deploy to the cloud, or export code to be compiled for their specific embedded environment.

R&D engineers creating real-world products don’t have the luxury of controlled environments – overtraining leads to a failed product.   Lab experiments don’t face that reality.  Neither do TechCrunch reporters.


How to Succeed with Machine Learning

At Reality AI we see a lot of machine learning projects that have failed to get results, or are on the edge of going off the rails. Often, our tools and structured approach can help, but sometimes not.

Here are 3 ways to succeed with machine learning:

Number 1: Get ground truth.

Machine learning isn’t a magic wand, and it doesn’t work by telepathy. An algorithm needs data: examples of what it is trying to detect, as well as examples of what it is not trying to detect, so that it can tell the difference. This is particularly true of “supervised learning” algorithms, which must train on sufficient numbers of examples in order to generate results. But it also applies to “unsupervised learning” algorithms, which attempt to discover hidden relationships in data without being told ahead of time. If relationships of interest don’t exist in the data, no algorithm will find them.

Number 2:  Curate the data. 

Data should be clean and well curated, meaning that to get the best results, it is important to have faith in the quality of the data. Misclassifications in training data can be particularly damaging in supervised learning situations -- some algorithms (like ours) can compensate for occasional misclassifications in training data, but pervasive problems can be hard to overcome.

Number 3:  Don't Overtrain.

Overtraining is a situation where a machine learning model can predict its training examples with very high accuracy but cannot generalize to new data, leading to poor performance in the field. Usually this is the result of too little data, or data that is too homogeneous (i.e., does not truly reflect the natural variation and confounding factors that will be present in deployment), but it can also result from poor tuning of the model.

Overtraining can be particularly pernicious, as it can lead to false optimism and premature deployment, resulting in a visible failure that could easily have been avoided. At Reality AI, our AI engineers oversee and check customers’ model configurations to prevent this unnecessary pitfall.

Example:  AI for machine health and preventative maintenance 

(Names and details have been changed to protect the inexperienced.)

For example, we recently had a client trying to build a machine health monitoring system for a refrigerant compressor. These compressors were installed in a system subject to rare leaks, and the client wanted to detect in advance when refrigerant in the lines had dropped to a point that put the compressor at risk -- before it caused damage, overheated, or shut down through some other mechanism. They were trying to do this via vibration data, using a small device containing a multi-axis accelerometer mounted on the unit.

Ideally, this client would have collected a variety of data with the same accelerometer under known conditions: many examples of the compressor running in a range of normal load conditions, and many examples of the compressor running under adverse low-refrigerant conditions in a similar variety of loads. They could then use our algorithms and tools confident that the data contained a broad representation of the operating states of interest, including normal variations as load and uncontrolled environmental factors change. It would also contain a range of different background noises and enough samples so that sensor and measurement noise is also well represented.

But all they had was 10 seconds of data from a normal compressor and 10 seconds with low refrigerant, collected in the lab. This might be enough for an engineer to begin to understand the differences between the two states -- and a human engineer working in the lab might use his or her domain knowledge about field conditions to begin extrapolating how to detect those differences in general. But a machine learning algorithm knows only what it sees. It would make a perfect separation between training examples, showing 100% accuracy in classification, but that result would never generalize to the real world. To cover all the operational variation possible, the most reliable approach is to include in the data examples of a full range of conditions, both normal and abnormal, so that the algorithms can learn by example and tune themselves to the most robust decision criteria.

Reality AI tools automatically do this by using a variety of methods for feature discovery and model selection. To help detect and avoid overtraining, our tools also test models with “K-fold validation,” a process that repeatedly retrains while holding out a portion of the training data for testing. This simulates how the model will behave in the field, when it attempts to operate on new observations it has not trained on. K-fold accuracy is almost never as high as training separation accuracy, but it’s a better indicator of likely real-world performance – at least to the degree that the training data is representative of the real world.

To understand our machine learning tools more fully and how they can be applied to your data, read our Technology Page.


Model-Driven vs Data-Driven methods for Working with Sensors and Signals

There are two main paradigms for solving classification and detection problems in sensor data:  Model-driven, and Data-driven.

Model-Driven is the way everybody learned to do it in Engineering School. Start with a solid idea of how the physical system works -- and by extension, how it can break. Consider the states or events you want to detect, and generate a hypothesis about what aspects might be detectable from the outside and what the target signal will look like. Then collect samples in the lab and try to confirm a correlation between what you record and what you are trying to detect. Finally, engineer a detector by hand to find those hard-won features out in the real world, automatically.

Data-Driven is a new way of thinking, enabled by machine learning. Find an algorithm that can spot connections and correlations that you may not even know to suspect.  Turn it loose on the data.  Magic follows.  But only if you do it right.

Both of these approaches have their pluses and minuses:

Model-Driven approaches are limited by complexity

Model-driven approaches are powerful because they rely on a deep understanding of the system or process, and can benefit from scientifically established relationships. But models can’t accommodate infinite complexity and generally must be simplified. They have trouble accounting for noisy data and non-included variables. At some level, they’re limited by the amount of complexity their inventors can hold in their heads.


Model-Driven is expensive and takes time

Who builds models? The engineers who understand the physical, mechanical, electronic, data-flow, or other relevant details of the complex system -- in-house experts or consultants who work for a company and develop its products or operational machinery. These people are generally experienced and very busy, and they are both scarce and expensive.

Furthermore, modeling takes time.  It is inherently a trial-and-error approach, rooted in the old scientific method of theory-based hypothesis formation and experiment-based testing. Finding a suitable model and refining it until it produces the desired results is often a lengthy process.


Data-Driven is Data Hungry

Data-Driven approaches based on machine learning require a good bit of data to get decent results. AI tools that discover features and train up classifiers learn from examples, and there need to be enough examples to cover the full range of expected variation and null cases. Some tools (like our Reality AI) are powerful enough to generalize from limited training data and discover viable feature sets and decision criteria on their own, but many machine learning approaches require truly Big Data to get meaningful results, and some demand their own type of experts to set them up.

Reality AI tools are data-driven machine learning tools optimized for sensors and signals. 

To learn more about our data-driven methods visit our Technology page and download our technical white-paper.


Will engineers use AI to replace themselves?

A piece in IEEE Spectrum asked this question recently: Are engineers designing their own robotic replacements?  There's no question that AI is transforming many engineering disciplines.  Control engineering and manufacturing optimization have seen a number of new tools come out that are likely to change industrial practice significantly. And I frequently describe our own Reality AI as an artificial intelligence substitute or supplement for a signal processing engineer working on sensors and signals. 

AI stimulates demand, not suppresses it

But I think most practicing engineers have very little to fear. For starters, these tools are far more likely to stimulate demand for engineering skills than to replace engineering jobs. In our area of signal processing and working with sensors, the sheer increase in the number of connected devices and economic activity associated with deploying them will keep engineering teams quite busy for a long time to come, even with all the AI assistance we can give them.    Gartner predicts more than 21 billion connected devices by 2020, and McKinsey says that will create economic opportunity in the trillions. In order to make that happen, AI will have to enable more engineering productivity -- there won't be enough of them to do the job without it.

Computers aren't brains, and vice versa

Plus, there's another important thing to remember. Computers aren't brains, and brains aren't computers. There are lots of things people can do that machines can't: creativity and social interaction among them. Oh, some AI systems can do a pretty good impersonation of these things. But there's something deeper here -- computers are information processors, but people are experiencers. We do things differently, and do different things well -- and always will.

Take for example the case of a baseball player catching a fly ball, quoted in a recent essay, “Your Brain does not Process Information and it is not a Computer”: “The information processing perspective requires the player to formulate an estimate of various initial conditions of the ball’s flight – the force of the impact, the angle of the trajectory, that kind of thing – then to create and analyze an internal model of the path along which the ball will likely move, then to use that model to guide and adjust motor movements continuously in time in order to intercept the ball. That is all well and good if we functioned as computers do, but McBeath and his colleagues gave a simpler account: to catch the ball, the player simply needs to keep moving in a way that keeps the ball in a constant visual relationship with respect to home plate and the surrounding scenery... This might sound complicated, but it is actually incredibly simple, and completely free of computations, representations, and algorithms.”

Breakthroughs come from experiences, not from algorithms

Ultimately, engineering jobs will follow the path that many other white collar jobs have done as computer technology has encroached on their turf.  Those jobs will become less rote, and there will be fewer of them required per unit of work produced.  But the very act of creating the new technology that pushes this ratio down (also known as increasing productivity), will create the demand for more engineering.


AI in the Cloud is the Next Big Thing

A recent article in the Financial Times called out the growing number of AI services available in the cloud, and the growing conviction at the leadership levels of Google, Microsoft and IBM that cloud-based AI services are the wave of the future. Enabled by ubiquitous cloud servers, storage, and big data, AI services will be incorporated into programs across the enterprise, in mobile apps, and... well, everywhere. “The next great disrupter,” said the FT. As big a disrupter as electricity or the steam engine, says a well-known professor at MIT. But are the tools available today really that transformative? What about what's coming next?

Voice and Language

The tools that IBM, Microsoft and Google have made available already are truly game changers -- but are still narrow in scope compared to what is to come. Speech recognition and natural language processing have made huge advances in recent years and are now available on several platforms for your UX-creating pleasure. These are the core AI methods behind Siri-like assistants and the new crop of email-reading task-bots.

Computer Vision

Computer vision has also improved significantly.

The big guys and a number of startups like Clarifai and Imagga now offer image tagging services that can ingest images and identify objects or scene composition with tolerable accuracy, making visual search much easier and more accurate. These still have a way to go, though, in my opinion, before they are truly disruptive or transformative.

Data Analysis

And then there are a host of services offering cloud-based data analysis aimed at analyzing large amounts of data for specific vertical applications.  Amazon has exposed many of the algorithms developed for their own use as external services.  Google has opened up their AI development tools to encourage others to develop services offering AI in the cloud.

AI in the cloud that does more, with more

The next generation of cloud-based AI services is about to arrive -- and it will include many diverse services that go far beyond the relatively narrow selection currently on offer. Our own Reality AI is a good example -- cloud-based AI services for sensor data that do the work of a signal-processing engineer. Using Reality AI, product and application developers can train up algorithms to spot complex vibration signatures (e.g., inside industrial equipment), spot specific sounds despite overwhelming background noise, identify both complex and simple motions using accelerometer data, and even work with AC power and RF signals.

Other sources of transformative AI services in the cloud include a new startup from NYU's Gary Marcus that looks to use insights from how children learn to help AI generalize about the world the way people do, and another startup with a sort of a meta-AI tool that looks at the available data, automatically figures out what kind of predictive model will work best, then builds it.

We are only at the beginning....


Copyright 2018 Reality AI ©  All Rights Reserved