By the end of this chapter, your serverless application should do the following:
Data science and machine learning are enormous domains in their own right. It is far beyond the scope of this module to teach the basics of how machine learning model training works. Fortunately, the tool chain for creating new models has been simplified to the point that we can use that tool chain to experiment with ML just by knowing enough about our data.
In this solution, you are trying to replace a simple threshold in the IoT Core rules engine that evalutes the sound level reported by the device and coerces a new Boolean key-value called roomOccupancy. You know from looking at the sound data in the reported messages that the values are low when it is quiet and higher when there is noise. That’s how you know a simple threshold like “greater than 10” was an okay starting place to generate the roomOccupancy value. (In your specific case, a different threshold for ambient noise versus registered activity may have been more appropriate!)
You will apply a similar approach by giving the machine learning tool chain a sample of thermostat data recorded from the room where your device is deployed and telling the training job “this is what the data looks like when the roomOccupancy should be true, and what it looks like when it is false.” The training job will evaluate your data set, target the roomOccupancy column, and try to build a model that accurately reproduces roomOccupancy values based on the sound level ranges and even the other columns like time, HVAC status, and temperature.
Once your first model is trained, in the following chapter you will_replace_ that simple static threshold on the sound level with the inferred classification of *roomOccupancy* as determined by your model!
First, you will set up Amazon SageMaker Studio in order to configure a new experiment for automatic model training.
Once your SageMaker Studio has finished provisioning, the next step is to open your Studio and configure a new project.
Once the project is created, you will see a project dashboard with tabs like Repositories, Pipelines, Experiments, and so on. Leave this browser tab open to SageMaker Studio so you can quickly return to this page.
The next task is to return to AWS IoT Analytics so you can export the aggregated thermostat data for use by your new ML project.
sagemaker-project-p-somehashhere. If there are multiple buckets named like this, you’ll need to check the SageMaker Studio project for the random hash ID of your project. You can see the hash in other resources of your project like the Repositories and Pipelines tabs.
You are now ready to start your ML experiment back in SageMaker Studio. An experiment will use the reported thermostat data that was just exported by your IoT Analytics data set as inputs. You will configure the experiment to look for ways to accurately predict the existing roomOccupancy column. The automatic training job will analyze your data for relevant algorithms to try, then run 250 training jobs with varying hyperparameters, selecting the one that gives the best fit to your input training data.
Before starting your ML experiment, you should have several hours of data reported from your thermostat in the room you want to analyze, and in that time the room should have had a mix of active and inactive periods. An automatic ML experiment needs at least 500 rows of data to work, but the more data you bring the better the result will be. If you still need to generate more data before proceeding, don’t forget to re-run the data set in the IoT Analytics console (last step of the previous instruction list) so that those results are available to SageMaker in your project S3 bucket. When you’re ready to start your experiment, read on.
output/smartspaceand choose Use input as S3 object key prefix “output/smartspace”. This defines a new prefix in the S3 bucket that will be used for your output files.
Running the experiment might take minutes to hours. You can follow along the experiment’s progress in the SageMaker Studio browser tab, but it is also safe to close the tab and come back later to check progress.
Once the experiment has concluded, the resultant output is 250 trials that SageMaker used to find the best tuning job parameters. Sort the table of trials to find the one marked Best. The next milestone is to deploy this trial as a model endpoint so that you can invoke it as an API.
Now your machine learning model is deployed as an API endpoint, managed by Amazon SageMaker. In the next chapter, Working with ML models, you will consume the API endpoint with a serverless function and replace the simple threshold logic in the IoT Core rule that determines the
roomOccupancy value with inferences generated by your model.
Leaving this application running beyond 6 hours can result in AWS charges due to the number of requests to the S3 bucket. We recommend you finish this tutorial in that time and perform the cleanup steps to avoid any unwanted costs.
Before moving on to the next chapter, you can validate that your serverless application is configured as intended…
If these are working as expected, let’s move on to Working with ML models.