Exploring Amazon Forecast for Workforce Planning

August 7, 2020

By Pointwest Forecast team


Amazon Forecast is a fully managed Machine-Learning-as-a-Service (MLaaS) offering that AWS is launching in the Asia Pacific region. It uses the same technology behind Amazon's e-commerce site, amazon.com, and makes forecasting simpler and more accessible to business decision makers in Retail, Utilities, and Services industry use cases.


For this exploration, Pointwest chose a specific use case: building a utilization forecast model for Workforce Planning. We worked with a local Service company to develop accurate, automated, and predictable budget forecasting and resource fulfillment based on historical and future supply and demand data.


While the exercise revealed (a) gaps between the dimensions linking workforce demand and supply, (b) a disconnect in the granularity at which the information is captured, and (c) a lack of sustainable data capture to support the Workforce Planning process, the resulting utilization forecast models yielded insights that invite asking the right questions on matters such as:

  • The factors affecting workforce utilization,
  • The skill gaps that arise from changing market demand dynamics,
  • The existence of jobs with hard-to-find skills but attractive service rates, and
  • Which services really sell and have more predictable utilization factors.


The problem


Our client relies on manual projections for its workforce planning and forecasting process. Although past trends of human resource demand and supply are considered, the assumptions lean more toward the desired outcome than toward what is actually likely to happen. The process still depends mainly on intuition, which, although developed through constant exposure to trends, is prone to significant error.


The solution


Unlike a projection, a forecast is based on assumptions that reflect specific fact patterns. This gives a much more accurate representation of the expectations for future events. AWS Forecast makes this possible even for those with no prior experience with machine learning.


For this use case, we automated the forecasting process so that the predictions can easily be updated and, if necessary, the model can be re-trained.

Fig1 – Diagram of the overall forecasting process

We used several AWS services to carry out the forecast training, validation, and inference tasks. The forecast horizon was set to 12 weeks, and AutoML was used to automate the selection of the forecast model.
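The predictor settings described above can be sketched as a request for the Amazon Forecast CreatePredictor API. This is a minimal illustration, not the team's actual configuration: the predictor name, account number, region, and dataset group ARN are placeholders, and the weekly frequency is assumed from the weekly granularity mentioned in this article.

```python
def build_predictor_request(predictor_name, dataset_group_arn, horizon_weeks=12):
    """Assemble keyword arguments for boto3's forecast create_predictor call."""
    return {
        "PredictorName": predictor_name,
        "ForecastHorizon": horizon_weeks,  # number of weekly steps to predict
        "PerformAutoML": True,             # let the service choose the algorithm
        "InputDataConfig": {"DatasetGroupArn": dataset_group_arn},
        "FeaturizationConfig": {"ForecastFrequency": "W"},  # weekly data (assumed)
    }


request = build_predictor_request(
    "utilization_predictor",  # hypothetical predictor name
    "arn:aws:forecast:ap-southeast-1:123456789012:dataset-group/workforce",  # placeholder ARN
)

# With AWS credentials configured, the request could be submitted like this:
#   import boto3
#   boto3.client("forecast").create_predictor(**request)
```

Keeping the request as a plain dictionary makes it easy to review or log the settings before any call to AWS is made.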


Fig2 – Utilization predictions for Project 327, shown here against the actual utilization (blue line)


Fig3 – Utilization predictions for Project 264, shown here against the actual utilization (blue line)


The results yielded by AWS Forecast were promising, with DeepAR+ emerging as the best-performing algorithm. The mean absolute percent error (MAPE) was 43%, meaning the forecast resource utilization values were, on average, within 43% of the actual values.
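To make the error figure concrete, here is a small sketch of how such a percent error can be computed. It assumes the reported figure is a mean absolute percent error (MAPE), and the utilization numbers are made up for illustration; they are not the project's data.

```python
def mape(actuals, forecasts):
    """Mean absolute percent error, skipping periods where the actual is zero."""
    pairs = [(a, f) for a, f in zip(actuals, forecasts) if a != 0]
    if not pairs:
        raise ValueError("MAPE is undefined when every actual value is zero")
    return 100.0 * sum(abs(a - f) / abs(a) for a, f in pairs) / len(pairs)


# Made-up weekly utilization hours, for illustration only
actual_hours = [40.0, 50.0, 80.0]
predicted_hours = [30.0, 60.0, 100.0]
error_percent = mape(actual_hours, predicted_hours)
print(f"MAPE: {error_percent:.2f}%")
```

Periods with zero actual utilization are skipped here because the percent error is undefined for them. That is one possible convention, and it affects the result when many idle weeks are present.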


The percent error raises the questions:

  • What service execution factors or internal/external forces caused the deviation?
  • Or was the deviation due to the assumptions made in training the model for this initial exploration?


This reflection opens opportunities for improving the accuracy of the forecast by identifying these missing factors and data points.


Training Data


For this exercise, the training data includes (1) Supply data, in this case the available man-hours of the workforce; (2) Demand data, that is, Service Orders captured from the CRM and expressed as planned work-unit utilization hours by job category; (3) the actual utilization as these Service Orders materialize as projects; and (4) a list of reference data points that ties them all together. Assumptions were made in the input datasets to fill in the gaps.


The team used 2.25 years' worth of data, from 2018 to March 2020, grouping the data points into weekly work-hours as the unit of granularity for utilization.
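The weekly grouping can be sketched as follows. This is an illustration under assumptions: the record layout (item id, day, hours) and the choice of Monday as the start of each week are hypothetical, since the article does not describe the source tables in that detail.

```python
from collections import defaultdict
from datetime import date, timedelta


def weekly_hours(daily_records):
    """Sum (item_id, day, hours) records into buckets keyed by each week's Monday."""
    totals = defaultdict(float)
    for item_id, day, hours in daily_records:
        week_start = day - timedelta(days=day.weekday())  # Monday of that week
        totals[(item_id, week_start)] += hours
    return dict(totals)


# Made-up records, for illustration only
records = [
    ("project_a", date(2020, 3, 2), 8.0),  # Monday
    ("project_a", date(2020, 3, 4), 6.5),  # Wednesday of the same week
    ("project_a", date(2020, 3, 9), 8.0),  # the following Monday
]
weekly = weekly_hours(records)
```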


Fig4 – The source datasets


AWS Forecast specifies the use of three datasets: time-series (required), related time-series (optional), and item metadata (optional). The time-series dataset defines the target field that we want to generate forecasts for.


The Supply data is fed to the time-series dataset. The related time-series dataset contains data related to the target time-series data, in this case the Demand data, or the Service Orders extracted from the client’s CRM. Lastly, item metadata contains categorical dimensions that give valuable context for the items in the time-series dataset, such as service types, job codes, work units, and client groupings, to name a few. It provides information that does not change over time.


The Forecasting Process


To automate the whole process, we used the following supporting AWS services alongside AWS Forecast, as shown in the diagram:

  • Data Preparation/Input Automation
    • Amazon S3 to store the input datasets
    • AWS Step Functions, Lambda, CloudWatch, and CloudTrail to check uploaded input files and trigger the appropriate forecast functions


Fig5 – Supporting AWS Services


  • Forecast Automation Proper
    • AWS Step Functions and Lambda functions to create the datasets used by AWS Forecast and to validate the forecast against external data by calculating the Root Mean Squared Error (RMSE) for the P10, P50, and P90 forecast values
    • AWS Forecast service for both new forecasts and forecast updates
    • Amazon S3 to save the output text file


  • Visualization of Data and Forecast Consumption
    • SPICE, the in-memory engine of Amazon QuickSight, to merge the 4 separate datasets (Target, Related, Metadata, Forecasts)
    • Amazon QuickSight for interactive visualization of the results
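The validation step listed under Forecast Automation Proper can be sketched as a plain RMSE calculation per forecast quantile. This is a minimal sketch with made-up numbers. The dictionary layout is an assumption for illustration, not the format of the exported forecast file.

```python
import math


def rmse(actuals, predictions):
    """Root mean squared error between two equally long, non-empty sequences."""
    if not actuals or len(actuals) != len(predictions):
        raise ValueError("actuals and predictions must be non-empty and equally long")
    squared = [(a - p) ** 2 for a, p in zip(actuals, predictions)]
    return math.sqrt(sum(squared) / len(squared))


# Made-up values, for illustration only
actual = [10.0, 20.0, 30.0]
quantile_forecasts = {
    "p10": [8.0, 18.0, 28.0],
    "p50": [10.0, 20.0, 30.0],
    "p90": [13.0, 24.0, 30.0],
}
rmse_by_quantile = {q: rmse(actual, values) for q, values in quantile_forecasts.items()}
```

Comparing the three values shows which quantile tracks the external data most closely. A Lambda function could run the same calculation after each forecast export.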


Conclusion and Insight 


AWS Forecast truly makes forecasting easier, but the initial process still has its struggles. The two major problems we faced in this project were data preparation and automation of the end-to-end forecast process.


AWS Forecast is very specific about the data it accepts. Currently, the only accepted format is CSV, preferably without a header row. The column names are instead supplied through a schema during the CreateDataset procedure, and they must follow specific names such as ‘item_id’ and ‘target_value’.
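The data format described above can be sketched as a schema plus a headerless CSV writer. The schema follows the attribute list shape used by the CreateDataset API for a target time-series dataset. The sample row and the helper name are hypothetical.

```python
import csv
import io

# Attribute order must match the column order of the headerless CSV file.
TARGET_SCHEMA = {
    "Attributes": [
        {"AttributeName": "timestamp", "AttributeType": "timestamp"},
        {"AttributeName": "item_id", "AttributeType": "string"},
        {"AttributeName": "target_value", "AttributeType": "float"},
    ]
}


def to_headerless_csv(rows):
    """Serialize dict rows in schema column order, without a header line."""
    columns = [attr["AttributeName"] for attr in TARGET_SCHEMA["Attributes"]]
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for row in rows:
        writer.writerow([row[name] for name in columns])
    return buffer.getvalue()


# Made-up row, for illustration only
rows = [{"item_id": "project_a", "timestamp": "2020-03-02", "target_value": 14.5}]
csv_text = to_headerless_csv(rows)
```

Deriving the column order from the schema keeps the file and the schema in step, which avoids import errors caused by mismatched columns.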


Also, some models have data requirements specific to AWS Forecast that were not obvious initially. For example, DeepAR+ requires every combination of item id and forecast dimension to have a value on every date covered by the data.
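One way to meet such a requirement is to fill the gaps before upload, sketched below for weekly data keyed by item id only. Filling with zero is an assumption made for illustration. Whether zero, an average, or some other value is appropriate depends on what a missing week means in the source data.

```python
from datetime import date, timedelta


def fill_missing_weeks(series, start, end, fill_value=0.0):
    """Return a dense {(item_id, week_start): value} mapping covering start..end."""
    item_ids = sorted({item_id for item_id, _ in series})
    weeks = []
    current = start
    while current <= end:
        weeks.append(current)
        current += timedelta(weeks=1)
    return {
        (item_id, week): series.get((item_id, week), fill_value)
        for item_id in item_ids
        for week in weeks
    }


# Made-up sparse data, for illustration only
sparse = {
    ("project_a", date(2020, 3, 2)): 14.5,
    ("project_a", date(2020, 3, 16)): 8.0,
    ("project_b", date(2020, 3, 9)): 20.0,
}
dense = fill_missing_weeks(sparse, date(2020, 3, 2), date(2020, 3, 16))
```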


The automation problem can be illustrated by our own forecasting process. Automating the whole flow, from importing data to exporting forecasts, requires additional tooling such as Python scripts or AWS services like Step Functions and Lambda.


Despite its shortcomings, AWS Forecast still delivers a more efficient, sustainable, and manageable forecasting service for its intended users.


Creating dataset groups that hold everything essential to a forecast makes tracking results easier. A dataset group contains the datasets, the predictor (trained model), and the forecasts.


Model training in AWS Forecast is made easier and is custom fit to your data. AutoML compares the results of five different algorithms and selects the best one. It also creates train and test datasets automatically and calculates the error metrics on the “unseen” data. These features significantly lessen the user's workload. As for usability, because AWS Forecast runs in the cloud, the user can keep working while the model trains, since training does not consume local hardware. In addition, the forecast lookup page makes it convenient to retrieve only the values needed for specific business insights.


On to the next AWS Forecast business use case!