Venue | Category |
---|---|
LISA'11 | Workload Analysis |
Capacity Forecasting in a Backup Storage Environment

1. Summary
   - Motivation of this paper
   - Data Domain
   - Implementation and Evaluation
2. Strength (Contributions of the paper)
3. Weakness (Limitations of the paper)
4. Future Work
Many system administrators already have historical data for their systems and thus can predict full capacity events in advance.
What is needed is a proactive tool that:
- Predicts the date of full capacity and provides advance notification.

However, there seems to be little previous work on applying predictive modeling to data storage environments.
This paper presents the predictive model employed internally at EMC to forecast system capacity.
It generates alert notifications months before systems reach full capacity.
Customers can configure their Data Domain systems to send an email every day with detailed diagnostic information.
The resulting historical data enables more effective customer support.
Two variables of capacity forecasting:
- Total physical capacity of the system (changes over time)
- Total physical space used by the system
- Forecasting is challenging because system behavior changes over time.
- Blind application of regression to the entire data set often leads to poor predictions.
- Applying the regression to a data subset that best represents the most recent behavior eliminates the influence of older data and improves the accuracy of the model's predictions.
- This requires determining the boundary where the recent behavior begins to deviate from the older behavior.
- R² (the coefficient of determination) measures the goodness-of-fit of a linear regression; R² = 1 indicates perfectly linear data.
- Select the subset with maximum R² from the candidate subsets.
- The calculated boundary occurs near the discontinuity of the true function.
- Criteria on the selected fit:
  - Goodness-of-fit (R²)
  - Positive slope (used space is growing)
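The steps above can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the paper's implementation: `fit_line`, `best_recent_fit`, and `forecast_full_day` are hypothetical names; the thresholds `min_points` and `min_r2` are illustrative values, not numbers from the paper; and capacity is treated as a constant, although total physical capacity can change over time. Every candidate subset ends at the newest sample, the subset with the highest R² is kept, and a forecast is returned only when the fit is good enough and the slope is positive.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass
class LinearFit:
    slope: float
    intercept: float
    r2: float


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> LinearFit:
    """Ordinary least squares for y = slope * x + intercept, plus R^2."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    syy = sum((y - mean_y) ** 2 for y in ys)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # For simple linear regression, R^2 is the squared correlation.
    # Constant data is fitted exactly, so treat it as R^2 = 1 here.
    r2 = 1.0 if syy == 0 else (sxy * sxy) / (sxx * syy)
    return LinearFit(slope, intercept, r2)


def best_recent_fit(
    days: Sequence[float], used: Sequence[float], min_points: int = 10
) -> Optional[Tuple[int, LinearFit]]:
    """Pick the start index whose subset days[start:] has maximum R^2."""
    best: Optional[Tuple[int, LinearFit]] = None
    # Every candidate subset ends at the newest sample; only the start moves.
    for start in range(len(days) - min_points + 1):
        fit = fit_line(days[start:], used[start:])
        if best is None or fit.r2 > best[1].r2:
            best = (start, fit)
    return best


def forecast_full_day(
    days: Sequence[float],
    used: Sequence[float],
    capacity: float,
    min_points: int = 10,
    min_r2: float = 0.9,
) -> Optional[float]:
    """Day on which the fitted usage line reaches `capacity`, or None."""
    result = best_recent_fit(days, used, min_points)
    if result is None:
        return None  # not enough history yet
    _, fit = result
    if fit.r2 < min_r2 or fit.slope <= 0:
        return None  # poor fit, or usage is not growing
    # Solve slope * day + intercept = capacity for day.
    return (capacity - fit.intercept) / fit.slope


# Usage example: growth jumps from 0.5 to 2 units/day at day 30.
days = list(range(60))
used = [10 + 0.5 * d if d < 30 else 25 + 2 * (d - 30) for d in days]
print(forecast_full_day(days, used, capacity=200.0))
```

In this example the selected subset starts at or after day 30, so only the post-change growth rate is used, and the forecast is day 117.5 for a capacity of 200. Raising `min_points` delays the first possible forecast in exchange for more confidence in it, which mirrors the trade-off between confidence and advance notification.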
False positives: hardware changes and software changes can trigger them. From a statistical perspective, it is unknown whether the most recent data points are signal or noise.
Requiring more data before a model makes a prediction yields higher confidence in that prediction, but reduces the advance notification for true positives.
- weighted linear regression
- logarithmic regression
- auto-regressive (AR) model
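Weighted linear regression, one of the alternatives listed above, keeps all of the history but down-weights older samples instead of discarding them. The following is a minimal sketch assuming an exponential-decay weighting controlled by a hypothetical `half_life` parameter; the notes do not say how samples would be weighted, and `weighted_fit` is a hypothetical name. Logarithmic regression and AR models are not shown.

```python
from typing import Sequence, Tuple


def weighted_fit(
    xs: Sequence[float], ys: Sequence[float], half_life: float
) -> Tuple[float, float]:
    """Weighted least squares for y = slope * x + intercept.

    A sample's weight halves for every `half_life` units of age,
    measured back from the newest x (an assumed weighting scheme).
    """
    newest = max(xs)
    ws = [0.5 ** ((newest - x) / half_life) for x in xs]
    sw = sum(ws)
    # Weighted means of x and y.
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    # Weighted (co)variance sums around those means.
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


# Usage example: growth jumps from 0.5 to 2 units/day at day 30.
days = list(range(60))
used = [10 + 0.5 * d if d < 30 else 25 + 2 * (d - 30) for d in days]
print(weighted_fit(days, used, half_life=5.0))
```

With a short half-life the fitted slope approaches the recent growth rate of 2 units/day, while a very long half-life behaves like ordinary least squares over the whole series and blends both regimes. Unlike choosing a hard boundary, this never fully discards old data, so an old regime still pulls the slope slightly.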
There is an open question whether the remaining systems can be modeled by other methods.