A DOE is a series of tests in which purposeful changes are made to the input variables
to investigate their effect upon the output responses and to get an understanding of the global behavior
of a design problem.
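As an illustration only, the sketch below builds a full factorial design, which is one common way to make purposeful changes to the inputs. The variable names (`thickness`, `radius`), their levels, and the helper `full_factorial` are hypothetical and are not part of HyperStudy.

```python
from itertools import product

# Hypothetical input variables and the levels at which each one is tested.
levels = {
    "thickness": [1.0, 2.0],
    "radius": [5.0, 10.0, 15.0],
}

def full_factorial(levels):
    """Return one run (a dict of input values) per combination of levels."""
    names = list(levels)
    combos = product(*(levels[name] for name in names))
    return [dict(zip(names, combo)) for combo in combos]

runs = full_factorial(levels)
for number, run in enumerate(runs, start=1):
    print(number, run)
```

Each run would then be evaluated to obtain its output responses. Two levels of one variable and three of the other give six runs in total.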
Singularities in a Fit matrix indicate that there is insufficient data to properly solve the posed problem. A singular matrix
cannot be inverted, which is similar to dividing by zero in scalar problems.
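A minimal sketch of how insufficient data leads to a singular matrix, assuming a simple least-squares line y = a + b*x as the Fit (the matrices HyperStudy builds internally are not described here). The helper `normal_matrix_det` is hypothetical. When the input never varies, the determinant of the 2x2 normal matrix is zero and the matrix cannot be inverted.

```python
def normal_matrix_det(xs):
    """Determinant of X^T X for the linear model y = a + b*x.

    X^T X = [[n, sum(x)], [sum(x), sum(x^2)]], so the determinant is
    n * sum(x^2) - sum(x)^2.
    """
    n = len(xs)
    sx = sum(xs)
    sxx = sum(x * x for x in xs)
    return n * sxx - sx * sx

det_ok = normal_matrix_det([1.0, 2.0, 3.0])        # three distinct input values
det_singular = normal_matrix_det([2.0, 2.0, 2.0])  # the input never varies

print(det_ok)        # non-zero: slope and intercept can both be solved
print(det_singular)  # zero: the data cannot separate slope from intercept
```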
An Optimization is a mathematical procedure used to determine the best design for a set of given constraints, by changing
the input variables in an automatic manner.
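As a rough sketch of the idea, the example below searches a list of candidate designs automatically and keeps the best one that satisfies a constraint. The `mass` and `stress` functions, the stress limit, and the candidate thickness values are all hypothetical. A brute-force search is used only for clarity and is not one of HyperStudy's optimization methods.

```python
def mass(thickness):
    """Hypothetical objective to minimize."""
    return 7.8 * thickness

def stress(thickness):
    """Hypothetical constrained response."""
    return 300.0 / thickness

STRESS_LIMIT = 150.0  # hypothetical constraint: stress must not exceed this

# Candidate values of the input variable: 0.5, 0.75, ..., 4.0.
candidates = [0.5 + 0.25 * i for i in range(15)]

# Keep only the designs that satisfy the constraint, then pick the lightest.
feasible = [t for t in candidates if stress(t) <= STRESS_LIMIT]
best = min(feasible, key=mass)

print(best, mass(best), stress(best))
```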
A Stochastic approach is a method of probabilistic analysis where the input variables are defined by a probability
distribution, and consequently the corresponding output responses are not a single
deterministic value, but a distribution.
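The sketch below illustrates the idea with plain random sampling. It assumes a normally distributed input and a hypothetical linear `response` function. The distribution parameters and the sample size are arbitrary choices for the example.

```python
import random
import statistics

def response(x):
    """Hypothetical output response of the design."""
    return 2.0 * x + 1.0

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# The input variable is described by a distribution, not a single value.
inputs = [rng.gauss(10.0, 0.5) for _ in range(10_000)]
outputs = [response(x) for x in inputs]

# The output is therefore summarized by statistics of its distribution.
mean_out = statistics.mean(outputs)
std_out = statistics.stdev(outputs)
print(mean_out, std_out)
```

For this linear response the output mean should be close to 2 * 10 + 1 = 21 and the output standard deviation close to 2 * 0.5 = 1.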
A primary goal when creating a Fit is to construct it
with high predictive accuracy. HyperStudy provides several metrics that
can be used to quantitatively judge the quality of a Fit.
Selecting a Fit based on how these metrics perform on
the input data is simple, but may result in an overfit model.
Tip: These metrics are presented in the Post Processing step, Diagnostic tab
of the Fit.
Overfitting describes the phenomenon in which a Fit has very good
input data diagnostics but produces inaccurate
predictions when presented with new data. Essentially, the model has been tuned to be
too specific to the exact input data.
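The following sketch illustrates overfitting with two deliberately simple models. Neither is a HyperStudy Fit type: a least-squares line, and a nearest-neighbour predictor that memorizes the input data. The data, the helper names, and the use of R-squared as the diagnostic are assumptions made for the example.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def r_squared(actual, predicted):
    """Coefficient of determination of the predictions."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

def nearest_neighbour(xs, ys):
    """Return a predictor that simply memorizes the input data."""
    def predict(x):
        i = min(range(len(xs)), key=lambda j: abs(xs[j] - x))
        return ys[i]
    return predict

# Hypothetical input data: the trend y = 2x plus alternating noise of +/-1.
x_in = [float(i) for i in range(10)]
y_in = [2.0 * x + (1.0 if i % 2 == 0 else -1.0) for i, x in enumerate(x_in)]
# Hypothetical new data from the same trend, at points not used for fitting.
x_new = [i + 0.4 for i in range(9)]
y_new = [2.0 * x for x in x_new]

a, b = fit_line(x_in, y_in)
memo = nearest_neighbour(x_in, y_in)

r2_in_line = r_squared(y_in, [a + b * x for x in x_in])
r2_in_memo = r_squared(y_in, [memo(x) for x in x_in])
r2_new_line = r_squared(y_new, [a + b * x for x in x_new])
r2_new_memo = r_squared(y_new, [memo(x) for x in x_new])

print("input data:", r2_in_line, r2_in_memo)
print("new data:  ", r2_new_line, r2_new_memo)
```

The memorizing model reproduces the input data perfectly, noise included, so it looks better on the input data diagnostics. On the new data the ranking reverses and the line predicts more accurately.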
To avoid overfitting, three conceptually distinct sets of data
are used when creating a Fit. Input data is used to build a Fit, validation data is used to tune and compare different
Fit options, and testing data is used in a final
step to quantify the predictive ability on unseen data.
Note: Test data is never used in
the construction and tuning of the Fit.
In HyperStudy, testing data is optional and the validation data is
automatically constructed from the input data using a technique known as k-fold
cross-validation.
This technique begins with the input data and segments it into multiple folds (or
groups). For example, with 10 data points and 3 folds, the folding may look like:
Fold #    Run #
1         1, 4, 7, 10
2         2, 5, 8
3         3, 6, 9
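The fold layout in the example above is consistent with a round-robin assignment of runs to folds. The sketch below reproduces it under that assumption. The helper `assign_folds` is hypothetical, and how HyperStudy actually assigns runs to folds is not stated here.

```python
def assign_folds(n_runs, n_folds):
    """Assign run numbers 1..n_runs to folds 1..n_folds in round-robin order."""
    folds = {k: [] for k in range(1, n_folds + 1)}
    for run in range(1, n_runs + 1):
        folds[(run - 1) % n_folds + 1].append(run)
    return folds

folds = assign_folds(10, 3)
for fold, runs in folds.items():
    print(fold, runs)
```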
A fold is first withheld and a response surface is built using the remaining data.
The prediction is then tested on data from the withheld fold. In this example, a
Fit is first built using folds 2 and 3 and tested on
fold 1. Next, it is built using data from folds 1 and 3 and tested on fold 2. This
process continues for each fold. When this process is completed, the predictions on the
folded data sets are compared to their known values and traditional diagnostic measures
can be evaluated. Selecting a Fit based on cross-validation metrics is good practice to
ensure a balance between predictive accuracy and avoiding overfitting. The size of the
cross-validation folds can be set via the Cross-Validation option (accessed in the
Evaluate step of the Fit); the method Fit Automatically Selected by Training calculates an internal fold size to ensure efficiency.
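The withhold, build, and predict loop described above can be sketched as follows. A least-squares line stands in for the Fit and R-squared for the diagnostic measure. The data, the helper names, and the round-robin fold assignment are assumptions made for illustration, not HyperStudy's implementation.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def r_squared(actual, predicted):
    """Coefficient of determination of the predictions."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

def cross_validated_predictions(xs, ys, n_folds):
    """Predict every point from a model built without that point's fold."""
    n = len(xs)
    predictions = [None] * n
    for k in range(n_folds):
        held_out = [i for i in range(n) if i % n_folds == k]
        kept = [i for i in range(n) if i % n_folds != k]
        a, b = fit_line([xs[i] for i in kept], [ys[i] for i in kept])
        for i in held_out:
            predictions[i] = a + b * xs[i]
    return predictions

# Hypothetical input data: 10 runs of the trend y = 2x with alternating noise.
xs = [float(i) for i in range(1, 11)]
ys = [2.0 * x + (1.0 if i % 2 == 0 else -1.0) for i, x in enumerate(xs)]

a, b = fit_line(xs, ys)
r2_input = r_squared(ys, [a + b * x for x in xs])
r2_cv = r_squared(ys, cross_validated_predictions(xs, ys, n_folds=3))
print(r2_input, r2_cv)
```

The cross-validation value is lower than the input data value because every prediction is made by a model that never saw that point, which makes it a more conservative estimate of predictive accuracy.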