Author Topic: Validating the Surrogate Model (Read 5552 times)

2006garg · « **on:** June 09, 2023, 10:14:06 AM »

Hello all,

I have a question about surrogate modelling. As part of my project, I am building a surrogate model via EE-UQ to generate additional data points.

As training inputs I am using the ground motion intensity parameters PGA, PSA and SAavg. The outputs are peak displacement and peak acceleration. I have roughly 60 data points from experimental studies that can be used for training, but I am holding out some of them to validate the model.

> My first question: having obtained the model, how can I validate it with the held-out data sets? Say I have a scatter plot of SAavg vs. peak acceleration, and the model gives me new realizations. How can I find out, for example, what peak acceleration the model would have given for a particular SAavg, so that I can compare it with the measured value at the same SAavg?
I tried to work this out from this research paper (https://onlinelibrary.wiley.com/doi/10.1002/eqe.3839), in which uncertainties in both the physical properties and the GM parameters are considered, but I have not been able to understand it so far.

Most examples on the official forum (EE-UQ and quoFEM) have uncertainties in the physical properties, such as Young's modulus or floor weight. In those examples the training data is generated by the SIM (in EE-UQ) or the FEM (in quoFEM), and the physical properties can be varied for validation studies, so they are able to plot the leave-one-out cross-validation (LOOCV) prediction curves and verify their models. Example: (https://github.com/NHERI-SimCenter/EE-UQ/tree/master/Examples/eeuq-0009).

> My second question is not about the tool but about understanding the process. Unlike the first part, where I hold out some data sets (derived from experimental studies) for validation, I also want to verify the new realizations given by the surrogate model with a simplified OpenSees numerical model. Each generated realization (here the inputs PGA, PSA and SAavg) represents a time history, so is it possible to obtain a time history that has the same parameters? I have read that earthquake records can be scaled so that the IMs match, but matching multiple IMs at once is really difficult. If that is not possible, how can one validate the generated realizations other than with a scatter plot, as in this example https://github.com/NHERI-SimCenter/EE-UQ/tree/master/Examples/eeuq-0006, where one checks whether the generated realizations lie in the same zone as the data? Any suggestions for other techniques that I can read up on?

> My third question: is there a difference between surrogate modelling via EE-UQ and via quoFEM, and is it possible to plot the LOOCV curve in either tool when the uncertainties are in the ground motion?

I hope you can bear with so many questions of mine but this is the only forum where there exists a possibility of discussing such questions. Thank you in advance!

Regards,
Gaurav

kuanshi · « **Reply #1 on:** June 16, 2023, 01:24:49 AM »

Hi Gaurav,

Good questions, and thanks so much for your interest in using the surrogate modeling in EE-UQ and quoFEM! I'll first outline a potential use case of the PLoM package in EE-UQ/quoFEM and then share my thoughts on your questions - I hope they are helpful.

Consider the uncertainty in earthquake source, path, and local soil conditions: one would expect different ground motions for a given return period. If we select a set of representative ground motions and use them for response history analysis of the structure, the ground motion uncertainty is propagated to the uncertainty in the resulting structural responses. Assume each ground motion can be described by a set of M intensity measures (e.g., PGA, PSA...) and we're interested in P different responses (e.g., peak displacement/peak acceleration); a (M+P)xN matrix can then be constructed from N realizations. The PLoM package can help with two possible tasks: (1) it can learn this matrix and generate new realizations that preserve the data structure, and (2) it can generate new realizations in which a few dimensions have moments set to user-defined values (e.g., the mean Sa of the input matrix is 0.6g while the mean Sa of the new realizations is moved to 0.4g) by adding the corresponding constraints (e.g., the target mean Sa, 0.4g). So the PLoM package learns the joint distribution of the input parameters and output responses, rather than a point-by-point mapping from a sample of PSA to a sample of responses.
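As a rough illustration of the (M+P)xN layout described above, here is a minimal NumPy sketch. The numbers are random placeholders, not real data, and the variable names (`ims`, `responses`, `train`, `val`) are only assumptions for illustration; how EE-UQ expects the data file to be formatted is not covered here.

```python
import numpy as np

rng = np.random.default_rng(0)

N = 60  # number of realizations, roughly the data set size mentioned in the thread

# Placeholder values only: M = 3 intensity measures, P = 2 responses
ims = rng.lognormal(mean=-1.0, sigma=0.5, size=(3, N))       # rows: PGA, PSA, SAavg
responses = rng.lognormal(mean=0.0, sigma=0.4, size=(2, N))  # rows: peak disp., peak acc.

# Stack into the (M+P) x N matrix: one column per realization
data = np.vstack([ims, responses])
print(data.shape)  # (5, 60)

# Hold out some columns for validation and keep the rest for training
n_val = 12
order = rng.permutation(N)
val = data[:, order[:n_val]]
train = data[:, order[n_val:]]
print(train.shape, val.shape)  # (5, 48) (5, 12)
```

Each column keeps the intensity measures and the responses of one ground motion together, which is what lets the joint structure be learned.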

> For the first question, it may not be straightforward to validate one PLoM realization at a particular SAavg (it is a single realization given the SAavg, though one could generate enough sets of PLoM realizations and then estimate the statistics). Alternatively, it is easier to validate one set of PLoM realizations at a particular mean SAavg: first compute the mean SAavg of the validation set and provide it as a constraint, then compare the statistics of the responses from PLoM with those of the validation set.
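
The set-level comparison described above could be sketched as follows. This assumes the PLoM realizations have already been exported as an array with one row per variable and one column per realization; `summarize` and `compare_sets` are hypothetical helper names, not part of the PLoM package, and the arrays below are placeholders.

```python
import numpy as np

def summarize(samples):
    """Mean and sample standard deviation of each row (one row per variable)."""
    samples = np.asarray(samples, dtype=float)
    return samples.mean(axis=1), samples.std(axis=1, ddof=1)

def compare_sets(plom_realizations, validation_set):
    """Relative difference in mean and standard deviation, row by row."""
    mean_p, std_p = summarize(plom_realizations)
    mean_v, std_v = summarize(validation_set)
    return (mean_p - mean_v) / mean_v, (std_p - std_v) / std_v

# Placeholder arrays; rows: SAavg, peak displacement, peak acceleration
rng = np.random.default_rng(0)
validation_set = rng.lognormal(mean=-0.5, sigma=0.3, size=(3, 12))
plom_realizations = rng.lognormal(mean=-0.5, sigma=0.3, size=(3, 500))

# The mean SAavg of the validation set is the value one would pass as the constraint
target_mean_saavg = validation_set[0].mean()
print(f"constraint target (mean SAavg): {target_mean_saavg:.3f}")

rel_mean, rel_std = compare_sets(plom_realizations, validation_set)
for name, dm, ds in zip(["SAavg", "peak disp.", "peak acc."], rel_mean, rel_std):
    print(f"{name}: mean diff {dm:+.1%}, std diff {ds:+.1%}")
```

With a small validation set the sample statistics are themselves noisy, so modest differences do not necessarily indicate a poor model.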

> For the second question, if I understand correctly, ground motion selection/scaling (e.g., https://www.nist.gov/publications/selecting-and-scaling-earthquake-ground-motions-performing-response-history-analyses) seems to be what you are asking about. Given the target intensity measures (e.g., PSA), one could select recordings that fit the target intensity measures, and spectral matching algorithms may also be considered if desired.
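
To illustrate why matching several IMs at once is hard, here is a minimal sketch of one simple selection approach (not taken from the linked report): amplitude scaling multiplies PGA, PSA and SAavg by the same factor, so for each candidate record one can compute the single scale factor that minimizes the squared log misfit to the targets and then rank records by the remaining misfit. The function name and all IM values are hypothetical.

```python
import numpy as np

def best_scale_and_misfit(record_ims, target_ims):
    """Least-squares (log-space) scale factor and residual misfit per record.

    record_ims: array (n_records, n_ims) of unscaled IMs, e.g. PGA, PSA, SAavg
    target_ims: array (n_ims,) of target IMs from one generated realization
    """
    log_ratio = np.log(target_ims) - np.log(record_ims)
    # One factor per record scales all IMs together; the log-space optimum is the mean ratio
    log_scale = log_ratio.mean(axis=1)
    misfit = ((log_ratio - log_scale[:, None]) ** 2).sum(axis=1)
    return np.exp(log_scale), misfit

# Hypothetical candidate records (rows) with IMs in g: PGA, PSA, SAavg
record_ims = np.array([
    [0.20, 0.40, 0.30],
    [0.40, 0.90, 0.50],
    [0.10, 0.30, 0.20],
])
target_ims = np.array([0.40, 0.80, 0.60])

scale, misfit = best_scale_and_misfit(record_ims, target_ims)
best = int(np.argmin(misfit))
print(f"best record: {best}, scale factor: {scale[best]:.2f}")
```

A record only matches all targets exactly when its IM ratios already equal the target ratios, which is why selection from a large record database (or spectral matching) is usually needed.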

> For the LOOCV, as with the first question, it may not be straightforward to assess a single data point (PLoM is intended to focus on the joint distribution rather than on individual points).

I hope the above discussion is helpful. Please feel free to get back to us if you have any trouble running the PLoM package and/or have any needs that we could help with to extend or tailor the package for your use case.

Regards,
Kuanshi

Sang-ri · « **Reply #2 on:** June 16, 2023, 06:04:59 PM »

Hi Gaurav,

Regarding the third question, quoFEM and EE-UQ have an alternative surrogate modeling method (using a Gaussian process) which also displays LOOCV metrics that you may find interesting. See example here
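
For readers unfamiliar with LOOCV, the idea can be sketched with a bare-bones Gaussian process regressor in NumPy. This is not the quoFEM/EE-UQ implementation: the kernel, the fixed hyperparameters (`length_scale`, `noise`) and the synthetic data are all assumptions chosen to keep the example short.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0):
    """Squared-exponential kernel between the rows of a and the rows of b."""
    sq_dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-0.5 * sq_dist / length_scale**2)

def gp_predict(x_train, y_train, x_test, length_scale=1.0, noise=1e-2):
    """GP posterior mean with fixed hyperparameters and a constant prior mean."""
    y_mean = y_train.mean()
    k = rbf_kernel(x_train, x_train, length_scale) + noise * np.eye(len(x_train))
    k_star = rbf_kernel(x_test, x_train, length_scale)
    return y_mean + k_star @ np.linalg.solve(k, y_train - y_mean)

def loocv_predictions(x, y, **gp_kwargs):
    """Predict each point from a model trained on all the other points."""
    n = len(y)
    preds = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        preds[i] = gp_predict(x[keep], y[keep], x[i:i + 1], **gp_kwargs)[0]
    return preds

# Synthetic stand-in for (SAavg, peak acceleration) pairs
rng = np.random.default_rng(1)
x = rng.uniform(0.0, 1.0, size=(60, 1))
y = 2.0 * x[:, 0] + rng.normal(0.0, 0.05, size=60)

preds = loocv_predictions(x, y)
rmse = np.sqrt(np.mean((preds - y) ** 2))
r2 = 1.0 - np.sum((preds - y) ** 2) / np.sum((y - y.mean()) ** 2)
print(f"LOOCV RMSE = {rmse:.3f}, R^2 = {r2:.3f}")
```

Plotting `preds` against `y` gives the kind of LOOCV prediction curve asked about in the first post; points close to the diagonal indicate that the surrogate predicts held-out data well.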

Thanks,
Sang-ri

2006garg · « **Reply #3 on:** June 19, 2023, 11:59:47 AM »

Thank you Kuanshi and Sang-ri for your answers and explanations.

So I understand that, since PLoM modelling is all about approximating the joint distribution of the input parameters and the output responses of the model, it is not possible to validate the results one by one. Instead, one compares metrics such as the mean or other statistical quantities between the two data sets: the validation data set and the set of new realizations.

Then Sang-ri talked about the importance of the sample distribution of the input parameters being the same for the training and test data sets. If it is not the same, it has to be constrained. This is also mentioned in your paper (https://onlinelibrary.wiley.com/doi/pdf/10.1002/eqe.3839) and can be best understood from the attached image. I did find the option for constraints under the advanced options button on the UQ tab, but I gather it needs a Python script. Can you help me with what this Python script should look like and what configuration should be used for the iteration number and iteration tolerance?

Best regards,
Gaurav

SimCenter Forum
