SimCenter Forum

Research in Natural Hazards Engineering => Uncertainty Quantification (quoFEM) => Topic started by: atish on October 25, 2023, 04:55:13 AM

Title: Using quoFEM for GP based Surrogate Modelling and Sensitivity Analysis,
Post by: atish on October 25, 2023, 04:55:13 AM

Dear SimCenter Team,

I am a PhD student at the University of New South Wales, Australia. I am currently working on a global sensitivity analysis of a high-fidelity atmospheric model. My inputs of interest are 50 reaction rates (input parameters), whose effects on the output I want to quantify. I ran the model with 512 sets of perturbed reaction rates and performed a Sobol analysis. However, the Sobol indices are not converging, which I think is because my sample size is too small. I have now decided to build a surrogate model using Gaussian Process (GP) regression and then perform the Sobol analysis on it. I am new to machine learning.

My questions:
1. Since I have 50 input parameters and 512 output data points, can I use quoFEM to perform the sensitivity analysis?
2. If not, can I use quoFEM to create a GP-based surrogate model?

Title: Re: Using quoFEM for GP based Surrogate Modelling and Sensitivity Analysis,
Post by: Sang-ri on October 25, 2023, 08:45:04 PM

Hi,

Thanks for the post! I have some quick questions before going into details.

1. Can you let me know the dimensions of the model input (=number of the parameters whose contribution to the responses will be inspected) and output (=number of the responses from a simulation run) of interest? I am assuming that the output dimension is 50, but I just wanted to clarify. I believe the number of simulations you ran is 512.

2. Regarding the sensitivity analysis that did not converge, was it performed using quoFEM?

Thanks,
Sang-ri

Title: Re: Using quoFEM for GP based Surrogate Modelling and Sensitivity Analysis,
Post by: atish on October 30, 2023, 02:37:26 AM

Thanks, Sang-ri.

My apologies for the late reply. I have been feeling unwell for the past few days.

1. The dimension of my model input is 50. I have chosen 50 chemical reaction rates whose contribution I would like to inspect. The atmospheric model that I am working on is the Global Ionosphere Thermosphere Model (GITM), which outputs many parameters, but I will focus on baseline parameters such as neutral temperature, nitric oxide density, neutral density, and electron density. Let's say, for now, that my output dimension is 4.

2. No, I have not used quoFEM before. Previously, I tried to use the SALib - Sensitivity Analysis Library in Python to perform Sobol Analysis. The result was not good.

I have data from 512 simulations, obtained by running my model with 512 different sets of perturbed reaction rates. Each set contains 50 reaction rates. I generated these 512 sets using Monte Carlo simulation.

Regards,
Atish

Title: Re: Using quoFEM for GP based Surrogate Modelling and Sensitivity Analysis,
Post by: Sang-ri on October 30, 2023, 07:36:22 PM

Hi Atish,

Thanks very much for clarifying! I hope you recover fully soon.

Dealing with high-dimensional input is typically trickier than dealing with high-dimensional output, partly because of the algorithmic challenges (e.g. the number of parameters to be optimized increases) but, more importantly, because the sensitivity index values are then likely to be very low. As an extreme case, imagine that 50 variables contribute equally to the response: the main sensitivity index of each variable is then at most 1/50 = 0.02, and estimating values this small accurately requires an enormous number of samples. With 512 samples, the estimates can be significantly perturbed by sampling variability. On the other hand, if only a few variables actually dominate the response of your model, some algorithms can work. The best way to find out is to test it :)
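
As a rough illustration of this sampling variability, the sketch below estimates first-order Sobol indices for a toy additive model with 50 equally important inputs, for which the exact index of every input is 0.02. It is only a toy example, not what quoFEM or SALib runs internally: the function names are hypothetical, the model is an assumption chosen because its exact indices are known, and the estimator is the standard pick-freeze (Saltelli-type) formula with a centred output. Note that n base samples cost n x (d + 2) model runs here, so n = 512 already means 26,624 runs.

```python
import numpy as np


def first_order_sobol(model, d, n, rng):
    """Pick-freeze estimate of the d first-order Sobol indices.

    Uses two independent n x d sample matrices A and B with inputs
    drawn uniformly on [0, 1] (an assumption of this toy example).
    """
    A = rng.random((n, d))
    B = rng.random((n, d))
    fA, fB = model(A), model(B)
    all_y = np.concatenate([fA, fB])
    var_y = all_y.var()
    fB_centred = fB - all_y.mean()  # centring reduces estimator variance
    S = np.empty(d)
    for i in range(d):
        AB = A.copy()
        AB[:, i] = B[:, i]  # column i taken from B, all others from A
        S[i] = np.mean(fB_centred * (model(AB) - fA)) / var_y
    return S


def additive_model(X):
    """Toy response: every input contributes equally, so S_i = 1/d."""
    return X.sum(axis=1)


rng = np.random.default_rng(0)
d = 50
results = {}
for n in (512, 8192):
    results[n] = first_order_sobol(additive_model, d, n, rng)
    print(f"n = {n:5d}: mean S_i = {results[n].mean():.4f}, "
          f"spread (std) across the {d} estimates = {results[n].std():.4f}")
```

Comparing the printed spread with the exact value of 0.02 gives a feel for how many samples are needed before indices of this size can be told apart from noise.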

To run a quoFEM analysis using existing Monte Carlo results, please follow the steps below:

1. In the UQ tab, select "Sensitivity Analysis"-"SimCenterUQ"-"Import Data Files". Set # samples to 512 and import the data files, prepared according to the instructions (https://nheri-simcenter.github.io/quoFEM-Documentation/common/user_manual/usage/desktop/SimCenterUQSensitivity.html#lblsimsensitivity).
2. In the FEM tab, select "none".
3. In the RV tab, quoFEM should already have auto-populated the 50 variables (nothing to change).
4. In the QoI tab, set any name for the output variable and set the length to 4.
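
For preparing the data files used in the steps above, a minimal Python sketch could look like the following. It assumes plain-text, whitespace-delimited tables with one row per simulation (512 x 50 for the inputs, 512 x 4 for the outputs); please confirm the exact format against the linked instructions. The folder name, the file names, and the random placeholder arrays are hypothetical and should be replaced with your own data.

```python
from pathlib import Path

import numpy as np

n_samples, n_inputs, n_outputs = 512, 50, 4

# Placeholder arrays standing in for the real data: replace them with the
# perturbed reaction rates (X) and the corresponding model outputs (Y).
rng = np.random.default_rng(0)
X = rng.random((n_samples, n_inputs))
Y = rng.random((n_samples, n_outputs))

# Basic consistency checks: row k of Y must be the response to row k of X.
assert X.shape == (n_samples, n_inputs)
assert Y.shape == (n_samples, n_outputs)
assert np.isfinite(X).all() and np.isfinite(Y).all()

out_dir = Path("quofem_data")  # hypothetical folder name
out_dir.mkdir(exist_ok=True)

# One row per simulation, columns separated by single spaces.
np.savetxt(out_dir / "input_samples.txt", X)
np.savetxt(out_dir / "output_samples.txt", Y)
```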

Some caveats:

Please note that the total sensitivity index from the SimCenterUQ algorithm is unlikely to be reliable for such high-dimensional inputs (the challenge is in fitting a Gaussian mixture distribution in 50-dimensional space; see here (https://nheri-simcenter.github.io/quoFEM-Documentation/common/technical_manual/desktop/SimCenterUQTechnical.html) for the reference). So only the main index should be useful.
If you want a reliable total index, you may want to run the algorithm in the Dakota engine, but this typically requires a much larger number of simulations, and the indices cannot be estimated from pre-simulated samples (you need to import the model in the FEM tab). However, this algorithm is guaranteed to converge to the exact solution if the number of samples is very large.
One more caveat applies if the input variables are correlated. In that case, please note that the contribution of correlated variables can be "double counted", so be careful when interpreting the Sobol indices.
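
A quick way to check the correlation caveat before interpreting any indices is to look at the sample correlation matrix of the inputs. The sketch below is only an illustration: the helper name and the 0.3 threshold are hypothetical choices, and the placeholder data has one pair of columns deliberately correlated so that the check has something to find.

```python
import numpy as np


def correlated_pairs(X, threshold=0.3):
    """Return (i, j, r) for input column pairs with |Pearson r| > threshold."""
    R = np.corrcoef(X, rowvar=False)      # columns are variables
    i, j = np.triu_indices_from(R, k=1)   # each pair once, diagonal excluded
    mask = np.abs(R[i, j]) > threshold
    return [(int(a), int(b), float(R[a, b])) for a, b in zip(i[mask], j[mask])]


# Placeholder inputs: 512 samples of 50 independent variables ...
rng = np.random.default_rng(1)
X = rng.random((512, 50))
# ... with column 1 deliberately made to depend on column 0.
X[:, 1] = 0.8 * X[:, 0] + 0.2 * X[:, 1]

pairs = correlated_pairs(X)
for a, b, r in pairs:
    print(f"inputs {a} and {b}: r = {r:+.2f}")
```

With only 512 samples, independent inputs will still show small nonzero sample correlations, which is why the threshold is not set close to zero.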

For the surrogate model, I assume the GP in quoFEM would not work - with 512 samples for 50 input dimensions, it will very likely overfit.

Please let me know if something is unclear or if you have difficulty running the analysis.

Best,
Sang-ri

Title: Re: Using quoFEM for GP based Surrogate Modelling and Sensitivity Analysis,
Post by: atish on October 31, 2023, 12:05:12 AM

Many thanks, Sang-ri.

1. I will try to perform the sensitivity analysis in quoFEM as per your instructions. Fingers crossed. Let's see what the sensitivity index values look like.
2. For the GP-based surrogate model, could you please suggest which tools will be easier for my case, provided I have no machine learning experience? I am comfortable with Python and MATLAB.

Title: Re: Using quoFEM for GP based Surrogate Modelling and Sensitivity Analysis,
Post by: Sang-ri on October 31, 2023, 11:31:04 PM

Hi Atish,

For the second question, I believe overfitting is a general limitation of GPs for high-dimensional inputs (rather than a limitation of specific toolboxes/packages), and its effect is highly problem-specific.

You could always try running quoFEM, because once the data files are prepared to run the sensitivity analysis, the same files can be easily used for surrogate model training. The cross-validation results are provided as an output to help you understand how well the surrogate model is trained.
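
To show what such a cross-validation check measures, here is a minimal sketch of k-fold cross-validation for a GP surrogate. It is not quoFEM's implementation: it uses plain NumPy with an isotropic RBF kernel and fixed, hand-picked hyperparameters (length scale 5, noise variance 0.01), whereas a real toolbox would optimize them. The function names are hypothetical, and the toy response is an assumption, deliberately dominated by two of the 50 inputs, so the score says nothing about how a GP would behave on your model.

```python
import numpy as np


def rbf_kernel(X1, X2, length_scale):
    """Isotropic squared-exponential kernel with unit signal variance."""
    sq = (np.sum(X1**2, axis=1)[:, None]
          + np.sum(X2**2, axis=1)[None, :]
          - 2.0 * X1 @ X2.T)
    return np.exp(-0.5 * np.maximum(sq, 0.0) / length_scale**2)


def gp_predict(X_train, y_train, X_test, length_scale, noise_var):
    """GP posterior mean at X_test for zero-mean training targets."""
    K = rbf_kernel(X_train, X_train, length_scale)
    K[np.diag_indices_from(K)] += noise_var
    alpha = np.linalg.solve(K, y_train)
    return rbf_kernel(X_test, X_train, length_scale) @ alpha


def kfold_r2(X, y, k=5, length_scale=5.0, noise_var=0.01, seed=0):
    """R^2 computed on held-out predictions from k-fold cross-validation."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    y_pred = np.empty_like(y)
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(len(y)), test_idx)
        # Standardize with training-fold statistics only (no leakage).
        mu, sd = y[train_idx].mean(), y[train_idx].std()
        z = (y[train_idx] - mu) / sd
        y_pred[test_idx] = mu + sd * gp_predict(
            X[train_idx], z, X[test_idx], length_scale, noise_var)
    return 1.0 - np.sum((y - y_pred) ** 2) / np.sum((y - y.mean()) ** 2)


# Toy data: 512 samples, 50 inputs, response driven by inputs 0 and 1 only.
rng = np.random.default_rng(2)
X = rng.random((512, 50))
y = 3.0 * X[:, 0] + 2.0 * X[:, 1] ** 2 + 0.05 * rng.standard_normal(512)

r2 = kfold_r2(X, y)
print(f"5-fold cross-validated R^2: {r2:.3f}")
```

A cross-validated R^2 close to 1 suggests the surrogate generalizes to unseen samples, while a value near or below 0 signals that it does not and should not be used for the Sobol analysis.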

quoFEM is easy for UQ beginners to get started with because we provide recommended settings by default, but if you would like more control over the surrogate training algorithm by working directly with Python/MATLAB toolboxes, "GPy" is the Python package that quoFEM uses for GP training. Additionally, "UQpy" (Python) and "UQLab" (MATLAB) are well-established and maintained UQ packages that have surrogate training modules.

Best,
Sang-ri