SimCenter Forum

Research in Natural Hazards Engineering [Archived] => Uncertainty Quantification (quoFEM) => Topic started by: rsam1993 on July 05, 2023, 06:17:41 PM

Title: Parallel execution on a Windows HPC
Post by: rsam1993 on July 05, 2023, 06:17:41 PM
Dear all,

I am using QuoFEM on my personal computer (20 cores and 40 logical processors, Intel Xeon 4210R 2.40GHz) and on our server running Windows HPC (64 cores and 128 logical processors, AMD EPYC 7513 2.60GHz). When I run a forward propagation analysis with 50 samples on my PC, QuoFEM occupies 100% of the CPU, which makes sense. However, when I run the exact same model on our server with 150 samples, I expect to see 100% CPU usage as well, but the CPU utilization never goes beyond 65%. I was wondering whether there is some limitation in the settings of our server or a restriction in QuoFEM's parallel execution.

Thank you,
Title: Re: Parallel execution on a Windows HPC
Post by: Sang-ri on July 06, 2023, 12:30:44 AM

To help us identify why the CPU is not fully occupied, can you please follow the steps below?

1. On the server, find a file named "" created by quoFEM. This should be in the folder where the working directories are located ("C:\Users\SimCenter\Documents\quoFEM\LocalWorkDir\tmp.SimCenter" on my machine).
2. Open "" using a text editor.
3. Please let us know the number written after the keyword "asynchronous evaluation_concurrency"
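
For anyone who prefers to script this check, here is a minimal sketch that pulls the number out of the input file's text. The function name and the sample text are only illustrative, and the keyword is assumed to be followed by "=" and an integer.

```python
import re


def read_evaluation_concurrency(text):
    """Return the integer after 'asynchronous evaluation_concurrency', or None."""
    match = re.search(
        r"asynchronous\s+evaluation_concurrency\s*=?\s*(\d+)", text
    )
    return int(match.group(1)) if match else None


# Illustrative fragment of an input file (an assumption, not a full file).
sample = """\
  analysis_driver = 'workflow_driver1.bat'
   results_file = 'results.out'
  asynchronous evaluation_concurrency = 64
"""

print(read_evaluation_concurrency(sample))  # prints 64
```

In practice you would read the text from the file in the local working directory and pass it to the function.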

Thank you,
Title: Re: Parallel execution on a Windows HPC
Post by: rsam1993 on July 06, 2023, 03:18:08 PM
Here is the info written in the file. It is interesting because the number is 64! Then why is the CPU utilization still around 60%?

  analysis_driver = 'workflow_driver1.bat'
   parameters_file = ''
   results_file = 'results.out'
     named 'workdir'
     copy_files = 'templatedir/*'
  asynchronous evaluation_concurrency = 64

Title: Re: Parallel execution on a Windows HPC
Post by: Sang-ri on July 06, 2023, 08:37:10 PM

Thank you for the info. We think this number should be 128 instead of 64. While we figure out the solution, can you try the following workaround and let us know whether it brings the CPU utilization to 100%?

1. Modify the number after "asynchronous evaluation_concurrency" in from 64 to 128
2. Remove all files and folders in the local working directory except for "" and "templatedir"
3. Find the path of the Dakota executable from the preference window of quoFEM. Let us denote this {dakota path}
4. Open the command prompt, cd into the folder where is located, and type "{dakota path}" (without the quotation marks)

It will run the forward propagation analysis, and the results will be shown in dakotaTab.out.
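
Steps 1 and 2 can also be scripted. The following is a rough sketch only: both function names are hypothetical, the input file name is left for you to pass in, and the cleanup permanently deletes files, so please try it on a copy of the working directory first.

```python
import re
import shutil
from pathlib import Path


def set_evaluation_concurrency(text, value):
    """Return `text` with the number after 'evaluation_concurrency' replaced."""
    pattern = r"(evaluation_concurrency\s*=?\s*)\d+"
    new_text, count = re.subn(
        pattern, lambda m: m.group(1) + str(value), text
    )
    if count == 0:
        raise ValueError("evaluation_concurrency keyword not found")
    return new_text


def clean_working_dir(work_dir, keep):
    """Delete every entry in `work_dir` whose name is not in `keep`.

    Returns the sorted names of the removed entries.
    """
    removed = []
    for entry in Path(work_dir).iterdir():
        if entry.name in keep:
            continue
        if entry.is_dir():
            shutil.rmtree(entry)  # old workdir.N folders and the like
        else:
            entry.unlink()  # old output files
        removed.append(entry.name)
    return sorted(removed)
```

The `keep` set would contain "templatedir" and the name of the input file. Steps 3 and 4 (running Dakota from the command prompt) are unchanged.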

Thank you,
Title: Re: Parallel execution on a Windows HPC
Post by: rsam1993 on July 07, 2023, 02:57:39 PM
The procedure you described seems straightforward and I believe I did it right, but it does not work correctly. It only creates 128 workdir folders while I expect 150 (the analysis was set up with 150 samples), and there are other errors about required files that cannot be found when I run it through the command prompt. Here are some of the errors I got:

'python' is not recognized as an internal or external command,
operable program or batch file.

C:\Users\rsamtaslimi\Documents\quoFEM\LocalWorkDir\tmp.SimCenter\workdir.127>call ./driver.bat

C:\Users\rsamtaslimi\Documents\quoFEM\LocalWorkDir\tmp.SimCenter\workdir.127>"C:/quoFEM_Windows_Download/applications/performUQ/templateSub/simCenterSub.exe" SimCenterInput.RV SimCenterInput.tcl
ERROR: simCenterDprepro could not open:

C:\Users\rsamtaslimi\Documents\quoFEM\LocalWorkDir\tmp.SimCenter\workdir.127>OpenSees SimCenterInput.tcl  1>ops.out 2>&1
nonblocking fork: workflow_driver1.bat results.out
Second pass: scheduling 22 remaining local asynchronous jobs
Waiting on completed jobs
Too many processes (128) in wait_setup
Current limit on processes = 64

'python' is not recognized as an internal or external command,
operable program or batch file.

C:\Users\rsamtaslimi\Documents\quoFEM\LocalWorkDir\tmp.SimCenter\workdir.128>call ./driver.bat

C:\Users\rsamtaslimi\Documents\quoFEM\LocalWorkDir\tmp.SimCenter\workdir.128>"C:/quoFEM_Windows_Download/applications/performUQ/templateSub/simCenterSub.exe" SimCenterInput.RV SimCenterInput.tcl

C:\Users\rsamtaslimi\Documents\quoFEM\LocalWorkDir\tmp.SimCenter>ERROR: simCenterDprepro could not open:

C:\Users\rsamtaslimi\Documents\quoFEM\LocalWorkDir\tmp.SimCenter\workdir.128>OpenSees SimCenterInput.tcl  1>ops.out 2>&1
Title: Re: Parallel execution on a Windows HPC
Post by: Sang-ri on July 07, 2023, 06:47:02 PM

I apologize that I missed one important step. It is not running because the commands "python" and "opensees" are not recognized.

There are two choices:

You can add the two directories where the Python and OpenSees executables are located to the Windows PATH environment variable (the directories can be found in the quoFEM preferences). Please note that these paths should end at the folder level ("bin") and should not include the executable name.

An alternative is to find "workflow_driver1.bat" and "driver.bat" created inside "templatedir", open them with a text editor, and replace the commands "python" and "opensees" with {python path} and {opensees path}, similarly to what we did for Dakota.
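
If editing the two batch files by hand is tedious, the replacement can be sketched in Python as below. This is only an illustration: the function name is hypothetical, the example path is made up, and it assumes the command appears as the first word on its line.

```python
import re


def replace_command(script_text, command, full_path):
    """Replace `command` at the start of a line with a quoted full path.

    Matching is case-insensitive, so "opensees" also matches "OpenSees".
    """
    pattern = re.compile(
        r"^([ \t]*)" + re.escape(command) + r"(?=\s|$)",
        re.IGNORECASE | re.MULTILINE,
    )
    # A function replacement keeps backslashes in Windows paths literal.
    return pattern.sub(
        lambda m: m.group(1) + '"' + full_path + '"', script_text
    )


# Hypothetical batch line and install path, for illustration only.
line = "OpenSees SimCenterInput.tcl 1> ops.out 2>&1\n"
print(replace_command(line, "opensees", "C:/tools/opensees/bin/OpenSees"))
```

Unlike an entry in the PATH variable, the path used here has to include the executable name, not just the folder.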

Please let me know if something is unclear. Thanks!

Title: Re: Parallel execution on a Windows HPC
Post by: rsam1993 on July 07, 2023, 09:49:40 PM
Hi Sang-ri,

I did the second approach but I am still getting the same error, which means I am doing something wrong.

Here are the modified "workflow_driver1.bat" and "driver.bat" after I replaced the commands "python" and "opensees" with {python path} and {opensees path}. Please let me know if there is anything wrong here.


call ./driver.bat


"C:/quoFEM_Windows_Download/applications/performUQ/templateSub/simCenterSub.exe" SimCenterInput.RV SimCenterInput.tcl
C:/quoFEM_Windows_Download\applications\opensees\bin SimCenterInput.tcl 1> ops.out 2>&1
Title: Re: Parallel execution on a Windows HPC
Post by: Sang-ri on July 07, 2023, 10:43:55 PM
Can you change "C:/quoFEM_Windows_Download\applications\opensees\bin" to "C:/quoFEM_Windows_Download\applications\opensees\bin\OpenSees" and see if it works?
Title: Re: Parallel execution on a Windows HPC
Post by: rsam1993 on July 08, 2023, 04:21:50 PM
I applied that change and I am still getting this error. It seems python cannot be called or found.

'C:/quoFEM_Windows_Download/applications/python' is not recognized as an internal or external command,
operable program or batch file.

C:\Users\rsamtaslimi\Documents\quoFEM\LocalWorkDir\tmp.SimCenter\workdir.128>call ./driver.bat

C:\Users\rsamtaslimi\Documents\quoFEM\LocalWorkDir\tmp.SimCenter\workdir.128>"C:/quoFEM_Windows_Download/applications/performUQ/templateSub/simCenterSub.exe" SimCenterInput.RV SimCenterInput.tcl

C:\Users\rsamtaslimi\Documents\quoFEM\LocalWorkDir\tmp.SimCenter>ERROR: simCenterDprepro could not open:

C:\Users\rsamtaslimi\Documents\quoFEM\LocalWorkDir\tmp.SimCenter\workdir.128>C:/quoFEM_Windows_Download\applications\opensees\bin\OpenSees SimCenterInput.tcl  1>ops.out 2>&1
Title: Re: Parallel execution on a Windows HPC
Post by: Sang-ri on July 08, 2023, 07:29:46 PM
I see, {python path} should also be "C:/quoFEM_Windows_Download/applications/python/python". Sorry for the confusion.
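
For future readers: both failures above came from a path that stopped at the folder instead of naming the executable. A quick way to check a path before pasting it into the batch files might look like this (a sketch only; the function name is hypothetical, and trying a ".exe" suffix is an assumption about how the executables are named on Windows).

```python
from pathlib import Path


def resolve_executable(path_text):
    """Return the existing file that `path_text` names, or None.

    Also tries the same path with ".exe" appended. A path that points
    to a folder, or to nothing at all, gives None.
    """
    candidate = Path(path_text)
    if not candidate.name:
        return None
    for option in (candidate, candidate.with_name(candidate.name + ".exe")):
        if option.is_file():
            return option
    return None
```

A result of None for {python path} or {opensees path} means the batch files will most likely fail with the "is not recognized" message.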
Title: Re: Parallel execution on a Windows HPC
Post by: rsam1993 on July 10, 2023, 06:29:31 PM
Good news!

It is finally working, and with this method all 128 cores of the CPU are occupied and the CPU utilization is 100% now. So, how can we fix QuoFEM itself to do this automatically? Should I wait for a new update from your side?
Title: Re: Parallel execution on a Windows HPC
Post by: Sang-ri on July 10, 2023, 11:18:59 PM
That's great to know! Yes, unfortunately, this can only be fixed in a future release.

We will post a reply here once it is released. Thank you again for reporting the bug and testing with us.
Title: Re: Parallel execution on a Windows HPC
Post by: rsam1993 on July 11, 2023, 02:00:29 AM
Very good.

Thank you for your help and support.
Title: Re: Parallel execution on a Windows HPC
Post by: rsam1993 on August 31, 2023, 04:36:59 PM
Dear Sang-ri

Hope all is well.

I was wondering if the issue we talked about in this topic has been resolved in QuoFEM yet. If not, do you know when the new update is going to be released?

Title: Re: Parallel execution on a Windows HPC
Post by: Sang-ri on August 31, 2023, 09:56:47 PM

We are planning to have a pre-release of this feature within the next couple of weeks.

I'll let you know as soon as it is uploaded. I apologize for the delay, and I appreciate your inquiry!

Title: Re: Parallel execution on a Windows HPC
Post by: Sang-ri on October 26, 2023, 12:06:17 AM

I apologize for the delayed update. We have finally released an updated quoFEM (v3.4.0).

To run the analysis with all 128 cores, please place the attached config.json file in the same directory as the quoFEM executable. Then, you can start the quoFEM application as usual.

When quoFEM detects the configuration file, it will automatically overwrite the evaluation_concurrency value with 128 (64*2). Currently, the multiplier can only be an integer.

Please let us know if you have any trouble or questions.

Thank you,
Title: Re: Parallel execution on a Windows HPC
Post by: rsam1993 on October 29, 2023, 03:30:43 PM

I really appreciate it. I will start using this new update soon and let you know in case there is any problem, which I doubt.
Title: Re: Parallel execution on a Windows HPC
Post by: rsam1993 on October 29, 2023, 08:04:05 PM
Dear Sang-ri,

I have updated the QuoFEM application on our HPC and placed the config.json file in the same directory as the QuoFEM executable. Now when I run it with 128 samples, all 128 cores are used and the OpenSees analyses seem to complete successfully, but I get this error at the end and QuoFEM does not give me any results!

Error Running Dakota: Too many processes (128) in wait_setupCurrent limit on processes = 64

And here is the error message that I get from dakota.err file:

Too many processes (128) in wait_setup
Current limit on processes = 64

I am not sure where the problem is, because it should work. Please let me know what you need me to share with you to find the reason for this error.
Title: Re: Parallel execution on a Windows HPC
Post by: Sang-ri on October 30, 2023, 11:45:13 PM

Thank you so much for following up, and I'm sorry for the inconvenience! Your feedback is extremely appreciated because we could not test this feature without having a machine with more than 64 cores.

Can you please check whether the "dakotaTab.out" file is created in the local working directory (C:\Users\rsamtaslimi\Documents\quoFEM\LocalWorkDir\tmp.SimCenter) and see if it contains the desired sample evaluation results?

If it does, it would be great if you could share the "dakota.err" file in the same folder with us. Currently, quoFEM raises an error whenever dakota.err is non-empty, so we can simply add an exception condition to fix that.

If "dakotaTab.out" has not been created properly, please share the files "dakota.err", "", "dakota.out", and "log.txt", if those exist in the local working directory, to help us figure out the source of the error.
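
The two checks above could be scripted roughly as follows. This is a sketch, not quoFEM's actual logic: the function name is hypothetical, and it assumes the first non-empty line of dakotaTab.out is a header row.

```python
from pathlib import Path


def summarize_run(work_dir):
    """Report the content of dakota.err and the row count of dakotaTab.out."""
    work = Path(work_dir)
    err_file = work / "dakota.err"
    tab_file = work / "dakotaTab.out"

    # A missing error file is treated the same as an empty one.
    error_text = ""
    if err_file.is_file():
        error_text = err_file.read_text(errors="replace").strip()

    result_rows = 0
    if tab_file.is_file():
        lines = [
            ln
            for ln in tab_file.read_text(errors="replace").splitlines()
            if ln.strip()
        ]
        # Assumption: the first non-empty line is the column header.
        result_rows = max(len(lines) - 1, 0)

    return {"error_text": error_text, "result_rows": result_rows}
```

A non-empty `error_text` together with `result_rows` equal to the number of samples would suggest that the samples were evaluated and only the error check is too strict. Zero rows points to a real failure.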

Thanks again!
Title: Re: Parallel execution on a Windows HPC
Post by: rsam1993 on October 31, 2023, 01:40:34 AM
Thank you for your prompt response,

Yes, dakotaTab.out is created, but the results are not there. It seems to me that the dakota.out file is not completely generated by QuoFEM. I am attaching all the files you mentioned so you can check them all and see where this issue comes from.

Title: Re: Parallel execution on a Windows HPC
Post by: Sang-ri on October 31, 2023, 11:01:13 PM

Thanks for sharing the files. We are still struggling to identify the issue. From my understanding, the automated process of overwriting the evaluation_concurrency value is exactly the same as the manual workaround from my post of July 06, 2023, 08:37:10 PM (change the number after "asynchronous evaluation_concurrency" from 64 to 128, clear the local working directory, and run Dakota from the command prompt).

Regarding this, is it possible that when we tried it manually, the dakotaTab.out file was not properly created even though the CPU was 100% occupied? Sorry, I should have asked this earlier.

If unsure, please just let me know. We will continue investigating the issue on our side.


Title: Re: Parallel execution on a Windows HPC
Post by: rsam1993 on October 31, 2023, 11:34:58 PM
Honestly I do not remember if the dakotaTab.out was properly created when we did everything manually. Let me try it again and keep you posted.
Title: Re: Parallel execution on a Windows HPC
Post by: rsam1993 on November 01, 2023, 02:32:35 AM
I have bad news,

I ran Dakota through the command prompt as you taught me before with 128 samples. During the analysis, the CPU is 100% occupied but I get an empty dakotaTab.out at the end, and the same error in the terminal (please see the attached screenshot). But there is no dakota.err file in the directory.
Title: Re: Parallel execution on a Windows HPC
Post by: Sang-ri on November 12, 2023, 01:21:35 AM
Thank you so much for revisiting the previous tests! The test was immensely helpful as it clarified that the limitation comes from the UQ engine rather than the interface.

I apologize for the late reply - I was out of the office last week. In the meantime, our team made some effort to find a workaround, but unfortunately, we could not find an immediate solution, especially without being able to reproduce the error. Also, the source of the issue lies in an internal function of the Dakota program, which is slightly beyond SimCenter's development focus. It seems that we cannot provide a solution at this point.

However, we will keep you posted in case of further updates.

Thanks again,
Title: Re: Parallel execution on a Windows HPC
Post by: rsam1993 on November 12, 2023, 01:26:18 AM

Thanks for the update. I believe I should still be able to use the interface and call all 128 cores, so that the sampling and OpenSees analyses get done. Then, using those results, I should be able to calculate the mean, standard deviation, and other outputs separately.

But, hopefully, this issue can be fixed at some point.
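
In case it helps anyone in the same situation, here is a minimal sketch of that post-processing. It assumes that each "workdir.N" folder still contains its "results.out" after the run and that the file holds whitespace-separated numbers. Both function names are hypothetical.

```python
import statistics
from pathlib import Path


def collect_results(work_dir, results_name="results.out"):
    """Return {sample number: [values]} read from each workdir.N folder."""
    samples = {}
    for folder in Path(work_dir).glob("workdir.*"):
        suffix = folder.name.split(".", 1)[1]
        results = folder / results_name
        # Skip folders without a numeric suffix or without a results file.
        if not suffix.isdigit() or not results.is_file():
            continue
        values = [float(tok) for tok in results.read_text().split()]
        if values:
            samples[int(suffix)] = values
    return samples


def mean_and_std(samples, column=0):
    """Sample mean and standard deviation of one output column.

    Needs at least two samples, otherwise statistics.stdev raises.
    """
    data = [
        vals[column]
        for _, vals in sorted(samples.items())
        if len(vals) > column
    ]
    return statistics.mean(data), statistics.stdev(data)
```

Samples whose analysis failed (no results.out, or an empty one) are skipped, so it is worth comparing `len(samples)` with the number of samples requested.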