Topic: Parallel execution on a Windows HPC

rsam1993

  • Newbie
  • *
  • Posts: 21
    • View Profile
Parallel execution on a Windows HPC
« on: July 05, 2023, 06:17:41 PM »
Dear all,

I am using quoFEM on my personal computer (20 cores and 40 logical processors, Intel Xeon 4210R 2.40GHz) and on our server running Windows HPC (64 cores and 128 logical processors, AMD EPYC 7513 2.60GHz). When I run a forward propagation analysis with 50 samples on my PC, quoFEM keeps the CPU at 100% utilization, which makes sense. However, when I run the exact same model on our server with 150 samples, I expect 100% utilization as well, yet it never goes beyond 65%. I was wondering whether this is due to a limitation in our server settings or a restriction in quoFEM's parallel execution.

Thank you,

Sang-ri

  • Administrator
  • Jr. Member
  • *****
  • Posts: 70
    • View Profile
Re: Parallel execution on a Windows HPC
« Reply #1 on: July 06, 2023, 12:30:44 AM »
Hello!

To help us identify why the CPU is not fully occupied, could you please follow the steps below?

1. On the server, find the file named "dakota.in" created by quoFEM. It should be in the folder where the working directories are located ("C:\Users\SimCenter\Documents\quoFEM\LocalWorkDir\tmp.SimCenter" on my machine).
2. Open "dakota.in" in a text editor.
3. Please let us know the number written after the keyword "asynchronous evaluation_concurrency".
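
The lookup in the steps above can also be scripted. This is only a minimal sketch, assuming dakota.in contains a line of the form "asynchronous evaluation_concurrency = N"; the helper name read_evaluation_concurrency and the relative file path are hypothetical and not part of quoFEM or Dakota.

```python
import re
from pathlib import Path


def read_evaluation_concurrency(text):
    """Return the integer that follows 'evaluation_concurrency', or None if absent."""
    match = re.search(r"evaluation_concurrency\s*=?\s*(\d+)", text)
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    # Hypothetical location: run this from the folder that holds dakota.in.
    dakota_in = Path("dakota.in")
    if dakota_in.exists():
        print("evaluation_concurrency:", read_evaluation_concurrency(dakota_in.read_text()))
```

The printed number is the value requested in step 3.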

Thank you,
Sang-ri

rsam1993

Re: Parallel execution on a Windows HPC
« Reply #2 on: July 06, 2023, 03:18:08 PM »
Here is the relevant part of the dakota.in file. Interestingly, the number is 64! Why, then, is the CPU utilization still around 60%?

interface
  analysis_driver = 'workflow_driver1.bat'
  fork
   parameters_file = 'paramsDakota.in'
   results_file = 'results.out'
   aprepro
   work_directory
     named 'workdir'
     directory_tag
     directory_save
     file_save
     copy_files = 'templatedir/*'
  asynchronous evaluation_concurrency = 64


Sang-ri

Re: Parallel execution on a Windows HPC
« Reply #3 on: July 06, 2023, 08:37:10 PM »
Hi,

Thank you for the info. We think this number should be 128 instead of 64. While we work on a fix, could you try the following workaround and let us know whether it brings CPU utilization to 100%?

1. Change the number after "asynchronous evaluation_concurrency" in dakota.in from 64 to 128.
2. Remove all files and folders in the local working directory except for "dakota.in" and "templatedir".
3. Find the path of the Dakota executable in the preference window of quoFEM. Let us call it {dakota path}.
4. Open the command prompt, cd into the folder where dakota.in is located, and type "{dakota path} dakota.in" (without the quotation marks).

It will run the forward propagation analysis, and the results will be shown in dakotaTab.out.
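
For repeated runs, steps 1, 2 and 4 can be scripted. The following is a rough sketch rather than a quoFEM feature: the function names are hypothetical, dakota_exe stands for {dakota path} from step 3 (which still has to be looked up by hand), and the regular expression assumes the "evaluation_concurrency = N" form shown earlier in this thread.

```python
import re
import shutil
import subprocess
from pathlib import Path


def set_concurrency(text, value):
    """Replace the number after 'evaluation_concurrency =' with value (step 1)."""
    return re.sub(r"(evaluation_concurrency\s*=\s*)\d+", rf"\g<1>{value}", text)


def clean_workdir(workdir, keep=("dakota.in", "templatedir")):
    """Delete everything in workdir except the entries named in keep (step 2)."""
    for entry in Path(workdir).iterdir():
        if entry.name in keep:
            continue
        if entry.is_dir():
            shutil.rmtree(entry)
        else:
            entry.unlink()


def rerun(workdir, dakota_exe, concurrency):
    """Edit dakota.in, clean the folder, then launch Dakota from inside it (step 4)."""
    dakota_in = Path(workdir) / "dakota.in"
    dakota_in.write_text(set_concurrency(dakota_in.read_text(), concurrency))
    clean_workdir(workdir)
    # Equivalent to typing "{dakota path} dakota.in" after cd-ing into workdir.
    subprocess.run([str(dakota_exe), "dakota.in"], cwd=workdir, check=True)
```

Because clean_workdir deletes files permanently, it is worth trying it on a copy of the working directory first.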

Thank you,
Sang-ri

rsam1993

Re: Parallel execution on a Windows HPC
« Reply #4 on: July 07, 2023, 02:57:39 PM »
The procedure you described seems straightforward, and I believe I followed it correctly, but it does not work. It creates only 128 workdir folders, while I expected 150 (the analysis uses 150 samples), and running dakota.in from the command prompt also reports errors about required files that cannot be found. Here is part of the error output:


C:\Users\rsamtaslimi\Documents\quoFEM\LocalWorkDir\tmp.SimCenter\workdir.127>python writeParam.py paramsDakota.in params.in
'python' is not recognized as an internal or external command,
operable program or batch file.

C:\Users\rsamtaslimi\Documents\quoFEM\LocalWorkDir\tmp.SimCenter\workdir.127>call ./driver.bat

C:\Users\rsamtaslimi\Documents\quoFEM\LocalWorkDir\tmp.SimCenter\workdir.127>"C:/quoFEM_Windows_Download/applications/performUQ/templateSub/simCenterSub.exe" params.in SimCenterInput.RV SimCenterInput.tcl
ERROR: simCenterDprepro could not open: params.in

C:\Users\rsamtaslimi\Documents\quoFEM\LocalWorkDir\tmp.SimCenter\workdir.127>OpenSees SimCenterInput.tcl  1>ops.out 2>&1
nonblocking fork: workflow_driver1.bat paramsDakota.in results.out
Second pass: scheduling 22 remaining local asynchronous jobs
Waiting on completed jobs
Too many processes (128) in wait_setup
Current limit on processes = 64

C:\Users\rsamtaslimi\Documents\quoFEM\LocalWorkDir\tmp.SimCenter\workdir.128>python writeParam.py paramsDakota.in params.in
'python' is not recognized as an internal or external command,
operable program or batch file.

C:\Users\rsamtaslimi\Documents\quoFEM\LocalWorkDir\tmp.SimCenter\workdir.128>call ./driver.bat

C:\Users\rsamtaslimi\Documents\quoFEM\LocalWorkDir\tmp.SimCenter\workdir.128>"C:/quoFEM_Windows_Download/applications/performUQ/templateSub/simCenterSub.exe" params.in SimCenterInput.RV SimCenterInput.tcl

C:\Users\rsamtaslimi\Documents\quoFEM\LocalWorkDir\tmp.SimCenter>ERROR: simCenterDprepro could not open: params.in

C:\Users\rsamtaslimi\Documents\quoFEM\LocalWorkDir\tmp.SimCenter\workdir.128>OpenSees SimCenterInput.tcl  1>ops.out 2>&1
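
The log line "Second pass: scheduling 22 remaining local asynchronous jobs" is consistent with simple bookkeeping: with 150 samples and a concurrency of 128, the first 128 evaluations start together and 22 are left for later. A small sketch of that arithmetic follows; the function name is hypothetical, and it only counts jobs, it does not reproduce Dakota's scheduler.

```python
def first_pass_and_remaining(samples, concurrency):
    """Split a sample count into the jobs started at once and the jobs left waiting."""
    first_pass = min(samples, concurrency)
    return first_pass, samples - first_pass


if __name__ == "__main__":
    started, waiting = first_pass_and_remaining(150, 128)
    print(f"{started} started at once, {waiting} scheduled afterwards")
```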

Sang-ri

Re: Parallel execution on a Windows HPC
« Reply #5 on: July 07, 2023, 06:47:02 PM »
Hi,

I apologize for missing an important step. It is not running because the commands "python" and "opensees" are not recognized.

There are two choices:

(1)
You can add the two directories where the Python and OpenSees executables are located to the Windows PATH environment variable (the directories can be found in the quoFEM preferences). Here is an example of how to add to the PATH variable. Please note that these paths should end at the folder level ("bin") and should not include the executable name.

(2)
An alternative is to find "workflow_driver1.bat" and "driver.bat" created inside "templatedir", open them with a text editor, and replace the commands "python" and "opensees" with {python path} and {opensees path}, as we did for Dakota.
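
A variant of choice (1) that leaves the system-wide settings untouched is to extend PATH only for the Dakota process and the batch files it launches. The sketch below assumes Dakota is started from Python; the function names are hypothetical, and the directory arguments are placeholders for the folders shown in the quoFEM preferences.

```python
import os
import subprocess


def env_with_extra_path(directories, base_env=None):
    """Return a copy of the environment with directories prepended to PATH."""
    env = dict(os.environ if base_env is None else base_env)
    parts = [str(d) for d in directories]
    existing = env.get("PATH", "")
    if existing:
        parts.append(existing)
    env["PATH"] = os.pathsep.join(parts)
    return env


def run_dakota(dakota_exe, workdir, directories):
    """Run '{dakota path} dakota.in' in workdir with the extended PATH."""
    env = env_with_extra_path(directories)
    subprocess.run([str(dakota_exe), "dakota.in"], cwd=workdir, env=env, check=True)
```

As with the PATH variable itself, the entries passed in directories are folders and should not include the executable name.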

Please let me know if something is unclear. Thanks!

Sang-ri

rsam1993

Re: Parallel execution on a Windows HPC
« Reply #6 on: July 07, 2023, 09:49:40 PM »
Hi Sang-ri,

I tried the second approach, but I am still getting the same error, which means I am doing something wrong.

Here are the modified "workflow_driver1.bat" and "driver.bat" after I replaced the commands "python" and "opensees" with {python path} and {opensees path}. Please let me know if anything is wrong here.

workflow_driver1.bat:

C:/quoFEM_Windows_Download/applications/python writeParam.py paramsDakota.in params.in
call ./driver.bat



driver.bat:

"C:/quoFEM_Windows_Download/applications/performUQ/templateSub/simCenterSub.exe" params.in SimCenterInput.RV SimCenterInput.tcl
C:/quoFEM_Windows_Download\applications\opensees\bin SimCenterInput.tcl 1> ops.out 2>&1

Sang-ri

Re: Parallel execution on a Windows HPC
« Reply #7 on: July 07, 2023, 10:43:55 PM »
Can you change "C:/quoFEM_Windows_Download\applications\opensees\bin" to "C:/quoFEM_Windows_Download\applications\opensees\bin\OpenSees" and see if it works?

rsam1993

Re: Parallel execution on a Windows HPC
« Reply #8 on: July 08, 2023, 04:21:50 PM »
I applied that change and am still getting this error. It seems Python cannot be found.


C:\Users\rsamtaslimi\Documents\quoFEM\LocalWorkDir\tmp.SimCenter\workdir.128>C:/quoFEM_Windows_Download/applications/python writeParam.py paramsDakota.in params.in
'C:/quoFEM_Windows_Download/applications/python' is not recognized as an internal or external command,
operable program or batch file.

C:\Users\rsamtaslimi\Documents\quoFEM\LocalWorkDir\tmp.SimCenter\workdir.128>call ./driver.bat

C:\Users\rsamtaslimi\Documents\quoFEM\LocalWorkDir\tmp.SimCenter\workdir.128>"C:/quoFEM_Windows_Download/applications/performUQ/templateSub/simCenterSub.exe" params.in SimCenterInput.RV SimCenterInput.tcl

C:\Users\rsamtaslimi\Documents\quoFEM\LocalWorkDir\tmp.SimCenter>ERROR: simCenterDprepro could not open: params.in

C:\Users\rsamtaslimi\Documents\quoFEM\LocalWorkDir\tmp.SimCenter\workdir.128>C:/quoFEM_Windows_Download\applications\opensees\bin\OpenSees SimCenterInput.tcl  1>ops.out 2>&1

Sang-ri

Re: Parallel execution on a Windows HPC
« Reply #9 on: July 08, 2023, 07:29:46 PM »
I see. {python path} should also include the executable name: "C:/quoFEM_Windows_Download/applications/python/python". Sorry for the confusion.
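
Both path corrections in this thread come down to the same thing: the command written in the batch file has to name the executable, not the folder that contains it. A quick way to check a candidate path before rerunning is sketched below. The helper name is hypothetical, and the extension list is an assumption that only approximates how the Windows command prompt completes a command name.

```python
from pathlib import Path


def resolves_to_file(command, extensions=("", ".exe", ".bat", ".cmd")):
    """Return True if command, with one of the extensions appended, is an existing file."""
    return any(Path(command + ext).is_file() for ext in extensions)


if __name__ == "__main__":
    # Paths quoted from this thread; the results depend on the local installation.
    for candidate in (
        "C:/quoFEM_Windows_Download/applications/python",
        "C:/quoFEM_Windows_Download/applications/python/python",
    ):
        print(candidate, "->", resolves_to_file(candidate))
```

A folder path fails the check because it is not a file, which matches the "is not recognized as an internal or external command" message shown above.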

rsam1993

Re: Parallel execution on a Windows HPC
« Reply #10 on: July 10, 2023, 06:29:31 PM »
Good news!

It is finally working. With this method, all 128 logical processors are occupied and CPU utilization is now 100%. So, how can quoFEM itself be fixed to do this automatically? Should I wait for a new update from your side?

Sang-ri

Re: Parallel execution on a Windows HPC
« Reply #11 on: July 10, 2023, 11:18:59 PM »
That's great to know! Yes, unfortunately, this can only be fixed in a future release.

We will post a reply here once it is released. Thank you again for reporting the bug and testing with us.

rsam1993

Re: Parallel execution on a Windows HPC
« Reply #12 on: July 11, 2023, 02:00:29 AM »
Very good.

Thank you for your help and support.

rsam1993

Re: Parallel execution on a Windows HPC
« Reply #13 on: August 31, 2023, 04:36:59 PM »
Dear Sang-ri

Hope all is well.

I was wondering whether the issue we discussed in this topic has been resolved in quoFEM yet. If not, do you know when the new update will be released?


Thanks,

Sang-ri

Re: Parallel execution on a Windows HPC
« Reply #14 on: August 31, 2023, 09:56:47 PM »
Hello,

We are planning to have a pre-release of this feature within the next couple of weeks.

I'll let you know as soon as it is uploaded. I apologize for the delay, and I appreciate your inquiry!

Best,
Sang-ri