Taking advantage of multiple CPUs
There has been a lot of talk recently about how difficult it is in C/C++ to write good code that efficiently takes advantage of multiple cores/CPUs. That is one of the inherent benefits of LabVIEW: it will automatically split work across multiple cores/CPUs when it can tell that the pieces can run in parallel.
In the NI-Week keynote, they demonstrated a block diagram with four for-loops executing simultaneously. That is one way to do it. It is simple and gets the job done. However, sometimes you need something that is a little more flexible. To do this, you can use Queues and Open VI by Reference.
This block diagram shows a VI I wrote that generalizes and simplifies the task of running a VI multiple times simultaneously on different sets of data. It takes the path to the VI to run, two queue references, the names of the queue controls on that VI, and the number of copies to start.
Here is an example of the worker VI. All it takes is two queues, the Data In Queue and the Data Out Queue. It takes a unit of work from the Data In Queue, works on it and puts the results in the Data Out Queue. You may recognize the work being done as the for loop work done in the NI-Week Keynote.
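For readers who think in text languages, the data flow described above can be sketched roughly as follows. This is not the LabVIEW implementation, only an illustration of the same pattern in Python: the names `worker`, `spawn`, `run`, and the `STOP` sentinel are all made up for this sketch, and the sum of squares is a stand-in for the real work. Note also that Python threads do not run CPU-bound code in parallel the way reentrant VIs do, so this shows the plumbing rather than the speedup.

```python
import queue
import threading

STOP = None  # hypothetical sentinel telling a worker to exit


def worker(data_in, data_out):
    """Take units of work from data_in, process them, put results in data_out."""
    while True:
        unit = data_in.get()
        if unit is STOP:
            break
        index, values = unit
        # Stand-in for the real work: sum of squares of one chunk.
        data_out.put((index, sum(v * v for v in values)))


def spawn(worker_fn, data_in, data_out, copies):
    """Start `copies` independent copies of worker_fn sharing the two queues."""
    threads = [
        threading.Thread(target=worker_fn, args=(data_in, data_out))
        for _ in range(copies)
    ]
    for t in threads:
        t.start()
    return threads


def run(chunks, copies=4):
    """Feed a list of chunks through `copies` workers and collect the results."""
    data_in, data_out = queue.Queue(), queue.Queue()
    threads = spawn(worker, data_in, data_out, copies)
    for item in enumerate(chunks):
        data_in.put(item)  # each unit is (index, values)
    for _ in threads:
        data_in.put(STOP)  # one sentinel per worker
    for t in threads:
        t.join()
    results = [data_out.get() for _ in range(len(chunks))]
    # Results arrive in completion order, so sort by index.
    return [value for _, value in sorted(results)]
```

Tagging each unit with its index is one simple way to put the results back in order, since the workers finish in whatever order they happen to finish.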
There are, of course, some caveats to great performance and working as expected.
1. Each unit of work must be independent. It may not rely on any other unit of work.
2. There must be no references to uninitialized shift registers, global VIs or references to things that are not protected against being called from multiple threads.
3. The worker VI (and as many sub VIs as possible) must be Reentrant (File -> VI Properties -> Execution).
4. The units of work must be fairly hefty. The VI must do significantly more work per unit than the overhead of passing things around in queues.
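One common way to satisfy caveat 4 when the individual items are small is to batch them, so that each trip through the queues carries a larger unit of work. A minimal sketch in Python, where `chunked` is a hypothetical helper name and the right batch size is something you would have to find by measurement:

```python
def chunked(items, size):
    """Split items into consecutive lists of at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [items[i:i + size] for i in range(0, len(items), size)]
```

Each returned list would then be enqueued as a single unit of work instead of enqueuing every element on its own.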
Running the VI from the keynote on my Mac Pro (4 cores), I measured a 3.6x speedup over a single core. When I replaced it with the VIs shown above doing the work, I got 4.9x.
Also of note: on the Mac, you can determine the number of cores in the machine by running the VI at vi.lib/Platform/Miscellaneous.llb/MPProcessors.vi.
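The core count is a natural default for the number of copies to start. As a point of comparison outside LabVIEW, Python's standard library reports it through `os.cpu_count()`. The function name `default_copies` below is just for illustration:

```python
import os


def default_copies():
    """Return the number of logical CPUs, falling back to 1 if it is unknown."""
    # os.cpu_count() may return None when the count cannot be determined.
    return os.cpu_count() or 1
```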
8 Comments:
Per Phillip, I have submitted the Spawn VI.vi to community.ni.com.
Marc,
Nice hint. I did something similar for stress testing a system just to load up all the cores. But I always got odd results.
In this case, I load up the system with independently running FFTs. I keep track of the number of threads spawned etc. You can get the code at http://sthmac.magnet.fsu.edu/downloads/CPU%20Loader.zip
The very odd thing is that on a 4 core PPC, CPU usage peaks at about 3 threads and 230% usage. After that the usage gets noisy and decreases to about 125% at 8 threads.
I believe that it is not thrashing RAM, and using threadconfig.vi I have set the system to use 8 threads in all execution engines. Any ideas?
Scott,
Please try 8.5. I believe that we fixed some problems with high thread count on more than two CPUs in 8.5.
I will as soon as my official final copy disks arrive. They are "in the mail". I will try to let you know how it goes.
Marc, in a few places in your post you contrasted this VI with those shown in the keynote. Are the keynote slides available anywhere? Also, I am a little confused about how your performance could improve by more than 4x. I would think that efficiency would approach 4x in an ideal scenario, but not exceed it. Maybe I'm missing something.
Marc,
I just updated to an "official" copy of 8.5. I seem to still get the odd decrease in performance as I spawn more threads.
Do you have an example worker VI that I can use to actually measure CPU loading, i.e., the example from the keynote? Since I am spawning reentrant copies of a VI, this is a bit different from your example, but it should still utilize all my CPU horsepower.
You can actually get better performance from the hardcoded loops, but you have to know how many CPUs you are running on to optimize it. This method also scales better.
Marc, great contribution to the multicore LV on Macs. I have so far only a dual core MacBook Pro but LV8.5 does pretty fine there as far as my experiences go. I really would like to get some image acquisition possible again on the Mac though.
BTW it's also great to have http://www.ni.com/mac/ updated. It was about time!