AMD Developer Central

Software Design that gets the most out of new AMD Multi-core Microprocessors 

Greg Fry  8/21/2008 

Software is often written as a long series of instructions to accomplish a specific task. These instructions can be very intricate, especially when the task being worked on is complicated. However, instead of thinking of the task as a long series of serialized instructions, the task can often be broken down into a grouping of simpler and somewhat independent tasks. With the advent of AMD multi-core processors, software that makes use of such a task-oriented design can obtain dramatic performance improvements over the former method. This article explains that software design shift and walks the reader through an application example, while also discussing some of the advantages of using AMD multi-core microprocessors and AMD-based computing platforms.

Multi-core Microprocessor Overview

Let’s face it: during the last few years, instead of getting much faster, microprocessors have taken a different path towards improving performance, that of adding multiple cores.  On a single-core AMD microprocessor, multi-threading meant switching between threads on the same core, so only one thread made progress at any instant.  With multi-core technology, threads can execute simultaneously on separate cores.

One intuitive way to split up software into multiple threads is along its user-interface (UI) and non-UI areas.  In this way, while the UI part of the software is busy “painting” the visual information, other threads can be forging ahead getting the next object ready to display.  Besides making the program faster overall, this keeps the application responsive, because the UI is no longer bogged down doing all the work.

Background on Message-Based UI

End-user software with a windowed UI is most often “message” based.  This means that all actions that the software takes are based on receiving some kind of message from the OS.  Such software receives many different messages, including messages to “paint” the software’s UI/visual elements, messages that indicate the user has clicked on some UI element such as a button, or messages indicating the OS is about to do something that will affect the application (such as shutdown).  In the past, software was often designed so that when a message was received, it was handled to completion before returning to the OS or checking the next message.  This meant that no other UI/visual elements could be manipulated until the handling was completed.  To get around this, the handling of a message was often broken up into sub-tasks, with the software sending itself a message for each sub-task that needed to be accomplished.  Although this allowed for better UI responsiveness, because other system messages could be processed in between these sub-task messages, the software was still single threaded, and the design was complicated, to say the least.

With today’s multi-core microprocessors, there is a better way of accomplishing the work represented by the various messages that the software receives.  By starting a background thread to do the work required when a message is received, the foreground message handler (the UI thread) can immediately return to the OS and take the next message.  The background thread then posts a message to the message handler when the work has been completed so that the UI can be updated accordingly.  Thus the software stays very responsive.
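The flow described above can be sketched in portable C++.  This is only an illustration under stated assumptions: MessageQueue, Post, Wait and RunUiLoop are hypothetical stand-ins for the OS message queue and the UI message loop, not Windows APIs, and the message strings are invented for the example.

```cpp
#include <condition_variable>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for the message queue owned by the UI thread.
class MessageQueue {
public:
    // Called from any thread: enqueue the message and return immediately.
    void Post(std::string msg) {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_messages.push(std::move(msg));
        }
        m_ready.notify_one();
    }

    // Called from the UI thread: block until a message is available.
    std::string Wait() {
        std::unique_lock<std::mutex> lock(m_mutex);
        m_ready.wait(lock, [this] { return !m_messages.empty(); });
        std::string msg = std::move(m_messages.front());
        m_messages.pop();
        return msg;
    }

private:
    std::mutex m_mutex;
    std::condition_variable m_ready;
    std::queue<std::string> m_messages;
};

// Simulates a UI thread that starts background work from a "click" handler
// and keeps taking messages until the worker reports completion.
std::vector<std::string> RunUiLoop() {
    MessageQueue inbox;
    std::vector<std::string> handled;

    // "Click" handler: start the work, then go straight back to the loop.
    std::thread worker([&inbox] {
        inbox.Post("progress");   // intermediate update
        inbox.Post("work-done");  // completion notice for the UI
    });

    for (;;) {
        std::string msg = inbox.Wait();
        handled.push_back(msg);
        if (msg == "work-done")
            break;
    }
    worker.join();
    return handled;
}
```

The point of the sketch is that the handler only starts the thread; everything the UI learns about the work arrives later as ordinary messages.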

A Real World Example

At one point in my career, I was working on a project that had a lot of independent UI elements.  Designing the software to handle these elements via multiple threads worked out nicely.  The UI was, to a large degree, independent of the background threads and just responded with UI updates as messages arrived from the other threads.  When some action was initiated by the user, the UI would simply start up a new background thread, or wake up an existing thread, to accomplish the requested action.  Progress of long term operations was managed either by the thread posting update messages at appropriate times, or by simply having the UI thread check and update the progress during regularly scheduled timer events.
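The timer-based variant of progress reporting might look roughly like the following portable sketch.  The names LongOperation, DoLongOperation, OnTimerTick and RunAndPoll are hypothetical, and std::atomic stands in for whatever thread-safe counter the real project used.

```cpp
#include <atomic>
#include <functional>
#include <thread>

// Shared progress value: written by the worker, read by the UI thread.
struct LongOperation {
    std::atomic<int> percentDone{0};
};

// Worker: publishes progress as it goes and never touches the UI.
void DoLongOperation(LongOperation& op) {
    for (int step = 1; step <= 100; ++step) {
        // ... one unit of real work would happen here ...
        op.percentDone.store(step);
    }
}

// What a UI timer handler might do on each tick: read the latest value
// and update a progress bar with it.
int OnTimerTick(const LongOperation& op) {
    return op.percentDone.load();
}

// Starts the worker, samples progress once while it may still be running,
// and returns the value seen after the worker has finished.
int RunAndPoll() {
    LongOperation op;
    std::thread worker(DoLongOperation, std::ref(op));
    int midway = OnTimerTick(op);  // some value from 0 to 100
    worker.join();
    int atEnd = OnTimerTick(op);
    return (midway <= atEnd) ? atEnd : -1;
}
```

Polling like this trades a little latency for simplicity: the worker does not need to know anything about the UI, not even a window to post to.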

One day, while in a program status meeting with various representatives of the company, some decisions were made that required a new feature to be added.  When asked how difficult it would be to add this feature, the UI software engineer involved in the project quickly stated, somewhat cynically, “Oh it will be easy – they’ll just add another thread.”  And he was right – that was exactly how we would do it.

I realized at that point that the particular UI engineer was not used to thinking about software in a multi-threaded way.  He was used to doing everything within the same thread and context of the message handler, and he was having some difficulty adjusting to this “new” style of software architecture.  The initial design of the software, before multi-threading had been applied appropriately, took 45 seconds or longer to load.  After applying the multi-threading concepts discussed in this article, the application took only 15 seconds to load and display useful information.  The final form of the software also allowed the user to have two or three operations running simultaneously and still remain responsive.

Now let's take a look at a multi-core example to better illustrate the advantages that multi-core microprocessors can provide.

Splitting the UI from other Tasks – A Multi-core Coding Example

Let’s say that your UI software has to display the contents of a file as hexadecimal data, in what is commonly called a “hex” viewer.  (Please note that, to remain focused and easy to follow, the example code has had all error checking removed and includes other simplifications, such as using 32-bit arithmetic instead of 64-bit.)  One way of doing this would be to read the file in sections, translate each section into a string of hexadecimal values, and display these strings in a list box (as shown in Figure 1).

Figure 1: The BinaryViewer

A user could indicate which part of the file they wanted to view by entering file offsets in part of the UI and then, when ready, by clicking on a “Show Binary Data” button.  When the software receives the message indicating that the “Show Binary Data” button was clicked, instead of reading the file and converting the data to hexadecimal display strings within the message handler, the software starts a new thread to do this work:

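void CBinaryViewerDlg::OnBnClickedShowBinaryData()
{
    DWORD readThreadId = 0;

    // Copy the current control values (file name, offsets) into member variables.
    UpdateData(TRUE);

    // Hand the file work to a background thread and return to the message loop.
    m_hBackgroundThread = CreateThread(NULL, 0, ReadingThread,
                                       (LPVOID)this, 0, &readThreadId);

    return;
}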

As each display string is prepared, that thread posts a message indicating that another line is ready:

DWORD WINAPI ReadingThread(LPVOID pvDlgptr)
{
    CBinaryViewerDlg *pDlg = (CBinaryViewerDlg *)pvDlgptr;
    HANDLE hReadFile = INVALID_HANDLE_VALUE;
    ULONG currentOffset = 0;
    unsigned char dataBuf[16] = { 0 };

    hReadFile = CreateFile(pDlg->m_filename, GENERIC_READ, 0, NULL,
                           OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    SetFilePointer(hReadFile, (LONG)pDlg->m_ulStartingOffset, NULL, FILE_BEGIN);

    for (currentOffset = pDlg->m_ulStartingOffset;
         currentOffset <= pDlg->m_ulEndingOffset;
         currentOffset += 16)
    {
        DWORD bytesRead = 0;

        ReadFile(hReadFile, dataBuf, 16, &bytesRead, NULL);
        if (bytesRead == 0) // if End of File encountered, then stop
            break;

        // Allocate only after a successful read, so that the early exit above
        // cannot leak the buffer.  The UI thread frees it.
        TCHAR *pHexDataStr = new TCHAR [88];
        FormatHexData(currentOffset, dataBuf, pHexDataStr);
        pDlg->PostMessage(WM_ADD_HEX_DATA_ITEM, 0, (LPARAM)pHexDataStr);
    }

    CloseHandle(hReadFile);
    return 0;
}
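FormatHexData itself is not shown in the article.  As a rough idea of what such a routine could do, here is a portable sketch; the name FormatHexLine, the count parameter and the line layout (offset, hex bytes, printable characters) are all assumptions made for illustration, and it returns a std::string rather than filling a TCHAR buffer.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical formatter: the layout below is an assumption, not the
// article's actual FormatHexData.  'count' is the number of valid bytes
// in 'data' and must not exceed 16.
std::string FormatHexLine(unsigned long offset,
                          const unsigned char *data,
                          std::size_t count)
{
    char buf[88];  // same size as the article's line buffer
    int pos = std::snprintf(buf, sizeof(buf), "%08lX: ", offset);

    // Hex column: always 16 slots wide so the text column lines up.
    for (std::size_t i = 0; i < 16; ++i) {
        if (i < count)
            pos += std::snprintf(buf + pos, sizeof(buf) - pos, "%02X ", data[i]);
        else
            pos += std::snprintf(buf + pos, sizeof(buf) - pos, "   ");
    }

    // Text column: printable ASCII as-is, everything else as '.'.
    buf[pos++] = ' ';
    for (std::size_t i = 0; i < count; ++i) {
        unsigned char c = data[i];
        buf[pos++] = (c >= 0x20 && c < 0x7F) ? static_cast<char>(c) : '.';
    }
    buf[pos] = '\0';
    return std::string(buf);
}
```

Passing the byte count is a design choice of this sketch: it lets a short final read be shown without stale bytes from the previous chunk.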

The UI part of the software would then process that message by adding the string to the list box:

LRESULT CBinaryViewerDlg::OnAddHexDataItem( WPARAM wParam, LPARAM lParam )
{
    TCHAR *pHexDataStr = (TCHAR *)lParam;
    m_listbox.AddString(pHexDataStr);
    delete [] pHexDataStr;
    return 0;
}

Processing the button click as described above does a number of things to improve performance and responsiveness:

  1. The foreground UI/message handling thread is not stalled while the software waits for what could be fairly lengthy I/O operations to take place.  This is important because even if a line representing the binary data were added to the list box, no new messages to paint the newly added items could be processed until the handler for the original message (“Show Binary Data” being clicked) returned.
  2. While I/O and data translation is taking place, the UI can be doing the painting of a line, thus making use of at least two cores in a multi-core system.
  3. The I/O requests can be broken up into small pieces which means that the UI can begin painting much quicker, thus appearing very responsive.
  4. The I/O doesn’t have to wait for each UI update to complete, and the UI doesn’t have to stay synchronized with the I/O, thus allowing the I/O and UI threads to run at their own optimal speed.
  5. The UI can process other messages even while the I/O operations are still going on.  This is especially important if the user decides they want to cancel the current operation instead of waiting for its completion.  If all the work were being done in the message handling thread, the software couldn’t know about the user clicking a “cancel” button without having to add additional message handling checks within the file data handling loop.

The “Why” Behind the Coding Example

Several things in the coding example need some additional explanation. 

First, the message used to cause an item to be added to the list was defined by the software itself.  It was not an OS pre-defined message:

#define WM_ADD_HEX_DATA_ITEM (WM_USER+1)

In a real world situation there would probably be many such messages defined, each with its own message handler function.  Each function would take care of updating the UI according to the meaning of the message and any data that was sent with it.
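One way to picture “many messages, each with its own handler” is a small dispatch table.  The sketch below is portable C++ rather than an MFC message map; Dialog, Dispatch, kAddHexDataItem and kReadingFinished are hypothetical names, and the second message is invented purely to show more than one entry.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Hypothetical message IDs, following the WM_USER + n convention.
constexpr unsigned kUserBase = 0x0400;  // the value of WM_USER on Windows
constexpr unsigned kAddHexDataItem = kUserBase + 1;
constexpr unsigned kReadingFinished = kUserBase + 2;

class Dialog {
public:
    Dialog() {
        // One handler per application-defined message.
        m_handlers[kAddHexDataItem] = [this](const std::string &payload) {
            m_lines.push_back(payload);
        };
        m_handlers[kReadingFinished] = [this](const std::string &) {
            m_finished = true;
        };
    }
    Dialog(const Dialog &) = delete;             // handlers capture 'this'
    Dialog &operator=(const Dialog &) = delete;

    // Routes a message to its handler; returns false for unknown IDs.
    bool Dispatch(unsigned id, const std::string &payload) {
        auto it = m_handlers.find(id);
        if (it == m_handlers.end())
            return false;
        it->second(payload);
        return true;
    }

    std::size_t LineCount() const { return m_lines.size(); }
    bool Finished() const { return m_finished; }

private:
    std::map<unsigned, std::function<void(const std::string &)>> m_handlers;
    std::vector<std::string> m_lines;
    bool m_finished = false;
};
```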

Second, the background thread “posted” messages; it didn’t “send” messages.  There is an important reason why this was done.  Posting a message (via the pDlg->PostMessage(…) call) just drops it off in the message handler’s message queue, without waiting for it to be processed.  Sending a message (via the pDlg->SendMessage(…) call) would have waited for the message handler thread to receive it, process it, and return a result.  That is not what we want: the threads need to live and work independently.
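The difference between posting and sending can be modelled in a few lines of portable C++.  This is a sketch of the idea only: UiThread, Post, Send and PostThenSend are hypothetical names, and the real Windows functions have different signatures and semantics in detail.

```cpp
#include <condition_variable>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Hypothetical model of a UI thread that runs queued handlers in order.
class UiThread {
public:
    UiThread() : m_thread([this] { Run(); }) {}

    ~UiThread() {
        Post([this] { m_quit = true; });  // last message: leave the loop
        m_thread.join();
    }

    // "Post": enqueue the handler and return at once.
    void Post(std::function<void()> handler) {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_queue.push(std::move(handler));
        }
        m_ready.notify_one();
    }

    // "Send": enqueue the handler, then block until the UI thread has run
    // it and produced a result.  Must not be called from the UI thread.
    int Send(std::function<int()> handler) {
        auto done = std::make_shared<std::promise<int>>();
        std::future<int> result = done->get_future();
        Post([done, handler] { done->set_value(handler()); });
        return result.get();
    }

private:
    void Run() {
        while (!m_quit) {
            std::function<void()> handler;
            {
                std::unique_lock<std::mutex> lock(m_mutex);
                m_ready.wait(lock, [this] { return !m_queue.empty(); });
                handler = std::move(m_queue.front());
                m_queue.pop();
            }
            handler();
        }
    }

    std::mutex m_mutex;
    std::condition_variable m_ready;
    std::queue<std::function<void()>> m_queue;
    bool m_quit = false;   // touched only on the UI thread
    std::thread m_thread;  // declared last so it starts after the rest exist
};

// Posts two updates without waiting, then sends a query and waits for it.
int PostThenSend() {
    int lines = 0;  // modified only by handlers on the UI thread
    UiThread ui;
    ui.Post([&lines] { ++lines; });
    ui.Post([&lines] { ++lines; });
    return ui.Send([&lines] { return lines; });
}
```

In this model the two Post calls return before their handlers have necessarily run, while Send holds the caller until the UI thread has worked through everything ahead of it in the queue.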

Third, the string that held the information to be added to the listbox was allocated in one thread (the ReadingThread) and freed in the other (UI) thread.  It is important that any memory shared between threads either be used in a producer / consumer methodology or that it be permanent.  In our example, since it wasn’t known how many lines might be needed, and since allocating and disposing of the memory that held each line would allow the two threads to work independently, the producer / consumer methodology was used.  Thus the code did:

    TCHAR *pHexDataStr = new TCHAR [88];

in the reading thread, and it did the following in the message handler:

    delete [] pHexDataStr;

One caution regarding messaging and passing pointers: don’t pass the address of a local variable to the PostMessage function.  By the time the message actually gets processed, the contents of the local variable may have changed, or worse, the local variable may have gone out of scope.
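The same hand-off can be expressed with std::unique_ptr, which makes the transfer of ownership explicit.  This is a single-threaded sketch under stated assumptions: Param is a hypothetical stand-in for LPARAM, and a plain std::queue stands in for the message queue so that the ownership steps are easy to follow.

```cpp
#include <cstdint>
#include <memory>
#include <queue>
#include <string>

// Stand-in for LPARAM: an integer wide enough to carry a pointer.
using Param = std::uintptr_t;

// Producer side: copy the text to the heap and give up ownership when the
// message is queued.  The address of a local variable is never passed.
void PostLine(std::queue<Param> &messages, const std::string &text) {
    auto line = std::make_unique<std::string>(text);
    messages.push(reinterpret_cast<Param>(line.release()));
}

// Consumer side: take ownership back; the string is freed automatically
// when 'line' goes out of scope.
std::string HandleLine(std::queue<Param> &messages) {
    std::unique_ptr<std::string> line(
        reinterpret_cast<std::string *>(messages.front()));
    messages.pop();
    return *line;
}
```

Between release() and the consumer’s unique_ptr, the queue is the only owner, so a message that is never handled is a leak; that is the price of letting the two sides run independently.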

Overloading the UI

Sometimes, if the background threads are producing lots of messages, there needs to be some mechanism for throttling the producer and consumer relationship between such threads.  In our example above, the ReadingThread could easily overwhelm the UI thread if adding the strings to the list is slower than reading 16 bytes of data from a file.  In this case, some simple counts can be used as follows:

LRESULT CBinaryViewerDlg::OnAddHexDataItem( WPARAM wParam, LPARAM lParam )
{
    TCHAR *pHexDataStr = (TCHAR *)lParam;
    m_listbox.AddString(pHexDataStr);
    delete [] pHexDataStr;
    m_msgsProcessed++;
    return 0;
}

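DWORD WINAPI ReadingThread(LPVOID pvDlgptr)
{
    ....
    ULONG msgsPosted = 0;
    ....

    for (currentOffset = pDlg->m_ulStartingOffset;
         currentOffset <= pDlg->m_ulEndingOffset;
         currentOffset += 16)
    {
        ....

        pDlg->PostMessage(WM_ADD_HEX_DATA_ITEM, 0, (LPARAM)pHexDataStr);

        msgsPosted++;
        // More than 1000 lines ahead of the UI: pause until it has caught up.
        // m_msgsProcessed is written by the UI thread, so it should be declared
        // volatile (or updated with the Interlocked functions).
        if (msgsPosted > (pDlg->m_msgsProcessed + 1000))
        {
            while (msgsPosted > pDlg->m_msgsProcessed)
            {
                Sleep(20);
            }
        }
    }

    ....
}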
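A different way to get the same throttling effect, swapped in here for comparison, is a bounded queue that makes the producer wait whenever the consumer falls too far behind.  The sketch below is portable C++ using a condition variable instead of counters and Sleep; BoundedQueue and RunThrottled are hypothetical names, and this is not how the article’s Windows code does it.

```cpp
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <utility>

// Hypothetical queue that never holds more than 'capacity' lines.
class BoundedQueue {
public:
    explicit BoundedQueue(std::size_t capacity) : m_capacity(capacity) {}

    // Producer: waits while the queue is full.
    void Push(std::string line) {
        std::unique_lock<std::mutex> lock(m_mutex);
        m_notFull.wait(lock, [this] { return m_lines.size() < m_capacity; });
        m_lines.push(std::move(line));
        if (m_lines.size() > m_highWater)
            m_highWater = m_lines.size();
        m_notEmpty.notify_one();
    }

    // Consumer: waits while the queue is empty.
    std::string Pop() {
        std::unique_lock<std::mutex> lock(m_mutex);
        m_notEmpty.wait(lock, [this] { return !m_lines.empty(); });
        std::string line = std::move(m_lines.front());
        m_lines.pop();
        m_notFull.notify_one();
        return line;
    }

    // Largest backlog seen so far.
    std::size_t HighWater() const {
        std::lock_guard<std::mutex> lock(m_mutex);
        return m_highWater;
    }

private:
    const std::size_t m_capacity;
    mutable std::mutex m_mutex;
    std::condition_variable m_notFull;
    std::condition_variable m_notEmpty;
    std::queue<std::string> m_lines;
    std::size_t m_highWater = 0;
};

// Pushes 'total' lines through a queue limited to 'capacity' entries and
// returns the largest backlog that built up along the way.
std::size_t RunThrottled(std::size_t total, std::size_t capacity) {
    BoundedQueue lines(capacity);
    std::thread producer([&lines, total] {
        for (std::size_t i = 0; i < total; ++i)
            lines.Push("line");
    });
    for (std::size_t i = 0; i < total; ++i)
        lines.Pop();  // the "UI" consumes at its own pace
    producer.join();
    return lines.HighWater();
}
```

Compared with the counter-and-Sleep approach, the producer here resumes as soon as a slot frees up rather than after the next 20 ms poll, at the cost of the consumer blocking in Pop, which a real UI thread would not want to do.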

Stopping Background Threads

Although it is not shown in the original code example, being able to cancel background operations is desirable.  This can easily be accomplished by checking a variable in the ReadingThread’s loop that gets set by the foreground when a “stop” or “cancel” button gets pressed.  Such code would look something like this:

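LRESULT CBinaryViewerDlg::OnStopReading( WPARAM wParam, LPARAM lParam )
{
    // Ask the reading thread to stop, then wait for it to exit.
    m_bStopReading = TRUE;
    // If the reading thread is itself waiting for the UI to catch up, that wait
    // loop must also check m_bStopReading, or this call will never return.
    WaitForSingleObject(m_hBackgroundThread, INFINITE);
    CloseHandle(m_hBackgroundThread);
    return 0;
}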


DWORD WINAPI ReadingThread(LPVOID pvDlgptr)
{
    ....

    for (currentOffset = pDlg->m_ulStartingOffset;
         currentOffset <= pDlg->m_ulEndingOffset;
         currentOffset += 16)
    {
        DWORD bytesRead = 0;

        // Check for cancellation first, before reading or allocating the
        // line buffer, so that stopping early leaks nothing.
        if (pDlg->m_bStopReading)
            break;

        ReadFile(hReadFile, dataBuf, 16, &bytesRead, NULL);
        ....
    }

    ....
}
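In portable C++ the same stop-flag pattern is usually written with std::atomic<bool>, which guarantees that the worker sees the change made by the other thread.  The following is a minimal sketch; Reader, ReadingLoop and StartThenCancel are hypothetical names, and the loop body only counts iterations in place of reading a file.

```cpp
#include <atomic>
#include <functional>
#include <thread>

// Hypothetical shared state between the UI thread and the reader.
struct Reader {
    std::atomic<bool> stopRequested{false};
    std::atomic<unsigned long> linesRead{0};
};

// Worker loop: checks the flag once per iteration, like the article's
// check of m_bStopReading.
void ReadingLoop(Reader &reader) {
    while (!reader.stopRequested.load()) {
        // ... read and format one 16-byte chunk here ...
        reader.linesRead.fetch_add(1);
        std::this_thread::yield();
    }
}

// Starts the reader, waits until it has made some progress, cancels it,
// and returns how many lines it had produced when it stopped.
unsigned long StartThenCancel() {
    Reader reader;
    std::thread worker(ReadingLoop, std::ref(reader));

    while (reader.linesRead.load() == 0)
        std::this_thread::yield();

    reader.stopRequested.store(true);  // the "cancel" button
    worker.join();                     // prompt, since the loop polls the flag
    return reader.linesRead.load();
}
```

How quickly cancellation takes effect depends on how long one loop iteration takes, which is one more reason to keep each unit of background work small.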

Conclusion

In conclusion, it is easy to see that, by changing a few things in UI software design, multi-core microprocessors can be used effectively.  This results in a more responsive UI.  By having multiple threads taking on different tasks and the UI thread doing just UI activity, the work can be spread out across all the cores of the microprocessor.    

In the coding example in this article, we only took advantage of two cores (i.e. used two threads).  But AMD microprocessors now come in quad-core designs that run even more efficiently.  And some motherboards allow for running two or more of these quad-core processors together. 

So, don’t stop at two threads.  Take threading to the max!  In this way, the AMD multi-core processors can make your software really fly.  

Greg Fry of Portland, Oregon, is a 20+ year veteran of professional software development, including a decade in commercial software development (large volume, end-user/shrink-wrapped software).  He holds a B.S. in Computer Science and is an instrument rated private pilot and outdoor enthusiast.

© 2009 Advanced Micro Devices, Inc. AMD, the AMD Arrow logo, AMD Opteron, AMD Athlon, AMD Turion, AMD Sempron, AMD LIVE!, and combinations thereof, are trademarks of Advanced Micro Devices, Inc. Microsoft and Windows are registered trademarks of Microsoft Corporation in the United States and/or other jurisdictions. Linux is a registered trademark of Linus Torvalds. Other names are for informational purposes only and may be trademarks of their respective owners.
