Tuesday, November 27, 2007

RAII as an alternative to Garbage Collectors

This text is actually already pretty old, but I never really 'published' it on the web, because I was kinda too lazy. :-)
The topic is a comparison of RAII (Resource Acquisition Is Initialisation) with common Garbage Collector practice, and why, in my opinion, Garbage Collectors are not needed in this world.
So here's a link, enjoy:
Garbage Collectors vs. RAII

Thursday, November 22, 2007

boost::asio, Visual Studio 2005 and Windows 2000

As we upgraded our compiler to Visual Studio 2005 earlier this year, I converted one project from Visual Studio 6 to the 2005 version. It is a service application, and I struggled with it because it would not start. I then ran the .exe from Explorer instead of the Services management console and saw that it requires the (POSIX) function "freeaddrinfo", which is not present in Windows 2000's ws2_32.dll. This is a serious problem, because we want to support the Windows 2000 Server edition with that software, especially since we run that version ourselves! So I dug into it and found out about this:

Lost backwards-compatibility

According to this Microsoft blog, old versions of Visual Studio (read: 6 and earlier) defined getaddrinfo, freeaddrinfo and others as macros that called inline functions declared in the now-legacy wspiapi.h header. When including Winsock2.h nowadays with VS 2003 (.NET) and higher, you don't include the wspiapi.h header anymore, but use the regular function declarations from ws2tcpip.h instead. These functions are now resolved at run time and loaded from ws2_32.dll. Windows 2000's version of this DLL does not provide them, because at that time they did not yet exist and were expanded to inline functions instead.
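If you need the old behaviour back with a modern compiler, the legacy inline fallbacks can, to my understanding, be pulled in again explicitly by restoring the old include order. This is a sketch, not tested against every SDK version:

```cpp
// Sketch: restore the VC6-era fallback by including the legacy
// wspiapi.h header AFTER ws2tcpip.h. With this, getaddrinfo and
// freeaddrinfo are redefined to inline wrappers that resolve the
// real functions at run time on XP+ and fall back to an inline
// implementation on Windows 2000.
#include <winsock2.h>
#include <ws2tcpip.h>
#include <wspiapi.h>  // legacy inline getaddrinfo/freeaddrinfo wrappers
```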

The problem with asio

And here comes the trouble with boost::asio: boost::asio does not let you include the old Winsock headers when compiling for a Windows target of NT version 5.1 (that is, XP) or greater. It will #error out with the message "WinSock.h has already been included".


To circumvent this, you need to define _WIN32_WINNT to something lower than 0x0501; 0x0500 (Windows 2000) is quite reasonable for this.
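In code, the workaround looks roughly like this (a sketch; the define must come before any Windows or Boost header is seen):

```cpp
// Sketch: target Windows 2000 so boost::asio sticks to APIs present
// in its ws2_32.dll. Usually this define goes into the project
// settings rather than a source file, so it applies everywhere.
#define _WIN32_WINNT 0x0500   // Windows 2000
#include <boost/asio.hpp>
```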

Friday, November 16, 2007

Making .NET-assemblies available to native languages

As jb from #mono pointed out, there's already a tool that converts assemblies to corresponding C glue code, called cilc. It's included in Mono and uses the Mono runtime libraries, so you need Mono installed to use it.
I could only find a man page here, but no further documentation so far; still, it should be fairly easy to use, and the generated C code should be easy to understand, I guess.

Making C++ libraries available to .NET and native languages

I just got the following idea on how I could expose my pure, unmanaged C++ interface to .NET languages. The idea takes two steps:

Step 1

I create a C interface from my C++ classes, which should be rather easy. Take the class "class Foo { void bla(); };" for example, which would be wrapped into C code like "struct foo *foo_create();" and "void foo_bla(struct foo*);".
With this, I have a language-neutral interface that I can put into a library. By language-neutral I mean that nearly every language can interact with C interfaces. This can be done manually, or you can write a 'converter' that parses the C++ headers and auto-generates the C glue code.
I've done something similar while creating the fbsql Interbase/Firebird API for Lisp, but that was all manual work.

Step 2

Then I'll need to find a way to create IL code that resembles this object-oriented C interface: it creates a real class from it and fills the method implementations with stubs that load the library and find the symbols to P/Invoke the functions, respectively.
An alternative to writing the IL code directly would be writing C# code instead. But I'd like to eliminate the dependency on a C# compiler by writing IL directly, if that is not incredibly hard. I don't have a clue what IL looks like or how complex it is.

Direct solution

There is, however, an even more advanced approach. Theoretically, I can parse the C++ headers (with gcc-xml) and extract all classes and methods. From that I create IL code that loads the library and the (mangled) symbols and does a bit of voodoo to fake C++'s object mechanism. It's actually not that hard: I only need to allocate memory and call the constructor on that memory with the correct calling convention to create an object.
I like having the C interface, though, because it gives me more flexibility and I can use it from non-.NET languages, too. This way I have a truly language-independent interface that should be easy to use from anywhere.

Wednesday, November 7, 2007

New AMD Phenom X4 fails to impress

The Phenom X4 is about to hit the shelves. There's been
word about pricing
already, but none about performance so far. Well, until now: there's a Crysis benchmark over at expreview.com. I think many expected a miracle on the AMD side, because their current CPUs, frankly, suck. The results are disillusioning, though:
While the E6850, QX6850 and QX9650 (all 3GHz CPUs) run at the same average of 49 FPS, the Phenom X4 only does 46 FPS at the same clock rate of 3GHz.
All Intel CPUs use a 333MHz (quad-pumped) front-side bus and a low multiplier of 9, while the Phenom X4 has a low FSB of 200MHz and a high multiplier of 15. That means the CPU's communication with the periphery is 40% slower, which might, in my humble, non-hardware-geek opinion, be the reason for the slowdown.
And now for my even harsher observation: as you might know, I own a Q6600, which is clocked at 2.4GHz. I overclocked it to 3GHz, the same clock speed the benchmark used, and noticed virtually *no* difference. This could mean that the Phenom X4 is even slower than a Q6600 with 600MHz less clock speed!
It's still very early and I don't want to jump to conclusions yet, but I don't expect much from AMD's upcoming CPU generation.

Monday, November 5, 2007

Learning a foreign language? Try Anki!

There are many approaches to learning the vocabulary of a foreign language, and one of them is flashcards. There's an awesome and at the same time lightweight piece of software out there that helps you learn with flashcards. It's localized to English, Spanish, French, German, Japanese and Dutch, and comes with example decks for Japanese and Russian words.
If you're interested, try Anki now at http://ichi2.net/anki/.
It's totally free and even open-source.

Efficient and reliable file management

We all know the Microsoft Explorer. We all know the millions of products that copy it. We also know, from earlier times, the Commander software, and we should at least have seen a few of the millions of products that copy that. None of them suits my needs, unfortunately.

Status Quo

Efficiency is everything in my daily life with computers. You should be able to accomplish every task with a minimal count of keystrokes, and have every task freely pausable, stoppable and resumable where technically possible. And, of course, you need to be able to parallelize tasks and have them run reliably next to each other.
For the Microsoft Explorer, for example, reliability is something unknown. In the default configuration, if one Explorer window, or some seemingly unrelated system task, crashes, it takes all your copy operations with it. And it's impossible to resume copy operations with the Microsoft Explorer. I have to admit that the new version shipped with Windows Vista is a complete overhaul and does address some of the main issues I had with XP and earlier Windowses. First, I like the task dialogs. Task dialogs give you big, recognizable buttons with labels that actually tell you what they do when you click them, including a small, detailed explanation beneath them. This is a good example of intuitive, yet non-intrusive user-interface design. But the new Explorer still fails at reliably resuming copy operations or predicting their results. For example, you can't specify whether the timestamps of copied files stay the same or are updated to 'now'. When moving files, the timestamps don't change; when copying files, they're updated. But, especially when transferring large amounts of data that's in daily use, I want the timestamps to stay the same! It can actually be a big problem if they change, for backup reasons. No chance with Explorer.


Additionally, I want to see progress both in bytes and in percent. And I want to see the speed in a unit like MByte/s or MBit/s, plus a reliable prediction of the ETA. Here, too, the new Vista Explorer does a better job than its XP counterpart. I guess everybody has already fallen victim to a 500KB file that, according to Explorer, would take 32038 days to copy from a CD to the hard disk.
Then, to increase throughput, you might want to fine-tune some parameters like the buffer size, which can be very important in copy operations, depending on source and destination. For example, copying from an SMB network share to your local hard drive and copying from one hard drive to another should use very different buffer sizes: the hard drives should cope with much bigger buffers than the network-protocol-bound SMB transfer (this is still speculation at this time, though).
Finally, resume operations should be fully customizable. I want options like 'Always overwrite newer', 'Ask if size differs', 'Always overwrite smaller', 'Check file contents if smaller' and much more. I need those for various situations: you use file-copy operations in many different scenarios that all require different behavior.
A great feature would be to postpone files that the software is unsure about to the end. It has happened *so* often that I started a large (like, a few hours) file operation and came back just to see that it was asking me, for the fifth file, whether I want to hug, seize or overwrite it. If it had skipped that file and already copied all the others, it would've saved me hours of either my free time or my work time.


Let's take the Explorer example again. In Explorer, you can do everything with the keyboard, but maybe not as easily as you'd want. Jumping to a new address can become a hassle. I noticed that the hotkey to jump to the address pane changed in every small update the Explorer ever got. I think it has already been set to each character you can find in 'address', namely a, d, r, e and s. In the Vista Explorer, you don't even know how to jump to the breadcrumb-style address pane. But I'll tell you a secret: it's Alt+D. For now; it might change next week.
But still, everything you can do in Explorer you can do relatively "easily" with the keyboard. There are just things you can't do with Explorer at all. Selecting files by pattern? Nope. You can use the search function as a workaround, which isn't optimal and only suits some limited needs. Imagine a directory filled with hundreds of seemingly random files, and you want to delete all that start with an A and end in .xxx. Maybe only if they have the archive bit set, or have some special UNIX permissions. Or only directories. You simply can't, and have to pick them by hand. I want a mixed regular-expression and file-attribute filter to select the files I want to delete, copy or move.


So here's a round-up of the features I demand from file-management software:
- A large number of options for file transfers
- Resumable operations
- Independent, parallel operations
- Postponing files it is not sure about
- A completely keyboard-operable user interface
- Selection by regex and file-attribute filtering, maybe recursively
- A command-line alternative to the GUI would be awesome!

High-performance, high usability SQL-List

This is about list controls in graphical user interfaces that display the results of an SQL query. I think most of us know the hassles of scrollbars in lists that are dynamically filled with result sets from SQL queries. The problem is that after you begin to fetch rows, you don't know how many are still to come. It may be 20, or it may be 20,000,000.

Status Quo

So most application writers are lazy and just fetch a few more rows than fit into the list, then fetch more as the user scrolls down. That's actually not as lazy as it sounds, since even this simple solution is problematic to implement, depending on the GUI toolkit or API. Hooking up the correct scrollbar events and making it at least somewhat bearable to the user can become quite a hassle in itself. And when you've managed to make it work, it's still far from usable: the user doesn't know how many rows are in the list either, and the scrollbar and other visual cues might make him think there are only a few. After scrolling down for 5 minutes, he might finally understand that there's much more data than he initially thought. There's the Ctrl+End shortcut that jumps to the end of a list; most implementations display an hourglass cursor or a progress bar in this situation. It can be slow, because they fetch the entire result set when you want to reach the end.

There are better ways to do this!

Incrementally filling the list in the background (from a second thread) might or might not be the best solution to this problem. First, you definitely need an activity indicator, like a small text saying "Fetching..." somewhere. And you need a way to cancel the fetching, in case the user doesn't even remotely care about anything but the first 30, 10 or even 2 rows, or simply cancels the operation. The implementation should be smooth enough that the user doesn't actually care whether it's still fetching or not. But being able to cancel it is not a bad idea, either.
Now to more detailed implementation suggestions: fetching in the background is, of course, nice to have. The user can interact with the software as early as, or even earlier than, with the naive 'fetch-first-20' approach, and still, after some time, knows how many rows there are. He can't know the total immediately, because that's technically impossible. You could do a select count(*) from the table beforehand, but it's not even guaranteed that the second select will yield the same number of rows, and it may cost a LOT of initial wait time and up to double the server load compared to the following approach. Fetching all available rows takes some time, too, but we have no choice other than to let the user wait until we have gathered the needed information. It's good, though, to let him interact with the parts of the list we have already fetched.

Speeding it up

So there are two things you're eager to know: the contents of the first X rows, where X is the number of rows displayed in the list, and how many rows are available in total. Fetching the first X rows is easy; speeding up fetching all of them is a bit tricky, though.
The trick is to fetch only the primary keys of the relevant rows. This is orders of magnitude faster than fetching whole rows, which might include large blobs of text or pictures. Let's imagine a table "list" that consists of the fields ID (integer, primary key), Name (varchar) and Description (text). The list is generated from the query select * from list. What the implementation does is convert this into two SQL statements: select id from list and select * from list where id=?. You execute the first query and check after each fetch whether a row's content needs to be fetched. In the display routine of the list, you mark each row that needs to be displayed as 'needs to be fetched' (if it isn't already available, of course). If there are rows that need to be displayed, the fetching thread pauses, fetches the relevant rows and sends a signal to the list that they are ready to be displayed. This works remarkably well.

The only pitfall is thread synchronization, which might or might not be easy in your programming language of choice. The fetcher thread signals new rows to the list view, so the list view can increment its item count to update the scrollbar's appearance. It's wise not to do this after each fetch: if you did, the signal queue might fill up and slow down the application. You should only send an update signal to the list every 500ms or so. This is easier on the user's eyes, and it'll yield better performance and usability. Depending on the list implementation, the slider of the scrollbar should be usable even while the list is being filled. With Qt's QListView, for example, this works like a charm.


So we've managed to overcome every shortcoming that SQL-based lists usually have:
- The user sees the first rows immediately
- There's virtually no wait time
- The total number of rows is available as fast as technically possible
- Only the content of rows that need to be displayed is fetched
- Even skipping to the end will not fetch every row
- Network traffic is therefore kept MINIMAL for the desired behaviour