Monday, November 5, 2007

High-performance, high usability SQL-List

This is about list-controls in graphical user interfaces that display query-contents from a sql-query. I think most of us know the hassles of scroll-bars in lists that are dynamically filled with result-sets from sql-queries. The problem with it is that after you began to fetch datasets, you don't know how many to come. It may be 20, or it may be 20.000.000.

Status Quo

So most application-writers are lazy and just fetch some more than fit into the list, and fetch more as the user tries to scroll down. That's actually not as lazy as it sounds, since even this simple solution is problematic to implement depending on GUI-toolkit or API. Hooking up with the correct scroll-bar-events and making it at least a bit bearable to the user can become quite a hassle in itself. And when you've managed to make it work, it's still far from usable: The user doesn't know how many datasets are displayed in the list, either, and the scroll-bar and other visual implications might let the user think that there are only a few datasets. After he scrolled down for 5 minutes, he might finally understand that there's much more data than he might initially have thought. There's the Ctrl+End shortcut that jumps to the end of lists. Most implementations display an hourglass-cursor or a progress-bar in this situation. It might be slow, because they fetch all the content in the list when you want to reach the end.

There are better ways to do this!

Incrementally filling the list in the background (a second thread) might or might not be the best solution to this problem. First, you definitely need an activity-implicator, like a small text saying "Fetching..." somewhere. And you need a way to cancel the fetching in case the user doesn't even remotely care about any but the first 30, 10 or even 2 datasets, or simply cancels the operation. The implementation should be smooth enough so the user doesn't actually care whether it's still fetching, or not, anyways. But being able to cancel it is not a bad idea, either.
Now to more detailed implementation suggestions: Fetching in the background is, of course, nice to have. The user can interact with the software as early, or even earlier, than with the naive 'fetch-first-20'-approach, and still, after some time, knows how many datasets there are. He can't immediately know how many there are, because it's technically impossible. You could do a select count(*) from table beforehand, but it's not even guaranteed the second select will yield the same number of datasets and it may cost a LOT of initial wait-time and up to double the server-load than the following approach. Fetching all available datasets takes some time, too, but we have no other choice but to let the user wait until we gathered the needed information. But it's good to let him interact with the parts of the list we already fetched.

Speeding it up

So there are two things you're eager to know: You need to know the contents of the first X datasets, where X is the number of datasets displayed in the list. And you need to know how many datasets are available in total. Fetching the first X datasets is easy, speeding up fetching all datasets is a bit tricky, though.
The trick is to fetch only the primary key of the relevant datasets. This will be magnitudes faster than fetching the whole dataset, which might include large blobs of text or pictures. Let's imagine a table "list" that consists of the fields ID (integer, primary key), Name (varchar) and Description (Text). The list is generated from the query select * from list. What the implementation does is convert this sql into two sql-statements: select id from list and select * from list where id=?. You execute the first query, and check after each fetch whether a dataset's content needs to be fetched. In the display-routine of the list, you mark each dataset that needs to be displayed as 'needs to be fetched' (if it isn't already available, of course). If there are datasets that need to be displayed, the fetching-thread stops, fetches the relevant datasets and sends a signal to the list that those datasets are ready to be displayed. This works remarkably well. The only pitfall is the thread-synchronization-issue, which might or might not be easily possible in your programming language of choice. The fetcher-thread signals new datasets to the list-view, so the list-view can increment it's item-count to update the scrollbar's appearance. It's wise not to do this after each fetch. If you did so, the signal-queue might fill up and slow down the application. You should only send an update-signal to the list every 500ms or so. This is easier on the eyes for the user and it'll yield better performance and usability. Depending on the list-implementation, the slider of the scrollbar should be usable even while filling the list. With Qt's QListView, for example, this works like a charm.


So we've managed to overcome every shortcoming that SQL-based lists usually yield:
- The user sees the first datasets immediately
- There's virtually no wait-time
- The total number of datasets will be available as fast as technically possible
- Only the content of datasets that need to be displayed will be fetched
- Even skipping to the end will not fetch every dataset
- The network-traffic is therefore kept MINIMAL for the desired behaviour

1 comment:

  1. Did you consider using asynchronous SQL queries? Should be quite easy with Qt and Postgres.