pywin32 and docs

Surely not a new topic but… There was a thread yesterday on then python mailing list which turned on how to access the Desktop folder under Windows. This is actually a fairly frequently-asked question, but when I tried a few queries against Google, even targetting msdn directly, no clear answers emerged. (I was pretending I didn’t know what to look for, just using phrases like “find desktop folder”).

Once an answer was given, the OP asked how one knew where to look for that kind of information and pointed out that the pywin32 docs are hard to find things in. This is a hardy perennial and people have tried to form wikis and knowledge bases and so on, but nothing’s ever really taken off. Mike Driscoll, in the same thread, said he would try to pursue more vigorously the initiative he started before the summer (which I can’t find a link for now: sorry, Mike). These things need a lot of energy though. I keep an occasionally-updated series of Win32 how-do-I pages which I hope are useful but much more is needed.

Part of the issue is that pywin32 — without which Python under Windows would be a much much less powerful thing — does little more than wrap the underlying API with the necessary Python plumbing. So a lot of the time, a question like “How do I do X using Python under Windows?” is really better phrased “How do I X under Windows? And how do I translate the answer into Python?”. Not infrequently I answer questions on the Python and Python-Win32 lists by Googling for a non-Python answer and then translating. No expertise on my part beyond quite a few years’ experience in tracking down Windows APIs which can give me a few words to look for.

No real answers here: ctypes has the same problems, only more so, since it doesn’t even do the plumbing for you. There are a few Windows-specific modules around, including my own. But Python’s history and present is very much *nix-oriented and the balance of expertise tends to be over there. (Along, in some cases, with the dismissive attitude as soon as Microsoft technologies are mentioned).

I see it as my mission (of sorts) to champion Python as a force for much good under Windows: I try to pick up any Windows-specific questions on the lists, even if only to show willing. And I know I’m by no means alone in this. We just need a little more effort in making the darker corners of Windows clearer to innocent Pythoneers. My feeling is that recipes are the way to go (hence the How-do-I? series) but I’d welcome any initiative which had staying power.

Little gems in SQL Server 2005

I know it’s clear I don’t have all that much time for the “innovations” in SQL Server 2005, but there are two syntax additions which I’ve been yearning for for years. (Since I moved from a company which used Oracle, in fact). These are: INTERSECT and EXCEPT (the latter known as MINUS in other dialects of SQL). They complete the list of set operations which has always included UNION. And the practical use is much clearer code in some circumstances.

We have, for example, a set of data on our database and we want to know what’s been added compared with a backup set from a few days back. So:

SELECT * FROM bus_barcode EXCEPT SELECT * FROM bus_barcode_bak

What could be simpler? Throw in a UNION ALL (and some messiness to cope with operator precedence) and you have symmetric difference:

SELECT * FROM (
  SELECT * FROM bus_barcode EXCEPT SELECT * FROM bus_barcode_bak
) AS live_bak
UNION ALL
SELECT * FROM (
  SELECT * FROM bus_barcode_bak EXCEPT SELECT * FROM bus_barcode
) AS bak_live

Brilliant!

Comparing Unicode and JPEG

I recently needed to explain to a colleague of mine how Unicode and encodings worked. This guy’s a programmer — although from a non-technical background — and doesn’t usually have an issue with technical concepts, but the usual explanations weren’t really getting through. I found that the explanation below helped him out, so I reproduce it here for anyone else’s benefit:

When you have an image which you save as a JPEG file, you’re taking something you see on screen (whose internal format you neither know nor care) and saving it encoded in the JPEG way in a file whose extension is .jpg, a hint to the image viewer application that the contents of the file is a picture encoded a certain way. Of course an image viewer might be able to work that out by sniffing the header, but the extension saves some time.

You could have saved it as some black-and-white format if, for example, you wanted a smaller file size and were prepared to lose some detail (viz, the colours) or if you knew that the colours involved were only black and white anyway. Or you might have used GIF if you knew that it would be more efficient in space terms for this particular image or that someone you were sending it to could only read GIFs. And so on. Ultimately, you know that any application which pretends to be able to read a file of the format you’ve specified will present on screen the image you started off with.

Encoding text is much the same. You start with text which looks like something on the screen. Often it’s conventional Western characters (the unaccented letters a-z in upper and lower case); sometimes there are accents on top or extra characters; you can even have entirely different character sets, such as Chinese or Linear B. Ultimately, you want that to appear on someone else’s screen (or printout or browser) as it appears on yours.

So you save it in a format which you both know, and you say what the encoding (format) is. Some encodings will only allow you to store characters in a certain range (say, Western characters only or Russian characters only). Others will store everything. You make the same decisions about losing data and the coherence of the result. You could use encodings, such as UTF-8 which guarantee to encode every codepoint in the Unicode universe, but that might be more expensive in terms of space. (Although UTF-8 does its best to encode common Western characters in fewer bytes, which is helpful if you’re using common Western characters!). When someone at the other end loads the text into their editor or browser, their application decodes it back into Unicode again and displays the appropriate characters.

But how does that application know what encoding was used in the first place? Well that’s the unfortunate difference: there’s no established way to indicate what encoding (format) text is in. There are some conventions (an HTTP charset header, for example, or a first line which looks like this: # -*- coding: utf-8 -*-) but none is universal. You could have an understanding between you and people with whom you exchange text that replaces the common .txt extension by an encoding name. But it would only work at that level. There are ways of sniffing the encoding (ranging from the simple recognition of an initial byte pair to statistical analysis of the contents) but none is foolproof.

Ultimately, though, in the same way in which you trust your image viewers to unpack image data from its format to some native Image type and possibly to pack it into another format elsewhere, your applications have to unpack encoded data from a file to a native Unicode type until it needs to be written out to storage or to another system, at which point you encode it again, advertising the encoding as best you can.

.reg files - what are those funny numbers?

If you’ve ever exported a segment of the registry into a .reg file for application elsewhere, you may have noticed that a value which seems like a perfectly coherent string in the Registry Editor becomes a sequence of hexadecimal digits. Something like this:

In the registry editor, I have the string:

"This is a test"

In the .reg file:

"test"=hex(2):54,00,68,00,69,00,73,00,20,00,69,00,73,00,20,00,61,00,20,00,74,\
  00,65,00,73,00,74,00,00,00

So how do you find out what all those numbers were? (Assuming you don’t still have the value in the registry). Python to the rescue. First, let’s cut-and-paste the digits only into an interpreter window. We could have used a triple-quoted string, but as it happens they’ve already got line continuation markers, so:

x = "54,00,68,00,69,00,73,00,20,00,69,00,73,00,20,00,61,00,20,00,74,\
  00,65,00,73,00,74,00,00,00"

Now, we do that thing you never do in Python[*] : a one-liner.

encoded_string = "".join (chr (int (i, 16)) for i in x.split (","))
print encoded_string

But it’s got all those extra spaces in it! Aha. I haven’t yet waved my magic Unicode wand:

print encoded_string.decode ("utf_16_le")

And voila! The string you first thought of. So how did I know it was UTF16-LE encoded? Lucky guess, coupled with quite a few years of wandering around Windows.

Ok, the briefest of explanations for anyone who’s less interested in my showmanship and more in a working solution. Windows has taken the string, encoded it in little-endian UTF16 which gives two bytes per codepoint, and then represented that encoding as hexadecimal digits in the .reg file. To reverse the effect, we unstitch the string along the commas, convert each of the resulting digit-strings to an integer using base 16 and convert each of those integers to its corresponding character. That gives you a UTF16 string which is just one decode away from the Unicode string you started off with.

Clear?

[*] except when you do

The cross-platform nature of Python

Python is a very cross-platform sort of language. That is, it runs on no few platforms including the significant three: Linux, OS X and Windows. As a rule you can run programs unchanged across those platforms (and possibly others). You may have to be the tiniest bit cautious with filepath formats or assumptions about default directories and so on. But I, for example, have a couple of Quixote-based websites which run completely unchanged as Apache-CGI programs on my WinXP laptop and the Redhat-based webspace I rent. I take that so much for granted that I forget just how much of an achievement it is.

There’s a thread on the python-list at the moment in which someone’s asking for a cross-platform equivalent to a signal-based approach to timing-out functions. Now, the alarm signal is a Unix-specific concept: it’s not even supported by some kind of POSIX layer under Windows. So what should Python do to achieve cross-platformability?

  1. Nothing: recognise that not every system has everything
  2. Some kind of conditionally-defined trickery within the interpreter loop that checks for some Windows-equivalent event object

A lot of questions of the sort “Can’t X be included in Python / the stdlib?” can be answered “Get an implementation in PyPI, see what the uptake is, and come back later”. This kind of thing, though, is more finicky because to be truly effective it needs the cooperation of the interpreter’s internals. Certainly you can fossick around with timeout threads and so on but then, as the saying goes, you have two problems. Obviously nothing’s going to happen here: signal-handling under Unix comes effectively for free; anything else would require an unconscionable amount of mucking with the interpreter which isn’t going to happen.

But I wonder whether something like Windows’ cross-process events which come for free under Windows and, as far as I know, don’t exist under Unix, would have been built-in if the boot had been on the other foot.

How do I get the window for a subprocess.Popen object?

[This only applies to Win32, obviously!]
Perhaps you want to be able to close an app cleanly by sending it a WM_CLOSE? Or maybe you want to minimize it on demand? To do that you have to get the hWnd of its top-level window. You’d have thought there’d be some kind of get-window-from-process-id API call, wouldn’t you? Well, according to all the available literature, there isn’t. (Maybe there is in .NET? I wouldn’t know). The code below pretty much illustrates the canonical approach: loop over available windows, find their process id, and compare against the process id you first thought of. Since a process could have multiple visible top-level windows, I’ve allowed for a list of HWNDs but this is probably overkill.

import subprocess
import time

import win32con
import win32gui
from win32gui import IsWindowVisible, IsWindowEnabled
import win32process

def get_hwnds_for_pid (pid):

  def callback (hwnd, hwnds):
    if IsWindowVisible (hwnd) and IsWindowEnabled (hwnd):
      _, found_pid = win32process.GetWindowThreadProcessId (hwnd)
      if found_pid == pid:
        hwnds.append (hwnd)
    return True

  hwnds = []
  win32gui.EnumWindows (callback, hwnds)
  return hwnds

if __name__ == '__main__':
  notepad = subprocess.Popen ([r"notepad.exe"])
  #
  # sleep to give the window time to appear
  #
  time.sleep (2.0)

  for hwnd in get_hwnds_for_pid (notepad.pid):
    print hwnd, "=>", win32gui.GetWindowText (hwnd)
    win32gui.SendMessage (hwnd, win32con.WM_CLOSE, 0, 0)

Using MSSQL from within Python

As you may have gathered from earlier posts I’m a professional database developer, currently using Microsoft SQL Server 2000 / 2005. Wherever it makes sense, I use Python to access and manipulate the data. Over the years I’ve tried out several different libraries for connecting to SQL Server so I present below what seems to be the current landscape.

ODBC Solutions

ntwdblib.dll Solutions

Other Solutions

My own default choice these days is pyodbc. It’s active and pretty much feature-complete (they added the .nextset method at my request). It’s also the dbapi module of choice for sqlalchemy’s mssql backend. ceODBC is a relative newcomer, and it’s not clear what it offers which the others don’t. mxODBC is of course a stable and mature product which recently released v3, and probably the most fully-featured, but it does require a commercial license.

The ntwdblib solutions all suffer from the fact that MS are removing support for that method of db access after MSSQL 2005 (and even in SQL2005 it’s only there in SQL2000-compatibility mode), so their days are numbered. In addition, the Object Craft module hasn’t been updated for 3 years with no supplied binary past Python 2.3. (I used it for years before that so I can attest to its robustness). The pymssql module also hasn’t updated in about a year, and does have some slightly funny issues with Unicode.

Of the two other solutions, the osql one is a bit of a curiosity, parsing the output from SQL Server’s commandline tools. As is noted in the comments on the recipe, it’s mostly handy when you’ve got no other possibilities, perhaps in an admin situation. The adodbapi module was moribund for several years, but has recently been picked up again by Vernon Cole who’s resurrected the SourceForge project and applied some of the outstanding patches. That said, I’m not aware that it offers anything which I can’t achieve with an ODBC solution.

Update (June 2008): adodbapi is now maintained as part of the pywin32 packages, as of release 211.

Sadly, “The Dark is Rising”

When I was younger, I loved Susan Cooper’s fantasy sequence The Dark is Rising. I remember reading a few years ago that some studio had picked up the option to make a cinema-release film out of it and I was just about as enthusiastic as I could be. Then the thing seemed to go quiet (which seems to happen quite a lot with films). Earlier this year, Walden Media of Narnia & Holes fame announced that they were going ahead with the film, based on the second book after which the sequence is named. Naturally I was thrilled. While I didn’t think their first Narnia film was the best thing since, at least it was moderately faithful to the spirit of the author’s created world. And Holes was pretty much spot-on.

Alas, I should have known better than to trust Hollywood. When the trailer appeared on the internet, I scrambled for it and hunched agog over my laptop screen… Well, I nearly cried. About the only thing the film and the book have in common as far as I can see are the title and the names of some of the characters. Will Stanton, that understated but very normal Buckinghamshire lad, youngest son of a boisterous but affectionate family, has become Will Stanton the American schoolboy who discovers superman-like powers under the tutelage of a bunch of frankly weird-looking adults who seem to lack any of the dignity and gravitas of their literary counterparts.

I realise that a film doesn’t have to be — arguably shouldn’t be — a simple rendering of the book onto (digital) celluloid. But surely audiences can appreciate the appeal of something which stands out from the teenage-superhero mould and which shares a background with a host of British folk legends? Obviously I haven’t seen the film, but if the trailer’s anything to go by, we’ve lost any of the magic of the original, to be replaced by nascent telekenesis, pyrokenesis and an apparently obligatory love interest.

Oh well. Another dream bites the dust.

UPDATE: Evidently, I’m not alone.

bugs.python.org

The Python issues list has recently completed its move to a roundup tracker hosted at bugs.python.org (hosting donated by some third-party, I seem to remember). I’m sure there’s a fair amount of the merely psychological about it, but I find it much easier to look at and to understand. I’ve long found SourceForge’s interface far too messy and clicky. (How many things do you have to click to download the latest release of a project?). In contrast, the python.org skin on roundup is clear and uncluttered.

Of course, I wish it had some better facilities for searching / sorting.[*] I’m the kind of guy who likes to look at what’s gone on lately, both so I know what’s happened and so I can see if there’s something I can contribute to. I’ve volunteered to help with docs patches (coordinated by Georg Brandl) to ease my conscience somewhat. (I’ve got so much out of Python over the years and put so little back in).

Someone posted — and, frustratingly, I can’t find the link now — a graphical representation of the activity on each issue on the database shown across a timeline. It’s a great way to visualise what’s happening, especially when an issue can suddenly burst into life again after literally years of inactivity.

[*] UPDATE: I’ve just realised how to set up user-defined searches etc. Looks good.

SQL - always learning

I recently attended a course for SQL Server programmers upgrading from SQL Server 2000 to 2005 (as we are at work). Naturally enough I thought that I’d know everything there was to know about the existing product but would find out about the new stuff in 2005. To my surprise we learnt a technique entirely by the way which has been there forever.

If you do this kind of thing:

DECLARE @v VARCHAR (1000)
SET @v = ''
SELECT @v = @v + a.x + ' ' FROM a WHERE a.y = z

then @v will be a space-separated string of all the values of the the x column in table a. I always assumed it would simply give me the first or the last or some arbitrary value.

You live and learn.