Hmm. Good point. It must not use a single point, even though that's how the old light pen interfaces used to work. Turns out the close integration of console games with their video timing chain holds the clue.
My initial response was that perhaps it was a series of frames with distinct patterns encoding a binary screen address. That would only take as many frames as there are bits in the address...so if the pointing resolution was, say, 300x200 (fairly high, really; it's probably much lower than that), the highest address would be 0xEA60, no more than 16 bits...and you can show 16 frames in around 270 msec at 60 frames per second.
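For what it's worth, here's a rough Python sketch of that guess, just to show the idea hangs together. It is only my binary-address scheme, not how real light guns do it. The function names are made up for illustration, and the 60 frames per second figure is an assumption on my part.

```python
WIDTH, HEIGHT = 300, 200   # pointing resolution guessed above
FRAME_RATE_HZ = 60         # assumption: 60 frames per second


def bits_needed(width=WIDTH, height=HEIGHT):
    """Number of bits (and so frames) needed to address every position."""
    return (width * height - 1).bit_length()


def is_lit(x, y, frame, width=WIDTH):
    """True if position (x, y) is drawn white in the given frame.

    Frame k lights exactly the positions whose address has bit k set.
    """
    address = y * width + x
    return bool((address >> frame) & 1)


def sense(x, y):
    """What a gun aimed at (x, y) would see: one light/dark sample per frame."""
    return [is_lit(x, y, frame) for frame in range(bits_needed())]


def decode(samples, width=WIDTH):
    """Rebuild the (x, y) position from the per-frame samples."""
    address = sum(1 << frame for frame, lit in enumerate(samples) if lit)
    return address % width, address // width


if __name__ == "__main__":
    samples = sense(123, 45)
    print(decode(samples))
    frames = bits_needed()
    print(frames, "frames,", round(1000 * frames / FRAME_RATE_HZ), "msec")
```

The gun only ever reports light or dark, once per frame, and the console stacks those samples up as the bits of the address. The catch is that the whole screen has to flash patterns for every shot, which is part of why it looked like overthinking.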
OK, that was a good try on my part. But it turns out I was overthinking: here are two real answers (with two real patent numbers to go with them)...I like the second one best:
http://www.howstuffworks.com/question273.htm