Thursday 14 November 2013

Buy Shares in the Garden Path, friends; it's a wonder!

Not giving actual securities advice, of course; merely commenting on the metaphorical "garden path" one is "led down" by seemingly trustworthy sources. The next thing you know, it's three minutes since you started falling; you've no idea where "bottom" is and even less idea how you got there. (For the record, at three minutes, were it not for atmospheric drag slowing you down, you'd have been falling for ~158,760 meters and have an instantaneous speed of ~1764 m/s. Happy landings!)

All right, what am I blathering on about? It's really simple. (Until you fall off that cliff.) See here…

The problem is actually documented rather clearly in the Position section of the W3C "DOM Level 2 Traversal and Range" spec, and had I read that instead of relying on the documentation of the jQuery++ class that wraps it, I'd have saved several hours and $DEITY-knows-how-many litres of stomach acid. According to the spec,

The offset within the node is called the offset of the boundary-point and its position. If the container is [any of 5 different types of] node, the offset is between its child nodes. If the container is a CharacterData, Comment or ProcessingInstruction node, the offset is between the 16-bit units of the UTF-16 encoded string contained by it.

(Emphasis mine.)

The jQuery++ documentation gives an example that uses a single element containing a single text node. No mention is made of the (sensible once you figure it out) distinction between offsets in a text node vs. offsets in an element. And attempting to adapt their example to your markup, which is unlikely to be so trivially simple, is bound to lead to confusion unless you know the answer to The Riddle of the Magical Redefining Offset™.

Now you do, and you can be about your work a great deal more quickly than I.
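
For the record, here is a minimal sketch of that distinction. The helper name `describeBoundary` is my own invention for illustration (it is not part of jQuery++ or the DOM), and the sample nodes are plain objects shaped like DOM nodes so the snippet also runs outside a browser. It assumes, as in the DOM, that text-like nodes expose their string as `data` and that everything else exposes `childNodes`.

```javascript
// Hypothetical helper: report what a boundary-point offset is counting.
// Text-like containers (Text, Comment, ProcessingInstruction) carry a
// string in `data`, so the offset indexes UTF-16 code units of that string.
// For any other container the offset indexes into its child nodes.
function describeBoundary(container, offset) {
  if (typeof container.data === 'string') {
    return {
      counts: 'UTF-16 code units',
      before: container.data.slice(0, offset),
    };
  }
  return {
    counts: 'child nodes',
    before: Array.from(container.childNodes).slice(0, offset),
  };
}

// Plain-object stand-ins for <p>Hello, world<em></em></p>.
const hello = { nodeType: 3, data: 'Hello, world' };
const emphasis = { nodeType: 1, nodeName: 'EM', childNodes: [] };
const para = { nodeType: 1, nodeName: 'P', childNodes: [hello, emphasis] };

// Same number, two meanings: 5 code units into the text...
console.log(describeBoundary(hello, 5).before); // "Hello"
// ...versus 1 child node into the paragraph.
console.log(describeBoundary(para, 1).before.length); // 1
```

In a browser, the same function should accept a native Range's `startContainer` and `startOffset` (or the `endContainer` and `endOffset` pair) unchanged.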

So how do I do what I set out to do, which is to build up an HTML fragment matching selected text on a page? With great and grandiose ceremony, alas.

  1. Get the active Range for the block-level element containing my content. That gives me the starting and ending offsets (indices into that outer block's child nodes) corresponding to the child nodes that the selection overlaps.
  2. If that outermost range spans multiple child nodes:
    1. Walk down the first selected node and its descendants until we find the actual text node containing the start of the selection;
    2. Add that text and the trailing child nodes, if any, of the element node containing that text node to a buffer;
    3. Repeat for the trailing child nodes of that element node's parent element, and so on until we've walked back up to the outermost content area;
    4. Add the entire markup of each succeeding top-level child node up to but not including the block identified by that initial ending offset;
    5. Descend into the top-level ending node and its descendants, adding markup of each node until we reach the actual node containing the endpoint of the selection;
    6. Add the selected text fragment from the end point text node to the buffer. Done.
  3. If that outermost range has a single child node (the starting and ending offsets are the same):
    1. Walk down the selected node and its descendants until we find the actual text node containing the start of the selection;
    2. Add that text and the trailing child nodes, if any, of the element node containing that text node to a buffer;
    3. Repeat for the trailing child nodes of that element node's parent element, and so on until we find a node containing the endpoint node (or that node itself);
    4. Descend into the top-level ending node and its descendants, adding markup of each node until we reach the actual node containing the endpoint of the selection;
    5. Add the selected text fragment from the end point text node to the buffer. Done.
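
As a rough sketch only, the same result can be had from a single in-order walk rather than the up-and-down ceremony listed above; note that this is a swapped-in simplification, not the procedure as I actually wrote it. Everything named here (`selectionToHtml`, `escapeText`, `el`, `txt`) is hypothetical. It assumes both boundary points sit in text nodes under the content root and that the start precedes the end; it ignores attributes, comments and void elements such as `<br>`. The nodes are plain objects shaped like DOM nodes so it runs outside a browser.

```javascript
const ELEMENT_NODE = 1;
const TEXT_NODE = 3;

// Escape the characters that are unsafe in HTML text content.
function escapeText(s) {
  return s.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');
}

// Serialize the part of root's subtree between two text-node boundary
// points, each given as { container, offset } with offset in code units.
function selectionToHtml(root, start, end) {
  let state = 'before'; // moves 'before' -> 'inside' -> 'after'

  function visit(node) {
    if (state === 'after') return '';
    if (node.nodeType === TEXT_NODE) {
      let from = 0;
      let to = node.data.length;
      if (node === start.container) { state = 'inside'; from = start.offset; }
      if (state !== 'inside') return '';
      if (node === end.container) { state = 'after'; to = end.offset; }
      return escapeText(node.data.slice(from, to));
    }
    if (node.nodeType !== ELEMENT_NODE) return ''; // skipped in this sketch
    const wasInside = state === 'inside';
    const inner = Array.from(node.childNodes).map((child) => visit(child)).join('');
    // Emit tags only for elements that overlap the selection.
    if (!wasInside && inner === '') return '';
    const tag = node.nodeName.toLowerCase();
    return `<${tag}>${inner}</${tag}>`;
  }

  // The root is the content container itself, so only its children are emitted.
  return Array.from(root.childNodes).map((child) => visit(child)).join('');
}

// Tiny builders for DOM-shaped plain objects (illustration only).
function txt(data) { return { nodeType: TEXT_NODE, data }; }
function el(name, ...children) {
  return { nodeType: ELEMENT_NODE, nodeName: name.toUpperCase(), childNodes: children };
}

// <div><p>Hello <em>big</em> world</p><p>Second paragraph</p></div>
const emText = txt('big');
const lastText = txt('Second paragraph');
const content = el('div',
  el('p', txt('Hello '), el('em', emText), txt(' world')),
  el('p', lastText));

// Selection runs from "b|ig" to "Second| paragraph".
console.log(selectionToHtml(content,
  { container: emText, offset: 1 },
  { container: lastText, offset: 6 }));
// <p><em>ig</em> world</p><p>Second</p>
```

In a browser, the boundary points would come from a native Range (`startContainer`/`startOffset` and `endContainer`/`endOffset`). The DOM Range interface also defines `cloneContents()`, which returns a DocumentFragment copy of the selected content and may make a hand-rolled walk unnecessary; I have not checked how jQuery++ exposes it.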

Oh, my aching head.

If anybody has any better ideas, I'd love to hear them. Oh, did I mention that this is in CoffeeScript/JavaScript in the browser, so we don't have any fancy tools like Nokogiri, which I've previously described as "the Swiss Army Ginsu Chainsaw for parsing markup"? With tooling like that, this little exercise would be over and done with in an hour. Building a no doubt buggy improper subset of its functionality, in Script, has taken days. Plural, and never mind how plural. If I were a drinking man, there'd be a crate of top-shelf Scotch in my immediate future.
