Revision 502 – Obsessive Parser Retry

I’m not proud of this one. Please bear with me:

In revision 493, I wrote about how the parser for “--nickname=” actually pushes content to three separate parsers, and simply chooses the one with the best results. That was all about not trying to guess the content, but leaving the guessing to the parsers. Whichever one gets the most, it wins. Too easy, and extremely scalable.
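For anyone who missed revision 493, here is a rough sketch of the "let the parsers compete" idea. It is not the actual code: the names `parse_with_best` and `by_delimiter` are made up for illustration, and "best results" is assumed to mean "the most records returned".

```python
from typing import Callable, List, Sequence

# Hypothetical parser type: takes raw bytes and returns a list of records,
# or an empty list when the parser cannot make sense of the content.
Parser = Callable[[bytes], List[str]]


def by_delimiter(delim: str) -> Parser:
    """Build a toy parser that only 'understands' content containing delim."""
    def parse(content: bytes) -> List[str]:
        text = content.decode("utf-8")
        return text.split(delim) if delim in text else []
    return parse


def parse_with_best(content: bytes, parsers: Sequence[Parser]) -> List[str]:
    """Run every parser on the same content and keep the largest result."""
    best: List[str] = []
    for parser in parsers:
        try:
            records = parser(content)
        except Exception:
            # A parser that blows up simply loses the contest.
            records = []
        if len(records) > len(best):
            best = records
    return best


if __name__ == "__main__":
    parsers = [by_delimiter(","), by_delimiter(" "), by_delimiter("\n")]
    print(parse_with_best(b"a,b,c d", parsers))
```

An empty list coming back from `parse_with_best` is the “shoot, I dunno” case: no parser got anything out of the content.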

Problem is, the underlying Apache lib used to fork off the incoming stream (so that we avoid downloading a file multiple times to parse it) doesn’t always seem to work.

I put a lot of time and concern into trying to figure out why, but in the end, I just added a retry-counter.

When all parsers return a “shoot, I dunno” response, we simply run it again. And again. And again. …not so obsessive because we give up after 3 times, but you’re free to make it as psychotic/obsessive as you want.

To describe this, I verbosely wrote “add retries to the parsing so that we can thrash on a file if we need to just-get-it-done”.

I promise to do better design in the future, but for now, this will re-download a file once for each full retry cycle, not once per parser. This doesn’t matter at all for file:// URLs, but for ftp://, bnapsql://, and http://, it will show up as multiple tries.
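The retry counter boils down to something like the sketch below. Again, this is an illustration and not the real code: `parse_with_retries`, `fetch`, and `parse` are hypothetical names, and “give up after 3 times” is assumed to mean three attempts in total. The point is that each full cycle fetches the content once and hands it to the parsing step.

```python
from typing import Callable, List

MAX_TRIES = 3  # assumption: three attempts in total, then give up


def parse_with_retries(
    url: str,
    fetch: Callable[[str], bytes],
    parse: Callable[[bytes], List[str]],
    max_tries: int = MAX_TRIES,
) -> List[str]:
    """Fetch and parse; if every parser shrugs, run the whole cycle again."""
    for _attempt in range(max_tries):
        content = fetch(url)      # one download per full retry cycle
        records = parse(content)  # e.g. the best result across all parsers
        if records:
            return records
    return []  # still "shoot, I dunno" after max_tries cycles


if __name__ == "__main__":
    calls = []

    def flaky_fetch(url: str) -> bytes:
        # Simulated remote source that only delivers on the third request.
        calls.append(url)
        return b"a,b" if len(calls) >= 3 else b""

    def parse(content: bytes) -> List[str]:
        return content.decode("utf-8").split(",") if content else []

    print(parse_with_retries("http://example.invalid/file", flaky_fetch, parse))
    print(len(calls), "fetches")
```

Raise `max_tries` if you want it more obsessive; a remote server will see one request per cycle.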