Archive for December 1st, 2008

Bad RSS

From the TMI category for most of you, but of interest to at least two readers…

We’re getting some bogus RSS feeds, some from otherwise respectable media sources. One class of problems has to do with GUIDs (Globally Unique IDs). In particular, we’re seeing a single GUID being used for different programs, which violates the whole idea of a GUID. We thought we could depend on GUIDs as the sole mechanism for identifying a program, but when a site re-uses its GUIDs, the programs appear to change more than once every time the feed is scanned, which drives our updating logic crazy. Here’s what I think we’re going to do:

  • If any <guid> appears more than once in the current <item>s of a feed, we’ll never depend on GUIDs for that feed again.
  • If we’ve never seen such a duplicate GUID, we’ll use each <item>’s GUID as it’s supposed to be used: to uniquely identify the program.
  • If we’ve ever found a duplicate GUID for a feed, we’ll look at the <title> elements and the url attributes of the <enclosure> elements instead.
  • If either the title OR the URL for an item matches one that’s in the database for this feed, we’ll assume the scanned item is just a modified version of the program previously found. The reason is that we occasionally see a site change the title or the media filename of a program, but rarely both at once. (If they do change both at once AND they’ve ever used dupe GUIDs, there’s not really much we can do. We have to assume it is indeed a new program.)
  • IOW, if the GUIDs have been bogus and neither an item’s title NOR media URL can be found in the database, we’ll assume it’s a new program.
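
For the two readers still with me, here’s a rough Python sketch of the rules above. It’s only an illustration, not our actual code: the Item and FeedState classes, the guids_untrusted flag and the find_existing function are all names made up for this example, and the “database” is just a list in memory.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Item:
    # Hypothetical stand-in for one <item> in a feed.
    guid: str
    title: str
    enclosure_url: str


@dataclass
class FeedState:
    # Hypothetical per-feed record; a real system would persist this.
    guids_untrusted: bool = False              # sticky: never reset once True
    known: list = field(default_factory=list)  # items already in the database


def find_existing(state, scanned_items, item):
    """Return the stored item that `item` updates, or None if it looks new."""
    # Rule 1: any GUID repeated within the current scan poisons the feed for good.
    counts = Counter(i.guid for i in scanned_items)
    if any(n > 1 for n in counts.values()):
        state.guids_untrusted = True

    # Rule 2: no duplicate ever seen, so the GUID alone identifies the program.
    if not state.guids_untrusted:
        return next((k for k in state.known if k.guid == item.guid), None)

    # Rules 3-5: GUIDs are bogus, so match on title OR enclosure URL.
    # If neither matches, the caller treats the item as a new program.
    return next(
        (k for k in state.known
         if k.title == item.title or k.enclosure_url == item.enclosure_url),
        None,
    )
```

Note that the flag only ever goes from False to True, which is the “never depend on GUIDs for that feed again” part. A feed that cleans up its act later still gets the title-or-URL treatment.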

The Unkindest Click of All

Boys and girls, don’t try this at home. I was doing a bit of late-night file reorganization and I wanted to delete a directory of photos from my 3TB Drobo. I opened the Drobo in the OS X Finder, right-clicked on the directory and clicked on Move to Trash. Then I noticed that I’d accidentally moved the mouse and clicked on Backups.backupdb, the folder that contains everything in Time Machine. Oops! Too late. That’s right, 130+GB of files now in the Trash.

Okay, I figured, I should be able to recover from that. So I tried to move the directory from the Trash folder back to the Drobo. About 30 hours later that failed, so I gave up on saving my Time Machine database. The next step was to run Time Machine to create a new backup, and that took the better part of an entire day. I’m now in what I hope is the final stage: emptying the Trash and getting rid of those 7+ million files that I can’t use with Time Machine. “Preparing” took about 20 hours. It looks like “Emptying” should be done by midnight or so.

Lest you think that Time Machine combined with redundant disk storage makes for an idiot-proof backup scheme, just consider that you may need to redefine “idiot.” Hard to believe I managed to find and execute a one-click vulnerability to the whole system, but that’s what I get for trying to do anything risky late at night. Be careful out there.