Forwarding Address: OS X

Monday, May 09, 2005

Spotlight indexing

Spotlight is really frustrating to run on a Mac with a slower hard drive, such as an iBook. It doesn't seem like you can schedule the indexing at all, which means that it can really slow down the system with disk thrashing when you are trying to do something else. There is no UI that says "index less frequently" or "only index when inactive for nn minutes".

Other things I learned today:

  • You can turn off indexing for a partition with "mdutil".
  • You can reindex a partition with "mdutil".
  • The indexes don't take up that much space. With about 33 GB of the disk in use, my index took up about 0.4 GB (a bit over 1%).
  • If you exclude a directory from indexing, the space its entries already took up in the index is not reclaimed until you delete the index and start over.
  • The first estimates of how long it will take to index are wildly wrong. My estimates went from 15 hours to 20 hours to 3 hours. Then it took an hour.
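
Turning indexing off and reindexing both come down to a couple of mdutil invocations. Here is a minimal sketch in Python that only builds and runs those command lines. It assumes mdutil's `-i on|off` (toggle indexing), `-E` (erase and rebuild the index) and `-s` (show status) options as described in its man page. The helper names `mdutil_command` and `run_mdutil` and the volume path `/Volumes/Media` are hypothetical.

```python
import subprocess
from typing import List

# Hypothetical mapping from a friendly action name to mdutil's options.
MDUTIL_FLAGS = {
    "off": ["-i", "off"],    # stop indexing the volume
    "on": ["-i", "on"],      # resume indexing the volume
    "reindex": ["-E"],       # erase the index so it gets rebuilt
    "status": ["-s"],        # report the volume's indexing status
}


def mdutil_command(action: str, volume: str) -> List[str]:
    """Return the mdutil argument list for `action` on `volume`."""
    if action not in MDUTIL_FLAGS:
        raise ValueError(f"unknown action: {action!r}")
    return ["mdutil", *MDUTIL_FLAGS[action], volume]


def run_mdutil(action: str, volume: str) -> int:
    """Run mdutil and return its exit status.

    Changing a volume's index settings normally needs root, so every
    action except "status" is prefixed with sudo (an assumption about
    how you run it; adjust to taste).
    """
    cmd = mdutil_command(action, volume)
    if action != "status":
        cmd = ["sudo", *cmd]
    return subprocess.call(cmd)
```

For example, `run_mdutil("off", "/Volumes/Media")` would turn indexing off for a media drive, and `run_mdutil("reindex", "/")` would throw away the boot volume's index so Spotlight builds it again from scratch.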

6 Comments:

  • Are you talking about the initial Spotlight indexing or about connecting an external drive?

    As I understand it, after the initial indexing Spotlight shouldn't ever need to scan the drive unless you ask it to.

    By Anonymous, at 6:20 PM  

  • Don't even try to use your machine while it's doing its initial index (or at least don't do anything that needs heavy disk I/O).

    Once the disk is indexed, it shouldn't be a problem again, unless you are creating lots of files very fast. Unlike earlier disk indexing systems (such as the old Mac OS 9 FindByContent indexing), there is no scheduled indexing of the drive. Instead, Spotlight uses a new (so far undocumented) kernel API to be notified every time a file is modified.

    So it normally sits there doing nothing, until a file is created or changed, at which point it reindexes that file. Unless you're changing lots of files at once, it shouldn't be doing much.

    If you decide to create 10,000 files at once, then it may be a bit busy for a few minutes after that. But for normal usage patterns, it works great.

    By Tim Buchheim, at 7:50 PM  

  • The kernel API is documented: kqueue(2).

    By Anonymous, at 9:12 PM  

  • The standard kqueue/kevent API only lets you watch files you have open (you have to pass in a file descriptor). Since a single process can by default have only 256 files open at a time, Spotlight obviously can't use this mechanism.

    If you look at /usr/include/sys/event.h on Tiger, you'll see that a new kevent filter, EVFILT_FS, has been added. It isn't described in the kevent/kqueue man page or anywhere else, but it's assumed that it lets you see all changes to the filesystem. (I suppose one could dig around in the Darwin source to confirm this, but I haven't done so yet.)

    By Tim Buchheim, at 9:30 PM  

  • I found that downloading files from Usenet or BitTorrent caused the indexing to run continuously, thrashing both CPUs of my dual 1 GHz G4. As new data came into the file being downloaded, the file was indexed over and over, pushing CPU utilization to 200% and slowing all other processes (including the download app) to a crawl. I had to exclude my downloads folder, which is just the sort of folder I'd like to have indexed.

    Yep, there are a few bugs left to be worked out.

    By Anonymous, at 1:48 AM  

  • I saw somewhere that you can open Terminal and type in a command that lets you specify which drives to exclude from Spotlight indexing.

    I have several media drives that I'd like to exclude from indexing (I'm aware of the Spotlight Privacy pane).

    Unfortunately, I can't find the information again. Does anyone know how to do this?

    Thanks

    By Dick Swanson, at 5:45 AM  
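
The point in the comments about kqueue needing an open file descriptor for every watched file is easy to see in code. Below is a minimal sketch using Python's `select.kqueue` wrapper (available only on Mac OS X and the BSDs) with the documented `EVFILT_VNODE` filter (`KQ_FILTER_VNODE` in Python). It opens one file, registers a watch on it, appends to the file, and checks that a `NOTE_WRITE` event comes back. The function name `write_is_reported` is hypothetical, and this illustrates only the per-file kqueue mechanism, not whatever Spotlight itself uses.

```python
import os
import select
import tempfile


def write_is_reported(path: str, timeout: float = 1.0) -> bool:
    """Return True if kqueue reports a write to `path`.

    Each EVFILT_VNODE watch is tied to one open file descriptor, so a
    process watching N files this way needs N descriptors.
    """
    fd = os.open(path, os.O_RDONLY)   # the descriptor kqueue will watch
    kq = select.kqueue()
    try:
        watch = select.kevent(
            fd,
            filter=select.KQ_FILTER_VNODE,
            flags=select.KQ_EV_ADD | select.KQ_EV_CLEAR,
            fflags=select.KQ_NOTE_WRITE | select.KQ_NOTE_EXTEND,
        )
        kq.control([watch], 0, 0)      # register the watch, fetch no events
        with open(path, "a") as f:     # modify the watched file
            f.write("new data\n")
        events = kq.control(None, 1, timeout)  # wait for up to one event
        return any(
            e.ident == fd and bool(e.fflags & select.KQ_NOTE_WRITE)
            for e in events
        )
    finally:
        kq.close()
        os.close(fd)


if __name__ == "__main__" and hasattr(select, "kqueue"):
    handle, path = tempfile.mkstemp()
    os.close(handle)
    try:
        print(write_is_reported(path))
    finally:
        os.unlink(path)
```

Watching an entire disk this way would mean opening every file on it, which is why a per-process descriptor limit rules this approach out for a whole-disk indexer.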
