Degristling the sausage: BBEdit 11 Edition

[Image: Women in uniforms standing at long tables, handling sausages.]

Almost two years ago, I wrote a post called “Degristling the sausage” to explain my method of using BBEdit to get a list of which CSS classes are actually applied in a given EPUB file, out of the sometimes hundreds that are included in the stylesheet. Apparently I’m not the only person who needs to do this sort of thing, because that post has stayed in the top four pages on this site ever since, and clever people keep linking to it.

But Bare Bones recently released BBEdit 11, and although I’m usually a lateish adopter on software updates, particularly when they remove features I’ve been using for years, such as HTML Tidy (I now use Balthisar Tidy to make my HTML readable), this time I waited a mere three months before buying the new version, and I immediately wished I’d done so earlier. Because why? Because among many other nice new features,

There’s a new button in the Find and Multi-File Search windows: “Extract”. This button (backed by a command on the Search menu, so you can assign a keyboard equivalent to it) will locate all occurrences of the search string (across multiple files, if appropriate) and those occurrences will be collected into a new untitled text document, separated by line breaks.

I had already meant to post an update to the 2012 post, as I’ve made the process a bit more efficient over the years, but this Extract button radically simplifies things. The new process is as follows:

  1. Unzip the EPUB file. (You can edit zipped EPUBs in BBEdit, but you won’t get an accurate preview if the pages link to an external CSS file; also, it’s easier to overwrite changes and lose your work if you’re juggling multiple search results windows, as I usually am. So I work on a zipped file only if I’m making very minor changes.) I keep P. Durrant’s ePub Zip/Unzip app in my Finder toolbar so I can drag and drop.

  2. Drag the whole folder onto the BBEdit application to open it as a project.

  3. Open the Multi-File Search dialog (shift-command-F) and in the “Find:” box put
    class=".*?"

  4. Select your project in the “Search in:” box,
    [Screenshot: the correctly set dialog box]
    make sure “Search nested folders” under “Options…” is checked,
    [Screenshot: the correctly set Options dropdown]
    and click “Extract.” Poof! This creates a new document containing just the class attributes, probably several thousand lines long.
    [Screenshot: the new document containing the extracted text. My example has 5950 lines.]

  5. Using the regular Find dialog, replace
    class="(.*?)"\r
    with
    \1\r
    to strip off everything except the class name.

  6. Use Text > Sort Lines… to sort the list.

  7. Use Text > Process Duplicate Lines… to leave only a single instance of each class.

Voilà!

“Extract” is not one of the command options when you’re creating a text factory, but you can automate steps 5–7 by placing this file in ~/Library/Application Support/BBEdit/Text Filters/: Unwrap+Sort+Dedupe_classes.textfactory. Then, from your file of extracted occurrences, select Text > Apply Text Filter > Unwrap+Sort+Dedupe_classes.

Easy peasy.
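
For readers without BBEdit 11, the same extract, unwrap, sort, and dedupe pass can be approximated in a short script. What follows is only a minimal Python sketch, not part of the workflow described above. The function name `extract_class_values` and the list of file extensions are assumptions made for illustration, and like the BBEdit steps it treats each whole attribute value (for example `indent first`) as one entry rather than splitting it into separate class names.

```python
import re
from pathlib import Path

# Same pattern as in step 3, with the capture group from step 5.
CLASS_ATTR = re.compile(r'class="(.*?)"')

# Assumption: the EPUB's content documents use one of these extensions.
CONTENT_SUFFIXES = {".xhtml", ".html", ".htm"}


def extract_class_values(epub_dir):
    """Hypothetical helper: return sorted, de-duplicated class attribute
    values found in the content documents under an unzipped EPUB folder."""
    values = set()  # a set drops duplicates (step 7)
    for path in Path(epub_dir).rglob("*"):  # rglob searches nested folders (step 4)
        if path.is_file() and path.suffix.lower() in CONTENT_SUFFIXES:
            text = path.read_text(encoding="utf-8", errors="replace")
            values.update(CLASS_ATTR.findall(text))  # extract and unwrap (steps 4-5)
    return sorted(values)  # sort the list (step 6)
```

Calling `extract_class_values("path/to/unzipped-epub")` returns a Python list that can be printed one value per line, which is roughly what the text filter leaves in the BBEdit document.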

I still like that the old method shows me which HTML elements each class is attached to, and also pulls out any locally applied style attributes. But the convenience of this two-step process outweighs those features, for me.
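
The kind of report the old method produces (which elements carry each class, plus any inline styles) could also be sketched in code. The 2012 post's actual steps are not reproduced here; this is a rough Python illustration using the standard library's `html.parser`, and the names `ClassUsageParser` and `summarize` are hypothetical. Unlike the BBEdit steps, it splits multi-class attribute values on whitespace.

```python
from collections import defaultdict
from html.parser import HTMLParser


class ClassUsageParser(HTMLParser):
    """Hypothetical helper: records which elements use each class,
    and which elements carry a locally applied style attribute."""

    def __init__(self):
        super().__init__()
        self.elements_by_class = defaultdict(set)
        self.inline_styles = set()

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <br class="x"/>.
        for name, value in attrs:
            if name == "class" and value:
                for class_name in value.split():
                    self.elements_by_class[class_name].add(tag)
            elif name == "style" and value:
                self.inline_styles.add((tag, value))


def summarize(markup):
    """Return ({class name: sorted element names}, sorted (element, style) pairs)."""
    parser = ClassUsageParser()
    parser.feed(markup)
    parser.close()
    classes = {name: sorted(tags) for name, tags in parser.elements_by_class.items()}
    return classes, sorted(parser.inline_styles)
```

Feeding each content document's text to `summarize` and merging the results would give a per-book picture; that merging step is left out to keep the sketch short.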

3 Responses

  1. Wu, February 17, 2015 at 5:16 pm

    Or, open the EPUB in Sigil, and click Tools, and Delete Unused Stylesheet Classes.
