If the Olympics had an event in InDesign, MITTERA would bring home the gold. It’s one of the primary tools our designers use to bring all the nifty ideas in their imaginations to life. From weekly ads and magazines to promotional mailers and digital publications, they can do it all.

When a client recently requested our help automating a key performance report, our extensive understanding of the application as a design tool wasn’t enough. Our highly versatile data team had to learn the program from an entirely new angle, diving deep into its inner workings.

When saving an InDesign project, the default file format used is INDD, which predictably stands for InDesign Document. Another option is IDML, which stands for InDesign Markup Language.

The IDML format serves two main functions.

  1. It allows for backwards compatibility of an InDesign document back to CS4.
  2. It provides a way for third-party developers to interact with an InDesign file, whether that means creating one from scratch, modifying an existing file, or scraping information from one.

These were precisely the capabilities we needed to automate our client’s report, so we chose to learn more about the IDML format.

Fully describing the specification is well beyond the scope of this post, but we can provide an overview. (If you’re interested, you can read this handy 500-page manual.)

On the surface, there doesn’t seem to be much difference between an IDML file and an INDD file, except for the file extension. But IDML files carry a little secret: in most cases, they are actually just ZIP archives containing many other files compressed together. As with any other ZIP file, we get at those files by unzipping the archive. After unzipping the IDML file, we see the following files and folders:

  • designmap.xml file
  • mimetype file
  • MasterSpreads folder containing xml files
  • META-INF folder containing xml files
  • Resources folder containing xml files
  • Spreads folder containing xml files
  • Stories folder containing xml files
  • XML folder containing xml files
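
For readers who want to peek inside a package themselves, here is a minimal Python sketch that uses the standard zipfile module to group an IDML file’s entries by top-level folder. The function name list_idml_entries is our own placeholder for illustration and is not part of any Adobe tooling.

```python
import zipfile
from collections import defaultdict


def list_idml_entries(idml_path):
    """Group the entries of an IDML package by top-level folder.

    Files stored at the root of the archive are grouped under the
    empty string "".
    """
    groups = defaultdict(list)
    with zipfile.ZipFile(idml_path) as package:
        for name in package.namelist():
            if name.endswith("/"):
                continue  # skip explicit directory entries
            folder, separator, remainder = name.partition("/")
            if separator:
                groups[folder].append(remainder)
            else:
                groups[""].append(name)
    return dict(groups)
```

Calling list_idml_entries on an exported IDML file (say, a hypothetical report.idml) should return a dictionary whose keys correspond to the folders listed above, assuming the file is a regular ZIP-based package.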

Together, these files contain every bit of information that defines your InDesign document, stored in the XML format. For those unfamiliar, XML is a popular language used to store and transmit data in a highly structured way that’s both human-readable and machine-readable. Curious readers can note this is also the underlying language behind current Microsoft Office files. (It’s the source of the x in the transition from .doc to .docx.)

Many of these XML files hold information about document minutiae like encoding, file types, fonts, colors, styles, and element relationships. For our purposes, we’ll focus on just the two folders that contain the substance of a project: Spreads and Stories.

From the names alone, an experienced (in)designer can probably guess the information stored in these files. The Spreads folder contains an XML file for every spread in the document, detailing element positioning, styling, layout, and the images and stories that make up the spread. The Stories folder contains an XML file for every story in the document (every portion of text). An example of these representations is shown below:

InDesign view of the document.

Entry in the spread xml file for our document’s story. This contains information on how to position the story in the spread and other styling information.

The actual story xml file for our document’s text. In addition to the actual text, this contains additional information on how to style the text.

A designer would manipulate the document through the InDesign application. To make programmatic modifications as part of an automated process, we’d instead change the spread file or the story file directly, depending on the change we wanted to make.

To reposition the text, for example, we’d modify the coordinates given in the spread file for our story element. To add an entirely new element, we’d create an entry for it in the spread file and, if it’s a text element, a corresponding story file with the actual content. To apply a different style to the text, we’d modify the story file. And to extract the text from the document, we could run XML parsing software on the story file instead of messing with cumbersome PDF parsers to scrape a final PDF of our document.
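
To make two of these operations concrete, here is a rough Python sketch using the standard library’s xml.etree.ElementTree. It is not the code from our client project. The two XML snippets are hypothetical, heavily trimmed stand-ins for a story file and a spread file, and the function names are ours. The sketch assumes that text lives in Content elements (with Br elements marking line breaks) and that a text frame’s position is governed by the last two values of its ItemTransform attribute; real frames carry additional geometry that this example ignores.

```python
import xml.etree.ElementTree as ET

# Keep the idPkg prefix when writing XML back out.
ET.register_namespace(
    "idPkg", "http://ns.adobe.com/AdobeInDesign/idml/1.0/packaging"
)

# Hypothetical, heavily trimmed story file.
STORY_XML = """<idPkg:Story xmlns:idPkg="http://ns.adobe.com/AdobeInDesign/idml/1.0/packaging">
  <Story Self="u10">
    <ParagraphStyleRange>
      <CharacterStyleRange>
        <Content>Quarterly results</Content>
        <Br />
        <Content>Sales rose this quarter.</Content>
      </CharacterStyleRange>
    </ParagraphStyleRange>
  </Story>
</idPkg:Story>"""

# Hypothetical, heavily trimmed spread file.
SPREAD_XML = """<idPkg:Spread xmlns:idPkg="http://ns.adobe.com/AdobeInDesign/idml/1.0/packaging">
  <Spread Self="u20">
    <TextFrame Self="u30" ParentStory="u10" ItemTransform="1 0 0 1 36 72" />
  </Spread>
</idPkg:Spread>"""


def extract_story_text(story_xml):
    """Return the plain text of a story, one line per Br element."""
    root = ET.fromstring(story_xml)
    parts = []
    for element in root.iter():
        if element.tag == "Content":
            parts.append(element.text or "")
        elif element.tag == "Br":
            parts.append("\n")
    return "".join(parts)


def move_text_frame(spread_xml, story_id, dx, dy):
    """Shift every text frame linked to story_id by (dx, dy)."""
    root = ET.fromstring(spread_xml)
    for frame in root.iter("TextFrame"):
        if frame.get("ParentStory") != story_id:
            continue
        values = frame.get("ItemTransform").split()
        # The last two matrix values are the x and y translation.
        values[4] = str(float(values[4]) + dx)
        values[5] = str(float(values[5]) + dy)
        frame.set("ItemTransform", " ".join(values))
    return ET.tostring(root, encoding="unicode")


print(extract_story_text(STORY_XML))
print(move_text_frame(SPREAD_XML, "u10", 10, -10))
```

In a real workflow, the modified XML would be written back into the package and the folder re-zipped into an IDML file, a step this sketch leaves out.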

As we went about creating our automated solution, we did all of these things.

Ultimately, we were able to save our client the hours of manual effort typically spent generating their report and to provide them with a more accurate result in the process. This newfound knowledge has also allowed us to make other internal processes more accurate and efficient, helping us in our relentless efforts to serve our customers better.

-Abhishek Vemuri, Data Analyst