Handling Non-Table Content When Exporting to Excel – New Options Added

Solid Framework is great for extracting tables from a PDF and putting them into an Excel spreadsheet. Options let you specify whether to create a separate sheet for each table or to put all of the tables onto the same sheet.

If a table is split over several pages in the PDF, Solid Framework will very often stitch the parts of the table back together rather than creating separate tables.

But what should be done with text and images in a PDF that are not part of a table?

 

In the past, Solid Framework offered two choices: either put the content into the first column of the Excel file (TablesFromContent = true), or remove it entirely (TablesFromContent = false).

In version 8472 we changed ExcelTablesFromContent from a Boolean to an enum to support a third option. This change was reverted in 8492 to avoid compile-time errors.

We recommend that users who are on 8472 download the latest version of Solid Framework.

 

To support the third option instead, we have added a new Boolean option, PreserveColumnsInNonTableContent.

The choices for PreserveColumnsInNonTableContent are now:

  •  false is the equivalent of ExcelTablesFromContent = true in previous versions of SolidFramework.
  •  true allows text that is in columns in the PDF to be placed in columns in the reconstructed Excel file. This is now the default.

 

How this compares in practice

These examples are based on the following text:

 

PreserveColumnsInNonTableContent = false

Each paragraph is placed into a single cell within the reconstructed spreadsheet. The text for each paragraph is always placed in the first column.

PreserveColumnsInNonTableContent = true

Text is retained in columns. Several sentences may share the same cell and row.

This mode may be particularly useful if table data is not correctly detected, since the spreadsheet will still look similar to the original file.
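
To make the difference between the two settings concrete, here is a minimal Python sketch of the layout behavior described above. It is a toy model only and does not call Solid Framework: the TextBlock type, the layout_non_table_content function, and the way text sharing a cell is joined with a space are all assumptions made for illustration, not part of the product's API.

```python
from typing import Dict, List, NamedTuple


class TextBlock(NamedTuple):
    """A piece of non-table text (hypothetical model, not a Solid Framework type)."""
    row: int      # vertical slot of the text on the page, counted top to bottom
    column: int   # PDF text column the text sits in, counted left to right
    text: str


def layout_non_table_content(blocks: List[TextBlock],
                             preserve_columns: bool) -> List[List[str]]:
    """Return spreadsheet rows (lists of cell strings) for the given blocks."""
    if not blocks:
        return []

    if not preserve_columns:
        # Modeled "false" behavior: one block per row, always in the first column.
        return [[block.text] for block in blocks]

    # Modeled "true" behavior: keep each block in its original column.
    width = max(block.column for block in blocks) + 1
    rows: Dict[int, List[str]] = {}
    for block in blocks:
        cells = rows.setdefault(block.row, [""] * width)
        # Text landing in the same row and column shares one cell (assumed join).
        cells[block.column] = (cells[block.column] + " " + block.text).strip()
    return [rows[index] for index in sorted(rows)]
```

Calling the function with preserve_columns set to False yields a single-column sheet, while True yields one spreadsheet column per text column in the source, which is why the result stays closer to the look of the original page.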