SOLID FRAMEWORK 9.2.8564.1 RELEASED

The latest public release of Solid Framework SDK is now available for download from the developer portal at www.solidframework.net. This is version 9.2.8564.

MAIN FEATURES

TextRecoveryLanguage

TextRecoveryLanguage is used to specify the language of the document that needs to have text recovered by OCR when creating a CoreModel.

Previously this was always set to “english”. It now defaults to “automatic”. For non-English documents, this means the same result is now produced whether conversion is performed using the “Converter” classes or the CoreModel.

Improved Merging of Logical Tables

If a table is split over multiple pages within the PDF, an attempt is made to stitch the parts back together into a single table.

We have resolved issues that prevented some tables from merging correctly.
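
As a rough illustration of the stitching idea only, the sketch below merges table fragments when a fragment on the very next page has the same number of columns as the table before it. This is not Solid Framework's algorithm, which these notes do not describe; the `TableFragment` type, the `stitch_tables` function and the column-count heuristic are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class TableFragment:
    """Hypothetical stand-in for a table region found on one PDF page."""
    page: int
    rows: list  # each row is a list of cell strings


def stitch_tables(fragments):
    """Merge fragments that appear to continue onto the following page.

    Assumed heuristic: a fragment continues the previous table when it
    sits on the very next page and has the same column count.
    """
    merged = []
    last_page = None
    for frag in sorted(fragments, key=lambda f: f.page):
        if not frag.rows:
            continue  # nothing to merge from an empty fragment
        continues = (
            bool(merged)
            and last_page is not None
            and frag.page == last_page + 1
            and len(frag.rows[0]) == len(merged[-1][0])
        )
        if continues:
            merged[-1].extend(frag.rows)
        else:
            merged.append(list(frag.rows))
        last_page = frag.page
    return merged


fragments = [
    TableFragment(page=1, rows=[["Item", "Qty"], ["A", "1"]]),
    TableFragment(page=2, rows=[["B", "2"]]),
    TableFragment(page=4, rows=[["X", "Y", "Z"]]),
]
tables = stitch_tables(fragments)
```

Here the fragments on pages 1 and 2 become one table, while the three-column fragment on page 4 stays separate. A real implementation would need far more evidence (column positions, repeated header rows, fonts) than a column count.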

Improved handling of Chinese Language files

Several issues associated with the reconstruction of Chinese language files have been resolved.

Renaming of properties to improve self-documentation

KeepNonTableContent has been created as an alias for TablesFromContent. The option specifies how non-table text and images within a PDF should be handled when reconstructing Excel documents.

The name TablesFromContent does not clearly identify what the option does.

It has therefore been deprecated, and users are advised to use “KeepNonTableContent” instead.
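
The alias-plus-deprecation pattern described above can be sketched generically as follows. This is a minimal model, not the Solid Framework class: `ExcelOptionsSketch` is a hypothetical name, the default value shown is an assumption, and the real SDK may signal deprecation differently (for example through compiler attributes).

```python
import warnings


class ExcelOptionsSketch:
    """Hypothetical options holder used only to illustrate the alias."""

    def __init__(self):
        # Default chosen arbitrarily for this sketch.
        self._keep_non_table_content = True

    @property
    def KeepNonTableContent(self):
        """Preferred, self-documenting name."""
        return self._keep_non_table_content

    @KeepNonTableContent.setter
    def KeepNonTableContent(self, value):
        self._keep_non_table_content = bool(value)

    @property
    def TablesFromContent(self):
        """Deprecated name; reads the same underlying value."""
        self._warn()
        return self._keep_non_table_content

    @TablesFromContent.setter
    def TablesFromContent(self, value):
        self._warn()
        self._keep_non_table_content = bool(value)

    @staticmethod
    def _warn():
        warnings.warn(
            "TablesFromContent is deprecated; use KeepNonTableContent instead.",
            DeprecationWarning,
            stacklevel=3,
        )


options = ExcelOptionsSketch()
options.KeepNonTableContent = False  # new code uses the clearer name
```

Because both names share one backing field, existing code that still sets the old name keeps working while new code can move to the clearer one.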

Solid Framework 9.2.8492.1 Released

The latest public release of Solid Framework SDK is now available for download from the developer portal at www.solidframework.net. This is version 9.2.8492.

Main features

Handling of column-based non-table text when converting to Excel

We have created a new option, “PreserveColumnsInNonTableContent”.

If this is true (the default), non-table content will be retained in columns when exporting to Excel. This will make the reconstructed spreadsheet look similar to the PDF. If it is false, non-table content will be placed in the first column, with one paragraph per row.

The default value is different from previous versions of Solid Framework. This will result in a different, but more faithful, reconstruction.

This option is only meaningful if TablesFromContent is true.

Note: in 8472 we implemented this mechanism by making TablesFromContent an enum. However, this could cause compile-time issues for our customers, and therefore we have reverted to the previous mechanism.

“FireSaveProgress” and other PagesModel events are now available in C++

SaveProgress and other similar events have been available within the .NET version of Solid Framework for a long time.

We have now made the same functionality available within the native C++ version of Solid Framework.
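
For readers unfamiliar with progress events, the general pattern looks like the sketch below: callers register a handler, and the model calls it as each page is saved. The class `PagesModelSketch` and its methods are hypothetical and written in Python purely for illustration; they are not the Solid Framework .NET or C++ API, whose exact signatures are not given in these notes.

```python
class PagesModelSketch:
    """Hypothetical model that reports progress while saving pages."""

    def __init__(self, page_count):
        self.page_count = page_count
        self._save_progress_handlers = []

    def on_save_progress(self, handler):
        """Register a callable taking (pages_done, pages_total)."""
        self._save_progress_handlers.append(handler)

    def _fire_save_progress(self, done, total):
        # Notify every registered handler in registration order.
        for handler in self._save_progress_handlers:
            handler(done, total)

    def save(self):
        for page in range(1, self.page_count + 1):
            # A real implementation would write the page here.
            self._fire_save_progress(page, self.page_count)


progress_log = []
model = PagesModelSketch(page_count=3)
model.on_save_progress(lambda done, total: progress_log.append((done, total)))
model.save()
```

After `save()` returns, `progress_log` holds one entry per page, which is the kind of information an application would use to drive a progress bar.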

Handling non-table Content when Exporting to Excel – New Options Added

Solid Framework is great for extracting tables from a PDF and putting them into an Excel spreadsheet. Options exist to allow you to specify whether to create a separate sheet for each table, or whether to have all of the tables put onto the same sheet.

If a table is split over several pages in the PDF, Solid Framework will very often stitch the various parts of the table back together, rather than creating separate tables.

But what should be done with text and images in a PDF that are not part of a table?

In the past, Solid Framework has offered two choices: either put the content into the first column of the Excel file (TablesFromContent = true), or else remove it entirely (TablesFromContent = false).

In version 8472 we changed ExcelTablesFromContent from being a Boolean to being an enum to support a third option. This was reverted in 8492 to avoid compile-time errors.

It is recommended that users download the latest version of Solid Framework if they are using 8472.

To support this functionality we have added a new Boolean option, PreserveColumnsInNonTableContent.

The choices now are:

  •  false is the equivalent of ExcelTablesFromContent = true in previous versions of Solid Framework.
  •  true allows text that is in columns in the PDF to be placed in columns in the reconstructed Excel file. This is now the default.

How this compares in practice

These examples are based on the following text:

PreserveColumnsInNonTableContent = false

Each paragraph is placed into a single cell within the reconstructed spreadsheet. The text for each paragraph is always placed in the first column.

PreserveColumnsInNonTableContent = true

Text is retained in columns. A number of sentences may be included in the same cell and row.

This mode may be particularly useful if table data is not correctly detected, since the spreadsheet will still look similar to the original file.
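
The difference between the two settings can be modelled with a small sketch. It assumes the non-table text has already been grouped into columns of paragraphs, which is a simplification: the real reconstruction works from page geometry. The function `layout_non_table_content` and its input format are hypothetical, not part of Solid Framework.

```python
from itertools import zip_longest


def layout_non_table_content(columns, preserve_columns=True):
    """Return spreadsheet rows for text given as columns of paragraphs.

    preserve_columns=True  : paragraphs stay side by side, one
                             spreadsheet column per text column.
    preserve_columns=False : every paragraph gets its own row in the
                             first column, read column by column.
    """
    if preserve_columns:
        return [list(row) for row in zip_longest(*columns, fillvalue="")]
    return [[paragraph] for column in columns for paragraph in column]


# Two text columns: a main column with two paragraphs and a sidebar.
page_text = [["Intro", "Details"], ["Sidebar"]]

kept = layout_non_table_content(page_text, preserve_columns=True)
flat = layout_non_table_content(page_text, preserve_columns=False)
```

With `preserve_columns=True` the sidebar lands in the second spreadsheet column next to the first paragraph; with `False` all three paragraphs are stacked in the first column.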