SOLID FRAMEWORK – 10.0.8920

The latest public release of Solid Framework SDK is now available for download from the developer portal at www.solidframework.net. This is version 10.0.8920.

MAIN FEATURES

Improved consistency when dealing with borderless tables.

Better Table Detection

Identifying whether or not text is part of a table is complex, particularly when there are no borders to delimit the table edges. This release improves the accuracy of that detection.

More Logical Tables are now Merged Correctly

Solid Framework aims to recreate a single table where it appears as if it has been split over multiple pages within the PDF (i.e. it is a single “logical” table).

This release improves detection of table columns, which allows more logical tables to be recognised and reconstructed.
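
The idea of a "logical" table can be sketched in a few lines. This is not the SDK's algorithm: it is a minimal illustration that assumes table fragments have already been detected, and it uses a deliberately simple rule (consecutive pages, equal column count). The names TableFragment, ColumnCount and MergeLogicalTables are hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <vector>

using Row = std::vector<std::string>;

// Hypothetical model of one table fragment found on one page.
struct TableFragment {
    int page;               // 1-based page the fragment was found on
    std::vector<Row> rows;  // every row has the same number of cells
};

// Column count of a fragment (0 for an empty fragment).
std::size_t ColumnCount(const TableFragment& fragment) {
    return fragment.rows.empty() ? 0 : fragment.rows.front().size();
}

// Merges fragments on consecutive pages that share a column count.
// The input must be ordered by page.
std::vector<TableFragment> MergeLogicalTables(const std::vector<TableFragment>& fragments) {
    std::vector<TableFragment> merged;
    int lastPage = 0;
    for (const TableFragment& fragment : fragments) {
        const bool continues = !merged.empty() &&
                               fragment.page == lastPage + 1 &&
                               ColumnCount(fragment) == ColumnCount(merged.back());
        if (continues) {
            // Same logical table: append the rows to the previous fragment.
            merged.back().rows.insert(merged.back().rows.end(),
                                      fragment.rows.begin(), fragment.rows.end());
        } else {
            merged.push_back(fragment);
        }
        lastPage = fragment.page;
    }
    return merged;
}
```

With this rule, a wrongly detected column on one page is enough to stop two fragments from merging, which is why better column detection lets more logical tables be reconstructed.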

Improved Column Title Detection

This release is better at identifying text that represents column titles (i.e. the first row of a table).

Solid Framework – 10.0.8870

The latest public release of Solid Framework SDK is now available for download from the developer portal at www.solidframework.net. This is version 10.0.8870.

This is a major release and includes a number of breaking changes.

MAIN FEATURES

Extraction of wide tables that were tiled over multiple pages

If a table within a spreadsheet is wider than the paper size then it will be created as a multi-page “tiled” PDF. Previously this would have resulted in the file being reconstructed with multiple tables.

Solid Framework 10 is able to reconstruct these pages to give a single wide table.

 

Example of a tiled PDF being reconstructed as a wide spreadsheet

This functionality is enabled by default, but can be disabled by setting DetectTiledPages to false.
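
As a rough picture of what reconstructing a tiled table involves, the sketch below joins page-sized tiles left to right into one wide grid. It is an illustration only, not the SDK's implementation: it assumes the tiles are already known, ordered left to right, and equal in row count. The names Grid and StitchTiles are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

using Row = std::vector<std::string>;
using Grid = std::vector<Row>;

// Joins page-sized tiles left to right into one wide grid.
// Assumes every tile has the same number of rows and that the tiles
// are supplied in left-to-right order.
Grid StitchTiles(const std::vector<Grid>& tiles) {
    Grid wide;
    if (tiles.empty()) return wide;
    wide.resize(tiles.front().size());
    for (const Grid& tile : tiles) {
        assert(tile.size() == wide.size());  // tiles must line up row by row
        for (std::size_t r = 0; r < tile.size(); ++r) {
            wide[r].insert(wide[r].end(), tile[r].begin(), tile[r].end());
        }
    }
    return wide;
}
```

Two tiles of two and one columns produce a single three-column grid, where the previous behaviour would have left two separate tables.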

 

Exact mode for converting to HTML

Solid Framework has been able to reconstruct HTML for many years. This has been done by “reflowing” the document which results in a web page that may be easy to read, but that may not look like the original PDF.

Solid Framework 10 now allows HTML to be created that looks very similar to the PDF by setting ExactMode to true.

 

Better Handling of Z-Order

Some PDFs with complex layers were not being reconstructed correctly. Solid Framework 10 now handles these files better.

 

FURTHER IMPROVEMENTS IN OCR

Font types and styles are now applied more consistently in OCR output, producing a more aesthetically pleasing document.

 

IMPROVED TOLERANCE OF CORRUPT PDFS

A number of PDF files that could not previously be converted due to errors can now be corrected and a valid document reconstructed from them.

Breaking Changes

A number of changes have been made to simplify the API and improve consistency between the managed and native C++ SDKs.

Run time errors

License.Import now throws InvalidLicenseException immediately for invalid licenses, rather than delaying the error until conversion is attempted. Code that relied on the delayed error may therefore behave slightly differently.
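
The practical effect is that error handling moves from the conversion call to the point where the license is imported. The sketch below shows that placement using stand-in types only: InvalidLicenseError, ImportLicenseStandIn and TryStartUp are hypothetical and are not the SDK's classes or signatures.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical stand-ins, NOT the SDK's types: they only mimic the
// "throw at import time" behaviour described in this release.
struct InvalidLicenseError : std::runtime_error {
    using std::runtime_error::runtime_error;
};

void ImportLicenseStandIn(const std::string& key) {
    // Stand-in validity check: an empty key counts as invalid.
    if (key.empty()) throw InvalidLicenseError("invalid license");
}

// Application start-up: a bad license now surfaces here, before any
// conversion work is started.
bool TryStartUp(const std::string& key) {
    try {
        ImportLicenseStandIn(key);
        return true;
    } catch (const InvalidLicenseError&) {
        return false;
    }
}
```

Code that previously wrapped only the conversion call in a try block would, under this behaviour, need a handler around the import as well.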

 

Compile time errors

The following may cause compile time errors:

 

Code Removal

  • LicenseCollection has been removed. Use SolidFramework.License.Import instead
  • ValidateOnly and VerifyOnly properties have been removed from PdfToPdfAConverter. Use Validate and Verify methods instead

Deprecation

  • PagesModel.PagesCount has been deprecated in favor of PagesModel.PageCount

Renamed properties

  • PdfPageHolder.CommentsCount has been renamed to PdfPageHolder.CommentCount
  • PdfPageHolder.LinksCount has been renamed to PdfPageHolder.LinkCount
  • OcrTextRegion.OcrLines has been renamed to OcrTextRegion.OcrLineCount
  • OCRTransformationResult.GetPageWordsCount has been renamed to OCRTransformationResult.GetPageWordCount
  • OCRTransformationResult.GetPageConfidentWordsCount has been renamed to OCRTransformationResult.GetPageConfidentWordCount
  • OCRTransformationResult.GetDocumentWordsCount has been renamed to OCRTransformationResult.GetDocumentWordCount
  • OCRTransformationResult.GetDocumentConfidentWordsCount has been renamed to OCRTransformationResult.GetDocumentConfidentWordCount
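
When migrating a larger code base, the deprecated and renamed members above can be kept in a small lookup table, for example to drive a search-and-replace script. The entries are copied from the lists above; the helper names RenamedMembers and CurrentName are hypothetical.

```cpp
#include <map>
#include <string>

// Old -> new member names, copied from the deprecation and rename lists.
const std::map<std::string, std::string>& RenamedMembers() {
    static const std::map<std::string, std::string> table = {
        {"PagesModel.PagesCount", "PagesModel.PageCount"},
        {"PdfPageHolder.CommentsCount", "PdfPageHolder.CommentCount"},
        {"PdfPageHolder.LinksCount", "PdfPageHolder.LinkCount"},
        {"OcrTextRegion.OcrLines", "OcrTextRegion.OcrLineCount"},
        {"OCRTransformationResult.GetPageWordsCount",
         "OCRTransformationResult.GetPageWordCount"},
        {"OCRTransformationResult.GetPageConfidentWordsCount",
         "OCRTransformationResult.GetPageConfidentWordCount"},
        {"OCRTransformationResult.GetDocumentWordsCount",
         "OCRTransformationResult.GetDocumentWordCount"},
        {"OCRTransformationResult.GetDocumentConfidentWordsCount",
         "OCRTransformationResult.GetDocumentConfidentWordCount"},
    };
    return table;
}

// Returns the 10.0 name for a member, or the input unchanged if it was not renamed.
std::string CurrentName(const std::string& name) {
    const auto it = RenamedMembers().find(name);
    return it == RenamedMembers().end() ? name : it->second;
}
```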

Property replaced with Method

  • The ViewerPreferences property of PdfDocument.Catalog has been replaced with a GetViewerPreferences(bool create) method and a RemoveViewerPreferences() method
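
The shape of the new accessors can be illustrated with stand-in classes. The semantics shown are an assumption, not confirmed by this note: the sketch supposes that passing false returns a null pointer when no preferences exist, and that passing true creates them on demand. CatalogStandIn and ViewerPreferencesStandIn are hypothetical and are not the SDK's types.

```cpp
#include <memory>

// Hypothetical stand-ins mirroring the shape of the new accessors.
struct ViewerPreferencesStandIn {
    bool hideToolbar = false;
};

class CatalogStandIn {
public:
    // Assumed semantics: returns the existing object, creates one when
    // `create` is true, otherwise returns nullptr.
    ViewerPreferencesStandIn* GetViewerPreferences(bool create) {
        if (!prefs_ && create) prefs_.reset(new ViewerPreferencesStandIn());
        return prefs_.get();
    }

    // Discards the preferences so the next non-creating lookup finds nothing.
    void RemoveViewerPreferences() { prefs_.reset(); }

private:
    std::unique_ptr<ViewerPreferencesStandIn> prefs_;
};
```

A method pair like this makes the difference between "read if present" and "create if missing" explicit at the call site, which a plain property cannot express.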

C++ API specific changes

Changes to Names

  • Methods in the C++ API now all use Pascal case (e.g. setOutputPath is now SetOutputPath)
  • Methods starting GetIs, GetHas and GetWas have had their ‘Get’ dropped
  • PagesModelBase class has been renamed to PagesModel
  • ConverterBase classes have been renamed to Converter (e.g. PdfToWordConverterBase is now PdfToWordConverter)
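
The two method-naming rules above are mechanical, so they can be expressed as a small helper for migration scripts. This is a sketch under assumptions: MigrateMethodName is a hypothetical helper, and apart from setOutputPath the method names used with it are illustrative rather than taken from the SDK.

```cpp
#include <cctype>
#include <initializer_list>
#include <string>

// Maps a pre-10.0 C++ method name to its 10.0 spelling using the two rules
// listed above: Pascal case, and no "Get" in front of Is/Has/Was.
std::string MigrateMethodName(std::string name) {
    if (name.empty()) return name;
    // Rule 1: first letter becomes upper case (setOutputPath -> SetOutputPath).
    name[0] = static_cast<char>(std::toupper(static_cast<unsigned char>(name[0])));
    // Rule 2: GetIsX / GetHasX / GetWasX lose the leading "Get". The next
    // letter must be upper case so that names like GetIssueCount are left alone.
    for (const char* raw : {"GetIs", "GetHas", "GetWas"}) {
        const std::string prefix(raw);
        const bool matches =
            name.size() > prefix.size() &&
            name.compare(0, prefix.size(), prefix) == 0 &&
            std::isupper(static_cast<unsigned char>(name[prefix.size()])) != 0;
        if (matches) return name.substr(3);
    }
    return name;
}
```

The class renames in the same list are not covered by this helper and would need a separate substitution.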

 

Other Changes

  • SolidFramework.cpp no longer includes stdafx.h
  • CustomData properties have been removed from the Converter classes. (Custom data can be captured within a lambda expression when setting the std::function progress/warning callback or stored within a subclass when overriding OnProgress/OnWarning)
  • Collections in the C++ API are now exported as std::vectors
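
The lambda-capture replacement for CustomData can be sketched as follows. ConverterStandIn is a hypothetical stand-in, not the SDK's converter: the callback signature (a single int percentage) and the SetProgressCallback name are assumptions made so that the example is self-contained.

```cpp
#include <functional>
#include <string>
#include <utility>

// Hypothetical stand-in for a converter with a std::function progress
// callback; the real SDK class and its callback signature may differ.
class ConverterStandIn {
public:
    void SetProgressCallback(std::function<void(int)> callback) {
        callback_ = std::move(callback);
    }

    // Simulates a conversion that reports 0, 50 and 100 percent.
    void Convert() {
        for (int percent = 0; percent <= 100; percent += 50) {
            if (callback_) callback_(percent);
        }
    }

private:
    std::function<void(int)> callback_;
};

// What used to live in CustomData: any per-job state the callback needs.
struct JobInfo {
    std::string name;
    int lastPercent = -1;
    int callCount = 0;
};

void RunJob(JobInfo& job) {
    ConverterStandIn converter;
    // The lambda captures the job by reference, so no CustomData slot is needed.
    converter.SetProgressCallback([&job](int percent) {
        job.lastPercent = percent;
        ++job.callCount;
    });
    converter.Convert();
}
```

Capturing by reference is only safe while the captured object outlives the converter, as it does here because both live inside RunJob's caller and body respectively.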

 

Solid Framework 9.2.8681

The latest public release of Solid Framework SDK is now available for download from the developer portal at www.solidframework.net. This is version 9.2.8681.

 

 

Routine testing against our large PDF collection found that a very small number of files could no longer be converted. This affected approximately 1 PDF in 100,000.

This release resolves the issue.

 

There are no other changes from version 9.2.8680.

Solid Framework 9.2.8680

The latest public release of Solid Framework SDK is now available for download from the developer portal at www.solidframework.net. This is version 9.2.8680.

 

MAIN FEATURES

 

Improved Handling of Z-Order when Reconstructing PowerPoint

PowerPoint and Word both support multiple layers of content, with those on top potentially obscuring those beneath. In some situations the order of these layers was incorrect, resulting in text or other items on the page being “lost”. This has been resolved for a number of samples that exhibited the problem.

 

Further improvements in OCR

Font types and styles are now applied more consistently in OCR output, producing a more aesthetically pleasing document.

 

Improved tolerance of corrupt PDFs

A number of PDF files that could not previously be converted due to errors can now be corrected and a valid document reconstructed from them.

 

 

Ongoing improvements in self-documentation

A number of parameters that used obscure numbers now use Enums instead. This is aimed at improving long term maintainability of code but may cause compile time errors.

This affects PagesModel and PdfDocument classes only.
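
The general pattern behind this change is shown below with a made-up example. The Rotation enum and RotationDegrees function are hypothetical and do not correspond to any particular parameter in PagesModel or PdfDocument.

```cpp
// Hypothetical enum; the SDK's actual enum names and values are not shown here.
enum class Rotation { None = 0, Quarter = 90, Half = 180, ThreeQuarter = 270 };

// Before: a call such as RotatePage(3, 2) gave no hint of what 2 meant.
// After: the parameter type documents the permitted values, and a call
// site that still passes a bare integer no longer compiles.
int RotationDegrees(Rotation rotation) {
    return static_cast<int>(rotation);
}
```

Because a scoped enum does not convert implicitly from int, existing calls that pass numeric literals are the ones that will produce the compile time errors mentioned above.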

 

SOLID FRAMEWORK 9.2.8564.1 RELEASED

The latest public release of Solid Framework SDK is now available for download from the developer portal at www.solidframework.net. This is version 9.2.8564.

 

MAIN FEATURES

TextRecoveryLanguage

TextRecoveryLanguage is used to specify the language of the document that needs to have text recovered by OCR when creating a CoreModel.

Previously this was always set to “english”. It now defaults to “automatic”. For non-English documents this option will now allow the same result to be created whether conversion is performed using the “Converter” classes or the CoreModel.

 

Improved Merging of Logical Tables

If a table is split over multiple pages within the PDF then an attempt is made to stitch these back together into a single table.

We have resolved issues that prevented some tables from merging correctly.

 

Improved handling of Chinese Language files

Several issues associated with reconstruction of Chinese language files have been resolved.

 

Renaming of properties to improve self-documentation

KeepNonTableContent has been created as an alias for TablesFromContent. The option specifies how non-table text and images within a PDF should be handled when reconstructing Excel documents.

The name TablesFromContent does not clearly identify what the option does.

It has therefore been deprecated and users are advised to use “KeepNonTableContent” instead.

Solid Framework 9.2.8492.1 Released

The latest public release of Solid Framework SDK is now available for download from the developer portal at www.solidframework.net. This is version 9.2.8492.

 

Main features

 

Handling of column-based non-table text when converting to Excel

We have created a new option, “PreserveColumnsInNonTableContent”.

If this is true (the default) then non-table content will be retained in columns when exporting to Excel. This will make the reconstructed spreadsheet look similar to the PDF. If it is false then non-table content will be placed in the first column, with one paragraph per row.

The default value is different from previous versions of Solid Framework. This will result in a different, but more faithful, reconstruction.

This option is only meaningful if TablesFromContent is true.

Note: in 8472 we implemented this mechanism by making TablesFromContent an enum. However, this could cause compile time issues for our customers, and therefore we have reverted to the previous mechanism.
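
The difference between the two settings of PreserveColumnsInNonTableContent can be pictured with a small model. This is an illustration only, not the SDK's layout code: it assumes each paragraph already carries a detected line and column index, and the names Paragraph, Sheet and PlaceNonTableContent are hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model of a non-table paragraph with its detected position.
struct Paragraph {
    std::size_t line;    // vertical position, as a row index
    std::size_t column;  // detected text column, 0 = leftmost
    std::string text;
};

using Sheet = std::vector<std::vector<std::string>>;

// preserveColumns == true : each paragraph keeps its line and column.
// preserveColumns == false: everything goes into the first column,
//                           one paragraph per row, in input order.
Sheet PlaceNonTableContent(const std::vector<Paragraph>& paragraphs, bool preserveColumns) {
    Sheet sheet;
    for (std::size_t i = 0; i < paragraphs.size(); ++i) {
        const Paragraph& p = paragraphs[i];
        const std::size_t row = preserveColumns ? p.line : i;
        const std::size_t col = preserveColumns ? p.column : 0;
        if (sheet.size() <= row) sheet.resize(row + 1);
        if (sheet[row].size() <= col) sheet[row].resize(col + 1);
        sheet[row][col] = p.text;
    }
    return sheet;
}
```

Two paragraphs side by side on one line end up in one row and two columns with the option on, and in two rows of the first column with it off.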

 

“FireSaveProgress” and other PagesModel events are now available in C++

SaveProgress and other, similar, events have been available within the .NET version of Solid Framework for a long time.

We have now made the same functionality available within the native C++ version of Solid Framework.

 

 

Solid Framework 9.2.8472.1 Released

The latest public release of Solid Framework SDK is now available for download from the developer portal at www.solidframework.net. This is version 9.2.8472.

 

MAIN FEATURES

Avoidance of Unnecessary Columns when converting to Excel

Additional columns were created when converting some PDFs to Excel. This made editing difficult. Solid Framework 9.2.8472 no longer creates such columns.

Large Images are now Exportable to Excel

If a PDF contains an image larger than the maximum row height in Excel (409 pt) then previously it was discarded. In this release additional rows are added to allow the image to be shown.
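
The arithmetic behind the extra rows is a simple ceiling division. The sketch below uses the 409 pt limit quoted above and assumes each spanned row is given the full maximum height; how the SDK actually distributes the height across rows is not stated here. RowsNeededForImage is a hypothetical helper.

```cpp
#include <cassert>
#include <cmath>

const double kMaxExcelRowHeightPt = 409.0;  // limit quoted in the text above

// Number of maximum-height rows an image of the given height must span.
int RowsNeededForImage(double imageHeightPt) {
    assert(imageHeightPt > 0.0);
    return static_cast<int>(std::ceil(imageHeightPt / kMaxExcelRowHeightPt));
}
```

For example, a 1000 pt image needs three rows, since two rows provide only 818 pt.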

FURTHER OCR IMPROVEMENTS

SolidOCR continues to improve, and this release contains further refinements. In particular, in this release we have made improvements with regard to very small text.

OPTION TO IGNORE TAGS WHEN RECONSTRUCTING A PDF DOCUMENT

Solid Framework has supported tags within PDF documents for many years. Tags are used to guide the reconstruction process, particularly with regard to identifying tables. This can result in visually similar PDFs being reconstructed differently depending on whether or not they are tagged.

We have now added the option DetectTaggedTables (default = true). If the option is set to false then the tags will be ignored when reconstructing the document.

 

NEW OPTION WHEN EXTRACTING NON-TABLE CONTENT TO EXCEL

When creating an Excel spreadsheet from a PDF, we have always offered a choice as to how text that is not part of a table should be handled. In the past the only options have been to either remove it, or to place it into the first column of the spreadsheet, with one sentence per row.

We have now added the option of “KeepColumns” which will respect the horizontal location of text. This option allows the spreadsheet to look more like the original PDF.

For more information see the blog note.

This improvement has required ExcelTablesFromContent to be changed from a Boolean to an enum. This may cause compile-time errors in existing code. Please contact us if you require support.

 

Export to .Doc now creates an RTF file

We recommend choosing the “.docx” file format when converting to Word, as this has been the default format for more than ten years.

We have, however, also supported conversion to “.rtf” and to “.doc”.

While we will continue to support “.rtf”, from this release conversion to “.doc” will actually create an RTF file that simply carries the “.doc” file extension. Such files will still open seamlessly in Word.

Solid Framework 9.2.8284.1 Released

The latest public release of Solid Framework SDK is now available for download from the developer portal at www.solidframework.net. This is version 9.2.8284.

 

MAIN FEATURES

IMPROVED HANDLING OF SHADING

The rendering and conversion of Type 3 and Type 7 shading has been significantly improved.

RED NUMBERS ARE NO LONGER CONSIDERED TO ALWAYS BE NEGATIVE WHEN CONVERTING TO EXCEL

Previously red numeric text was always considered to represent negative numbers. It is now considered to be negative only if it is preceded by a “-” sign.
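
The change of rule can be stated as two small functions. These are illustrative only and are not the SDK's parsing code: ParseCellOldRule and ParseCellNewRule are hypothetical names, and the text is assumed to be a plain decimal number.

```cpp
#include <cmath>
#include <cstdlib>
#include <string>

// Old rule: red text is always treated as negative, whatever its sign.
double ParseCellOldRule(const std::string& text, bool isRed) {
    const double value = std::strtod(text.c_str(), nullptr);
    return isRed ? -std::fabs(value) : value;
}

// New rule: colour is ignored; only a leading "-" makes the value negative.
double ParseCellNewRule(const std::string& text, bool isRed) {
    (void)isRed;  // kept in the signature to show that colour no longer matters
    return std::strtod(text.c_str(), nullptr);
}
```

Under the new rule a red "125.50" stays positive, while a red "-125.50" is still negative.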

FURTHER OCR IMPROVEMENTS

SolidOCR continues to improve, and this release contains further refinements. In this release we have improved the recognition of bold text within documents and made specific improvements to the 32-bit version of the Framework.

SUPPORT FOR REMOVING TAGS FROM A PDF DOCUMENT

Solid Framework has supported tags within PDF documents for many years. Tags are used to guide the reconstruction process, which can result in visually similar PDFs being reconstructed differently depending on whether or not they are tagged.

We have now added functionality to allow tags to be removed if required.

Solid Framework 9.2.8186.1 Released

The latest public release of Solid Framework SDK is now available for download from the developer portal at www.solidframework.net. This is version 9.2.8186.

Main Features

Better handling of PDF -> PDF/A Conversion

The content of some form fields and some badly encoded images were being lost when converting to PDF/A. This data is now retained.

 

Improved handling of annotations when converting to PowerPoint

Annotations in some PDF files were being reconstructed in the wrong location when converting from PDF to PowerPoint. This is now fixed.

 

Further OCR improvements

SolidOCR continues to improve, and this release contains further refinements.

 

Layout Information available for text within the source document

Solid Framework has been able to provide layout information for text within the reconstructed document for some time. We have now also provided access to the location of text in the original document.

The two location values may differ slightly if the original document contains fonts or scaling, etc. that are not supported by the reconstructed document format.

Solid Framework 9.2.8150.1 Released

The latest public release of Solid Framework SDK is now available for download from the developer portal at www.solidframework.net. This is version 9.2.8150.

Main Features

Faster Detection of Language within OCR’d files

Solid Framework can detect the language of a scanned document which helps to improve OCR accuracy. For files that contained ambiguous language this could be slow.

We have modified the mechanism to be significantly faster with these files.

 

Modified mechanism for specifying the location of Tesseract “traineddata” files

Solid Framework performs OCR for Chinese, Japanese, Korean and Greek language documents using Tesseract. For this to work it is necessary to specify the location of the folder that contains the “traineddata” files for the specific language.

The mechanism for doing this has been modified with the creation of a read/write property, TesseractDataDirectoryLocation. This replaces the method SetTesseractDataDirectory.

For further information see Performing OCR using Tesseract.
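
Before pointing TesseractDataDirectoryLocation at a folder, it can be useful to confirm that the expected file is actually there. The helper below is a sketch that does not call the SDK: HasTrainedData is a hypothetical name, and it assumes Tesseract's usual file naming of one "<code>.traineddata" file per language (for example chi_sim, jpn, kor or ell).

```cpp
#include <fstream>
#include <string>

// True when <directory>/<language>.traineddata can be opened for reading.
bool HasTrainedData(const std::string& directory, const std::string& language) {
    std::ifstream file(directory + "/" + language + ".traineddata", std::ios::binary);
    return file.good();
}
```

A check like this turns a missing or misnamed data file into an early, explicit failure instead of an OCR problem later in the conversion.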

 

Further OCR improvements

SolidOCR continues to improve, and this release contains further refinements.