Solid Framework is great for extracting tables from a PDF and putting them into an Excel Spreadsheet. Options exist to allow you to specify whether to create a separate sheet for each table, or whether to have all of the tables put onto the same sheet.
If a table is split over several pages in the PDF then very often Solid Framework will stitch the various parts of the table back together, rather than creating separate tables.
But what should be done with text and images in a PDF that are not part of a table?
In the past, Solid Framework has offered two choices: either put the content into the first column of the Excel file (TablesFromContent = true), or else remove it entirely (TablesFromContent = false).
In version 8472 we changed ExcelTablesFromContent from a Boolean to an enum to support a third option. This was reverted in 8492 to avoid compile-time errors.
Users who are on 8472 are advised to download the latest version of Solid Framework.
To support this functionality we have added a new Boolean option, PreserveColumnsInNonTableContent.
The choices now are:
PreserveColumnsInNonTableContent = false is the equivalent of ExcelTablesFromContent = true in previous versions of Solid Framework.
PreserveColumnsInNonTableContent = true allows text that is in columns in the PDF to be placed in columns in the reconstructed Excel file. This is now the default.
How this compares in practice
These examples are based on the following text:
PreserveColumnsInNonTableContent = false
Each paragraph is placed into a single cell within the reconstructed spreadsheet. The text for each paragraph is always placed in the first column.
PreserveColumnsInNonTableContent = true
Text is retained in columns. A number of sentences may be included in the same cell and row.
This mode may be particularly useful if table data is not correctly detected, since the spreadsheet will still look similar to the original file.
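The difference between the two settings can be illustrated with a small model. The sketch below is not Solid Framework code and makes no claim about how the library detects columns. The TextBlock type, the fixed column width and the row tolerance are all assumptions made for illustration; the only point is to show how the cell positions differ between the two modes.

```python
from dataclasses import dataclass


@dataclass
class TextBlock:
    """Hypothetical text block; positions are in points from the top-left of the page."""
    text: str
    top: float
    left: float


def place_blocks(blocks, preserve_columns, column_width=200.0, row_tolerance=2.0):
    """Return {(row, column): text} for an illustrative spreadsheet layout."""
    ordered = sorted(blocks, key=lambda b: (b.top, b.left))
    cells = {}

    if not preserve_columns:
        # Equivalent of "false": one block per row, always in the first column.
        for row, block in enumerate(ordered):
            cells[(row, 0)] = block.text
        return cells

    # Equivalent of "true": blocks at (nearly) the same height share a row,
    # and the column index follows the horizontal position of the block.
    row = -1
    row_top = None
    for block in ordered:
        if row_top is None or abs(block.top - row_top) > row_tolerance:
            row += 1
            row_top = block.top
        key = (row, int(block.left // column_width))
        # Several pieces of text may end up in the same cell.
        cells[key] = f"{cells[key]} {block.text}" if key in cells else block.text
    return cells
```

With two blocks side by side and a third below them, the first mode yields three rows in column 0, while the second keeps the side-by-side blocks on one row in columns 0 and 1.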
Posted as “Handling non-table Content when Exporting to Excel - New Options Added” by Solid Documents, 2018-03-13 02:27:05 (last modified 2018-03-14 00:54:11).
The latest public release of Solid Framework SDK is now available for download from the developer portal at www.solidframework.net. This is version 9.2.8472.
MAIN FEATURES
Avoidance of Unnecessary Columns when converting to Excel
Additional columns were created when converting some PDFs to Excel. This made editing difficult. Solid Framework 9.2.8472 no longer creates such columns.
Large Images are now Exportable to Excel
If a PDF contains an image taller than the maximum row height in Excel (409 pt), it was previously discarded. In this release additional rows are added to allow the image to be shown.
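The arithmetic behind spreading a tall image over several rows can be sketched as follows. This is only an illustration of the 409 pt limit quoted above, not the library's actual method; the function name is hypothetical and the image height is assumed to be known in points.

```python
EXCEL_MAX_ROW_HEIGHT_PT = 409.0  # maximum row height quoted above


def row_heights_for_image(image_height_pt):
    """Split an image height into row heights that each fit within the limit."""
    if image_height_pt <= 0:
        raise ValueError("image height must be positive")
    full_rows = int(image_height_pt // EXCEL_MAX_ROW_HEIGHT_PT)
    remainder = image_height_pt - full_rows * EXCEL_MAX_ROW_HEIGHT_PT
    heights = [EXCEL_MAX_ROW_HEIGHT_PT] * full_rows
    if remainder > 0:
        # The last row only needs to cover what is left of the image.
        heights.append(remainder)
    return heights
```

For example, an image 1000 pt tall would need two full-height rows plus one row of 182 pt.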
FURTHER OCR IMPROVEMENTS
SolidOCR continues to improve, and this release contains further refinements. In particular, in this release we have made improvements with regard to very small text.
OPTION TO IGNORE TAGS WHEN RECONSTRUCTING A PDF DOCUMENT
Solid Framework has supported tags within PDF documents for many years. Tags are used to guide the reconstruction process, particularly with regard to identifying tables. This can result in visually similar PDFs being reconstructed differently depending on whether or not they are tagged.
We have now added the option DetectTaggedTables (default = true). If the option is set to false then the tags will be ignored when reconstructing the document.
NEW OPTION WHEN EXTRACTING NON-TABLE CONTENT TO EXCEL
When creating an Excel spreadsheet from a PDF, we have always offered a choice as to how text that is not part of a table should be handled. In the past the only options have been to either remove it, or to place it into the first column of the spreadsheet, with one sentence per row.
We have now added the “KeepColumns” option, which respects the horizontal location of text. This option allows the spreadsheet to look more like the original PDF. For more information see the blog note.
This improvement has required ExcelTablesFromContent to be changed from Boolean to an enum. This may cause compile-time errors in existing code. Please contact us if you require support.
Export to .Doc now creates an RTF file
We recommend choosing the “.docx” file format when converting to Word, as this has been the default format for more than ten years.
We have, however, also supported conversion to “.rtf” and to “.doc”.
We will continue to support “.rtf”. From this release, however, conversion to “.doc” will actually create an RTF file that merely carries the “.doc” file extension. Such files will still open seamlessly in Word.
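Code that inspects converted files may therefore want to check the content rather than trust the extension. The sketch below relies only on the well-known file signatures: RTF documents begin with `{\rtf`, while legacy binary “.doc” files are OLE2 compound files that begin with the bytes D0 CF 11 E0 A1 B1 1A E1. The function names are hypothetical and nothing here is part of Solid Framework.

```python
RTF_SIGNATURE = b"{\\rtf"
OLE2_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")


def sniff_word_format(header: bytes) -> str:
    """Classify the first bytes of a file as 'rtf', 'doc' (OLE2 binary) or 'unknown'."""
    if header.startswith(RTF_SIGNATURE):
        return "rtf"
    if header.startswith(OLE2_SIGNATURE):
        return "doc"
    return "unknown"


def sniff_word_file(path: str) -> str:
    """Read just enough of the file at `path` to classify it."""
    with open(path, "rb") as handle:
        return sniff_word_format(handle.read(8))
```

A “.doc” file produced as described above would be expected to report "rtf".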
Posted as “Solid Framework 9.2.8472.1 Released” by Solid Documents, 2018-03-04 20:35:05 (last modified 2021-01-17 21:40:07).
The latest public release of Solid Framework SDK is now available for download from the developer portal at www.solidframework.net. This is version 9.2.8284.
MAIN FEATURES
IMPROVED HANDLING OF SHADING
The rendering and conversion of Type 3 and Type 7 shading have been significantly improved.
RED NUMBERS ARE NO LONGER CONSIDERED TO ALWAYS BE NEGATIVE WHEN CONVERTING TO EXCEL
Previously red numeric text was always considered to represent negative numbers. It is now considered to be negative only if it is preceded by a “-” sign.
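The change in rule can be shown with a small sketch. This is an illustration of the behaviour described above, not Solid Framework code: the function and its parameters are hypothetical, and stripping "," as a thousands separator is an assumption made for the example.

```python
def parse_numeric_cell(text, is_red, red_means_negative=False):
    """Convert numeric text to a float.

    red_means_negative=True models the previous behaviour (red implies negative);
    the default models the new behaviour, where only a leading "-" makes the
    value negative and the text colour is ignored.
    """
    # Assumption for this sketch: "," is a thousands separator.
    value = float(text.strip().replace(",", ""))
    if red_means_negative and is_red and value > 0:
        value = -value
    return value
```

Under the old rule, red text reading "1,250.50" becomes a negative value; under the new rule it stays positive, while "-40" is negative in both cases.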
FURTHER OCR IMPROVEMENTS
SolidOCR continues to improve, and this release contains further refinements. In this release we have improved the recognition of bold text within documents and made specific improvements to the 32-bit version of the Framework.
SUPPORT FOR REMOVING TAGS FROM A PDF DOCUMENT
Solid Framework has supported tags within PDF documents for many years. Tags are used to guide the reconstruction process, which can result in visually similar PDFs being reconstructed differently depending on whether or not they are tagged.
We have now added functionality to allow tags to be removed if required.
Posted as “Solid Framework 9.2.8284.1 Released” by Solid Documents, 2018-01-30 20:04:11 (last modified 2021-01-17 21:55:46).