SOLID FRAMEWORK 10.0.19752

Language detection, page orientation detection, and OCR character recognition have been improved for non-Latin languages. To include these improvements, please update the required files using traineddata.zip from the Solid Framework downloads.

Next release expected on 1 Apr 26

Improvements: 

  • [office] Enabled text detection for PDFs that use non-standard encoding and contain text in the following languages: Bengali, Gujarati, Hindi, Kannada, Malayalam, Manipuri (Meetei Mayek), Oriya, Punjabi, Santali, Tamil, Telugu and Thai.
  • [docx] Implemented a text filtering procedure for scanned pages where text is represented as a hexadecimal string.
  • [docx] Improved underline property detection.
  • [office] Implemented use of the font family from a PDF file to select fonts installed on the operating system.
  • [office] Improved the cell merge algorithm used in table detection.
  • [docx] Improved detection of text when a document is converted with the trial watermark.

Bugfixes: 

  • [docx] Fixed a structure detection issue preventing successful conversion of a document.
  • [docx] Fixed a bug where auto rotation prevented accurate text detection.
  • [pptx] Fixed a bug preventing the successful conversion of a document on a specific platform.
  • [docx] Fixed a bug causing the overlapping of text and shapes.
  • [docx] Fixed a bug where a signature component interfered with detection of adjacent text.

SOLID FRAMEWORK 10.0.19632

Language detection, page orientation detection, and OCR character recognition have been improved for non-Latin languages. To include these improvements, please update the required files using traineddata.zip from the Solid Framework downloads.

Next release expected on 18 Feb 26

Improvements: 

  • [pptx] Added an option to control the un-tiling of PowerPoint presentation handouts, resulting in one slide per handout page.
  • [docx] Improved detection of multiple separate tables on one page.
  • [docx] Improved paragraph detection when line spacing is larger than default line spacing.
  • [docx] Improved detection of hyperlinks.
  • Improved detection of paragraph styles.
  • [docx] Improved the detection of annotations as part of table and layout detection.

Bugfixes: 

  • [xlsx] Fixed a bug that caused certain text in a document to be detected outside of the table.
  • [pptx] Fixed a bug causing the last image of a slide to be omitted.

SOLID FRAMEWORK 10.0.19506

Language detection, page orientation detection, and OCR character recognition have been improved for non-Latin languages. To include these improvements, please update the required files using traineddata.zip from the Solid Framework downloads.

Next release expected on 7 Jan 26

Improvements: 

  • [docx] Expanded support for the detection of Type3 fonts.
  • [docx/pptx] Improved the detection of hyperlinks containing special characters, multiple lines and E-mail addresses.
  • [docx] Added support for retaining special characters in URLs.
  • [pptx] Improved detection of multi-line hyperlinks.
  • [docx] Improved the detection of hyperlink text ending with ‘/*’.
  • [docx] Improved detection of text to display for E-mail address links.
  • [docx] Improved the conversion time of documents containing a large number of ortholine graphics.
  • [office] Improved the detection of Korean text.
  • [xlsx] Improved number detection to exclude IP addresses.

Bugfixes: 

  • [docx] Fixed an issue causing incorrect text style detection.
  • [docx] Fixed an issue preventing list detection in a certain document.
  • [docx] Fixed an issue causing a paragraph break to be detected in a multi-line list item.
  • [pptx] Fixed an issue preventing detection of an image in a document.
  • [pptx] Fixed an issue resulting in the trial watermark interfering with text detection.
  • [docx] Fixed an issue causing rows of a nested table to become merged in a document.
  • [docx] Fixed an issue reading the Xref stream that prevented successful conversion of a document.