SOLID FRAMEWORK 10.0.18950

Solid Framework SDK has been updated.

Note:

Language detection, page orientation detection and OCR have been improved for non-Latin languages. To include these improvements, please update the required files using traineddata.zip from the Solid Framework downloads.
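
As a rough illustration of the update step, the Python sketch below unpacks traineddata.zip into a directory of trained data files. The function name, both paths and the ".traineddata" file suffix are assumptions made for this example, not values defined by Solid Framework; check where your installation keeps these files before adapting it.

```python
import zipfile
from pathlib import Path


def update_trained_data(archive: Path, target_dir: Path) -> list[str]:
    """Copy every *.traineddata file from `archive` into `target_dir`.

    Both arguments are placeholders for this sketch; they are not
    locations defined by Solid Framework.
    """
    target_dir.mkdir(parents=True, exist_ok=True)
    updated = []
    with zipfile.ZipFile(archive) as zf:
        for info in zf.infolist():
            name = Path(info.filename).name
            # Skip directories and anything that is not a trained data file
            # (the suffix is an assumption based on Tesseract conventions).
            if info.is_dir() or not name.endswith(".traineddata"):
                continue
            # Write by base name so nested folders in the zip are flattened;
            # an existing file with the same name is overwritten.
            (target_dir / name).write_bytes(zf.read(info))
            updated.append(name)
    return sorted(updated)
```

The function returns the names of the files it wrote, which makes it easy to log what was replaced. Back up the existing files first if you may need to roll back.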

Improvements: 

  • [docx]  Extended the content detection algorithm to use pdf tags to bias output.
  • [office]  Improved detection of bound orientation for specific languages and the preferred orientation of that language.
  • [office]  Improved content detection of Korean language documents.
  • [docx]  Improved the detection of textboxes with similar layout to a two-cell table.
  • [docx]  Improved the detection of table header content.
  • [docx]  Improved table detection when cells in multiple columns contain only one hyphen.
  • [docx]  Improved the detection of line spacing.
  • [docx]  Improved detection of textboxes between column breaks.
  • [docx]  Improved the detection of grouped objects with similar layout to a table.
  • [docx]  Improved the detection of page headers.
  • [json]  Implemented strikethrough property in json export.
  • [json]  Improved the detection of text bounds.
  • [docx]  Improved the detection of tables within columns of text.
  • [docx]  Improved the detection of header content after page orientation change within a document.
  • [pptx]  Improved detection of character spacing.
  • [office]  Improved OCR analysis of non-Latin glyphs.

Bugfixes: 

  • [docx]  Fixed a bug preventing the detection of some tables in a certain document.
  • [docx]  Fixed a bug causing four characters on a variant color background to be omitted from conversion output.
  • Fixed a bug causing additional text to be included in the target of a URL link.
  • [pptx]  Fixed a bug causing a word to be mislocated on a slide.
  • [xlsx]  Fixed a bug causing the cells of one row to become merged in a certain document.

SOLID FRAMEWORK 10.0.18816

Solid Framework SDK has been updated.

Improvements: 

  • [docx] Improved the detection of text when the left margin of the scanned document contains noise. 
  • [docx] Improved list detection in pdfs using image bullets. 
  • [office] Improved the detection of graphic tables. 
  • [xlsx] Improved column detection of partially bordered tables.  
  • [docx] Implemented internal bookmarks to specific pages in the Word document to match the pdf links.
  • [json] Improved detection of small caps.  
  • [docx] Improved the order detection of overlapping shapes and images.  
  • [docx] Improved detection of column breaks.  
  • [docx] Improved detection of vertical Japanese text. 
  • [docx] Improved detection of borderless tables.  
  • [json] Improved detection of ‘Table of Contents’ bounds.  
  • [json] Improved handling of arbitrary text rotation in json export.

Bugfixes: 

  • [json] Fixed an issue causing partial detection of a line of text. 
  • [docx] Fixed an issue causing incorrect merging of separate tables on a page. 
  • [docx] Fixed an issue causing rows of certain table content to become merged.

SOLID FRAMEWORK 10.0.18708

Solid Framework SDK has been updated.

Improvements: 

  • [docx]     Improved column and row detection of hybrid split tables.
  • [office]     Implemented the recognition of non-standard encoded vertical Japanese characters.
  • [office]     Improved the precision of non-standard encoded Arabic character coordinates.
  • [docx]     Improved detection of single column non-table content.
  • [docx]     Improved table detection.
  • [office]     Improved the rendering of Type 3 font glyphs.
  • [docx]     Improved the optical character recognition of large images on 32-bit platforms.
  • [docx]     Implemented list recognition in pdfs using image bullets.
  • [json]     Implemented the option to ignore the detection of tiled pages in json export.
  • [json]     Improved json export of pages that exceed Microsoft size
  • [json]     Added support for placing nested tables inside the corresponding cell contents.
  • [office]     Applied custom language string options for Chinese text recovery.
  • [office]     Improved initialization mode for use of Thai trained data language file.
  • [docx]     Improved the detection of list items to prevent inclusion of undesirable footnote content.
  • [office]     Implemented automatic rotation detection of Japanese and Korean documents using optical character recognition.

Bugfixes: 

  • [docx]     Fixed a bug causing the misdetection of multiple glyph shapes representing a single letter “e” in a document.
  • [docx]     Fixed a bug preventing detection of a borderless table when the table contained extended spaces between rows.
  • [docx]     Fixed a bug causing the detection of unnecessary column breaks in a document with right-to-left aligned Arabic text.
  • [docx]     Fixed a relative height calculation issue preventing a very large document from opening in Microsoft Word.
  • [docx]     Fixed a bug preventing the detection of columns when ignoring the tagged table structure of a pdf.

SOLID FRAMEWORK 10.0.18610

Solid Framework SDK has been updated.

Improvements: 

  • [docx]     Improved optical character recognition of black text on a gray background.
  • [json]      Added support for the link property in JSON conversion output.
  • [json]      Improved the detection of annotation identification.
  • [docx]     Improved the forced detection of non-standard encoding of Arabic, Chinese, Japanese and Korean characters of multiple page documents.
  • [docx]     Improved the optical character recognition preprocessing of vector text.
  • [docx]     Improved detection of paragraph styles.
  • [docx/xlsx]     Improved the detection of borderless tables.
  • [json]     Improved hyperlink detection.
  • [json]     Improved reliability of paragraph coordinates of rotated textboxes.
  • [json]     Improved detection of the page rotation angle when autorotate is manually disabled.

Bugfixes: 

  • [docx]     Fixed a bug causing an extra line of body text to be detected in the header.
  • [json]      Improved page orientation detection for pages containing inconsistently oriented text.
  • [docx]     Fixed a bug resulting in the partial loss of specific bounding boxes of a document.

SOLID FRAMEWORK 10.0.18460

Solid Framework SDK has been updated.

Improvements: 

  • [docx]    Improved the detection of small images comprised of thousands of objects in the original pdf container.
  • [docx]    Improved the order detection of overlapping shapes and images.
  • [docx]    Improved the detection of Japanese characters.
  • [docx]    Improved the stability of font color detection in text boxes with varying fill colors.
  • [docx]    Improved the optical character recognition preprocessing of vector text.
  • [docx]    Improved the stability of page orientation when vertical Japanese text is detected.
  • [docx]    Improved the detection of bullet and list items in Korean language documents.
  • [docx]    Improved the detection of Latin characters in Korean language documents.
  • [docx]    Improved the detection of Japanese language.
  • [docx]    Implemented post processing of images to improve optical character recognition.
  • [docx]    Improved detection of full page black image overlaid with images of white text.

Bugfixes: 

  • [xlsx]     Fixed an issue causing various rows of a large table to become combined.
  • [docx]    Fixed an issue causing a textbox to convert with an incorrect fill color.
  • [docx]    Fixed an issue preventing the detection of a hyperlink.
  • [docx]    Fixed an issue resulting in text being converted as an image.
  • [docx]    Fixed an issue that caused the conversion time of a specific document to be extended.

Security 

A limited number of third-party libraries have been updated to include the latest security fixes.

SOLID FRAMEWORK 10.0.18370

Improvements: 

  • [docx] Improved the detection of standard office bar charts and variants. 
  • [docx] Improved the detection of Chinese language.  
  • [docx] Improved the optical character recognition preprocessing of vector text.  
  • [docx] Improved the column detection of left-to-right aligned text.
  • [docx] Improved the stability of graphic color detection.  
  • [docx] Improved the detection of header content.  
  • [docx] Improved detection of white text located on a dark background.   
  • [docx] Improved handling of text where the text and background colour match.  
  • [docx] Improved table detection.  
  • [docx] Improved the detection of diagrams.  
  • [docx] Improved detection of black text located on a grey background.    
  • [office] Improved language and page orientation detection. 

Bugfixes: 

  • [docx] Fixed an issue causing Latin characters in a Chinese document to be misplaced. 
  • [docx] Fixed an issue where a large graphic element caused text recovery failure. 
  • [pdf] Fixed an issue preventing the marked property from being retained. 
  • [docx] Fixed an issue preventing the detection of the correct bounds of a graphic element. 
  • [docx] Fixed a performance issue where dense vector graphics prevented successful optical character recognition of a file. 
  • [docx] Fixed an issue causing a conversion delay for a complex one-page document.
  • [docx] Fixed a bug preventing the rendering of the first page of a detected Table of Contents. 

SOLID FRAMEWORK 10.0.18270

Solid Framework SDK has been updated.

Improvements: 

  • [pdf] Introduced an option to save pdf page orientation as tagged data instead of auto-rotating.
  • [office] Improved the algorithm for averaging text properties of a paragraph that contains Unicode groups describing a single Arabic glyph.
  • [docx] Improved list detection. 
  • [docx] Improved the text line assembly of Arabic content with diacritics. 
  • [office] Improved detection of small caps text. 
  • [office] Improved detection of Arabic language when minimal English text is near the Arabic language. 
  • [docx] Improved detection of header content. 
  • [docx] Improved z-order placement of graphics in conversion output.
  • [docx] Improved detection of narrow columns on borderless tables. 
  • [docx] Improved conversion result of self-intersecting glyph outlines.
  • [pdf] Improved tag support of various layout options. 
  • [office] Improved page margin calculation to be multiples of 1/4 inch for the imperial measurement system and 1/4 centimetre for the metric measurement system. 
  • [docx] Improved the rendering of Type 3 fonts. 
  • [docx] Improved use of tab stops to space content on a single line.  
  • [docx] Improved conversion result when encoding of original pdf contains large, broken text areas.  
  • [json] Added support for detection of table headers.
  • [json] Added support for the rectangle span element.
  • [json] Improved detection of even-odd page header bounds. 
  • [json] Improved detection of table headers. 
  • [json] Added support for XObject ID for annotation graphic groups and textboxes.
  • [json] Improved detection of span bounds for line with small caps.
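
The page margin item above (multiples of 1/4 inch or 1/4 centimetre) amounts to snapping a measured margin to a grid. The Python sketch below illustrates the arithmetic only; the function name is hypothetical, and rounding to the nearest multiple is an assumption, since these notes do not say whether the SDK rounds to nearest, up or down.

```python
def snap_margin(value: float, step: float = 0.25) -> float:
    """Snap `value` to a multiple of `step`, in the same unit as `value`.

    Rounding to the nearest multiple is an assumption made for this
    sketch; the SDK's actual rounding rule is not documented here.
    """
    return round(value / step) * step


# A margin measured as 1.13 in snaps to 1.25 in.
print(snap_margin(1.13))
# A margin measured as 2.6 cm snaps to 2.5 cm.
print(snap_margin(2.6))
```

The same function serves both measurement systems because the step is 1/4 of the unit in each case; only the unit of the input changes.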

Bugfixes: 

  • [pdf] Fixed a compression algorithm issue that caused the corruption of data during conversion of a specific file. 
  • [docx] Fixed an issue causing the background of an image to become transparent.  

Security 

  • A limited number of third-party libraries have been updated to include the latest security fixes.

SOLID FRAMEWORK 10.0.18108

Solid Framework SDK has been updated.

Improvements: 

  • [Office] Improved algorithms required to lay out right-to-left body paragraph text.
  • [Office] Improvements to right-to-left text character matching and diacritic handling.
  • [Office] Improved Tatweel (Arabic) language detection. 
  • [Office] Optimized detection performance for non-standard encoded characters using Tesseract.  
  • [docx] Improved detection of footnote text. 
  • [docx] Improved header detection. 
  • [docx] Column detection improvements. 
  • [Office] Improved rendering of a specific Type 3 font.

Bugfixes: 

  • [docx] Fixed an issue preventing successful conversion of a file. 

Misc: 

  • All projects are now compiled using C++17 language features

SOLID FRAMEWORK 10.0.18028

Solid Framework SDK has been updated.

Bugfixes: 

  • [docx] Fixed an issue preventing successful conversion of a file. 
  • [docx] Fixed a chunk usage error preventing successful conversion of a file.
  • [docx] Fixed an issue preventing all text boxes located in front of a graphic from being detected.
  • [pptx] Fixed an issue preventing an orange colored graph line from rendering.
  • [office] Fixed an issue causing Arabic characters to be misordered.
  • [docx] Fixed an issue preventing detection of the right table border. 
  • [office] Fixed a glyph width issue causing illegible text output. 
  • [docx] Fixed a bug detecting an extra tab after a bullet point.   

Office Fidelity: 

  • [docx] Improved detection of parentheses in right-to-left aligned text when the PDF characters are incorrectly encoded.
  • [docx] Improved detection of hanging indents for right-to-left aligned text.
  • [office] Improved detection of Latin characters within Arabic documents.
  • [office] Improved detection of parentheses in right-to-left aligned Arabic text.
  • [office] Improved alignment detection of right-to-left aligned Hebrew documents.
  • [office] Improved detection of Arabic text when the PDF characters are incorrectly encoded. 
  • [docx] Improved hybrid table cell detection. 
  • [docx] Improved detection of custom numbered list. 
  • [docx] Improved hybrid table row detection. 
  • [docx] Implemented custom metadata field support.
  • [docx] Improved hybrid table column detection.

SOLID FRAMEWORK 10.0.17926

Solid Framework SDK has been updated.

Feature Update: 

  • Enable support of a licensed installation of IRIS. 

Bugfixes: 

  • [docx] Fixed an issue preventing successful conversion of a file. 
  • [docx] Fixed an issue preventing one image of many from being correctly rendered. 
  • [docx] Fixed an issue preventing successful conversion of a file on Linux operating systems only. 
  • [docx] Fixed an issue preventing the detection of a Table of Contents due to the text order of the file. 
  • [pdf] Fixed an issue preventing PDF/A-2b validation of a document when certain font combinations are installed.

Office Fidelity: 

  • [docx] Improved detection of breaks on scanned documents containing Arabic text. 
  • [office] Streamlined optical character recognition workflow of large documents containing non-standard encoded text. 
  • [office] Allowed page snapshot deletion where annotations exist. 
  • [office] Improved processing of non-standard encoded characters to Unicode.
  • [office] Improved detection of combined characters. 
  • [office] Improved detection of Arabic diacritic characters. 
  • [office] Improved detection of transparent watermarks over scanned pages. 
  • [docx] Improved detection of Table of Contents. 
  • [rtf] Improved detection of characters when converting to RTF. 
  • [docx] Improved detection of shapes when converting to DOCX. 
  • [docx] Improved detection of serial images that contain underlines.  

Security 

  • A limited number of third-party libraries have been updated to include the latest security fixes.