Solid Framework SDK 10.0.12602
SDK for converting and extracting value from PDFs
SolidFramework.Converters.PdfToExcelConverter Class Reference

Extracts tables from PDF pages to a Microsoft Excel spreadsheet.

Inheritance diagram for SolidFramework.Converters.PdfToExcelConverter:
SolidFramework.Converters.PdfToOfficeDocumentConverter SolidFramework.Converters.SolidConverterPdf SolidFramework.Converters.Converter

Public Member Functions

override void Dispose ()
 
 PdfToExcelConverter ()
 Initializes a new instance of the PdfToExcelConverter class.
 
override void Convert ()
 Starts extracting tables from a PDF file to an Excel document.
 
override void AddSourceFile (string fullpath)
 Add a PDF file to be converted to the source collection.
 
override void AddSourceFile (SolidFramework.Plumbing.Document document)
 Add a PDF file to be converted to the source collection from a loaded Document object.
 
override SolidFramework.Converters.Plumbing.ConversionStatus ConvertTo (string fullpath, bool overwrite)
 Converts a single file to the specified path.
 
global::System.IO.Stream GetResultAsStream ()
 Gets the result as a stream.
 
SolidFramework.Converters.Plumbing.ConversionStatus ConvertTo (string fullpath)
 Converts a single file to the specified path.
 
void Cancel ()
 Cancels the conversion.
 
bool IsCanceled ()
 
void AddSourceFiles (SolidFramework.StringsArray paths)
 Add a collection of PDF file paths to be converted to the source collection.
 
void ClearSourceFiles ()
 Clear the current source files collection.
 
virtual void OnProgress (SolidFramework.ProgressEventArgs args)
 
virtual void OnWarning (SolidFramework.WarningEventArgs args)
 
bool Equals (SolidFramework.Converters.Converter other)
 
override bool Equals (object obj)
 
override int GetHashCode ()
 

Static Public Member Functions

static SolidFramework.Converters.Plumbing.ConversionStatus Convert (string sourceFileName, string destinationFileName)
 Convert the specified source file into the specified Excel file.
 
static SolidFramework.Converters.Plumbing.ConversionStatus Convert (string sourceFileName, string destinationFileName, bool overwrite)
 Convert the specified source file into the specified Excel file.
 
static bool operator== (SolidFramework.Converters.Converter obj1, SolidFramework.Converters.Converter obj2)
 
static bool operator!= (SolidFramework.Converters.Converter obj1, SolidFramework.Converters.Converter obj2)
 

Properties

bool AutoDetectSeparators [get, set]
 Gets or sets whether to attempt to automatically detect decimal and thousands separators, based on the content of the PDF being converted. Default is true.
 
bool DetectTiledPages [get, set]
 Gets or sets whether to attempt to automatically reconstruct tables that were so wide that they were tiled across multiple pages. Default is true.
 
SolidFramework.Converters.Plumbing.ExcelDocumentType OutputType [get, set]
 Gets or sets the data document format.
 
bool TablesFromContent [get, set]
 Deprecated alias for KeepNonTableContent.
 
bool KeepNonTableContent [get, set]
 Gets or sets whether to include non-table content such as images or text in the reconstructed Excel document.
 
bool PreserveColumnsInNonTableContent [get, set]
 Gets or sets whether to place non-table content that is detected within columns in the PDF into separate columns in the reconstructed Excel file. If true, such content is placed into separate columns; if false, it is all placed into the first column. This option has no effect if KeepNonTableContent is false, because no non-table content is then included in the reconstructed file.
 
SolidFramework.Converters.Plumbing.ExcelTablesOnSheet SingleTable [get, set]
 Gets or sets whether to join all output information into a single sheet. Default is PlaceEachTableOnOwnSheet.
 
SolidFramework.Converters.Plumbing.DecimalSeparator DecimalSeparator [get, set]
 Gets or sets the decimal separator.
 
SolidFramework.Converters.Plumbing.ThousandsSeparator ThousandsSeparator [get, set]
 Gets or sets the thousands separator.
 
SolidFramework.Converters.Plumbing.HeaderAndFooterMode HeaderAndFooterMode [get, set]
 Gets or sets a HeaderAndFooterMode that specifies the header and footer mode. Default is HeaderAndFooterMode.Remove.
 
SolidFramework.Converters.Plumbing.FootnotesMode FootnotesMode [get, set]
 Gets or sets a FootnotesMode that specifies the footnotes mode. Default is FootnotesMode.Ignore.
 
SolidFramework.Converters.Plumbing.MarkupAnnotConversionType MarkupAnnotConversionType [get, set]
 This property is deprecated. Use the property TextAnnotationsAsContent instead.
 
bool TextAnnotationsAsContent [get, set]
 Gets or sets whether text annotations should be converted into content. This property will be ignored if KeepNonTableContent is false.
 
SolidFramework.Converters.Plumbing.SelectedAreas SelectedAreas [get, set]
 Gets or sets the selected areas. The default is null.
 
string UserProperties [get, set]
 Gets or sets the user properties. The default is an empty string.
 
string TextRecoveryLanguage [get, set]
 Gets or sets the text recovery language. Default is an empty string.
 
bool AutoRotate [get, set]
 Gets or sets whether to automatically rotate pages based on the orientation of the text on the pages. Default is true.
 
bool TextRecoverySuspects [get, set]
 Gets or sets a value indicating whether to highlight low confidence words (any word with SolidFramework.Imaging.OcrWord.Confidence less than 100) in the OCR output in yellow. This property has no effect if OCR has not occurred. Default is false.
 
SolidFramework.Converters.Plumbing.TextRecovery TextRecoveryType [get, set]
 Gets or sets the text recovery type. The default value is automatic.
 
SolidFramework.Converters.Plumbing.TextRecoveryNSE TextRecoveryNseType [get, set]
 Gets or sets the text recovery NSE type. The default value is automatic.
 
SolidFramework.Converters.Plumbing.TextRecoveryEngine TextRecoveryEngine [get, set]
 Gets or sets the text recovery engine. The default value is automatic.
 
SolidFramework.Converters.Plumbing.TextRecoveryEngineNse TextRecoveryEngineNse [get, set]
 Gets or sets the non-standard encoding recovery engine. The default value is automatic.
 
SolidFramework.Converters.Plumbing.TextRecoveryAutomaticGNse TextRecoveryAutomaticGNse [get, set]
 Gets or sets the automatic GNSE options.
 
SolidFramework.Converters.Plumbing.ImageWatermark ImageWatermark [get, set]
 Gets or sets an image as a watermark in the converted file.
 
string Password [get, set]
 Gets or sets the password.
 
SolidFramework.PageRange PageRange [get, set]
 Gets or sets the page range.
 
bool DetectSoftHyphens [get, set]
 Gets or sets a value indicating whether soft-hyphens should be detected. The default is false.
 
bool NoRepairing [get, set]
 If true, no attempt is made to repair a damaged PDF (InternalError) before conversion.
 
bool GraphicsAsImages [get, set]
 Gets or sets whether vector images should be converted to bitmap images. Default is false.
 
bool KeepInvisibleText [get, set]
 Gets or sets whether text that is invisible because it has no stroke or fill (PDF rendering mode 3 is typically used for a searchable layer in scanned pages) should be recovered. Default is false.
 
bool KeepBackgroundColorText [get, set]
 Gets or sets whether text that is invisible because it is the same colour as the background should be recovered. Default is false.
 
System.Object CustomData [get, set]
 Custom data to provide during Progress events.
 
SolidFramework.Converters.Plumbing.IConversionResultsCollection Results [get]
 Gets the results of conversion.
 
string SourceDirectory [get, set]
 Gets or sets the source directory.
 
string OutputDirectory [get, set]
 Gets or sets the directory where converted files will be placed when created using Convert.
 
SolidFramework.Plumbing.OverwriteMode OverwriteMode [get, set]
 Gets or sets whether to overwrite an existing output file.
 

Events

global::System.EventHandler< SolidFramework.ProgressEventArgs > Progress
 Triggered when progress occurs during PDF conversion.
 
global::System.EventHandler< SolidFramework.WarningEventArgs > Warning
 Triggered when a warning is raised during PDF conversion.
 

Detailed Description

Extracts tables from PDF pages to a Microsoft Excel spreadsheet.
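A minimal usage sketch, using only members listed on this page (the constructor, AddSourceFile, KeepNonTableContent, ConvertTo and Dispose). The file paths are hypothetical, and the sketch assumes the SDK has already been licensed by whatever means your installation requires.

```csharp
using System;
using SolidFramework.Converters;

class BasicPdfToExcelExample
{
    static void Main(string[] args)
    {
        // Hypothetical paths; pass real ones on the command line.
        string pdfPath = args.Length > 0 ? args[0] : @"C:\Temp\input.pdf";
        string xlsxPath = args.Length > 1 ? args[1] : @"C:\Temp\output.xlsx";

        // Assumption: a valid SDK license has already been imported.
        var converter = new PdfToExcelConverter();
        try
        {
            converter.AddSourceFile(pdfPath);

            // Keep only detected tables; drop surrounding text and images.
            converter.KeepNonTableContent = false;

            // Second argument: overwrite the destination if it exists.
            var status = converter.ConvertTo(xlsxPath, true);
            Console.WriteLine("Conversion status: " + status);
        }
        finally
        {
            // Dispose() is documented on this class; call it explicitly.
            converter.Dispose();
        }
    }
}
```

The explicit try/finally is used here rather than a `using` statement because this page documents only the Dispose() method itself.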

Member Function Documentation

◆ AddSourceFile() [1/2]

override void SolidFramework.Converters.PdfToExcelConverter.AddSourceFile ( SolidFramework.Plumbing.Document  document)
virtual

Add a PDF file to be converted to the source collection from a loaded Document object.

Parameters
    document    Document object loaded with a PDF file.
Exceptions
    System.ArgumentNullException    Thrown when the Document object is null.

Reimplemented from SolidFramework.Converters.Converter.

◆ AddSourceFile() [2/2]

override void SolidFramework.Converters.PdfToExcelConverter.AddSourceFile ( string  fullpath)
virtual

Add a PDF file to be converted to the source collection.

Parameters
    fullpath    String of the full path to the PDF file.
Exceptions
    System.IO.FileNotFoundException    Thrown when the path is invalid.

Reimplemented from SolidFramework.Converters.Converter.

◆ AddSourceFiles()

void SolidFramework.Converters.Converter.AddSourceFiles ( SolidFramework.StringsArray  paths)
inherited

Add a collection of PDF file paths to be converted to the source collection.

Parameters
    paths    ReadOnlyCollection(String) collection of PDF source file paths.

◆ Convert() [1/3]

override void SolidFramework.Converters.PdfToExcelConverter.Convert ( )
virtual

Starts extracting tables from PDF file to Excel document.

Source and target files must be set before calling Convert().

Reimplemented from SolidFramework.Converters.Converter.
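The requirement that sources and a target be set before calling Convert() might look like the following sketch. It assumes that, for the instance Convert() call, the target is supplied through the OutputDirectory property, as that property's description suggests. All paths are hypothetical.

```csharp
using SolidFramework.Converters;

class BatchConvertExample
{
    static void Main()
    {
        // Assumption: a valid SDK license has already been imported.
        var converter = new PdfToExcelConverter();
        try
        {
            // Sources: hypothetical input files.
            converter.AddSourceFile(@"C:\Temp\q1.pdf");
            converter.AddSourceFile(@"C:\Temp\q2.pdf");

            // Target: converted files are placed in this directory.
            converter.OutputDirectory = @"C:\Temp\out";

            // FailIfExists is the documented default; set explicitly for clarity.
            converter.OverwriteMode = SolidFramework.Plumbing.OverwriteMode.FailIfExists;

            converter.Convert();
        }
        finally
        {
            converter.Dispose();
        }
    }
}
```

Because Convert() returns void, the outcome of each file would be read from the Results property afterwards; its member layout is not described on this page, so it is not shown here.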

◆ Convert() [2/3]

static SolidFramework.Converters.Plumbing.ConversionStatus SolidFramework.Converters.PdfToExcelConverter.Convert ( string  sourceFileName,
string  destinationFileName 
)
static

Convert the specified source file into the specified Excel file.

Parameters
    sourceFileName    String of the full path to the PDF file.
    destinationFileName    String of the full path to the destination Excel file.
Returns
ConversionStatus of the resulting conversion.

◆ Convert() [3/3]

static SolidFramework.Converters.Plumbing.ConversionStatus SolidFramework.Converters.PdfToExcelConverter.Convert ( string  sourceFileName,
string  destinationFileName,
bool  overwrite 
)
static

Convert the specified source file into the specified Excel file.

Parameters
    sourceFileName    String of the full path to the PDF file.
    destinationFileName    String of the full path to the destination Excel file.
    overwrite    Whether to overwrite the destination file if it already exists.
Returns
    ConversionStatus of the resulting conversion.
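For one-off conversions the static overload avoids managing a converter instance. The sketch below uses hypothetical paths and simply prints the returned status, since the individual ConversionStatus values are not listed on this page.

```csharp
using System;
using SolidFramework.Converters;
using SolidFramework.Converters.Plumbing;

class StaticConvertExample
{
    static void Main()
    {
        // Assumption: a valid SDK license has already been imported.
        // Hypothetical source and destination paths.
        ConversionStatus status = PdfToExcelConverter.Convert(
            @"C:\Temp\report.pdf",
            @"C:\Temp\report.xlsx",
            true); // overwrite an existing destination file

        Console.WriteLine("Conversion status: " + status);
    }
}
```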

◆ ConvertTo() [1/2]

SolidFramework.Converters.Plumbing.ConversionStatus SolidFramework.Converters.Converter.ConvertTo ( string  fullpath)
inherited

Converts a single file to the specified path.

Parameters
    fullpath    The full path of the output file.
Returns
    ConversionStatus of the resulting conversion.
◆ ConvertTo() [2/2]

override SolidFramework.Converters.Plumbing.ConversionStatus SolidFramework.Converters.PdfToExcelConverter.ConvertTo ( string  fullpath,
bool  overwrite 
)
virtual

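Converts a single file to the specified path.

Parameters
    fullpath    The full path of the output file.
    overwrite    If set to true, an existing file at that path is overwritten.
Returns
    ConversionStatus of the resulting conversion.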
Reimplemented from SolidFramework.Converters.Converter.

◆ Dispose()

override void SolidFramework.Converters.PdfToExcelConverter.Dispose ( )
virtual

◆ GetResultAsStream()

global::System.IO.Stream SolidFramework.Converters.PdfToOfficeDocumentConverter.GetResultAsStream ( )
inherited

Gets the result as stream.

Returns
The result as stream.

Property Documentation

◆ AutoRotate

bool SolidFramework.Converters.PdfToOfficeDocumentConverter.AutoRotate
get, set, inherited

Gets or sets whether to automatically rotate pages based on the orientation of the text on the pages. Default is true.

If a page in the PDF file is set to landscape, and all text is flowing up and down, automatically rotate the page to portrait.

◆ DecimalSeparator

SolidFramework.Converters.Plumbing.DecimalSeparator SolidFramework.Converters.PdfToExcelConverter.DecimalSeparator
get, set

Gets or sets the decimal separator.

The decimal separator.

◆ GraphicsAsImages

bool SolidFramework.Converters.SolidConverterPdf.GraphicsAsImages
get, set, inherited

Gets or sets whether vector images should be converted to bitmap images. Default is false.

If true then vector images will be converted to bitmap images; otherwise they will be retained as vector images.

◆ ImageWatermark

SolidFramework.Converters.Plumbing.ImageWatermark SolidFramework.Converters.SolidConverterPdf.ImageWatermark
get, set, inherited

Gets or sets an image as a watermark in the converted file.

A watermark is an impression that displays on top of or below the content of a PDF page. Image watermarks can be useful for placing a logo on each page. To set an ImageWatermark, assign the path to the image:

    ImageWatermark.Path = "C:\\MyDocuments\\MyWatermark.jpg";

Alternatively, watermarks can be text:

    TextWatermark watermark = new TextWatermark();
    watermark.Text = "DRAFT";

◆ KeepBackgroundColorText

bool SolidFramework.Converters.SolidConverterPdf.KeepBackgroundColorText
get, set, inherited

Gets or sets whether text that is invisible because it is the same colour as the background should be recovered. Default is false.

If true then text that is the same colour as the background will be recovered, else it will be discarded.

◆ KeepInvisibleText

bool SolidFramework.Converters.SolidConverterPdf.KeepInvisibleText
get, set, inherited

Gets or sets whether text that is invisible because it has no stroke or fill (PDF rendering mode 3 is typically used for a searchable layer in scanned pages) should be recovered. Default is false.

If true then invisible text that has no stroke or fill will be recovered, else it will be discarded.

◆ OutputDirectory

string SolidFramework.Converters.Converter.OutputDirectory
get, set, inherited

Gets or sets the directory where converted files will be placed when created using Convert.

The output directory.

◆ OverwriteMode

SolidFramework.Plumbing.OverwriteMode SolidFramework.Converters.Converter.OverwriteMode
get, set, inherited

Gets or sets whether to overwrite an existing output file.

Set the desired action when the output file already exists. Default is FailIfExists.

◆ PageRange

SolidFramework.PageRange SolidFramework.Converters.SolidConverterPdf.PageRange
get, set, inherited

Gets or sets the page range.

The page range.

◆ Password

string SolidFramework.Converters.SolidConverterPdf.Password
get, set, inherited

Gets or sets the password.

The password.

◆ SelectedAreas

SolidFramework.Converters.Plumbing.SelectedAreas SolidFramework.Converters.PdfToOfficeDocumentConverter.SelectedAreas
get, set, inherited

Gets or sets the selected areas. The default is null.

The selected areas.

◆ SourceDirectory

string SolidFramework.Converters.Converter.SourceDirectory
get, set, inherited

Gets or sets the source directory.

The source directory.

◆ TextAnnotationsAsContent

bool SolidFramework.Converters.PdfToExcelConverter.TextAnnotationsAsContent
get, set

Gets or sets whether text annotations should be converted into content. This property will be ignored if KeepNonTableContent is false.

This property replaces the deprecated MarkupAnnotConversionType.
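Because TextAnnotationsAsContent is ignored when KeepNonTableContent is false, the two properties need to be enabled together. A minimal sketch with hypothetical file paths:

```csharp
using System;
using SolidFramework.Converters;

class AnnotationsAsContentExample
{
    static void Main()
    {
        // Assumption: a valid SDK license has already been imported.
        var converter = new PdfToExcelConverter();
        try
        {
            converter.AddSourceFile(@"C:\Temp\annotated.pdf"); // hypothetical path

            // Non-table content must be kept, otherwise the next setting is ignored.
            converter.KeepNonTableContent = true;
            converter.TextAnnotationsAsContent = true;

            var status = converter.ConvertTo(@"C:\Temp\annotated.xlsx", true);
            Console.WriteLine("Conversion status: " + status);
        }
        finally
        {
            converter.Dispose();
        }
    }
}
```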

◆ TextRecoveryAutomaticGNse

SolidFramework.Converters.Plumbing.TextRecoveryAutomaticGNse SolidFramework.Converters.PdfToOfficeDocumentConverter.TextRecoveryAutomaticGNse
get, set, inherited

Gets or sets the automatic GNSE options.

The automatic GNSE options.

◆ TextRecoveryEngine

SolidFramework.Converters.Plumbing.TextRecoveryEngine SolidFramework.Converters.PdfToOfficeDocumentConverter.TextRecoveryEngine
get, set, inherited

Gets or sets the text recovery engine. The default value is automatic.

The text recovery engine.

◆ TextRecoveryEngineNse

SolidFramework.Converters.Plumbing.TextRecoveryEngineNse SolidFramework.Converters.PdfToOfficeDocumentConverter.TextRecoveryEngineNse
get, set, inherited

Gets or sets the non-standard encoding recovery engine. The default value is automatic.

The NSE recovery engine.

◆ TextRecoveryLanguage

string SolidFramework.Converters.PdfToOfficeDocumentConverter.TextRecoveryLanguage
get, set, inherited

Gets or sets the text recovery language. Default is an empty string.

The text recovery language.

Format for each language is as follows:
    "es" for Spanish
    "de" for German
    "it" for Italian
    "pl" for Polish
    "pt" for Portuguese
    "ru" for Russian
    "fr" for French
    "nl" for Dutch
    "en" for English
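As an illustration of the language codes above, the following sketch selects German for text recovery on a scanned document. The paths are hypothetical, and whether text recovery actually runs still depends on the text recovery type, which is left at its automatic default here.

```csharp
using System;
using SolidFramework.Converters;

class TextRecoveryLanguageExample
{
    static void Main()
    {
        // Assumption: a valid SDK license has already been imported.
        var converter = new PdfToExcelConverter();
        try
        {
            converter.AddSourceFile(@"C:\Temp\scan_de.pdf"); // hypothetical path

            // Two-letter code from the list above: "de" for German.
            converter.TextRecoveryLanguage = "de";

            var status = converter.ConvertTo(@"C:\Temp\scan_de.xlsx", true);
            Console.WriteLine("Conversion status: " + status);
        }
        finally
        {
            converter.Dispose();
        }
    }
}
```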

◆ TextRecoveryNseType

SolidFramework.Converters.Plumbing.TextRecoveryNSE SolidFramework.Converters.PdfToOfficeDocumentConverter.TextRecoveryNseType
get, set, inherited

Gets or sets the text recovery NSE type. The default value is automatic.

The type of the text recovering NSE.

◆ TextRecoveryType

SolidFramework.Converters.Plumbing.TextRecovery SolidFramework.Converters.PdfToOfficeDocumentConverter.TextRecoveryType
get, set, inherited

Gets or sets the text recovery type. The default value is automatic.

The type of the text recovery.

◆ ThousandsSeparator

SolidFramework.Converters.Plumbing.ThousandsSeparator SolidFramework.Converters.PdfToExcelConverter.ThousandsSeparator
get, set

Gets or sets the thousands separator.

The thousands separator.

◆ UserProperties

string SolidFramework.Converters.PdfToOfficeDocumentConverter.UserProperties
get, set, inherited

Gets or sets the user properties. The default is an empty string.

The user properties.

Event Documentation

◆ Progress

global::System.EventHandler<SolidFramework.ProgressEventArgs> SolidFramework.Converters.Converter.Progress
inherited

Triggered when progress occurs during PDF conversion.

ProgressEventArgs contains the amount of progress, the maximum amount of progress, a status code, a description, plus optional custom data.
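Subscribing to the Progress and Warning events might look like the sketch below. The property names on ProgressEventArgs and WarningEventArgs are not given on this page, so the handlers deliberately avoid reading them and only count or announce the events. File paths are hypothetical.

```csharp
using System;
using SolidFramework.Converters;

class ProgressEventsExample
{
    static void Main()
    {
        // Assumption: a valid SDK license has already been imported.
        var converter = new PdfToExcelConverter();
        try
        {
            int progressEvents = 0;

            // Standard EventHandler<T> signature: (object sender, T args).
            converter.Progress += (sender, e) => progressEvents++;
            converter.Warning += (sender, e) =>
                Console.WriteLine("Warning raised during conversion.");

            converter.AddSourceFile(@"C:\Temp\input.pdf"); // hypothetical path
            var status = converter.ConvertTo(@"C:\Temp\output.xlsx", true);

            Console.WriteLine("Status: " + status +
                              ", progress events received: " + progressEvents);
        }
        finally
        {
            converter.Dispose();
        }
    }
}
```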