Converts a PDF to an HTML document.

Inheritance diagram for SolidFramework.Converters.PdfToHtmlConverter:

Public Member Functions
override void	Dispose ()

	PdfToHtmlConverter ()
	Initializes a new instance of the PdfToHtmlConverter class.

override void	Convert ()
	Starts the conversion of a PDF file to HTML.

override void	AddSourceFile (string fullpath)
	Add a PDF file to be converted to the source collection.

override void	AddSourceFile (SolidFramework.Plumbing.Document document)
	Add a PDF file to be converted to the source collection from a loaded `Document` object.

override SolidFramework.Converters.Plumbing.ConversionStatus	ConvertTo (string fullpath, bool overwrite)
	Convert to a specified file, stating whether to overwrite the file if it already exists.

global::System.IO.Stream	GetResultAsStream ()
	Gets the result as a stream.

SolidFramework.Converters.Plumbing.ConversionStatus	ConvertTo (string fullpath)
	Converts a single file to the specified path.

void	Cancel ()
	Cancels the conversion.

bool	IsCanceled ()

void	AddSourceFiles (SolidFramework.StringsArray paths)
	Add a collection of PDF file paths to be converted to the source collection.

void	ClearSourceFiles ()
	Clear the current source files collection.

virtual void	OnProgress (SolidFramework.ProgressEventArgs args)

virtual void	OnWarning (SolidFramework.WarningEventArgs args)

bool	Equals (SolidFramework.Converters.Converter other)

override bool	Equals (object obj)

override int	GetHashCode ()

Static Public Member Functions
static SolidFramework.Converters.Plumbing.ConversionStatus	Convert (string sourceFileName, string destinationFileName)
	Convert the specified source file into the specified HTML file.

static SolidFramework.Converters.Plumbing.ConversionStatus	Convert (string sourceFileName, string destinationFileName, bool overwrite)
	Convert the specified source file into the specified HTML file.

static void	RebaseHtml (string path, string oldFolder, string newFolder)
	Rebases the HTML.

static bool	operator== (SolidFramework.Converters.Converter obj1, SolidFramework.Converters.Converter obj2)

static bool	operator!= (SolidFramework.Converters.Converter obj1, SolidFramework.Converters.Converter obj2)

Properties
int	WidthLimit `[getset]`
	Gets or sets the maximum width for HTML file images.

bool	DetectLists `[getset]`
	Gets or sets whether to detect lists.

bool	DetectTiledPages `[getset]`
	Attempt to automatically detect tables tiled across multiple pages.

bool	DetectLanguage `[getset]`
	Gets or sets a value indicating whether the document language should be detected from the document content. The default is `true`.

SolidFramework.Converters.HtmlNavigation	HtmlNavigation `[getset]`
	Gets or sets the HTML navigation.

SolidFramework.Converters.HtmlSplittingUsing	HtmlSplittingUsing `[getset]`
	Gets or sets the HtmlSplittingUsing value that controls HTML splitting.

SolidFramework.Converters.Plumbing.HtmlImages	Images `[getset]`
	Gets or sets how images are handled during conversion.

SolidFramework.Converters.Plumbing.ImageDocumentType	ImageType `[getset]`
	Gets or sets the format of the images produced by the conversion, e.g. bmp for Bitmap.

bool	KeepLineBreaks `[getset]`
	Gets or sets whether line breaks should be preserved in the content. The default is `false`.

bool	ExactMode `[getset]`
	Gets or sets a value indicating whether the reconstruction should attempt to represent the original pdf. The default is `false`.

SolidFramework.Converters.Plumbing.HeaderAndFooterMode	HeaderAndFooterMode `[getset]`
	Gets or sets a HeaderAndFooterMode that specifies the header and footer mode. Default is `HeaderAndFooterMode.Remove`.

SolidFramework.Converters.Plumbing.SelectedAreas	SelectedAreas `[getset]`
	Gets or sets the selected areas. The default is `null`.

string	UserProperties `[getset]`
	Gets or sets the user properties. The default is an empty string.

string	TextRecoveryLanguage `[getset]`
	Gets or sets the text recovery language. Default is an empty string.

bool	AutoRotate `[getset]`
	Gets or sets whether to automatically rotate pages based on the orientation of the text on the pages. Default is `true`.

bool	TextRecoverySuspects `[getset]`
	Gets or sets a value indicating whether to highlight low confidence words (any word with SolidFramework.Imaging.OcrWord.Confidence less than 100) in the OCR output in yellow. This property has no effect if OCR has not occurred. Default is `false`.

SolidFramework.Converters.Plumbing.TextRecovery	TextRecoveryType `[getset]`
	Gets or sets the text recovery type. The default value is `automatic`.

SolidFramework.Converters.Plumbing.TextRecoveryNSE	TextRecoveryNseType `[getset]`
	Gets or sets the text recovery NSE type. The default value is `automatic`.

SolidFramework.Converters.Plumbing.TextRecoveryEngine	TextRecoveryEngine `[getset]`
	Gets or sets the text recovery engine. The default value is `automatic`.

SolidFramework.Converters.Plumbing.TextRecoveryEngineNse	TextRecoveryEngineNse `[getset]`
	Gets or sets the non-standard encoding recovery engine. The default value is `automatic`.

SolidFramework.Converters.Plumbing.TextRecoveryAutomaticGNse	TextRecoveryAutomaticGNse `[getset]`
	Gets or sets the automatic GNSE options.

SolidFramework.Converters.Plumbing.ImageWatermark	ImageWatermark `[getset]`
	Gets or sets an image as a watermark in the converted file.

string	Password `[getset]`
	Gets or sets the password.

SolidFramework.PageRange	PageRange `[getset]`
	Gets or sets the page range.

bool	DetectSoftHyphens `[getset]`
	Gets or sets a value indicating whether soft-hyphens should be detected. The default is `false`.

bool	NoRepairing `[getset]`
	In the event of a damaged PDF (InternalError) do not attempt to repair the PDF before conversion.

bool	GraphicsAsImages `[getset]`
	Gets or sets whether vector images should be converted to bitmap images. Default is `false`.

bool	KeepInvisibleText `[getset]`
	Gets or sets whether text that is invisible because it has no stroke or fill (PDF rendering mode 3 is typically used for a searchable layer in scanned pages) should be recovered. Default is `false`.

bool	KeepBackgroundColorText `[getset]`
	Gets or sets whether text that is invisible because it is the same colour as the background should be recovered. Default is `false`.

System.Object	CustomData `[getset]`
	Custom data to provide during Progress events

SolidFramework.Converters.Plumbing.IConversionResultsCollection	Results `[get]`
	Gets the results of conversion

string	SourceDirectory `[getset]`
	Gets or sets the source directory.

string	OutputDirectory `[getset]`
	Gets or sets the directory where converted files will be placed when created using `Convert`.

SolidFramework.Plumbing.OverwriteMode	OverwriteMode `[getset]`
	Gets or sets whether to overwrite an existing output file.

Events
global::System.EventHandler< SolidFramework.ProgressEventArgs >	Progress
	Triggered when progress occurs during PDF conversion.

global::System.EventHandler< SolidFramework.WarningEventArgs >	Warning
	Triggered when a warning is raised during PDF conversion.

Detailed Description

Converts a PDF to an HTML document.

Constructor & Destructor Documentation

◆ PdfToHtmlConverter()

SolidFramework.Converters.PdfToHtmlConverter.PdfToHtmlConverter ( )

Initializes a new instance of the PdfToHtmlConverter class.

Defaults to Images = HtmlImages.Default, ImageType = ImageDocumentType.Default, GraphicsAsImages = true and WidthLimit = 0.

Member Function Documentation

◆ AddSourceFile() [1/2]

override void SolidFramework.Converters.PdfToHtmlConverter.AddSourceFile ( SolidFramework.Plumbing.Document document )

virtual

Add a PDF file to be converted to the source collection from a loaded Document object.

Parameters

document Document object loaded with a PDF file.

Exceptions

System.ArgumentNullException Thrown when the Document object is null.

Reimplemented from SolidFramework.Converters.Converter.

◆ AddSourceFile() [2/2]

override void SolidFramework.Converters.PdfToHtmlConverter.AddSourceFile ( string fullpath )

virtual

Add a PDF file to be converted to the source collection.

Parameters

fullpath String of the full path to the PDF file.

Exceptions

System.IO.FileNotFoundException Thrown when path is invalid.

Reimplemented from SolidFramework.Converters.Converter.

◆ AddSourceFiles()

void SolidFramework.Converters.Converter.AddSourceFiles ( SolidFramework.StringsArray paths )

inherited

Add a collection of PDF file paths to be converted to the source collection.

Parameters

paths StringsArray collection of PDF source file paths.

◆ Convert() [1/3]

override void SolidFramework.Converters.PdfToHtmlConverter.Convert ( )

virtual

Starts the conversion of a PDF file to HTML.

Source files must be set using AddSourceFile(s) before calling Convert(). Converted file(s) will be saved in the OutputDirectory specified. If no OutputDirectory is specified the default save location will be the same as the source file(s).

Exceptions

System.InvalidOperationException Thrown when an option fails.

See also: Converter.AddSourceFile(string)

Reimplemented from SolidFramework.Converters.Converter.
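
The workflow described above (add sources, optionally set OutputDirectory, then call Convert()) can be sketched as follows. This is a minimal illustration rather than an official sample: the file paths are hypothetical, the SolidFramework assembly must be referenced, and the sketch assumes that licensing has already been set up elsewhere in the application.

```csharp
using System;
using SolidFramework.Converters;

class BatchConvertExample
{
    static void Main()
    {
        // Assumption: any required SolidFramework licensing step has already run.
        var converter = new PdfToHtmlConverter();
        try
        {
            // Hypothetical input paths, for illustration only.
            converter.AddSourceFile(@"C:\Input\report.pdf");
            converter.AddSourceFile(@"C:\Input\invoice.pdf");

            // If OutputDirectory is not set, output is saved next to each source file.
            converter.OutputDirectory = @"C:\Output";

            converter.Convert();
        }
        catch (InvalidOperationException ex)
        {
            // Documented as thrown "when an option fails".
            Console.WriteLine("Conversion failed: " + ex.Message);
        }
        finally
        {
            // Dispose() is a documented public member of the converter.
            converter.Dispose();
        }
    }
}
```

The try/finally form is used here instead of a `using` statement so that the sketch relies only on members listed on this page.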

◆ Convert() [2/3]

static SolidFramework.Converters.Plumbing.ConversionStatus SolidFramework.Converters.PdfToHtmlConverter.Convert	(	string	sourceFileName,
		string	destinationFileName
	)

static

Convert the specified source file into the specified HTML file.

Parameters

sourceFileName	String of the full path to the PDF file.
destinationFileName	String of the full path to the destination HTML file.

Returns: ConversionStatus of the resulting conversion.

◆ Convert() [3/3]

static SolidFramework.Converters.Plumbing.ConversionStatus SolidFramework.Converters.PdfToHtmlConverter.Convert	(	string	sourceFileName,
		string	destinationFileName,
		bool	overwrite
	)

static

Convert the specified source file into the specified HTML file.

Parameters

sourceFileName	String of the full path to the PDF file.
destinationFileName	String of the full path to the destination HTML file.
overwrite	Whether to overwrite the destination file if it already exists.

Returns: ConversionStatus of the resulting conversion.

See also: SolidFramework.Converters.Plumbing.ConversionStatus
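
For a one-off conversion, the static overloads avoid creating and disposing a converter instance. The sketch below calls the three-argument overload with hypothetical paths and simply prints the returned ConversionStatus; it assumes the SolidFramework assembly is referenced and licensed.

```csharp
using System;
using SolidFramework.Converters;
using SolidFramework.Converters.Plumbing;

class StaticConvertExample
{
    static void Main()
    {
        // Hypothetical paths, for illustration only.
        string source = @"C:\Input\report.pdf";
        string destination = @"C:\Output\report.html";

        // The third argument requests that an existing destination file be overwritten.
        ConversionStatus status = PdfToHtmlConverter.Convert(source, destination, true);

        Console.WriteLine("Conversion finished with status: " + status);
    }
}
```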

◆ ConvertTo() [1/2]

SolidFramework.Converters.Plumbing.ConversionStatus SolidFramework.Converters.Converter.ConvertTo ( string fullpath )

inherited

Converts a single file to the specified path.

Parameters

fullpath The full path of the converted file.

Returns: ConversionStatus of the resulting conversion.

◆ ConvertTo() [2/2]

override SolidFramework.Converters.Plumbing.ConversionStatus SolidFramework.Converters.PdfToHtmlConverter.ConvertTo	(	string	fullpath,
		bool	overwrite
	)

virtual

Convert to a specified file, stating whether to overwrite the file if it already exists.

Parameters

fullpath	Full path and name of converted file.
overwrite	Overwrite converted file if it already exists or not.

Returns: ConversionStatus of the resulting conversion.

File to convert must be added using AddSourceFile.

Exceptions

System.InvalidOperationException Thrown when source file is not specified or more than one source file is specified.

See also: SolidFramework.Converters.Plumbing.ConversionStatus

Reimplemented from SolidFramework.Converters.Converter.
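
ConvertTo is the instance-based route when exactly one source file needs custom options. The following sketch sets a few of the properties documented on this page before converting; the paths and the chosen option values are hypothetical, and the SolidFramework assembly is assumed to be referenced and licensed.

```csharp
using System;
using SolidFramework.Converters;
using SolidFramework.Converters.Plumbing;

class ConvertToExample
{
    static void Main()
    {
        var converter = new PdfToHtmlConverter();
        try
        {
            // Exactly one source file must be added before calling ConvertTo.
            converter.AddSourceFile(@"C:\Input\report.pdf"); // hypothetical path

            // Illustrative option values, not recommendations.
            converter.DetectLists = true;
            converter.KeepLineBreaks = false;
            converter.WidthLimit = 800;            // 0 (the default) means no resizing
            converter.TextRecoveryLanguage = "de"; // German, per the language codes listed on this page

            ConversionStatus status = converter.ConvertTo(@"C:\Output\report.html", true);
            Console.WriteLine("ConvertTo returned: " + status);
        }
        catch (InvalidOperationException ex)
        {
            // Documented as thrown when no source file, or more than one, is specified.
            Console.WriteLine("Invalid source configuration: " + ex.Message);
        }
        finally
        {
            converter.Dispose();
        }
    }
}
```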

◆ Dispose()

override void SolidFramework.Converters.PdfToHtmlConverter.Dispose ( )

virtual

Reimplemented from SolidFramework.Converters.Converter.

◆ GetResultAsStream()

global::System.IO.Stream SolidFramework.Converters.PdfToOfficeDocumentConverter.GetResultAsStream ( )

inherited

Gets the result as a stream.

Returns: The result as a stream.

◆ RebaseHtml()

static void SolidFramework.Converters.PdfToHtmlConverter.RebaseHtml	(	string	path,
		string	oldFolder,
		string	newFolder
	)

static

Rebases the HTML.

Parameters

path	Path to the HTML file.
oldFolder	The old folder.
newFolder	The new folder.

Property Documentation

◆ AutoRotate

bool SolidFramework.Converters.PdfToOfficeDocumentConverter.AutoRotate

getset inherited

Gets or sets whether to automatically rotate pages based on the orientation of the text on the pages. Default is true.

If a page in the PDF file is set to landscape, and all text is flowing up and down, automatically rotate the page to portrait.

◆ DetectLists

bool SolidFramework.Converters.PdfToHtmlConverter.DetectLists

getset

Gets or sets whether to detect lists.

DetectLists is ignored with HtmlSplitting options.

◆ GraphicsAsImages

bool SolidFramework.Converters.SolidConverterPdf.GraphicsAsImages

getset inherited

Gets or sets whether vector images should be converted to bitmap images. Default is false.

If true then vector images will be converted to bitmap images; otherwise they will be retained as vector images.

◆ HtmlNavigation

SolidFramework.Converters.HtmlNavigation SolidFramework.Converters.PdfToHtmlConverter.HtmlNavigation

getset

Gets or sets the HTML navigation.

The HTML navigation.

◆ HtmlSplittingUsing

SolidFramework.Converters.HtmlSplittingUsing SolidFramework.Converters.PdfToHtmlConverter.HtmlSplittingUsing

getset

Gets or sets the HtmlSplittingUsing value that controls HTML splitting.

The HTML splitting mode.

◆ Images

SolidFramework.Converters.Plumbing.HtmlImages SolidFramework.Converters.PdfToHtmlConverter.Images

getset

Gets or sets how images are handled during conversion.

The default is SolidFramework.Converters.Plumbing.HtmlImages.Default, which includes linked images in the output HTML file.

See also: SolidFramework.Converters.Plumbing.HtmlImages

◆ ImageType

SolidFramework.Converters.Plumbing.ImageDocumentType SolidFramework.Converters.PdfToHtmlConverter.ImageType

getset

Gets or sets the format of the images produced by the conversion, e.g. bmp for Bitmap.

The image file type that will be used in images converted from the document. Default is SolidFramework.Converters.Plumbing.ImageDocumentType.Default, which will choose the most appropriate format for each image.

◆ ImageWatermark

SolidFramework.Converters.Plumbing.ImageWatermark SolidFramework.Converters.SolidConverterPdf.ImageWatermark

getset inherited

Gets or sets an image as a watermark in the converted file.

A watermark is an impression that displays on top of or below the content of a PDF page. Image watermarks can be useful for placing a logo on each page. To set an ImageWatermark, assign the path to the image: `ImageWatermark.Path = "C:\\MyDocuments\\MyWatermark.jpg";`

Alternatively, watermarks can be text: `TextWatermark watermark = new TextWatermark(); watermark.Text = "DRAFT";`

◆ KeepBackgroundColorText

bool SolidFramework.Converters.SolidConverterPdf.KeepBackgroundColorText

getset inherited

Gets or sets whether text that is invisible because it is the same colour as the background should be recovered. Default is false.

If true then text that is the same colour as the background will be recovered, else it will be discarded.

◆ KeepInvisibleText

bool SolidFramework.Converters.SolidConverterPdf.KeepInvisibleText

getset inherited

Gets or sets whether text that is invisible because it has no stroke or fill (PDF rendering mode 3 is typically used for a searchable layer in scanned pages) should be recovered. Default is false.

If true then invisible text that has no stroke or fill will be recovered, else it will be discarded.

◆ KeepLineBreaks

bool SolidFramework.Converters.PdfToHtmlConverter.KeepLineBreaks

getset

Gets or sets whether line breaks should be preserved in the content. The default is false.

◆ OutputDirectory

string SolidFramework.Converters.Converter.OutputDirectory

getset inherited

Gets or sets the directory where converted files will be placed when created using Convert.

The output directory.

◆ OverwriteMode

SolidFramework.Plumbing.OverwriteMode SolidFramework.Converters.Converter.OverwriteMode

getset inherited

Gets or sets whether to overwrite an existing output file.

Set the desired action when the output file already exists. Default is FailIfExists.

◆ PageRange

SolidFramework.PageRange SolidFramework.Converters.SolidConverterPdf.PageRange

getset inherited

Gets or sets the page range.

The page range.

◆ Password

string SolidFramework.Converters.SolidConverterPdf.Password

getset inherited

Gets or sets the password.

The password.

◆ SelectedAreas

SolidFramework.Converters.Plumbing.SelectedAreas SolidFramework.Converters.PdfToOfficeDocumentConverter.SelectedAreas

getset inherited

Gets or sets the selected areas. The default is null.

The selected areas.

◆ SourceDirectory

string SolidFramework.Converters.Converter.SourceDirectory

getset inherited

Gets or sets the source directory.

The source directory.

◆ TextRecoveryAutomaticGNse

SolidFramework.Converters.Plumbing.TextRecoveryAutomaticGNse SolidFramework.Converters.PdfToOfficeDocumentConverter.TextRecoveryAutomaticGNse

getset inherited

Gets or sets the automatic GNSE options.

The automatic GNSE options.

◆ TextRecoveryEngine

SolidFramework.Converters.Plumbing.TextRecoveryEngine SolidFramework.Converters.PdfToOfficeDocumentConverter.TextRecoveryEngine

getset inherited

Gets or sets the text recovery engine. The default value is automatic.

The text recovery engine.

◆ TextRecoveryEngineNse

SolidFramework.Converters.Plumbing.TextRecoveryEngineNse SolidFramework.Converters.PdfToOfficeDocumentConverter.TextRecoveryEngineNse

getset inherited

Gets or sets the non-standard encoding recovery engine. The default value is automatic.

The NSE recovery engine.

◆ TextRecoveryLanguage

string SolidFramework.Converters.PdfToOfficeDocumentConverter.TextRecoveryLanguage

getset inherited

Gets or sets the text recovery language. Default is an empty string.

The text recovery language.

Format for each language is as follows:
"es" for Spanish
"de" for German
"it" for Italian
"pl" for Polish
"pt" for Portuguese
"ru" for Russian
"fr" for French
"nl" for Dutch
"en" for English

◆ TextRecoveryNseType

SolidFramework.Converters.Plumbing.TextRecoveryNSE SolidFramework.Converters.PdfToOfficeDocumentConverter.TextRecoveryNseType

getset inherited

Gets or sets the text recovery NSE type. The default value is automatic.

The text recovery NSE type.

◆ TextRecoveryType

SolidFramework.Converters.Plumbing.TextRecovery SolidFramework.Converters.PdfToOfficeDocumentConverter.TextRecoveryType

getset inherited

Gets or sets the text recovery type. The default value is automatic.

The type of the text recovery.

◆ UserProperties

string SolidFramework.Converters.PdfToOfficeDocumentConverter.UserProperties

getset inherited

Gets or sets the user properties. The default is an empty string.

The user properties.

◆ WidthLimit

int SolidFramework.Converters.PdfToHtmlConverter.WidthLimit

getset

Gets or sets the maximum width for HTML file images.

Default is 0, which indicates not to resize the image width.

Event Documentation

◆ Progress

global::System.EventHandler<SolidFramework.ProgressEventArgs> SolidFramework.Converters.Converter.Progress

inherited

Triggered when progress occurs during PDF conversion.

ProgressEventArgs contains the amount of progress, the maximum amount of progress, a status code, a description, plus optional custom data.
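
Subscribing to Progress (and Warning) can be sketched as below. Because this page does not list the member names of ProgressEventArgs or WarningEventArgs, the handlers only count notifications instead of reading specific properties. The paths and the CustomData value are hypothetical, and the SolidFramework assembly is assumed to be referenced and licensed.

```csharp
using System;
using SolidFramework.Converters;
using SolidFramework.Converters.Plumbing;

class ProgressExample
{
    static void Main()
    {
        int progressEvents = 0;
        int warningEvents = 0;

        var converter = new PdfToHtmlConverter();
        try
        {
            // Optional custom data that is provided during Progress events.
            converter.CustomData = "job-42"; // hypothetical identifier

            // Handlers match EventHandler<T>: (object sender, T args).
            converter.Progress += (sender, args) => progressEvents++;
            converter.Warning += (sender, args) => warningEvents++;

            converter.AddSourceFile(@"C:\Input\report.pdf"); // hypothetical path
            ConversionStatus status = converter.ConvertTo(@"C:\Output\report.html", true);

            Console.WriteLine("Status: " + status);
            Console.WriteLine("Progress notifications: " + progressEvents);
            Console.WriteLine("Warnings: " + warningEvents);
        }
        finally
        {
            converter.Dispose();
        }
    }
}
```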
