Solid Framework SDK 10.0.12602
SDK for converting and extracting value from PDFs
Convert PDF to HTML document.
Public Member Functions | |
override void | Dispose () |
PdfToHtmlConverter () | |
Initializes a new instance of the PdfToHtmlConverter class. | |
override void | Convert () |
Starts the conversion of a PDF file to HTML. | |
override void | AddSourceFile (string fullpath) |
Add a PDF file to be converted to the source collection. | |
override void | AddSourceFile (SolidFramework.Plumbing.Document document) |
Add a PDF file to be converted to the source collection from a loaded Document object. | |
override SolidFramework.Converters.Plumbing.ConversionStatus | ConvertTo (string fullpath, bool overwrite) |
Convert to a specified file, stating whether to overwrite the file if it already exists. | |
global::System.IO.Stream | GetResultAsStream () |
Gets the result as a stream. | |
SolidFramework.Converters.Plumbing.ConversionStatus | ConvertTo (string fullpath) |
Converts a single file to the specified path. | |
void | Cancel () |
Cancels the conversion. | |
bool | IsCanceled () |
void | AddSourceFiles (SolidFramework.StringsArray paths) |
Add a collection of PDF file paths to be converted to the source collection. | |
void | ClearSourceFiles () |
Clear the current source files collection. | |
virtual void | OnProgress (SolidFramework.ProgressEventArgs args) |
virtual void | OnWarning (SolidFramework.WarningEventArgs args) |
bool | Equals (SolidFramework.Converters.Converter other) |
override bool | Equals (object obj) |
override int | GetHashCode () |
Static Public Member Functions | |
static SolidFramework.Converters.Plumbing.ConversionStatus | Convert (string sourceFileName, string destinationFileName) |
Convert the specified source file into the specified HTML file. | |
static SolidFramework.Converters.Plumbing.ConversionStatus | Convert (string sourceFileName, string destinationFileName, bool overwrite) |
Convert the specified source file into the specified HTML file. | |
static void | RebaseHtml (string path, string oldFolder, string newFolder) |
Rebases the HTML. | |
static bool | operator== (SolidFramework.Converters.Converter obj1, SolidFramework.Converters.Converter obj2) |
static bool | operator!= (SolidFramework.Converters.Converter obj1, SolidFramework.Converters.Converter obj2) |
Properties | |
int | WidthLimit [getset] |
Gets or sets the maximum width for HTML file images. | |
bool | DetectLists [getset] |
Gets or sets whether to detect lists. | |
bool | DetectTiledPages [getset] |
Attempt to automatically detect tables tiled across multiple pages. | |
bool | DetectLanguage [getset] |
Gets or sets a value indicating whether the document language should be detected from the document content. The default is true. | |
SolidFramework.Converters.HtmlNavigation | HtmlNavigation [getset] |
Gets or sets the HTML navigation. | |
SolidFramework.Converters.HtmlSplittingUsing | HtmlSplittingUsing [getset] |
Gets or sets the HTML splitting option. | |
SolidFramework.Converters.Plumbing.HtmlImages | Images [getset] |
Gets or sets how images are handled during conversion. | |
SolidFramework.Converters.Plumbing.ImageDocumentType | ImageType [getset] |
Gets or sets the image format used for images converted from the document, e.g. BMP for bitmap. | |
bool | KeepLineBreaks [getset] |
Gets or sets whether line breaks should be preserved in the content. The default is false. | |
bool | ExactMode [getset] |
Gets or sets a value indicating whether the reconstruction should attempt to represent the original PDF. The default is false. | |
SolidFramework.Converters.Plumbing.HeaderAndFooterMode | HeaderAndFooterMode [getset] |
Gets or sets a HeaderAndFooterMode that specifies the header and footer mode. Default is HeaderAndFooterMode.Remove. | |
SolidFramework.Converters.Plumbing.SelectedAreas | SelectedAreas [getset] |
Gets or sets the selected areas. The default is null. | |
string | UserProperties [getset] |
Gets or sets the user properties. The default is an empty string. | |
string | TextRecoveryLanguage [getset] |
Gets or sets the text recovery language. Default is an empty string. | |
bool | AutoRotate [getset] |
Gets or sets whether to automatically rotate pages based on the orientation of the text on the pages. Default is true. | |
bool | TextRecoverySuspects [getset] |
Gets or sets a value indicating whether to highlight low confidence words (any word with SolidFramework.Imaging.OcrWord.Confidence less than 100) in the OCR output in yellow. This property has no effect if OCR has not occurred. Default is false. | |
SolidFramework.Converters.Plumbing.TextRecovery | TextRecoveryType [getset] |
Gets or sets the text recovery type. The default value is automatic. | |
SolidFramework.Converters.Plumbing.TextRecoveryNSE | TextRecoveryNseType [getset] |
Gets or sets the text recovery NSE type. The default value is automatic. | |
SolidFramework.Converters.Plumbing.TextRecoveryEngine | TextRecoveryEngine [getset] |
Gets or sets the text recovery engine. The default value is automatic. | |
SolidFramework.Converters.Plumbing.TextRecoveryEngineNse | TextRecoveryEngineNse [getset] |
Gets or sets the non-standard encoding recovery engine. The default value is automatic. | |
SolidFramework.Converters.Plumbing.TextRecoveryAutomaticGNse | TextRecoveryAutomaticGNse [getset] |
Gets or sets the automatic GNSE options. | |
SolidFramework.Converters.Plumbing.ImageWatermark | ImageWatermark [getset] |
Gets or sets an image as a watermark in the converted file. | |
string | Password [getset] |
Gets or sets the password. | |
SolidFramework.PageRange | PageRange [getset] |
Gets or sets the page range. | |
bool | DetectSoftHyphens [getset] |
Gets or sets a value indicating whether soft-hyphens should be detected. The default is false. | |
bool | NoRepairing [getset] |
In the event of a damaged PDF (InternalError) do not attempt to repair the PDF before conversion. | |
bool | GraphicsAsImages [getset] |
Gets or sets whether vector images should be converted to bitmap images. Default is false. | |
bool | KeepInvisibleText [getset] |
Gets or sets whether text that is invisible because it has no stroke or fill (PDF rendering mode 3 is typically used for a searchable layer in scanned pages) should be recovered. Default is false. | |
bool | KeepBackgroundColorText [getset] |
Gets or sets whether text that is invisible because it is the same colour as the background should be recovered. Default is false. | |
System.Object | CustomData [getset] |
Custom data to provide during Progress events | |
SolidFramework.Converters.Plumbing.IConversionResultsCollection | Results [get] |
Gets the results of conversion | |
string | SourceDirectory [getset] |
Gets or sets the source directory. | |
string | OutputDirectory [getset] |
Gets or sets the directory where converted files will be placed when created using Convert. | |
SolidFramework.Plumbing.OverwriteMode | OverwriteMode [getset] |
Gets or sets whether to overwrite an existing output file. | |
Events | |
global::System.EventHandler< SolidFramework.ProgressEventArgs > | Progress |
Triggered when progress occurs during PDF conversion. | |
global::System.EventHandler< SolidFramework.WarningEventArgs > | Warning |
Triggered when a warning is raised during PDF conversion. | |
Convert PDF to HTML document.
SolidFramework.Converters.PdfToHtmlConverter.PdfToHtmlConverter ()
Initializes a new instance of the PdfToHtmlConverter class.
Defaults to HtmlImages.Default, ImageDocumentType.Default, GraphicsAsImages = true and WidthLimit = 0.
override void AddSourceFile (SolidFramework.Plumbing.Document document) [virtual]
Add a PDF file to be converted to the source collection from a loaded Document object.
document | Document object loaded with a PDF file. |
System.ArgumentNullException | Thrown when Document object is null. |
Reimplemented from SolidFramework.Converters.Converter.
override void AddSourceFile (string fullpath) [virtual]
Add a PDF file to be converted to the source collection.
fullpath | String of the full path to the PDF file. |
System.IO.FileNotFoundException | Thrown when path is invalid. |
Reimplemented from SolidFramework.Converters.Converter.
void AddSourceFiles (SolidFramework.StringsArray paths) [inherited]
Add a collection of PDF file paths to be converted to the source collection.
paths | ReadOnlyCollection(String) collection of PDF source file paths. |
override void Convert () [virtual]
Starts the conversion of a PDF file to HTML.
Source files must be added using AddSourceFile or AddSourceFiles before calling Convert(). Converted files will be saved in the specified OutputDirectory. If no OutputDirectory is specified, each converted file is saved in the same location as its source file.
System.InvalidOperationException | Thrown when an option fails. |
Reimplemented from SolidFramework.Converters.Converter.
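The Convert() workflow described above can be sketched as follows. This is a minimal sketch, not a definitive implementation: the file paths are hypothetical, and it assumes a valid Solid Framework license has already been set up elsewhere in the application.

```csharp
using SolidFramework.Converters;

class BatchConvertExample
{
    static void Main()
    {
        // Assumption: licensing is configured before this point.
        // All paths below are hypothetical placeholders.
        using (var converter = new PdfToHtmlConverter())
        {
            // Source files must be added before calling Convert().
            converter.AddSourceFile(@"C:\Input\report.pdf");
            converter.AddSourceFile(@"C:\Input\invoice.pdf");

            // If OutputDirectory is not set, output is written next to each source file.
            converter.OutputDirectory = @"C:\Output";

            // May throw System.InvalidOperationException, as documented above.
            converter.Convert();
        }
    }
}
```

The using block calls Dispose() on the converter once the conversion has finished.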
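static SolidFramework.Converters.Plumbing.ConversionStatus Convert (string sourceFileName, string destinationFileName) [static]
Convert the specified source file into the specified HTML file.
sourceFileName | String of the full path to the PDF file. |
destinationFileName | String of the full path to the destination HTML file. |
Returns the ConversionStatus of the resulting conversion.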
static SolidFramework.Converters.Plumbing.ConversionStatus Convert (string sourceFileName, string destinationFileName, bool overwrite) [static]
Convert the specified source file into the specified HTML file.
sourceFileName | String of the full path to the PDF file. |
destinationFileName | String of the full path to the destination HTML file. |
overwrite | Whether to overwrite the destination file if it already exists. |
Returns the ConversionStatus of the resulting conversion.
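For a one-off conversion, the static overload can be called without creating a converter instance. The sketch below uses hypothetical paths and assumes licensing is already configured. It prints the returned status rather than comparing it with a specific ConversionStatus member, since the enum values are not listed on this page.

```csharp
using System;
using SolidFramework.Converters;
using SolidFramework.Converters.Plumbing;

class StaticConvertExample
{
    static void Main()
    {
        // Hypothetical input and output paths.
        string source = @"C:\Input\report.pdf";
        string destination = @"C:\Output\report.html";

        // The third argument requests overwriting an existing destination file.
        ConversionStatus status = PdfToHtmlConverter.Convert(source, destination, true);

        Console.WriteLine("Conversion finished with status: " + status);
    }
}
```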
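SolidFramework.Converters.Plumbing.ConversionStatus ConvertTo (string fullpath) [inherited]
Converts a single file to the specified path.
fullpath | Full path of the converted file. |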
override SolidFramework.Converters.Plumbing.ConversionStatus ConvertTo (string fullpath, bool overwrite) [virtual]
Convert to a specified file, stating whether to overwrite the file if it already exists.
fullpath | Full path and name of converted file. |
overwrite | Whether to overwrite the converted file if it already exists. |
Returns the ConversionStatus of the resulting conversion. The file to convert must be added using AddSourceFile.
System.InvalidOperationException | Thrown when source file is not specified or more than one source file is specified. |
Reimplemented from SolidFramework.Converters.Converter.
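ConvertTo expects exactly one source file, so a caller may want to guard against the documented InvalidOperationException. The following is a sketch under stated assumptions: the paths are hypothetical and licensing is configured elsewhere.

```csharp
using System;
using SolidFramework.Converters;
using SolidFramework.Converters.Plumbing;

class ConvertToExample
{
    static void Main()
    {
        using (var converter = new PdfToHtmlConverter())
        {
            // Exactly one source file must be added before ConvertTo is called.
            converter.AddSourceFile(@"C:\Input\report.pdf"); // hypothetical path

            try
            {
                // true: overwrite the output file if it already exists.
                ConversionStatus status = converter.ConvertTo(@"C:\Output\report.html", true);
                Console.WriteLine("Status: " + status);
            }
            catch (InvalidOperationException ex)
            {
                // Documented case: no source file, or more than one source file.
                Console.WriteLine("Conversion could not start: " + ex.Message);
            }
        }
    }
}
```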
override void Dispose () [virtual]
Reimplemented from SolidFramework.Converters.Converter.
global::System.IO.Stream GetResultAsStream () [inherited]
Gets the result as a stream.
static void RebaseHtml (string path, string oldFolder, string newFolder) [static]
Rebases the HTML.
path | Path to the HTML file. |
oldFolder | The old folder. |
newFolder | The new folder. |
bool AutoRotate [getset, inherited]
Gets or sets whether to automatically rotate pages based on the orientation of the text on the pages. Default is true.
If a page in the PDF file is set to landscape, and all text is flowing up and down, automatically rotate the page to portrait.
bool DetectLists [getset]
Gets or sets whether to detect lists.
DetectLists is ignored with HtmlSplitting options.
bool GraphicsAsImages [getset, inherited]
Gets or sets whether vector images should be converted to bitmap images. Default is false.
If true then vector images will be converted to bitmap images; otherwise they will be retained as vector images.
SolidFramework.Converters.HtmlNavigation HtmlNavigation [getset]
Gets or sets the HTML navigation.
SolidFramework.Converters.HtmlSplittingUsing HtmlSplittingUsing [getset]
Gets or sets the HTML splitting option.
SolidFramework.Converters.Plumbing.HtmlImages Images [getset]
Gets or sets how images are handled during conversion.
The default is SolidFramework.Converters.Plumbing.HtmlImages.Default, which includes images in the output HTML file as linked images.
SolidFramework.Converters.Plumbing.ImageDocumentType ImageType [getset]
Gets or sets the image format used for images converted from the document, e.g. BMP for bitmap.
The image file type that will be used in images converted from the document. Default is SolidFramework.Converters.Plumbing.ImageDocumentType.Default, which will choose the most appropriate format for each image.
SolidFramework.Converters.Plumbing.ImageWatermark ImageWatermark [getset, inherited]
Gets or sets an image as a watermark in the converted file.
A watermark is an impression that displays on top of or below the content of a PDF page. Image watermarks can be useful for placing a logo on each page. To set an ImageWatermark, assign the path to the image: ImageWatermark.Path = "C:\\MyDocuments\\MyWatermark.jpg";
Alternatively, watermarks can be text: TextWatermark watermark = new TextWatermark(); watermark.Text = "DRAFT";
bool KeepBackgroundColorText [getset, inherited]
Gets or sets whether text that is invisible because it is the same colour as the background should be recovered. Default is false.
If true then text that is the same colour as the background will be recovered; otherwise it will be discarded.
bool KeepInvisibleText [getset, inherited]
Gets or sets whether text that is invisible because it has no stroke or fill (PDF rendering mode 3 is typically used for a searchable layer in scanned pages) should be recovered. Default is false.
If true then invisible text that has no stroke or fill will be recovered; otherwise it will be discarded.
bool KeepLineBreaks [getset]
Gets or sets whether line breaks should be preserved in the content. The default is false.
string OutputDirectory [getset, inherited]
Gets or sets the directory where converted files will be placed when created using Convert.
SolidFramework.Plumbing.OverwriteMode OverwriteMode [getset, inherited]
Gets or sets the action to take when the output file already exists. Default is FailIfExists.
SolidFramework.PageRange PageRange [getset, inherited]
Gets or sets the page range.
string Password [getset, inherited]
Gets or sets the password.
SolidFramework.Converters.Plumbing.SelectedAreas SelectedAreas [getset, inherited]
Gets or sets the selected areas. The default is null.
string SourceDirectory [getset, inherited]
Gets or sets the source directory.
SolidFramework.Converters.Plumbing.TextRecoveryAutomaticGNse TextRecoveryAutomaticGNse [getset, inherited]
Gets or sets the automatic GNSE options.
SolidFramework.Converters.Plumbing.TextRecoveryEngine TextRecoveryEngine [getset, inherited]
Gets or sets the text recovery engine. The default value is automatic.
SolidFramework.Converters.Plumbing.TextRecoveryEngineNse TextRecoveryEngineNse [getset, inherited]
Gets or sets the non-standard encoding recovery engine. The default value is automatic.
string TextRecoveryLanguage [getset, inherited]
Gets or sets the text recovery language. Default is an empty string.
Format for each language is as follows: "es" for Spanish, "de" for German, "it" for Italian, "pl" for Polish, "pt" for Portuguese, "ru" for Russian, "fr" for French, "nl" for Dutch, "en" for English.
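As an illustration of the language codes listed above, the sketch below sets the text recovery language to German before converting a scanned document. The paths are hypothetical, and licensing is assumed to be configured elsewhere.

```csharp
using SolidFramework.Converters;

class TextRecoveryLanguageExample
{
    static void Main()
    {
        using (var converter = new PdfToHtmlConverter())
        {
            // "de" is the documented code for German; the default is an empty string.
            converter.TextRecoveryLanguage = "de";

            converter.AddSourceFile(@"C:\Input\scan-de.pdf"); // hypothetical path
            converter.ConvertTo(@"C:\Output\scan-de.html", true);
        }
    }
}
```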
SolidFramework.Converters.Plumbing.TextRecoveryNSE TextRecoveryNseType [getset, inherited]
Gets or sets the text recovery NSE type. The default value is automatic.
SolidFramework.Converters.Plumbing.TextRecovery TextRecoveryType [getset, inherited]
Gets or sets the text recovery type. The default value is automatic.
string UserProperties [getset, inherited]
Gets or sets the user properties. The default is an empty string.
int WidthLimit [getset]
Gets or sets the maximum width for HTML file images.
Default is 0, which indicates not to resize the image width.
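The HTML-specific properties can be combined on a single converter instance. The sketch below is illustrative only: the width of 800 and the file paths are arbitrary example values, and licensing is assumed to be configured elsewhere.

```csharp
using SolidFramework.Converters;
using SolidFramework.Converters.Plumbing;

class HtmlOptionsExample
{
    static void Main()
    {
        using (var converter = new PdfToHtmlConverter())
        {
            // Documented default image handling, set explicitly for clarity.
            converter.Images = HtmlImages.Default;

            // Limit image width; 0 (the default) means no resizing. 800 is an arbitrary example.
            converter.WidthLimit = 800;

            // Preserve line breaks (default is false) and enable list detection.
            converter.KeepLineBreaks = true;
            converter.DetectLists = true;

            converter.AddSourceFile(@"C:\Input\report.pdf"); // hypothetical path
            converter.ConvertTo(@"C:\Output\report.html", true);
        }
    }
}
```

Note that, as stated above, DetectLists is ignored when HTML splitting options are in use.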
global::System.EventHandler< SolidFramework.ProgressEventArgs > Progress [inherited]
Triggered when progress occurs during PDF conversion.
ProgressEventArgs contains the amount of progress, the maximum amount of progress, a status code, a description, plus optional custom data.
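A handler can be attached to the Progress and Warning events before conversion starts. The sketch below only counts events, because the property names of ProgressEventArgs and WarningEventArgs are not listed on this page. The CustomData value and the paths are hypothetical.

```csharp
using System;
using SolidFramework.Converters;

class ProgressExample
{
    static void Main()
    {
        int progressEvents = 0;
        int warnings = 0;

        using (var converter = new PdfToHtmlConverter())
        {
            // Optional custom data made available during Progress events (hypothetical value).
            converter.CustomData = "job-1";

            // Count notifications; event argument members are deliberately not accessed here.
            converter.Progress += (sender, args) => progressEvents++;
            converter.Warning += (sender, args) => warnings++;

            converter.AddSourceFile(@"C:\Input\report.pdf"); // hypothetical path
            converter.ConvertTo(@"C:\Output\report.html", true);
        }

        Console.WriteLine("Progress events: " + progressEvents + ", warnings: " + warnings);
    }
}
```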