The Solid Framework Core Model is our in-memory model of the reconstructed content of your PDF file. We reconstruct clean Unicode text, in reading order, and provide the bounds of the major text blocks as per their position in the original PDF layout.
Typically, we then export from this model to another format such as Word .docx, but this sample shows you how to iterate over the model programmatically, in memory.
The sample outputs two files:
- a text file that includes the bounding box coordinates and text for each of the text blocks.
- an HTML file that visually illustrates the different bounding rectangles for all the text blocks/paragraphs in the PDF.
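As a rough illustration of what these two outputs contain, the sketch below formats a list of text blocks both ways. It is not taken from the sample and does not use the Solid Framework API: `TextBlock`, `formatBlockAsText`, `renderBlocksAsHtml`, and `escapeHtml` are hypothetical names, and the use of points as the unit and of red outlined `<div>` elements is an assumption made for the example. The real sample reads the bounds and text from the Core Model.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for one reconstructed text block.
// NOT a Solid Framework type; the field names and units are assumptions.
struct TextBlock {
    double left, top, width, height;  // bounding box, assumed to be in points
    std::string text;                 // UTF-8 text in reading order
};

// Escape the characters that a browser would otherwise treat as markup.
std::string escapeHtml(const std::string& in) {
    std::string out;
    for (char c : in) {
        switch (c) {
            case '&': out += "&amp;"; break;
            case '<': out += "&lt;"; break;
            case '>': out += "&gt;"; break;
            case '"': out += "&quot;"; break;
            default:  out += c;
        }
    }
    return out;
}

// One line of the text report: the bounding box, then the block's text.
std::string formatBlockAsText(const TextBlock& b) {
    assert(b.width >= 0 && b.height >= 0);  // a box cannot have negative size
    std::ostringstream os;
    os << "[" << b.left << ", " << b.top << ", "
       << b.width << ", " << b.height << "] " << b.text;
    return os.str();
}

// A minimal HTML page with one absolutely positioned, outlined <div> per block.
std::string renderBlocksAsHtml(const std::vector<TextBlock>& blocks) {
    std::ostringstream os;
    os << "<html><body style=\"position:relative\">\n";
    for (const TextBlock& b : blocks) {
        os << "<div style=\"position:absolute;border:1px solid red;"
           << "left:" << b.left << "pt;top:" << b.top << "pt;"
           << "width:" << b.width << "pt;height:" << b.height << "pt\">"
           << escapeHtml(b.text) << "</div>\n";
    }
    os << "</body></html>\n";
    return os.str();
}
```

Writing the two returned strings to the .txt and .html paths with `std::ofstream` would give files of the same general shape as the ones described above, although the sample's exact layout may differ.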
The source code for this sample is provided as a zip file (the project file below). To build it you also need the Solid Framework C++ for Windows SDK zip file, which is available from the self-service developer portal.
Steps to Extracting Text from Model – C++ Sample Application
- Download the TextBounding.zip (6KB)
- Create a directory like C:\SampleCode and download and extract the sample project into your new directory.
- Create a free portal account and generate a Developer SDK license.
- Download the Solid Framework Native SDK and extract it into your new directory. Note: It must be in the same directory as TextBounding.zip.
- Extract the .zip file above and the x86 or x64 Solid Framework C++ for Windows zip file from the developer portal into your new directory, without editing the path names.
- Open TextBounding.sln in Visual Studio.
- Right-click on the project and select Rebuild.
- Open a cmd window and navigate to your project’s Debug folder, which contains the project's .exe file. For example: cd C:\SampleCode\Debug
- Type in the name of your .exe file followed by the paths to the following files, in this order:
  - Your license.xml file
  - The PDF file to parse
  - The path where you want your .html file to be saved
  - The path where you want your .txt file to be saved
  For example: TextBounding.exe C:\SampleCode\license.xml C:\SampleCode\YourPDF.pdf C:\SampleCode\layout.html C:\SampleCode\layout.txt
- Press Enter. You can then view the .html file which visually illustrates the reconstructed text blocks/paragraphs and their bounds.
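The command line above takes four paths in a fixed order. A minimal sketch of how a program could validate and unpack them is shown below. This is an illustration only, not the sample's actual source: `SampleArgs`, `parseSampleArgs`, and `usage` are hypothetical names introduced for this example.

```cpp
#include <cassert>
#include <string>

// Hypothetical holder for the four command-line paths (illustrative names).
struct SampleArgs {
    std::string licensePath;  // license.xml
    std::string pdfPath;      // PDF file to parse
    std::string htmlPath;     // where the .html output is written
    std::string txtPath;      // where the .txt output is written
};

// Returns true and fills `out` only when exactly four paths follow the
// program name, in the order license, PDF, HTML, text.
bool parseSampleArgs(int argc, const char* const argv[], SampleArgs& out) {
    assert(argv != nullptr);
    if (argc != 5) {
        return false;  // argv[0] is the program name, so four paths means argc == 5
    }
    out.licensePath = argv[1];
    out.pdfPath     = argv[2];
    out.htmlPath    = argv[3];
    out.txtPath     = argv[4];
    return true;
}

// Message a program could print when the argument count is wrong.
std::string usage(const std::string& exeName) {
    return "Usage: " + exeName +
           " <license.xml> <input.pdf> <output.html> <output.txt>";
}
```

A `main(int argc, char* argv[])` function could pass its arguments straight to `parseSampleArgs` and print `usage(argv[0])` when it returns false, which makes a missing or misordered path easier to spot than a failure later in the run.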