Introduction
The Portable Document Format (PDF) is a popular file format for documents. Due to its ubiquity and layout capabilities, it's not uncommon for websites to use PDF technology. For example, an eCommerce store may offer a "printable receipt" option that, when selected, displays a PDF file within the browser. Last week's article, Filling in PDF Forms with ASP.NET and iTextSharp, looked at how to work with a special kind of PDF document, namely one that has one or more fields defined. A PDF document can contain various types of user interface elements, which are referred to as fields. For instance, there is a text field, a checkbox field, a combobox field, and more. Typically, the person viewing the PDF on her computer interacts with the document's fields; however, it is possible to enumerate and fill a PDF's fields programmatically, as we saw in last week's article.
This article continues our investigation into iTextSharp, a .NET open source library for PDF generation, showing how to use iTextSharp to create PDF documents from scratch. We start with an example of how to programmatically define and piece together paragraphs, tables, and images into a single PDF file. Following that, we explore how to use iTextSharp's built-in capabilities to convert HTML into PDF. Read on to learn more!
Getting Started with iTextSharp
There are a variety of .NET libraries available to programmatically create PDF documents. Perhaps the most popular is iTextSharp, which is the .NET version of the Java-based iText PDF library.
Part of iTextSharp's popularity stems from the fact that it's open source. However, it's important to keep in mind that starting with version 5.0, iTextSharp is released under the GNU Affero General Public License (AGPL) version 3. This license requires that any application that uses iTextSharp must also be released under the same license and that you must make your application's source code freely available (like iTextSharp's is). You can optionally buy a license to be released from the AGPL. While version 5.0 (and beyond) is released under the more restrictive AGPL, previous versions were released under the GNU Lesser General Public License (LGPL), which allows the use of iTextSharp within an application without requiring that the application also be released under the LGPL. In other words, by using version 4 or earlier you can use iTextSharp in your web application without having to buy a license and without having to release your web application's source code. (The download available at the end of the article uses iTextSharp version 4.1.6.)
You can download iTextSharp from its project page at: http://sourceforge.net/projects/itextsharp/. (Alternatively, you can download the code at the end of this article, which includes the iTextSharp version 4.1.6 assembly in the Bin folder, named itextsharp.dll.) For assistance with iTextSharp, I suggest the iText-question listserv.
Creating a PDF Document from the Ground Up
Creating a PDF document from the ground up using iTextSharp involves the following steps:
- Create a Document object, which models the PDF document you are creating.
- Create a PdfWriter object, which is the bridge between the Document object and a backing store. In other words, the PdfWriter object is responsible for serializing the PDF document you create to some store, such as in memory or to disk.
- Add various elements to the Document object - paragraphs, tables, images, and so on.
Steps 1 and 2: Creating the Document and PdfWriter Objects
Before we get bogged down in the details of Step 3, let's first take a moment to examine the code necessary to accomplish Steps 1 and 2:
// Create a Document object
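A minimal sketch of Steps 1 and 2 might look like the following. It assumes the iTextSharp 4.1.6 assembly is referenced; the FirstPdf class, the outputPath and addElements parameters, the page size, and the margin values are all assumptions made for illustration rather than the demo's actual code.

```csharp
using System;
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;

public static class FirstPdf
{
    // outputPath is a hypothetical parameter; in a web page it could come from Server.MapPath.
    // addElements is a hypothetical callback that performs Step 3.
    public static void Create(string outputPath, Action<Document> addElements)
    {
        // Step 1: page size, then left, right, top, and bottom margins (in points; values assumed)
        var document = new Document(PageSize.LETTER, 50, 50, 25, 25);

        // Step 2: connect the Document to a backing store, here a file on disk
        var output = new FileStream(outputPath, FileMode.Create);
        PdfWriter.GetInstance(document, output);

        // Open the Document so that elements can be added
        document.Open();

        // Step 3: add paragraphs, tables, images, and so on
        addElements(document);

        // Closing the Document flushes its contents to the output stream
        document.Close();
    }
}
```

Step 3 is passed in as a callback because, as far as I know, iTextSharp raises an error when a document with no content is closed, so the skeleton cannot be run on its own.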
The first line of code creates a Document object specifying the document's dimensions and left, right, top, and bottom margins, respectively.
Next, we create a PdfWriter object. In doing so we need to specify two bits of information - the Document object being created and a Stream where the Document object's output should be serialized when it is closed. In the code above we are using a FileStream, which will cause the PDF document's contents to be serialized to a file on disk named MyFirstPDF.pdf.
Following that the document object is opened. At this point we're ready for Step 3 - adding the assorted elements to the document. Once all of the elements have been added we close the document, which prompts the PdfWriter object to "save" the Document object to the specified Stream - in this case, to the file MyFirstPDF.pdf.
Step 3: Adding Elements to the Document
When creating a PDF document you can add a number of different element types, including: annotations, chunks, tables, lists, images, and paragraphs. There are classes in the iTextSharp library that model these various element types. To add an element type to the document you (typically) create an instance of the appropriate element type, set some properties, and then add it to the Document object via the Add method. For example, the following code snippet adds a new Paragraph object to the document with the text, "Hello, World!"
// Create a new Paragraph object with the text, "Hello, World!"
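Put together with Steps 1 and 2, a complete "Hello, World!" sketch could look like this. The HelloWorldPdf class, the outputPath parameter, and the page size and margins are assumptions; only the Paragraph text comes from the article.

```csharp
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;

public static class HelloWorldPdf
{
    // outputPath is hypothetical, e.g. Server.MapPath("MyFirstPDF.pdf") in a web page
    public static void Create(string outputPath)
    {
        var document = new Document(PageSize.LETTER, 50, 50, 25, 25);

        using (var output = new FileStream(outputPath, FileMode.Create))
        {
            PdfWriter.GetInstance(document, output);
            document.Open();

            // Create a new Paragraph object with the text, "Hello, World!" and add it
            document.Add(new Paragraph("Hello, World!"));

            // Closing the Document writes the PDF to the file
            document.Close();
        }
    }
}
```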
In fact, if we run the above code (namely, the code snippet from Steps 1 and 2: Creating the Document and PdfWriter Objects combined with the code snippet above) we get a PDF named MyFirstPDF.pdf that contains the text, "Hello, World!", as the screen shot below shows.
For a good primer on adding common elements to a PDF document I recommend Mike Brind's excellent series of articles on iTextSharp: Create PDFs in ASP.NET. There are individual articles on fonts, adding text, working with tables, and adding images, among others.
Putting It All Together: Dynamically Creating a Receipt PDF
The demo available for download at the end of this article includes a web page named CreatePDFFromScratch.aspx that builds up a PDF receipt. The page contains user interface elements where the user can enter the Order number, price, and what items were ordered, and these selections are used to dynamically create the PDF receipt. Of course, in a real-world application this information would be pulled from a database and not hand-entered by a user.
The screen shot below shows the CreatePDFFromScratch.aspx user interface. Here, we are creating a receipt for Order 1234, which cost $55.95 and contained four widgets, one whatchacallit, and seven thingamabops. Clicking the "Create Receipt" button causes a postback and on postback a PDF is generated.
The code that runs when the "Create Receipt" button is clicked is a bit long to post in its entirety, so instead let me post just the germane portions, starting with Steps 1 and 2:
// Create a Document object
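As a rough sketch, Steps 1 and 2 for the receipt might be written as below. The InMemoryPdf class, the addElements callback, and the page size and margins are assumptions for illustration; the one detail taken from the article is that the PdfWriter is handed a MemoryStream.

```csharp
using System;
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;

public static class InMemoryPdf
{
    // addElements is a hypothetical callback that adds the receipt's content (Step 3)
    public static byte[] Build(Action<Document> addElements)
    {
        // Create a Document object (page size and margins assumed)
        var document = new Document(PageSize.LETTER, 50, 50, 25, 25);

        // Serialize to memory rather than to the web server's file system
        var output = new MemoryStream();
        PdfWriter.GetInstance(document, output);

        document.Open();
        addElements(document);
        document.Close();

        // MemoryStream.ToArray still works after the stream has been closed
        return output.ToArray();
    }
}
```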
The above code snippet is quite similar to the code snippet examined back in Steps 1 and 2: Creating the Document and PdfWriter Objects with one important difference - in the earlier example the created PDF was serialized to a file. Here, we are serializing the PDF to a MemoryStream. The reason is that rather than saving the PDF to the web server's file system, we simply want to send the PDF back to the browser, where the user can open or save it. We'll see how this is done in a moment.
Before adding any elements to the document a number of Font objects are created, which specify the font family, font size, and style for the receipt title, its subtitles, and so on.
...
Next, the receipt title is added. Note that when creating a new Paragraph object we can optionally specify its font. In this case, we use the titleFont, which will display the receipt title in an 18pt Arial bold font.
...
The order details are defined using a table, which is accomplished by using the PdfPTable class. Here we specify that the table has two columns. Next we specify various table properties - its HorizontalAlignment, how much spacing should appear before and after the table, and any default cell settings (in this case, we indicate that cells, by default, should have no border).
Next, the cells are added to the table, one at a time, from top left to bottom right. The code below creates a 2x2 table that displays the order ID and total price. After the table is constructed it is added to the document object via the Add method.
...
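A sketch of the order details table, under some assumptions, is shown below. The OrderDetailsTable class, the label text, the left alignment, and the spacing values are illustrative guesses; the two-column layout, the borderless default cell, and the cell-by-cell construction follow the description above.

```csharp
using iTextSharp.text;
using iTextSharp.text.pdf;

public static class OrderDetailsTable
{
    public static PdfPTable Build(string orderId, decimal totalPrice)
    {
        // A table with two columns
        var table = new PdfPTable(2);

        // Table-level settings (the specific values here are assumptions)
        table.HorizontalAlignment = Element.ALIGN_LEFT;
        table.SpacingBefore = 10;
        table.SpacingAfter = 10;

        // Cells added via the simple AddCell overloads get no border by default
        table.DefaultCell.Border = Rectangle.NO_BORDER;

        // Cells fill the table left to right, top to bottom: two rows of two cells
        table.AddCell("Order:");
        table.AddCell(orderId);
        table.AddCell("Price:");
        table.AddCell(totalPrice.ToString("c")); // currency format depends on the current culture

        return table;
    }
}
```

The returned table would then be added with document.Add(OrderDetailsTable.Build(orderId, totalPrice)).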
There's another table in the receipt that shows the items ordered, but I'll skip that code since it is nearly identical to the order details table code. There's also an ending message at the bottom of the receipt - "Thank you for your business..." - which is added via a Paragraph object.
The receipt also contains an image. This image - 4guysfromrolla.gif - is located on the web server's file system in the ~/Images folder. It gets added to the PDF receipt by creating a new Image object. If you add the Image object to the document like the other text elements it will appear in the document based on the order it was added. However, you can specify an absolute position for the image, which I do here, to locate it in the upper right corner of the receipt.
...
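One way to position the logo is sketched below. The ReceiptLogo class, the UpperRight helper, and the 36-point margin are assumptions; the demo may simply hard-code the coordinates. Note that PDF coordinates start at the lower left corner of the page, so "upper right" means large x and y values.

```csharp
using iTextSharp.text;

public static class ReceiptLogo
{
    // Computes the lower-left corner of an image so that it sits in the page's
    // upper right corner, inset by the given margin (hypothetical helper)
    public static float[] UpperRight(float pageWidth, float pageHeight,
                                     float imageWidth, float imageHeight, float margin)
    {
        return new[] { pageWidth - imageWidth - margin, pageHeight - imageHeight - margin };
    }

    // imagePath is hypothetical, e.g. Server.MapPath("~/Images/4guysfromrolla.gif")
    public static void Add(Document document, string imagePath)
    {
        // Fully qualified to avoid clashing with System.Web.UI.WebControls.Image in a page
        var logo = iTextSharp.text.Image.GetInstance(imagePath);

        float[] position = UpperRight(document.PageSize.Width, document.PageSize.Height,
                                      logo.ScaledWidth, logo.ScaledHeight, 36f);

        // With an absolute position the image no longer flows with the other elements
        logo.SetAbsolutePosition(position[0], position[1]);
        document.Add(logo);
    }
}
```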
After all of the document content has been added, the Document object is closed and the PDF is streamed back to the visitor's browser. This is accomplished by the following lines of code:
...
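The streaming code might look something like this sketch, which assumes an ASP.NET (System.Web) application. The PdfResponse class and its method names are hypothetical; output stands for the MemoryStream that was passed to the PdfWriter, and the Document is assumed to have been closed already.

```csharp
using System.IO;
using System.Web;

public static class PdfResponse
{
    // Builds the Content-Disposition value, suggesting Receipt-<OrderID>.pdf as the file name
    public static string ContentDispositionFor(string orderId)
    {
        return string.Format("attachment;filename=Receipt-{0}.pdf", orderId);
    }

    public static void Send(HttpResponse response, MemoryStream output, string orderId)
    {
        // Tell the browser it is receiving a PDF document
        response.ContentType = "application/pdf";

        // Ask the browser to treat the content as an attachment with a suggested name
        response.AddHeader("Content-Disposition", ContentDispositionFor(orderId));

        // Send the binary contents of the generated PDF down to the client
        response.BinaryWrite(output.ToArray());
    }
}
```

Since the order ID is user-supplied in the demo, a real application would want to validate it before placing it in a response header.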
The Response.ContentType property tells the browser the type of content it is being sent from the server. application/pdf is the standard MIME type for PDF documents; this notifies the browser that it is receiving a PDF document. The next line of code adds the Content-Disposition HTTP header to the response. This tells the browser to treat the content like an attachment, meaning the user will be prompted whether to open or save the PDF (rather than having it open directly in the browser window). Note that we can tell the browser the name of the file being sent, which the browser will use as the suggested name should the user opt to save the PDF to their hard drive. Here we use the filename Receipt-OrderID.pdf.
The Response.BinaryWrite statement sends the contents of a specified byte array to the browser. Recall that output is the MemoryStream object we created when instantiating the PdfWriter object. output.ToArray() returns the contents of the MemoryStream - namely, the binary contents of the generated PDF document - as a byte array, which is then sent down to the client.
The screen shot below shows the receipt PDF generated when using the inputs shown in the previous screen shot (namely, an Order ID of 1243, a total price of $55.95, and so on).
Creating a PDF Document from a String of HTML
iTextSharp includes a simple HTML parser class that can be used to translate HTML into a PDF document. Using this class you can, with just a few lines of code, turn an HTML document into a PDF file. For example, rather than building the receipt programmatically, adding each element one at a time as we did in the previous demo, we could instead opt to generate the receipt using an HTML template.
The demo available for download includes an HTML template file named Receipt.htm, which is located in the ~/HTMLTemplate folder. This HTML file contains the following markup (note - some markup has been removed for brevity):
<h1 style="font-weight: bold">Northwind Traders Receipt</h1>
This markup defines a receipt layout not unlike the receipt created programmatically in the previous section. There's the "Northwind Traders Receipt" title at the top, here implemented as an <h1> element. There's a table for the order details, a "Thank you for your business..." message at the bottom, and so on.
Note that the above markup contains four placeholders - text surrounded by brackets. The idea here is that before we ask iTextSharp to turn the above markup into a PDF we will first replace those placeholders with the Order ID, total price, and other metrics for the order we are generating a receipt for.
Turning HTML into a PDF involves the following steps:
- Create a Document object.
- Create a PdfWriter object.
- Read in the HTML as a string.
- Call iTextSharp's HTMLWorker.ParseToList method, passing in the HTML to convert into PDF. This returns a collection of elements.
- Add each element returned in Step 4 to the Document object.
Step 3 begins by reading in the contents of the Receipt.htm template. Following that, we need to replace the placeholders with the appropriate values. In the demo available for download you'll find a page named ConvertHTMLtoPDF.aspx, which has the same user interface as CreatePDFFromScratch.aspx. In short, the page prompts the user to enter an Order ID, total price, and select what items were part of the order. These user-supplied values are what are used to populate the placeholders in Receipt.htm.
These two sub-steps - reading the contents of Receipt.htm into a string and then replacing the placeholders - are accomplished by the code snippet below:
// Read in the contents of the Receipt.htm file...
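The two sub-steps could be sketched as follows. The ReceiptTemplate class and the [ORDERID] and [TOTALPRICE] placeholder names are hypothetical, chosen only for illustration; the actual placeholder names are defined in the demo's Receipt.htm file.

```csharp
using System.IO;

public static class ReceiptTemplate
{
    // Reads the template into a string; templatePath is hypothetical,
    // e.g. Server.MapPath("~/HTMLTemplate/Receipt.htm")
    public static string Load(string templatePath)
    {
        return File.ReadAllText(templatePath);
    }

    // Replaces the bracketed placeholders with the user-supplied values.
    // The placeholder names used here are assumptions.
    public static string Fill(string template, string orderId, string totalPrice)
    {
        return template
            .Replace("[ORDERID]", orderId)
            .Replace("[TOTALPRICE]", totalPrice);
    }
}
```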
The above code snippet does not include the code that sets the [ITEMS] placeholder, which is where the order details are displayed. The code is a little lengthy, but it's not terribly complex. The code simply builds the markup for a <table> by looping through the CheckBoxList and adding a table row (<tr>) for each selected purchased item.
Once the HTML string has been composed we are ready for Steps 4 and 5. Step 4 - calling iTextSharp's HTMLWorker.ParseToList method - parses the HTML string and returns a collection of elements. Step 5 enumerates this collection of elements, adding them to the Document object.
...
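Steps 4 and 5 might be sketched like this. The HtmlToPdf class and its AddHtml method are hypothetical names; the code assumes the Document has already been opened and passes null for the optional StyleSheet argument.

```csharp
using System.IO;
using iTextSharp.text;
using iTextSharp.text.html.simpleparser;

public static class HtmlToPdf
{
    // Assumes document.Open() has already been called
    public static void AddHtml(Document document, string html)
    {
        // Step 4: parse the HTML string into a collection of iTextSharp elements
        var parsedElements = HTMLWorker.ParseToList(new StringReader(html), null);

        // Step 5: add each parsed element to the Document
        foreach (IElement element in parsedElements)
        {
            document.Add(element);
        }
    }
}
```

Typing the loop variable as IElement keeps the sketch working whether ParseToList returns a non-generic list, as I believe it does in version 4.1.6, or a generic one.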
That's all there is to it! Keep in mind that the HTML parser is simply converting HTML into elements that can be added to the PDF document. In addition to adding these parsed elements you can also add elements you create, just like we did in our earlier demo (CreatePDFFromScratch.aspx). For instance, we can add the logo to the upper right corner of the receipt using the same code as before:
var logo = iTextSharp.text.Image.GetInstance(Server.MapPath("~/Images/4guysfromrolla.gif"));
The generated PDF, shown below, is quite similar to the receipt created from the ground up.
If you decide to use iTextSharp's HTML to PDF capabilities, keep in mind that they are pretty rudimentary. iTextSharp will not correctly parse a complex HTML document with many layers and overlays and its stylesheet support is limited (although I hear this has been improved upon in iTextSharp 5.0). For maximum control you will want to either create PDFs from the ground up using the techniques discussed at the start of this article or you will want to create your PDFs using Adobe Acrobat with form fields to fill in the dynamic bits, as was discussed in Filling in PDF Forms with ASP.NET and iTextSharp.