Introduction


The Portable Document Format (PDF) is a popular file format for documents. Because of its ubiquity and layout capabilities, it's not uncommon for websites to use PDF technology. For example, an eCommerce store may offer a "printable receipt" option that, when selected, displays a PDF file within the browser. Last week's article, Filling in PDF Forms with ASP.NET and iTextSharp, looked at how to work with a special kind of PDF document, namely one that has one or more fields defined. A PDF document can contain various types of user interface elements, which are referred to as fields. For instance, there is a text field, a checkbox field, a combobox field, and more. Typically, the person viewing the PDF on her computer interacts with the document's fields; however, it is possible to enumerate and fill a PDF's fields programmatically, as we saw in last week's article.
This article continues our investigation into iTextSharp, a .NET open source library for PDF generation, showing how to use iTextSharp to create PDF documents from scratch. We start with an example of how to programmatically define and piece together paragraphs, tables, and images into a single PDF file. Following that, we explore how to use iTextSharp's built-in capabilities to convert HTML into PDF. Read on to learn more!



Getting Started with iTextSharp


There are a variety of .NET libraries available to programmatically create PDF documents. Perhaps the most popular is iTextSharp, which is the .NET version of the Java-based iText PDF library.
Part of iTextSharp's popularity stems from the fact that it's open source. However, it's important to keep in mind that starting with version 5.0, iTextSharp is released under the GNU Affero General Public License (AGPL) version 3. This license requires that any application that uses iTextSharp must also be released under the same license and that you must make your application's source code freely available (like iTextSharp's is). You can optionally buy a license to be released from the AGPL. While version 5.0 (and beyond) is released under the more restrictive AGPL, previous versions were released under the GNU Lesser General Public License (LGPL), which allows the use of iTextSharp within an application without requiring that the application also be released under the LGPL. In other words, by using version 4 or earlier you can use iTextSharp in your web application without having to buy a license and without having to release your web application's source code. (The download available at the end of the article uses iTextSharp version 4.1.6.)
You can download iTextSharp from its project page at: http://sourceforge.net/projects/itextsharp/. (Alternatively, you can download the code at the end of this article, which includes the iTextSharp version 4.1.6 assembly in the Bin folder, named itextsharp.dll.) For assistance with iTextSharp, I suggest the iText-question listserv.

Creating a PDF Document from the Ground Up


Creating a PDF document from the ground up using iTextSharp involves the following steps:
  1. Create a Document object, which models the PDF document you are creating.
  2. Create a PdfWriter object, which is the bridge between the Document object and a backing store. In other words, the PdfWriter object is responsible for serializing the PDF document you create to some store, such as in memory or to disk.
  3. Add various elements to the Document object - paragraphs, tables, images, and so on.
And that's it! Steps one and two are easy enough to implement and take just a couple of lines of code in total. Step 3, however, is where the bulk of the work is done. Here is where we go about creating the PDF document's elements and adding them, one at a time, to the document. The code for Step 3 can be long - the bigger and more complex a PDF document you are trying to create, the more intricate this code will be.

Steps 1 and 2: Creating the Document and PdfWriter Objects


Before we get bogged down in the details of Step 3, let's first take a moment to examine the code necessary to accomplish Steps 1 and 2:
// Create a Document object
var document = new Document(PageSize.A4, 50, 50, 25, 25);

// Create a new PdfWriter object, specifying the output stream
var output = new FileStream(Server.MapPath("MyFirstPDF.pdf"), FileMode.Create);
var writer = PdfWriter.GetInstance(document, output);

// Open the Document for writing
document.Open();

// ... Step 3: Add elements to the document! ...

// Close the Document - this saves the document contents to the output stream
document.Close();

The first line of code creates a Document object specifying the document's dimensions and left, right, top, and bottom margins, respectively.
Next, we create a PdfWriter object. In doing so we need to specify two bits of information - the Document object being created and a Stream where the Document object's output should be serialized when it is closed. In the code above we are using a FileStream, which will cause the PDF document's contents to be serialized to a file on disk named MyFirstPDF.pdf.
Following that the document object is opened. At this point we're ready for Step 3 - adding the assorted elements to the document. Once all of the elements have been added we close the document, which prompts the PdfWriter object to "save" the Document object to the specified Stream - in this case, to the file MyFirstPDF.pdf.

Step 3: Adding Elements to the Document


When creating a PDF document you can add a number of different element types, including annotations, chunks, tables, lists, images, and paragraphs. There are classes in the iTextSharp library that model these various element types. To add an element to the document you (typically) create an instance of the appropriate element type, set some properties, and then add it to the Document object via the Add method. For example, the following code snippet adds a new Paragraph object to the document with the text "Hello, World!":
// Create a new Paragraph object with the text, "Hello, World!"
var welcomeParagraph = new Paragraph("Hello, World!");

// Add the Paragraph object to the document
document.Add(welcomeParagraph);

In fact, if we run the code from "Steps 1 and 2: Creating the Document and PdfWriter Objects" together with the snippet above, we get a PDF named MyFirstPDF.pdf that contains the text "Hello, World!", as the screen shot below shows.
The PDF document contains a single paragraph.
For a good primer on adding common elements to a PDF document I recommend Mike Brind's excellent series of articles on iTextSharp: Create PDFs in ASP.NET. There are individual articles on fonts, adding text, working with tables, and adding images, among others.

Putting It All Together: Dynamically Creating a Receipt PDF


The demo available for download at the end of this article includes a web page named CreatePDFFromScratch.aspx that builds up a PDF receipt. The page contains user interface elements where the user can enter the Order number, price, and what items were ordered, and these selections are used to dynamically create the PDF receipt. Of course, in a real-world application this information would be pulled from a database and not hand-entered by a user.
The screen shot below shows the CreatePDFFromScratch.aspx user interface. Here, we are creating a receipt for Order 1234, which cost $55.95 and contained four widgets, one whatchacallit, and seven thingamabops. Clicking the "Create Receipt" button causes a postback and on postback a PDF is generated.
The user specifies the Order ID, total price, and purchased items used to generate the receipt.
The code that runs when the "Create Receipt" button is clicked is a bit long to post in its entirety, so instead let me post just the germane portions, starting with Steps 1 and 2:
// Create a Document object
var document = new Document(PageSize.A4, 50, 50, 25, 25);

// Create a new PdfWriter object, specifying the output stream
var output = new MemoryStream();
var writer = PdfWriter.GetInstance(document, output);

// Open the Document for writing
document.Open();

...

The above code snippet is quite similar to the one examined back in "Steps 1 and 2: Creating the Document and PdfWriter Objects", with one important difference - in the earlier example the created PDF was serialized to a file. Here, we are serializing the PDF to a MemoryStream. Rather than saving the PDF to the web server's file system, we simply want to send the PDF back to the browser, where the user can open or save it. We'll see how this is done in a moment.
Before adding any elements to the document a number of Font objects are created, which specify the font family, font size, and style for the receipt title, its subtitles, and so on.
...

var titleFont = FontFactory.GetFont("Arial", 18, Font.BOLD);
var subTitleFont = FontFactory.GetFont("Arial", 14, Font.BOLD);
var boldTableFont = FontFactory.GetFont("Arial", 12, Font.BOLD);
var endingMessageFont = FontFactory.GetFont("Arial", 10, Font.ITALIC);
var bodyFont = FontFactory.GetFont("Arial", 12, Font.NORMAL);

...

Next, the receipt title is added. Note that when creating a new Paragraph object we can optionally specify its font. In this case, we use the titleFont, which will display the receipt title in an 18pt Arial bold font.
...

document.Add(new Paragraph("Northwind Traders Receipt", titleFont));

...

The order details are defined using a table, which is accomplished by using the PdfPTable class. Here we specify that the table has two columns. Next we specify various table properties - its HorizontalAlignment, how much spacing should appear before and after the table, the relative column widths, and any default cell settings (in this case, we indicate that cells, by default, should have no border).
Next, the cells are added to the table, one at a time, from top left to bottom right. The code below creates a 2x2 table that displays the order ID and total price. After the table is constructed it is added to the document object via the Add method.
...

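// Create a two-column table
var orderInfoTable = new PdfPTable(2);
orderInfoTable.HorizontalAlignment = Element.ALIGN_LEFT;              // same value as 0
orderInfoTable.SpacingBefore = 10;
orderInfoTable.SpacingAfter = 10;
orderInfoTable.DefaultCell.Border = iTextSharp.text.Rectangle.NO_BORDER; // same value as 0
orderInfoTable.SetWidths(new int[] { 1, 4 });                         // relative column widths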

orderInfoTable.AddCell(new Phrase("Order:", boldTableFont));
orderInfoTable.AddCell(txtOrderID.Text);
orderInfoTable.AddCell(new Phrase("Price:", boldTableFont));
orderInfoTable.AddCell(Convert.ToDecimal(txtTotalPrice.Text).ToString("c"));

document.Add(orderInfoTable);

...

There's another table in the receipt that shows the items ordered, but I'll skip that code since it is nearly identical to the order details table code. There's also an ending message at the bottom of the receipt - "Thank you for your business..." - which is added via a Paragraph object.
The receipt also contains an image. This image - 4guysfromrolla.gif - is located on the web server's file system in the ~/Images folder. It gets added to the PDF receipt by creating a new Image object. If you add the Image object to the document like the other text elements it will appear in the document based on the order it was added. However, you can specify an absolute position for the image, which I do here, to locate it in the upper right corner of the receipt.
...

var logo = iTextSharp.text.Image.GetInstance(Server.MapPath("~/Images/4guysfromrolla.gif"));
logo.SetAbsolutePosition(440, 800);
document.Add(logo);

...
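
The values 440 and 800 make more sense once you know that PDF coordinates are measured in points (1/72 of an inch) from the lower left corner of the page, that an A4 page is 595 x 842 points, and that SetAbsolutePosition specifies where the lower left corner of the image goes. The following is a minimal sketch of that arithmetic. The PagePosition class is a hypothetical helper (it is not part of iTextSharp or the demo), and the image size used in the example afterward is an assumption, since the actual dimensions of the logo are not given here.

```csharp
// Hypothetical helper, not part of iTextSharp: computes the lower left
// coordinates that place an image in the upper right corner of an A4 page.
public static class PagePosition
{
    // A4 page size in PDF points (1 point = 1/72 inch).
    public const float A4Width = 595f;
    public const float A4Height = 842f;

    // Returns { x, y } for the image's lower left corner, measured from
    // the lower left corner of the page.
    public static float[] TopRight(float imageWidth, float imageHeight,
                                   float rightMargin, float topMargin)
    {
        float x = A4Width - rightMargin - imageWidth;
        float y = A4Height - topMargin - imageHeight;
        return new[] { x, y };
    }
}
```

For an image assumed to be 105 x 17 points, with the 50 point right margin and 25 point top margin used when creating the Document, PagePosition.TopRight(105, 17, 50, 25) works out to x = 440 and y = 800.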

After all of the document content has been added, the Document object is closed and the PDF is streamed back to the visitor's browser. This is accomplished by the following lines of code:
...

document.Close();

Response.ContentType = "application/pdf";
Response.AddHeader("Content-Disposition", string.Format("attachment;filename=Receipt-{0}.pdf", txtOrderID.Text));
Response.BinaryWrite(output.ToArray());

The Response.ContentType property tells the browser the type of content it is being sent from the server. application/pdf is the standard MIME type for PDF documents; this notifies the browser that it is receiving a PDF document. The next line of code adds the Content-Disposition HTTP header to the response. This tells the browser to treat the content like an attachment, meaning the user will be prompted whether to open or save the PDF (rather than having it open directly in the browser window). Note that we can tell the browser the name of the file being sent, which the browser will use as the suggested name should the user opt to save the PDF to their hard drive. Here we use the filename Receipt-OrderID.pdf.
The Response.BinaryWrite statement sends the contents of a specified byte array to the browser. Recall that output is the MemoryStream object we created when instantiating the PdfWriter object. output.ToArray() returns the contents of the MemoryStream - namely, the binary contents of the generated PDF document - as a byte array, which is then sent down to the client.
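Because the suggested filename is built from text the user typed into txtOrderID, characters such as spaces, quotes, semicolons, or line breaks could produce a malformed Content-Disposition header. The following is a minimal sketch of one way to guard against that. ReceiptFileName is a hypothetical helper that is not part of the demo, and the choice to keep only ASCII letters, digits, hyphens, and underscores is an assumption you may want to loosen.

```csharp
using System.Text;

// Hypothetical helper, not part of the demo: builds a header-safe
// filename from a user-supplied order ID.
public static class ReceiptFileName
{
    public static string For(string orderId)
    {
        var safe = new StringBuilder();

        foreach (char c in orderId ?? string.Empty)
        {
            // Keep ASCII letters, digits, '-' and '_'; drop everything else
            bool isLetter = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
            bool isDigit = c >= '0' && c <= '9';
            if (isLetter || isDigit || c == '-' || c == '_')
                safe.Append(c);
        }

        // Fall back to a fixed name if nothing usable remains
        if (safe.Length == 0)
            safe.Append("unknown");

        return "Receipt-" + safe.ToString() + ".pdf";
    }
}
```

With this in place, the header could be added with Response.AddHeader("Content-Disposition", "attachment;filename=" + ReceiptFileName.For(txtOrderID.Text)).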
The screen shot below shows the receipt PDF generated when using the inputs shown in the previous screen shot (namely, an Order ID of 1234, a total price of $55.95, and so on).
The PDF receipt is complete!

Creating a PDF Document from a String of HTML


iTextSharp includes a simple HTML parser class that can be used to translate HTML into a PDF document. Using this class you can, with just a few lines of code, turn an HTML document into a PDF file. For example, rather than building the receipt programmatically, adding each element one at a time as we did in the previous demo, we could instead opt to generate the receipt using an HTML template.
The demo available for download includes an HTML template file named Receipt.htm, which is located in the ~/HTMLTemplate folder. This HTML file contains the following markup (note - some markup has been removed for brevity):
<h1 style="font-weight: bold">Northwind Traders Receipt</h1>
<p>
   Thank you for shopping at Northwind Traders. Your order details are below.
</p>
<br /><br />
<h2 style="font-weight: bold">Order Information</h2>
<table>
   <tr>
      <td style="font-weight: bold">Order:</td>
      <td>[ORDERID]</td>
   </tr>
   <tr>
      <td style="font-weight: bold">Price:</td>
      <td>[TOTALPRICE]</td>
   </tr>
   <tr>
      <td style="font-weight: bold">Order Date:</td>
      <td>[ORDERDATE]</td>
   </tr>
</table>
<br /><br />
<h2 style="font-weight: bold">Items In Your Order</h2>
[ITEMS]
<br /><br />
<p style="text-align: center; font-style: italic; font-size: 10pt">
   Thank you for your business! If you have any questions about your order, please contact us at
   800-555-NORTH.
</p>

This markup defines a receipt layout not unlike the receipt created programmatically in the previous section. There's the "Northwind Traders Receipt" title at the top, here implemented as an <h1> element. There's a table for the order details, a "Thank you for your business..." message at the bottom, and so on.
Note that the above markup contains four placeholders - text surrounded by brackets. The idea here is that before we ask iTextSharp to turn the above markup into a PDF we will first replace those placeholders with the Order ID, total price, and other metrics for the order we are generating a receipt for.
Turning HTML into a PDF involves the following steps:
  1. Create a Document object.
  2. Create a PdfWriter object.
  3. Read in the HTML as a string.
  4. Call iTextSharp's HTMLWorker.ParseToList method, passing in the HTML to convert into PDF. This returns a collection of elements.
  5. Add each element returned in Step 4 to the Document object.
Steps 1 and 2 are identical to the first two steps for creating a PDF document from scratch. For our demo, Step 3 - reading in the HTML - involves two sub-steps. First, we must read in the HTML contents of the Receipt.htm template. Following that, we need to replace the placeholders with the appropriate values. In the demo available for download you'll find a page named ConvertHTMLtoPDF.aspx, which has the same user interface as CreatePDFFromScratch.aspx. In short, the page prompts the user to enter an Order ID and total price and to select what items were part of the order. These user-supplied values populate the placeholders in Receipt.htm.
These two sub-steps - reading the contents of Receipt.htm into a string and then replacing the placeholders - are accomplished by the code snippet below:
// Read in the contents of the Receipt.htm file...
string contents = File.ReadAllText(Server.MapPath("~/HTMLTemplate/Receipt.htm"));

// Replace the placeholders with the user-specified text
contents = contents.Replace("[ORDERID]", txtOrderID.Text);
contents = contents.Replace("[TOTALPRICE]", Convert.ToDecimal(txtTotalPrice.Text).ToString("c"));
contents = contents.Replace("[ORDERDATE]", DateTime.Now.ToShortDateString());

...

The above code snippet does not include the code that sets the [ITEMS] placeholder, which is where the order details are displayed. The code is a little lengthy, but it's not terribly complex. The code simply builds the markup for a <table> by looping through the CheckBoxList and adding a table row (<tr>) for each selected purchased item.
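To give a feel for what that omitted code might look like, here is a minimal sketch that builds the [ITEMS] markup from a list of item names. It is not the demo's actual code: ReceiptMarkup and BuildItemsTable are hypothetical names, the one-cell-per-row layout is an assumption, and the items are passed in as plain strings rather than read from the CheckBoxList. Each name is HTML-encoded so that text such as & or < is not mistaken for markup.

```csharp
using System.Collections.Generic;
using System.Net;
using System.Text;

// Hypothetical helper, not the demo's actual code: builds the markup
// that replaces the [ITEMS] placeholder in Receipt.htm.
public static class ReceiptMarkup
{
    public static string BuildItemsTable(IEnumerable<string> items)
    {
        var html = new StringBuilder("<table>");

        // Add one table row per purchased item
        foreach (string item in items)
        {
            html.Append("<tr><td>")
                .Append(WebUtility.HtmlEncode(item))
                .Append("</td></tr>");
        }

        html.Append("</table>");
        return html.ToString();
    }
}
```

The result would then be swapped in like the other placeholders, for example contents = contents.Replace("[ITEMS]", ReceiptMarkup.BuildItemsTable(selectedItems)), where selectedItems is assumed to hold the text of the checked items.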
Once the HTML string has been composed we are ready for Steps 4 and 5. Step 4 - calling iTextSharp's HTMLWorker.ParseToList method - parses the HTML string and returns a collection of elements. Step 5 enumerates this collection of elements, adding them to the Document object.
...

// Step 4: Parse the HTML string into a collection of elements...
var parsedHtmlElements = HTMLWorker.ParseToList(new StringReader(contents), null);

// Enumerate the elements, adding each one to the Document...
foreach (var htmlElement in parsedHtmlElements)
   document.Add(htmlElement as IElement);

That's all there is to it! Keep in mind that the HTML parser is simply converting HTML into elements that can be added to the PDF document. In addition to adding these parsed elements you can also add elements you create, just like we did in our earlier demo (CreatePDFFromScratch.aspx). For instance, we can add the logo to the upper right corner of the receipt using the same code as before:
var logo = iTextSharp.text.Image.GetInstance(Server.MapPath("~/Images/4guysfromrolla.gif"));
logo.SetAbsolutePosition(440, 800);
document.Add(logo);

The generated PDF, shown below, is quite similar to the receipt created from the ground up.
The PDF receipt - save for the logo in the upper right corner - was created from an HTML template.
If you decide to use iTextSharp's HTML to PDF capabilities, keep in mind that they are pretty rudimentary. iTextSharp will not correctly parse a complex HTML document with many layers and overlays, and its stylesheet support is limited (although I hear this has been improved upon in iTextSharp 5.0). For maximum control you will want to either create PDFs from the ground up using the techniques discussed at the start of this article or create your PDFs using Adobe Acrobat with form fields to fill in the dynamic bits, as was discussed in Filling in PDF Forms with ASP.NET and iTextSharp.
