ASP.NET - Filling in PDF Forms with iTextSharp


Introduction


The Portable Document Format (PDF) is a popular file format for documents, for two primary reasons. First, because the PDF standard is an open standard, many vendors provide PDF readers across virtually all operating systems, and many proprietary programs, such as Microsoft Word, include a "Save as PDF" option. Consequently, PDFs serve as a sort of common currency of exchange. A person writing a document using Microsoft Word for Windows can save the document as a PDF, which can then be read by others whether or not they are using Windows and whether or not they have Microsoft Word installed. Second, PDF files are self-contained. Each PDF file includes its complete text, fonts, images, input fields, and other content. This means that even complicated documents with many images, an intricate layout, and user interface elements like textboxes and checkboxes can be encapsulated in a single PDF file.
Due to their ubiquity and layout capabilities, it's not uncommon for websites to use PDF technology. For example, when purchasing goods at an online store you may be offered the ability to download an invoice as a PDF file. PDFs also support form fields, which are user interface elements like textboxes, checkboxes, comboboxes, and the like. These form fields can be filled in by a user viewing the PDF or, with a bit of code, they can be filled in programmatically.
This article is the first in a multi-part series that examines how to programmatically work with PDF files from an ASP.NET application using iTextSharp, a .NET open source library for PDF generation. This installment shows how to use iTextSharp to open an existing PDF document with form fields, fill those form fields with user-supplied values, and then save the combined output to a new PDF file. Read on to learn more!

An Overview of the Demo Application


This article shows how to use iTextSharp to programmatically populate the form fields in a PDF document. To facilitate this discussion I created a demo, available for download at the end of this article, that populates the form fields in the IRS's Form W-9. (The IRS is the Internal Revenue Service for the United States, which is charged with collecting taxes. Form W-9 is used to provide taxpayer information to a requesting person or business.) In particular, the demo includes a web page named CreateW9.aspx that has a number of textboxes, radio buttons, and other input elements that prompt the user to provide taxpayer identification information. The screen shot below shows the CreateW9.aspx page when viewed through a browser.

The web page prompts the user for information required by the IRS Form W-9.
After entering the information and clicking the "Generate Completed W-9" button (not shown in the above screen shot), there is a postback. On postback the code-behind class uses the iTextSharp library to generate a Form W-9 PDF document whose form fields contain the text entered by the user. The original Form W-9 PDF document that contains the form fields is stored on the web server in the PDFTemplates folder. (This PDF file, fw9.pdf, was downloaded from the IRS's website.) When the user opts to generate a W-9, iTextSharp creates a new PDF document that takes the original Form W-9 PDF and populates its form fields with the user-supplied values. This new PDF document is not saved on the web server; rather, it is streamed back directly to the client's browser, prompting them to save or open it.
The screen shot of the PDF below shows the PDF generated by CreateW9.aspx based on the user's inputs.

The completed Form W-9, populated with the user's inputs.
While this article demonstrates filling form fields on an IRS-supplied PDF document, there's no reason why this technique could not be used to fill the form fields on a PDF generated by you or your company. If your company has invoices, NDAs, or other forms that commonly need to be filled out based on user input or data residing in a database, you could create those PDFs to contain form fields and then write code to populate them (based on user input, the results of a database query, or both). To create your own PDFs graphically you will need to buy a copy of Adobe Acrobat. Future installments will explore how to create PDFs programmatically using iTextSharp.

Getting Started with iTextSharp


There are a variety of .NET libraries available to programmatically create PDF documents. Perhaps the most popular is iTextSharp, which is the .NET version of the Java-based iText PDF library. Part of iTextSharp's popularity stems from the fact that it's open source. However, it's important to keep in mind that starting with version 5.0, iTextSharp is released under the GNU Affero General Public License (AGPL) version 3. This license requires that any application that uses iTextSharp must also be released under the same license and that you must make your application's source code freely available (like iTextSharp's is). You can optionally buy a license to be released from the AGPL. While version 5.0 (and beyond) is released under the more restrictive AGPL, previous versions were released under the GNU Lesser General Public License (LGPL), which allows the use of iTextSharp within an application without requiring that the application also be released under the LGPL. In other words, by using version 4 or earlier you can use iTextSharp in your web application without having to buy a license and without having to release your web application's source code. (The download available at the end of the article uses iTextSharp version 4.1.6.)
You can download iTextSharp from its project page at: http://sourceforge.net/projects/itextsharp/. (Alternatively, you can download the code at the end of this article, which includes the iTextSharp version 4.1.6 assembly in the Bin folder, named itextsharp.dll.) For assistance with iTextSharp, I suggest the iText-question listserv.

Determining Form Fields Details


In a moment we'll talk about how to use iTextSharp to take a user-supplied value and stick it in a form field. In a nutshell, it involves one line of code:
formFields.SetField(fieldName, value);

The above code assigns value to the form field named fieldName.
In order to populate the form fields in a PDF document we need to know the names of the fields. If you are the one who created the PDF document then you already know the field names, but what if you were given the PDF document or downloaded it (like how I downloaded the Form W-9)? If you have Adobe Acrobat on your computer then you are in luck - you can open the PDF in Adobe Acrobat and view the properties for each form field, which includes its name.
If all you have is the free Adobe Reader then things are a bit more challenging, as Adobe Reader does not provide details about the form field elements. However, using iTextSharp we can write a bit of code to get all of the fields and display their information on a web page. The demo includes a page named ListFormFields.aspx that has two options:
  • Show Form Fields - displays the name and type of each field in the PDF document as a numbered list. For checkbox fields, the Export Value is also displayed, which is the value you need to set the field to in order to check the checkbox. (More on this later!)
  • Generate Sample PDF - generates a PDF with the form fields filled in with the values 1, 2, .... When used with the Show Form Fields option you can match up the number of the form field listed on the web page with the form field number in the generated PDF to see what field name corresponds to what field position in the PDF.
The screen shot below shows the ListFormFields.aspx page when using the Show Form Fields option for the Form W-9 PDF (fw9.pdf). For Form W-9, the form field names are a bit lengthy - for instance, the Name text field is named topmostSubform[0].Page1[0].f1_01_0_[0] - but this output clearly shows the name of each form field on the document along with its type (TextField or CheckBox). Also, for the checkboxes the Export Value is reported.
The form fields in the Form W-9 are listed.
The Generate Sample PDF option creates a PDF with each form field value populated with its corresponding index value. The PDF screen shot below shows that the Name text field has a value of 1, whereas the Business name field has a value of 21. This indicates that the Name text field's name is topmostSubform[0].Page1[0].f1_01_0_[0] (the first item displayed in the Show Form Fields list) whereas the Business name text field's name is topmostSubform[0].Page1[0].f1_02_0_[0] (the 21st item displayed in the Show Form Fields list).


The generated PDF displays the form field's index value in each form field element.
Using the ListFormFields.aspx page to determine a PDF's form fields is doable, but is a bit of a hassle. If you routinely work with PDF files you'll find the form field tools in Adobe Acrobat to be well worth the purchase price.
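The code behind ListFormFields.aspx is not reproduced in this article, but the idea behind its Show Form Fields option can be sketched roughly as follows. This is a minimal sketch, not the demo's actual code: the FormFieldLister class and its method names are hypothetical, and it assumes that the non-"Off" appearance state iTextSharp reports for a checkbox corresponds to what this article calls the Export Value.

```csharp
using System;
using System.Collections.Generic;
using iTextSharp.text.pdf;

// Hypothetical helper (not part of the demo): describes each form field in a PDF.
public static class FormFieldLister
{
    public static List<string> Describe(string pdfPath)
    {
        var lines = new List<string>();
        var reader = new PdfReader(pdfPath);
        try
        {
            AcroFields form = reader.AcroFields;

            // Iterating the keys as objects works whether Fields is a
            // Hashtable (4.x) or a generic dictionary (later versions).
            foreach (object key in form.Fields.Keys)
            {
                string name = key.ToString();
                int type = form.GetFieldType(name);
                string line = name + " (" + TypeName(type) + ")";

                if (type == AcroFields.FIELD_TYPE_CHECKBOX)
                {
                    // Assumption: the state other than "Off" is the Export Value.
                    string[] states = form.GetAppearanceStates(name);
                    if (states != null)
                        line += " - states: " + string.Join(", ", states);
                }
                lines.Add(line);
            }
        }
        finally
        {
            reader.Close();
        }
        return lines;
    }

    // Maps the integer field type to a readable label.
    public static string TypeName(int type)
    {
        if (type == AcroFields.FIELD_TYPE_TEXT) return "TextField";
        if (type == AcroFields.FIELD_TYPE_CHECKBOX) return "CheckBox";
        if (type == AcroFields.FIELD_TYPE_RADIOBUTTON) return "RadioButton";
        if (type == AcroFields.FIELD_TYPE_COMBO) return "ComboBox";
        if (type == AcroFields.FIELD_TYPE_LIST) return "List";
        return "Other";
    }
}
```

In a page's code-behind, the returned list could be bound to a BulletedList or similar control to produce the numbered output described above.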

Generating a New PDF with Form Field Values Filled Programmatically


iTextSharp makes it easy to fill the form fields in an existing PDF, creating a new PDF in the process. Start by creating a new PdfReader object. (The PdfReader class is one of many classes provided by the iTextSharp library. This class, along with the others used in this demo, is found in the iTextSharp.text.pdf namespace, so add an appropriate using or Imports statement to the top of your class file.)
var reader = new PdfReader(pdfPath);

pdfPath is the full physical path to the PDF file that contains the form fields. Recall that in the demo the Form W-9 file is stored in the ~/PDFTemplates folder and is named fw9.pdf. Therefore, we'd use a pdfPath value of Server.MapPath("~/PDFTemplates/fw9.pdf").
Next, we need to create a PdfStamper object. The PdfStamper is used to populate the form fields in a PDF document and generates a new PDF. This new PDF is saved to a stream that you must specify when creating the PdfStamper object. In the demo we are interested in streaming the generated PDF back to the user - there's no need to save it - so we have the generated PDF written to a MemoryStream. To save the generated PDF to disk, you could have the PdfStamper write to a FileStream instead.

...

var output = new MemoryStream();
var stamper = new PdfStamper(reader, output);

With the PdfStamper object created we're now ready to assign values to the PDF's form fields. The PdfStamper object has a property named AcroFields that returns an AcroFields object. This object has a SetField method, which is used to assign a value to a particular field in the PDF.
For example, to assign the value "Scott Mitchell" to the Name form field (which has a name of topmostSubform[0].Page1[0].f1_01_0_[0]) and the value "N/A" to the List account number(s) here field (which has a name of topmostSubform[0].Page1[0].f1_07_0_[0]) we would use the following code:

...

// Set the Name field to "Scott Mitchell"
stamper.AcroFields.SetField("topmostSubform[0].Page1[0].f1_01_0_[0]", "Scott Mitchell");

// Set the List account number(s) here field to "N/A"
stamper.AcroFields.SetField("topmostSubform[0].Page1[0].f1_07_0_[0]", "N/A");

Things are a little more complicated with checkboxes. To check a checkbox you need to call the SetField method and pass in the checkbox's Export Value as the value. For example, to check the Individual/sole proprietor checkbox, which has a name of topmostSubform[0].Page1[0].c1_01[0] and an Export Value of 1, and the Exempt payee checkbox, which has a name of topmostSubform[0].Page1[0].c1_01[7] and an Export Value of 8, we'd use the following code:

...

// Check the Individual/sole proprietor checkbox
stamper.AcroFields.SetField("topmostSubform[0].Page1[0].c1_01[0]", "1");

// Check the Exempt payee checkbox
stamper.AcroFields.SetField("topmostSubform[0].Page1[0].c1_01[7]", "8");

Note how we pass in "1" to check the Individual/sole proprietor checkbox and "8" to check the Exempt payee checkbox. The precise value passed in to check a checkbox depends on the checkbox field's Export Value - there is no hard and fast standard, unfortunately. Therefore, to check a checkbox you must know its Export Value.
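If you would rather not hard-code each checkbox's Export Value, one possible approach is to ask iTextSharp for the checkbox's appearance states and use whichever one is not "Off". The sketch below is an illustration rather than the demo's code: the CheckboxHelper class and its methods are hypothetical, and it assumes that the non-"Off" appearance state is the same value this article calls the Export Value, which is worth verifying against your own PDF.

```csharp
using System;
using iTextSharp.text.pdf;

// Hypothetical helper (not part of the demo) for checking a checkbox
// without hard-coding its Export Value.
public static class CheckboxHelper
{
    // Returns the first appearance state that is not "Off",
    // or null when no such state is available.
    public static string FindOnState(string[] states)
    {
        if (states == null) return null;
        foreach (string state in states)
        {
            if (!string.Equals(state, "Off", StringComparison.Ordinal))
                return state;
        }
        return null;
    }

    // Checks the named checkbox; returns false when no "on" state was
    // found or when SetField reports that the field was not set.
    public static bool Check(AcroFields form, string fieldName)
    {
        string onState = FindOnState(form.GetAppearanceStates(fieldName));
        return onState != null && form.SetField(fieldName, onState);
    }
}
```

With a PdfStamper named stamper, the call would look like CheckboxHelper.Check(stamper.AcroFields, "topmostSubform[0].Page1[0].c1_01[0]").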
Once the form fields have been populated we need to close the PdfStamper and PdfReader objects. You can optionally indicate whether the generated PDF document's form fields should still be editable by setting the PdfStamper object's FormFlattening property. Setting this property to true indicates that the form fields should no longer be editable in the generated document.

...

// Form fields should no longer be editable
stamper.FormFlattening = true;

stamper.Close();
reader.Close();
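
Pulling the steps so far together, the whole read, stamp, fill, flatten, and close sequence might be wrapped in a single method along these lines. This is a minimal sketch under stated assumptions, not the demo's implementation: the PdfFormFiller class, its Fill method, and the idea of passing the field names and values as a dictionary are all hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using iTextSharp.text.pdf;

// Hypothetical helper (not part of the demo): fills a PDF template's
// form fields and returns the generated PDF as a byte array.
public static class PdfFormFiller
{
    public static byte[] Fill(string templatePath,
                              IDictionary<string, string> values,
                              bool flatten)
    {
        if (values == null) throw new ArgumentNullException("values");

        var reader = new PdfReader(templatePath);
        var output = new MemoryStream();
        try
        {
            var stamper = new PdfStamper(reader, output);

            // Assign each supplied value to the field with the matching name.
            foreach (KeyValuePair<string, string> pair in values)
            {
                stamper.AcroFields.SetField(pair.Key, pair.Value);
            }

            // When true, the fields are no longer editable in the output.
            stamper.FormFlattening = flatten;

            // Closing the stamper finishes writing the new PDF to output.
            stamper.Close();
        }
        finally
        {
            reader.Close();
        }
        return output.ToArray();
    }
}
```

A page could then call PdfFormFiller.Fill(Server.MapPath("~/PDFTemplates/fw9.pdf"), values, true), where values maps field names such as topmostSubform[0].Page1[0].f1_01_0_[0] to the user's inputs.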

Once the PdfStamper has been closed the stream specified when instantiating the PdfStamper object contains the generated PDF. If you used a FileStream object then that means the generated PDF now exists on disk. In the demo I use a MemoryStream, which means the generated PDF now resides in memory. At this point we're ready to send it back to the browser for display. This is accomplished using the following code:

...

Response.AddHeader("Content-Disposition", "attachment; filename=YourPDF.pdf");
Response.ContentType = "application/pdf";

Response.BinaryWrite(output.ToArray());
Response.End();

The first line of code adds the Content-Disposition HTTP header. This tells the browser to treat the content like an attachment, meaning the user will be prompted whether to open or save the PDF (rather than having it open directly in the browser window). The second line of code tells the browser the type of content it is being sent. application/pdf is the standard MIME type for PDF documents; this notifies the browser that it is receiving a PDF document.
The Response.BinaryWrite statement writes back the contents of a specified byte array. Recall that output is the MemoryStream we created earlier. output.ToArray() returns the contents of the MemoryStream - namely, the binary contents of the generated PDF document - as a byte array, which is then sent down to the client.
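One detail worth spelling out is why output.ToArray() is used here rather than reading from the stream directly. As far as I know, closing the PdfStamper also closes the stream it was given by default, and MemoryStream.ToArray is documented to keep working on a closed MemoryStream while most other members throw. The sketch below illustrates that behavior using only the .NET base class library; the ClosedStreamDemo class and its WriteAndClose method are hypothetical stand-ins for the PdfStamper.

```csharp
using System;
using System.IO;

// Hypothetical stand-in for the stamper: writes bytes, then closes the stream.
public static class ClosedStreamDemo
{
    public static void WriteAndClose(Stream target, byte[] payload)
    {
        target.Write(payload, 0, payload.Length);
        target.Close();
    }

    // Returns the bytes captured by a MemoryStream that has already been closed.
    public static byte[] Capture(byte[] payload)
    {
        var output = new MemoryStream();
        WriteAndClose(output, payload);

        // ToArray still works after Close; reading output.Length at this
        // point would throw an ObjectDisposedException instead.
        return output.ToArray();
    }
}
```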

A Note About the Demo Code...
The code presented in the text of this article is a simplified version of the code in the demo available for download. The demo encapsulates much of the above functionality in a series of classes in the App_Code folder. For example, the code that streams the PDF back to the client is handled by the PDFHelper class's ReturnPDF method. The code presented in this article's text exists (verbatim) in the demo, but not all in one spot like the text above implies. I think the structure of the code in the demo is more reusable and easy to understand once you examine it, but I wanted to point this out in case you go to the demo and get confused because you cannot find the exact code snippet presented in the text above.


Conclusion... And Looking Forward


This article (and demo) showed how to use ASP.NET and iTextSharp to programmatically fill the form fields in a PDF. In particular, we saw how to populate the form fields of the IRS's Form W-9 PDF file. This article is just the first in a series of articles that explore using iTextSharp to work with PDF documents in an ASP.NET application. Future installments will detail how to programmatically create PDFs, among other topics.
