Introduction
The Portable Document Format (PDF) is a popular file format for documents, for two primary reasons. First, because the PDF standard is an open standard, many vendors provide PDF readers across virtually all operating systems, and many proprietary programs, such as Microsoft Word, include a "Save as PDF" option. Consequently, PDFs serve as a sort of common currency of exchange. A person writing a document using Microsoft Word for Windows can save the document as a PDF, which can then be read by others whether or not they are using Windows and whether or not they have Microsoft Word installed. Second, PDF files are self-contained. Each PDF file includes its complete text, fonts, images, input fields, and other content. This means that even complicated documents with many images, an intricate layout, and user interface elements like textboxes and checkboxes can be encapsulated in a single PDF file.
Due to their ubiquity and layout capabilities, it's not uncommon for websites to use PDF technology. For example, when purchasing goods at an online store you may be offered the ability to download an invoice as a PDF file. PDFs also support form fields, which are user interface elements like textboxes, checkboxes, comboboxes, and the like. These form fields can be filled in by a user viewing the PDF or, with a bit of code, they can be filled in programmatically.
This article is the first in a multi-part series that examines how to programmatically work with PDF files from an ASP.NET application using iTextSharp, a .NET open source library for PDF generation. This installment shows how to use iTextSharp to open an existing PDF document with form fields, fill those form fields with user-supplied values, and then save the combined output to a new PDF file. Read on to learn more!
An Overview of the Demo Application
This article shows how to use iTextSharp to programmatically populate the form fields in a PDF document. To facilitate this discussion I created a demo, available for download at the end of this article, that shows how to programmatically populate the form fields in the IRS's Form W-9. (The IRS is the Internal Revenue Service for the United States, which is charged with collecting taxes. Form W-9 is used to provide taxpayer information to a requesting person or business.)
In particular, the demo includes a web page named CreateW9.aspx that has a number of textboxes, radio buttons, and other input elements that prompt the user to provide taxpayer identification information. The screen shot below shows the CreateW9.aspx page when viewed through a browser.
The Form W-9 PDF is stored in the demo's PDFTemplates folder. (This PDF file, fw9.pdf, was downloaded from the IRS's website.) When the user opts to generate a W-9, iTextSharp creates a new PDF document that takes the original Form W-9 PDF and populates its form fields with the user-supplied values. This new PDF document is not saved on the web server; rather, it is streamed back directly to the client's browser, prompting them to save or open it.
The screen shot below shows the PDF generated by CreateW9.aspx based on the user's inputs.
Getting Started with iTextSharp
There are a variety of .NET libraries available to programmatically create PDF documents. Perhaps the most popular is iTextSharp, which is the .NET version of the Java-based iText PDF library.
Part of iTextSharp's popularity stems from the fact that it's open source. However, it's important to keep in mind that starting with version 5.0, iTextSharp is released under the GNU Affero General Public License (AGPL) version 3. This license requires that any application that uses iTextSharp must also be released under the same license and that you must make your application's source code freely available (like iTextSharp's is). You can optionally buy a license to be released from the AGPL. While version 5.0 (and beyond) is released under the more restrictive AGPL, previous versions were released under the GNU Lesser General Public License (LGPL), which allows the use of iTextSharp within an application without requiring that the application also be released under the LGPL. In other words, by using version 4 or earlier you can use iTextSharp in your web application without having to buy a license and without having to release your web application's source code. (The download available at the end of the article uses iTextSharp version 4.1.6.)
You can download iTextSharp from its project page at: http://sourceforge.net/projects/itextsharp/. (Alternatively, you can download the code at the end of this article, which includes the iTextSharp version 4.1.6 assembly, itextsharp.dll, in the Bin folder.) For assistance with iTextSharp, I suggest the iText-question listserv.
Determining Form Fields Details
In a moment we'll talk about how to use iTextSharp to take a user-supplied value and stick it in a form field. In a nutshell, it involves one line of code:
formFields.SetField(fieldName, value);
The above code assigns value to the form field named fieldName.
In order to populate the form fields in a PDF document we need to know the names of the fields. If you are the one who created the PDF document then you already know the field names, but what if you were given the PDF document or downloaded it (like how I downloaded the Form W-9)? If you have Adobe Acrobat on your computer then you are in luck - you can open the PDF in Adobe Acrobat and view the properties for each form field, which includes its name.
If all you have is the free Adobe Reader then things are a bit more challenging, as Adobe Reader does not provide details about the form field elements. However, using iTextSharp we can write a bit of code to get all of the fields and display their information on a web page. The demo includes a page named ListFormFields.aspx that has two options:
- Show Form Fields - displays the name and type of each field in the PDF document as a numbered list. For checkbox fields, the Export Value is also displayed, which is the value you need to set the field to in order to check the checkbox. (More on this later!)
- Generate Sample PDF - generates a PDF with the form fields filled in with the values 1, 2, .... When used with the Show Form Fields option you can match up the number of the form field listed on the web page with the form field number in the generated PDF to see what field name corresponds to what field position in the PDF.
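The demo's own listing code is not reproduced in this article, so what follows is only a minimal sketch of how the Show Form Fields idea might be implemented. It assumes iTextSharp's AcroFields members Fields, GetFieldType, and GetAppearanceStates; the class and method names (FormFieldLister, DescribeFields, TypeName) are hypothetical and are not taken from the demo.

```csharp
using System.Collections.Generic;
using iTextSharp.text.pdf;

// Hypothetical helper, not part of the demo download.
public static class FormFieldLister
{
    // Returns one line per form field: its name, its type and, for
    // checkboxes, its appearance states (which include the Export Value).
    public static List<string> DescribeFields(string pdfPath)
    {
        var lines = new List<string>();
        var reader = new PdfReader(pdfPath);
        try
        {
            // Cache the AcroFields object rather than re-reading the property.
            AcroFields form = reader.AcroFields;

            // Iterating the keys yields the field names.
            foreach (string fieldName in form.Fields.Keys)
            {
                int fieldType = form.GetFieldType(fieldName);
                string line = fieldName + " - " + TypeName(fieldType);

                if (fieldType == AcroFields.FIELD_TYPE_CHECKBOX)
                {
                    string[] states = form.GetAppearanceStates(fieldName);
                    line += " - states: " + string.Join(", ", states);
                }
                lines.Add(line);
            }
        }
        finally
        {
            reader.Close();
        }
        return lines;
    }

    // Maps iTextSharp's integer field type to the labels used in this article.
    private static string TypeName(int fieldType)
    {
        switch (fieldType)
        {
            case AcroFields.FIELD_TYPE_TEXT: return "TextField";
            case AcroFields.FIELD_TYPE_CHECKBOX: return "CheckBox";
            default: return "Other (" + fieldType + ")";
        }
    }
}
```

A page could bind the returned list to a numbered list control. Note that the order in which field names come back is not guaranteed to match their position on the page, which is one reason a numbered sample PDF is useful for matching names to positions.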
The screen shot below shows ListFormFields.aspx when using the Show Form Fields option for the Form W-9 PDF (fw9.pdf). For Form W-9, the form field names are a bit lengthy - for instance, the Name text field is named topmostSubform[0].Page1[0].f1_01_0_[0] - but this output clearly shows the name of each form field on the document along with its type (TextField or CheckBox). Also, for the checkboxes the Export Value is reported.
For example, the Name text field's name is topmostSubform[0].Page1[0].f1_01_0_[0] (the first item displayed in the Show Form Fields list) whereas the Business name text field's name is topmostSubform[0].Page1[0].f1_02_0_[0] (the 21st item displayed in the Show Form Fields list).
Using the ListFormFields.aspx page to determine a PDF's form fields is doable, but is a bit of a hassle. If you routinely work with PDF files you'll find the form field tools in Adobe Acrobat to be well worth the purchase price.
Generating a New PDF with Form Field Values Filled Programmatically
iTextSharp makes it easy to fill the form fields in an existing PDF, creating a new PDF in the process. Start by creating a new PdfReader object. (The PdfReader class is one of many classes provided by the iTextSharp library. This class, along with the others used in this demo, is found in the iTextSharp.text.pdf namespace, so add an appropriate using or Imports statement to the top of your class file.)
var reader = new PdfReader(pdfPath);
pdfPath is the full physical path to the PDF file that contains the form fields. Recall that in the demo the Form W-9 file is stored in the ~/PDFTemplates folder and is named fw9.pdf. Therefore, we'd use a pdfPath value of Server.MapPath("~/PDFTemplates/fw9.pdf").
Next, we need to create a PdfStamper object. The PdfStamper is used to populate the form fields in a PDF document and generates a new PDF. This new PDF is saved to a stream that you must specify when creating the PdfStamper object. In the demo we are interested in streaming the generated PDF back to the user - there's no need to save it - so we have the generated PDF output to a MemoryStream. To save the generated PDF to disk, you could have the PdfStamper output to a FileStream instead.
...
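The original listing is elided here, so the following is a sketch rather than the article's exact code. It assumes the PdfStamper constructor that takes a PdfReader and an output Stream; the class and method names (StamperFactory, ToMemory, ToFile) and the outputPath parameter are hypothetical.

```csharp
using System.IO;
using iTextSharp.text.pdf;

// Hypothetical helper showing the two output choices described above.
public static class StamperFactory
{
    // In-memory output: once the stamper is closed, 'output' holds the
    // generated PDF and can be streamed back to the browser.
    public static PdfStamper ToMemory(PdfReader reader, MemoryStream output)
    {
        return new PdfStamper(reader, output);
    }

    // On-disk output: the generated PDF is written to 'outputPath'.
    // FileMode.Create overwrites any existing file at that path.
    public static PdfStamper ToFile(PdfReader reader, string outputPath)
    {
        return new PdfStamper(reader, new FileStream(outputPath, FileMode.Create));
    }
}
```

In both cases nothing useful is in the stream until the stamper is closed, so the stream should not be read before that point.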
With the PdfStamper object created we're now ready to assign values to the PDF's form fields. The PdfStamper object has a property named AcroFields that returns an AcroFields object. This object has a SetField method, which is used to assign a value to a particular field in the PDF.
For example, to assign the value "Scott Mitchell" to the Name form field (which has a name of topmostSubform[0].Page1[0].f1_01_0_[0]) and the value "N/A" to the List account number(s) here field (which has a name of topmostSubform[0].Page1[0].f1_07_0_[0]) we would use the following code:
...
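As the listing itself is elided, here is a minimal sketch of what those two assignments might look like, using the field names and values given above. The wrapper class and method (TextFieldSketch, FillTextFields) are hypothetical; the formFields variable name follows the one-line example shown earlier.

```csharp
using iTextSharp.text.pdf;

// Hypothetical wrapper around the two SetField calls described in the text.
public static class TextFieldSketch
{
    // Returns true only if both fields were found and assigned.
    public static bool FillTextFields(PdfStamper stamper)
    {
        AcroFields formFields = stamper.AcroFields;

        // Name field
        bool nameSet = formFields.SetField(
            "topmostSubform[0].Page1[0].f1_01_0_[0]", "Scott Mitchell");

        // "List account number(s) here" field
        bool accountSet = formFields.SetField(
            "topmostSubform[0].Page1[0].f1_07_0_[0]", "N/A");

        return nameSet && accountSet;
    }
}
```

SetField reports whether the named field was found, so checking its return value is a cheap way to catch a mistyped field name.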
Things are a little more complicated with checkboxes. To check a checkbox you need to call the SetField method and pass in the checkbox's Export Value as the value. For example, to check the Individual/sole proprietor checkbox, which has a name of topmostSubform[0].Page1[0].c1_01[0] and an Export Value of 1, and the Exempt payee checkbox, which has a name of topmostSubform[0].Page1[0].c1_01[7] and an Export Value of 8, we'd use the following code:
...
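The listing is elided, so this is only a sketch of the two checkbox assignments, using the field names and Export Values stated above. The wrapper class and method (CheckboxSketch, CheckBoxes) are hypothetical.

```csharp
using iTextSharp.text.pdf;

// Hypothetical wrapper around the checkbox assignments described in the text.
public static class CheckboxSketch
{
    public static void CheckBoxes(PdfStamper stamper)
    {
        AcroFields formFields = stamper.AcroFields;

        // Individual/sole proprietor: Export Value is "1"
        formFields.SetField("topmostSubform[0].Page1[0].c1_01[0]", "1");

        // Exempt payee: Export Value is "8"
        formFields.SetField("topmostSubform[0].Page1[0].c1_01[7]", "8");
    }
}
```

The values are passed as strings, and they are specific to this particular Form W-9 file; another PDF's checkboxes will need their own Export Values.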
Note how we pass in "1" to check the Individual/sole proprietor checkbox and "8" to check the Exempt payee checkbox. The precise value passed in to check a checkbox depends on the checkbox field's Export Value - there is no hard and fast standard, unfortunately. Therefore, to check a checkbox you must know its Export Value.
Once the form fields have been populated we need to close the PdfStamper and PdfReader objects. You can optionally indicate whether the generated PDF document's form fields should still be editable by setting the PdfStamper object's FormFlattening property. Setting this property to true indicates that the form fields should no longer be editable in the generated document.
...
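With the original listing elided, the closing step might be sketched as follows. The class, method, and flatten parameter (StamperCleanup, Finish) are hypothetical names; FormFlattening and Close are the members named in the text.

```csharp
using iTextSharp.text.pdf;

// Hypothetical helper for the final step described above.
public static class StamperCleanup
{
    public static void Finish(PdfReader reader, PdfStamper stamper, bool flatten)
    {
        // Must be set before Close(): flattening is applied when the
        // stamper writes out the final document.
        stamper.FormFlattening = flatten;

        // Closing the stamper completes the generated PDF in the output stream.
        stamper.Close();
        reader.Close();
    }
}
```

Passing false for flatten leaves the filled-in values editable, which can be handy when the user is expected to correct or complete the form afterward.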
Once the PdfStamper has been closed the stream specified when instantiating the PdfStamper object contains the generated PDF. If you used a FileStream object then that means the generated PDF now exists on disk. In the demo I use a MemoryStream, which means the generated PDF now resides in memory. At this point we're ready to send it back to the browser for display. This is accomplished using the following code:
...
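The listing is elided in this copy of the article, so the following is a sketch and not the demo's code (the demo's version lives in the PDFHelper class's ReturnPDF method, which is not reproduced here). The class, method, and downloadName parameter (PdfResponseSketch, SendPdf) are hypothetical, and the final Response.End call is an assumption about how the response is finished.

```csharp
using System.IO;
using System.Web;

// Hypothetical helper that streams an in-memory PDF to the browser.
public static class PdfResponseSketch
{
    public static void SendPdf(HttpResponse response, MemoryStream output, string downloadName)
    {
        // Ask the browser to treat the content as a downloadable attachment.
        response.AddHeader("Content-Disposition", "attachment; filename=" + downloadName);

        // Tell the browser it is receiving a PDF document.
        response.ContentType = "application/pdf";

        // ToArray() still works after the PdfStamper has closed the stream.
        response.BinaryWrite(output.ToArray());

        // Assumption: end the response so no page markup follows the PDF bytes.
        response.End();
    }
}
```

From a page's code-behind this could be called as PdfResponseSketch.SendPdf(Response, output, "W9.pdf"), where the file name is purely illustrative.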
The first line of code adds the Content-Disposition HTTP header. This tells the browser to treat the content like an attachment, meaning the user will be prompted whether to open or save the PDF (rather than having it open directly in the browser window). The second line of code tells the browser the type of content it is being sent. application/pdf is the standard MIME type for PDF documents; this notifies the browser that it is receiving a PDF document.
The Response.BinaryWrite statement writes back the contents of a specified byte array. Recall that output is the MemoryStream we created earlier. output.ToArray() returns the contents of the MemoryStream - namely, the binary contents of the generated PDF document - as a byte array, which is then sent down to the client.
A Note About the Demo Code...
The code presented in the text of this article is a simplified version of the code in the demo available for download. The demo encapsulates much of the above functionality in a series of classes in the App_Code folder. For example, the code that streams the PDF back to the client is handled by the PDFHelper class's ReturnPDF method.
The code presented in this article's text exists (verbatim) in the demo, but not all in one spot like the text above implies. I think the structure of the code in the demo is more reusable and easier to understand once you examine it, but I wanted to point this out in case you go to the demo and get confused because you cannot find the exact code snippet presented in the text above.
Conclusion... And Looking Forward
This article (and demo) showed how to use ASP.NET and iTextSharp to programmatically fill the form fields in a PDF. In particular, we saw how to populate the form fields of the IRS's Form W-9 PDF file. This article is just the first in a series of articles that explore using iTextSharp to work with PDF documents in an ASP.NET application. Future installments will detail how to programmatically create PDFs, among other topics.