Monday, January 07, 2013

Reading multiple files with servlets. HTML 5.

Intro

In this example we look at how to process a form with a single input of type file where the user chooses several files at once. We use version 3.0 of the Servlet specification, which is implemented in Tomcat 7. First of all, I'd recommend reading this post to configure your Java application and development environment properly. Then let's start.
The example (tutorial) application uses a simple HTML page (JSP is not used in this example). The page has one form that lets the user choose several files through a single dialog, plus a submit button to post the data. Under the form there is a frame that receives the result.
To keep the example simple, our application will just print the content of each file to that frame, together with the corresponding file name. Once you understand how it works, you can process the files in any other way instead of printing their content.
We need to go through several steps to accomplish the goal:

  1. Create the HTML form
  2. Create the servlet
  3. Process the data (read multiple files with the servlet)

HTML Form

A form is the facility for posting data from the client to the server. Whether you use the CGI approach or a servlet, as in this example, you have to consider two things:
  • which data you are going to send, and in which format
  • which URL the server should treat as the handler of your data
The simple page body we'll be using looks like this:

HTML form - Upload files with servlet example


<body>
 <form id="example-form" 
  action="HandlerServlet" 
  method="post"
  enctype="multipart/form-data" 
  target="resultFrame">
   <input type="file" name="example-input" multiple />
   <input type="submit"/>
 </form>
 <br />
 <iframe name="resultFrame" frameborder="1" width="100%" height="400" scrolling="yes"></iframe>
</body>



The important things are in lines 3, 5, 6 and 7:

  • HandlerServlet is a relative URL that the server treats as the handler of the data coming from your form
  • enctype is the attribute that must be set to "multipart/form-data" if you're going to send file content. Otherwise you'll be sending only the file names
  • the target attribute specifies the frame the server response will be printed to (the page must contain a frame with the corresponding name, like ours does)
  • multiple is the attribute that must be specified if you'd like to choose several files at once (it does not work in IE before version 10)
Now you're ready to write some server code.

Creating servlet

Writing a servlet is not rocket science. You simply create a new class, extend javax.servlet.http.HttpServlet and override some methods. We're going to override only one, "protected void doPost(..)", which receives the client request wrapped in "HttpServletRequest req". The latter contains all the data we need to reach our goal (retrieving the content of the files sent from the client). So the skeleton will look like this:

@MultipartConfig
public class HandlerServlet extends HttpServlet{

 /**
  * 
  */
 private static final long serialVersionUID = -717277851360696409L;

 @Override
 protected void doPost(HttpServletRequest req, HttpServletResponse resp)
   throws ServletException, IOException {

 }
}
NB! Notice that the @MultipartConfig annotation (line 1) is important. Without it you would not be able to parse your request into parts.
Register your servlet in the web.xml file so that the server knows which URL is associated with it. Use this example:

  <servlet>
    <description></description>
    <display-name>HandlerServlet</display-name>
    <servlet-name>HandlerServlet</servlet-name>
    <servlet-class>your.package.HandlerServlet</servlet-class>
  </servlet>
  <servlet-mapping>
    <servlet-name>HandlerServlet</servlet-name>
    <url-pattern>/HandlerServlet</url-pattern>
  </servlet-mapping>

Here your.package is the package you placed your servlet in.

Let's now step away from the Java code and check what the client actually sends. To see that we need a traffic sniffer; I use Fiddler. I prepared two files and am going to send them to the server (it doesn't matter that the server does nothing with them yet).
If you configured your project correctly (as described here), you should be able to start it. Start it and open the page with the form. Send some files (nothing will happen on the server, as we haven't implemented doPost yet).
The data captured by Fiddler looks like this:

----------------------------------------------------------------------
-----------------------------187161971819895
Content-Disposition: form-data; name="example-input"; filename="file1.txt"
Content-Type: text/plain

string 11
string 12
-----------------------------187161971819895
Content-Disposition: form-data; name="example-input"; filename="file2.txt"
Content-Type: text/plain

string 21
string 22
string 23
-----------------------------187161971819895--
----------------------------------------------------------------------
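
To make the structure of the dump more tangible, here is a minimal sketch in plain Java (no servlet API) that cuts such a body into parts at the boundary lines. The class MultipartSketch, its methods and the short boundary "XYZ" are made up for illustration. It is a deliberately naive simplification (it would break on content that itself contains the delimiter), and it is not how Tomcat parses requests internally.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; the servlet container does this work for you.
class MultipartSketch {

    // A body shaped like the captured one, with the made-up boundary "XYZ".
    static final String SAMPLE =
            "--XYZ\r\n"
          + "Content-Disposition: form-data; name=\"example-input\"; filename=\"file1.txt\"\r\n"
          + "Content-Type: text/plain\r\n"
          + "\r\n"
          + "string 11\r\nstring 12\r\n"
          + "--XYZ\r\n"
          + "Content-Disposition: form-data; name=\"example-input\"; filename=\"file2.txt\"\r\n"
          + "Content-Type: text/plain\r\n"
          + "\r\n"
          + "string 21\r\nstring 22\r\nstring 23\r\n"
          + "--XYZ--\r\n";

    // Splits the body at every "--boundary" delimiter and returns the raw parts
    // (headers plus content), skipping the empty prefix and the closing "--" marker.
    static List<String> splitParts(String body, String boundary) {
        List<String> parts = new ArrayList<String>();
        for (String chunk : body.split(Pattern.quote("--" + boundary))) {
            String part = chunk.trim();
            if (!part.isEmpty() && !part.equals("--")) {
                parts.add(part);
            }
        }
        return parts;
    }

    // Returns everything after the blank line that separates headers from content.
    static String contentOf(String rawPart) {
        int blankLine = rawPart.indexOf("\r\n\r\n");
        return blankLine < 0 ? "" : rawPart.substring(blankLine + 4);
    }

    public static void main(String[] args) {
        for (String part : splitParts(SAMPLE, "XYZ")) {
            System.out.println("[" + contentOf(part) + "]");
        }
    }
}
```

Each part carries its own headers followed by a blank line and then the file content, which is exactly the layout visible in the capture.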

Processing the data

The request sends the data divided into parts. Once we get the request in the servlet, we need a way to distinguish those parts and handle each piece.
Multipart schema - Upload several files with servlet example

Version 3.0 of the Servlet specification provides a convenient mechanism for parsing such parts. So let's see how to get the pieces of the request data:

Collection<Part> webParts = req.getParts();

Now you have a collection of the separate parts. The last thing we need to do is print the file content and the file name. Here we encounter another pitfall of multipart processing. As the captured dump shows, both files are assigned the same input name "example-input", so we have to distinguish them by something else. A good choice is the file name, so we need to get the Content-Disposition header of each retrieved part. Add the following method to your servlet:

 private String getFileName(Part part){
  Pattern p = Pattern.compile("filename=\"(.+?)\"");
  Matcher m = p.matcher(part.getHeader("Content-Disposition"));
  if(m.find()){
   return m.group(1);
  }else{
   throw new IllegalStateException("Cannot find filename in web request header.");
  }
 }

This method helps us determine the name of the file stored in the part we're currently processing.
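
The regular expression can be tried outside a servlet container. The sketch below applies the same pattern to a header value copied from the capture above. FileNameDemo and extractFileName are hypothetical names used only for this demonstration, and unlike the servlet method it returns null instead of throwing when no filename is present.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Stand-alone demonstration of the filename pattern; names here are illustrative only.
class FileNameDemo {

    // Same expression as in getFileName: lazily captures the text between the quotes.
    private static final Pattern FILENAME = Pattern.compile("filename=\"(.+?)\"");

    // Returns the file name from a Content-Disposition value, or null if there is none.
    static String extractFileName(String contentDisposition) {
        Matcher m = FILENAME.matcher(contentDisposition);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String fileHeader = "form-data; name=\"example-input\"; filename=\"file1.txt\"";
        String plainFieldHeader = "form-data; name=\"comment\"";
        System.out.println(extractFileName(fileHeader));
        System.out.println(extractFileName(plainFieldHeader));
    }
}
```

A part that comes from an ordinary text field has no filename attribute, which is the case where getFileName throws. If you target Servlet 3.1 or later rather than 3.0, Part.getSubmittedFileName() should make the manual header parsing unnecessary.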
Each part provides an InputStream from which its data can be read. Let's finally do the following steps:
  1. Get all the parts that come with the request
  2. While iterating over the collection, get the InputStream of each item
  3. Get the name of the current file from the part
  4. Wrap the InputStream of the current part in a LineNumberReader
  5. Read the part content line by line
  6. Print the lines to the servlet output (use resp.getWriter().println(..))
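
Steps 2 to 6 can be sketched without a running server. In the following example a ByteArrayInputStream stands in for the stream a part would provide and a StringWriter stands in for the servlet response. PartPrinterSketch, printContent and render are hypothetical names, and the UTF-8 charset is an assumption about the uploaded files.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.LineNumberReader;
import java.io.PrintWriter;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Illustrative stand-in for the loop body of doPost; not part of the servlet itself.
class PartPrinterSketch {

    // Prints the file name, then copies the stream to the writer line by line.
    static void printContent(String fileName, InputStream in, PrintWriter out) throws IOException {
        out.println("File Name: " + fileName);
        // Assumption: the uploaded text is UTF-8 encoded.
        try (LineNumberReader lnr = new LineNumberReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = lnr.readLine()) != null) {
                out.println(line);
            }
        }
    }

    // Convenience wrapper: feeds a string in as the "uploaded file" and returns the output.
    static String render(String fileName, String content) {
        StringWriter buffer = new StringWriter();
        PrintWriter out = new PrintWriter(buffer);
        try {
            printContent(fileName,
                    new ByteArrayInputStream(content.getBytes(StandardCharsets.UTF_8)), out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        out.flush();
        return buffer.toString();
    }

    public static void main(String[] args) {
        System.out.print(render("file1.txt", "string 11\nstring 12\n"));
    }
}
```

In the servlet, webPart.getInputStream() and resp.getWriter() take the place of the two stand-ins.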
So the final result looks like this:

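import java.io.IOException;
import java.io.InputStreamReader;
import java.io.LineNumberReader;
import java.io.PrintWriter;
import java.util.Collection;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import javax.servlet.ServletException;
import javax.servlet.annotation.MultipartConfig;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import javax.servlet.http.Part;

// Without @MultipartConfig the container will not parse the request into parts
@MultipartConfig
public class HandlerServlet extends HttpServlet{

 private static final long serialVersionUID = -717277851360696409L;

 @Override
 protected void doPost(HttpServletRequest req, HttpServletResponse resp)
   throws ServletException, IOException {

  PrintWriter out = resp.getWriter();
  // One part per uploaded file (all of them share the input name "example-input")
  Collection<Part> webParts = req.getParts();
  for(Part webPart: webParts){
   out.println("File Name: " + getFileName(webPart));
   // Reads the part using the platform default charset
   LineNumberReader lnr = new LineNumberReader(new InputStreamReader(webPart.getInputStream()));
   String line = null;
   while((line = lnr.readLine()) != null){
    out.println(line);
   }
   lnr.close();
  }
  resp.flushBuffer();
 }

 // Extracts the file name from the Content-Disposition header of the part
 private String getFileName(Part part){
  Pattern p = Pattern.compile("filename=\"(.+?)\"");
  Matcher m = p.matcher(part.getHeader("Content-Disposition"));
  if(m.find()){
   return m.group(1);
  }else{
   throw new IllegalStateException("Cannot find filename in web request header.");
  }
 }

}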
That is it!