com.bm.utils.csv
Class CSVParser

java.lang.Object
  extended by com.bm.utils.csv.CSVParser
All Implemented Interfaces:
CSVParse

public class CSVParser
extends java.lang.Object
implements CSVParse

CSV is a file format used as a portable representation of a database. Each line is one entry, or record, and the fields in a record are separated by commas. Commas may be preceded or followed by any number of space and/or tab characters, which are ignored.

If a field includes a comma or a new line, the whole field must be surrounded with double quotes. When the field is in quotes, any quote literals must be escaped by \" and backslash literals must be escaped by \\. Otherwise, a backslash and the character following it will be treated as the following character; i.e., "\n" is equivalent to "n". Other escape sequences may be set using the setEscapes() method. Any text that comes after a closing quote but before the next comma is ignored.
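The quoting rules above can be illustrated with a small tokenizer for a single line. This is a hypothetical sketch, not the CSVParser implementation: the class name CsvLineSketch is invented, and it assumes one record per physical line (no quoted newlines), a comma delimiter, a double-quote quote character, and backslash escapes processed only inside quoted fields, with no extra escapes from setEscapes().

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical single-line tokenizer illustrating the quoting rules.
 * ASSUMPTIONS: one record per line, ',' delimiter, '"' quote, and
 * backslash escapes handled only inside quoted fields.
 */
final class CsvLineSketch {

    static List<String> split(String line) {
        List<String> fields = new ArrayList<>();
        int i = 0;
        int n = line.length();
        while (true) {
            // Spaces and tabs before a field are ignored.
            while (i < n && isBlank(line.charAt(i))) i++;
            StringBuilder field = new StringBuilder();
            if (i < n && line.charAt(i) == '"') {
                i++; // consume the opening quote
                while (i < n && line.charAt(i) != '"') {
                    char c = line.charAt(i);
                    if (c == '\\' && i + 1 < n) {
                        // \\ -> \, \" -> ", and \x -> x for any other x
                        field.append(line.charAt(i + 1));
                        i += 2;
                    } else {
                        field.append(c);
                        i++;
                    }
                }
                i++; // consume the closing quote, if present
                // Text after the closing quote and before the next comma is ignored.
                while (i < n && line.charAt(i) != ',') i++;
            } else {
                while (i < n && line.charAt(i) != ',') {
                    field.append(line.charAt(i));
                    i++;
                }
                // Spaces and tabs before the comma are ignored.
                int end = field.length();
                while (end > 0 && isBlank(field.charAt(end - 1))) end--;
                field.setLength(end);
            }
            fields.add(field.toString());
            if (i >= n) return fields;
            i++; // consume the comma
        }
    }

    private static boolean isBlank(char c) {
        return c == ' ' || c == '\t';
    }
}
```

Applied to the sample line in the next paragraph, `CsvLineSketch.split(",second,,\" \",fifth,")` yields six fields: "", "second", "", " ", "fifth", "".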

Empty fields are returned as a String of length zero: "". The following line has three empty fields and three non-empty fields in it. There is an empty field on each end, and one in the middle. One of the non-empty fields is returned as a single space.

   ,second,," ",fifth,
 

Blank lines are always ignored. Other lines will be ignored if they start with a comment character as set by the setCommentStart() method.
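The line-skipping rule can be sketched as a filter applied before tokenizing. Assumptions for this hypothetical LineFilterSketch: a line counts as blank if it is empty or whitespace-only, and only its first character is compared against the comment characters; the actual parser may differ on both points.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical pre-filter: drops blank lines and comment lines.
 * ASSUMPTIONS: whitespace-only lines count as blank, and only the
 * first character of a line is checked against commentChars.
 */
final class LineFilterSketch {

    static List<String> dataLines(String text, String commentChars) {
        List<String> kept = new ArrayList<>();
        for (String line : text.split("\\R")) {
            if (line.trim().isEmpty()) continue;                      // blank line
            if (commentChars.indexOf(line.charAt(0)) >= 0) continue;  // comment line
            kept.add(line);
        }
        return kept;
    }
}
```

With an empty comment string, nothing is treated as a comment, matching the default of no comment characters.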

An example of how CSVParser might be used:

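 import com.bm.utils.csv.CSVParser;

 // nextValue() and close() throw java.io.IOException; handle or declare it.
 CSVParser shredder = new CSVParser(System.in);
 shredder.setCommentStart("#;!");
 shredder.setEscapes("nrtf", "\n\r\t\f");
 String t;
 while ((t = shredder.nextValue()) != null) {
     System.out.println(shredder.lastLineNumber() + " " + t);
 }
 shredder.close();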
 

Some applications do not output CSV according to the generally accepted standards, and this parser may not be able to handle their output. One such application is the Microsoft Excel spreadsheet. A separate class must be used to read such files.
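To make the difference concrete: Excel writes an embedded double quote by doubling it ("") and gives the backslash no special meaning, so a field such as "C:\dir\" is read differently under the backslash rules described here. The hypothetical QuoteStyleSketch decodes one already-isolated quoted field under each convention; it is an illustration only, not the separate Excel-reading class.

```java
/**
 * Hypothetical helper contrasting two quoting conventions for a single
 * quoted field, passed in including its surrounding quotes.
 */
final class QuoteStyleSketch {

    /** Excel-style convention: "" inside the field stands for one ". */
    static String decodeDoubled(String quoted) {
        String body = quoted.substring(1, quoted.length() - 1);
        return body.replace("\"\"", "\"");
    }

    /** Backslash convention from this page: \x stands for x. */
    static String decodeBackslash(String quoted) {
        String body = quoted.substring(1, quoted.length() - 1);
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < body.length(); i++) {
            char c = body.charAt(i);
            if (c == '\\' && i + 1 < body.length()) {
                c = body.charAt(++i); // keep only the escaped character
            }
            out.append(c);
        }
        return out.toString();
    }
}
```

For the Excel-written field "C:\dir\", decodeDoubled returns C:\dir\ while decodeBackslash returns C:dir\, which is why a parser built on backslash escapes cannot simply be reused for Excel output.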

Since:
17.04.2006
Author:
Daniel Wiese

Constructor Summary
CSVParser(java.io.InputStream in)
          Create a parser to parse comma separated values from an InputStream.
CSVParser(java.io.InputStream in, char delimiter)
          Create a parser to parse delimited values from an InputStream.
CSVParser(java.io.InputStream in, char delimiter, java.lang.String escapes, java.lang.String replacements, java.lang.String commentDelims)
          Create a parser to parse delimited values from an InputStream.
CSVParser(java.io.InputStream in, java.lang.String escapes, java.lang.String replacements, java.lang.String commentDelims)
          Create a parser to parse comma separated values from an InputStream.
CSVParser(java.io.Reader in)
          Create a parser to parse comma separated values from a Reader.
CSVParser(java.io.Reader in, char delimiter)
          Create a parser to parse delimited values from a Reader.
CSVParser(java.io.Reader in, char delimiter, java.lang.String escapes, java.lang.String replacements, java.lang.String commentDelims)
          Create a parser to parse delimited values from a Reader.
CSVParser(java.io.Reader in, java.lang.String escapes, java.lang.String replacements, java.lang.String commentDelims)
          Create a parser to parse comma separated values from a Reader.
 
Method Summary
 void changeDelimiter(char newDelim)
          Change this parser so that it uses a new delimiter.
 void changeQuote(char newQuote)
          Change this parser so that it uses a new character for quoting.
 void close()
          Close any stream upon which this parser is based.
 int getLastLineNumber()
          Get the number of the line from which the last value was retrieved.
 int lastLineNumber()
          Get the line number that the last token came from.
 java.lang.String nextValue()
          Get the next value.
 void setCommentStart(java.lang.String commentDelims)
          Set the characters that indicate a comment at the beginning of the line.
 void setEscapes(java.lang.String escapes, java.lang.String replacements)
          Specify escape sequences and their replacements.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

CSVParser

public CSVParser(java.io.InputStream in)
Create a parser to parse comma separated values from an InputStream.

Byte-to-character conversion is done using the platform default character encoding.

Parameters:
in - stream that contains comma separated values.
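If the input's encoding is known, one way to avoid depending on the platform default is to decode the bytes yourself and pass a Reader to the CSVParser(java.io.Reader in) constructor instead. A minimal sketch using only the standard library, assuming UTF-8 input; CharsetSketch and its methods are hypothetical helpers:

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

final class CharsetSketch {

    /** Decodes the stream as UTF-8 (an assumption) instead of the platform default. */
    static Reader utf8Reader(InputStream in) {
        return new InputStreamReader(in, StandardCharsets.UTF_8);
    }

    /** Reads a Reader to the end; used here only to check the decoding. */
    static String readAll(Reader reader) {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[1024];
        try {
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }
}
```

A parser would then be created with, for example, `new CSVParser(CharsetSketch.utf8Reader(System.in))`.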

CSVParser

public CSVParser(java.io.InputStream in,
                 char delimiter)
Create a parser to parse delimited values from an InputStream.

Byte-to-character conversion is done using the platform default character encoding.

Parameters:
in - stream that contains delimited values.
delimiter - field separator (used in place of the comma)
Throws:
BadDelimiterException - if the specified delimiter cannot be used

CSVParser

public CSVParser(java.io.Reader in)
Create a parser to parse comma separated values from a Reader.

Parameters:
in - reader that contains comma separated values.

CSVParser

public CSVParser(java.io.Reader in,
                 char delimiter)
Create a parser to parse delimited values from a Reader.

Parameters:
in - reader that contains delimited values.
delimiter - field separator (used in place of the comma)
Throws:
BadDelimiterException - if the specified delimiter cannot be used

CSVParser

public CSVParser(java.io.InputStream in,
                 char delimiter,
                 java.lang.String escapes,
                 java.lang.String replacements,
                 java.lang.String commentDelims)
Create a parser to parse delimited values from an InputStream.

Byte-to-character conversion is done using the platform default character encoding.

Parameters:
in - stream that contains delimited values.
delimiter - field separator (used in place of the comma)
escapes - a list of characters that will represent escape sequences.
replacements - the list of replacement characters for those escape sequences.
commentDelims - list of characters a comment line may start with.
Throws:
BadDelimiterException - if the specified delimiter cannot be used

CSVParser

public CSVParser(java.io.InputStream in,
                 java.lang.String escapes,
                 java.lang.String replacements,
                 java.lang.String commentDelims)
Create a parser to parse comma separated values from an InputStream.

Byte-to-character conversion is done using the platform default character encoding.

Parameters:
in - stream that contains comma separated values.
escapes - a list of characters that will represent escape sequences.
replacements - the list of replacement characters for those escape sequences.
commentDelims - list of characters a comment line may start with.

CSVParser

public CSVParser(java.io.Reader in,
                 char delimiter,
                 java.lang.String escapes,
                 java.lang.String replacements,
                 java.lang.String commentDelims)
Create a parser to parse delimited values from a Reader.

Parameters:
in - reader that contains delimited values.
delimiter - field separator (used in place of the comma)
escapes - a list of characters that will represent escape sequences.
replacements - the list of replacement characters for those escape sequences.
commentDelims - list of characters a comment line may start with.
Throws:
BadDelimiterException - if the specified delimiter cannot be used

CSVParser

public CSVParser(java.io.Reader in,
                 java.lang.String escapes,
                 java.lang.String replacements,
                 java.lang.String commentDelims)
Create a parser to parse comma separated values from a Reader.

Parameters:
in - reader that contains comma separated values.
escapes - a list of characters that will represent escape sequences.
replacements - the list of replacement characters for those escape sequences.
commentDelims - list of characters a comment line may start with.

Method Detail

close

public void close()
           throws java.io.IOException
Close any stream upon which this parser is based.

Specified by:
close in interface CSVParse
Throws:
java.io.IOException - if an error occurs while closing the stream.

nextValue

public java.lang.String nextValue()
                           throws java.io.IOException
Get the next value.

Specified by:
nextValue in interface CSVParse
Returns:
the next value or null if there are no more values.
Throws:
java.io.IOException - if an error occurs while reading.

lastLineNumber

public int lastLineNumber()
Get the line number that the last token came from.

Line breaks that occur in the middle of a token are not counted toward the line number.

Specified by:
lastLineNumber in interface CSVParse
Returns:
line number or -1 if no tokens have been returned yet.

setEscapes

public void setEscapes(java.lang.String escapes,
                       java.lang.String replacements)
Specify escape sequences and their replacements. Escape sequences set here are in addition to \\ and \", which are always valid escape sequences. This method allows standard escape sequences to be used. For example, "\n" can be set to mean a newline rather than an 'n'. A common way to call this method might be:
setEscapes("nrtf", "\n\r\t\f");
which would make \n, \r, \t and \f behave as they do in Java. Characters that follow a \ but are not escape sequences will still be interpreted as that character.
The two arguments to this method must be the same length. If they are not, the longer of the two will be truncated.

Parameters:
escapes - a list of characters that will represent escape sequences.
replacements - the list of replacement characters for those escape sequences.
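The pairing rule for the two arguments can be modeled as a lookup table. In this hypothetical EscapeTableSketch, each character of escapes maps to the character at the same position in replacements, extra characters in the longer string are dropped, \\ and \" always map to \ and ", and any other escaped character maps to itself. Whether the built-in escapes can be redefined is not stated above; the sketch assumes they cannot.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical model of the setEscapes(escapes, replacements) rules. */
final class EscapeTableSketch {
    private final Map<Character, Character> table = new HashMap<>();

    EscapeTableSketch(String escapes, String replacements) {
        // Pair characters by position; the longer string is truncated.
        int len = Math.min(escapes.length(), replacements.length());
        for (int i = 0; i < len; i++) {
            table.put(escapes.charAt(i), replacements.charAt(i));
        }
        // ASSUMPTION: the built-in escapes cannot be redefined.
        table.put('\\', '\\');
        table.put('"', '"');
    }

    /** Character produced by a backslash followed by c. */
    char translate(char c) {
        Character r = table.get(c);
        return r != null ? r : c; // unknown escapes yield the character itself
    }
}
```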

changeDelimiter

public void changeDelimiter(char newDelim)
Change this parser so that it uses a new delimiter.

The initial delimiter is a comma. The delimiter cannot be changed to a quote or any other character that has special meaning in CSV.

Specified by:
changeDelimiter in interface CSVParse
Parameters:
newDelim - delimiter to which to switch.
Throws:
BadDelimiterException - if the character cannot be used as a delimiter.

changeQuote

public void changeQuote(char newQuote)
Change this parser so that it uses a new character for quoting.

The initial quote character is a double quote ("). The quote character cannot be changed to a comma or any other character that has special meaning in CSV.

Specified by:
changeQuote in interface CSVParse
Parameters:
newQuote - character to use for quoting.
Throws:
BadQuoteException - if the character cannot be used as a quote.
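The restrictions on delimiter and quote characters can be sketched as two symmetric checks. The full set of characters CSVParser rejects is not listed on this page, so the reserved set used here (CR, LF and the backslash, plus the current quote or delimiter) is an assumption, and IllegalArgumentException stands in for BadDelimiterException and BadQuoteException, whose constructors are not documented here. SpecialCharSketch is a hypothetical name.

```java
/** Hypothetical checks mirroring changeDelimiter() and changeQuote(). */
final class SpecialCharSketch {

    // ASSUMPTION: an illustrative set of characters unusable as delimiter or quote.
    private static boolean isReserved(char c) {
        return c == '\n' || c == '\r' || c == '\\';
    }

    /** Rejects a delimiter equal to the current quote or a reserved character. */
    static void checkDelimiter(char newDelim, char currentQuote) {
        if (newDelim == currentQuote || isReserved(newDelim)) {
            throw new IllegalArgumentException("cannot use as delimiter: " + newDelim);
        }
    }

    /** Rejects a quote equal to the current delimiter or a reserved character. */
    static void checkQuote(char newQuote, char currentDelim) {
        if (newQuote == currentDelim || isReserved(newQuote)) {
            throw new IllegalArgumentException("cannot use as quote: " + newQuote);
        }
    }
}
```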

setCommentStart

public void setCommentStart(java.lang.String commentDelims)
Set the characters that indicate a comment at the beginning of the line. For example if the string "#;!" were passed in, all of the following lines would be comments:
    # Comment
    ; Another Comment
    ! Yet another comment
 
By default there are no comments in CSV files. Commas and quotes may not be used to indicate comment lines.

Parameters:
commentDelims - list of characters a comment line may start with.
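A hypothetical model of the validation and matching described for setCommentStart, assuming the default double-quote character and that only the first character of a line is examined. CommentStartSketch is an invented name, and IllegalArgumentException is used because the exception this method throws, if any, is not documented.

```java
/** Hypothetical model of setCommentStart(commentDelims). */
final class CommentStartSketch {
    private final String commentDelims;

    CommentStartSketch(String commentDelims) {
        // Commas and (default) double quotes may not start comment lines.
        if (commentDelims.indexOf(',') >= 0 || commentDelims.indexOf('"') >= 0) {
            throw new IllegalArgumentException("commas and quotes cannot start comments");
        }
        this.commentDelims = commentDelims;
    }

    /** True if the line's first character is one of the comment characters. */
    boolean isComment(String line) {
        return !line.isEmpty() && commentDelims.indexOf(line.charAt(0)) >= 0;
    }
}
```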

getLastLineNumber

public int getLastLineNumber()
Get the number of the line from which the last value was retrieved.

Specified by:
getLastLineNumber in interface CSVParse
Returns:
line number or -1 if no tokens have been returned.


Copyright © 2008. All Rights Reserved.