Database Forum / DB2 Topics / June 2006
thoughts on regular expression matching
|
|
Thread rating:  |
Serge Rielau - 18 Jun 2006 18:56 GMT I'm playing with the ICU library (slow weekend :-) and it's quite simple to expose the regular expression functionality for Unicode databases.
Presuming that the majority is not on unicode, would a developerWorks article which supports ASCII only for non unicode DBs be useful? (Otherwise I'll have to do some serious digging for translation tables)
Here is an example of what I've got:
-- validate Canadian postal code: db2 => values matches('L0H1H0', '[A-Z][0-9][A-Z][0-9][A-Z][0-9]');
1 ------ 1
1 record(s) selected.
Other functions possible are listen here: http://icu.sourceforge.net/userguide/regexp.html
Input welcome Serge
 Signature Serge Rielau DB2 Solutions Development IBM Toronto Lab
IOD Conference http://www.ibm.com/software/data/ondemandbusiness/conf2006/
Bob Stearns - 19 Jun 2006 04:19 GMT > I'm playing with the ICU library (slow weekend :-) and it's quite simple > to expose the regular expression functionality for Unicode databases. [quoted text clipped - 19 lines] > Input welcome > Serge It would indeed. I miss RE pattern matching in DB2. I've been using v2/sed/awk/perl for too long. I know it is too much to ask for, but do the functions exposed include anything like PHP's ereg_replace?
4.spam@mail.ru - 19 Jun 2006 06:51 GMT Hello.
I think the easiest way to get regex functionality in DB2 is using some java package (or using built-in java 1.4 functionality). You have to write fairly simple java UDFs using this package to get match, group, replace regex functionality. I have been doing so since DB2 v6.1. Packages: First of all it was IBM's regex4j.jar (no replace functionality), now it is package from http://jregex.sourceforge.net.
If someone needs samples, write at mark(dot)b(at)mail(dot)ru
Sincerely, Mark B.
> It would indeed. I miss RE pattern matching in DB2. I've been using > v2/sed/awk/perl for too long. I know it is too much to ask for, but do > the functions exposed include anything like PHP's ereg_replace? gimme_this_gimme_that@yahoo.com - 20 Jun 2006 01:34 GMT I could use installation instructions.
I've tried to use the Java oro library but no matter what I do DB2 can't find the jars with the UDFs I put in it's jar directory.
I studied several articles on IBMs site but nothing worked for me.
I couldn't even create a UDF to return an integer value, say 5.
4.spam@mail.ru - 20 Jun 2006 08:04 GMT Hello.
Jar directory??? If you need some jar (zip) for your UDF you have to register this jar by calling SP sqlj.install_jar(...); If you bundle your own classes into jar you have to do the same with this jar. If you do not bundle your own classes into jar you have to place your class file into sqllib/function directory. And don't forget that you have to place your class into the same directory structure inside sqllib/function directory according the package name of your java class.
Sincerely, Mark B.
> I could use installation instructions. > [quoted text clipped - 4 lines] > > I couldn't even create a UDF to return an integer value, say 5. gimme_this_gimme_that@yahoo.com - 22 Jun 2006 00:50 GMT > If you need some jar (zip) for your UDF you have to register this jar by calling SP sqlj.install_jar(...); Uh huh. "..." That's not much help.
...
I'm not certain what it means to have EXECUTE privilige on SQLJ.INSTALL_JAR but I can run java programs.
I tried various:
call sqlj.install_jar('file://W:/Program Files/IBM/SQLLIB/FUNCTION/",UdfUtils.jar',0);
And used this SQL to register RegEx as:
CREATE FUNCTION RegExp(pattern VARCHAR(2048), string CLOB(10M)) RETURNS INTEGER FENCED VARIANT NO SQL LANGUAGE JAVA PARAMETER STYLE JAVA EXTERNAL NAME 'com.rh.bb.udf.utils.RegExp!Match'
> If you bundle your own classes into jar you have to do the same with this jar. I named the jar file UdfUtils.jar .
I placed it in:
W:\Program Files\IBM\SQLLIB\FUNCTION/UdfUtils.jar
And FYI :
jar tvf UdfUtils.jar 0 Tue Mar 28 01:39:38 PST 2006 META-INF/ 55 Tue Mar 28 01:39:38 PST 2006 META-INF/MANIFEST.MF 0 Tue Mar 28 01:39:38 PST 2006 com/ 0 Tue Mar 28 01:39:38 PST 2006 com/rh/ 0 Tue Mar 28 01:39:38 PST 2006 com/rh/bb/ 0 Tue Mar 28 01:39:38 PST 2006 com/rh/bb/udf/ 0 Tue Mar 28 01:39:38 PST 2006 com/rh/bb/udf/utils/ 703 Tue Mar 28 01:39:38 PST 2006 com/rh/bb/udf/utils/RegExp.class
Here is RegExp.java:
package com.rh.bb.udf.utils;
import java.sql.Clob;
import java.util.regex.Pattern; import java.util.regex.Matcher;
import COM.ibm.db2.app.UDF;
public class RegExp extends UDF {
public static int Match(java.lang.String pattern, java.sql.Clob clob) { Pattern p = Pattern.compile(pattern); Matcher m = p.matcher( clob.getSubString(Long.parseLong("1"), (int)clob.length()) ); if ( m.matches() ) { return 1; } return 0; } }
I also couldn't get the simpler testUDF method to work. (It's not in the jar shown above but once it was.) Here it is:
package com.rh.bb.udf.utils;
import java.sql.*;
import java.lang.*;
import COM.ibm.db2.app.*;
class DB2Appl1 extends UDF {
public static int testUDF(int value1, int value2) { try { return(value1 & value2); } catch (Exception e) { return 0; } } }
upon executing RegEx(table.someclob) , or testUDF, nothing not a zero, not a one is returned, and there are no error messages.
I couldn't find any log files that had any traces suggesting errors.
> If you do not bundle your own classes into jar you have to place your > class file into sqllib/function directory. > And don't forget that you have to place your class into the same > directory structure inside sqllib/function directory according the > package name of your java class. Didn't have to do that.
4.spam@mail.ru - 22 Jun 2006 07:27 GMT Ok, lets start from simple example. testUDF returns sum to 2 arguments. If you extends UDF class in your java UDF, you have to do these things: 1. --- package com.rh.bb.udf.utils; import java.sql.*; import COM.ibm.db2.app.*; class DB2Appl1 extends UDF { public void testUDF(int value1, int value2, int result) { try { set(3, value1 + value2); } catch (Exception e) { set(3, 0); } } } --- Place compiled class file into sqllib/function/com/rh/bb/udf/utils directory. 2. --- CREATE FUNCTION testUDF(int a1, int a2) RETURNS int EXTERNAL NAME 'com.rh.bb.udf.utils.DB2Appl1!testUDF' LANGUAGE java PARAMETER STYLE db2general DETERMINISTIC FENCED NO SQL NO EXTERNAL ACTION; --- 3. --- select testUDF(1, 2) from sysibm.sysdummy1; ---
Is it works?
gimme_this_gimme_that@yahoo.com - 22 Jun 2006 21:04 GMT Thanks. Here are results so far:
1.
I modified testUDF() so it would compile. I removed the code in the catch clause since set throws an Exception :
package com.rh.bb.udf.utils; import java.sql.*; import COM.ibm.db2.app.*; class DB2Appl1 extends UDF { public void testUDF(int value1, int value2, int result) { try { set(3, value1 + value2); } catch (Exception e) { // do nothing } } }
BTW, is it OK to change test testUDF to ?
public void testUDF(int value1, int value2, int result) throws Exception { set(3, value1 + value2); }
> Place compiled class file into sqllib/function/com/rh/bb/udf/utils directory. Got it.
Do I need to restart the database? That wasn't in the instructions.
I executed the following script (under DB2-8.2 ) which you were kind enough to write:
> --- > CREATE FUNCTION testUDF(int a1, int a2) [quoted text clipped - 6 lines] > NO SQL > NO EXTERNAL ACTION; This results in the following SQL error: SQL0204N "A1" is an undefined name. LINE NUMBER1. SQLSTATE=42704
The Java method testUDF has three arguments while the SQL testUDF function has 2 arguments. Is that intended?
> --- > 3. [quoted text clipped - 3 lines] > > Is it works? I didn't get that far.
4.spam@mail.ru - 23 Jun 2006 06:38 GMT See answers below.
> BTW, is it OK to change test testUDF to ? > > public void testUDF(int value1, int value2, int result) throws > Exception { > set(3, value1 + value2); > } Yes.
> > Place compiled class file into sqllib/function/com/rh/bb/udf/utils directory. > > Got it. > > Do I need to restart the database? That wasn't in the instructions. No. Do this: call sqlj.refresh_classes();
> I executed the following script (under DB2-8.2 ) which you were kind > enough to write: [quoted text clipped - 12 lines] > This results in the following SQL error: > SQL0204N "A1" is an undefined name. LINE NUMBER1. SQLSTATE=42704 Excuse me. Should be: CREATE FUNCTION testUDF(a1 int, a2 int) ...
Sincerely, Mark B.
gimme_this_gimme_that@yahoo.com - 23 Jun 2006 19:05 GMT Steps 1,2,3, and 4 completed.
I'm using AQT as my database interface tool. (I wouldn't guess that should matter.)
Executing:
select testUDF(1, 2) from sysibm.sysdummy1
results in a column without any content.
Executing:
select 7,testUDF(1, 2) from sysibm.sysdummy1
results in two columns without any content, not even the 7.
4.spam@mail.ru - 28 Jun 2006 06:45 GMT I haven't ever heard about this tool. Try this select from db2 command line processor. Does it work from CLP?
> Steps 1,2,3, and 4 completed. > [quoted text clipped - 12 lines] > > results in two columns without any content, not even the 7. gimme_this_gimme_that@yahoo.com - 29 Jun 2006 01:24 GMT Whoo hooo! We're cooking with gasoline!
I set things up so I could excute the function from a SQL statement embedded in a Java program. Just using AQT wasn't doing the job.
At first I got a bunch of SQLCODE -4304 errors.
The trick was decyphering the error message in the format <name>:<spec-name>:<class>:<reason-code>.
I fixed a few class name type-os (reason-code 1 was saying the class wasn't in my CLASSPATH).
I also placed a jar file having DB2App1.class in my NT CLASSPATH into SQLLIB/java/UdfUtils.jar, storing it in the same directory as other IBM jar files.
So testUDF is good to go in both Java JDBC and from the AQT GUI.
Question 1:
I created two version of testUDF.
public void testUDF(int value1, int value2, int result) public void testUDF(int value1, int value2)
Off hand I'm not certain which method ran. What was the significance of creating a version with a third argument "int result" ?
Question 2:
Where can I find the API (javadocs) to COM.ibm.db2.app.*;
Question 3:
Does set only work with integers?
Question 4:
Do you have any examples that are more complicated than this simple one?
What do you want to try next?
gimme_this_gimme_that@yahoo.com - 29 Jun 2006 02:10 GMT Here is another example.
This one comes from: http://www.dbforums.com/archive/index.php/t-447057.html
In this case:
package com.rh.bb.udf.utils;
import java.sql.*; import COM.ibm.db2.app.*;
class DB2Appl1 extends UDF {
public static String AllTrim (String s, String c) throws Exception { int len = s.length(); int st = 0; char[] val=s.toCharArray(); char fch=c.charAt(0); while ((st < len) && (val[st] ==fch)) {st++;} while ((st < len) && (val[len - 1] == fch)) {len--;} return s.substring(st, len); } }
CREATE FUNCTION AllTrim (VARCHAR(4000), VARCHAR(1)) RETURNS VARCHAR(4000) FENCED LANGUAGE JAVA PARAMETER STYLE JAVA EXTERNAL NAME 'com.rh.bb.udf.utils.DB2Appl1!AllTrim' NO EXTERNAL ACTION DETERMINISTIC NO SQL;
call sqlj.refresh_classes();
select AllTrim('aaaabb','a') from sysibm.sysdummy1;
Here's the error :
bad SQL grammar [select AllTrim('aaabbb';,'a') DD from sysibm.sysdummy1]; nested exception is com.ibm.db2.jcc.b.SqlException: DB2 SQL error: SQLCODE: -4306, SQLSTATE: 42724, SQLERRMC: EMMMCL01.ALLTRIM;SQL060628174547500;AllTrim;(Ljava/lang/String;Ljava/l" type="org.springframework.jdbc.BadSqlGrammarException
I gather that the function signature in the CREATE statement doesn't match the Java version. Now what?
Thanks.
4.spam@mail.ru - 29 Jun 2006 12:49 GMT Use --- public class DB2Appl1 --- instead of --- class DB2Appl1 extends UDF ----------------------------- Remember these things for java UDFs: 1. If you use PAREMETER STYLE JAVA you: -have to declare your methods as static -have to use return clause in java method -don't have to extend UDF class 2. if you use PAREMETER STYLE DB2GENERAL you: -have to extend UDF class -have to declare your method as void -have to use last paremeter of your method for returning a value and set them by "set" method implemented in UDF class -aren't allowed to declare this last parameter in your CREATE FUNCTION statement -----------------------------
> class DB2Appl1 extends UDF > { [quoted text clipped - 35 lines] > > Thanks. Serge Rielau - 29 Jun 2006 03:23 GMT > Whoo hooo! We're cooking with gasoline! > [quoted text clipped - 39 lines] > > What do you want to try next? have you looked at sqllib/samples?
Cheers Serge
 Signature Serge Rielau DB2 Solutions Development IBM Toronto Lab
IOD Conference http://www.ibm.com/software/data/ondemandbusiness/conf2006/
Knut Stolze - 19 Jun 2006 09:10 GMT > I'm playing with the ICU library (slow weekend :-) and it's quite simple > to expose the regular expression functionality for Unicode databases. [quoted text clipped - 16 lines] > Other functions possible are listen here: > http://icu.sourceforge.net/userguide/regexp.html There is a regex-article here: http://tinyurl.com/pyzto
It builds on C function. Carrying them to Java and Unicode may be a good idea, though.
 Signature Knut Stolze DB2 Information Integration Development IBM Germany
Serge Rielau - 19 Jun 2006 11:42 GMT >> I'm playing with the ICU library (slow weekend :-) and it's quite simple >> to expose the regular expression functionality for Unicode databases. [quoted text clipped - 21 lines] > It builds on C function. Carrying them to Java and Unicode may be a good > idea, though. Fascinating, I had no clue this article was out there. Now it talks about "PERL" semantics. Are there differences? Does PCRE support Unicode? Is there binary code available somewhere for PCRE? I can't find it at least...
Cheers Serge
 Signature Serge Rielau DB2 Solutions Development IBM Toronto Lab
IOD Conference http://www.ibm.com/software/data/ondemandbusiness/conf2006/
Knut Stolze - 19 Jun 2006 11:58 GMT I wrote this article 3 years ago. So please don't ask me any details
> Now it talks about "PERL" semantics. Are there differences? Perl has some additional metacharacters (compared to traditional regexps). For example, \s, \w, ... are there. At a first glance, it is very similar to ICU. There is at least one difference, though: \b matches in Perl the "beginning/end of a word". Maybe there are some more such things...
> Does PCRE support Unicode? Yes, it does. The man page states: The current implementation of PCRE (release 6.x) corresponds approximately with Perl 5.8, including support for UTF-8 encoded strings and Unicode general category properties. However, this support has to be explicitly enabled; it is not the default.
> Is there binary code available somewhere for PCRE? I can't find it at > least... There is something for Windows. I don't know about any precompiled packages for the UNIX platforms (except the RPMs delivered for the various Linux distributions).
p.s: http://www.pcre.org/
 Signature Knut Stolze DB2 Information Integration Development IBM Germany
gimme_this_gimme_that@yahoo.com - 29 Jun 2006 17:56 GMT Assuming DB2 on Linux is compiled with gcc, only Linux users are so lucky to have a free DB2 compliant C compiler.
Assuming you have compatible C compiler, the libraries you'd build using C probably won't be cross platform complatible.
I'd recommend passing over the C RegEx implementations they aren't likely to be helpful to anyone aside from hobbyists..
Also, note that JRegex does not have any DB2 specific methods. JRegex is a Perl 5 compatible regular expression package.
My 2 cents.
Gregor Kovač - 19 Jun 2006 11:15 GMT > I'm playing with the ICU library (slow weekend :-) and it's quite simple > to expose the regular expression functionality for Unicode databases. [quoted text clipped - 19 lines] > Input welcome > Serge Hi!
It would be usefull, but I'd VERY MUCH like to see that in UNICODE, since all our databases user UTF-8.
Best regards, Kovi
 Signature -~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-
| Gregor Kovac | Gregor.Kovac@mikropis.si | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
| In A World Without Fences Who Needs Gates? | | Experience Linux. | -~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-
Serge Rielau - 19 Jun 2006 11:31 GMT >> I'm playing with the ICU library (slow weekend :-) and it's quite simple >> to expose the regular expression functionality for Unicode databases. [quoted text clipped - 27 lines] > Best regards, > Kovi That's the trivial part. :-)
 Signature Serge Rielau DB2 Solutions Development IBM Toronto Lab
IOD Conference http://www.ibm.com/software/data/ondemandbusiness/conf2006/
|
|
|