BREGEXP.DLL

Release Notes  August 1, 1999
Written by Hippo2000

BREGEXP.DLL provides APIs for regular expressions. If you want the power of Perl5-style regular expressions, give it a try! The regular expression methods in BASP21.DLL use these APIs. The APIs support the NUL character, so they are useful not only for text but also for binary data. You can also call them from Microsoft Visual Basic.
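Binary data works because each function receives both the start and the end of the target (the target and targetendp parameters in bregexp.h), rather than relying on a terminating NUL. The following is a minimal sketch of that difference. buffer_end and c_string_length are hypothetical helpers introduced here for illustration; they are not part of BREGEXP.DLL.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper (not part of BREGEXP.DLL): returns the one-past-the-end
// pointer of a char array initialized from a string literal. Embedded NUL
// bytes are counted; only the final terminating NUL is excluded.
template <std::size_t N>
char* buffer_end(char (&buf)[N]) {
    return buf + (N - 1);
}

// Length as strlen (or lstrlen) sees it: counting stops at the first NUL byte.
inline std::size_t c_string_length(const char* s) {
    assert(s != nullptr);
    return std::strlen(s);
}
```

For data such as char data[] = "abc\0def", c_string_length(data) stops after "abc", while buffer_end(data) covers all seven bytes, so it is the safer value to pass as targetendp when the data may contain NUL.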

05/13/99 Updated

Installing Arimac's BRegIF enables you to use regular expressions of BREGEXP from HIDEMARU editor Version 3.01.
`'s Room(Only in Japanese)

For details on Perl and regular expressions, refer to the Perl man pages.
Perl Newbies(Only in Japanese)

List of the Functions

What is pattern

Regular expressions treat a string as a pattern. A pattern is specified by enclosing it in "/" characters, as shown below:

/pattern/qualifiers

If a pattern contains "/", you can use another delimiter character together with 'm', as shown below:
m#pattern#qualifiers
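
The choice between the two forms can be made mechanically. The sketch below is one way to do it; make_pattern is a hypothetical helper introduced for illustration, not a BREGEXP.DLL function, and it assumes '#' is the only alternative delimiter needed.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not part of BREGEXP.DLL): wraps a regular expression
// body in delimiters. Uses /body/ when the body contains no '/', and falls
// back to the m#body# form otherwise.
std::string make_pattern(const std::string& body,
                         const std::string& qualifiers = "") {
    if (body.find('/') == std::string::npos)
        return "/" + body + "/" + qualifiers;
    // This sketch does not handle a body containing both '/' and '#'.
    assert(body.find('#') == std::string::npos);
    return "m#" + body + "#" + qualifiers;
}
```

For example, make_pattern("a/b", "g") yields m#a/b#g, while make_pattern("abc") yields /abc/.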

A pattern may contain special characters called metacharacters. Once you master them, you will feel the real power of regular expressions. The metacharacters available in BREGEXP.DLL are almost the same as in Perl5, so regular expressions written for Perl5 can generally be used unchanged in BREGEXP.DLL.

The following metacharacters are recognized in BREGEXP.DLL:

    \     Quote the next metacharacter
    ^     Match the beginning of the line
    .     Match any character (except newline)
    $     Match the end of the line (or before newline at the end)
    |     Alternation
    ()    Grouping
    []    Character class
    \w    Match a "word" character (alphanumeric plus "_")
    \W    Match a non-word character
    \s    Match a whitespace character
    \S    Match a non-whitespace character
    \d    Match a digit character
    \D    Match a non-digit character
    \b    Match a word boundary
    \B    Match a non-(word boundary)
    \A    Match only at beginning of string
    \Z    Match only at end of string, or before newline at the end
    \t    tab                   (HT, TAB)
    \n    newline               (LF, NL)
    \r    return                (CR)
    \f    form feed             (FF)
    \a    alarm (bell)          (BEL)
    \e    escape (think troff)  (ESC)
    \033  octal char (think of a PDP-11)
    \x1B  hex char
    \c[   control char

The following standard quantifiers are recognized:

    *        Match 0 or more times
    +        Match 1 or more times
    ?        Match 1 or 0 times
    {n}      Match exactly n times
    {n,}     Match at least n times
    {n,m}  Match at least n but not more than m times
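
When a pattern is written as a C or C++ string literal, every backslash must be doubled, which is why the samples in this document write \\d for \d. As a sketch, the two functions below return the same pattern, once with doubled backslashes and once as a raw string literal. Raw string literals require C++11, so they are not available in the Visual C++ versions this DLL was tested with; the function names are hypothetical.

```cpp
#include <cassert>
#include <string>

// The telephone-number pattern with doubled backslashes, as it must be
// written in an ordinary string literal.
inline std::string escaped_pattern() {
    return "/(03|045)-(\\d{3,4})-(\\d{4})/";
}

// The same pattern as a C++11 raw string literal: backslashes are taken
// literally, so the text reads exactly like the regular expression.
inline std::string raw_pattern() {
    return R"(/(03|045)-(\d{3,4})-(\d{4})/)";
}
```

Both functions produce the identical character sequence; only the source spelling differs.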

The following qualifiers are recognized:

    k    Treat string as Japanese (SJIS). (Perl does NOT have this qualifier.)
    m    Treat string as multiple lines. (The metacharacter $ is affected.)
    g    Match globally, i.e., find all occurrences.
    c    replace: Complement the SEARCHLIST.
    d    replace: Delete found but unreplaced characters.
    s    replace: Squash duplicate replaced characters.
    


About Incompatibility

The following features are not supported by BREGEXP.DLL:

Variables in a pattern are not expanded (in Visual Basic, etc.).
Metacharacter:
         \G   Match only where previous m//g left off
Qualifiers (parameters on the right side of a pattern):
         o     Compile only once.
         x     Use extended regular expressions.
         e     Evaluate the right side as an expression.

Using from Visual C++

#include "bregexp.h" // Regular Expressions DLL


=================== BEGIN of bregexp.h ======================================

#ifdef _BREGEXP_
#define BREGEXPAPI __declspec(dllexport)
#else
#define BREGEXPAPI __declspec(dllimport)
#endif

typedef struct bregexp {
const char *outp; // BSubst: Pointer to Replace data
const char *outendp; // BSubst : Pointer to last of replace data + 1
const int splitctr; // BSplit: Number of array items.
const char **splitp; // BSplit: Pointer to the data
int rsv1; // Reserved (free to use)
char *parap; // Pointer to pattern data
char *paraendp; // Pointer to pattern data + 1
char *transtblp; // BTrans : Pointer to Translation table
char **startp; // Pointer to first of matched data.
char **endp; // Pointer to last of matched data.
int nparens; // Number of ()s in pattern. It is useful to examine $1, $2 and $n.
} BREGEXP;

#if defined(__cplusplus)
extern "C"
{
#endif

BREGEXPAPI
int BMatch(char* str,char *target,char *targetendp,
BREGEXP **rxp,char *msg) ;
BREGEXPAPI
int BSubst(char* str,char *target,char *targetendp,
BREGEXP **rxp,char *msg) ;
BREGEXPAPI
int BTrans(char* str,char *target,char *targetendp,
BREGEXP **rxp,char *msg) ;
BREGEXPAPI
int BSplit(char* str,char *target,char *targetendp,
int limit,BREGEXP **rxp,char *msg);
BREGEXPAPI
void BRegfree(BREGEXP* rx);

BREGEXPAPI
char* BRegexpVersion(void);

#if defined(__cplusplus)
}
#endif


#undef BREGEXPAPI

=================== END of bregexp.h ======================================

Struct BREGEXP

BREGEXP.DLL takes a struct BREGEXP as a parameter. A struct BREGEXP is also called a 'compile block'. It cannot be used from Visual Basic.
Struct BREGEXP has the following feature:

Reusing compile blocks makes your program run faster. Pass the same compile block to every call that uses the same regular expression.

The following BregPool class is a sample that speeds things up by pooling compile blocks (struct BREGEXPs).

class BregPool
{
public:
	BregPool(int max){
		m_nmax = max;
		m_rxpool = new BREGEXP*[m_nmax]; 
		ZeroMemory(m_rxpool,sizeof(BREGEXP*)*m_nmax);
	};
	~BregPool() {
		Free();
	};
	void Free() {
		if (m_rxpool == 0)
			return;
		for (int i = 0;i < m_nmax;i++) {
			if (m_rxpool[i])
				BRegfree(m_rxpool[i]);
		}
		delete [] m_rxpool;
		m_rxpool = NULL;
	};
	BREGEXP* Get(char *regstr)
	{
		BREGEXP *r;
		int i;	// Declared outside the loop because it is used after the loop
		for (i = 0;i < m_nmax;i++) {
			r = m_rxpool[i];
			if (r == 0)
				break;
			if (r->parap == 0)
				break;
			// Check same Regular Expression
			if (memcmp(regstr,r->parap,(r->paraendp - r->parap) + 1) == 0)
				return r;		// we got !!!
		}
		if (i > m_nmax - 1)
			i = m_nmax - 1;
		if (m_rxpool[i])
			return m_rxpool[i];
		char msg[80];
		char p[] = " ";
		// Make Compile Block
		BMatch(regstr,p,p+1,&m_rxpool[i],msg);

		return m_rxpool[i];
	}
private:
	int m_nmax;
	BREGEXP **m_rxpool;
};

How to Use BregPool Class:

	// t1 and msg are declared as in the BMatch sample below
	static BregPool bpool(8);  // Number of pooled compile blocks
	char patern1[] = "/(03|045)-(\\d{3,4})-(\\d{4})/";
	BREGEXP *rxp = bpool.Get(patern1);
	int pos = 0;	// Search position
	while (BMatch(patern1,t1+pos,t1+lstrlen(t1),&rxp,msg)) {
		pos = rxp->endp[0] - t1;	// Continue after this match
	}

	// BregPool's destructor calls BRegfree,
	// so you don't have to call BRegfree explicitly.
 

Code Sample

Code Sample of BMatch

	char msg[80];	// Message buffer
	BREGEXP *rxp = NULL;	// Must be initialized to NULL before the first call
	// Sample of Search
	char t1[] = " Yokohama 045-222-1111  Osaka 06-5555-6666  Tokyo 03-1111-9999 ";
	char patern1[] = "/(03|045)-(\\d{3,4})-(\\d{4})/";	// /(03|045)-(\d{3,4})-(\d{4})/
						// Searches for telephone numbers beginning with 03 or 045
						// () captures the enclosed part of the match
	int pos = 0;	// Searching position
	while (BMatch(patern1,t1+pos,t1+lstrlen(t1),&rxp,msg)) {
		TRACE1("data=%s\n",t1+pos);		// String to be searched
		TRACE1("found=%s\n",rxp->startp[0]);	// Matched string
		TRACE1("length=%d\n",rxp->endp[0] - rxp->startp[0]);	// Number of matched characters 
		for (int i = 1;i <= rxp->nparens;i++) {		// Data specified by ()
			TRACE2("$%d = %s\n",i,rxp->startp[i]);
			TRACE2("$%d length = %d\n",i,rxp->endp[i]-rxp->startp[i]);
		}
		pos = rxp->endp[0] - t1;		// Resume the search just after this match
	}

	pos = 0;
	char t2[] = " abcdabce abcdabcd abcdabcf abcgabcg ";
	char patern2[] = "/abc(.)abc\\1/";	// Example of searching with pattern memory
	while(BMatch(patern2,t2+pos,t2+lstrlen(t2),&rxp,msg)) {
		TRACE1("data=%s\n",t2);			// String to be searched
		TRACE1("found=%s\n",rxp->startp[0]);	// Matched String
		TRACE1("length=%d\n",rxp->endp[0] - rxp->startp[0]);	// Number of matched characters
		for (int i = 1;i <= rxp->nparens;i++) {		// Data specified by ()
			TRACE2("$%d = %s\n",i,rxp->startp[i]);
			TRACE2("$%d length = %d\n",i,rxp->endp[i]-rxp->startp[i]);
		}
		pos = rxp->endp[0] - t2;	// Resume the search just after this match
	}

	if (rxp)			// Free the compile block
		BRegfree(rxp);		// Don't forget this!

Result of TRACE:

data= Yokohama 045-222-1111  Osaka 06-5555-6666  Tokyo 03-1111-9999 
found=045-222-1111  Osaka 06-5555-6666  Tokyo 03-1111-9999 
length=12
$1 = 045-222-1111  Osaka 06-5555-6666  Tokyo 03-1111-9999 
$1 length = 3
$2 = 222-1111  Osaka 06-5555-6666  Tokyo 03-1111-9999 
$2 length = 3
$3 = 1111  Osaka 06-5555-6666  Tokyo 03-1111-9999 
$3 length = 4
data=  Osaka 06-5555-6666  Tokyo 03-1111-9999 
found=03-1111-9999 
length=12
$1 = 03-1111-9999 
$1 length = 2
$2 = 1111-9999 
$2 length = 4
$3 = 9999 
$3 length = 4
data= abcdabce abcdabcd abcdabcf abcgabcg 
found=abcdabcd abcdabcf abcgabcg 
length=8
$1 = dabcd abcdabcf abcgabcg 
$1 length = 1
data= abcdabce abcdabcd abcdabcf abcgabcg 
found=abcgabcg 
length=8
$1 = gabcg 
$1 length = 1
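
As the TRACE output above shows, printing startp[n] with %s runs on to the end of the target string: startp and endp point into the target, and the matched part is not NUL-terminated. A minimal sketch of copying one group into its own string follows. MatchView and capture are hypothetical names introduced for illustration; MatchView stands in for the startp, endp and nparens members of struct BREGEXP so that the sketch compiles without the DLL, and std::string is an addition the original samples do not use.

```cpp
#include <cassert>
#include <string>

// Stand-in for the three BREGEXP members used here (assumption: same meaning
// as in bregexp.h). With the real header, fill it from rxp->startp,
// rxp->endp and rxp->nparens.
struct MatchView {
    char** startp;  // startp[n]: first character of group n (0 = whole match)
    char** endp;    // endp[n]: one past the last character of group n
    int nparens;    // number of () groups in the pattern
};

// Hypothetical helper: copies group n (0 = whole match, 1 = $1, ...) into an
// owning std::string, using the pointer pair instead of a NUL terminator.
std::string capture(const MatchView& m, int n) {
    assert(n >= 0 && n <= m.nparens);
    return std::string(m.startp[n], m.endp[n]);
}
```

With a real match, MatchView v = { rxp->startp, rxp->endp, rxp->nparens }; followed by capture(v, 1) would give only the text of $1.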

Code Sample for BSubst

Replaces the local part of each telephone number that has a 2-digit area code with 'xxxx-xxxx'.

	char msg[80];	// Message buffer
	BREGEXP *rxp = NULL;	// Must be initialized to NULL before the first call
	// Sample of string substitution
	char t1[] = " Yokohama 045-222-1111  Osaka 06-5555-6666  Tokyo 03-1111-9999 ";
	char patern1[] = "s/(\\d\\d)-\\d{4}-\\d{4}/$1-xxxx-xxxx/g";
	int ctr;
	if (ctr = BSubst(patern1,t1,t1+lstrlen(t1),&rxp,msg)) {
		TRACE2("after(%d)=%s\n",ctr,rxp->outp);	// Number of substitutions and the resulting string
		TRACE1("length=%d\n",rxp->outendp - rxp->outp);	// Length of the resulting string
	}

	if (rxp)			// Free the compile block
		BRegfree(rxp);		// Don't forget this.

Result of TRACE:

after(2)= Yokohama 045-222-1111  Osaka 06-xxxx-xxxx  Tokyo 03-xxxx-xxxx 
length=63

Code Sample for BTrans

Translates upper-case letters to lower case and every digit to 'x'.

	char msg[80];	// Message buffer
	BREGEXP *rxp = NULL;	// Must be initialized to NULL before the first call
	// Sample of Translation
	char t1[] = " Yokohama 045-222-1111  Osaka 06-5555-6666  Tokyo 03-1111-9999 ";
	char patern1[] = "tr/A-Z0-9/a-zx/g";
	int ctr;
	if (ctr = BTrans(patern1,t1,t1+lstrlen(t1),&rxp,msg)) {
		TRACE2("after(%d)=%s\n",ctr,rxp->outp);	// Number of translated characters and string
		TRACE1("length=%d\n",rxp->outendp - rxp->outp);	// Number of characters in result of the translation
	}

	if (rxp)				// Free the compile block.
		BRegfree(rxp);		// Don't forget this!


Result of TRACE:

after(33)= yokohama xxx-xxx-xxxx  osaka xx-xxxx-xxxx  tokyo xx-xxxx-xxxx 
length=63

Code Sample for BSplit

Splits the string at each telephone number, leaving the city names.

	static BregPool bpool(8);
	char msg[80];
	char t1[] = " Yokohama 045-222-1111  Osaka 06-5555-6666  Tokyo 03-1111-9999 ";
	char patern1[] = "/ *\\d{2,3}-\\d{3,4}-\\d{4} */";
	BREGEXP *rxp = bpool.Get(patern1);
   	int splitcnt = BSplit(patern1,t1,t1+lstrlen(t1),0,&rxp,msg);
	if (splitcnt > 0 ) {
		int i = 0;
		for (int j = 0;j < splitcnt;j++) {
			int len = rxp->splitp[i+1] - rxp->splitp[i];
			char *tp = (char*)rxp->splitp[i];
			char ch = tp[len]; // save the delimiter
			tp[len] = 0;	// set stopper
			TRACE3("len=%d [%d]=%s\n",len,j,tp);
			tp[len] = ch;	// restore the char
			i += 2;
		}
	}

Result of TRACE:

len=9 [0]= Yokohama
len=5 [1]=Osaka
len=5 [2]=Tokyo
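
The sample above temporarily writes a NUL into the target buffer to print each piece. As an alternative sketch, the pieces can be copied out by length without modifying the target. It assumes, as the loop above does, that splitp holds (start, end) pointer pairs. split_pieces is a hypothetical helper introduced for illustration, and std::vector and std::string are not used by the original samples.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: copies `count` pieces out of a splitp-style array.
// Assumption (taken from the sample loop): splitp[2*j] is the start of
// piece j and splitp[2*j + 1] is one past its last character.
std::vector<std::string> split_pieces(const char* const* splitp, int count) {
    assert(count >= 0);
    std::vector<std::string> pieces;
    pieces.reserve(static_cast<std::size_t>(count));
    for (int j = 0; j < count; ++j)
        pieces.emplace_back(splitp[2 * j], splitp[2 * j + 1]);
    return pieces;
}
```

After a successful BSplit, split_pieces(rxp->splitp, splitcnt) would return the pieces as independent strings that stay valid after the compile block is freed.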

BREGEXP.DLL library for Delphi

osamu@big.or.jp has produced a BREGEXP.DLL library for Delphi. Please give it a try.
It is available for download under Category [Miscellaneous] - [Perl compatible regular expressions unit].

Warning:

These functions have been tested ONLY with the Visual C++ compiler.


Important Note of BABAQ Free Soft



Copyright 1999 Tatsuo Baba,All rights reserved.