Release Notes August 1, 1999
Written by Hippo2000
BREGEXP.DLL provides APIs for regular expressions. If you want the power of Perl5-style regular expressions, give it a try! The regular expression methods in BASP21.DLL use these APIs. The APIs support the NULL character, so they are useful not only for text but also for binary data. You can call these APIs from Microsoft Visual Basic, too.
05/13/99 Updated
Installing Arimac's BRegIF enables you to
use regular expressions of BREGEXP from HIDEMARU editor Version 3.01.
`'s Room(Only in Japanese)
For Perl and regular expressions, refer to the Perl man pages.
Perl
Newbies(Only in Japanese)
Regular expressions treat a string as a pattern. A pattern is specified by enclosing it in "/" characters, as shown below:
/pattern/qualifiers
If a pattern contains "/", you can use another delimiter character together with 'm', as shown below:
m#pattern#qualifiers
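To make the two notations concrete, here is a minimal sketch that splits a match specification into its delimiter, pattern and qualifiers. The names `PatternSpec` and `splitPatternSpec` are hypothetical and are not part of BREGEXP.DLL. The sketch handles only the match forms shown above, and it assumes that a backslash escapes the character that follows it.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper type, not part of BREGEXP.DLL: the pieces of a
// match specification such as "/abc/g" or "m#a/b#k".
struct PatternSpec {
    char delimiter;          // '/' or the character that follows 'm'
    std::string pattern;     // text between the two delimiters
    std::string qualifiers;  // text after the closing delimiter
    bool ok;                 // false if the closing delimiter is missing
};

PatternSpec splitPatternSpec(const std::string& spec) {
    PatternSpec out{'/', "", "", false};
    std::size_t pos = 0;
    if (!spec.empty() && spec[0] == 'm') pos = 1;   // optional leading 'm'
    if (pos >= spec.size()) return out;
    out.delimiter = spec[pos++];
    // Scan forward to the next unescaped delimiter.
    for (; pos < spec.size(); ++pos) {
        char c = spec[pos];
        if (c == '\\' && pos + 1 < spec.size()) {
            out.pattern += c;            // keep the backslash
            out.pattern += spec[++pos];  // and the escaped character
        } else if (c == out.delimiter) {
            out.qualifiers = spec.substr(pos + 1);
            out.ok = true;
            return out;
        } else {
            out.pattern += c;
        }
    }
    return out;  // closing delimiter not found
}
```

With this sketch, "m#a/b#k" yields the pattern "a/b" and the qualifier "k", which is why the 'm' form is convenient when the pattern itself contains "/".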
A pattern can contain special characters called metacharacters. Once you have mastered them, you will feel the real power of regular expressions. The metacharacters available in BREGEXP.DLL are almost the same as in Perl5, so regular expressions written for Perl5 can generally be used in BREGEXP.DLL as well.
The following metacharacters are recognized in BREGEXP.DLL:
\       Quote the next metacharacter
^       Match the beginning of the line
.       Match any character (except newline)
$       Match the end of the line (or before newline at the end)
|       Alternation
()      Grouping
[]      Character class
\w      Match a "word" character (alphanumeric plus "_")
\W      Match a non-word character
\s      Match a whitespace character
\S      Match a non-whitespace character
\d      Match a digit character
\D      Match a non-digit character
\b      Match a word boundary
\B      Match a non-(word boundary)
\A      Match only at beginning of string
\Z      Match only at end of string, or before newline at the end
\t      tab (HT, TAB)
\n      newline (LF, NL)
\r      return (CR)
\f      form feed (FF)
\a      alarm (bell) (BEL)
\e      escape (think troff) (ESC)
\033    octal char (think of a PDP-11)
\x1B    hex char
\c[     control char
The following standard quantifiers are recognized:
*       Match 0 or more times
+       Match 1 or more times
?       Match 1 or 0 times
{n}     Match exactly n times
{n,}    Match at least n times
{n,m}   Match at least n but not more than m times
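The metacharacters and quantifiers above can be tried without the DLL. The sketch below uses `std::regex` from the C++ standard library purely as a stand-in, not BREGEXP.DLL itself. Its default ECMAScript syntax differs from Perl5 in places (for example, it has no \A or \Z), but \d, {n,m}, |, () and \1 behave the same way for these patterns. The function name `firstMatchGroups` is hypothetical.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Returns the whole match followed by the text of each () group for the
// first match of `pattern` in `text`, or an empty vector if nothing matches.
std::vector<std::string> firstMatchGroups(const std::string& pattern,
                                          const std::string& text) {
    std::vector<std::string> groups;
    std::regex re(pattern);   // ECMAScript syntax by default, not Perl5
    std::smatch m;
    if (std::regex_search(text, m, re)) {
        for (std::size_t i = 0; i < m.size(); ++i)
            groups.push_back(m[i].str());
    }
    return groups;
}
```

For the pattern `(03|045)-(\d{3,4})-(\d{4})` and a text containing "045-222-1111", the result holds the whole match followed by "045", "222" and "1111".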
The following qualifiers are recognized:
k       Treat string as Japanese (SJIS). (Perl does NOT have this qualifier.)
m       Treat string as multiple lines. (Metacharacter $ will be affected.)
g       Match globally, i.e., find all occurrences.
c       replace: Complement the SEARCHLIST.
d       replace: Delete found but unreplaced characters.
s       replace: Squash duplicate replaced characters.
The following features are not supported by BREGEXP.DLL:
Variables in a pattern will not be expanded (in Visual Basic, etc.).
Metacharacter:
\G      Match only where previous m//g left off
Qualifiers (parameters on the right side of a pattern):
o       Compile only once.
x       Use extended regular expressions.
e       Evaluate the right side as an expression.
#include "bregexp.h" // Regular Expressions DLL
=================== BEGIN of bregexp.h ======================================
#ifdef _BREGEXP_
#define BREGEXPAPI __declspec(dllexport)
#else
#define BREGEXPAPI __declspec(dllimport)
#endif
typedef struct bregexp {
const char *outp; // BSubst: Pointer to Replace data
const char *outendp; // BSubst : Pointer to last of replace data + 1
const int splitctr; // BSplit: Number of array items.
const char **splitp; // BSplit: Pointer to the data
int rsv1; // Reserved (free to use)
char *parap; // Pointer to pattern data
char *paraendp; // Pointer to pattern data + 1
char *transtblp; // BTrans : Pointer to Translation table
char **startp; // Pointer to first of matched data.
char **endp; // Pointer to last of matched data.
int nparens; // Number of ()s in pattern. It is useful to examine $1, $2 and $n.
} BREGEXP;
#if defined(__cplusplus)
extern "C"
{
#endif
BREGEXPAPI
int BMatch(char* str,char *target,char *targetendp,
BREGEXP **rxp,char *msg) ;
BREGEXPAPI
int BSubst(char* str,char *target,char *targetendp,
BREGEXP **rxp,char *msg) ;
BREGEXPAPI
int BTrans(char* str,char *target,char *targetendp,
BREGEXP **rxp,char *msg) ;
BREGEXPAPI
int BSplit(char* str,char *target,char *targetendp,
int limit,BREGEXP **rxp,char *msg);
BREGEXPAPI
void BRegfree(BREGEXP* rx);
BREGEXPAPI
char* BRegexpVersion(void);
#if defined(__cplusplus)
}
#endif
#undef BREGEXPAPI
=================== END of bregexp.h ======================================
char msg[256];
BREGEXP *rxp = NULL;
int matched = BMatch("m/abc/",szTarget, szTarget+strlen(szTarget),&rxp,msg);
BREGEXP.DLL takes a struct BREGEXP as a parameter. Struct BREGEXP is also called a 'compile block'. It cannot be used from Visual Basic.
Struct BREGEXP has the following features:
If you use struct BREGEXP effectively, your program runs faster. Use the same compile block for functions that use the same regular expression.
The following BregPool class is a sample that speeds things up by pooling struct BREGEXPs.
class BregPool {
public:
    BregPool(int max) {
        m_nmax = max;
        m_rxpool = new BREGEXP*[m_nmax];
        ZeroMemory(m_rxpool,sizeof(BREGEXP*)*m_nmax);
    };
    ~BregPool() { Free(); };
    void Free() {
        if (m_rxpool == 0) return;
        for (int i = 0;i < m_nmax;i++) {
            if (m_rxpool[i]) BRegfree(m_rxpool[i]);
        }
        delete [] m_rxpool;
        m_rxpool = NULL;
    };
    BREGEXP* Get(char *regstr) {
        BREGEXP *r;
        int i;                  // declared here because i is used after the loop
        for (i = 0;i < m_nmax;i++) {
            r = m_rxpool[i];
            if (r == 0) break;
            if (r->parap == 0) break;
            // Check for the same regular expression
            if (memcmp(regstr,r->parap,(r->paraendp - r->parap) + 1) == 0)
                return r;       // found it
        }
        if (i > m_nmax - 1) i = m_nmax - 1;
        if (m_rxpool[i]) return m_rxpool[i];
        char msg[80];
        char p[] = " ";
        // Make compile block
        BMatch(regstr,p,p+1,&m_rxpool[i],msg);
        return m_rxpool[i];
    }
private:
    int m_nmax;
    BREGEXP **m_rxpool;
};

How to Use BregPool Class:

static BregPool bpool(8);       // Number of pools
char patern1[] = "tr/A-Z0-9/a-zx/g";
BREGEXP *rxp = bpool.Get(patern1);
int pos = 0;                    // Search position
while (BMatch(patern1,t1+pos,t1+lstrlen(t1),&rxp,msg)) {
    ...
}
// BregPool's destructor calls BRegfree,
// so you don't have to call BRegfree explicitly.
char msg[80];           // Message buffer
BREGEXP *rxp = NULL;    // Must be initialized to NULL!

// Sample of search
char t1[] = " Yokohama 045-222-1111 Osaka 06-5555-6666 Tokyo 03-1111-9999 ";
char patern1[] = "/(03|045)-(\\d{3,4})-(\\d{4})/";
// /(03|045)-(\d{3,4})-(\d{4})/
// Searches for telephone numbers beginning with 03 or 045.
// () means to remember the enclosed part.
int pos = 0;            // Search position
while (BMatch(patern1,t1+pos,t1+lstrlen(t1),&rxp,msg)) {
    TRACE1("data=%s\n",t1+pos);             // String to be searched
    TRACE1("found=%s\n",rxp->startp[0]);    // Matched string
    TRACE1("length=%d\n",rxp->endp[0] - rxp->startp[0]);    // Number of matched characters
    for (int i = 1;i <= rxp->nparens;i++) { // Data specified by ()
        TRACE2("$%d = %s\n",i,rxp->startp[i]);
        TRACE2("$%d length = %d\n",i,rxp->endp[i]-rxp->startp[i]);
    }
    pos = rxp->endp[0] - t1;                // Position where the next search starts
}

pos = 0;
char t2[] = " abcdabce abcdabcd abcdabcf abcgabcg ";
char patern2[] = "/abc(.)abc\\1/";  // Example of searching with pattern memory
while(BMatch(patern2,t2+pos,t2+lstrlen(t2),&rxp,msg)) {
    TRACE1("data=%s\n",t2);                 // String to be searched
    TRACE1("found=%s\n",rxp->startp[0]);    // Matched string
    TRACE1("length=%d\n",rxp->endp[0] - rxp->startp[0]);    // Number of matched characters
    for (int i = 1;i <= rxp->nparens;i++) { // Data specified by ()
        TRACE2("$%d = %s\n",i,rxp->startp[i]);
        TRACE2("$%d length = %d\n",i,rxp->endp[i]-rxp->startp[i]);
    }
    pos = rxp->endp[0] - t2;                // Position where the next search starts
}
if (rxp)                // Free the compile block
    BRegfree(rxp);      // Don't forget this!
data= Yokohama 045-222-1111 Osaka 06-5555-6666 Tokyo 03-1111-9999
found=045-222-1111 Osaka 06-5555-6666 Tokyo 03-1111-9999
length=12
$1 = 045-222-1111 Osaka 06-5555-6666 Tokyo 03-1111-9999
$1 length = 3
$2 = 222-1111 Osaka 06-5555-6666 Tokyo 03-1111-9999
$2 length = 3
$3 = 1111 Osaka 06-5555-6666 Tokyo 03-1111-9999
$3 length = 4
data= Osaka 06-5555-6666 Tokyo 03-1111-9999
found=03-1111-9999
length=12
$1 = 03-1111-9999
$1 length = 2
$2 = 1111-9999
$2 length = 4
$3 = 9999
$3 length = 4
data= abcdabce abcdabcd abcdabcf abcgabcg
found=abcdabcd abcdabcf abcgabcg
length=8
$1 = dabcd abcdabcf abcgabcg
$1 length = 1
data= abcdabce abcdabcd abcdabcf abcgabcg
found=abcgabcg
length=8
$1 = gabcg
$1 length = 1
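In the output above, each "found" and "$n" line runs on to the end of the searched string, because startp[i] points into the original buffer and the match is not NUL-terminated. Judging from the length calculations in the sample, endp[i] points one past the last matched character. Under that assumption, a group can be copied out as sketched below. `MatchBlock` is a stand-in that mirrors only three fields of BREGEXP so that the sketch compiles without bregexp.h, and `groupText` is a hypothetical helper, not a BREGEXP.DLL function.

```cpp
#include <cassert>
#include <string>

// Stand-in with only the fields this sketch reads. In real code,
// include bregexp.h and use BREGEXP instead.
struct MatchBlock {
    char **startp;   // startp[i]: first character of group i (0 = whole match)
    char **endp;     // endp[i]: assumed to be one past the last character of group i
    int nparens;     // number of () groups in the pattern
};

// Hypothetical helper: copy group `index` into its own string. The range
// constructor copies exactly endp - startp bytes, so embedded NUL
// characters in binary data are preserved.
std::string groupText(const MatchBlock& rx, int index) {
    assert(index >= 0 && index <= rx.nparens);
    return std::string(rx.startp[index], rx.endp[index]);
}
```

Copying the range this way avoids temporarily writing a terminator into the searched buffer.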
Replaces the local part of each telephone number that has a two-digit area code with 'xxxx-xxxx'.
char msg[80];           // Message buffer
BREGEXP *rxp = NULL;    // Must be initialized to NULL!

// Sample of string substitution
char t1[] = " Yokohama 045-222-1111 Osaka 06-5555-6666 Tokyo 03-1111-9999 ";
char patern1[] = "s/(\\d\\d)-\\d{4}-\\d{4}/$1-xxxx-xxxx/g";
int ctr;
if (ctr = BSubst(patern1,t1,t1+lstrlen(t1),&rxp,msg)) {
    TRACE2("after(%d)=%s\n",ctr,rxp->outp);             // Number of substitutions and the resulting string
    TRACE1("length=%d\n",rxp->outendp - rxp->outp);     // Number of characters in the result
}
if (rxp)                // Free the compile block
    BRegfree(rxp);      // Don't forget this.
after(2)= Yokohama 045-222-1111 Osaka 06-xxxx-xxxx Tokyo 03-xxxx-xxxx
length=63
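The same substitution can be cross-checked without the DLL. The sketch below uses `std::regex_replace` from the C++ standard library as a stand-in for BSubst. It replaces every match, which corresponds to the 'g' qualifier, and its default format syntax also understands $1. The function name `maskTwoDigitAreaNumbers` is hypothetical.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Stand-in for s/(\d\d)-\d{4}-\d{4}/$1-xxxx-xxxx/g using std::regex,
// not BREGEXP.DLL. Numbers such as 045-222-1111 are left alone because
// no two digits in them are followed by "-dddd-dddd".
std::string maskTwoDigitAreaNumbers(const std::string& text) {
    static const std::regex re("(\\d\\d)-\\d{4}-\\d{4}");
    return std::regex_replace(text, re, "$1-xxxx-xxxx");
}
```

Applied to the sample string, this leaves the Yokohama number unchanged and masks the Osaka and Tokyo numbers, in line with the two substitutions reported above.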
Translates upper-case letters to lower case and digits to 'x'.
char msg[80];           // Message buffer
BREGEXP *rxp = NULL;    // Must be initialized to NULL.

// Sample of translation
char t1[] = " Yokohama 045-222-1111 Osaka 06-5555-6666 Tokyo 03-1111-9999 ";
char patern1[] = "tr/A-Z0-9/a-zx/g";
int ctr;
if (ctr = BTrans(patern1,t1,t1+lstrlen(t1),&rxp,msg)) {
    TRACE2("after(%d)=%s\n",ctr,rxp->outp);             // Number of translated characters and the resulting string
    TRACE1("length=%d\n",rxp->outendp - rxp->outp);     // Number of characters in the result of the translation
}
if (rxp)                // Free the compile block.
    BRegfree(rxp);      // Don't forget this!
after(33)= yokohama xxx-xxx-xxxx osaka xx-xxxx-xxxx tokyo xx-xxxx-xxxx
length=63
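To see where the count of 33 comes from, the following sketch emulates the basic Perl tr rule in plain C++: ranges are expanded, and a replacement list shorter than the search list is padded with its last character, which is why every digit becomes 'x'. It is an illustration only, not the BTrans implementation. It ignores the c, d and s qualifiers and assumes a non-empty replacement list. The names `expandRanges` and `translate` are hypothetical.

```cpp
#include <cassert>
#include <string>

// Expand ranges such as "A-Z0-9" into the full list of characters.
std::string expandRanges(const std::string& list) {
    std::string out;
    for (std::size_t i = 0; i < list.size(); ++i) {
        if (i + 2 < list.size() && list[i + 1] == '-') {
            int last = static_cast<unsigned char>(list[i + 2]);
            for (int c = static_cast<unsigned char>(list[i]); c <= last; ++c)
                out += static_cast<char>(c);
            i += 2;  // skip the '-' and the end of the range
        } else {
            out += list[i];
        }
    }
    return out;
}

// Translate `text` in place and return the number of characters replaced.
int translate(std::string& text, const std::string& from, const std::string& to) {
    const std::string search = expandRanges(from);
    std::string repl = expandRanges(to);
    assert(!repl.empty());  // simplification: empty replacement lists are not handled
    // A short replacement list is padded with its last character.
    if (repl.size() < search.size())
        repl.append(search.size() - repl.size(), repl.back());
    int count = 0;
    for (char& ch : text) {
        std::size_t pos = search.find(ch);
        if (pos != std::string::npos) {
            ch = repl[pos];
            ++count;
        }
    }
    return count;
}
```

In the sample string, 3 upper-case letters and 30 digits are translated, which gives the 33 shown above.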
Splits the string, using the telephone numbers as separators.
static BregPool bpool(8);
char msg[80];
char t1[] = " Yokohama 045-222-1111 Osaka 06-5555-6666 Tokyo 03-1111-9999 ";
char patern1[] = "/ *\\d{2,3}-\\d{3,4}-\\d{4} */";
BREGEXP *rxp = bpool.Get(patern1);
int splitcnt = BSplit(patern1,t1,t1+lstrlen(t1),0,&rxp,msg);
if (splitcnt > 0 ) {
    int i = 0;
    for (int j = 0;j < splitcnt;j++) {
        int len = rxp->splitp[i+1] - rxp->splitp[i];
        char *tp = (char*)rxp->splitp[i];
        char ch = tp[len];  // save the delimiter
        tp[len] = 0;        // set a terminator
        TRACE3("len=%d [%d]=%s\n",len,j,tp);
        tp[len] = ch;       // restore the character
        i += 2;
    }
}
len=9 [0]= Yokohama
len=5 [1]=Osaka
len=5 [2]=Tokyo
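The sample above reads splitp as pairs of pointers (start of item j at splitp[2*j], end at splitp[2*j+1]) and temporarily writes a terminator into the buffer to print each item. A minimal alternative sketch copies each item into its own string instead, leaving the buffer untouched. The helper name `splitItems` is hypothetical, and the pair layout is an assumption taken from the sample code rather than from separate documentation.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: copy `count` split items out of a splitp-style
// array, where item j spans [splitp[2*j], splitp[2*j+1]).
std::vector<std::string> splitItems(const char *const *splitp, int count) {
    assert(count >= 0);
    std::vector<std::string> items;
    for (int j = 0; j < count; ++j)
        items.emplace_back(splitp[2 * j], splitp[2 * j + 1]);
    return items;
}
```

In real code this would be called as `splitItems(rxp->splitp, splitcnt)` after a successful BSplit.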
osamu@big.or.jp has produced a BREGEXP.DLL library for Delphi. Please give it a try. You can download it here. Category [Miscellaneous] - [Perl compatible regular expressions unit].
I have tested these functions ONLY with the Visual C++ compiler.
Copyright 1999 Tatsuo Baba,All rights reserved.