Saturday, March 6, 2010

Wildcard character pattern matching

The problem is how to do string matching on a pattern including '*' (0-any characters) and '?' (0-1 character) wildcard characters.

Obviously when come into '*', you can ignore it, or eat the current input char, then recursively repeat this process. When come into '?', you can ignore it or eat the current input char. Therefore comes this backtracking recursive solution:

int match(char * s, char * p) {
printf("string:%s \t pattern:%s\n", s, p);

if (*s == 0) return *p == 0;
if (*p == 0) return 1; // or "return *s == 0;" if you don't want extra char in s.
if ('?' == *p) return match(s+1, p+1) || match(s, p+1);
if ('*' == *p) return match(s+1, p) || match(s, p+1);
if (*s == *p) return match(s+1, p+1);
return 0;
}

int DoMatch(char * s, char * p) {
if (s == NULL) return NULL == p;
if (p == NULL) return 1; // or NULL == s if don't want extra char in s.
return match(s, p);
}

Call DoMatch(string, pattern) to start the process. Note in the implementation you must use a+1 instead of ++a since the latter would change the value of a and causes problem for the next invocation of match call. Use of a++ is even worse since a won't be incremented at all until the match call finishes. These are simple issues but easy to ignore until you stumble into problem.

Since backtracking is used, this solution can be exponential in nature. This solution can be used when the input size is small.

This problem obviously is a simplification of regular expression string matching, such as used in lex, in Perl, Python, Java, C# etc. In a general regular expression we start with several basic building blocks (single symbol, concatenation of symbols, kleene closure (*, extensions are + and ?), alternation (|)) to build up the NFA for a regular expression. The next key step is to convert a NFA (Non-deterministic Finite Automata) to a DFA (Deterministic Finite Automata). Using DFA, the complexity is O(n) or linear, and therefore is the ideal solution. Check any computation theory textbook for the algorithm.

No comments:

Blog Archive

Followers