计算理论:正则表达式
Regular expression
Definition of RE
Regular expressions describe language (regular languages)
If
RE 的定义是递归的。RE 使用三种语言上的基本操作:Union, Concatenation, Kleene Star
- Union:
- Concatenation:
- Kleene Star:
根据以上操作,即可给出 RE 的递归定义,RE 是 expression,由操作符和操作符连接起来的表达式构成,在给出 RE 的同时也将给出 RE 定义的语言
Basis.
- Basis 1: If
is any symbol, then is a RE, - Basis 2:
is a RE, - Basis 3:
is a RE,
Induction.
- Induction 1: If
are RE, is a RE, - Induction 2: If
are RE, is a RE, - Induction 3: If
is a RE, then is a RE, - Induction 4: if
is a RE, then is a RE,
操作优先级由高到低是
Algebraic Laws for RE:
- Union is commutative and associative, concatenation is associative
- Concatenation distributes over union
is the identity for +, is the identity for concatenation, is the annihilator for concatenation, - Union is idempotent:
Laws Involving Closures:
如何判断一个 RE 的代数定律是否为真:代入一个具体的 RE (见书 3.4.7 节)
Equivalence of RE and FA
RE to FA
For every RE, there is a FA that accepts the same language
Pick
Basis.
Induction. 假设正则表达式
FA to RE
For every FA, there is a RE defining its language
Pick DFA
使用归纳。假设 DFA 的状态为
定义
则该 DFA 对应的 RE 是所有从开始状态到接受状态的
Proof.
其正确性的证明基于对
Basis.
- 若没有边,则为
- 若
,则需要加上
Induction. 从
- 经过状态
: - 不经过状态
:
故
Q.E.D.
The RE with the same language as the DFA is the union of
is the number of states is the start state is one of the final states
Properties of Regular Languages
Properties
A language class is a set of languages
Language classes have two important kinds of properties
- Decision properties (判定性)
- Closure properties (封闭性)
Closure properties: given languages in the class, an operation produces another language in the same class
Decision properties: an algorithm that takes a formal description of a language and tells whether or not some property holds
- Membership problem: is string
in regular language - Emptiness problem: does the language contain any string at all
Pumping Lemma
Let
- For all
, the string is also in
pumping lemma 可以用于证明一个 language 不是 RL
Decision Properties of Regular Language
The membership problem: Given a string
Algorithm: Simulate the DFA, if the DFA ends in an accepting state, the answer is "yes", otherwise the answer is "no"
The emptiness problem: Given a regular language, does the language contain any string at all
Algorithm: Compute the set of states reachable from the start state. If at least one final state is reachable, then yes, else no.
reachable 的定义是递归给出的
Basis. The start state is surely reachable from the start state.
Induction. If state
如果语言由 RE 给出,则可根据以下递归规则判断是否为空
Basis.
Induction. Suppose
. Then is empty is empty and is empty . Then is empty is empty or is empty . Then is not empty, it always includes at least . Then is empty is empty
The infiniteness problem: Is a given language infinite?
Key Idea: if the DFA has
Suppose that
Proof.
设从
Q.E.D.
However, there are an infinite number of strings of length
Key Idea: if there is a string of length
Suppose that
Proof.
设 DFA 接受其的状态顺序为
Q.E.D.
根据以上两个 key idea,检查所有长度在
然而在实际中这样的算法代价也是不可接受的。所以一般是通过检查 DFA 对应的图中是否有环来判断语言是否 infinite。具体而言分为以下几步
- Eliminate states not reachable from the start state
- Eliminate states that do not reach a final state
- Test if the remaining transition graph has any cycles
The equivalence problem: Given regular languages
Algorithm: Constructing the product DFA from DFA for
设
对于 product DFA 而言
- start state:
- transition:
- final states: states
that exactly one of and is a final state of its own DFA
由于 product DFA 的 accept state 是两个状态中有且仅有一个是原本的 accept state,则如果一个 string
The containment problem: Given regular languages
Algorithm: use the product DFA
final state: states
The Minimum-State DFA for a Regular Language
Equivalence of states
Given a DFA
Equivalence of states: states
For all input strings
If two states are not equivalent, then we say they are distinguishable. That is, state
Algorithm: table-filling algorithm
- Construct a table with all pairs of states
- If you find a string that distinguishes two states (takes exactly one to an accepting state), mark that pair
- Algorithm is a recursion on the length of the shortest distinguishing string
- Basis. If
is an accepting state and is non-accepting, then the pair is distinguishable - Induction. Let
and be states such that for some input symbol , and are a pair of states known to be distinguishable. Then is a pair of distinguishable states.
- Basis. If
在算法结束后,所有未被标记的 pair 是等价的状态
If two states are not distinguished by the table-filling algorithm, then the states are equivalent.
Proof.
假设命题错误,故存在至少一对状态
is distinguishable. - the algorithm does not find
and to be distinguishable
令这样的状态对为 bad pair。则对于所有 bad pairs,都存在一个 string 来区分两个状态。令其中 string 最短的 pair 为
显然
Q.E.D.
对于 Equivalence problem,也可以通过这个算法解决。若判断两个 regular languages
Minimize DFA
要最小化一个 DFA ,首先要合并所有不可区分的状态
状态的合并可按照如下步骤
- 假设
是不可区分的 - 将其用一个代表状态
代替 - 显然
也是不可区分的(否则其中必有至少一对被算法标记为 distinguishable) - 令
为 的代表元素
合并之后,还要去掉所有从开始状态不可达的状态
事实上,状态的等价是一种等价关系,所以所有等价的状态将状态集划分为多个不相交的等价类
The equivalence of states is transitive
Proof.
假设
w.l.o.g.
Q.E.D.
故最小化 DFA 即为
- 消去所有从开始不可达的状态
- 将状态根据等价关系(由 table-filling algorithm 得到)划分为等价类
- 将含有不止一个状态的等价类合并为一个代表状态
- 合并状态转移
包含开始状态的等价类为新 DFA 的开始状态,包含接受状态的等价类为新 DFA 的接收状态
Minimized DFA can't be beaten
If
is a DFA, and the DFA constructed from by the algorithm of minimizing DFA, then has as few states as any DFA equivalent to
事实上,对于一个 DFA
Proof.
假设
的开始状态是等价的,因为 - 若状态
等价,则 等价
对于每个
Proof.
Basis. 如果
假设从
I.H. 若从
Induction. 假设从
Q.E.D.
由于根据前提,
Q.E.D.
Closure Properties of Regular Language
Closure Under Union: If
Proof.
如果
Q.E.D.
Concatenation 与 Kleene Closure 的证明同理,根据 RE 的定义即可证明其封闭性:
Closure Under Complementation: If
Proof.
Let
即构造一个 DFA
于是得到
即
Q.E.D.
Closure Under Intersection: If
Proof.
可以通过 De Morgan‘s Law 以及 Complementation 和 Union 的封闭性证明
也可通过直接构造 DFA 证明。设
这样对于任意字符串
则有
Q.E.D.
已知
Proof.
若
Q.E.D.
Closure Under Difference: If
Proof.
Q.E.D.
Closure Under Reversal: 对于一个字符串
Proof.
Automaton-based proof
对于
- 将
的转换图中的边反转 - 令
的开始状态为唯一的接收状态 - 添加一个开始状态
,以及从 到其他接收状态的 -transition
则
RE-based proof
假设
Basis.
Induction.
, then , then , then
Q.E.D.
Closure Under Homomorphisms: A string homomorphism is a function on strings that works by substituting a particular string for each symbol
If
For language
If
Proof.
令
Basis.
如果
如果
Induction.
Union: 根据定义,有
以及
又因为
根据 I. H.
Concatenation 与 Kleene star 的证明类似
可以证明将
Q.E.D.
Closure Under Inverse Homomorphisms: Suppose
If
Proof.
令
其中转换函数定义为
从而有
Q.E.D.