What are the properties of sigma notation

[Back] [Back (bottom of page)] [Bottom of page] [Supplementary chapter] [Please report script error]

13.4. Formal languages ​​and rule grammars

13.4.1. Language as a crowd

The alphabet (sigma), characters and strings

De fi nition 13.4.1. An alphabet is a finite set of characters (atomic symbols). It is notated with Σ (sigma).

De fi nition 13.4.2. A character string (word, string) of n characters from Σ is a finite sequence of length n over Σ.

The empty string (empty word) is the sequence of 0 characters. It is notated with ϵ (epsilon) and has the length 0.

Note on the notation

A character string is typically noted by juxtaposing the characters from left to right.

Let Σ = {a, b}, then ϵ, a, bb or ababbba are words over Σ. A more explicit notation for bb is 〈b, b〉 or {〈1, b〉, 〈2, b〉}.

Star of Sigma and Formal Languages

De fi nition 13.4.3. The star of Sigma is the set of all words over an alphabet Σ. The star is used as the post fi x operator Σ (pronounced “Sigma Stern”).

De fi nition 13.4.4. A formal language L over Σ is a subset of the star of Sigma.

Example 13.4.5. If Σ = {a}, then Σ= {ϵ, a, aa, aaa, ...}. The sets L1= {ϵ, a} or L2= {aa, aaaa, aaaaaa} are formal languages ​​because they are (real) subsets of Σ are.

Empty languages ​​vs. empty string

Hints

  • The empty language is the empty set, noted as {} or ∅.
  • The language, which only includes the empty string, is noted as {ϵ}.
  • The empty language {} and the language {ϵ} are not the same.

ask

  • Is {} a language above everyone Σ?
  • Is the language {ϵ} a subset of any non-empty language?
  • Is Σ a language over Σ?

13.4.2. Concatenation

Concatenation of strings

De fi nition 13.4.6. The concatenation of character strings is a two-digit function which concatenates its arguments into a word. For all u, v ∈ Σ:

What does uv mean?

If u: 1..n → Σ and v: 1..m → Σ are words, i.e. finite sequences of signs, then uv: 1 .. (m + n) → Σ. Whereby for all character positions i ∈ 1 .. (n + m) applies:

Power notation of concatenation

Properties of concatenation

The concatenation is associative and has ϵ as a neutral element. For all u, v, w ∈ Σ:

De fi nition 13.4.7. The n-fold concatenation of a character string w with itself in the power notation is de fi ned recursively. For n ≥ 1, n ∈ ℕ:

Example 13.4.8 (power notation of the concatenation).
The string aaabbcccc can be used as a3b2c4 be noted.

bu ff alo6 

Example 13.4.9 (A syntactically correct English sentence).
“Bu ff alo bu ff alo Bu ff alo bu ff alo bu ff alo bu ff alo Bu ff alo bu ff alo.”

analysis

  • Bu ff alo as an adjective of origin
  • bu ff alo as a noun (Bü ff el)
  • bu ff alo as a verb (intimidate)
  • Sense: "Bison from Bu ff alo, New York who are intimidated by other bison in their community also happen to intimidate other bison in their community."


13.4.3. Grammars

Chomsky hierarchy [HOPCROFT et al. 2002]



Figure 13.2: Subset relationships of the Chomsky language classes




Language class Type example






regular 3 {an}
context free 2 {anbn}
context sensitive 1 {anbncn}
generally 0



with n ≥ 1

Real subsets

For all type i languages ​​with 0 ≤ i ≤ 2 the following applies: Li + 1⊂ Li .

Where are natural languages ​​located? [HESS2005, 138ff.]

At least type 2: NPnVPn(central embedding)

      -----------------------------------------------  
     |          ----------------------------         |  
     |         |            --------        |        |  
     |         |           |        |       |        |  
The man whose wife whose child is angry is sad is surprised

At least type 1 according to [SHIEBER1985, KALLMEYER2005]: NPiNPjViVj(cross serial construction)

                              ------------------  
                             |                  |  
                        ------------------      |  
                       |     |            |     |  
                --------------------      |     |  
               |       |     |      |     |     |  
mer wänd d’Chind on Hans s’Huus laa half aaschtriiche

Complexity, grammaticality, acceptance of language

Christian Morgenstern, preface to gallows songs

Rule grammars

  • A rule grammar is a powerful finite descriptive tool for specifying formal languages ​​with a potentially infinite number of strings.
  • A grammar G = 〈Φ, Σ, R, S〉 consists of:
    1. Alphabet Φ: finite set of nonterminal symbols
    2. Alphabet Σ: finite set of terminal symbols with Φ ∩ Σ = ∅
    3. Set R ⊆ Γ× Γ of rules 〈α, β〉 (with total alphabet Γ = Φ ∪ Σ), where: α ≠ ϵ and α ⁄∈ Σ
    4. Start symbol S ∈ Φ
  • This definition of a grammar is the most general (type 0).
  • A grammar rule is an ordered pair: 〈α, β〉. Notation: α → β.

Context Free Grammars

  • A context-free grammar G = 〈Φ, Σ, R, S〉 consists of:
    1. Non-terminal symbols Φ
    2. Terminal symbols Σ
    3. Rule set R ⊆Φ × Γ (Γ = Φ ∪ Σ)
    4. Start symbol S ∈ Φ

Example 13.4.10 (context-free grammar).

  • G1= 〈{S, NP, V P, EN, V, D, N}, {Egon, Pudel, den, ass}, R, S〉
  • Standard set

Example evaluation
See Fig. 13.3 on page 530.


Figure 13.3: Example of left derivation and parse tree construction

Formal derivation of sentences

De fi nition 13.4.11 (direct derivative relation). The direct derivative relation⇒ ⊆ Γ× Γ of a grammar is the set of all pairs 〈u, v〉 with u, v, w, z ∈ Γfor which applies:

  • there is a rule of the form w → z
  • the strings u and v can be divided into substrings in such a way that: u = u1wu2 as well as v = u1to2

De fi nition 13.4.12 (derivation). A derivative is an n-tuple 〈w1, ..., wn〉 Of strings wi∈ Γ with (1 ≤ i ≤ n) such that:

  • wi − 1⇒ wi for all i ∈ {2..n}

Normal notation for derivatives

w1⇒… ⇒ wn

Example: Derivation with context-free grammar







Derivation u rule v




u1wu2w → z u1to2








S. ϵ S ϵS → NP VP ϵ NP VP ϵ
⇒ NP VP ϵ NP VP NP → EN ϵ EN VP
⇒ EN VP EN VP ϵ VP → V NP EN V NP ϵ
⇒ EN V NP EN V NP V → ate EN ate NP
⇒ EN ate NP EN ate NP ϵ NP → D N EN ate D N ϵ
⇒ EN aß D N EN ate D N D → den EN ate the N
⇒ EN ate the N EN ate the N ϵ N → poodle EN ate the poodle ϵ
⇒ EN ate the poodle ϵ EN ate the poodle EN → Egon ϵ Egon ate the poodle
⇒ Egon ate the poodle

Sentence forms, sentences and languages

De fi nition 13.4.13 (derivation relation) is the re fl exive-transitive envelope of ⇒.

De fi nition 13.4.14 (sentence). A sentencea of ​​a grammar G = 〈Φ, Σ, R, S〉 is a string of terminal symbols a ∈ Σso that:

De fi nition 13.4.15 (language of a grammar G) .The languageLG of a grammar G = 〈Φ, Σ, R, S〉 is the set of all its sentences a ∈ Σ.

Grammar rules, language classes and automata
The various grammar types differ in terms of the conditions that are placed on the rule set R. Let A, B ∈ Φ, w ∈ Σ and α, β, γ ∈ (Φ ∪ Σ).





Language class Machine type








Regular A → w Finite automaton
(Type 3) A → wB or A → Bw




Context-free A → α Basement machine
(Type 2)




Context- αAγ → αβγ with β ≠ ϵ or
sensitive S → ϵ (then S is not allowed Linear
(Type 1) on a right side more limited
a rule occur) Machine (LBA)




(Type 0) α → β (with α ≠ ϵ and α ⁄∈ Σ) Turing machine




The complexity of the computations for parsing increases with each type of grammar.

[Back] [Back (bottom of page)] [Top of page] [Supplementary chapter] [Please report script error]