LINGUISTIquips


Using PROLOG to process language (Part I).
January 21, 2008, 1:56 am
Filed under: Linguistic Computation

Humans acquire language naturally; indeed, it is (perhaps without question) impossible for a human not to acquire language. For computers, however, human language remains a difficult proposition, whether the task is processing it or synthesizing it.

One reason is that a generative grammar implemented on a computer must be completely explicit. If it is not, the computer will produce output that is unintelligible to humans. As it turns out, this required level of explicitness is of great interest to linguists, since teaching a computer to “talk” requires a deep understanding of language itself.

If we are to create such a grammar on a computer, we must set ourselves several goals:

• The grammar must be fully explicit to the point that only grammatical sentences are produced by the computer.

• The computer must be able to produce all possible grammatical sentences using the lexical and semantic base that we provide (i.e. the words we give it).

• The computer must be able to produce these sentences in polynomial time; that is, the work involved must grow no faster than some fixed power of the input size, rather than exponentially, if a computer-language processor is to be practical.

The programming language that I’ll be using is PROLOG, in the SWI-PROLOG implementation. It seems well suited to programming human language because, unlike most other computer languages such as C or Pascal, PROLOG is logic-based rather than procedural. That is to say, in most programming languages the computer must be told explicitly how to perform an action, and it then carries out a sequence of such actions exactly as it was told. A PROLOG program, on the other hand, states logical relationships between items, and the interpreter uses those relationships to answer queries posed by the user. (It is worth noting that PROLOG grew out of Artificial Intelligence research in the early 1970s.)

To illustrate why PROLOG is well suited to human language (insofar as a programming language can be), consider how PROLOG solves problems.

For example, let’s say that we place the following PROLOG statements into a PROLOG program:

Example 1.A

parent(george,bobby).

parent(gracie,bobby).

The above are referred to as facts in PROLOG lingo, and parent is the predicate they define. Basically, we have told the PROLOG interpreter two things:

• There is an entity named “george” who is a parent of “bobby.”

• There is an entity named “gracie” who is a parent of “bobby.”

It is important to note that PROLOG attaches no meaning of its own to the order in which items are listed in a predicate (i.e. george,bobby versus bobby,george); it only requires that we use the same order consistently, since parent(george,bobby) and parent(bobby,george) are different facts. It is therefore logical to set up a simple convention at this point, namely that the predicate name (in this case “parent”) describes the first item listed, and the second item is related to the first in some obvious way. For example, “george” is the parent, and “bobby” is the child of that parent. Another example might be president(lincoln,usa), where “lincoln” is the president and “usa” is his country of presidency.
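The convention can be checked with a small sketch. It reuses the two parent facts and the president example from the text; the queries in the comments are illustrative and are not part of the original program.

```prolog
% Facts written with the convention "name(described_item, related_item)".
parent(george, bobby).
parent(gracie, bobby).
president(lincoln, usa).

% Queries to try at the ?- prompt after consulting this file:
%
%   ?- parent(george, bobby).    % succeeds: matches the first fact
%   ?- parent(bobby, george).    % fails: no fact lists the items this way
%   ?- president(lincoln, usa).  % succeeds
%   ?- president(usa, lincoln).  % fails
```

The swapped queries fail because PROLOG compares each argument position by position. The meaning "first item is the parent" lives only in our convention.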

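As I stated before, PROLOG works by solving problems, and problems are posed using variables. Put simply, a variable in PROLOG is any name that starts with an UPPER case letter (or an underscore), such as X or Who; names that start with a lower case letter, such as george, are constants. After having the PROLOG interpreter (in this case, SWI-PROLOG) consult our small program (example 1.A), we can then have it solve a problem: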

Example 1.B

?- parent(george,X).

The above presents PROLOG with a goal: find a value of X for which parent(george,X) holds. Obviously, it should return the following result:

Example 1.C

X = bobby
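The variable does not have to sit in the second position. As a minimal sketch, the same two facts can be queried "backwards" to find the parents of bobby. The helper parents_of/2 is a hypothetical name introduced here for illustration; findall/3 is a standard built-in that collects every solution of a goal into a list.

```prolog
parent(george, bobby).
parent(gracie, bobby).

% parents_of(Child, Parents): hypothetical helper, not in the original program.
% Parents is the list of every P for which parent(P, Child) can be proved,
% in the order the facts appear in the file.
parents_of(Child, Parents) :-
    findall(P, parent(P, Child), Parents).
```

Asking ?- parent(X, bobby). at the prompt gives george first, and gracie if a further solution is requested with ; while ?- parents_of(bobby, Ps). gathers both into Ps = [george, gracie].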

This isn’t too exciting; after all, we told PROLOG explicitly that “george” was the parent of “bobby.” To better illustrate PROLOG’s logical abilities, consider a revised program:

Example 1.D

parent(george,bobby).

parent(gracie,bobby).

child(X) :- parent(Y,X).

Now we’ve added a little bit for PROLOG to work with. We tell it that something may be referred to as a “child” if that something is also listed as the second item in a “parent” fact. (Note that the special operator :- means “if” in PROLOG, so line three of example 1.D essentially reads: child(X) if parent(Y,X), for some Y.)
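One practical detail: when consulting the rule as written, SWI-PROLOG warns that Y is a singleton variable, because Y appears only once and its value is never used. The sketch below is one way to write the same rule without the warning, using the anonymous variable _. The companion rule is_parent/1 is a hypothetical addition, included only to show the mirror-image pattern.

```prolog
parent(george, bobby).
parent(gracie, bobby).

% Same meaning as child(X) :- parent(Y,X).
% The underscore says "some item, whose identity we do not care about".
child(X) :- parent(_, X).

% Hypothetical companion rule (not in the original program):
% X is a parent if X is the first item of some parent fact.
is_parent(X) :- parent(X, _).
```

With this program, child(bobby) and is_parent(george) succeed, while child(george) and is_parent(bobby) fail.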

Now, we might ask PROLOG if “bobby” is a child.

Example 1.E

?- child(bobby).

Yes

Notice that PROLOG responds to queries with no variables with a Yes or a No (recent versions of SWI-PROLOG print true or false instead). Next we’ll see if PROLOG truly understands the parent/child relationship by asking it if “george” is a child:

Example 1.F

?- child(george).

No

And there you have it: we never had statements in our program like child(bobby) explicitly telling PROLOG who was a child, but still, PROLOG worked out the relationship from the rule and the facts and was able to respond based on logic. (Strictly, the No for “george” means only that child(george) cannot be proved from what we have told it.) It is this logical basis that makes PROLOG well suited to the formation of generative grammars, a topic which I will begin to cover in Part II.
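A final sketch shows how the answer is derived. The rule is tried against every parent fact, so "bobby" is found once per parent. The helpers all_answers/1 and distinct_children/1 are hypothetical names introduced for illustration, and the rule uses _ in place of Y, which does not change its meaning.

```prolog
parent(george, bobby).
parent(gracie, bobby).

child(X) :- parent(_, X).

% Hypothetical helper: every answer to child(C), one per successful proof.
all_answers(Cs) :- findall(C, child(C), Cs).

% Hypothetical helper: the same answers, sorted with duplicates removed.
distinct_children(Cs) :- setof(C, child(C), Cs).
```

Here ?- all_answers(Cs). gives Cs = [bobby, bobby], one proof through each parent, while ?- distinct_children(Cs). gives Cs = [bobby].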
