What are the contributions mentioned in the paper "Analysis of sanskrit text : parsing and semantic relations" ?

In this paper, the authors present their work towards building a dependency parser for the Sanskrit language that uses deterministic finite automata (DFA) for morphological analysis and the 'utsarga apavaada' approach for relation analysis.

What are the future works mentioned in the paper "Analysis of sanskrit text : parsing and semantic relations" ?

Future work in this direction includes parsing of compound sentences and incorporating stochastic parsing. The authors are also trying to build a good enough lexicon so that they can work in the direction of y ? ? in Sanskrit sentences.

Why is the morphological analyzer used for the analysis of Sanskrit words?

While evaluating the Sanskrit words in the sentence, the authors follow these steps for computation: 1. First, a left-to-right parse is done to separate out the words in the sentence.

What is the way to get rid of the blocking?

If the algorithm is able to generate a parse taking the longest possible match, the authors do not go into stacked possibilities; but if the subject disagrees with the verb (blocking), or some other mismatch is found, they have to go for stacked possibilities.

(Open Access) Analysis of Sanskrit Text: Parsing and Semantic Relations (2009) | Pawan Goyal

Q: What are the 9 classes of pronouns in Sanskrit?

The authors have classified each of these pronouns into 9 classes: Personal, Demonstrative, Relative, Indefinite, Correlative, Reciprocal and Possessive.

HAL Id: inria-00203459

https://hal.inria.fr/inria-00203459

Submitted on 10 Jan 2008


Analysis of Sanskrit text : parsing and semantic relations

Pawan Goyal, Vipul Arora, Laxmidhar Behera

To cite this version:

Pawan Goyal, Vipul Arora, Laxmidhar Behera. Analysis of Sanskrit text : parsing and semantic relations. First International Sanskrit Computational Linguistics Symposium, INRIA Paris-Rocquencourt, Oct 2007, Rocquencourt, France. inria-00203459

ANALYSIS OF SANSKRIT TEXT: PARSING AND SEMANTIC RELATIONS

Pawan Goyal, Electrical Engineering, IIT Kanpur, 208016, UP, India, pawangee@iitk.ac.in

Vipul Arora, Electrical Engineering, IIT Kanpur, 208016, UP, India, vipular@iitk.ac.in

Laxmidhar Behera, Electrical Engineering, IIT Kanpur, 208016, UP, India, lbehera@iitk.ac.in

Abstract

In this paper, we present our work towards building a dependency parser for the Sanskrit language that uses deterministic finite automata (DFA) for morphological analysis and the 'utsarga apavaada' approach for relation analysis. A computational grammar based on the framework of Panini is being developed. A linguistic generalization for the Verbal and Nominal databases has been made, and declensions are given the form of DFA. The Verbal database for all the classes of verbs has been completed for this part. Given a Sanskrit text, the parser identifies the root words and gives the dependency relations based on semantic constraints. The proposed Sanskrit parser is able to create semantic nets for many classes of Sanskrit paragraphs ([Devanagari text lost]). The parser takes care of both external and internal sandhi in the Sanskrit words.

1 INTRODUCTION

Parsing is the "de-linearization" of linguistic input; that is, the use of grammatical rules and other knowledge sources to determine the functions of words in the input sentence. Getting an efficient and unambiguous parse of natural languages has been a subject of wide interest in the field of artificial intelligence over the past 50 years. Instead of providing a substantial amount of information manually, there has been a shift towards using Machine Learning algorithms in every possible NLP task. Among the most important elements in this toolkit are state machines, formal rule systems, logic, as well as probability theory and other machine learning tools. These models, in turn, lend themselves to a small number of algorithms from well-known computational paradigms. Among the most important of these are state space search algorithms (Bonet, 2001) and dynamic programming algorithms (Ferro, 1998). The need for unambiguous representation has led to a great effort in stochastic parsing (Ivanov, 2000).

Most of the research work has been done for English sentences, but to transmit ideas with great precision and mathematical rigor, we need a language that incorporates the features of artificial intelligence. Briggs (Briggs, 1985) demonstrated in his article the salient features of the Sanskrit language that can make it serve as an artificial language. Although computational processing of the Sanskrit language has been reported in the literature (Huet, 2005) with some computational toolkits (Huet, 2002), and there is work going on towards developing a mathematical model and dependency grammar of Sanskrit (Huet, 2006), the proposed Sanskrit parser is being developed for using the Sanskrit language as an Indian networking language (INL). The utility of advanced techniques such as stochastic parsing and machine learning in designing a Sanskrit parser needs to be verified.

We have used deterministic finite automata for morphological analysis. We have identified the basic linguistic framework which shall facilitate the effective emergence of Sanskrit as INL. To achieve this goal, a computational grammar has been developed for the processing of the Sanskrit language. Sanskrit has a rich system of inflectional endings (vibhakti). The computational grammar described here takes the concepts of vibhakti and karaka relations from the Paninian framework and uses them to get an efficient parse for Sanskrit text. The grammar is written in the 'utsarga apavaada' approach, i.e. rules are arranged in several layers, each layer forming the exceptions to the previous one. We are working towards encoding Paninian grammar to get a robust analysis of Sanskrit sentences. The Paninian framework has been successfully applied to Indian languages for dependency grammars (Sangal, 1993), where constraint based parsing is used and the mapping between karaka and vibhakti is via a TAM (tense, aspect, modality) table. We have made rules from Panini's grammar for the mapping. Also, finite state automata are used for the analysis instead of finite state transducers.
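The layered 'utsarga apavaada' arrangement described above can be pictured with a minimal sketch. The two rules below are hypothetical illustrations and are not taken from the authors' rule base; only the relation codes `o` (object) and `g` (destination of a verb of motion) come from Table 4, and the feature names `case` and `verb_of_motion` are assumptions made for the example.

```python
from typing import Callable, Dict, List, Optional

# A rule inspects a word's features and returns a relation code,
# or None when the rule does not apply.
Rule = Callable[[Dict], Optional[str]]

def general_rule(word: Dict) -> Optional[str]:
    # Hypothetical general (utsarga) rule: case 2 (accusative) marks the object.
    return "o" if word.get("case") == 2 else None

def exception_rule(word: Dict) -> Optional[str]:
    # Hypothetical exception (apavaada): with a verb of motion,
    # case 2 marks the destination instead.
    if word.get("case") == 2 and word.get("verb_of_motion"):
        return "g"
    return None

# Layers are ordered from most general to most specific;
# each layer holds the exceptions to the layer before it.
LAYERS: List[List[Rule]] = [[general_rule], [exception_rule]]

def resolve(word: Dict) -> Optional[str]:
    """Consult the most specific layer first, then fall back to general rules."""
    for layer in reversed(LAYERS):
        for rule in layer:
            label = rule(word)
            if label is not None:
                return label
    return None
```

Checking exception layers before the general layer is one simple way to let an exception block the general rule; the paper does not specify how its layers are evaluated.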

The problem is that the Paninian grammar is generative, and it is not straightforward to invert the grammar to get a Sanskrit analyzer, i.e. it is difficult to rely just on Panini's sutras to build the analyzer. There will be a lot of ambiguities (due to options given in Panini's sutras, as well as a single word having multiple analyses). We therefore need a hybrid scheme which should use some statistical methods for the analysis of a sentence. A probabilistic approach is currently not integrated within the parser since we do not have a Sanskrit corpus to work with, but we hope that in the very near future we will be able to apply the statistical methods.

The paper is arranged as follows. Section 2 explains in a nutshell the computational processing of any Sanskrit corpus. We have codified the Nominal and Verb forms in Sanskrit in a form directly computable by the computer. Our algorithm for processing these texts and preparing Sanskrit lexicon databases is presented in section 3. The complete parser is described in section 4, where we discuss how we do morphological analysis and hence relation analysis. Results are enumerated in section 5. Discussion, conclusions and future work follow in section 6.

2 A STANDARD METHOD FOR ANALYZING SANSKRIT TEXT

The basic framework for analyzing the Sanskrit corpus is discussed in this section. For every word in a given sentence, the computer is supposed to identify the word in the following structure: <Word><Base><Form><Relation>.

The structure contains the root word (<Base>), its form (<Form>, the attributes of the word) and its relation with the verb/action or subject of that sentence (<Relation>). This analysis is done so as to completely disambiguate the meaning of the word in its context.
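As a minimal sketch, the four-field structure can be held in a small record type. The class name `AnalyzedWord` and the helper `render` are illustrative assumptions, not names from the paper; the contents of each field follow the guidelines in the subsections of this section.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AnalyzedWord:
    word: str      # <Word>: the word as it appears in the sentence
    base: str      # <Base>: the original, uninflected form
    form: str      # <Form>: declension or verb attribute code
    relation: str  # <Relation>: position, relation code and linked word

def render(entry: AnalyzedWord) -> str:
    """Print one analysis in the <Word><Base><Form><Relation> layout."""
    return f"<{entry.word}><{entry.base}><{entry.form}><{entry.relation}>"
```

For instance, `render(AnalyzedWord("rāmaḥ", "rāma", "m1s", ".1s2"))` gives `<rāmaḥ><rāma><m1s><.1s2>`, where the exact spelling of the last two fields is an assumed reading of the coding guidelines.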

2.1 <Word>

Given a sentence, the parser identifies a singular word and processes it using the guidelines laid out in this section. If it is a compound word, then the compound word with [Devanagari text lost] has to be undone. For example: [Devanagari example lost].

2.2 <Base>

The base is the original, uninflected form of the word. Finite verb forms, other simple words and compound words are each indicated differently. For simple words: the computer activates the DFA on the ISCII code (ISCII, 1999) of the Sanskrit text. For compound words: the computer shows the nesting of internal and external [Devanagari text lost] using nested parentheses. Undo [Devanagari text lost] changes between the component words.

2.3 <Form>

The <Form> of a word contains the information regarding declensions for nominals and state for verbs.

• For undeclined words, just write u in this column.

• For nouns, write first m, f or n to indicate the gender, followed by a number for the case (1 through 7, or 8 for vocative), and s, d or p to indicate singular, dual or plural.

• For adjectives and pronouns, write first a, followed by the indications, as for nouns, of gender (skipping this for pronouns unmarked for gender), case and number.

• For verbs, in one column indicate the class ([Devanagari text lost]) and voice. Show the class by a number from 1 to 11. Follow this (in the same column) by '1' for parasmaipada, '2' for ātmanepada and '3' for ubhayapada. For finite verb forms, give the root. Then (in the same column) show the tense as given in Table 3. Then show the inflection in the same column, if there is one. For finite forms, show the person and number with the codes given in Table 2. For participles, show the case and number as for nouns.

Table 1: Codes for <Form>

pa    passive
ca    causative
de    desiderative
fr    frequentative

Table 2: Codes for Finite Forms, showing the Person and the Number

Person: [three Devanagari person labels lost]
s     singular
d     dual
p     plural

Table 3: Codes for Finite verb Forms, showing the Tense

pr    present
if    imperfect
iv    imperative
op    optative
ao    aorist
pe    perfect
fu    future
f2    second future
be    benedictive
co    conditional
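The nominal codes above can be decoded mechanically. The following is a sketch under the assumption that the parts of a code are written back to back with no separator (for example `m1s` or `a7p`); the paper does not state the exact spelling, and the function name `decode_nominal_form` is illustrative.

```python
import re

GENDERS = {"m": "masculine", "f": "feminine", "n": "neuter"}
NUMBERS = {"s": "singular", "d": "dual", "p": "plural"}

# Optional 'a' (adjective/pronoun), optional gender, case 1-8, number s/d/p.
NOMINAL_FORM = re.compile(r"^(a?)([mfn]?)([1-8])([sdp])$")

def decode_nominal_form(code: str) -> dict:
    """Decode a <Form> code for undeclined words, nouns, adjectives and pronouns."""
    if code == "u":
        return {"kind": "undeclined"}
    match = NOMINAL_FORM.match(code)
    if match is None:
        raise ValueError(f"not a nominal form code: {code!r}")
    marker, gender, case, number = match.groups()
    if not marker and not gender:
        # Only pronouns (marked with 'a') may skip the gender.
        raise ValueError(f"noun code lacks gender: {code!r}")
    return {
        "kind": "adjective/pronoun" if marker else "noun",
        "gender": GENDERS.get(gender),  # None when gender is unmarked
        "case": int(case),              # 1 through 7, or 8 for vocative
        "number": NUMBERS[number],
    }
```

Verb codes are left out of this sketch because the paper packs class, pada, root, tense and inflection into one column without giving the delimiters.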

2.4 <Relation>

The relation between the different words in a sentence is worked out using the information obtained from the analysis done using the guidelines laid out in the previous subsections. First write down a period in this column, followed by a number indicating the order of the word in the sentence. The words in each sentence should be numbered sequentially, even when a sentence ends before the end of a text or extends over more than one text. Then, in the same column, indicate the kind of connection the word has to the sentence, using the codes given in Table 4. Then, in the same column, give the number of the other word in the sentence to which this word is connected as modifier or otherwise. The relation set given in Table 4 is not exhaustive. All six karakas are defined in relation to the verb.
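A <Relation> entry built this way can be read back with a short parser. This sketch assumes the three parts are written back to back (for example `.1s2` for word 1, subject, linked to word 2); the separator-free spelling and the name `parse_relation` are assumptions, while the codes are those of Table 4.

```python
import re
from typing import Optional, Tuple

# Relation codes from Table 4.
RELATION_CODES = {
    "v": "main verb", "vs": "subordinate verb", "s": "subject", "o": "object",
    "g": "destination of a verb of motion", "a": "adjective",
    "n": "noun in apposition", "d": "predicate nominative",
    "m": "other modifier", "p": "preposition", "c": "conjunction",
    "u": "vocative", "q": "quoted sentence or phrase",
    "r": "definition of a word or phrase",
}

# Period, word position, one- or two-letter code, optional linked word number.
RELATION_FIELD = re.compile(r"^\.(\d+)([a-z]{1,2})(\d*)$")

def parse_relation(field: str) -> Tuple[int, str, Optional[int]]:
    """Return (position, relation code, linked word number or None)."""
    match = RELATION_FIELD.match(field)
    if match is None or match.group(2) not in RELATION_CODES:
        raise ValueError(f"not a relation field: {field!r}")
    position, code, target = match.groups()
    return int(position), code, int(target) if target else None
```

The linked word number is optional here so that entries such as a main verb or a vocative, which may not point at another word, still parse.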

3 ALGORITHM FOR SANSKRIT RULEBASE

Table 4: Codes for <Relation>

v     main verb
vs    subordinate verb
s     subject (of the sentence or a subordinate clause)
o     object (of a verb or preposition)
g     destination (gati) of a verb of motion
a     adjective
n     noun modifying another in apposition
d     predicate nominative
m     other modifier
p     preposition
c     conjunction
u     vocative, with no syntactic connection
q     quoted sentence or phrase
r     definition of a word or phrase (in a commentary)

In this section, we explain two of the procedures/algorithms that we have developed for the computational analysis of Sanskrit. Combined with these algorithms, we have arrived at the skeletal base upon which many different modules for Sanskrit linguistic analysis, such as relations and [Devanagari text lost], can be worked out.

3.1 Sanskrit Rule Database

Every natural language must have a representation which is directly computable. To achieve this, we have encoded the grammatical rules and designed the syntactic structure for both the nominal and verbal words in Sanskrit. Let us illustrate this structure for both the nouns and the verbs with an example each.

Noun:- Any noun has one of three genders: Masculine, Feminine and Neuter. A noun also has three numbers: Singular, Dual and Plural. Each number in turn has eight cases: Nominative, Accusative, Instrumental, Dative, Ablative, Genitive, Locative and Vocative. Interestingly, these express nearly all the relations between words in a sentence.

In the Sanskrit language, every noun is inflected following a general rule based on its final letter, giving classes such as [Devanagari text lost]. For example, [Devanagari text lost] is in the class [Devanagari text lost], which ends with (a). Such classifications are given in Table 5. Each of these has different inflections depending upon the gender it corresponds to. Thus [Devanagari text lost] has different masculine and neuter declensions, [Devanagari text lost] has masculine and feminine declensions, and [Devanagari text lost] has masculine, feminine and neuter declensions. We have then encoded each of the declensions into ISCII code, so that it can be easily processed by the computer using the algorithm that we have developed for the linguistic analysis of any word.
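One way to picture declension endings "in the form of DFA" is an automaton that reads a word's ISCII codes from the last code backwards and reports the longest ending it knows. This is a sketch under stated assumptions, not the authors' automaton: the class name `EndingDFA`, the right-to-left reading order and the placeholder stem codes 200 and 201 are all illustrative; only the ending code 163 comes from the paper.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Attrs = Tuple[int, int, int, int]  # (class, case, gender, number)

class EndingDFA:
    """Deterministic automaton over ISCII codes, read from the word's end."""

    def __init__(self) -> None:
        self.transitions: List[Dict[int, int]] = [{}]  # state 0 is the start
        self.accepting: Dict[int, List[Attrs]] = {}

    def add_ending(self, codes: Sequence[int], attrs: Attrs) -> None:
        state = 0
        for code in reversed(codes):
            nxt = self.transitions[state].get(code)
            if nxt is None:
                # Create a fresh state; each (state, code) pair has one target.
                nxt = len(self.transitions)
                self.transitions.append({})
                self.transitions[state][code] = nxt
            state = nxt
        self.accepting.setdefault(state, []).append(attrs)

    def longest_ending(
        self, word: Sequence[int]
    ) -> Optional[Tuple[int, List[Attrs]]]:
        """Return (ending length, analyses) for the longest known ending."""
        state: Optional[int] = 0
        best = None
        for consumed, code in enumerate(reversed(word), start=1):
            state = self.transitions[state].get(code)
            if state is None:
                break
            if state in self.accepting:
                best = (consumed, self.accepting[state])
        return best
```

After `dfa.add_ending([163], (1, 1, 1, 1))`, a word whose code sequence ends in 163 is reported with an ending of length 1 and that single analysis. Several analyses can share one accepting state, which reflects the ambiguity of a single ending having multiple readings.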

Table 5: Attributes of the declension for noun

Attribute    Codes      Labels
Class*       1 to 25    [Devanagari labels lost]
Case         1 to 8     [Devanagari labels lost]
Gender       1 to 3     [Devanagari labels lost]
Number       1 to 3     [Devanagari labels lost]

Let us illustrate this structure for the noun with an example. For [Devanagari text lost], masculine, nominative, singular declension, the ending is encoded in the following syntax: (163{1, 1, 1, 1}).

Here 163 is the ISCII code of the declension (Table 6). The four 1's in the curly brackets represent Class, Case, Gender and Number respectively (Table 5).

Table 6: Noun example ([Devanagari class name lost], Masculine, Singular)

Case          Ending                    ISCII Code
Nominative    [Devanagari text lost]    163
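An entry in this syntax can be read into a small record for lookup. The sketch below assumes the entry is written as an ISCII code followed by four comma-separated attribute numbers in curly brackets, matching the description of the example; the names `NounAttrs` and `parse_noun_entry` are illustrative, and the value ranges are the code ranges listed in Table 5.

```python
import re
from typing import NamedTuple, Tuple

class NounAttrs(NamedTuple):
    noun_class: int  # 1 to 25
    case: int        # 1 to 8
    gender: int      # 1 to 3
    number: int      # 1 to 3

# Upper bounds of the attribute codes, in the same order as NounAttrs.
UPPER_BOUNDS = (25, 8, 3, 3)

ENTRY = re.compile(r"^\(\s*(\d+)\s*\{\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*\}\s*\)$")

def parse_noun_entry(text: str) -> Tuple[int, NounAttrs]:
    """Parse '(code{class, case, gender, number})' into (ISCII code, attributes)."""
    match = ENTRY.match(text.strip())
    if match is None:
        raise ValueError(f"not a noun entry: {text!r}")
    code, *values = (int(group) for group in match.groups())
    for value, upper in zip(values, UPPER_BOUNDS):
        if not 1 <= value <= upper:
            raise ValueError(f"attribute out of range in {text!r}")
    return code, NounAttrs(*values)
```

Endings longer than one ISCII code would need a sequence of codes before the brackets; how the authors write those is not shown in the paper, so the sketch accepts a single code only.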

Pronouns:- According to Paninian grammar and Kale (Kale), Sanskrit has 35 pronouns, which are: [Devanagari list of the 35 pronouns lost].

We have classified each of these pronouns into 9 classes: Personal, Demonstrative, Relative, Indefinite, Correlative, Reciprocal and Possessive. Each of these pronouns has different inflectional forms arising from different declensions of the masculine and feminine form. We have codified the pronouns in a form similar to that of nouns.

Adjectives:- Adjectives are dealt with in the same manner as nouns. The repetition of the linguistic morphology is avoided.

Verbs:- A verb in a Sanskrit sentence expresses an action that is enhanced by a set of "auxiliaries", these auxiliaries being the nominals that have been discussed previously.

The meaning of the verb is said to be both vyapara (action, activity, cause) and phala (fruit, result, effect). Syntactically, its meaning is invariably linked with the meaning of the verb "to do". In our analysis of verbs, we have found that they are classified into 11 classes ([Devanagari text lost], Table 7). While coding the endings, each class is subdivided according to "[Devanagari text lost]" knowledge into [Devanagari text lost]; each of these is again sub-classified into 3 sub-classes (parasmaipada, ātmanepada and ubhayapada, as coded in Section 2.3), which we have denoted as pada. Each verb sub-class again has 10 lakaaras, which are used to express the tense of the action. Depending upon the form of the sentence, a further division into three forms ([Devanagari text lost]) has been made. This classification has been referred to as voice. This structure is explained in Table 7.

Table 7: Attributes of the declension for verb

Attribute    Codes      Labels
Class*       1 to 11    [Devanagari labels lost]
pada         1 to 3     [Devanagari labels lost]
Tense        1 to 10    [Devanagari labels lost]
Voice        1 to 3     [Devanagari labels lost]
Person       1 to 3     [Devanagari labels lost]
Number       1 to 3     [Devanagari labels lost]
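The six verb attributes of Table 7 can be kept in one record and checked against the code ranges given there. This is a sketch only; the names `VerbAttrs` and `validate_verb_attrs` are illustrative, and the paper does not show the stored layout of a verb entry.

```python
from typing import NamedTuple

class VerbAttrs(NamedTuple):
    verb_class: int  # 1 to 11
    pada: int        # 1 to 3
    tense: int       # 1 to 10 (the ten lakaaras)
    voice: int       # 1 to 3
    person: int      # 1 to 3
    number: int      # 1 to 3

# Largest valid code for each attribute, as listed in Table 7.
VERB_UPPER_BOUNDS = {
    "verb_class": 11, "pada": 3, "tense": 10,
    "voice": 3, "person": 3, "number": 3,
}

def validate_verb_attrs(attrs: VerbAttrs) -> VerbAttrs:
    """Raise ValueError when any attribute falls outside its Table 7 range."""
    for name, upper in VERB_UPPER_BOUNDS.items():
        value = getattr(attrs, name)
        if not 1 <= value <= upper:
            raise ValueError(f"{name}={value} is outside 1..{upper}")
    return attrs
```

Keeping the bounds in one dictionary makes it easy to reject a malformed rule-base entry before it reaches the morphological analyzer.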

Let us express the structure via an example for [Devanagari text lost], Present Tense, First person,


References

Introduction to Automata Theory, Languages, and Computation

Planning as heuristic search

Recognition of visual activities and interactions by stochastic parsing

Parsing Free Word Order Languages in the Paninian Framework

A functional toolkit for morphological and phonological processing, application to a Sanskrit tagger
