Tamil Computing Research Experiences - Part 2
Following my Part-1 article, https://www.subalalitha.in/post/on-tamil-computing-research-experiences-part-1, I thought of writing Part 2. Catching up with our own passions amidst routine work gives a really relaxed feeling, and that's why I took this up as my first task in the morning!
In Part-1, I shared my experiences on the funded CoRee search engine project at the Tamil Computing Lab (TaCoLa), Anna University. I have now updated Part 1 with the research paper that we published on CoRee. Through CoRee, we got a lot of exposure to search engine concepts (in NLP we actually call this Information Retrieval) such as document processing and representation, query processing and expansion, indexing, searching and ranking. The best part of CoRee is that it gave rise to many research components. All the Junior Research Fellows (JRFs) who worked on CoRee, including me, found our PhD ideas with much ease. My friend Dr. Uma Maheswari, who is now working as a Research Associate at Nanyang Technological University, Singapore, worked on the search and rank module of CoRee, and her PhD was on event searching. My other friend Dr. Balaji, who is now head of R&D in a Bangalore-based software company, worked on the UNL representation of CoRee, and he enhanced that representation in his PhD.
I worked on the Indexer and Summary Generation modules, and I wanted to do something related to them in my PhD. But the idea of using Indian logic kept hitting me hard, as my guide Dr. Ranjani Parthasarathi ma'am used to speak a lot about Indian logic from Nyaya and Mimamsa, which are concepts from Sanskrit literature, and its connection with present-day Computer Science theories. I actually wanted to rope in some Tamil concepts. Fortunately, Dr. Ka.Pa Aravaanan sir gave a talk correlating Nannool ideas with our present-day teaching methods. He talked about the "pathu azhagugal" and "pathu kutram" (பத்து அழகு & பத்து குற்றம், the ten beauties and ten faults) that a book can have, and he showed how these can help in evaluating students' research theses. I still remember that there was a power cut while he was giving that speech, and he was reading by an emergency lamp in that dark hall with no lights and fans! I was so impressed and kept thinking about how to use Nannool for my PhD. I could not find a way for a long time, but I had hopes for it.
I had to take a break from my job and my PhD, as I had to leave for the UK with my husband Selva for his onsite assignment for about a year. Fortunately, I remembered to take along the Nannool book that I had found in my parents' home. With my mom being a Tamil Professor and my dad an ardent reader of Tamil books, we always had many Tamil books at home. My parents gave more importance to buying books than to buying luxuries. I used to get irritated with this during my childhood, as books occupied most of the shelves of our very tiny one-room home in the government quarters at Foreshore Estate, Chennai. We change our opinions as we grow, and so did I! Back to my UK life: it gave me more time to research both computer science and cooking :) . One day, I was reading the Nannool book and trying hard to correlate it with my PhD. That's when these words sparked some thoughts in me: "சில்வகை எழுத்தில் பல்வகைப் பொருளை". This is how the definition of Soothiram (சூத்திரம்) starts in Nannool. Soothiram means formula, and soothirams are used to define the rules of Tamil grammar in Nannool. Here is the full definition.
சில்வகை எழுத்தில் பல்வகைப் பொருளைச்
செவ்வன் ஆடியிற் செறித்தினிது விளக்கித்
திட்ப நுட்பஞ் சிறந்தனசூத்திரம்
Gloss: The speciality of the sūtras is to coherently convey the semantics precisely, accurately, and with certainty, using a few words.
The first line kept hinting at something, and I felt it had a connection with semantic indexing, because semantics deals with capturing many meanings in a single word: "apple", for instance, can mean a fruit or a computer, and synonyms too can carry multiple interpretations. I checked with my cousin sister, who is a Tamil teacher in a government school, and also spoke to my mom's colleague, who has deeper knowledge of Nannool. Yes, I did not believe my mom, as most children would not :) They could not exactly confirm it, but they said it was possible. So I wrote a mail to my guide, Dr. Ranjani Parthasarathi ma'am. She already had this in mind. She liked the idea, and she said that Sanskrit literature also has sutras and that we could explore both. And yes, they did have many similarities.
Okay, now I had to work out how these could be merged with a computational theory. That is when the next lines on Soothiram helped. They define the characteristics of a Soothiram.
ஆற்றொழுக் கரிமா நோக்கு தவளைப்
பாய்த்து பருந்தின் வீழ் வண்ண சூத்திர நிலை
Gloss: "Sūtras have the characteristics of a river's flow, a lion's vision, a frog's jump and an eagle's flight".
This tells how sutras relate texts. We were already researching "Discourse Analysis". Discourse is about coherence, that is, how the pieces of a text hold together. We would also have heard the term discourse used for religious lectures. We were working on "Rhetorical Structure Theory (RST)", which is used to do discourse analysis. My guide suggested comparing RST with the Sangatis used in Sanskrit literature, which link sutras. Yes, we found that many RST relations overlapped with sangatis. These relations connect texts using semantic relations. For example, we have a "reason relation" between the sentences "She smiled" and "I smiled back". We then decided to first build a semantic text representation using RST and Sangatis, and to identify sutras in this representation. Did you get the link? A text document, when represented as a graph, captures many semantic links, and since sangatis link sutras, these graphs can have subgraphs denoting a "soothiram". We explored the characteristics of a soothiram to trace the semantic links. For instance, a river flows in the same direction, so pieces of text that are linked in the same direction can have a soothiram in them. Texts can have relations pointing in any direction; I mean, the graph that we build is a directed graph. We perceived a sutra as a crisp representation of a text, as per its definition, and we used this to index a text and to generate its summary. These ideas are published in this journal article: https://www.degruyter.com/view/journals/jisys/23/3/article-p231.xml and in two other papers.
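To make the "river flow" idea a little more concrete, here is a minimal Python sketch. It is only an illustration and not the method from our published papers: the segment ids, the edge list EDGES and the function names build_graph and longest_forward_chain are all made up for this example, and apart from the "reason" link between "She smiled" and "I smiled back", the relations are hypothetical. It stores text segments as nodes of a directed graph with labelled relations and then follows the edges in one direction, the way a river flows, to find the longest chain starting from a segment.

```python
from collections import defaultdict

# Hypothetical segments and relations, for illustration only.
# s1 = "She smiled", s2 = "I smiled back"; s3, s4, s5 are made-up segments.
EDGES = [
    ("s1", "s2", "reason"),
    ("s2", "s3", "elaboration"),
    ("s3", "s4", "result"),
    ("s5", "s3", "background"),
]


def build_graph(edges):
    """Build a directed graph: source segment -> list of (target, relation)."""
    graph = defaultdict(list)
    for src, dst, rel in edges:
        graph[src].append((dst, rel))
    return graph


def longest_forward_chain(graph, start):
    """Follow edges only in their own direction and return the longest
    chain of segments (as a list of ids) that begins at `start`."""
    best = [start]

    def walk(node, path):
        nonlocal best
        if len(path) > len(best):
            best = list(path)
        for nxt, _rel in graph.get(node, []):
            if nxt not in path:  # avoid going round in a cycle
                path.append(nxt)
                walk(nxt, path)
                path.pop()

    walk(start, [start])
    return best


if __name__ == "__main__":
    g = build_graph(EDGES)
    print(longest_forward_chain(g, "s1"))
```

In this toy picture, a chain such as s1 to s4 is the kind of same-direction subgraph that could be treated as a candidate soothiram. A real system would have to choose chains using the relation types as well, which this sketch ignores.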
Since we had used UNL concepts in CoRee, which I mentioned in Part 1, we used UNL as the foundation for the RST-Sangati graph, as UNL had a connection with both of them. This was so satisfying, as we were able to connect modern ideas with those from our ancient literature, and yes, my dream of roping Tamil into my PhD came true. I will continue with my post-PhD Tamil Computing experiences in Part-3!
Thanks for reading :)