Language is a good starting point for building inclusive AI: BHASHINI CEO Amitabh Nag

Vikas Halpati

Amitabh Nag is the CEO of the Digital India BHASHINI Division, where he leads the implementation of the National Language Translation Mission (NLTM), India’s initiative to build an inclusive multilingual AI ecosystem.

BHASHINI, short for Bhasha Interface for India, is one of the world’s largest multilingual AI initiatives. It brings together startups, academia, research institutions, industry, and government to build indigenous language technologies.

With over four decades of experience across technology, digital transformation, business strategy, and public sector innovation, Amitabh Nag has held leadership roles at HP and TCS. He has led large-scale IT transformation programmes across government and enterprise sectors, including the Passport Seva project, and holds a bachelor’s degree in engineering from the National Institute of Technology, Kurukshetra.

Amitabh Nag spoke to indianexpress.com on the impact of BHASHINI, the challenges of building indigenous language technologies, and how language tech could be the starting point for an inclusive AI. Edited excerpts:

Venkatesh Kannaiah: Tell us about BHASHINI and its journey.

Amitabh Nag: When the project was launched in 2022, the idea was to transcend the language barrier. It is not just about technology; it is also about looking into other factors that enable these language barriers to persist. Apart from technology, content, data, standards, community participation, education, and widespread adoption are also needed to overcome the language barrier.

We expected the biggest uptake of such language services in governance and service delivery, because that’s where people’s lives are most impacted.

We had a charter of building solutions for 22 languages to start with, but we are not limited by that.

We are working with around 70 research institutes across the country to co-create solutions in a localised manner. The dataset generation and AI model vetting are being carried out by them at the local level.

Most languages in India do not have sufficient digital data to build these language models. So we had to collect data on the ground, asking people to speak about a topic, subject, or picture, and then create a digital corpus. It was a unique approach and was later followed by others across the globe.

Venkatesh Kannaiah: Give us an idea of BHASHINI’s applications and services.

Amitabh Nag: We had five problems to tackle: automatic speech recognition, text-to-text translation, text-to-speech, optical character recognition, and named entity recognition.

We built models that could address these across 22 languages and beyond. We also developed services that allow these models to function and become conversational.

For example, if you’re not able to do automatic language detection in an application and instead want the person to select the language manually, you already have the first point of disconnect. Automatic language detection is extremely important for people on the other side of the digital and literacy divide. Then we added voice activity detection and a number of other supporting services. There are over 40 such services today, and some are under development as we move into the next phase.

The glossary is one key addition. It essentially consists of words that are unique to various Indian languages or that we have created over a period of time. Then there are words that are spoken or understood differently across regions. For example, in land records, the same parcel of land may be referred to by different terms depending on the region.

Till now, we have collected about three million contextually sensitive words relevant to the Indian context, and we hope to reach 10 million soon.

Since our launch in 2023, there have been around eight billion inferences, and today we process almost 20 million inferences a day. Almost all the states are using the platform, across services and government departments.

Venkatesh Kannaiah: Tell us about solutions that have come out of BHASHINI.

Amitabh Nag: BHASHINI is being used in 1.2 lakh panchayats across the country, where the audio of meetings is transcribed automatically and minutes of the meetings are generated.

We have also reached approximately 20 lakh downloads of our BHASHINI mobile app.

Our solutions are being used to generate advisory services for farmers in multiple states. A farmer can just ask a question in his language, and the advisory is delivered in the same language in voice mode. So two-way communication with farmers, on a voice-first and voice-journey model, is already happening.

People interact with digital systems largely through a text journey. Sometimes they use voice plus text, but in a voice journey, the microphone is the only user interface. In the next phase, we are working on enabling transactions through a voice journey.

About 800 aspirational districts are using our language solutions with a device called NITI TARA (Toolkit for Analysis, Review & Action), a portable ‘Experience in a Box’ solution by NITI Aayog. It transforms any flat wall into an interactive touchscreen to bring data, planning tools, and collaborative review directly to the grassroots.

Our solutions are also available on more than 900 websites, and that number is continuously increasing. This helps disseminate a large amount of information in regional languages.

Our website translation plugins are unique in several ways. They support multiple languages, preserve the user’s session, and ensure that once a language is selected, the user can continue browsing the entire website in that language.

We have developed multiple such products and solutions.

Venkatesh Kannaiah: Are there any comparable international governmental initiatives like BHASHINI?

Amitabh Nag: No. We are unique in many ways, and our initiatives are a mascot, globally, for inclusivity. We will also be in UNDP’s Atlas for the inclusivity and public communication domain.

Our solutions are Digital Public Infrastructure meant for public utilisation rather than commercial gain. There is, of course, a multiplicity of languages, say in Europe, but they have not built tools like BHASHINI. They do have organisations working in this area. Using such tools at this scale for public services and service delivery is something the West has never really been able to do.

You saw it with CoWIN, with Aadhaar, and with other digital public infrastructure initiatives. Our kind of citizen-first approach has not been replicated elsewhere.

Venkatesh Kannaiah: Tell us about the tech challenges in BHASHINI’s journey.

Amitabh Nag: Challenges remain, and will continue to remain. India is diverse, and we will have to cater to more than 100 languages. Today, building language models is not difficult, but getting the data for them is. Everyone is trying to say, “My model is better than your model,” but that is hardly a meaningful statement, because neither model may have been exposed to enough data to make it robust.

Unlike most IT or technology systems, language tech is an effort in co-creation. It is not where someone develops a product, writes some programmes, carries out user acceptance testing, and then rolls it out. This is continuous co-creation. Getting buy-in from all stakeholders for such a co-development is also a challenge. Why should someone invest in it? They need to see the benefits. Establishing the fact that they will continue to benefit if they continue to contribute is important. So our customers are partners who are committed to improving their own functioning as well as ours.

The important thing is to ensure that we reach the last mile. If we don’t do that, people will not know how to use our solutions, how to participate, or how to co-create at every level.

Otherwise, we could end up creating divides that go far beyond the digital divide. We might have an AI divide in the future if we do not focus on bringing local communities, rural populations, and tribal communities into this ecosystem. Language is perhaps the most important use case because it is something to which everyone can contribute and is willing to contribute.

Language is the best starting point for inclusive AI because everyone speaks a language and can help improve AI by contributing speech, text, translations, or local expressions. Unlike more specialised datasets, language data can be contributed by almost anyone.

Once AI understands local languages well, it becomes much easier to build other AI applications in health, agriculture, education, and government services in those languages. In other words, language is the foundation on which many other AI use cases can be built.

Venkatesh Kannaiah: What are you trying to solve with BHASHINI Sahyogi?

Amitabh Nag: We are looking at partnerships. These can be with states, educational institutions, or with individuals. We are also trying to create a community to contribute to the system.

What we do is look at the problem statements that are given to us by the government or industry, and then involve academics to solve them. But the idea is to build and create communities.

Venkatesh Kannaiah: Tell us about Vatika. How do you collaborate with other researchers in the field of language?

Amitabh Nag: Vatika means garden, and we have collected a large number of datasets and built many models with that.

All the datasets and models we built with educational institutions have been open-sourced. Anyone can pick these up and use them to build their own models. Had we not done this, the kind of revolution you see today in the language space would probably not have happened.

There are two important things about open source. First, as we go along, we cultivate a community of people who can use the system to build something new. They may build something even better than what we have built, and that is what we want to encourage.

Second, it helps us remain on our toes. What we have built cannot be kept with us forever. Once it is made available to everyone, we have to keep innovating and doing new things to stay ahead of the curve.

Venkatesh Kannaiah: Tell us about BhashaDaan and how it works.

Amitabh Nag: BhashaDaan is meant for creating datasets through crowdsourcing. The idea is simple. It has four facets — Bolo (speak), Likho (write), Suno (hear), and Dekho (see). In Bolo, for example, I speak a word or a sentence that I see on the screen, and it gets recorded. Once recorded, it goes through a chain of validation. We then use a three-out-of-five or best-of-five consensus to create a golden dataset that can be used for training models.
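The three-out-of-five consensus described above can be illustrated with a minimal sketch. It assumes each recording is reviewed by five validators who each return a verdict; the verdict labels, the function names, and the sample data are hypothetical and are not taken from BhashaDaan itself.

```python
from collections import Counter
from typing import Optional


def consensus_label(votes: list[str], required: int = 3) -> Optional[str]:
    """Return the verdict shared by at least `required` validators, else None."""
    if not votes:
        return None
    label, count = Counter(votes).most_common(1)[0]
    return label if count >= required else None


def build_golden_dataset(submissions: dict[str, list[str]], required: int = 3) -> list[str]:
    """Keep only the submissions whose validators reached an 'accept' consensus."""
    return [
        submission_id
        for submission_id, votes in submissions.items()
        if consensus_label(votes, required) == "accept"
    ]


# Hypothetical validator verdicts for three recordings (five validators each).
sample_submissions = {
    "rec-1": ["accept", "accept", "accept", "reject", "reject"],
    "rec-2": ["accept", "reject", "reject", "reject", "accept"],
    "rec-3": ["accept", "accept", "reject", "reject", "unsure"],
}

golden = build_golden_dataset(sample_submissions)
print(golden)
```

In this sketch only the first recording enters the golden dataset: the second is rejected by a majority, and the third never reaches three matching verdicts, so it is held back rather than guessed at.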

This is done to bring everyday spoken language into the system, which is quite different from the formal language that we usually write or use in official communication.

Venkatesh Kannaiah: Since you process so many languages, what changes are you seeing in the way people use them? Like, say, the usage of Hinglish. What trends stand out?

Amitabh Nag: Many of these Hinglish-type trends are seen in cities and are not visible at the last mile, because most people there are comfortable in only one language. That is where the real test of our solutions happens.

That is also where one of the most debated aspects, accuracy, is really tested. Accuracy is only as good as what you are trying to understand. I might not be speaking perfectly right now, but if you are able to understand me, it is fine. Now, if you start testing my words and language with the intention of asking whether it is grammatically correct or whether every word has been pronounced perfectly, then perhaps we are in trouble. India will speak in 1.4 billion different ways.

Take the example of doctor-patient communication. Patients usually describe their symptoms in everyday language rather than using medical terms. A doctor may understand the underlying medical condition, but the patient’s description reflects ordinary conversation, not clinical terminology.

Interestingly, such data is not available anywhere. Such a dataset can only be captured when the system is actually deployed. Initially, the system may translate or transcribe these expressions incorrectly. But once a human corrects them, they become part of the golden dataset on which the model can be trained.

That is why we have to be patient enough to deploy these systems, learn from real-world usage, and continuously improve them.

Venkatesh Kannaiah: How do you work with startups in this field?

Amitabh Nag: We let startups use our systems and also build on top of them. Once they apply for an API key, they can access our open APIs for free to build their solutions.

We also organise innovation challenges and hackathons. During these programmes, we coach, guide, and mentor startups on deploying these new technologies. And when we deploy new technologies, they are obviously not yet industry-grade. There are chances that they may not work exactly the way we want. So we provide startups with the cushion to experiment and make mistakes. That is perhaps the most important requirement when we are working with emerging technologies.
