LORELEI Imagines Rapid Automated Language Toolkit
Low-cost approach to extracting critical concepts from public information sources in unfamiliar languages would support disaster relief and other quick-response missions
Understanding local languages is essential for effective situational awareness in military operations, and particularly in humanitarian assistance and disaster relief efforts that require immediate and close coordination with local communities. With more than 7,000 languages spoken worldwide, however, the U.S. military frequently encounters languages for which translators are rare and no automated translation capabilities exist. DARPA's Low Resource Languages for Emergent Incidents (LORELEI) program aims to change this state of affairs by providing real-time essential information in any language to support emergent missions such as humanitarian assistance/disaster relief, peacekeeping and infectious disease response. The program recently awarded Phase 1 contracts to 13 organizations.
"The global diversity of languages makes it virtually impossible to ensure that U.S. personnel will be able to understand the situation on the ground when they go into new environments," said Boyan Onyshkevych, DARPA program manager. "Through LORELEI, we envision a system that could quickly pick out key information—things such as names, events, sentiment and relationships—from public news and social media sources in any language, based on the systems understanding of other languages. The goal is to provide immediate, evolving situational awareness that helps decision makers assess and respond as intelligently as possible to dynamic, difficult situations."
The conventional system of developing automated language technology—which requires years of effort and tens of millions of dollars to manually translate, transcribe and annotate individual words and phrases for each language—is adequate for languages in widespread use or in high demand. It is neither flexible enough to meet constantly changing language needs, however, nor specialized enough to account for the specific communication challenges involved in military-level emergency response.
LORELEI seeks to dramatically advance computational linguistics and human language technology to identify the elements that different languages have in common, and use that knowledge to enable rapid, low-cost development of automated language capabilities. The program would apply these automated capabilities via an easy-to-use interface that would assimilate, integrate and analyze real-time incident data in the local language(s). The envisioned system would provide useful response-related material as quickly as 24 hours after an incident occurs and fully automated language capabilities within days or weeks after that.
While LORELEI technologies could include partially or fully automated speech recognition and/or machine translation, the program does not primarily seek to comprehensively translate low-resource languages into English. Instead, LORELEI would provide situational awareness by identifying and correlating elements of information in foreign-language and English sources. LORELEI technology would be applicable to any incident where a sudden need emerges for assimilation of information by U.S. government entities about a region of the world where low-resource languages are frequently used.
"Our goal with LORELEI isnt rote translation based on libraries, but instead to provide idiomatic understanding of language as a whole, and specifically disaster-response vocabulary, to improve cooperation and speed response to dangerous situations worldwide," Onyshkevych said.
DARPA has awarded Phase 1 contracts for LORELEI to the following organizations:
Appen
Carnegie Mellon University
Columbia University
Johns Hopkins University
Next Century Corporation
Raytheon BBN
University of Illinois Urbana-Champaign
University of Massachusetts
University of Pennsylvania
University of Pennsylvania Linguistic Data Consortium
University of Texas at El Paso
University of Washington
University of Southern California Information Sciences Institute
LORELEI plans to explore three principal technical areas:
* Algorithm Research and Development Environment: LORELEI plans to target research and development of human language technology that would reduce the current reliance on huge, manually translated, transcribed or annotated bodies of knowledge. Instead, LORELEI would leverage what related and unrelated languages have in common and take advantage of a broad range of language-specific resources. The program also seeks to develop the LORELEI Technology Development Environment (LTDE), which would synthesize language data and integrate it with Web services that would provide named-entity recognition, topic spotting and other language technology capabilities.
* Run-time Framework Development: The program aims to develop a prototype tool, the LORELEI Run-Time Framework (LRTF), which would pull together various open-source data feeds in English and incident languages and send this data compilation through the LTDE's Web services. The processed results would return to the Framework, where numerous analytics tools would aggregate, summarize and organize them.
The LRTF would not produce reports or situational awareness documents automatically, but would present users with easy-to-understand summaries, visualizations and other useful products that would greatly help in the creation of such documents. The Framework would be able to generate initial results 24 hours after an incident and provide progressively more detailed results at one-week and one-month intervals.
* Linguistic Resource Creation: LORELEI plans to collect, create and annotate linguistic resources in multiple languages to support the work in the first two technical areas listed above. These resources would include standard language resources (dictionaries, etc.), subject-specific resources (disaster relief terminology, etc.) and other data-enabling research, development and evaluation.