
          AI's global village opens wider to more voices

          Developers look to break from yoke of English language, cater to all groups of people

          By Oasis Hu in Hong Kong | China Daily | Updated: 2024-12-06 07:14
          LU PING/CHINA DAILY

          Artificial intelligence engineer Jacky Chan Ho-kit has conflicting feelings about his industry.

          While he looks forward to a future where AI reaches its pinnacle — possessing humanlike cognitive capabilities — he is deeply concerned that it will only understand English.

          "Given the language status quo, this is highly likely to be a reality rather than just alarmism," he said.

          Chan is the chief technology officer at Votee, a Hong Kong-based AI company. He is also a language enthusiast who in his free time follows language bloggers on social media, absorbing their linguistic insights. Through his research, he has learned that many languages are disappearing.

          Even though there are around 7,000 languages still in use globally, according to the World Atlas of Languages of UNESCO, only 10 boast more than 200 million speakers. UNESCO has said that a language vanishes roughly every two weeks, or about 25 a year.

          In the online realm, the disparity in language usage rates is even more pronounced.

          Over the last decade, English content has dominated the internet, accounting for 49.4 percent as of Nov 26 — more than eight times the use of Spanish, the second most prevalent online language at 6 percent, according to a report by W3Techs, a company that conducts global web surveys.

          Conversely, the proportion of web pages that use Chinese, the second-most spoken language in the physical world with more than 1.1 billion speakers, has plummeted from 4.3 percent in 2013 to 1.2 percent in 2024.

          In the realm of AI, prominent large language models, or LLMs, like OpenAI's GPT-4, Google's Gemini, and Anthropic's Claude all use English as their main language.

          Mainstream AI language models, particularly those originating in the West, are made for English-speaking audiences, with translations for other languages serving as only a support function, said Cao Jiannong, chair professor in the Department of Computing at Hong Kong Polytechnic University.

          Artificial intelligence is a field devoted to developing technologies that can replicate or even surpass human intelligence. Until this vision becomes real, large AI companies will continue to prioritize making AI more intelligent over expanding their services to more languages, Cao added.

          Chan, CTO at Votee, agreed that the endgame of AI is humanlike intelligence, but questioned the consequences if such intelligence can only speak English.

          "Wouldn't it be even more unfair to non-English speakers? Wouldn't global cultural diversity be greatly eroded? Wouldn't the gap between the world's rich and poor be wider?" Chan said.

          Since last year, Votee, which previously concentrated on automated data collection and analysis, has shifted its focus to developing AI services for lesser-used languages.

          This year, it unveiled a Cantonese LLM and is actively pursuing clients in Southeast Asia, Africa, and the Chinese mainland. Future initiatives include the launch of LLMs and other AI services for Javanese in Indonesia, Okinawan in the southern region of Japan, and various Chinese dialects including Shanghainese and Hakka.

          "In an increasingly polarized world, we aim to utilize technology to bridge this gap," Chan said.

          Data scarcity

          The cornerstone of training AI lies in data. A significant hurdle in advancing AI's linguistic prowess is the scarcity of data available in numerous languages, Chan said.

          Of about 7,000 languages spoken worldwide, nearly 99 percent are considered low-resource languages, as the data available for computational processing and analysis is limited.

          The fact that mainstream AI tools predominantly rely on English corpora, or collections of written text, causes significant problems when they handle other languages, said Ting Paksun, CEO of Votee.

          These AI tools often produce inaccurate and biased content, cultural misunderstandings, business errors, and even legal violations, rendering them unsuitable for use in both casual and formal contexts, Ting said.

          On the beneficial side, AI tools hold the potential to streamline operations, boost productivity, and have a direct impact on local economies.

          At an investment summit in mid-November in Hong Kong, Daniel Pinto, president of JPMorgan Chase, said that AI contributed approximately $1.3 billion to the group's finances last year, through cost reductions or revenue increases, with projections indicating a rise to $2 billion this year.

          Chan warned that regions unable to leverage AI tools due to language limitations are likely to experience decreased productivity in the future.

          To avoid lagging behind European and United States tech giants, governments and major tech firms in some regions have initiated the development of LLMs customized to their linguistic needs, Cao from the Hong Kong Polytechnic University said.

          The UAE, for instance, introduced Jais, billed as the highest-quality Arabic LLM, in 2023. This year, South Korea's LG Group unveiled Exaone 3, the country's first open-source Korean AI model.

          Smaller, nimbler

          Many smaller companies around the world are also venturing into the creation of small language models, Cao said.

          Asiabots Ltd, a Hong Kong-based artificial intelligence company established in 2017, is one such company.

          Chris Shum Chiu-fai, co-founder and CEO of Asiabots, said that the company initially prioritized AI capabilities in Cantonese due to its Hong Kong location. However, over time an increasing number of clients have approached them for AI solutions in various languages.

          Its clients include government bodies and private enterprises worldwide, including in Southeast Asia and Europe. Rather than large language models, these clients prefer small language models tailored to specific scenarios, such as AI-driven customer service, AI speech recognition technology, and AI text-to-speech tools.

          Asiabots' clients include the Hong Kong Special Administrative Region government, which asked them to develop AI tools for translation services between Cantonese and Middle Eastern languages. The request followed this year's Policy Address, which called for attracting more Muslim tourists, and encouraged the city's taxi services to offer information in Arabic for visitors from the Middle East.

          In July, a tourism company in Kunigami, Okinawa, Japan, engaged Asiabots to develop an AI tool capable of translating multiple languages, including minor ones such as Vietnamese.

          "Japan is preparing to host the World Expo next year. With the anticipated increase in global tourism, many Japanese companies are seeking AI tools, leading to a surge in requests from Japan recently," Shum said.

          Specialized needs

          Many mainstream AI tools excel at translating between widely spoken languages such as English and Chinese. However, when faced with less common languages, these tools may falter in recognizing speech and converting it into text, resulting in numerous errors.

          The primary issue lies in inadequate data for the specific language, Shum said.

          In some instances, countries with limited technological infrastructure may find that their online information is predominantly available in English, rather than their native language, as seen in the Philippines and Mongolia.

          Some languages, such as Minnan, a dialect spoken in parts of southern China, have a variety of pronunciations and no standardized written characters.

          Other languages are fragmented into numerous dialects. In Indonesia, for example, there are more than 300 dialects, which increase the complexity and diversity of the language.

          These challenges can be overcome as long as clients have the financial resources to collect the necessary data, Shum said.

          Asiabots accumulates data from extensive research and non-infringing open-source repositories, he said. Clients also provide data to the company or fund it to conduct on-site data collection.

          After collecting the data, Asiabots collaborates with local universities and recruits native language speakers to refine and localize AI solutions, aligning them with regional cultures and legal frameworks to overcome cultural barriers.

          In the seven years since its inception, Asiabots has expanded its AI's linguistic repertoire to 22 languages, including Indonesian, Filipino, Portuguese and Hindi, as well as less common dialects.

          After establishing language capabilities, the company tailors AI software and hardware to meet specific customer requirements.

          For instance, for the Okinawa tourist spot, Asiabots developed an AI translator capable of translating among five languages: Japanese, Chinese, English, Korean and Vietnamese. Any of these can be swapped for another of the 22 languages in the company's libraries when required, Shum said.

          Endangered languages

          While commercial demand ensures the survival of languages with a large offline population, those with few speakers, limited commercial interest, and insufficient technological research are at risk of becoming endangered both online and offline, Chan warned.

          UNESCO has a classification system for endangered languages. Ones spoken across all age groups and contexts are considered safe, while languages that children no longer learn as their mother tongue are considered endangered. Those spoken solely by grandparents are in extreme peril, and those lacking speakers face extinction.

          Based on this definition, even dialects spoken by substantial populations, such as Minnan and Hakka, which are primarily used in southern China, face a fight for survival as fewer young people are learning them.

          Shum said not preserving an endangered language could lead to a deep sense of regret.

          "There are various research directions in AI and we opted to delve into language study from the start, because behind each language lies a unique mode of thought and a profound reservoir of human wisdom," Shum said.

          For instance, the Minnan term describing tears as "falling water" reflects a beautiful perspective. Losing such ways of thinking and expression is a loss of culture, and possibly even civilization, Shum said.

          Chan said that language is a crucial vessel of intangible cultural heritage, showcasing the history, customs, habits and social relationships of a region, while forming a part of people's individual and collective identity.

          "Protecting the cultural value of a language is much more urgent than its commercial worth, yet it often receives inadequate attention," he said.

          By preserving the voice and text of a language through a language model, even if the original speakers disappear, people can access its nuances and written form and learn it whenever they want, Chan said.

          Money talks

          With hundreds of indigenous languages in Africa at risk of extinction, Votee has worked with clients on the continent to assist in language preservation efforts. However, significant challenges stem from Africa's political instability, limited technological proficiency and insufficient technology infrastructure.

          In recent years, many clients have asked Asiabots to develop language models for the preservation of endangered languages.

          However, all these projects faltered due to a lack of funding for data collection, which involves sending researchers into remote mountainous regions to record voices and then processing and digitizing the recordings, work that might cost millions of dollars.

          Francis Fong Po-kiu, honorary president of the Hong Kong Information Technology Federation, said that the governments of smaller language communities should recognize the cultural value inherent in these languages.

          Chan proposed that global tech firms, language-focused NGOs, linguists and language enthusiasts collaborate to form communities for mutual support and to encourage the contribution of open-source language data.

          When developing its Cantonese LLM, Votee collaborated with Cantonese linguists and enthusiasts to establish a Cantonese-centered community. Subsequently, it open-sourced all the data and models behind the LLM.

          "Cantonese belongs to everyone, not just a select few — it already lacks resources, so why create additional boundaries?" Chan said.

          In July this year, SenseTime, an AI software company in Hong Kong, launched a Thai-language LLM.

          Lu Lewei, director of the SenseTime Research Institute, said that the company pays attention to less widely used languages because equipping AI with multilingual capabilities also helps improve the AI itself.

          More importantly, AI was designed to assist humanity, and its future should prioritize broader accessibility and use, and not neglect some groups, Lu said.

          "I believe this is the original intent, also the ultimate goal of humanity's pursuit of technological advancement," Lu said.
