insight by wfx

Welcome to Wordwide FX's new enterprise!

Insight by WFX is a synthesis of our passion for languages and the financial markets. Here you will find technical and fundamental analyses from our clients, media partners and contributors in different languages, as well as discussions on languages and translation. And of course we will keep you updated on what is happening inside Wordwide FX Financial Translations. Hope you enjoy it! Greetings from the Wordwide FX team!

23/05/2019

Language of the Week: Icelandic, Viking Speech Preserved


By Wordwide FX Financial Translations

As a lover of Germanic languages who has had the chance to learn some German and Swedish, I've always been curious about Icelandic, the language of the North Atlantic island discovered and populated in the 9th century by Norsemen from different points of Scandinavia, most notably Flóki Vilgerðarson, one of the main characters of the TV show Vikings.

With roughly 560,000 native speakers, Icelandic (íslenska) stands out among Scandinavian languages for being the closest to Old Norse, the speech of the Vikings. True, Danish, Faroese, Norwegian, and Swedish also derive from Old Norse, but due to geographical isolation Icelandic has retained many features of its ancestor's speech, to the extent that Old Norse is also known in the linguistics community as "Old Icelandic" (even though, technically, Old Icelandic should be synonymous with Old West Norse). The conservation of the language means that modern Icelanders are able to read the Eddas, the Sagas, and other classic Old Norse literary works created in the Viking period between the 10th and the 13th centuries.

It is funny to think that modern Icelandic could be mutually intelligible with a language spoken so many centuries ago (at least partially: the written language remains quite close, but the pronunciation is not the same), while it is no longer mutually intelligible with the other contemporary Scandinavian languages, not even with its closest relative, Faroese.

The main difference between Icelandic and the other Scandinavian languages is that Icelandic keeps many grammatical features of the ancient Germanic languages, most notably the inflection system. While Norwegian, Danish, and Swedish have largely lost their case inflections, Icelandic retains four cases: nominative, accusative, dative, and genitive. Also, as in Old English (or modern German), Icelandic nouns have three grammatical genders: masculine, feminine, and neuter.

Icelandic has also retained two old letters that were once used to write Old English and Old Norse, but that the other Germanic languages have dropped: Þ, þ (þorn, modern English "thorn") and Ð, ð (eð, anglicised as "eth" or "edh"), representing the voiceless and voiced "th" sounds (as in English thin and this), respectively.

The good health of the Icelandic language is the responsibility of Ari Páll Kristinsson, head of the Árni Magnússon Institute for Icelandic Studies (in the photo, below). Some people fear that, because of the small population, Icelandic will die out soon. Jón Gnarr, the comedian who became the mayor of Reykjavik, was quoted on The World in Words in 2015: "I think Icelandic is not going to last. Probably in this century we will adopt English as our language. I think it's unavoidable". Ari Magnússon shares the same fears: "English is everywhere, from the moment we wake up until we die". The language is also closely linked to the feeling of Iceland as a nation: "If we lose the Icelandic language there will be no Icelandic nation", said poet Kristinsson, a feeling shared by speakers of many minority languages.

 

 

31/01/2019

The Widely-Spoken Languages We Still Cannot Translate Online


By Wordwide FX Financial Translations

Via wired.com

In the internet age, when we face a language barrier, there are a host of internet resources to solve it: things like translation apps, dictionary websites, versions of Wikipedia in other languages, and the simple "click to translate" option. But there are about 7,000 languages spoken in the world today. The top 10 or so are spoken by hundreds of millions of speakers; the bottom third have 1,000 speakers or fewer.

But in the murky middle ground are a couple hundred languages that are each spoken by millions of people. These midsize languages are still fairly widely spoken, but they have vastly inconsistent levels of support online. There’s Swedish, which has 9.6 million speakers, the third-largest Wikipedia with over 3 million articles, and support in Google Translate, Bing Translate, Facebook, Siri, YouTube captions, and so on. But there’s also Odia, the official language of the Odisha state in India, with 38 million speakers, which has no presence in Google Translate. And Oromo, a language spoken by some 34 million people, mostly in Ethiopia, which has just 772 articles in its Wikipedia.

Why do Greek, Czech, Hungarian, and Swedish, with their 8 to 13 million speakers, have Google Translate support and robust Wikipedia presences, while languages the same size or larger, like Bhojpuri (51 million), Fula (24 million), Sylheti (11 million), Quechua (9 million), and Kirundi (9 million) languish in technological obscurity?

Part of the reason is that Greek, Czech, Hungarian, and Swedish are among the 24 official languages of the European Union, which means that a small horde of human translators translate many official European Parliament documents every year. Human-translated documents make a great base for what linguists call a parallel corpus: a large mass of text that's equivalent, sentence by sentence, in multiple languages. Machine translation engines use parallel corpora to figure out regular correspondences between languages: if "regering" or "κυβέρνηση" or "kormány" or "vláda" all frequently appear in parallel to "government," then the machine concludes these words are equivalent.
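
To make the idea of "regular correspondences" a little more concrete, here is a minimal sketch in Python. It is only an illustration, not how any real translation engine works: the three sentence pairs are invented toy data (not real EU documents, and not meant to be grammatical Swedish), and the function names `cooccurrence_scores` and `best_match` are hypothetical. The sketch counts how often each pair of words shows up in aligned sentences and scores the pair with a Dice coefficient, so that words that nearly always appear together score highest.

```python
from collections import Counter
from itertools import product

# Invented toy "parallel corpus": (English, simplified Swedish-like) pairs.
# This is illustrative data only, not taken from any real corpus.
TOY_PAIRS = [
    ("the government met today", "regering möttes idag"),
    ("the government approved the budget", "regering godkände budget"),
    ("the budget grew today", "budget växte idag"),
]

def cooccurrence_scores(pairs):
    """Score every (source word, target word) pair with a Dice coefficient.

    Dice = 2 * (sentence pairs containing both words)
           / (sentences containing the source word
              + sentences containing the target word)
    """
    src_count, tgt_count, joint = Counter(), Counter(), Counter()
    for src, tgt in pairs:
        # Use sets so each word counts once per sentence.
        src_words, tgt_words = set(src.split()), set(tgt.split())
        src_count.update(src_words)
        tgt_count.update(tgt_words)
        joint.update(product(src_words, tgt_words))
    return {
        (s, t): 2 * n / (src_count[s] + tgt_count[t])
        for (s, t), n in joint.items()
    }

def best_match(word, scores):
    """Return the target word with the highest score for `word`, or None."""
    candidates = {t: v for (s, t), v in scores.items() if s == word}
    return max(candidates, key=candidates.get) if candidates else None

if __name__ == "__main__":
    scores = cooccurrence_scores(TOY_PAIRS)
    print(best_match("government", scores))
```

With only three sentence pairs the scores are crude, which is the article's point: the correspondences only become reliable when the parallel corpus is very large and varied.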

In order to be reasonably effective, machine translation requires an enormous parallel corpus for each language. Ideally, this corpus contains documents from a variety of genres: not just parliamentary proceedings but news reports, novels, film scripts, and so on. The machine can't translate informal social media posts very well if it's been trained only on formal legal documents. Translation tools are already scraping the bottom of the parallel corpus barrel: In many languages, the largest parallel translated text is the Bible, which leads to peculiar circumstances where Google translates nonsense syllables into prophecies of doom.

In addition to EU documents, Swedish, Greek, Hungarian, and Czech have a wealth of language resources, created one human at a time over centuries. They're the languages of entire nation-states, with national TV and radio recordings that can be used as the foundation for text-to-speech models. Their speakers have the kind of disposable income that makes media companies translate popular novels and subtitle foreign movies and TV shows. They're found in countries that tech companies imagine their customers might be living in or might at least visit on holiday, meaning it's worth localizing interfaces and adding them as translation options. They have regularized spelling systems and dictionaries that can be rolled into spellcheckers and predictive text models. They have highly literate speakers with internet access who can contribute to projects like Wikipedia. (Speakers who can even, in the case of Swedish, create a bot to automatically make basic Wikipedia articles for rivers, mountains, and other natural features.)

Language resources don't just appear. People have to decide to create them, and those people need to be fed and watered and educated and housed and supported, whether that's by governments or by companies or by the kind of personal wealth that lets individuals take on time-consuming intellectual hobbies. Creating parallel corpora and other language resources takes years, if it happens at all, and costs tens of millions of dollars per language.

Meanwhile, we know that catastrophes periodically happen around the world: earthquakes, floods, hurricanes, cyclones, diseases, famines, fires. Some of them will happen in areas where people speak a large, well-resourced language, and organizations will rush to their aid. But the odds are good that some of the world's future crises will happen in areas where people speak one of these medium-size but low-resource languages. In those cases, aid organizations and governments will face an urgent language barrier.

The problem is, we don't know which language will desperately need the world's attention next. When an earthquake hit Haiti in 2010, international organizations suddenly required Haitian Creole resources. Ebola outbreaks in West Africa affected speakers of languages like Swahili, Nande, Mbuba, Krio, Mende and Themne. Asylum seekers from Central America often speak languages like Zapotec, Q’anjob’al, K'iche' and Mam. These speakers aren't the ideal customers of big tech companies. They don't have leisure time to edit Wikipedia. They may not even be literate in their mother tongue, communicating by voice memo instead of by text message. But when a crisis hits, internet communication tools will be crucial.

Researchers at Darpa, the Defense Advanced Research Projects Agency, decided to tackle the problem by rethinking the way we translate languages. Instead of creating language-specific tools, Darpa is attempting to build language-agnostic tools that, once created, could spring into action in times of crisis and be tuned to any language with minor tweaking — even if they have just monolingual text scraped from social media rather than carefully translated parallel corpora.

They also changed their goals. It's too hard to jump right to full-blown machine translators that produce idiomatic prose, according to Dr. Boyan Onyshkevych, program manager at Darpa's Information Innovation Office. Instead, they carve out more manageable tasks, such as linking all the proper nouns in a passage with their equivalents in a more widely-spoken language. Automatically identifying entities in this way can help provide clues about the overall situation — say, which rivers are flooding, which villages are affected by an outbreak, or which people are missing.

Darpa funds researchers year-round at a couple dozen universities and companies; then, twice a year, they test them, in a "linguistic crisis simulation" event, where teams of researchers translate imaginary catastrophe reports in a surprise mystery language. For the first round, the teams have 24 hours to figure out as much useful information as possible from social media, blogs, and news reports, with the help of a few resources like a basic dictionary and an hour of time with a native speaker of the language. Then Darpa adds in more social media data and more time with a speaker, and the teams go at it again. Later, the results and data sets from such simulations are often published online so they can eventually be rolled into tools like Siri and Google Translate.

Methods like these use the resources of the internet age to solve the problems of the internet age. Smaller languages may not have extensive books or parliamentary records to train a language processor; they may not have very many professional translators. But they do have thousands or millions of speakers hanging out on social media and posting, like all of us do, about the weather and what they had for lunch. These posters are potentially sowing the seeds of their own survival, should catastrophe strike — their tweets and blog posts could get scooped up to teach the rest of the world how to help.

02/10/2018

The long war over the Ukrainian language


By Wordwide FX Financial Translations

Via The Boston Globe

Don’t call it Little Russian. Why the Ukraine’s lingua franca is a hot point.

By Britt Peterson.

As Americans have been learning in recent weeks, Russia sometimes has its own way of describing events, like when Vladimir Putin claimed on March 4, despite the presence of Russian troops on the ground, that he hadn’t invaded the Crimean region of east Ukraine. Then there’s the narrative about what is spoken in that invaded country: namely, the Ukrainian language.

A couple of obscure Russian imperial statements on Ukrainian have recently become popular on Russian nationalist blogs and Reddit pages. One comes from the 1863 Valuev Circular, a decree suspending the publication of many religious and educational texts in Ukrainian, or as the Russians called it, Little Russian: “a separate Little Russian language has never existed, does not exist and cannot exist.” The other is a quote attributed to Czar Nicholas II: “There is no Ukrainian language, just illiterate peasants speaking Little Russian.”

The claim that Ukrainian isn’t a language has been one of the drumbeats of the Russian-Ukrainian relationship for centuries. It’s true that the distinction between a language and a dialect is notoriously slippery, often more about politics than mutual intelligibility or shared vocabulary. As Yiddish linguist Max Weinreich famously quoted, “A language is a dialect with an army and a navy.” But according to linguists outside of Russia, Ukrainian and Russian are two distinct, if closely related, languages. The attacks on the status of Ukrainian, in that light, offer a window onto a side of the conflict that can be hard for outsiders to grasp: the persistent ways that Russia has taken advantage of a long and complicated cultural relationship to enforce its claim to power.

In the West, it’s generally agreed that Ukrainian and Russian are separate languages, with 38 percent of their lexicon differing. (That’s slightly more than Spanish and Italian, which differ by 33 percent.) It’s also generally agreed that the three Eastern Slavic languages—Russian, Ukrainian, and Belarusian—split off from Old East Slavic about a thousand years ago.

Some Russian linguists, however, tell the story differently: They claim that the East Slavic ancestor was in fact a form of Russian, making Russian not a sibling, but rather the mother tongue from which the other languages descended. The word for Old East Slavic in Russian is drevnerusskiy yazyk, which means “Old Russian,” whereas Ukrainians call it the more neutral davn’orus’ka mova, or language of Rus, the medieval Russian state. Russian attempts to ban Ukrainian in its imperial territories didn’t end with the Valuev Circular—in 1876, Czar Alexander II issued the Ems Ukaz, banning the public use of Ukrainian altogether.

Ukrainian scholars will remind you, meanwhile, that 17th-century Russia, having mostly missed out on the Renaissance, was still catching up to modernity. It relied on Poland and Ukraine, with their connections to Europe and European languages, to broaden its vocabulary: “The Russian language was borrowing many constructions and forms...[from] the Ukrainians, because at that time [Russia] was underdeveloped compared with the Polish-Lithuanian Commonwealth and Ukraine,” said Andriy Danylenko, a Ukrainian linguist at Pace University. Ukrainian also saw a cultural revival in the 19th century, a Romantic outpouring of literature, journalism, and folk traditions that built the fundament of a new nationalist identity. In this light, the imperial Russian decrees against Ukrainian suggest the language was seen less as a poor stepchild than as a rival. 

The ban on Ukrainian was lifted after the first Russian Revolution in 1905, and after the second Revolution, Lenin and Stalin at first oversaw a period of Ukrainization, when the language was first standardized and dictionaries were written. During this time, according to Myroslav Shkandrij’s book “Russia and Ukraine: Literature and the Discourse of Empire from Napoleonic to Postcolonial Times,” some Russians viewed Ukrainian with a patronizing admiration, as a rural, archaic proto-Russian, a folk language at a time when the folk were meant to rule. One writer even suggested that elements of it be grafted onto Russian to make Russian more “pristine.” But the perils of their condescending attitude became quickly manifest in the late 1920s and 1930s, when Stalin began to enforce Russification on the language again, rewriting dictionaries to impose Russian loan words just as he purged the Ukrainian intelligentsia. 

The Ukrainian spoken today, nearly 25 years after it was first declared the country’s official language at independence, still bears scars from centuries of linguistic and demographic oppression. Although some intellectuals would like to bring back the old Ukrainian words banned by Stalin’s linguists, it’s difficult to undo the common parlance of eight decades. Most of the country is bilingual; many who check a “Ukrainian” or “Russian” census box are still fluent, often from birth, in the other language. But the years of Soviet rule created a linguistic hierarchy, in which Russian became the language of economic and social mobility, while Ukrainian was still considered a rural language. Pressure to speak Russian produced hybrid forms, known as surzhyk, close to Ukrainian grammatically but with Russian vocabulary and endings, now spoken by many Ukrainians as a private language. “Some writers have said [speaking surzhyk] is like getting home and putting on a comfortable bathrobe and slippers,” said Michael Flier, a Ukrainian philologist at Harvard.
