Swiss Twitter Journey

An EPFL Applied Data Analysis Project

Language Analyses

In this section you will find some insights about language usage in Switzerland.

Which language is used in which region?

The four national languages of Switzerland are German, French, Italian and Romansh. The graphs below show four maps, each displaying the distribution of one language.

In 2015, 63.0% of the population were native speakers of German (of which 59.5% speak Swiss German and 10.4% Standard German at home); 22.7% of French (mostly Swiss French, but including some Arpitan dialects); 8.4% of Italian (mostly Swiss Italian, but including Lombard dialects); and 0.6% of Romansh. [Wikipedia]

Language          Mother tongue          Main language
                  Speakers   Share       Speakers   Share
German            4,639k     63.66%      4,348k     64.47%
French            1,484k     20.37%      1,525k     22.61%
Italian           470,961     6.46%      561,857     8.33%
Romansh           35,072      0.48%      35,753      0.53%
Serbo-Croatian    103,350     1.42%      168,422     2.50%
Albanian          94,937      1.30%      176,293     2.61%
Portuguese        89,527      1.23%      238,327     3.53%
Spanish           76,750      1.05%      150,782     2.24%
English           73,422      1.01%      316,830     4.70%
Turkish           44,523      0.61%      76,379      1.13%

The German region (Deutschschweiz) is roughly in the east, north and center; the French part (la Romandie) in the west and the Italian area (Svizzera italiana) in the south. There remains a small Romansh-speaking native population in Graubünden in the east. The cantons of Fribourg, Bern and Valais are officially bilingual; the canton of Graubünden is officially trilingual.

You will see four maps below, each representing the distribution of one language. This visualization uses raw tweet counts, so the maps are affected by population: each count is roughly the population multiplied by the share of that language.

You can roughly identify the French-, German- and Italian-speaking cities from these maps. However, you cannot clearly delimit the regions, because population has a strong effect on these graphs. Geneva, Zurich and Ticino stand out clearly, but what about the others? In the sections below we investigate language usage by ratio instead.
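To illustrate the difference between raw counts and ratios, here is a minimal Python sketch. It is not the project's actual pipeline: the `tweets` sample, the region names used as keys and the `language_ratios` helper are all assumptions made up for this example.

```python
from collections import Counter, defaultdict

# Hypothetical sample: one (region, detected language) pair per tweet.
# These values are invented for illustration, not taken from the project data.
tweets = [
    ("Zurich", "de"), ("Zurich", "de"), ("Zurich", "en"), ("Zurich", "de"),
    ("Geneva", "fr"), ("Geneva", "fr"), ("Geneva", "en"),
    ("Ticino", "it"), ("Ticino", "en"),
]

def language_ratios(records):
    """Return {region: {language: share of that region's tweets}}."""
    counts = defaultdict(Counter)
    for region, lang in records:
        counts[region][lang] += 1

    ratios = {}
    for region, langs in counts.items():
        total = sum(langs.values())
        # Dividing by the regional total removes the population effect.
        ratios[region] = {lang: n / total for lang, n in langs.items()}
    return ratios

ratios = language_ratios(tweets)
for region, shares in sorted(ratios.items()):
    print(region, {lang: round(s, 2) for lang, s in sorted(shares.items())})
```

Because each share is normalized by the number of tweets in its own region, a sparsely populated region is shown on the same scale as a large city.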


In this way, we try to answer these questions:

  • How do the ratios of tweet languages change depending on location?
  • How do these ratios change across the regions of Switzerland depending on the region's official language (German, French, Italian, Romansh)? (Uses a dataset on the demographics of Switzerland.)

The German- and French-speaking regions are clearly visible in the visualizations. The Italian-speaking regions, however, are not as clearly separable as the German or French ones, probably because users there prefer not to tweet in their own language. For example, in Ticino Italian accounts for no more than 20% of overall usage.


The most used foreign languages are English, Spanish, Swedish, etc. There are several important problems with these numbers. First, Spanish usage is far higher than Italian usage. Is this real or a problem with the data? In the data provided by Wikipedia, Spanish usage is at most roughly a quarter of Italian usage. Similarly, Swedish is not among the top 5 languages in the Wikipedia data, yet in our data it is used more than all other minor languages. This may be caused by errors in the language detection algorithm.
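The Spanish versus Italian comparison can be checked with a short Python sketch. The speaker counts are copied from the table in this section; the `ratio_to_italian` and `looks_suspicious` helpers and the threshold `factor=2.0` are hypothetical, chosen only for illustration and not part of the project.

```python
# Speaker counts copied from the table above: (mother tongue, main language).
census = {
    "Italian": (470_961, 561_857),
    "Spanish": (76_750, 150_782),
}

def ratio_to_italian(language):
    """Return (mother-tongue ratio, main-language ratio) relative to Italian."""
    mother, main = census[language]
    it_mother, it_main = census["Italian"]
    return mother / it_mother, main / it_main

def looks_suspicious(tweet_ratio, census_ratio, factor=2.0):
    """Flag a language whose tweet ratio exceeds the census ratio by `factor`.

    The factor of 2 is an arbitrary illustrative threshold, not a project setting.
    """
    return tweet_ratio > factor * census_ratio

mother_ratio, main_ratio = ratio_to_italian("Spanish")
print(f"Spanish / Italian, mother tongue: {mother_ratio:.2f}")
print(f"Spanish / Italian, main language: {main_ratio:.2f}")
```

With the table's figures, both ratios stay well below 1, so a tweet dataset in which Spanish outnumbers Italian would be flagged by a check of this kind and would deserve a closer look at the language detection step.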

In this way, we try to answer this question:

  • Is there a significant number of tweets sent in languages that are not official languages of Switzerland? If so, which languages are well represented, and in which locations are these tweets sent?

Timestamp Analysis
Emotions And Peaks