Internet Tube

Description

This schematic map shows a simplification of the world’s network of submarine fibre-optic cables.

Data

The map uses data sourced from cablemap.info. Each node was assigned to a country, and all nodes located in the same country were collapsed into a single node. The resulting network was then abstracted into a schematic layout.
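The collapsing step can be sketched in a few lines. This is a minimal sketch, not the authors' code: it assumes the cable data arrive as a list of links between landing points, each already tagged with a country. The function name `collapse_to_countries` and the landing-point labels are hypothetical, chosen only for illustration.

```python
from collections import Counter


def collapse_to_countries(links, country_of):
    """Collapse a landing-point graph into a country-level graph.

    links: iterable of (landing_point_a, landing_point_b) pairs.
    country_of: dict mapping each landing point to its country.
    Returns a Counter keyed by alphabetically sorted (country, country)
    pairs, counting the point-to-point links that join each pair.
    """
    edges = Counter()
    for a, b in links:
        ca, cb = country_of[a], country_of[b]
        if ca == cb:
            # Both ends merge into the same country node, so the link vanishes.
            continue
        edges[tuple(sorted((ca, cb)))] += 1
    return edges


# Hypothetical landing points and links, for illustration only.
country_of = {
    "UK-1": "United Kingdom", "UK-2": "United Kingdom",
    "US-1": "United States", "US-2": "United States",
    "SN-1": "Senegal", "BR-1": "Brazil",
}
links = [("UK-1", "US-1"), ("UK-2", "US-2"), ("UK-1", "UK-2"), ("SN-1", "BR-1")]
print(collapse_to_countries(links, country_of))
```

Keeping the number of merged links as an edge weight is one design option; the text does not say whether the original analysis weighted edges this way.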

For the sake of simplicity, many short links have been excluded from the visualization. For instance, it doesn’t show the intricate network of cables under the waters of the Gulf of Mexico, the South and East China Seas, the North Sea, and the Mediterranean Sea. The map instead aims to provide a global overview of the network, and a general sense of how information traverses our planet. (The findings reported below, however, are based on an analysis of the full submarine fibre-optic cable network, and not just the simplified representation shown in the illustration.)

The map also includes symbols referring to countries listed as “Enemies of the Internet” in the 2014 report of Reporters Without Borders. The centrality of the nodes within the network has been calculated using the PageRank algorithm. The rank is important as it highlights those geographical places where the network is most influenced by power (e.g., potential data surveillance) and weakness (e.g., potential service disruption).
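PageRank on a country graph can be sketched with power iteration. This is a minimal sketch, not the authors' implementation: the text does not say which implementation, damping factor, or edge weights were used, so it assumes an unweighted, undirected graph and the conventional damping factor of 0.85. The toy edges in the usage example are hypothetical. Libraries such as NetworkX provide an equivalent `networkx.pagerank` function.

```python
def pagerank(edges, damping=0.85, tol=1e-10, max_iter=1000):
    """Power-iteration PageRank on a non-empty undirected graph.

    edges: iterable of (u, v) pairs; each is treated as two directed links.
    Returns a dict mapping node -> score; the scores sum to 1.
    """
    neighbours = {}
    for u, v in edges:
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)
    nodes = sorted(neighbours)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(max_iter):
        # Every node receives the "random jump" share (1 - d) / n ...
        new = {node: (1.0 - damping) / n for node in nodes}
        # ... plus a damped share of each neighbour's rank, split evenly.
        for node in nodes:
            share = damping * rank[node] / len(neighbours[node])
            for nb in neighbours[node]:
                new[nb] += share
        if sum(abs(new[x] - rank[x]) for x in nodes) < tol:
            return new
        rank = new
    return rank


# Hypothetical toy network, not the real cable data.
toy_edges = [("US", "UK"), ("US", "Brazil"), ("US", "Japan"), ("UK", "Senegal")]
scores = pagerank(toy_edges)
for country, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{country:8s} {score:.3f}")
```

Because every node in an undirected edge list has at least one neighbour, there are no dangling nodes and the total score stays at 1 on every iteration.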

Findings

Submarine telecommunications have come a long way since 1842, when Samuel Morse sent the first submarine telegraph transmission under the waters of New York Harbor. Today, an entire network of fibre-optic cables connects almost every corner of the world, enabling the hyper-connected world that many of us take for granted.

The United States is by far the most connected country in the world, with submarine cable landing points on both coasts that connect it to most other continents. On the other side of the Atlantic are the second and third most central parts of the global network: the United Kingdom and Senegal. The UK has been a pioneer in laying submarine cables since the second half of the nineteenth century, and still controlled almost half of the world’s submarine cables in the 1920s.

Senegal is where most of the southern Atlantic cables land, and it will be followed by Nigeria when new cables become operative this year (i.e., the WASACE cable, integrated in the “under construction” section of the “Capes” line in the illustration). Others will soon connect Latin America with Angola and South Africa as well (i.e., the BRICS Cable and SACS cables, again in the “under construction” section of the “Capes” line in the illustration). European countries dominate the positions immediately below in the ranking. The most central East Asian country is China (17th), while India, in South Asia, ranks 29th, twelve positions below.

Looking at the network at a more fine-grained scale, Alexandria (Egypt) is the world’s most central node, immediately followed by Singapore and Fujairah (United Arab Emirates). The city of Fortaleza in Brazil and the town of Bude in Cornwall (United Kingdom) are the most central single points in Latin America and Europe respectively, and Accra (Ghana) dominates the Sub-Saharan African list.

The importance of being central in the submarine fibre-optic cable network is twofold. On the one hand, Internet users in central countries tend to have faster and cheaper connections to the Internet — there are no countries with low-cost Internet access that aren’t also relatively well-connected.

On the other hand, we have also seen that certain central countries in the network have a history of engaging in surveillance of Internet traffic, both domestic and foreign, as revealed by Edward Snowden and reported by the Guardian and the Washington Post. For instance, in its “Enemies of the Internet 2014” report, Reporters Without Borders highlights how several British telecommunication companies “have made their infrastructure available to GCHQ, allowing it to place hundreds of wiretaps in submarine cable landing stations”. From this perspective, we also see the potential dark side of network centrality.

Geographic Knowledge in Freebase

Description

This map shows the global distribution of geo-located entities described in Freebase, a collaborative knowledge base that defines itself as “an open shared database of the world’s knowledge”.

Data

Freebase forms one of the key informational ingredients in Google’s Knowledge Graph. If you’ve ever looked at the side panel in Google’s search results page, which presents information about people, places, and events in response to a search query, then you’ve probably come into contact with data stored in Freebase.

The data that we collected from Freebase describe over 43 million entities, among which we identified 478 thousand place names. The content is stored as RDF triples, each of which expresses a statement in the form subject-predicate-object. We surveyed the triples in the dataset and collected all entities associated with a latitude-longitude coordinate pair; that is, all subjects of triples whose predicates refer to the concepts “has latitude” and “has longitude”.
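The survey of triples amounts to a single pass that keeps subjects carrying both halves of a coordinate pair. This is a minimal sketch under stated assumptions: it assumes tab-separated subject, predicate, and object lines, and the predicate names `has_latitude` and `has_longitude` and the entity IDs are hypothetical placeholders. Real Freebase dumps use their own URIs and may route coordinates through intermediate nodes.

```python
LAT = "has_latitude"   # hypothetical placeholder, not a real Freebase URI
LON = "has_longitude"  # hypothetical placeholder, not a real Freebase URI


def geolocated_entities(lines):
    """Return {subject: (lat, lon)} for subjects that have both coordinates.

    lines: iterable of 'subject<TAB>predicate<TAB>object' strings,
    for example an open text file.
    """
    lat, lon = {}, {}
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        subject, predicate, obj = parts
        if predicate == LAT:
            lat[subject] = float(obj)
        elif predicate == LON:
            lon[subject] = float(obj)
    # Keep only subjects that received both halves of the coordinate pair.
    return {s: (lat[s], lon[s]) for s in lat.keys() & lon.keys()}


sample = [
    "m.oxford\thas_latitude\t51.75\n",
    "m.oxford\thas_longitude\t-1.25\n",
    "m.oxford\tname\tOxford\n",
    "m.nowhere\thas_latitude\t10.0\n",  # no longitude, so it is dropped
]
print(geolocated_entities(sample))
```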

Findings

Geographic content in Freebase is largely clustered in certain regions of the world. The United States accounts for over 45% of the overall number of place names in the collection, despite covering about 2% of the Earth’s surface and less than 7% of its land surface, and being home to less than 5% of the world population and about 10% of Internet users. This results in a US density of one Freebase place name for every 1500 people, and far more place names referring to Massachusetts than referring to China.
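The density figures in this section are simply population divided by the number of place names. A minimal sketch of the US calculation: the total of 478,000 place names and the 45% share come from the text, while the US population of about 316 million is an assumption (an approximate 2013 figure), not a number from the original analysis.

```python
def people_per_place_name(population, place_names):
    """Inhabitants per geotagged place name (a higher value means sparser coverage)."""
    return population / place_names


total_place_names = 478_000               # from the text
us_place_names = 0.45 * total_place_names  # "over 45%" of the collection
us_population = 316_000_000               # assumption: approximate 2013 US population

print(round(people_per_place_name(us_population, us_place_names)))
```

The result, roughly 1,470 people per place name, is consistent with the "one for every 1500 people" quoted above; the same formula applies to the European, Chinese, and Indonesian figures that follow.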

A third of all place names are geo-located in Europe. The United Kingdom is home to about 7% of place names, Poland has about 6%, and France has just over 5%. The United Kingdom accounts for one place name for every 2000 inhabitants, the same proportion as Luxembourg. Ukraine is the only European country described with less than one place name per 30,000 inhabitants, whereas Slovenia and Poland are described in exceptional detail, with about one place name for every 1000 people and one place name for every 1300 inhabitants, respectively.

This stands in contrast to countries like China that account for less than 1% of the collection (with fewer than 4000 place names, and a density of only one place name for every 300,000 inhabitants). Most of Africa, Asia, Latin America and the Caribbean are similarly underrepresented. Nigeria barely represents 0.1% of the place names, and Venezuela accounts for only 0.05%. Outside Europe and North America, only four countries (Australia, China, India, and Japan) are represented with more content than Antarctica (in part because the database contains descriptions of hundreds of Antarctic mountains and ranges).

The largest cluster of under-represented countries is found in Sub-Saharan Africa, where only a handful of countries are described by more than one place name for every 100,000 inhabitants. South Africa is the notable exception, as it exhibits information counts comparable to most European countries. Other exceptions are Nepal and Bhutan in Asia, which score relatively highly compared to neighbouring countries. It is also worth pointing out that Indonesia is the country with the lowest information density in the world, with only one place name per 470,000 people.

Because Freebase is a core ingredient in the informational menu presented to us by the world’s most widely used search engine, these presences and absences have the potential to have a significant impact on how we understand, interact with, and create our world. Freebase may seem like a small corner of the Web, but the imbalances that we observe in it can have large reverberations through the broader information ecosystems accessed by billions of people.

A world’s panorama

Description

This map represents the location of public photographs published on Panoramio, one of the largest photo-sharing services on the Web.

Data

The map uses data collected via the Panoramio Data API in December 2013. We used the API to retrieve the number of public photos tagged to locations in each of 259,200 bounding boxes into which we divided the world. It’s worth noting that because our boxes are sized to be half a degree of latitude tall and half a degree of longitude wide (a quarter of a square degree each), the mapped cells are not of a consistent size globally: a cell in Edinburgh has about half the area of a cell in Nairobi. This means that locations near the equator are more likely to show up as bright concentrations of content than places with the same density of photographs further north or south, although the color scale used should limit this effect.
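The grid geometry can be checked numerically. A minimal sketch, assuming a spherical Earth (mean radius 6,371 km) and square latitude-longitude boxes; the function names are illustrative. It shows that a global grid of 259,200 boxes corresponds to half-degree boxes (720 by 360), whereas quarter-degree boxes would give 1,036,800, and it reproduces the Edinburgh/Nairobi area ratio.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; treats the Earth as a sphere


def grid_cell_count(cell_deg):
    """Number of cells in a global grid of cell_deg x cell_deg degree boxes."""
    return round(360 / cell_deg) * round(180 / cell_deg)


def cell_area_km2(south_lat_deg, cell_deg):
    """Area of the grid cell whose southern edge lies at south_lat_deg.

    On a sphere, a lat/lon box covers R^2 * dlon * (sin(lat_north) - sin(lat_south)).
    """
    south = math.radians(south_lat_deg)
    north = math.radians(south_lat_deg + cell_deg)
    dlon = math.radians(cell_deg)
    return EARTH_RADIUS_KM ** 2 * dlon * (math.sin(north) - math.sin(south))


print(grid_cell_count(0.5))   # 259200
print(grid_cell_count(0.25))  # 1036800

# Half-degree cells containing Edinburgh (about 55.9 N) and Nairobi (about 1.3 S).
ratio = cell_area_km2(55.5, 0.5) / cell_area_km2(-1.5, 0.5)
print(round(ratio, 2))
```

The ratio comes out at roughly 0.56, matching "about half"; the area shrinks with the cosine of latitude, which is why equatorial cells aggregate photos over more ground.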

Findings

Building on our map of content in Flickr, this graphic tells a very similar story. Panoramio is smaller than Flickr, with about a tenth of its users, and only a fraction of its photos. Nonetheless, Panoramio plays an important role in online representations of places, as photographs on the site can be accessed as a layer in Google Maps and Google Earth.

The United States is layered with more than two million public photographs published on Panoramio. It is closely followed by Russia, China, Germany and Brazil, which are each covered with more than a million photos. These five countries account for about one-third of the entire public content on the platform.

However, it is the Netherlands that is covered by the densest layer of content, with over five pictures per square kilometer. The Netherlands are followed by Switzerland, Slovakia, Germany, and Belgium, which all have an average of three pictures per square kilometer.

In contrast, Africa in particular is characterised by very thin layers of digital content (Italy alone is covered by more photos than the whole continent). No African country has more than one picture per five square kilometers; the highest density is Tunisia’s, with 0.2 photos per square kilometer. Algeria is the country with the most photographs in Africa, while Western Sahara has the fewest, representing just 0.016% of the content created about the United States.

Whilst Latin America and the Caribbean tend to score poorly on many other metrics of information production, they are represented by a non-trivial amount of content, with about as many photographs as the United States. In Asia, China accounts for the largest portion of pictures, followed by Turkey (with 800,000), and then Japan and India, each with about half a million pictures. The rest of Asia combined is described by about 1.8 million pictures.

These presences and absences all ultimately influence what we see, and where we see it, when using some of the web’s most popular platforms.

Geographic coverage of Wikivoyage

Description

This graphic depicts the geographic focus of four major language editions of the Wikivoyage project, one of the world’s most popular crowd-sourced travel guides.

Data

This graphic uses data freely available from the Wikimedia Dumps website, collected in October 2013.

To determine the location of each article, we used Wikivoyage’s internal geographic hierarchy. The page on Blackburn, for instance, is nested within Lancashire, which is in turn nested within the United Kingdom. English and German have been included as they are the two largest sub-projects in Wikivoyage according to Wikimedia Statistics. We selected Italian and Spanish because they represent good examples of geographically concentrated and dispersed languages, respectively.

Each ring represents one of the languages, and is sized in relation to the number of articles present in that language. Each section of a ring represents the number of articles in that language about a country. The visualisation excludes countries represented by fewer than three pages.
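Assigning articles to countries through the hierarchy, then applying the three-page threshold, can be sketched as follows. This is a minimal sketch, assuming the hierarchy has already been extracted into a mapping from each page to its parent region (the text does not describe how this was done from the dumps). The function names and the hierarchy fragment are hypothetical.

```python
from collections import Counter


def find_country(page, parent, countries):
    """Follow 'is part of' links upward until a country page is reached."""
    seen = set()
    while page is not None and page not in seen:
        if page in countries:
            return page
        seen.add(page)  # guard against accidental cycles in an edited hierarchy
        page = parent.get(page)
    return None  # page is not nested under any known country


def articles_per_country(pages, parent, countries, min_pages=3):
    """Count articles per country, dropping countries with fewer than min_pages."""
    counts = Counter()
    for page in pages:
        country = find_country(page, parent, countries)
        if country is not None:
            counts[country] += 1
    return {c: n for c, n in counts.items() if n >= min_pages}


# Hypothetical fragment of a hierarchy, for illustration only.
parent = {
    "Blackburn": "Lancashire", "Preston": "Lancashire",
    "Lancashire": "United Kingdom",
    "Manchester": "Greater Manchester", "Greater Manchester": "United Kingdom",
    "Lyon": "Rhone", "Rhone": "France",
}
countries = {"United Kingdom", "France"}
pages = ["Blackburn", "Preston", "Manchester", "Lyon"]
print(articles_per_country(pages, parent, countries))  # {'United Kingdom': 3}
```

France is dropped here because it has only one article, mirroring the rule that countries with fewer than three pages are excluded from the rings.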

Findings

The visualisation shows us that, in all four languages, extensive coverage exists of countries in which those languages are spoken. Wikivoyage — one of the world’s most used travel guides — therefore presents us with a very selective picture of the world.

The United States accounts for a large portion of the content included in the English edition of Wikivoyage, and the comparison with the other languages is striking. The same applies to Germany in German, and Spain in Spanish. English-speaking countries account for about half of the pages written in English, and Spanish-speaking countries account for about half of the pages written in Spanish. However, German-speaking countries account for only about one third of German Wikivoyage, and the Italian edition dedicates an even smaller percentage of pages to Italy (just above 18%).

In other words, despite the fact that Wikivoyage is by its nature a project designed to facilitate writing about distant parts of the world that people might travel to, people aren’t actually writing that much content about places where their language isn’t widely spoken (notable exceptions being content about Egypt in German, and about Greece in Italian, each of which accounts for more than 4% of the respective guide).

Low-income countries are particularly under-represented in the English, German, and Italian projects, with only about one third of articles in those languages dedicated to countries outside Europe, North America, Australia, and New Zealand. The Spanish Wikivoyage, in contrast, devotes almost 40% of its content to the Latin America and Caribbean region (Spanish is widely spoken in that region, and it is possible that a significant number of editors are writing from there). Sub-Saharan Africa, however, is heavily under-represented in the Spanish Wikivoyage, comprising only 0.1% of the collection.

As ever more people use online travel guides, it will be important to understand whether these inequalities in information begin to actively shape where and how people move around the world.

Broadband affordability

Description

This map presents an overview of broadband affordability, as the relationship between average yearly income per capita and the cost of a broadband subscription.

Data

The maps use the “Fixed (wired)-broadband monthly subscription charge, in USD” indicator published by the International Telecommunication Union (ITU) in the 17th edition of the World Telecommunication/ICT Indicators Database. We map 2011 data, being the most recent information available for this indicator in the dataset.

The ITU defines the indicator as follows: “Fixed (wired)-broadband monthly subscription charge refers to the monthly subscription charge for fixed (wired)-broadband Internet service. Fixed (wired) broadband is considered to be any dedicated connection to the Internet at downstream speeds equal to, or greater than, 256 kbit/s. If several offers are available, preference should be given to the 256 kbit/s connection.”

The data refer to the monthly cost of the cheapest entry-level subscription in each country. These values were multiplied by twelve to obtain a yearly cost, and divided by the gross national income per capita (Atlas method, current USD) published by the World Bank for the same year (2011).
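The affordability measure is a single ratio. A minimal sketch of that calculation, using the approximate Australian and Mozambican figures quoted in the findings; the function names are illustrative, and a 52-week year is assumed when converting to weeks of income.

```python
def affordability_ratio(monthly_usd, gni_per_capita_usd):
    """Yearly subscription cost divided by GNI per capita.

    1.0 means a year of connectivity costs a full year's average income.
    """
    return 12 * monthly_usd / gni_per_capita_usd


def weeks_of_income(monthly_usd, gni_per_capita_usd):
    """The same cost expressed as weeks of average income (52-week year)."""
    return 52 * affordability_ratio(monthly_usd, gni_per_capita_usd)


# Approximate 2011 figures quoted in the text (USD).
print(round(weeks_of_income(60, 50_000), 2))   # Australia: under one week of income
print(round(affordability_ratio(60, 500), 2))  # Mozambique with GNI at the 500 USD bound
```

Because Mozambique's income is below 500 USD, its true ratio is higher than the printed bound of 1.44 years of income.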

The graph in the lower-left corner illustrates the evolution of the cost of broadband over time.

Findings

This visualization speaks to one of the core themes of the global digital divide: the relative cost of being connected to the Internet. The geographies of the phenomenon could hardly be clearer, and its consequences are illustrated in many other visualizations published on our website, from the cartogram of the Internet population to the graphic depicting the geographic distribution of the top-level domain names.

We see that the price of a broadband connection in most parts of Africa is out of reach for people on average incomes. In other words, measured as a share of income, broadband costs Africans ten times as much as it costs people in the rest of the world.

A monthly broadband subscription costs about 60 USD in both Australia and Mozambique. However, while the average yearly gross income in Australia is around 50,000 USD, the same figure in Mozambique is less than 500 USD. This means that while an average worker in Australia could pay for a year’s worth of connectivity with one week’s salary, a Mozambican worker would need over one and a half years’ salary.

This does not mean that costs in Africa aren’t dropping. The average cost of an African Internet connection is now half of what it was four years ago, thanks to a series of cables laid around the African continent in 2009. Kenya and Nigeria, for instance, have 2011 broadband costs that are respectively 21% and 8% of what they were in 2008. These changes have undoubtedly contributed to the significant growth in the number of Internet users in both countries. The most striking drop in broadband cost has been observed in Burkina Faso, which has gone from over 1,700 USD a month to a far more reasonable 55 USD (which, however, still represents 100% of the salary of an average worker).

Eritrea is the country where the Internet is least affordable. A yearly subscription there is the equivalent of almost fifty years’ worth of an average salary: an entire working life! Over 18 countries still face Internet subscription costs higher than the average income, including 14 Sub-Saharan African countries, the landlocked countries of Afghanistan and Tajikistan, and the island nations of Kiribati and the Solomon Islands. A broadband Internet connection cost over 500 USD a month in the Central African Republic, Guinea, Malawi, and Swaziland, as well as in Cuba, where, according to the ITU, 1,700 USD was still not enough to buy a subscription.

India and Sri Lanka have the cheapest broadband, with access available for as little as 6 USD a month. Europe and North America have higher absolute costs, ranging between about 10 and 40 USD a month, but some of the lowest relative costs in the world: a couple of hours of work a month is sufficient for an average worker to afford the cost of connectivity.

The data mapped here are some of the world’s most important indicators. Without the ability and means to connect, the opportunities, the information, and the communication mediated and afforded by the Internet all remain impossible.

Geographic intersections of languages in Wikipedia

Description

This graph illustrates the percentage of geo-referenced articles in the twenty Wikipedia editions containing the largest number of geo-referenced articles.

Data

The Terra Incognita project by Tracemedia investigates how Wikipedia has evolved over the last decade, mapping geographic articles, and date of creation, for over 50 languages. The maps highlight geolinguistic biases, unexpected areas of focus, and overlaps between the spatial coverage of different languages.

The project was developed using geo-coded Wikipedia articles from the Wikimedia Toolserver Ghel project (Geohack External Links), and article metrics that were collated using Toolserver scripts. The Ghel data dumps date from July 2013.

Only articles with primary coordinates are used, that is “where the location should be considered the primary object(s) in the page […]. Generally this should be one per article, but may be more with current corner cases with source and outlet of lakes and rivers” (Ghel project).

As illustrated in the featured graphic above (see table, bar chart by the Terra Incognita project), the percentages of geocoded articles in Wikipedia editions vary widely, from a minimum of 2% (Hindi Wikipedia) to a maximum of 46% (Polish Wikipedia), with the exception of the constructed language Volapük, whose Wikipedia edition has 79% geocoded articles. Most large editions in Germanic and Italic languages contain between 12% (Italian Wikipedia) and 20% (English Wikipedia) geocoded articles.
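The primary-coordinates rule and the percentages above combine into a simple per-edition ratio. A minimal sketch, assuming a hypothetical record layout of (article title, latitude, longitude, is_primary); the actual Ghel dump format is not described here, and the sample records are invented. Counting distinct titles means an article with several primary coordinates, such as a lake listing its source and outlet, is counted once.

```python
def geocoded_percentage(coord_records, total_articles):
    """Share of an edition's articles that have at least one primary coordinate.

    coord_records: iterable of (article_title, lat, lon, is_primary) tuples
    (hypothetical schema); secondary coordinates are ignored.
    """
    geocoded = {title for title, _lat, _lon, is_primary in coord_records if is_primary}
    return 100.0 * len(geocoded) / total_articles


# Hypothetical records for a ten-article edition.
records = [
    ("Lake A", 46.0, 8.0, True),
    ("Lake A", 46.1, 8.2, True),      # second primary point: source and outlet
    ("Castle B", 50.0, 14.0, True),
    ("Battle C", 49.0, 16.0, False),  # secondary coordinate only
]
print(geocoded_percentage(records, total_articles=10))  # 20.0
```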

Findings

The primary goal of the illustrations presented in this piece is to visualise how Wikipedia has very divergent geographic coverage in different languages. The tool also allows us to look at the date at which every one of the 4.5 million geocoded articles in Wikipedia was created: thus enabling us to see how the focus of different linguistic communities has evolved.

Most geo-coded Wikipedia articles are located in the countries where the language is listed as an official one.

One of the most interesting patterns that we can see in the data is that, for languages spoken predominantly in a single country (e.g. Czech or Italian), over 70% of geocoded articles exist only in that language. This means, for instance, that there might be articles about thousands of Czech villages written in Czech, but not in English, French, German, or even Japanese.

Furthermore, Terra Incognita studies how two or more languages intersect, that is, where two distinct Wikipedia editions refer to the same location, and what proportion of each collection such shared articles represent. These linking points can be visualized by means of language intersection maps, which highlight locations referred to by more than one language.
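Both the "only in that language" share and the pairwise intersections reduce to set operations. A minimal sketch, assuming each edition's geocoded articles have already been matched to shared location identifiers; how Terra Incognita decides that two articles refer to the same location is not described here, and the edition contents below are hypothetical.

```python
def unique_share(lang, editions):
    """Percentage of lang's locations that no other edition covers."""
    own = editions[lang]
    others = set().union(*(locs for name, locs in editions.items() if name != lang))
    return 100.0 * len(own - others) / len(own)


def pairwise_overlap(a, b, editions):
    """Percentage of edition a's locations that edition b also covers."""
    return 100.0 * len(editions[a] & editions[b]) / len(editions[a])


# Hypothetical location IDs per edition.
editions = {
    "cs": {"village1", "village2", "village3", "Prague"},
    "en": {"Prague", "London"},
    "de": {"Prague", "Berlin", "village3"},
}
print(unique_share("cs", editions))            # 50.0
print(pairwise_overlap("cs", "de", editions))  # 50.0
print(pairwise_overlap("de", "cs", editions))  # about 66.7
```

The overlap is asymmetric: it is measured relative to the first edition's collection, so the same set of shared locations is a larger share of a smaller edition.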

Some of the most interesting linguistic comparisons can be seen when comparing the geography of different languages in multilingual parts of the world, such as Spain. We can see a high density of articles in Galicia, the Basque Country, Catalonia and to a lesser extent Valencia in their respective languages. Spanish (Castilian) is more evenly represented across the whole country.

A similar approach can be taken to explore the distribution of Wikipedia articles in some of the main languages spoken in South Asia. The map below, for instance, includes Bishnupriya Manipuri, Hindi, Nepal Bhasa and Tamil.

Regional variations are not as pronounced as in the Spanish case (Tamil, which is concentrated in South India and Sri Lanka, is a notable exception). The overlap between the languages is consistently between 12% and 16%, with the exception of Bishnupriya Manipuri and Nepal Bhasa, where the majority (65.1%) of articles are shared. These shared articles are distributed across India, while the articles unique to each language are located in the languages’ native regions of Nepal and Bangladesh.

One further case study is presented below, illustrating the interaction between Romanian, Bulgarian and Serbian Wikipedia subprojects.

Romanian and Bulgarian Wikipedia articles are largely concentrated within the political boundaries of their respective countries. There is very limited overlap of geographic content except in major cities.

Bulgaria and Serbia also share a border, and Bulgarian and Serbian are both Slavic languages (in contrast to Romanian, which is a Romance language). There is a much higher percentage of language intersections between Bulgarian and Serbian articles than between Bulgarian and Romanian ones. For instance, a large number of intersected articles appear in Macedonia, which shares a border with both Serbia and Bulgaria.

These maps, and the associated Terra Incognita tool, offer us an insight into not just patterns in Wikipedia, but also the geographic spheres of interest to different linguistic communities. As we work to better understand online geographies of knowledge, these maps allow us to ask important questions about who is representing and being represented by whom.

Credits

The project was created by Gavin Baily and Sarah Bagshaw at TraceMedia, and was supported by funding from the Arts Council of England Grants for Arts and the National Lottery.

Gender and Social Networks

Description

This graphic shows the population of some of the world’s most popular social platforms segmented by the gender of their users.

Data

The graphic uses 2013 social network statistics (not publicly available at the time of writing, as they were being updated) collected via the Google Display Planner tool by Information is Beautiful, who have also published a visualisation (using 2012 data) of the gender balance in social networks.

Each bar represents a social network, divided into two sections: a red one on the left, representing the proportion of female-registered users in the social network, and a blue one on the right, representing the proportion of male-registered users. The width of the bars is used to convey the total number of registered accounts.

Findings

On the whole there isn’t a large disparity between men and women on the social networks represented here. The data indicate a total ratio of 1.05 males to every female.

This more-or-less equal gender balance can be seen in the two largest social platforms, Facebook and YouTube, whose gender ratios are very close to the ratio in the general population (however, both do see slightly more male than female users).

The third largest social platform, Google+, in contrast, has a higher proportion of males, with only 43% of its user base being female. Twitter, on the other hand, has a slightly higher female participation rate, with women accounting for 53% of its users.

It is interesting to note that, despite a slightly higher overall male participation rate, more platforms have a higher ratio of women to men than the reverse. For instance, both Flickr and Tumblr have over 55% female users. A similar proportion can be observed in the Google-owned social network Orkut, which is popular in Brazil and India. Meetup, one of the largest networks to facilitate group meetings, also has a predominantly female population, with only 38% male users. Foursquare and Myspace both have fewer than 40% male users, and the movie discovery website Flixster has more than 70% female users.
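How can the overall ratio lean male while most platforms lean female? The overall figure is weighted by user numbers, so a few very large platforms dominate it, whereas a count of platforms weighs each one equally. A minimal sketch of the two aggregations; the platform labels and figures are hypothetical, invented for illustration, and are not the survey data.

```python
def overall_male_to_female(platforms):
    """User-weighted ratio of male to female accounts across all platforms."""
    males = sum(users * (1 - female_share) for users, female_share in platforms.values())
    females = sum(users * female_share for users, female_share in platforms.values())
    return males / females


def female_majority_count(platforms):
    """Number of platforms where women are more than half of the users."""
    return sum(1 for _users, share in platforms.values() if share > 0.5)


# Hypothetical figures: (millions of users, female share).
platforms = {
    "Large A": (1000, 0.47),
    "Small B": (50, 0.60),
    "Small C": (40, 0.58),
    "Small D": (30, 0.62),
}
print(round(overall_male_to_female(platforms), 2))  # leans male overall
print(female_majority_count(platforms))             # yet most platforms lean female
```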

At the other end of the chart, LinkedIn, the largest professional social network, has only 40% of its accounts owned by female users. The social news and entertainment website Reddit has a similar balance. The population of the social discovery website Tagged is also highly skewed towards male users, as males account for over 60% of the user base. Finally, the social networking and gaming platform Hi5 registers a mere 36% female users.

On the whole (and with exceptions) we see that social, local, artistic, and parental-oriented websites tend to have a higher ratio of female users, and professional and games-oriented sites have a higher ratio of male users.

As the Internet becomes available to billions of new people in the next few years, it will be important to keep a focus on how these statistics evolve. Will we start to see more gendered silos of use or more balanced platforms of participation?