Mapping collaborative software

 

GitHub_Stereo_Map

Description

GitHub is one of the world’s biggest and best-known hosting services for software development projects. The shading of the map illustrates the number of users as a proportion of each country’s Internet population. The circular charts surrounding the two hemispheres depict the total number of GitHub users (left) and commits (right) per country. The uneven geographies of GitHub may shed light on the ways in which different countries are being enrolled into a global knowledge economy.

Data

The data in this map consist of all public events logged by GitHub in 2013. The data are freely available from the GitHub Archive.

We analysed over 65 million commits, made by about 1.1 million users active in 2013 (i.e., users that registered at least one “PushEvent”). Only 26% of users (accounting for over 44% of the commits) specified a location that we were able to match to an actual place. We employed a script based on the Unlock Places service to geolocate the locations in people’s profiles.
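The counting step described above can be sketched as follows. This is a minimal illustration rather than the script used for the map: it assumes each event is a dictionary with a `type`, an `actor` login string, and a `payload` whose `size` field gives the number of commits in a push. The actual GitHub Archive schema may differ and has changed over time, and the sample events are invented.

```python
from collections import Counter

def count_push_activity(events):
    """Return (commits_per_user, total_commits) for PushEvent records.

    Assumed (hypothetical) event shape:
    {"type": "PushEvent", "actor": "login", "payload": {"size": 3}}
    """
    commits_per_user = Counter()
    for event in events:
        if event.get("type") != "PushEvent":
            continue  # only pushes count as activity here
        commits = event.get("payload", {}).get("size", 0)
        # A user with at least one PushEvent is "active", even with 0 commits.
        commits_per_user[event["actor"]] += commits
    return commits_per_user, sum(commits_per_user.values())

# Invented sample events, for illustration only.
sample_events = [
    {"type": "PushEvent", "actor": "alice", "payload": {"size": 2}},
    {"type": "WatchEvent", "actor": "bob", "payload": {}},
    {"type": "PushEvent", "actor": "alice", "payload": {"size": 1}},
    {"type": "PushEvent", "actor": "carol", "payload": {"size": 3}},
]

commits_per_user, total_commits = count_push_activity(sample_events)
print(len(commits_per_user), "active users,", total_commits, "commits")
```

Users who never pushed (here, `bob`) are left out of the active-user count, which matches the definition of activity given above.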

Findings

GitHub has become one of the largest web-based hosting services for software development projects, and is used by 3.5 million users worldwide. Its global distribution is strongly correlated with the number of Internet users in a country.

North America and Europe each account for about one third of the total number of GitHub users. The platform is particularly popular in Northern Europe, where Iceland and Sweden each have more than 50 GitHub users for every 100,000 Internet users in the country, as well as in Eastern Europe. The United States, New Zealand and Australia are the countries where the service is most popular outside Europe (they have about 35 GitHub users for every 100,000 Internet users).

The remaining third of GitHub users are mostly located in Asia (17% of the total). Singapore (27 GitHub users per 100,000 Internet users), and Taiwan (10 GitHub users per 100,000 Internet users) are two of the biggest per capita users. A lot of usage comes from China, but on a per-capita basis the country isn’t a heavy user (fewer than 3 GitHub users for every 100,000 Internet users).

The Middle East and North Africa and Sub-Saharan Africa together represent less than 1% of GitHub users, and just about 1% of commits. Switzerland alone counts almost as many GitHub users as the Middle East and North Africa region, and more than Sub-Saharan Africa.

Not only are North America and Europe home to a majority of users, but those users make more contributions than their counterparts in the rest of the world. Each region is home to over 38% of commits to the platform. The United States, for instance, is home to 31% of users but over 35% of commits. Similarly, the Netherlands is home to 1.7% of the users but 2.4% of the commits, and Switzerland is home to 0.9% of the users but 1.4% of the commits.

We see the opposite dynamic in the rest of the world. India, for instance, accounts for 3.6% of users, but only 1.7% of commits.
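The contrast between user share and commit share can be expressed as a simple ratio. The sketch below uses only the percentages quoted in the text; "over 35%" is taken as 35.0, so the value for the United States is a lower bound. The function name is our own.

```python
# Shares of located users and commits, in percent, as quoted in the text.
# "Over 35%" is taken as 35.0, so the US ratio is a lower bound.
shares = {
    "United States": (31.0, 35.0),
    "Netherlands": (1.7, 2.4),
    "Switzerland": (0.9, 1.4),
    "India": (3.6, 1.7),
}

def commit_to_user_ratio(user_share, commit_share):
    """A ratio above 1 means more commits than the user share alone would predict."""
    return commit_share / user_share

ratios = {
    country: commit_to_user_ratio(users, commits)
    for country, (users, commits) in shares.items()
}

# Print countries from the most to the least commit-heavy.
for country, ratio in sorted(ratios.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{country}: {ratio:.2f}")
```

With these figures the ratio is roughly 1.6 for Switzerland, 1.4 for the Netherlands, and 1.1 for the United States, but below 0.5 for India.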

In sum, the uneven geographies of collaborative software development likely tell us a lot about where our global knowledge economy is being performed. Africa and the Middle East, in particular, have far fewer people accessing open software tools than would be expected given their numbers of Internet users. Not only is a lot of the world not accessing software made available on GitHub, but they also aren’t contributing to it: a sign that this facet of our global knowledge economy remains heavily based in some of the world’s traditional hubs of codified knowledge.

The anonymous Internet

Tor_Hexagons_Map

Description

This cartogram illustrates users of Tor: one of the largest anonymous networks on the Internet.

Data

The data are freely and openly available on the Tor Metrics Portal, which provides information about the number of users per country joining the network every day. The average number of users has been calculated over the one-year period before August 2013, when the Sefnit malware “took the Tor Network by storm” by starting to use Tor for its communications, thus disrupting Tor’s usage statistics.
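The averaging step can be sketched as below. This is an illustration under stated assumptions: the row layout `(day, country_code, users)` is a simplification of our own and not the portal's export format, the window of 1 August 2012 to 31 July 2013 is our reading of "a one-year period, prior to August 2013", and the sample values are invented.

```python
from datetime import date

def mean_daily_users(rows, start, end):
    """Average estimated daily users per country for start <= day < end.

    rows: iterable of (day: date, country_code: str, users: float),
    an assumed layout rather than the portal's actual export format.
    """
    totals, days = {}, {}
    for day, country, users in rows:
        if start <= day < end:
            totals[country] = totals.get(country, 0.0) + users
            days[country] = days.get(country, 0) + 1
    return {country: totals[country] / days[country] for country in totals}

# Invented values for a placeholder country code "xx".
rows = [
    (date(2013, 7, 30), "xx", 100.0),
    (date(2013, 7, 31), "xx", 300.0),
    (date(2013, 8, 20), "xx", 9000.0),  # after the cut-off, so ignored
]

averages = mean_daily_users(rows, date(2012, 8, 1), date(2013, 8, 1))
print(averages)
```

Excluding days on or after 1 August keeps the post-Sefnit spike (the 9000 in the invented sample) out of the average.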

Findings

Tor is an open-source project promoting online anonymity through free software and volunteer collaboration. The Tor network consists of more than five thousand nodes. Tor users can connect to the network and have their Internet data routed through it before reaching any server or webpage, so that the latter cannot distinguish between Tor users or locate them.

Tor is the most popular and well known network of its kind, and it is used world-wide by over 750,000 Internet users every day. This is about the size of a small country; half-way between the Internet populations of Luxembourg and Estonia.

Over half of Tor users are located in Europe, which is also the region with the highest penetration, as the service is used by an average of 80 per 100,000 European Internet users. Italy in particular accounts for over 76,000 users a day, which is about one fifth of the entire European Tor daily user base. Italy is second only to the United States in terms of average number of users, as over 126,000 people access the Internet through Tor every day from the United States. The service is popular throughout the whole European region, with a high penetration in Moldova, as well as in less populous states: about a hundred Internet users connect to Tor every day from each of San Marino, Monaco, Andorra, and Liechtenstein, despite their small Internet populations.

When looking at the number of Tor users as a percentage of the larger Internet population, the Middle East and North Africa has the second highest rate of usage, with an average of over 60 per 100,000 Internet users utilizing the service. Tor is particularly popular in Israel, which accounts for more Tor users than India, while having less than 4% of its Internet users. The service is also very popular in Iran, which accounts for the largest number of Tor users outside Europe and the United States, and counts 50% more users than the United Kingdom, despite having only one third of its Internet population.

The geography of Tor tells us much about potentials for anonymity on the Internet. As ever more governments seek to control and censor online activities, users face a choice to either perform their connected activities in ways that adhere to official policies, or to use anonymity to bring about a freer and more open Internet.

Uneven Geographies of OpenStreetMap

OpenStreetMap_Satellite

Description

This series of maps shows the location of edited content in the world’s largest collaborative mapping project: OpenStreetMap.

Data

The maps use OpenStreetMap data downloaded from GeoFabrik.de on December 12th, 2013. Each sub-region extract has been parsed, and for each node (i.e., the element used in OpenStreetMap to represent any point feature), the coordinates, version, and last update values have been extracted.

The first map was created by counting the number of nodes in each cell of a grid of 0.1 degrees of latitude by 0.1 degrees of longitude. The second map instead focuses on edits by summing the version numbers of all nodes in a cell (as this number is increased by one each time a node is modified), resulting in a count of all edits for the whole history of OpenStreetMap. The third map focuses on the age of content, and so records the latest update made to any node in each cell of the grid.
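The three per-cell measures can be computed in a single pass over the nodes, as in the sketch below. It assumes the nodes have already been parsed into `(lat, lon, version, last_update)` tuples and that timestamps are ISO 8601 strings in UTC with a uniform format, so that string comparison orders them correctly. The function names and sample nodes are our own.

```python
import math
from collections import defaultdict

CELLS_PER_DEGREE = 10  # a 0.1-degree grid

def cell_of(lat, lon):
    """Index of the 0.1 x 0.1 degree cell containing the point."""
    return (math.floor(lat * CELLS_PER_DEGREE), math.floor(lon * CELLS_PER_DEGREE))

def grid_statistics(nodes):
    """Aggregate node count, summed versions, and latest update per cell.

    nodes: iterable of (lat, lon, version, last_update), where last_update
    is assumed to be a uniform ISO 8601 UTC string.
    """
    stats = defaultdict(lambda: {"nodes": 0, "edits": 0, "latest": ""})
    for lat, lon, version, last_update in nodes:
        cell = stats[cell_of(lat, lon)]
        cell["nodes"] += 1                                  # first map
        cell["edits"] += version                            # second map
        cell["latest"] = max(cell["latest"], last_update)   # third map
    return dict(stats)

# Invented sample nodes: two fall in the same cell, one elsewhere.
sample_nodes = [
    (51.752, -1.258, 3, "2013-11-02T09:15:00Z"),
    (51.758, -1.252, 1, "2013-12-01T18:40:00Z"),
    (48.857, 2.352, 7, "2012-05-20T12:00:00Z"),
]

stats = grid_statistics(sample_nodes)
for cell, values in sorted(stats.items()):
    print(cell, values)
```

Summing version numbers works as an edit count because a node starts at version 1 and gains one version per modification.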

Findings

The first map offers a revealing picture of the presence of thick layers of content that annotate a few parts of the world, and a relative absence of content over much of the rest of the planet. The glowing centres of content in parts of North America, Europe, Oceania, and Japan in many ways parallel the visual intensity of lights in NASA’s Earth City Lights series.

The United States accounts for the largest total amount of content, with 21% of all nodes present in OpenStreetMap (OSM), followed by France, Canada, Germany and Russia, each with more than 100 million nodes. These five countries alone hold 58% of the content, and high-income OECD countries together account for about 80% of OSM.

The Netherlands enjoys the highest density of content, with an average of over 1000 nodes per square kilometre, followed by Belgium with over 700 nodes per square kilometre, and Germany, the Czech Republic, Switzerland, and France, with about 400 nodes per square kilometre.

In contrast to the brightness of Europe, the southern hemisphere is barely visible, as the amount of content available on OSM about that part of the world is far lower than in the northern hemisphere, with Africa and Latin America represented by less than 5% of the content. California alone accounts for almost as much content as the entirety of Africa.

Turkey and the western part of the Middle East are visible, but already fading into a less intense color. The emerging powers of Brazil, India, and China appear to be suffering from a widespread content “blackout”, where only the largest urban centres are visible. Brazil accounts for fewer nodes than Switzerland, and China for even fewer. The same applies to most of the remaining parts of Africa, Asia, and Latin America. One of the oldest urbanized areas of the world, an amazing strip of lights that follows the course of the Nile, is barely visible. In fact, Egypt accounts for as many nodes as Iceland, despite being 10 times as big and having 250 times the population.

Interestingly, content in parts of North Korea lights up the map: an unusual situation for a country not renowned for even appearing in most indices of online participation. This is most likely thanks to work done in 2011 by the OSM developer community. We see a similar situation in Newfoundland and Labrador, where large swaths of sparsely populated land are characterised by relatively dense amounts of content. The Canadian case is likely a result of a detailed physical geography dataset that was bulk-uploaded to OpenStreetMap.

Several studies have been conducted on the quality of OSM’s coverage in these areas (e.g., see the paper by Haklay et al, 2010) where high-quality data from government agencies are also available for comparison. However, it has to be noted that these are the same countries where Open Data policies have spread, allowing lots of data to be uploaded to OSM. In fact, the visible distribution of content is not too different from the map of the GeoNames gazetteer project we published some months ago.

The second map below illustrates the number of edits made to OpenStreetMap. Unsurprisingly, the most content-dense areas are also the most heavily edited, because each new node included means one more edit made within the related area. However, statistical analysis suggests that the United States and Germany account for far more edits than would be expected given the related content in OSM, whereas content from Italy and the Netherlands is far less edited than expected. In most parts of the rest of the world the number of edits is simply related to the number of objects in a given area.

EditingTheMap

The third and last map presents an illustration of the most and least recently updated areas in OSM, similar to a map included in Mapbox’s recent 2013 OpenStreetMap Data Report.

It is not surprising that most areas in Europe have seen at least one edit in the week before the data were collected. Similarly, it is evident that the most remote regions of the world have not been updated for years, from Siberia to the Australian Outback, from central Africa to the Amazon basin and northern Canada.

While most of the map shows a random mix of data, due to the volunteer-based nature of the project, there are some evident areas of plain colour, which might indicate bulk uploading of new data and datasets from government agencies or companies. Examples can be found in Iraq, where most of the country was updated between September and November 2013; in Australia, where large areas in South Australia have been recently updated, and the updates clearly follow the state borders with New South Wales and Victoria; and in Estonia, which has also received recent edits for most of its territory.

TheAgingMap

OSM will turn 10 years old in a few months. Combining the findings from these three maps, it is evident that OSM is a very good geographical representation of the most developed countries and their urban environments. OSM also provides a large amount of information about non-urban areas, although these are not as up-to-date and detailed as urban areas.

The quantity and the quality of the data make OSM one of the most powerful and exciting open-source projects that the Internet has facilitated in recent years, along with Linux and Wikipedia. Nonetheless, there is still a lot of work to do, and the development of the project in its second decade will probably depend on it attracting new volunteers among the new Internet users in Africa, Asia, Latin America, and the Middle East. Finally, OSM will be influenced by its relationships with the many companies that currently base their mapping services on it, as well as by the future spread of open data policies.

A global division of microwork

ODesk-A_global_division_of_microwork-final-01

Description 

This graphic illustrates the global division of microwork undertaken on the ODesk platform and reveals some of its locally divergent practices.

Data

Microwork refers to a series of relatively small tasks that are carried out by a distributed workforce over the Internet. Practices of coordinated microwork therefore allow relatively large projects to be carried out quickly by workforces from around the world. ODesk is one of the largest job marketplaces for microworkers. This graphic uses openly available data from ODesk, describing the hourly working practices of microworkers (i.e., the number of active workers for each hour of the week) in each country across the globe.

In the first visualisation, each dot represents the average number of workers active in each country for every hour of the week. For countries that span more than one time zone, we use the local time in the capital city.

The second visualisation uses the same data, but makes two changes. First, dots are aligned according to local time, rather than Coordinated Universal Time (UTC). Second, the size of each dot is normalized by the Internet population in each country. These changes offer a sense of how prevalent online microwork is in each country, and allow working hours in different places to be directly compared.
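The alignment to local time and the normalisation by Internet population can be sketched as two small transformations of an hour-of-week series. This is an illustration only: the function names are our own, the data are invented, hour 0 is taken to be Monday 00:00, and the shift handles whole-hour UTC offsets only (so a half-hour offset such as India's would need a finer time resolution).

```python
HOURS_PER_WEEK = 168

def to_local_hours(utc_counts, utc_offset_hours):
    """Shift a 168-slot hour-of-week series from UTC to local time.

    utc_counts[h] is the number of active workers in UTC hour-of-week h
    (0 = Monday 00:00). Whole-hour offsets only; daylight saving time is ignored.
    """
    local = [0.0] * HOURS_PER_WEEK
    for utc_hour, count in enumerate(utc_counts):
        # The modulo wraps Sunday night back round to Monday morning.
        local[(utc_hour + utc_offset_hours) % HOURS_PER_WEEK] = count
    return local

def percent_of_internet_users(counts, internet_users):
    """Express each hourly count as a percentage of the Internet population."""
    return [100.0 * count / internet_users for count in counts]

# Invented series: 50 workers active in UTC hour 4, nobody otherwise.
utc_series = [0.0] * HOURS_PER_WEEK
utc_series[4] = 50.0

local_series = to_local_hours(utc_series, utc_offset_hours=8)
normalised = percent_of_internet_users(local_series, internet_users=1_000_000)
peak_hour = local_series.index(max(local_series))
print(peak_hour, normalised[peak_hour])
```

With an offset of +8 hours the invented peak moves from 04:00 UTC to noon local time on Monday, and 50 workers in a population of one million Internet users correspond to 0.005%.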

The representations do not account for the use of daylight saving time.

Findings

The first image shows that a large portion of the world’s microwork carried out through ODesk is performed in Asia, particularly in the Philippines, Bangladesh, India, and Pakistan. At noon (local time) on an average Tuesday, there are almost 35,000 active workers on the platform, roughly one third of whom are located in India, about one quarter in the Philippines, and about one tenth in the United States. Russia and Ukraine also each provide over five percent of the total. Despite the fact that ODesk is used in 58 countries that cover almost every time zone, 85% of the digitally mediated workers are located in the seven countries mentioned above. In other words, despite the potential for almost anyone with an Internet connection to become a microworker, we can see that microwork practices have very clustered geographies.

One interesting facet of these data is the significant difference between working patterns in the Philippines and most other countries. In most countries, it is easy to distinguish day from night by the sharp drop-off in work that happens at the end of the working day. However, when looking at the Philippines we only see a relatively minor change in working practices between day and night.

In many countries we also see a stark difference between weekdays and weekends. However, the Philippines again exhibit a relatively consistent temporal pattern with fewer people than elsewhere avoiding work on weekends. By 3am (Philippines time) on an average Sunday morning, the Philippines provide almost half of the active workers in ODesk.

Some of these patterns can be traced to the large US demand for microwork. Filipino microworkers are mostly employed to complete tasks related to data entry, writing, and a variety of personal assistance work (see ODesk Philippines Country Dashboard). We see an increase in the number of active Filipino workers when it is morning in the US (9am Eastern Standard Time, which is 10pm in the Philippines). Bangladesh exhibits a similar pattern to the Philippines. Bangladeshi microworkers are also largely employed for data entry, with the most common type of task performed in the country relating to search engine optimization (see ODesk Bangladesh Country Dashboard). This contrasts with the situation in India, where most microworkers are employed for tasks related to Web programming and design (see ODesk India Country Dashboard). In India, we see the number of active workers decline in the US morning (9am Eastern Standard Time, which is 7.30pm Indian time).

The second image weights the number of active microworkers from each country against that country’s Internet population. This gives us further insights into some of the country-specific differences in microwork practices. For instance, we can see that not only does ODesk have a large and around-the-clock workforce in the Philippines, but that the platform is also relatively popular in that country. On an average Tuesday at noon local time, ODesk employs 0.025% of the entire Filipino Internet population. This is almost ten times the global average. By way of comparison, the platform employs only 0.001% of the US Internet population.

Online microwork also appears to be relatively popular in Armenia and Moldova (in both countries over 0.01% of the Internet population are active on an average Tuesday at lunch time), mostly employing microworkers in the fields of Web programming and design. In South America, Uruguay and Bolivia also demonstrate relatively high rates of microwork activity; Bolivia is particularly interesting because it is the only country that exhibits a visible decline in the number of active workers in the middle of the working day.

These data offer a fascinating insight into new practices of work in our global knowledge economy. The ability to carve up large projects into small digital tasks that can be performed by a globally distributed labour force has meant that global demands for, and supply of, digital tasks can be easily matched. But it remains to be seen whether these new work practices are a useful employment opportunity for many of the two and a half billion connected people in the world, or whether they represent a new type of digital sweatshop in which the world’s poor are enrolled, as expendable and unorganized workers, into exploitative digital divisions of labour.

ODesk-Local_practices_of_microwork-final-011

Internet Tube

InternetTube_v2-01_Map

Description

This schematic map shows a simplification of the world’s network of submarine fibre-optic cables.

Data

The map uses data sourced from cablemap.info. Each node has been assigned to a country, and all nodes located in the same country have been collapsed into a single node. The resulting network has then been abstracted.

For the sake of simplicity, many short links have been excluded from the visualization. For instance, it doesn’t show the intricate network of cables under the waters of the Gulf of Mexico, the South and East China Sea, the North Sea, and the Mediterranean Sea. The map instead aims to provide a global overview of the network, and a general sense of how information traverses our planet. (The findings reported below, however, are based on two analyses of the full submarine fibre-optic cable network, and not just the simplified representation shown in the illustration.)

The map also includes symbols referring to countries listed as “Enemies of the Internet” in the 2014 report of Reporters Without Borders. The centrality of the nodes within the network has been calculated using the PageRank algorithm. The rank is important as it highlights those geographical places where the network is most influenced by power (e.g., potential data surveillance) and weakness (e.g., potential service disruption).
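The two steps described in this section, collapsing landing points into country nodes and ranking the nodes with PageRank, can be sketched as follows. This is a plain power-iteration version on an unweighted, undirected network with a damping factor of 0.85; the text does not say which variant or parameters were used, so those choices are assumptions. The landing points and countries are placeholders, not real cable data.

```python
def collapse_to_countries(cable_links, country_of):
    """Map landing-point links to undirected country links, dropping domestic ones."""
    links = set()
    for a, b in cable_links:
        country_a, country_b = country_of[a], country_of[b]
        if country_a != country_b:
            links.add(frozenset((country_a, country_b)))
    return links

def pagerank(links, damping=0.85, iterations=100):
    """Power-iteration PageRank on an undirected, unweighted network."""
    neighbours = {}
    for link in links:
        a, b = tuple(link)
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    n = len(neighbours)
    rank = {node: 1.0 / n for node in neighbours}
    for _ in range(iterations):
        new_rank = {}
        for node in neighbours:
            # Each neighbour passes on an equal share of its own rank.
            incoming = sum(rank[m] / len(neighbours[m]) for m in neighbours[node])
            new_rank[node] = (1 - damping) / n + damping * incoming
        rank = new_rank
    return rank

# Placeholder landing points and countries (hypothetical, not real cables).
country_of = {"a1": "A", "a2": "A", "b1": "B", "c1": "C", "d1": "D"}
cable_links = [("a1", "b1"), ("a2", "c1"), ("a1", "d1"), ("a1", "a2")]

country_links = collapse_to_countries(cable_links, country_of)
rank = pagerank(country_links)
most_central = max(rank, key=rank.get)
print(most_central, round(rank[most_central], 3))
```

In this toy network every international link passes through country A, so A receives the highest rank; the domestic link between `a1` and `a2` disappears when the nodes are collapsed.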

Findings

Submarine telecommunications have come a long way since 1842, when Samuel Morse sent the first submarine telegraph transmission under the waters of New York Harbor. Today, an entire network of fibre-optic cables connects almost every corner of the world, enabling the hyper-connected world that many of us take for granted.

The United States is by far the most connected country in the world, with submarine cable landing points on both coasts that connect it to most other continents. On the other side of the Atlantic are the second and third most central parts of the global network: the United Kingdom and Senegal. The UK has been a pioneer in laying submarine cables since the second half of the nineteenth century, and still controlled almost half of the world’s submarine cables in the 1920s.

Senegal is where most of the southern Atlantic cables land, and it will be followed by Nigeria when new cables become operative this year (i.e., the WASACE cable, integrated in the “under construction” section of the “Capes” line in the illustration). Others will soon connect Latin America with Angola and South Africa as well (i.e., the BRICS Cable and SACS cables, again in the “under construction” section of the “Capes” line in the illustration). European countries dominate the positions immediately below in the ranking. The two most central Asian countries are China (17th) and India (29th), twelve positions below.

Looking at the network at a more fine-grained scale, Alexandria (Egypt) is the world’s most central node, immediately followed by Singapore and Fujairah (United Arab Emirates). The city of Fortaleza in Brazil and the town of Bude in Cornwall (United Kingdom) are the most central single points in Latin America and Europe respectively, and Accra (Ghana) dominates the Sub-Saharan African list.

The importance of being central in the submarine fibre-optic cable network is twofold. On the one hand, Internet users in central countries tend to have faster and cheaper connections to the Internet — there are no countries with low-cost Internet access that aren’t also relatively well-connected.

But we’ve also seen how certain central countries in the network have a history of engaging in surveillance of Internet traffic: as revealed by Edward Snowden and described by the Guardian and the Washington Post, for both internal and foreign surveillance. For instance, in its “Enemies of the Internet 2014”, Reporters Without Borders highlights how several British telecommunication companies “have made their infrastructure available to GCHQ, allowing it to place hundreds of wiretaps in submarine cable landing stations”. From this perspective, we also see the potential dark side of network centrality.

Geographic Knowledge in Freebase

Freebase-final-01_Map

Description

This map shows the global distribution of geo-located entities described in Freebase, a collaborative knowledge base that defines itself as “an open shared database of the world’s knowledge”.

Data

Freebase forms one of the key informational ingredients in Google’s Knowledge Graph. If you’ve ever looked at the side panel in Google’s search results page, which presents information about people, places, and events in response to a search query, then you’ve probably come into contact with data stored in Freebase.

The data that we collected from Freebase describe over 43 million entities, among which we identified 478 thousand place names. The content is stored as RDF triples, each of which is a statement in the form subject-verb-object. We surveyed the triples in the dataset and collected all entities associated with a latitude-longitude coordinate pair; that is, all subjects of triples where the verb refers to the concepts “has latitude” and “has longitude”.
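The selection of geo-located entities can be sketched as a filter over subject-verb-object triples. The verb names `has_latitude` and `has_longitude` below are hypothetical stand-ins for whatever identifiers the dataset uses for those concepts, and the sample triples are invented.

```python
def geolocated_entities(triples,
                        lat_predicate="has_latitude",
                        lon_predicate="has_longitude"):
    """Return {subject: (lat, lon)} for subjects that have both coordinates.

    triples: iterable of (subject, predicate, object) tuples. The default
    predicate names are hypothetical placeholders.
    """
    latitudes, longitudes = {}, {}
    for subject, predicate, obj in triples:
        if predicate == lat_predicate:
            latitudes[subject] = float(obj)
        elif predicate == lon_predicate:
            longitudes[subject] = float(obj)
    # Keep only the subjects for which a complete coordinate pair exists.
    complete = latitudes.keys() & longitudes.keys()
    return {s: (latitudes[s], longitudes[s]) for s in complete}

# Invented triples: entity/1 has a full pair, entity/2 only a latitude.
sample_triples = [
    ("entity/1", "has_name", "Somewhere"),
    ("entity/1", "has_latitude", "51.75"),
    ("entity/1", "has_longitude", "-1.26"),
    ("entity/2", "has_latitude", "10.0"),
]

places = geolocated_entities(sample_triples)
print(places)
```

Requiring both coordinates means that entities with only a latitude or only a longitude, such as `entity/2` here, are not counted as place names.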

Findings

Geographic content in Freebase is largely clustered in certain regions of the world. The United States accounts for over 45% of the overall number of place names in the collection, despite covering about 2% of the Earth’s surface and less than 7% of its land, and being home to less than 5% of the world’s population and about 10% of its Internet users. This results in a US density of one Freebase place name for every 1500 people, and far more place names referring to Massachusetts than referring to China.

A third of all place names are geo-located in Europe. The United Kingdom is home to about 7% of place names, Poland has about 6%, and France has just over 5%. The United Kingdom accounts for one place name for every 2000 inhabitants, the same proportion as Luxembourg. Ukraine is the only European country described with fewer than one place name per 30,000 inhabitants, whereas Slovenia and Poland are described in exceptional detail, with about one place name for every 1000 and every 1300 inhabitants, respectively.

This stands in contrast to countries like China, which accounts for less than 1% of the collection (with fewer than 4000 place names, and a density of only one place name for every 300,000 inhabitants). Most of Africa, Asia, Latin America and the Caribbean are similarly underrepresented. Nigeria barely represents 0.1% of the place names, and Venezuela accounts for only 0.05%. Outside Europe and North America, only four countries (Australia, China, India, and Japan) are represented with more content than Antarctica (in part because the database contains descriptions of hundreds of Antarctic mountains and ranges).

The largest cluster of under-represented countries is found in Sub-Saharan Africa, where only a handful of countries are described by more than one place name for every 100,000 inhabitants. South Africa is the notable exception, as it exhibits information counts comparable to most European countries. Other exceptions are Nepal and Bhutan in Asia, which score relatively highly compared to neighbouring countries. It is also worth pointing out that Indonesia is the country with the lowest information density in the world, with only one place name per 470,000 people.

Because Freebase is a core ingredient in the informational menu presented to us by the world’s most widely used search engine, these presences and absences have the potential to have a significant impact on how we understand, interact with, and create our world. Freebase may seem like a small corner of the Web, but the imbalances that we observe in it can have large reverberations through the broader information ecosystems accessed by billions of people.

A world’s panorama

Density_Photographs_Panoramio-01

Description

This map represents the location of public photographs published on Panoramio, one of the largest photo-sharing services on the Web.

Data

The map uses data collected via the Panoramio Data API in December 2013. We used the API to retrieve the number of public photos tagged to locations in each of 259,200 bounding boxes into which we divided the world. It’s worth noting that because our boxes are sized to be a quarter of a degree of latitude tall and a quarter of a degree of longitude wide, the mapped cells are not of a consistent size globally. A cell in Edinburgh has about half the area of a cell in Nairobi. This means that locations near the equator are more likely to show up as bright concentrations of content, compared to locations with equivalent numbers of photographs at higher latitudes, although the color scale used should limit this effect.
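The difference in cell size can be checked with the standard formula for the area of a latitude-longitude cell on a sphere, R² · Δλ · (sin φ₂ − sin φ₁). The sketch below assumes a spherical Earth with a mean radius of 6371 km and uses approximate latitudes for the two cities (about 55.95°N for Edinburgh and 1.29°S for Nairobi); the function names are our own.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean radius, spherical approximation

def cell_south_edge(lat, cell_size_deg=0.25):
    """Latitude of the southern edge of the grid cell containing lat."""
    return math.floor(lat / cell_size_deg) * cell_size_deg

def cell_area_km2(lat_south, cell_size_deg=0.25):
    """Area of a square-degree cell whose southern edge is at lat_south."""
    phi1 = math.radians(lat_south)
    phi2 = math.radians(lat_south + cell_size_deg)
    dlon = math.radians(cell_size_deg)
    return EARTH_RADIUS_KM ** 2 * dlon * (math.sin(phi2) - math.sin(phi1))

edinburgh = cell_area_km2(cell_south_edge(55.95))  # approximate latitude
nairobi = cell_area_km2(cell_south_edge(-1.29))    # approximate latitude
ratio = edinburgh / nairobi
print(f"Edinburgh cell: {edinburgh:.0f} km2, Nairobi cell: {nairobi:.0f} km2")
print(f"ratio: {ratio:.2f}")
```

The ratio comes out a little above one half, in line with the statement above, because the east-west extent of a cell shrinks with the cosine of the latitude.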

Findings

Building on our map of content in Flickr, this graphic tells a very similar story. Panoramio is smaller than Flickr, with about a tenth of its users, and only a fraction of its photos. Nonetheless, Panoramio plays an important role in online representations of places, as photographs on the site can be accessed as a layer in Google Maps and Google Earth.

The United States is layered with more than two million public photographs published on Panoramio. It is closely followed by Russia, China, Germany and Brazil, which are each covered with more than a million photos. These five countries account for about one-third of the entire public content on the platform.

However, it is the Netherlands that is covered by the densest layer of content, with over five pictures per square kilometer. The Netherlands is followed by Switzerland, Slovakia, Germany, and Belgium, which all have an average of three pictures per square kilometer.

In contrast, Africa in particular is characterised by very thin layers of digital content (Italy alone is covered by more photos than the whole continent). No African country has more than one picture per five square kilometers; the highest being Tunisia with 0.2 photos per square kilometer. Algeria is the country with the most photographs in Africa, but tiny Western Sahara has the fewest, representing just 0.016% of the content created about the United States.

Whilst Latin America and the Caribbean tend to score poorly on many other metrics of information production, they are represented by a non-trivial amount of content, with about as many photographs as the United States. In Asia, China accounts for the largest portion of pictures, followed by Turkey (with 800,000), and then Japan and India, each with about half a million pictures. The rest of Asia combined is described by about 1.8 million pictures.

These presences and absences all ultimately influence what we see, and where we see it, when using some of the web’s most popular platforms.