Mapping collaborative software

 

GitHub_Stereo_Map

Description

GitHub is one of the world’s biggest and best-known hosting services for software development projects. The shading of the map illustrates the number of users as a proportion of each country’s Internet population. The circular charts surrounding the two hemispheres depict the total number of GitHub users (left) and commits (right) per country. The uneven geographies of GitHub may shed light on the ways in which different countries are being enrolled into a global knowledge economy.

Data

The data in this map consist of all public events logged by GitHub in 2013, and are freely available from the GitHub Archive.

We analysed over 65 million commits made by about 1.1 million users active in 2013 (i.e., users who registered at least one “PushEvent”). Only 26% of these users (accounting for over 44% of the commits) specified a location that we were able to match to an actual place. We used a script based on the Unlock Places service to geolocate the locations given in users’ profiles.
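As a rough illustration of the counting step, the sketch below tallies commits per user from PushEvent records and then computes the share of active users, and of their commits, that carry a matched location. It assumes a simplified, hypothetical record layout (`type`, an `actor` login string, and `payload["size"]` as the number of commits in a push) and a hypothetical `geocoded` lookup standing in for the Unlock Places step; the real 2013 GitHub Archive schema may differ, and this is not the script used for the map.

```python
from collections import Counter

def commits_per_user(events):
    """Count commits per user, keeping only PushEvent records.

    Assumes each event is a dict with a 'type', an 'actor' login string,
    and a 'payload' whose 'size' is the number of commits in the push.
    """
    counts = Counter()
    for event in events:
        if event.get("type") == "PushEvent":
            counts[event["actor"]] += event["payload"].get("size", 0)
    return counts

def located_shares(counts, geocoded):
    """Return (share of users, share of commits) whose location was matched.

    `geocoded` is a hypothetical mapping from login to a matched place.
    """
    located = [user for user in counts if user in geocoded]
    user_share = len(located) / len(counts)
    commit_share = sum(counts[u] for u in located) / sum(counts.values())
    return user_share, commit_share

# Toy records (hypothetical, for illustration only).
events = [
    {"type": "PushEvent", "actor": "alice", "payload": {"size": 3}},
    {"type": "PushEvent", "actor": "bob", "payload": {"size": 1}},
    {"type": "WatchEvent", "actor": "carol", "payload": {}},
    {"type": "PushEvent", "actor": "alice", "payload": {"size": 2}},
]
counts = commits_per_user(events)
print(counts)  # Counter({'alice': 5, 'bob': 1})
print(located_shares(counts, {"alice": "Oxford, UK"}))
```

As in the figures above, the share of commits with a location can exceed the share of users with one when located users are more active.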

Findings

GitHub has become one of the largest web-based hosting services for software development projects, and is used by 3.5 million users worldwide. Its global distribution is strongly correlated with the number of Internet users in each country.

North America and Europe each account for about one third of all GitHub users. The platform is particularly popular in Northern Europe, where Iceland and Sweden each have more than 50 GitHub users for every 100,000 Internet users, and in Eastern Europe. Outside Europe, the service is most popular in the United States, New Zealand, and Australia, each with about 35 GitHub users for every 100,000 Internet users.

Most of the remaining third of GitHub users are located in Asia (17% of the total). Singapore (27 GitHub users per 100,000 Internet users) and Taiwan (10 per 100,000) are two of the heaviest users relative to their Internet populations. A lot of usage comes from China, but relative to its Internet population the country is not a heavy user (fewer than 3 GitHub users for every 100,000 Internet users).

The Middle East and North Africa and Sub-Saharan Africa together represent less than 1% of GitHub users, and only about 1% of commits. Switzerland alone has almost as many GitHub users as the whole Middle East and North Africa region, and more than Sub-Saharan Africa.

Not only are North America and Europe home to a majority of users, but those users also make more contributions than their counterparts in the rest of the world: each region accounts for over 38% of commits to the platform. The United States, for instance, is home to 31% of users but over 35% of commits. Similarly, the Netherlands is home to 1.7% of users but 2.4% of commits, and Switzerland to 0.9% of users but 1.4% of commits.

We see the opposite dynamic in the rest of the world. India, for instance, accounts for 3.6% of users, but only 1.7% of commits.
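One way to make this comparison explicit is to divide each country’s share of commits by its share of users: values above 1 mean that users there commit more than the platform average, values below 1 mean they commit less. The “contribution index” below is a hypothetical name for this ratio, not a metric used for the map; the shares are the figures quoted above, with “over 35%” taken as 35.

```python
# Shares (percent) quoted in the text: (share of users, share of commits).
# "Over 35%" for the United States is taken as 35.
shares = {
    "United States": (31.0, 35.0),
    "Netherlands": (1.7, 2.4),
    "Switzerland": (0.9, 1.4),
    "India": (3.6, 1.7),
}

def contribution_index(user_share, commit_share):
    """Commit share divided by user share (hypothetical illustrative metric).

    Above 1: users commit more than the platform average; below 1: less.
    """
    return commit_share / user_share

for country, (users, commits) in shares.items():
    print(f"{country}: {contribution_index(users, commits):.2f}")
```

With these figures the index is roughly 1.13 for the United States, 1.41 for the Netherlands, 1.56 for Switzerland, and 0.47 for India.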

In sum, the uneven geographies of collaborative software development likely tell us a lot about where our global knowledge economy is being performed. Africa and the Middle East, in particular, have far fewer people using open software tools than would be expected given their numbers of Internet users. Much of the world is not only missing out on the software made available on GitHub, but is also not contributing to it: a sign that this facet of our global knowledge economy remains heavily based in some of the world’s traditional hubs of codified knowledge.

The anonymous Internet

Tor_Hexagons_Map

Description

This cartogram illustrates users of Tor: one of the largest anonymous networks on the Internet.

Data

The data are freely and openly available on the Tor Metrics Portal, which provides daily estimates of the number of users per country connecting to the network. The average number of users was calculated over the one-year period before August 2013, when the Sefnit malware “took the Tor Network by storm” by starting to use Tor for its communications, thus disrupting Tor’s usage statistics.
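The averaging step amounts to taking the mean of the daily estimates inside a fixed window that stops before the malware-driven spike. The sketch below assumes the window runs from 1 August 2012 up to, but not including, 1 August 2013, and uses a small hypothetical dictionary of daily values rather than the actual Tor Metrics files, whose format is not reproduced here.

```python
from datetime import date
from statistics import mean

def average_daily_users(daily, start, end):
    """Mean of daily user estimates for dates in the half-open window [start, end).

    `daily` maps a datetime.date to an estimated number of users.
    """
    values = [users for day, users in daily.items() if start <= day < end]
    return mean(values)

# Hypothetical daily estimates for one country; the late-August value
# stands in for the malware-inflated figures that the window excludes.
daily = {
    date(2012, 8, 1): 700_000,
    date(2013, 2, 1): 740_000,
    date(2013, 7, 31): 780_000,
    date(2013, 8, 20): 2_500_000,
}
print(average_daily_users(daily, date(2012, 8, 1), date(2013, 8, 1)))
```

Because the window is half-open, the August 2013 value is excluded and the mean is taken over the three earlier days only.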

Findings

Tor is an open-source project promoting online anonymity through free software and volunteer collaboration. The Tor network consists of more than five thousand nodes. Tor users connect to the network and have their Internet traffic routed through it before it reaches any server or webpage, so those servers and webpages can neither tell Tor users apart nor locate them.

Tor is the most popular and best-known network of its kind, and it is used worldwide by over 750,000 Internet users every day. This is about the size of a small country’s Internet population: half-way between those of Luxembourg and Estonia.

Over half of Tor users are located in Europe, which is also the region with the highest penetration: the service is used by an average of 80 per 100,000 European Internet users. Italy in particular accounts for over 76,000 users a day, about one fifth of Europe’s entire daily Tor user base. Italy is second only to the United States in average number of users; over 126,000 people access the Internet through Tor every day from the United States. The service is popular throughout Europe, with high penetration in Moldova as well as in less populous states: about a hundred Internet users connect to Tor every day from each of San Marino, Monaco, Andorra, and Liechtenstein, despite their small Internet populations.

When the number of Tor users is taken as a proportion of the Internet population, the Middle East and North Africa has the second highest rate of usage, with an average of over 60 per 100,000 Internet users using the service. Tor is particularly popular in Israel, which has more Tor users than India despite having less than 4% as many Internet users. The service is also very popular in Iran, which has the largest number of Tor users outside Europe and the United States, and 50% more users than the United Kingdom despite having only one third of the UK’s Internet population.
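The comparisons above combine two ratios: if country A has k times as many Tor users as country B but only p times its Internet population, then its usage rate per Internet user is k / p times higher. The sketch below applies this to the quoted figures; the function name is illustrative only. For Israel, “more Tor users than India” and “less than 4% of the Internet users” are treated as the bounds 1 and 0.04, so the result is a lower bound.

```python
def relative_penetration(user_ratio, internet_pop_ratio):
    """How many times higher country A's Tor usage rate is than country B's.

    user_ratio: Tor users in A divided by Tor users in B.
    internet_pop_ratio: Internet users in A divided by Internet users in B.
    """
    return user_ratio / internet_pop_ratio

# Iran vs the United Kingdom: 50% more Tor users, one third of the Internet population.
print(relative_penetration(1.5, 1 / 3))   # about 4.5
# Israel vs India: at least as many Tor users, under 4% of the Internet population.
print(relative_penetration(1.0, 0.04))    # a lower bound of about 25
```

So Iran’s usage rate is roughly four and a half times the UK’s, and Israel’s is at least about 25 times India’s.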

The geography of Tor tells us much about the potential for anonymity on the Internet. As ever more governments seek to control and censor online activities, users face a choice: either carry out their online activities in ways that adhere to official policies, or use anonymity to bring about a freer and more open Internet.

Uneven Geographies of OpenStreetMap

OpenStreetMap_Satellite

Description

This series of maps shows the location of edited content in the world’s largest collaborative mapping project: OpenStreetMap.

Data

The maps use OpenStreetMap data downloaded from GeoFabrik.de on December 12th, 2013. We parsed each sub-region extract and, for each node (i.e., the element used in OpenStreetMap to represent any point feature), extracted its coordinates, version, and last-update value.

The first map was created by counting the number of nodes in each cell of a grid measuring 0.1 degrees of latitude by 0.1 degrees of longitude. The second map instead focuses on edits: it sums the version numbers of all nodes in a cell (a node’s version increases by one each time it is modified), giving a count of all edits over the whole history of OpenStreetMap. The third map focuses on the age of content, recording the most recent update made to any node in each cell of the grid.
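The three per-cell measures described above can be sketched in a single pass over the nodes. This is a minimal sketch assuming each node has already been parsed into a hypothetical `(lat, lon, version, last_update)` tuple; parsing the GeoFabrik extracts themselves is not shown, and it is not the authors’ code.

```python
import math
from collections import defaultdict
from datetime import date

CELL = 0.1  # grid resolution in degrees of latitude and longitude

def cell_of(lat, lon, size=CELL):
    """(row, column) index of the grid cell containing a point.

    Points lying exactly on a cell boundary may fall either side
    because of floating-point rounding.
    """
    return (math.floor(lat / size), math.floor(lon / size))

def aggregate(nodes, size=CELL):
    """Per-cell node count, summed versions (edits), and latest update.

    Each node is a tuple (lat, lon, version, last_update).
    """
    counts = defaultdict(int)
    edits = defaultdict(int)
    latest = {}
    for lat, lon, version, updated in nodes:
        key = cell_of(lat, lon, size)
        counts[key] += 1               # first map: amount of content
        edits[key] += version          # second map: version grows by one per modification
        if key not in latest or updated > latest[key]:
            latest[key] = updated      # third map: most recent update in the cell
    return counts, edits, latest

# Hypothetical nodes: two in the same cell near Oxford, one in Paris.
nodes = [
    (51.752, -1.258, 3, date(2013, 11, 2)),
    (51.758, -1.251, 1, date(2012, 5, 9)),
    (48.857, 2.352, 5, date(2013, 12, 1)),
]
counts, edits, latest = aggregate(nodes)
print(dict(counts), dict(edits), latest)
```

Summing versions counts a node’s creation as its first edit, which is why content-dense cells are necessarily also edit-dense.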

Findings

The first map offers a revealing picture of thick layers of content annotating a few parts of the world, and a relative absence of content over much of the rest of the planet. The glowing centres of content in parts of North America, Europe, Oceania, and Japan in many ways parallel the visual intensity of lights in NASA’s Earth City Lights series.

The United States accounts for the largest total amount of content, with 21% of all nodes in OpenStreetMap (OSM), followed by France, Canada, Germany, and Russia, all with more than 100 million nodes. These five countries alone account for 58% of the content, and high-income OECD countries for about 80% of OSM.

The Netherlands has the highest density of content, with an average of over 1,000 nodes per square kilometre, followed by Belgium, with over 700 nodes per square kilometre, and Germany, the Czech Republic, Switzerland, and France, with about 400 nodes per square kilometre.

In contrast to the brightness of Europe, the southern hemisphere is barely visible, as far less content is available on OSM about that part of the world than about the northern hemisphere, with Africa and Latin America represented by less than 5% of the content. California alone accounts for almost as much content as the entirety of Africa.

Turkey and the western part of the Middle East are visible, but already fading into a less intense color. The emerging powers of Brazil, India, and China appear to suffer from a widespread content “blackout”, in which only the largest urban centres are visible. Brazil accounts for fewer nodes than Switzerland, and China for even fewer. The same applies to most of the remaining parts of Africa, Asia, and Latin America. One of the oldest urbanized areas of the world, the remarkable strip of lights that follows the course of the Nile, is barely visible. In fact, Egypt accounts for as many nodes as Iceland, despite being ten times as large and having 250 times the population.

Interestingly, content in parts of North Korea lights up the map: an unusual situation for a country that rarely even appears in indices of online participation. This is most likely thanks to work done in 2011 by the OSM developer community. We see a similar situation in Newfoundland and Labrador, where large swaths of sparsely populated land are covered by relatively dense content. The Canadian case is likely the result of a detailed physical geography dataset that was bulk-uploaded to OpenStreetMap.

Several studies have assessed the quality of OSM’s coverage in the most content-rich areas (e.g., see the paper by Haklay et al., 2010), where high-quality data from government agencies are also available for comparison. However, it should be noted that these are also the countries where Open Data policies have spread, allowing large amounts of data to be uploaded to OSM. In fact, the visible distribution of content is not too different from that in the map of the GeoNames gazetteer project we published some months ago.

The second map below illustrates the number of edits made to OpenStreetMap. Unsurprisingly, the most content-dense areas are also the most heavily edited, because each new node added counts as one more edit within the related area. However, statistical analysis suggests that the United States and Germany account for far more edits than would be expected given the amount of related content in OSM, whereas content in Italy and the Netherlands is far less edited than expected. In most of the rest of the world, the number of edits simply tracks the number of objects in a given area.
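The text does not specify which statistical analysis was used. One common way to ask whether a region has “more edits than expected given its content” is to fit log(edits) against log(nodes) by ordinary least squares and inspect the residuals; that swapped-in approach is sketched below with hypothetical region totals, and is not necessarily the method behind the findings above.

```python
import math

def fit_loglog(nodes, edits):
    """Ordinary least squares fit of log(edits) = a + b * log(nodes)."""
    xs = [math.log(n) for n in nodes]
    ys = [math.log(e) for e in edits]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    return a, b

def residuals(nodes, edits):
    """Log residuals: positive means more edits than the fit predicts."""
    a, b = fit_loglog(nodes, edits)
    return [math.log(e) - (a + b * math.log(n)) for n, e in zip(nodes, edits)]

# Hypothetical region totals (millions of nodes, millions of edits).
regions = ["A", "B", "C", "D"]
region_nodes = [100, 50, 20, 10]
region_edits = [260, 60, 30, 12]
for name, r in zip(regions, residuals(region_nodes, region_edits)):
    print(f"{name}: {r:+.2f}")
```

Working on a log scale lets regions of very different sizes be compared on the same footing; a positive residual flags a region that is edited more heavily than its content alone would suggest.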

EditingTheMap

The third and last map shows the most and least recently updated areas in OSM, similar to a map included in Mapbox’s recent 2013 OpenStreetMap Data Report.

It is not surprising that most areas in Europe saw at least one edit in the week before the data were collected. Similarly, it is evident that the most remote regions of the world have not been updated for years, from Siberia to the Australian Outback, and from central Africa to the Amazon basin and northern Canada.

While most of the map shows a random mix of update dates, owing to the volunteer-based nature of the project, there are some evident areas of plain colour, which might indicate bulk uploads of new data and datasets from government agencies or companies. Examples can be found in Iraq, where most of the country was updated between September and November 2013; in Australia, where large areas of South Australia have recently been updated and the updates clearly follow the state’s borders with New South Wales and Victoria; and in Estonia, which has also received recent edits across most of its territory.

TheAgingMap

OSM will turn 10 years old in a few months, and taken together, the findings from these three maps show that it is a very good geographical representation of the most developed countries and their urban environments. OSM also provides a large amount of information about rural areas, although this is not as up-to-date and detailed as the information about urban areas.

The quantity and quality of its data make OSM one of the most powerful and exciting open-source projects that the Internet has facilitated in recent years, along with Linux and Wikipedia. Nonetheless, there is still a lot of work to do, and the development of the project in its second decade will probably depend on its attracting new volunteers among the new Internet users in Africa, Asia, Latin America, and the Middle East. Finally, OSM will be influenced by its relationships with the many companies that currently base their mapping services on it, as well as by the future spread of open data policies.

A global division of microwork

ODesk-A_global_division_of_microwork-final-01

Description 

This graphic illustrates the global division of microwork undertaken on the ODesk platform and reveals some of its locally divergent practices.

Data

Microwork refers to a series of relatively small tasks carried out by a distributed workforce over the Internet. Coordinated microwork therefore allows relatively large projects to be completed quickly by workforces from around the world. ODesk is one of the largest job marketplaces for microworkers. This graphic uses openly available data from ODesk describing the hourly working practices of microworkers (i.e., the number of active workers in each hour of the week) in each country across the globe.

In the first visualisation, each dot represents the average number of workers active in each country for every hour of the week. For countries that span more than one time zone, we use the local time in the capital city.

The second visualisation uses the same data, but makes two changes. First, dots are aligned according to local time, rather than Coordinated Universal Time (UTC). Second, the size of each dot is normalized by the Internet population of each country. These changes offer a sense of how prevalent online microwork is in each country, and allow working hours in different places to be compared directly.
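Both changes can be sketched as simple operations on a 168-value hour-of-week series. The sketch below assumes a hypothetical layout in which index 0 is Monday 00:00 UTC and each country has a single whole-hour offset (the capital’s, as described above, with no daylight saving time); half-hour offsets such as India’s would need finer bins and are not handled here.

```python
HOURS_PER_WEEK = 168

def to_local(series_utc, utc_offset_hours):
    """Re-index a 168-value UTC hour-of-week series by local hour of week.

    Local hour h corresponds to UTC hour (h - offset) mod 168, so the value
    shown at local hour h is taken from that UTC slot. Whole-hour offsets only.
    """
    return [series_utc[(h - utc_offset_hours) % HOURS_PER_WEEK]
            for h in range(HOURS_PER_WEEK)]

def per_internet_user(series, internet_users):
    """Express active-worker counts as a percentage of the Internet population."""
    return [100 * workers / internet_users for workers in series]

# Hypothetical series: 10 active workers at Monday 00:00 UTC, none otherwise.
series = [0] * HOURS_PER_WEEK
series[0] = 10
local = to_local(series, 8)          # a UTC+8 country sees them at 08:00 local
print(local.index(10))               # 8
print(per_internet_user([10], 40_000))
```

Normalizing by Internet population is what turns the raw head counts of the first visualisation into a measure of how prevalent microwork is in each country.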

The representations do not account for the use of daylight saving time.

Findings

The first image shows that a large portion of the world’s microwork on ODesk is carried out in Asia, particularly in the Philippines, Bangladesh, India, and Pakistan. At noon (local time) on an average Tuesday, there are almost 35,000 active workers on the platform, roughly one third of whom are located in India, about one quarter in the Philippines, and about one tenth in the United States. Russia and Ukraine also each provide over five percent of the total. Although ODesk is used in 58 countries covering almost every time zone, 85% of the digitally mediated workers are located in the seven countries mentioned above. In other words, despite the potential for almost anyone with an Internet connection to become a microworker, microwork practices have very clustered geographies.

One interesting facet of these data is the significant difference between working patterns in the Philippines and those in most other countries. In most countries, day and night are easy to distinguish by the sharp drop-off in work at the end of the working day. In the Philippines, however, we see only a relatively minor change in working practices between day and night.

In many countries we also see a stark difference between weekdays and weekends. The Philippines, however, again exhibits a relatively consistent temporal pattern, with fewer people than elsewhere avoiding work at weekends. At 3am (Philippines time) on an average Sunday morning, the Philippines provides almost half of the active workers on ODesk.

Some of these patterns can be traced to the large US demand for microwork. Filipino microworkers are mostly employed to complete tasks related to data entry, writing, and a variety of personal assistance work (see ODesk Philippines Country Dashboard). We see an increase in the number of active Filipino workers when it is morning in the US (9am Eastern Standard Time, which is 10pm in the Philippines). Bangladesh exhibits a similar pattern to the Philippines. Bangladeshi microworkers are also largely employed for data entry, although the most common type of task performed in the country relates to search engine optimization (see ODesk Bangladesh Country Dashboard). This contrasts with the situation in India, where most microworkers are employed for tasks related to Web programming and design (see ODesk India Country Dashboard). In India, we see the number of active workers decline in the US morning (9am Eastern Standard Time, which is 7.30pm Indian time).

The second image weights the number of active microworkers from each country by that country’s Internet population. This gives us further insights into some of the country-specific differences in microwork practices. For instance, we can see not only that ODesk has a large, around-the-clock workforce in the Philippines, but also that the platform is relatively popular in that country. On an average Tuesday at noon local time, ODesk employs 0.025% of the entire Filipino Internet population, almost ten times the global average. By way of comparison, the platform employs only 0.001% of the US Internet population.

Online microwork also appears to be relatively popular in Armenia and Moldova (in both countries over 0.01% of the Internet population are active on an average Tuesday at lunch time), mostly employing microworkers in the fields of Web programming and design. In South America, Uruguay and Bolivia also demonstrate relatively high rates of microwork activity; Bolivia is particularly interesting because it is the only country that exhibits a visible decline in the number of active workers in the middle of the working day.

These data offer a fascinating insight into new practices of work in our global knowledge economy. The ability to carve up large projects into small digital tasks that can be performed by a globally distributed labour force has meant that global demands for, and supply of, digital tasks can be easily matched. But it remains to be seen whether these new work practices are a useful employment opportunity for many of the two and a half billion connected people in the world, or whether they represent a new type of digital sweatshop in which the world’s poor are enrolled, as expendable and unorganized workers, into exploitative digital divisions of labour.

ODesk-Local_practices_of_microwork-final-011

A world’s panorama

Density_Photographs_Panoramio-01

Description

This map represents the location of public photographs published on Panoramio, one of the largest photo-sharing services on the Web.

Data

The map uses data collected via the Panoramio Data API in December 2013. We used the API to retrieve the number of public photos tagged to locations in each of 259,200 bounding boxes into which we divided the world. It’s worth noting that because our boxes are a quarter of a degree of latitude tall and a quarter of a degree of longitude wide, the mapped cells are not of a consistent size globally: a cell in Edinburgh has about half the area of a cell in Nairobi. This means that locations near the equator are more likely to show up as bright concentrations of content than locations with equivalent numbers of photographs at higher or lower latitudes, although the color scale used should limit this effect.
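The area difference follows from the geometry of latitude/longitude cells: on a sphere of radius R, a cell spanning Δλ radians of longitude between latitudes φ1 and φ2 has area R²·Δλ·(sin φ2 − sin φ1), which shrinks roughly with the cosine of latitude. The sketch below uses a spherical Earth with a mean radius of 6,371 km (an approximation) and picks the quarter-degree cells containing Edinburgh (about 55.95°N) and Nairobi (about 1.29°S).

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean radius; spherical approximation

def cell_area_km2(lat_south, size_deg):
    """Area of a size_deg x size_deg cell whose southern edge is at lat_south."""
    lat_north = lat_south + size_deg
    dlon = math.radians(size_deg)
    return EARTH_RADIUS_KM ** 2 * dlon * (
        math.sin(math.radians(lat_north)) - math.sin(math.radians(lat_south))
    )

edinburgh = cell_area_km2(55.75, 0.25)   # cell spanning 55.75 to 56.00 N
nairobi = cell_area_km2(-1.5, 0.25)      # cell spanning 1.50 to 1.25 S
print(round(edinburgh / nairobi, 2))
```

The ratio comes out at about 0.56, consistent with the “about half the area” noted above.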

Findings

Building on our map of content in Flickr, this graphic tells a very similar story. Panoramio is smaller than Flickr, with about a tenth of its users, and only a fraction of its photos. Nonetheless, Panoramio plays an important role in online representations of places, as photographs on the site can be accessed as a layer in Google Maps and Google Earth.

The United States is layered with more than two million public photographs published on Panoramio. It is closely followed by Russia, China, Germany and Brazil, which are each covered with more than a million photos. These five countries account for about one-third of the entire public content on the platform.

However, it is the Netherlands that is covered by the densest layer of content, with over five pictures per square kilometer. The Netherlands is followed by Switzerland, Slovakia, Germany, and Belgium, which all have an average of three pictures per square kilometer.

In contrast, Africa in particular is characterised by very thin layers of digital content (Italy alone is covered by more photos than the whole continent). No African country has more than one picture per five square kilometers, the highest being Tunisia with 0.2 photos per square kilometer. Algeria is the African country with the most photographs, while sparsely populated Western Sahara has the fewest, just 0.016% of the number of photographs of the United States.

Whilst Latin America and the Caribbean tend to score poorly on many other metrics of information production, they are represented by a non-trivial amount of content, with about as many photographs as the United States. In Asia, China accounts for the largest portion of pictures, followed by Turkey (with 800,000), and then Japan and India, with about half a million pictures each. The rest of Asia combined is represented by about 1.8 million pictures.

These presences and absences all ultimately influence what we see, and where we see it, when using some of the web’s most popular platforms.

Geographic coverage of Wikivoyage

Wikivoyage_Circles_final

Description

This graphic depicts the geographic focus of four major languages of the Wikivoyage project, one of the world’s most popular crowd-sourced travel guides.

Data

This graphic uses data freely available from the Wikimedia Dumps website, collected in October 2013.

To determine the location of each article, we used Wikivoyage’s internal geographic hierarchy. The page on Blackburn, for instance, is nested within the categories of Lancashire and the United Kingdom. English and German were included because they are the two largest sub-projects in Wikivoyage according to Wikimedia Statistics. We selected Italian and Spanish because they represent good examples of geographically concentrated and geographically dispersed languages, respectively.

Each ring represents one of the languages, and is sized in relation to the number of articles in that language. Each section of a ring represents the number of articles in that language about a given country. The visualisation excludes countries with fewer than three pages.
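Assigning each article to a country amounts to walking up the hierarchy until a country is reached, then counting articles per country and dropping those below the three-page threshold. The sketch below assumes a hypothetical, simplified `parent` mapping (page to its parent in the hierarchy) and a known set of country pages; extracting these from the Wikimedia dumps is not shown.

```python
from collections import Counter

def country_of(page, parent, countries):
    """Walk up the parent links until reaching a page that is a country."""
    seen = set()
    while page is not None and page not in seen:   # `seen` guards against cycles
        if page in countries:
            return page
        seen.add(page)
        page = parent.get(page)
    return None

def pages_per_country(pages, parent, countries, minimum=3):
    """Count articles per country, dropping countries with fewer than `minimum` pages."""
    counts = Counter(country_of(p, parent, countries) for p in pages)
    counts.pop(None, None)  # pages that never reach a country
    return {c: n for c, n in counts.items() if n >= minimum}

# Hypothetical, simplified hierarchy following the Blackburn example.
parent = {
    "Blackburn": "Lancashire", "Preston": "Lancashire",
    "Lancashire": "United Kingdom",
    "Oxford": "Oxfordshire", "Oxfordshire": "United Kingdom",
    "Lyon": "France",
}
countries = {"United Kingdom", "France"}
pages = ["Blackburn", "Lancashire", "Preston", "Oxford", "Lyon"]
print(pages_per_country(pages, parent, countries))  # {'United Kingdom': 4}
```

France, with a single page in this toy example, falls below the threshold and is excluded.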

Findings

The visualisation shows that, in all four languages, coverage is extensive for countries in which those languages are spoken. Wikivoyage, one of the world’s most used travel guides, therefore presents us with a very selective picture of the world.

The United States accounts for a large portion of the content in the English edition of Wikivoyage, and the contrast with the other languages is striking. The same applies to Germany in German, and to Spain in Spanish. English-speaking countries account for about half of the pages written in English, and Spanish-speaking countries for about half of the pages written in Spanish. However, German-speaking countries account for only about one third of German Wikivoyage, and the Italian edition dedicates an even smaller share of its pages to Italy (just above 18%).

In other words, although Wikivoyage is by its nature a project designed to facilitate writing about distant parts of the world that people might travel to, people are not actually writing much content about places where their language is not widely spoken (notable exceptions being content about Egypt in German and about Greece in Italian, each of which accounts for more than 4% of the respective guide).

Low-income countries are particularly under-represented in the English, German, and Italian projects, with only about one third of articles in those languages dedicated to countries outside Europe, North America, Australia, and New Zealand. The Spanish Wikivoyage, in contrast, devotes almost 40% of its content to the Latin America and Caribbean region (Spanish is widely spoken in that region, and a significant number of editors may be writing from there). Sub-Saharan Africa, however, is heavily under-represented in the Spanish Wikivoyage, comprising only 0.1% of the collection.

As ever more people use online travel guides, it will be important to understand whether these inequalities in information begin to actively shape where and how people move around the world.

Gender and Social Networks

Gender_and_SocialNetworks

Description

This graphic shows the population of some of the world’s most popular social platforms segmented by the gender of their users.

Data

The graphic uses 2013 social network statistics (not publicly available at the time of writing, as they were being updated) collected via the Google Display Planner tool by Information is Beautiful, who have also published a visualisation (using 2012 data) of the gender balance in social networks.

Each bar represents a social network and is divided into two sections: a red one on the left, representing the proportion of female-registered users in the social network, and a blue one on the right, representing the proportion of male-registered users. The width of each bar conveys the total number of registered accounts.

Findings

On the whole there isn’t a large disparity between men and women on the social networks represented here. The data indicate a total ratio of 1.05 males to every female.

This more-or-less equal gender balance can be seen in the two largest social platforms, Facebook and YouTube, whose gender ratios are very close to the ratio in the general population (however, both do see slightly more male than female users).

The third largest social platform, Google+, in contrast, has a higher proportion of males, with only 43% of its user base being female. Twitter, on the other hand, has a slightly higher female participation rate, with women accounting for 53% of its users.

It is interesting to note that, despite a slightly higher overall male participation rate, a greater number of platforms have more women than men. For instance, both Flickr and Tumblr have over 55% female users. A similar share can be observed in the Google-owned social network Orkut, which is popular in Brazil and India. Meetup, one of the largest networks for organizing group meetings, also has a predominantly female population, with only 38% male users. Foursquare and Myspace both have fewer than 40% male users, and the movie discovery website Flixster has more than 70% female users.
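The apparent paradox comes from weighting: the overall ratio pools accounts across platforms, so a few very large platforms with slightly more men can outweigh many smaller platforms with more women. The sketch below converts a female share into a male-to-female ratio (using the Google+ and Twitter shares quoted above) and pools a hypothetical set of platforms; the account counts are invented for illustration and do not reproduce the 1.05 figure.

```python
def male_to_female(female_share):
    """Convert a female share (0-1) into a male-to-female ratio."""
    return (1 - female_share) / female_share

def pooled_ratio(platforms):
    """Overall male-to-female ratio, weighting each platform by its account count.

    `platforms` is a list of (accounts, female_share) pairs.
    """
    females = sum(accounts * share for accounts, share in platforms)
    males = sum(accounts * (1 - share) for accounts, share in platforms)
    return males / females

print(round(male_to_female(0.43), 2))  # Google+: 1.33
print(round(male_to_female(0.53), 2))  # Twitter: 0.89

# Hypothetical platforms: one large, slightly male-majority platform and
# three smaller female-majority ones (accounts in millions).
example = [(1000, 0.48), (50, 0.56), (50, 0.56), (30, 0.70)]
print(round(pooled_ratio(example), 2))
```

Three of the four hypothetical platforms are female-majority, yet the pooled ratio is still slightly above 1 because the largest platform dominates the total.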

At the other end of the chart, LinkedIn, the largest professional social network, has only 40% of its accounts owned by female users. The social news and entertainment website Reddit has a similar balance. The population of the social discovery website Tagged is also highly skewed towards male users, who account for over 60% of the user base. Finally, the social networking and gaming platform Hi5 registers a mere 36% female users.

On the whole (and with exceptions), social, local, artistic, and parenting-oriented websites tend to have a higher ratio of female users, and professional and games-oriented sites a higher ratio of male users.

As the Internet becomes available to billions of new people in the next few years, it will be important to keep a focus on how these statistics evolve. Will we start to see more gendered silos of use or more balanced platforms of participation?