Gender and Social Networks

Description

This graphic shows the population of some of the world’s most popular social platforms segmented by the gender of their users.

Data

The graphic uses 2013 social network statistics collected via the Google Display Planner tool by Information is Beautiful, who have also published a visualisation (using 2012 data) of the gender balance in social networks. The 2013 statistics were not publicly available at the time of writing because they were being updated.

Each bar represents a social network and is divided into two sections: a red one on the left, representing the proportion of female-registered users in the social network, and a blue one on the right, representing the proportion of male-registered users. The width of each bar conveys the total number of registered accounts.

Findings

On the whole there isn’t a large disparity between men and women on the social networks represented here. The data indicate a total ratio of 1.05 males to every female.
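To make the ratio concrete, it can be converted into percentage shares. The sketch below does only that arithmetic; the function name is ours, and the only input taken from the text is the ratio of 1.05.

```python
def shares_from_ratio(males_per_female: float) -> tuple[float, float]:
    """Convert a males-per-female ratio into (female_share, male_share)."""
    total = males_per_female + 1.0
    return 1.0 / total, males_per_female / total


# 1.05 is the overall ratio quoted in the text.
female_share, male_share = shares_from_ratio(1.05)
print(f"female: {female_share:.1%}, male: {male_share:.1%}")
```

In other words, a ratio of 1.05 males to every female corresponds to roughly 48.8% female and 51.2% male accounts.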

This more-or-less equal gender balance can be seen in the two largest social platforms, Facebook and YouTube, whose gender ratios are very close to the ratio in the general population (however, both do see slightly more male than female users).

The third largest social platform, Google+, in contrast, has a higher proportion of males, with only 43% of its user base being female. Twitter, on the other hand, has a slightly higher female participation rate, with women accounting for 53% of its users.

It is interesting to note that, despite the slightly higher overall male participation rate, a greater number of platforms have more women than men. For instance, on both Flickr and Tumblr more than 55% of users are female. A similar share can be observed in the Google-owned social network Orkut, which is popular in Brazil and India. Meetup, which is one of the largest networks to facilitate group meetings, also has a predominantly female population, with only 38% male users. Foursquare and Myspace both have fewer than 40% male users, and the movie discovery website Flixster has more than 70% female users.

On the other end of the chart, the largest professional social network LinkedIn has only 40% of its accounts owned by female users. The social news and entertainment website Reddit has a similar balance. The population of the social discovery website Tagged is also highly skewed towards male users, as males account for over 60% of the user base. Finally, the social networking and gaming platform Hi5 registers a mere 36% female users.

On the whole (and with exceptions) we see that social, local, artistic, and parenting-oriented websites tend to have a higher ratio of female users, and professional and games-oriented sites have a higher ratio of male users.

As the Internet becomes available to billions of new people in the next few years, it will be important to keep a focus on how these statistics evolve. Will we start to see more gendered silos of use or more balanced platforms of participation?

World-wide news web

Description

This map depicts mentions of multiple places in news articles between 1979 and 2013. Brighter lines indicate more connections between places.

Data

The map uses data from the Global Database of Events, Language, and Tone (GDELT), an initiative that aims to provide a “realtime social sciences earth observatory” by creating a freely available catalog of events derived from news stories. The database is compiled from stories in media outlets from almost every country in the world. Any story can contain more than one event; events are automatically parsed out of news stories using a text analysis program called TABARI and encoded using a schema called CAMEO.

A large portion of these events (140 million out of 250 million listed events) contains both the location where the event happened and the locations of the two primary actors involved. TABARI associates events that it has already picked out of an article with geographic locations mentioned in the same text (by looking at verb usage in surrounding sentences). You can read the introductory paper on GDELT (Leetaru and Schrodt, 2013) for more on the specific geocoding methods employed.

We excluded all events in which the two actors are geocoded to the same place (about 91 million events, or 36 percent of the full dataset), as well as location pairs referred to by fewer than 10 events (about 7 million events). This left us with about 43 million events (17 percent) and 216,000 connections between location pairs to visualize in the map.
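The two exclusion steps can be sketched as follows. This is a minimal illustration, not the pipeline we actually ran: the record layout (a pair of location names per event) is an assumption rather than the GDELT schema, the sample events are made up, and treating A-B and B-A as the same connection is also an assumption.

```python
from collections import Counter

MIN_EVENTS_PER_PAIR = 10  # threshold stated in the text


def filter_events(events, min_events=MIN_EVENTS_PER_PAIR):
    """Apply the two exclusions and return (kept_events, kept_pairs).

    `events` is a list of (actor1_location, actor2_location) tuples;
    this layout is hypothetical, not the GDELT record format.
    """
    # Step 1: drop events whose two actors are geocoded to the same place.
    distinct = [(a, b) for a, b in events if a != b]
    # Assumption: connections are undirected, so A-B equals B-A.
    pair_counts = Counter(frozenset(pair) for pair in distinct)
    # Step 2: drop location pairs referred to by fewer than `min_events` events.
    kept_pairs = {p: n for p, n in pair_counts.items() if n >= min_events}
    kept_events = [e for e in distinct if frozenset(e) in kept_pairs]
    return kept_events, kept_pairs


# Made-up sample: 12 Seoul-Pyongyang events, 20 same-place events, 3 rare-pair events.
sample = (
    [("Seoul", "Pyongyang")] * 7
    + [("Pyongyang", "Seoul")] * 5
    + [("Moscow", "Moscow")] * 20
    + [("Paris", "Lima")] * 3
)
events, pairs = filter_events(sample)
print(len(events), len(pairs))
```

In this toy sample only the twelve Seoul-Pyongyang events survive, forming a single connection.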

The first map illustrates all the connections between pairs of locations. The brightness of each line reflects the number of events connecting the two places. The second graphic focuses on international events, grouping the connections by country. Colour is used to map the world’s regions and the connections between them, with colour assigned to the ‘edges’ (i.e., connections) based on the colours of the two connected nodes. The thickness of the lines represents the number of events.

Note: in the second graphic below, “Countries, Dependencies, Areas of Special Sovereignty, and Their Principal Administrative Divisions” are labeled according to their classification in the GDELT database, using the FIPS 10-4 codes.

Findings

The map restates the United States’ position as a core geographical focal point of the collection. Seven location pairs are each connected by over 100,000 events. Every one of these seven pairs has one location outside of the United States and one inside the country. The brightest lines connect the United States (and Washington in particular) with Russia (twice), Iran, Iraq, Israel (twice), and China.

It is important to be aware of the scale at which this map should be interpreted. Many of the hotspots on the map are capital cities such as Washington or Moscow, but many locations also appear to be in relatively unpopulated places, such as the American Midwest or the middle of the Australian Great Victoria Desert. This occurs because many actors in the dataset are simply geocoded to a country rather than to a particular city or town. In those cases, the dataset locates them at the geometric centroid of the country. As such, this map is most useful for illustrating broad patterns of connections between regions and countries, rather than micro-connections between specific cities.

Russia, Iran, Iraq, Israel, and China are the countries most connected in general to the United States, along with Afghanistan, each one accounting for more than 500,000 events connecting a location in the United States to a location in one of those countries. The ‘special relationship’ between the United Kingdom and the United States accounts for over 450,000 events connecting two places on either side of the Atlantic.

The United States aside, the single most active connection between two cities is between Seoul and Pyongyang, with more than 98,000 events recorded in the database. At the country level, North and South Korea are connected by almost 250,000 events. The two most connected countries (excluding the United States) are Afghanistan and Pakistan, accounting for over 425,000 events, almost double the number of events connecting Pakistan and India (about 238,000 events).

The most active relationship in the Middle East and North Africa region involves Egypt and Israel, counting over 385,000 events connecting places in the two countries, followed by the relations between Israel and the West Bank (335,000 events), and between Israel and Lebanon (over 330,000 events). There are about the same number of events connecting Iran and Iraq as the number of events connecting the United States and Canada (about 315,000 events), and almost as many events connecting China and Japan as events connecting the United States and Mexico (about 270,000 events).

Aggregating data by country, we see that most of the events involving two distinct locations are international events, as only about 5 million events refer to two locations in the same country, whereas about 38 million events refer to locations in two different countries. The second graphic focuses on international events only.

Beyond the connections mentioned above, the second graphic highlights several inter-continental connections. Russia and the United Kingdom are among the most visible European countries, followed by Germany and France. Each one of these four European countries has strong connections with Asia, especially with China, Afghanistan, and Pakistan. A tight cluster is also visible in Asia, centered in China, and involving Hong Kong, Taiwan, South Korea, and North Korea.

Russia, the United Kingdom, Germany and France also have very visible connections with countries in the Middle East, in particular with Syria, Israel, Iran, and Iraq. The bright orange lines originating from Turkey also point to that country’s connections with a handful of Middle Eastern countries.

Sub-Saharan Africa is visibly the most disconnected of the seven regions. There are a few lines connecting Sub-Saharan African countries to the United States and the United Kingdom, and a few that link Sudan with its neighbour Egypt. Otherwise, we see very few connections. A similar pattern is evident in Latin America and the Caribbean, although the connections to the United States are stronger, especially those involving Mexico and Cuba.

The media inevitably present us with particular biases and objects of attention. This work is designed to show you both the locations and connections present in hundreds of millions of news stories from around the world.

GDELT_Worldwide_News_Web-bottom2-01

Mapping the Times Higher Education’s top-400 universities

Description

This map depicts the locations of the world’s top 400 universities as ranked by the Times Higher Education. It also illustrates the relative wealth of the country that hosts each university.

Data

The map uses data from the World University Rankings 2013-2014, published by the Times Higher Education, in collaboration with Thomson Reuters. Thirteen indicators that measure teaching, research, knowledge transfer and international outlook are taken into account in order to evaluate universities.

Each university is represented as a square, and shaded according to the World Bank income group that its country belongs to. The four World Bank income groups are high-income (GNI per capita of $12,616 or more), upper-middle income ($4,086 – $12,615), lower-middle income ($1,036 – $4,085), and low-income (<$1,036). We exclude the low-income category from this map because not one of the 400 universities is located in a low-income country.
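The income-group boundaries listed above can be expressed as a small lookup. This is only a sketch: the function name is ours, GNI per capita is assumed to be given in whole US dollars, and the sample values are boundary cases chosen for illustration rather than figures for any real country.

```python
def income_group(gni_per_capita: int) -> str:
    """Map GNI per capita (whole US dollars) to an income group.

    Thresholds are the ones quoted in the text above.
    """
    if gni_per_capita <= 1_035:
        return "low-income"
    if gni_per_capita <= 4_085:
        return "lower-middle income"
    if gni_per_capita <= 12_615:
        return "upper-middle income"
    return "high-income"


# Boundary values only; these are not real country figures.
for value in (1_035, 1_036, 4_086, 12_615, 12_616):
    print(value, income_group(value))
```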

The universities are grouped by world region, and the equator is depicted as a red line towards the bottom of the map.

Some universities are further grouped into metropolitan region clusters. The clusters have been identified using the DBSCAN density-based clustering algorithm, applying a 50 km distance threshold, and a minimum cardinality of four universities. Because of the compact nature of many European cities, we further refined some clusters manually in order to achieve meaningful definitions of metropolitan regions.
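For readers unfamiliar with DBSCAN, the following is a minimal pure-Python sketch of the algorithm with the two parameters mentioned above (a 50 km radius and a minimum of four universities). It is not the code used for the map. Several things are assumptions: great-circle (haversine) distance as the metric, counting a point as its own neighbour when applying the minimum of four, and the rough, rounded sample coordinates (four points in central London, one near Oxford, one near Cambridge).

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def dbscan(points, eps_km=50.0, min_pts=4):
    """Return one label per point: a cluster id (0, 1, ...) or -1 for noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)

    def neighbours(i):
        # Includes point i itself, as in the usual DBSCAN definition.
        return [j for j, q in enumerate(points)
                if haversine_km(points[i], q) <= eps_km]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but does not expand
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            nbrs = neighbours(j)
            if len(nbrs) >= min_pts:
                queue.extend(nbrs)  # j is a core point: keep expanding
    return labels


# Rough illustrative coordinates, not actual campus locations.
points = [
    (51.50, -0.12), (51.52, -0.13), (51.49, -0.18), (51.53, -0.04),  # London
    (51.75, -1.25),  # near Oxford
    (52.20, 0.12),   # near Cambridge
]
labels = dbscan(points)
print(labels)
```

With these sample points, the four London locations form one cluster, while the Oxford and Cambridge points lie more than 50 km from every other point and are labelled as noise.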

Findings

The primary finding is that most of the world’s top-ranked universities are located in the world’s wealthiest countries (a point also made by Benjamin Hennig and his cartograms of the Times Higher Education rankings). The Greater London cluster alone, which does not include Oxford and Cambridge, contains the same number of top-400 universities as all of Sub-Saharan Africa, the Middle East, and Latin America combined!

Not only are there no low-income countries represented in the ranking, but India is also the only lower-middle income country represented, being home to five of the top-400 ranked universities. Latin America and Sub-Saharan Africa are home to three universities each, all six being based in upper-middle-income countries (i.e., Brazil, Colombia, and South Africa). These eleven elite universities in India, Latin America, and Sub-Saharan Africa serve a population of over 2.7 billion people.

The ranking also includes ten universities in China, an upper-middle-income economy that is home to over 1.3 billion citizens, and seven other universities from the same income group: five in Turkey, one in Iran, and one in Thailand. The remaining 34 Asian universities included in the ranking are mostly concentrated in densely populated (and wealthy) cities like Hong Kong, Seoul, Taipei, Tokyo, and Singapore.

The Middle East and North Africa region also reveals a relatively concentrated geography of elite universities. Of the six universities included from the region, three are in Israel, two in Saudi Arabia, and one in Iran.

Oceania is interestingly the largest world region (in terms of number of top universities) present below the equator. All the top-400 universities in this region are found either in Australia or New Zealand, with two large clusters in Melbourne and Sydney.

Almost half of the top-400 universities are located in Europe, and over a quarter are in the United States. Northern Europe and the US East Coast are home to some of the largest university clusters, most notably in Greater London and Boston.

It’s important to remember that there are tens of thousands of universities that aren’t represented on this map; what this graphic doesn’t do is visualize the potentials or practices of all higher education worldwide. However, what it does do is clearly illustrate the highly uneven geography of elite education. The universities in the top-400 list don’t just command an undue amount of power, resources, and influence, but also serve to actively produce and reproduce them in particular parts of the world.

Geography of Top-Level Domain names

Description

This graphic maps a combination of generic top-level domains (gTLDs) and country code top-level domains (ccTLDs) in order to provide an indication of the total number of domain registrations in every country worldwide.

Data

The graphic is based on data collected in 2013, and provided by Professor Matthew Zook (University of Kentucky). It also uses 2011 population and Internet penetration data from the World Bank (the same data we employed in our visualisation of Internet population and penetration).

All gTLDs are mapped through an analysis of information returned by the WHOIS Internet protocol, which provides contact information for any given domain. For instance, for every .com domain name, the location registered in that domain’s WHOIS data was retrieved and stored in a database.

The gTLD country-level data are supplemented with the number of ccTLDs that can be associated with each country. Here we operate under the assumption that, in contrast to gTLDs, most ccTLDs will be registered and used by people in the country associated with them. For instance, we assume that a majority of .fr domains are used in France and a majority of .za domains are used in South Africa.

In making this assumption, we have also taken care to remove all ccTLDs that function as de-facto gTLDs from the map. This has meant removing the following ccTLDs:

  • .tv (Tuvalu): used by the media industry
  • .fm (Federated States of Micronesia): used by the media industry
  • .am (Armenia): used by the media industry
  • .mu (Mauritius): used by music websites
  • .ac (Ascension Island): used by education-related websites
  • .re (Réunion): used by real-estate agents
  • .ws (Samoa): used as an abbreviation for “web site”
  • .me (Montenegro): used for personal websites
  • .cc (Cocos Islands): used as an alternative to .com (administered by VeriSign)
  • .cm (Cameroon): used as an alternative to .com (as a way of exploiting typing errors)
  • .nu (Niue): means “now” in Danish, Dutch, and Swedish
  • .as (American Samoa): the suffixes “AS” and “A/S” are used in some countries (e.g. Norway, Denmark, and the Czech Republic) for joint stock companies
  • .io (British Indian Ocean Territory): used by start-up companies
  • .st (São Tomé and Príncipe): is used around the world in several ways
  • .tk (Tokelau): the .tk domain can (unusually) be registered for no monetary cost. This has meant that there are over 17 million domains registered to the country (which is more than the total registered in the UK).

We have included the following ccTLDs, resized according to estimates reported on Wikipedia:

  • .co (Colombia): used as an alternative to .com (as a way of exploiting typing errors)
  • .md (Moldova): used by medical doctors

The graphic also excludes all countries that contain fewer than 10,000 domains.
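The counting procedure described in this section can be sketched as follows. It is a simplified illustration rather than the code behind the graphic: the function and variable names are ours, all counts in the example are made up, and the resizing of .co and .md is not modelled (their counts would simply be passed in already adjusted).

```python
# ccTLDs removed because they function as de-facto gTLDs (list from the text).
EXCLUDED_CCTLDS = {
    ".tv", ".fm", ".am", ".mu", ".ac", ".re", ".ws", ".me",
    ".cc", ".cm", ".nu", ".as", ".io", ".st", ".tk",
}
MIN_DOMAINS = 10_000  # countries with fewer domains are left off the graphic


def domains_per_country(gtld_counts, cctld_counts, cctld_country):
    """Combine gTLD and ccTLD registrations into per-country totals.

    gtld_counts:   {country: gTLD domains located there via WHOIS data}
    cctld_counts:  {ccTLD: number of registered domains}
    cctld_country: {ccTLD: country the ccTLD is associated with}
    """
    totals = dict(gtld_counts)
    for tld, count in cctld_counts.items():
        if tld in EXCLUDED_CCTLDS:
            continue  # de-facto gTLD: not attributed to its nominal country
        country = cctld_country[tld]
        totals[country] = totals.get(country, 0) + count
    # Drop countries that fall below the 10,000-domain threshold.
    return {c: n for c, n in totals.items() if n >= MIN_DOMAINS}


# Made-up numbers, for illustration only.
gtld_counts = {"France": 1_000_000, "Tuvalu": 50, "Niue": 20}
cctld_counts = {".fr": 2_000_000, ".tv": 500_000, ".nu": 200_000}
cctld_country = {".fr": "France", ".tv": "Tuvalu", ".nu": "Niue"}
totals = domains_per_country(gtld_counts, cctld_counts, cctld_country)
print(totals)
```

In this toy example, Tuvalu and Niue drop out: their ccTLD registrations are ignored, and their remaining gTLD counts fall below the threshold.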

Despite these exclusions, we would maintain that this method offers the most comprehensive overview of the geography of top-level domains.

Findings

The cartogram illustrates that a majority of domains (78%) are registered in Europe or North America: a finding that reinforces the dominance of those two regions in terms of Internet content production. Asia, in contrast, is home to 13% of the world’s domains while Latin America (4%), Oceania (3%), and the Middle East and Africa combined (2%) have even smaller shares of the world’s websites.

Globally, there are about 10 Internet users for every registered domain. However, we see considerable variation around that number, and countries seem to fall into two groups. On the one hand, it is quite common for European and North American Internet users and companies to register a domain name; in the Netherlands and Switzerland, we see as few as two Internet users per domain. On the other hand, registering a domain name is relatively rare in much of the rest of the world. In most of the Middle East and Africa, we see over 50 Internet users per domain.

The United States is home to almost a third of all registered domains, and has about one website for every three Internet users. China, in contrast, can boast the world’s largest Internet population, but has only one registered domain for every 40 Internet users. In fact, there are fewer domains registered in China than in the United Kingdom, which has about one tenth of China’s Internet population.

More broadly, Asian countries tend to have relatively low numbers of registered domains compared to European countries with similar Internet populations. Japan is home to twice as many Internet users as the UK, but hosts fewer than one third as many websites. Italy and Vietnam have almost the same Internet populations, but Italy is home to more than seven times the number of websites. Uzbekistan has more Internet users than Switzerland, but not even one percent of the number of Swiss websites (likely as a result of the extensive Internet censorship in operation in the country).

Interestingly there is a significant positive correlation between a country’s rank in Gross National Income (GNI) per capita, and the number of domain names per Internet user. A country’s ranked position by GNI per capita explains about 50% of the variance in its ranking by number of domain names per Internet user.
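A correlation between two rankings is commonly measured with Spearman’s rank correlation coefficient, which is the Pearson correlation computed on ranks. The sketch below shows that calculation in plain Python. We assume, for illustration only, that this is the kind of measure behind the figure above. The five data points are made up and are not taken from the dataset.

```python
import math


def ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Made-up values for five hypothetical countries.
gni_per_capita = [500, 3_000, 9_000, 20_000, 45_000]
domains_per_user = [0.01, 0.05, 0.02, 0.30, 0.50]
rho = spearman(gni_per_capita, domains_per_user)
print(f"rho = {rho:.2f}, share of rank variance explained = {rho ** 2:.2f}")
```

The share of variance explained is the square of the coefficient, so if the 50% figure is read this way it would correspond to a rank correlation of about 0.71.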

These data offer a fascinating window into one important facet of Internet content production. We see that large Internet populations in some countries (e.g. China) are not necessarily indicators of large numbers of domain registrations. In other words, just because a country is home to a large number of Internet users doesn’t necessarily mean that it is also home to an active group of content producers. However, it is also likely that much of the world’s content is simply placed within websites that are hosted in only a few countries (in particular, the US).

Yet it remains that, as we also see with other metrics (such as Wikipedia articles vs. edits), amongst those that are online, some countries are producers of large amounts of content whilst others remain largely consumers.

See also: Zook, M.A. (2001). Old hierarchies or new networks of centrality? The global geography of the internet content market. American Behavioral Scientist 44(10), 1679-1696.