Freedom and Justice in our Technological Predicament

This is the thesis I wrote for my Master of Arts in philosophy.
It can also be downloaded as a PDF.

Introduction

As the director of an NGO advocating for digital rights I am acutely aware of the digital trail we leave behind in our daily lives. But I too am occasionally surprised when I am confronted with concrete examples of this trail. Like when I logged into my Vodafone account (my mobile telephony provider) and, buried deep down in the privacy settings, found a selected option that said: “Ik stel mijn geanonimiseerde netwerkgegevens beschikbaar voor analyse.”¹ (“I make my anonymized network data available for analysis.”) I turned the option off and contacted Vodafone to ask them what was meant by anonymized network data analysis. They cordially hosted me at their Amsterdam offices and showed me how my movement behaviour was turned into a product by one of their joint ventures, Mezuro:

Smartphones communicate continuously with broadcasting masts in the vicinity. The billions of items of data provided by these interactions are anonymized and aggregated by the mobile network operator in its own IT environment and made available to Mezuro for processing and analysis. The result is information about mobility patterns of people, as a snapshot or trend analysis, in the form of a report or in an information system.²

TNO had certified this process and confirmed that privacy was assured: Mezuro has no access to the mobility information of individual people. From their website: “While of mobility patterns is of great social value, as far as we’re concerned it is certainly not more valuable than protecting the privacy of the individual.”³

Intuitively something about Vodafone’s behavior felt wrong to me, but I found it hard to articulate why what Vodafone was doing was problematic. This thesis is an attempt to find reasons and arguments that explain my growing sense of discomfort. It will show that Vodafone’s behavior is symptomatic of our current relationship with technology: the company operates at a tremendous scale, it reuses data to turn it into new products and it feels empowered to do this without checking with its customers first.

The main research question of this thesis is how the most salient aspects of our technological predicament affect both justice as fairness and freedom as non-domination.

The research consists of three parts. In the first part I will look at the current situation to understand what is going on. By taking a closer look at the emerging logic of our digitizing society I will show how mediation, accumulation and centralization shape our technological predicament. This predicament turns out to be one where technology companies have a domineering scale, where they employ a form of data-driven appropriation and where our relationship with the technology is asymmetrical and puts us at the receiving end of arbitrary control. A set of four case studies based on Google’s products and services deepens and concretizes this understanding of our technological predicament.

In the second part of the thesis I will use the normative frameworks of John Rawls’s justice as fairness and Philip Pettit’s freedom as non-domination to problematize this technological predicament. I will show how data-driven appropriation leads to injustice through a lack of equality, the abuse of the commons, and a mistaken utilitarian ethics. And I will show how the domineering scale and our asymmetrical relationship to the technology sector lead to unfreedom through our increased vulnerability to manipulation, through our dependence on philanthropy, and through the arbitrary control that technology companies exert on us.

In the third and final part I will take a short and speculative look at what should be done to get us out of this technological predicament. Is it possible to reduce the scale at which technology operates? Can we reinvigorate the commons? And how should we build equality into our technology relationships?


Artificial Intelligence as a Service

Companies like Google and IBM are opening up services through APIs that will allow you to do things like check if an image contains adult/violent content, check to see what mood a face on a picture is in, or detect the language a piece of text is written in. Artificial Intelligence as a Service as it were (or maybe Machine Learning as a Service would be more appropriate).

So imagine building your product on top of these services. What happens if they start asking you to pay? Or if they censor particular types of input? Or if they stop existing? Where are the open alternatives that you can host yourself?

For anyone who likes logical Lego, the availability of these plug and play services means that in many cases you don’t have to worry about the base technology, at least to get a simple demo running. Instead, the creativity comes in the orchestration of services, and putting them together in interesting ways in order to do useful things with them…

Source: Recognise This…? A Quick Round-Up of Some *-Recognition Service APIs | OUseful.Info, the blog…
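One way to hedge against the questions raised above (a service starts charging, censors input, or disappears) is to code against a small interface of your own rather than against a vendor directly. The sketch below is only an illustration under stated assumptions: `LanguageDetector`, `StopwordDetector`, `UnavailableService` and `FallbackDetector` are hypothetical names invented for this example, the stopword heuristic is a toy and not a real language detector, and no actual Google or IBM API is called.

```python
from typing import Protocol


class LanguageDetector(Protocol):
    """Hypothetical interface: anything that can guess a language code."""

    def detect(self, text: str) -> str: ...


class StopwordDetector:
    """Toy self-hosted detector: picks the language whose stopwords occur most."""

    STOPWORDS = {
        "en": {"the", "and", "is", "of", "to"},
        "nl": {"de", "het", "en", "een", "van"},
    }

    def detect(self, text: str) -> str:
        words = text.lower().split()
        scores = {
            lang: sum(word in stopwords for word in words)
            for lang, stopwords in self.STOPWORDS.items()
        }
        return max(scores, key=scores.get)


class UnavailableService:
    """Stand-in for a hosted API that has been switched off (an assumption)."""

    def detect(self, text: str) -> str:
        raise ConnectionError("service discontinued")


class FallbackDetector:
    """Tries the primary detector first and falls back to the backup on failure."""

    def __init__(self, primary: LanguageDetector, backup: LanguageDetector):
        self.primary = primary
        self.backup = backup

    def detect(self, text: str) -> str:
        try:
            return self.primary.detect(text)
        except Exception:
            return self.backup.detect(text)


detector = FallbackDetector(UnavailableService(), StopwordDetector())
english_guess = detector.detect("the cat is on the roof of the house")
dutch_guess = detector.detect("de kat zit op het dak van een huis")
```

The product code only ever sees `detector.detect(...)`, so swapping a hosted service for a self-hosted alternative becomes a change in one place.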

Notes On a Full Day of Innovation

I was at a full day about innovation at Mediaplaza in Utrecht today. We used a room that had a stage in the center and chairs on four sides around it. This is a bit weird as the speaker has to look in four directions to be able to connect with the audience. The funny thing is that it actually works (also because there are four screens on each wall): each of the speakers could do nothing else than be dynamic on the stage.

Below my public notes on a few of the presentations:

Gijs van der Hulst, Business Development Manager at Google

Gijs kicked off his presentation by showing this Project Glass demo:

[youtube=https://www.youtube.com/watch?v=9c6W4CCU9M4]

The Wall Street Journal has done some research and found out that there has been an increase of 65% in how often top 500 companies mention the word “innovation” in their public documents in the last five years. Unfortunately the business practices of these companies have not really changed. How can you really effect change?

Google has nine “rules for innovation”:

Innovation, not instant perfection. Another way of saying this is “launch and iterate”: first push it to the market and then see if it is working.
Ideas come from everywhere. They can come from employees, but also from acquisitions or from outsiders.
A licence to pursue your dreams. An example of a 20% project that was very successful is Gmail. This was started by somebody who didn’t like how email was working at the time.
Morph projects – don’t kill them. Google’s failed social efforts (Buzz, Wave) have taught it valuable lessons for its current effort: Google+.
Share as much information as you can. This is very different from most companies. The default for documents within the company is to share with everyone.
Users, users, users. At Google they innovate on the basis of what users want, not on profit.
Data is apolitical. Opinions are less important than the data that supports them. They always seek evidence in the data to support their ideas. Personal note from me: Really? Really?? You cannot be serious!
Creativity loves constraints. Their obsession with speed (with hard criteria for how quickly the interface has to react to user input) is an example of an enabler for many of their innovations.
You’re brilliant? We’re hiring. In the end it is about people and Google puts a lot of effort into making sure they have the right people on board.

Larger companies are more bureaucratic than smaller companies. Google is now more bureaucratic than it used to be. One of the ways this can be battled is by reorganizing which is exactly what Google has done recently.

Sean Gourley, Co-founder and CTO of Quid

Sean talked about our eye as an incredible machine with an incredible range. We enhanced our sight through microscopy and telescopy which opened up views towards the very small and the very big. We have yet to develop something that helps us see the very complex. He calls that “macroscopy”. For macroscopy you need:

big data
algorithms
visualization

[youtube=http://www.youtube.com/watch?v=5hGTjhuimH0]

He used this framing for his PhD work on understanding war. His team used publicly available information to analyze the war. When WikiLeaks leaked the US sig event database they could validate their data set and found that they had 81% coverage. His work was published in Science and in Nature. He decided to take it further though as he really wanted to understand complex systems. They needed to go from 300K in funding and 6 people towards an ambition level of about $100M and 1,000 people. He sought venture capital and had Peter Thiel as his first funder for Quid.

Sean then demoed the Quid software analyzing the term “big data”. Quid allows you to interactively play with the information. They extract entities from the information. So for example there are about 1500 companies involved in the big data space which can be put into different themes allowing you to see the connections between them while also sizing them for influence. Next was a fractal zoom into American Express where they looked at their patents portfolio and explored their IP creating a cognitive map of what it is that American Express does.

In 1997 Deep Blue changed the way we discussed artificial intelligence. We were beaten in chess by brute horsepower. As a reaction Kasparov started a new way of playing chess where you are allowed to bring anything you want to the chess table. The combination of human and machine turned out to be the best one. Gourley sees that as a metaphor for what he is trying to do with Quid: enhancing human cognitive capacity with machines, augmenting our ability to perceive this complex world.

Sean also talked about the adjacent possible: the way that the world could be if we used the pieces that are on the table right in front of us (e.g. the Apollo 13 air filter and duct tape).

His research on insurgents has taught him that some of them are successful and when they are, it is because of the following reasons:

Many groups
Internal Competition
Long Distance Connections
Reinforce Success
Fail
Shatter
Redistribute

Polly Summer, Chief Adoption Officer at Salesforce

Salesforce was recently recognized by Forbes as the most innovative company in the world. According to Polly the tech industry has significant innovations every 10 years. For each of these ten-year cycles the industry has 10 times more users.

The ingredients for continuous innovation at Salesforce are: Alignment & Collaboration, “A Beginner’s Mind”, Agility, Listen to customers and Think big.

Polly talked about how she used their social platform called Chatter to collaborate in a completely “flat” way. They now even use Chatter as a means to make the worldwide management offsite meeting radically transparent. The next step in the Chatter platform is to “gamify” it and let the individual contributors rise and recognize their contributions (they’ve acquired Rypple for example).

Agile is about maintaining innovation velocity and delivering at speed. The “prioritize, create, deliver, get feedback, iterate”-cycle needs to be sped up. One way of doing this is by listening to your customers as they are all a natural source for ideas. She showed a couple of examples from Starbucks and KLM.

Polly then shared an example of where Salesforce made a mistake: they announced a premium service that they wanted to charge extra for. Customers complained loudly on social media and within 24 hours they reversed their decision.

In 2000 they asked themselves the question: Why isn’t all enterprise software like Amazon.com? Right now in 2011 they asked themselves a different question: Why isn’t all enterprise software like Facebook? She would consider 2011 the year of Social Revolution. Salesforce’s vision is that of a social enterprise: allowing the employee social network and the customer social network to connect (preferably in a single social profile).

Bjarte Bogsnes, VP Performance Management Development for Statoil, chairman of Beyond Budgeting Roundtable Europe

On Fortune 500 Statoil rates first on social responsibility and seventh on Innovation.

Bjarte discussed the problems with traditional management. He used my favourite metaphor, traffic, comparing traffic lights to roundabouts. Roundabouts are more efficient, but also more difficult to navigate. A roundabout is values-based and a traffic light is rules-based. Roundabouts are self-regulating and this is what we need in management models too. He then touched on Theory X and Theory Y.

When you combine Theory X with a perception of a stable business environment you get traditional management (rigid, detailed and annual, rules-based micromanagement, centralised command and control, secrecy, sticks and carrots). If you perceive the business environment as stable and you have Theory Y your management is based on values, autonomy, transparency (can be an alternative control mechanism) and internal motivation. If you combine Theory X with a dynamic business environment you get relative and directional goals, dynamic planning, forecasting and resource allocation and holistic performance evaluation.

Finally, if you combine Theory Y with a dynamic business environment you get Beyond Budgeting.

Beyond Budgeting has a set of twelve principles (it isn’t a recipe, but more of an idea or a philosophy):

Governance and transparency

Values: Bind people to a common cause; not a central plan
Governance: Govern through shared values and sound judgement; not detailed rules and regulations
Transparency: Make information open and transparent; don’t restrict and control it

Accountable teams

Teams: Organize around a seamless network of accountable teams; not centralized functions
Trust: Trust teams to regulate their performance; don’t micro-manage them
Accountability: Base accountability on holistic criteria and peer reviews; not on hierarchical relationships

Goals and rewards

Goals: Set ambitious medium-term goals; not short-term fixed targets
Rewards: Base rewards on relative performance; not on meeting fixed targets

Planning and controls

Planning: Make planning a continuous and inclusive process; not a top-down annual event
Coordination: Coordinate interactions dynamically; not through annual budgets
Resources: Make resources available just-in-time; not just-in-case
Controls: Base controls on fast, frequent feedback; not budget variances

Most companies use budgeting for three different things:

Setting targets
Forecasting
Resource allocation

When we combine these three things in a single number then we might run into its conflicting purposes. So the first step towards Beyond Budgeting is separating these three things. So for example the target is what you want to happen and the forecast is what you think will happen. The next step is to become more event driven rather than calendar driven.

Statoil has a programme called “Ambition to Action”:

Performance is ultimately about performing better than those we compare ourselves with.
Do the right thing in the actual situation, guided by the Statoil book, your Ambition to action, decision criteria & authorities and sound business judgement.
Within this framework, resources are made available or allocated case-by-case.
Business follow up is forward looking and action oriented.
Performance evaluation is a holistic assessment of delivery and behaviour.

From strategic ambitions to KPIs (“Nothing happens just because you measure: you don’t lose weight by weighing yourself.”) and then into actions/forecasts and finally into individual or team goals.

Fosdem 2012 or Why Open Source is Still Relevant

Fosdem is the place where you’ll find a Google engineer who as a “full time hobby” is lead developer for WorldForge, an open source Massive Multiplayer Online game, or where you have a beer with a developer who has a hard time finding a job, because all the code he writes has to have a free software license: “you don’t ask a vegan to have a little bit of meat do you?”. It probably is the world’s biggest free software conference: more than 5000 people show up yearly in Brussels, there is no fee to attend and there is no registration process.

I really enjoy going because there are few other events that have so few barriers to attendance and to approaching the event the way you want to approach it. I like wandering around and thinking about how these are the people that actually keep the Internet working. Below are some notes about the different talks that I attended (very little educational technology to be found, beware!).

Free Software: A viable model for Commercial Success

Robert Dewar from AdaCore had an interesting talk about how to use free software as a true commercial offering. There was no ideology in his talk but only a pure commercial perspective. They usually sell free software as “open source” and focus on convenience and utility in their selling proposition. They tell the customer they get the source code included without locks and with no limits on the number of installs.

The business model is based around subscriptions (for support, testing, etc.). What he really likes about that model is that the interests of them and the customer are fully aligned: they only make money when the customer renews. Often companies have to get used to asking for support though, they have not been “trained” to value support in the past.

He considers commercial versus open source a bogus distinction. In many ways he would consider AdaCore to be very similar to Microsoft in what they do. The main difference is the license of the software. The AdaCore license is much more permissive as you are allowed to copy the software and do with it what you want.

He also spent some time thinking about whether AdaCore’s approach would work with other companies. Could Microsoft open source Windows? He thinks they could without it affecting them badly: people would be willing to pay for timely updates and support. Could a games company open source their games? Copyright protection is one way they currently protect their very large investments. It might be hard for them to open source, but in general the model could be used much more widely. Every company is in the business of giving users what they want and open source licenses are that much more convenient for users.

A New OSI For A New Decade

Simon Phipps has joined the board of the Open Rights Group and the Open Source Initiative (OSI). He talked about reptiles: they have no morality and are very old and only react to fear and hunger. Corporates are reptiles too. Corporations don’t have ethics, people have ethics. OSI tried to find a way to show large organizations that the four software freedoms (use, study, modify and distribute) are important for them too. A pragmatic rather than a moral perspective on open source software helped the OSI to be able to get corporate involvement. Their initial focus was very much on licensing. They have been successful: OSI has become the standard for open source in government and the fear around the term has been turned around: other processes are now appropriating the term.

We are now in a new decade: Open Source is the default and digital liberty is moving to centre stage. OSI has lost some of its relevance, so they decided to reinvigorate the organization with a member-based governance which should include all stakeholders. They now have new affiliates (other open source non-profits like Mozilla or Drupal) and the next stage will be government bodies and non-entities (whatever that might mean). Later they will get personal associates and then corporate patrons. All of this should enable a bottom-up governance. Members will decide how OSI will operate, they will create OSI initiatives, they can use OSI as a policy venue and they will co-ordinate initiatives locally and globally.

A new OSI project will try and help educators educate the world about open source: FLOSSBOK. I am personally not sure the world is waiting for another project like this. There are quite a few alternatives already.

Mozilla Devroom

Tristan Nitot, Principal Mozilla Evangelist kickstarted the Mozilla Devroom. He told us that six European organisations have gotten significant grants from Mozilla (one of them being Fosdem). Mozilla strives to create an Internet that is benefiting everyone. The Internet that is being built currently does not benefit everyone. He focused on a couple of trends on the net:

App Stores have good sides (app discovery and monetization), but also very bad sides: they create vendor lock-in and prevent people from switching platform (I have personally felt this when contemplating switching away from the iOS platform) and occasionally inhibit free speech through “censorship”. Mozilla believes you can get the good of the app stores without the bad.
Social networks have obvious good sides, but they also profile users and prevent users from porting their data to other services, and identity providers can even lock people out of their digital lives. Using Facebook is ok, but don’t use it exclusively to interact with others. When you use something for free, then you can assume that you are the product. He showed us a great cartoon about Facebook users:
The "Free" Model by Geek&Poke

Newer devices (tablets, smartphones and netbooks) are increasingly convenient and popular. Very often they force users to a specific browser (e.g. Chrome on the Chromebook or Safari on iOS), making them by definition the opposite of the web.

What is Mozilla doing about these things:

Open Web Apps are based on open web technologies, cross-browser and available in multiple app stores. You can even host your own apps on your websites for others to install in their browser. WebRT brings this a step further. It is a runtime for web applications that makes web apps look and feel like native apps on multiple platforms. Things like a Media Capture API will really change what is possible to do with Javascript in a browser. Other surprising APIs are the Battery API, the WebNFC (Near Field Communications) API and the Vibration API(!). More documentation is available here
They are trying to solve identity in a decentralized, browser agnostic and privacy respecting way. The codename for the project is BrowserID and it is based on using email addresses to provide identity.
Boot2Gecko (B2G) is a complete operating system built for the open web. Check out the Frequently Asked Questions about the project.

In my book these three projects (especially the last one) make Mozilla a group of absolute heroes. Donate here!

There was an interesting talk about how Mozilla organizes its own IT services. Currently that is done by paid staff, but they strongly believe they can get this done through the community (MediaWiki does something similar).

Kai Engert talked about a very important topic: “Web security, and how to prevent the next DigiNotar”. He has a let’s say “unconventional” presentation style: instead of slides he used a piece of written text that he displayed on the screen and read out loud. Maybe this should be called something like “live visual podcasting”. His points were good though. He explained how it is a problem that every Certificate Authority (CA) has unlimited power and he listed the alternatives. You could maybe use a web of trust like the CAcert community. This still doesn’t solve the problem of a single root key. Another proposed solution was Convergence, which uses notaries that monitor certificates. Kai sees too many problems with this as a solution for general users. Another suggestion is to build on top of DNSSEC. Again that has problems: how do you know who has signed the DNS? Google has also proposed something called Certificate Transparency which might work, but also might create some problems. His proposed solution builds on what is in existence, using the existing CAs combined with the notary system. This talk was a bit dense (I got lost half way if I am honest, obsessively reading Megan Amram), so if you want to read it yourself find it here.

Michelle Thorne is the global event strategist for Mozilla. She is currently very focused on creating communities of “webmakers” and they are starting with children, video makers and journalists first. She presented three tools/projects for these webmakers:

Hackasaurus lets anybody edit the web. Kids are suddenly empowered to remix existing web pages. Check out the hacktivity kit if you want to use this in the classroom.
Popcorn.js is an HTML5 media framework that allows you to connect web content with video.
OpenNews (formerly called knight-mozilla) puts web developers in newsrooms building tools that help with journalistic challenges.

One thing I noticed is that she used htmlpad to present a few slides. I need to check this out as it is probably one of the simplest ways of collaborating around text or getting a quick HTML page online.

The focus for Mozilla in Fosdem is very much on the technology side of things and less on the broader themes that the Mozilla foundation is tackling. I had a hard time finding somebody from the Mozilla Learning team to talk about Open Badges, but did get some good connections to have this conversation later in the year.

Wikiotics

Wikiotics did a very short lightning talk of which I only managed to catch the tail end. Their goal is to make a site that allows anybody to create, update and remix interactive language lessons.

The Pandora

The Pandora is a small Nintendo DS sized open Linux computer designed for gaming. It has an 800×480 touchscreen, wifi, bluetooth, two SDHC card slots, SVideo output, two analogue controllers, a DPad, L/R buttons, a QWERTY thumb keyboard, 256/512MB RAM and 512MB NAND Storage. It has about 10 hours of battery life (full use).

It comes with its own repository (an app store) allowing for easy installation and updating of games and other applications. One thing that will appeal to many people is the number of emulators that it can run. If you want to relive the days you spent on the Amiga 500, Commodore 64, Apple II or the Atari ST it will work for you.

Because the device is so open, the possibilities are limitless. For example, you could connect a keyboard and mouse using a USB hub and connect it to a TV to turn the Pandora into a small desktop PC or connect a USB harddisk and turn it into a web- or fileserver. The price will be €375 (ex VAT). What is great is that the device is produced in Germany and so does not have any sick labour conditions for the people building it.

Balancing Games, The Open Source Way

Jeremy Rosen has been working on Battle for Wesnoth, a turn-based strategy game, since 2004. He talked about how to achieve balance in a game. When you are talking about multiplayer balance:

No match should be decided by the matchup
No match should be decided by the chosen map
The best player should win… usually

Single player balance is different. In a single player game fairness is not important anymore, it is just about having fun:

The AI won’t complain if the game is unfair (Jeremy on the AI: “By the way our AI doesn’t cheat, but is very good in math”)
Players want the game to be challenging
Each player has different capacities, we need to decide who we balance for

Balance problems can occur in many places (e.g. map balance, cross scenario balance, unit characteristics) and aren’t easy to find. One way of finding them is by organizing tournaments as people will do their best to exploit balance weaknesses to win. Balance will always be a moving target and new strategies will appear. User feedback is not so useful because players think they never make mistakes and that all their strategies should work. Sometimes you can find some good providers of feedback: “These persons are important, and like all of us, they are fueled by ego. Don’t forget to fuel them”.

His recommendation is to find somebody in your game’s community who can make balance a full-time job.

Freedom Box: Out of the Box!

Bdale Garbee gave us an update on the activities at the FreedomBox Foundation. According to him it really is a problem that we willfully hand over a lot of personal data to companies to manage on our behalf without thinking much about the consequences. Regardless of their intentions, for-profit companies have to operate within the rules of the jurisdictions in which they operate, and this can lead to things like Photo DNA.

FreedomBox’s vision is to create a personal server running a free software operating system and applications designed to create and preserve personal privacy that should run on cheap, power-efficient plug computers that people can install in their own homes. That will then be a platform on which privacy-respecting federated alternatives to current social networks can be built. These devices will probably be mesh-networked to augment or replace the current infrastructure.

The foundation has to do four things:

Technology
User Experience (this is very important if it is going to be useful for people who are not “geeks”)
Publicity and Fund-Raising
Industry Relations

They have had to bound the challenge by focusing on software, rather than custom hardware and on servers and services rather than client devices. They have also decided to use existing networking infrastructure where appropriate while working to move away from central infrastructure control points (like the Domain Name System (DNS)). Another decision has been to build all elements of their reference implementation on top of Debian, which is a completely open, volunteer-based international organisation. This means that regardless of how successful they will be as a foundation all of their work will survive and remain available. Their goal is that new stable releases of Debian should have everything needed to create FreedomBoxes “out of the box”.

The first “application” they want to deliver is a secure chat service. They have based this on XMPP with Prosody on a single host (by chance I was sitting next to one of the Prosody developers).

They have also decided to use OpenPGP (GnuPG) keys as the root of trust. It is great technology, but it is hard to establish initial trust relationships. One interesting idea is to take advantage of smartphone technology (that we all walk around with) to facilitate initial key exchange (see the work from Stefano Maffulli).

They have done some investigations into plug computers. They focused mostly on the Dreamplug (which gave them quite a bit of GPL related headaches), but you also have the Sheeva and the Tonido.

He finished his talk by quoting Benjamin Franklin:

They who can give up essential liberty to obtain a little temporary safety, deserve neither liberty nor safety.

What I should have written last year: distributed and federated systems

There is an overarching trend at Fosdem that I could already see last year: the idea of decentralised, distributed and federated systems for social networking and collaboration. There is a whole set of people working on creating social networks without a center (e.g. BuddyCloud or Status.net), distributed filesystems (like OpenAFS), alternatives to GoogleDocs (LibreDocs) and mesh networking (like Village Telco with the Mesh Potato). There are even people who are trying to separate cloud storage from the cloud application (Project Unhosted). These are very important projects that have my full attention.

If you have reached this far in the post and still want to read more (with a little bit more of a learning perspective) then you should check out Bert De Coutere’s blogpost. Through him I learned about Open Advice, an interesting approach to capturing lessons learned.

Lak11 Week 3 and 4 (and 5): Semantic Web, Tools and Corporate Use of Analytics

Two weeks ago I visited Learning Technologies 2011 in London (blog post forthcoming). This meant I had less time to write down some thoughts on Lak11. I did manage to read most of the reading materials from the syllabus and did some experimenting with the different tools that are out there. Here are my reflections on week 3 and 4 (and a little bit of 5) of the course.

The Semantic Web and Linked Data

This was the main topic of week three of the course. Basically the semantic web has a couple of characteristics. It tries to separate the presentation of the data and the data itself. It does this by structuring the data which then allows linking up all the data. The technical way that this is done is through so-called RDF-triples: a subject, a predicate and an object.
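
To make the idea of a triple a bit more concrete, here is a minimal sketch in Python. It does not use a real RDF library or real URIs: the `Triple` class, the helper functions and all the identifiers in the example data are made up for illustration. It only shows how statements of the form subject, predicate, object can be stored as plain data and then linked up by following them.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One statement: a subject, a predicate and an object."""
    subject: str
    predicate: str
    obj: str


# Hypothetical example data; real linked data would use URIs as identifiers.
triples = [
    Triple("TimBernersLee", "invented", "WorldWideWeb"),
    Triple("TimBernersLee", "gaveTalkAbout", "LinkedData"),
    Triple("LinkedData", "isPartOf", "SemanticWeb"),
]


def objects_of(subject: str, predicate: str, data: list[Triple]) -> list[str]:
    """Return every object linked to `subject` through `predicate`."""
    return [t.obj for t in data if t.subject == subject and t.predicate == predicate]


def follow(start: str, data: list[Triple]) -> set[str]:
    """Collect everything reachable from `start` by following links."""
    seen: set[str] = set()
    frontier = [start]
    while frontier:
        node = frontier.pop()
        for t in data:
            if t.subject == node and t.obj not in seen:
                seen.add(t.obj)
                frontier.append(t.obj)
    return seen


print(objects_of("TimBernersLee", "invented", triples))
print(follow("TimBernersLee", triples))
```

Because the data is kept separate from any presentation, the same list of triples could feed a table, a graph drawing or a search index without being changed.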

Although he is a better writer than speaker, I still enjoyed this video of Tim Berners-Lee (the inventor of the web) explaining the concept of linked data. His point about the fact that we cannot predict what we are going to make with this technology is well taken: “If we end up only building the things I can imagine, we would have failed”.

[youtube=http://www.youtube.com/watch?v=OM6XIICm_qo]

The benefits of this are easy to see. In the forums there was a lot of discussion around whether the semantic web is feasible and whether it is actually necessary to put effort into it. People seemed to think that putting in a lot of human effort to make something easier to read for machines is turning the world upside down. I actually don’t think that is strictly true. I don’t believe we need strict ontologies, but I do think we could define simpler machine-readable formats and create great interfaces for inputting data into these formats.

Use cases for analytics in corporate learning

Weeks ago Bert De Coutere started creating a set of use cases for analytics in corporate learning. I have been wanting to add some of my own ideas, but wasn’t able to create enough “thinking time” earlier. This week I finally managed to take part in the discussion. Thinking about the problem I noticed that I often found it difficult to make a distinction between learning and improving performance. In the end I decided not to worry about it. I also did not stick to the format: it should be pretty obvious what kind of analytics could deliver these use cases. These are the ideas that I added:

Portfolio management through monitoring search terms
You are responsible for the project management learning portfolio. In the past you mostly worried about “closing skill gaps” through making sure there were enough courses on the topic. In recent years you have switched to making sure the community is healthy and you have switched from developing “just in case” learning interventions towards “just in time” learning interventions. One thing that really helps you in doing your work is the weekly trending questions/topics/problems list you get in your mailbox. It is an ever-changing list of things that have been discussed and searched for recently in the project management space. It wasn’t until you saw this dashboard that you noticed a sharp increase in demand for information about privacy laws in China. Because of it you were able to create a document with some relevant links that you now show as a recommended result when people search for privacy and China.
Social Contextualization of Content
Whenever you look at any piece of content in your company (e.g. a video on the internal YouTube, an office document from a SharePoint site or news article on the intranet), you will not only see the content itself, but you will also see which other people in the company have seen that content, what tags they gave it, which passages they highlighted or annotated and what rating they gave the piece of content. There are easy ways for you to manage which “social context” you want to see. You can limit it to the people in your direct team, in your personal network or to the experts (either as defined by you or by an algorithm). You love the “aggregated highlights view” where you can see a heat map overlay of the important passages of a document. Another great feature is how you can play back chronologically who looked at each URL (seeing how it spread through the organization).
Data enabled meetings
Just before you go into a meeting you open the invite. Below the title of the meeting and the location you see the list of participants of the meeting. Next to each participant you see which other people in your network they have met with before and which people in your network they have emailed with and how recent those engagements have been. This gives you more context for the meeting. You don’t have to ask the vendor anymore whether your company is already using their product in some other part of the business. The list also jogs your memory: often you vaguely remember speaking to somebody but cannot seem to remember when you spoke and what you spoke about. This tool also gives you easy access to notes on and recordings of past conversations.
Automatic “getting-to-know-yous”
About once a week you get an invite created by “The Connector”. It invites you to get to know a person that you haven’t met before and always picks a convenient time to do it. Each time you and the other invitee accept one of these invites you are both surprised that you have never met before as you operate with similar stakeholders, work in similar topics or have similar challenges. In your settings you have given your preference for face to face meetings, so “The Connector” does not bother you with those video-conferencing sessions that other people seem to like so much.
“Train me now!”
You are in the lobby of the head office waiting for your appointment to arrive. She has just texted you that she will be 10 minutes late as she has been delayed by the traffic. You open the “Train me now!” app and tell it you have 8 minutes to spare. The app looks at the required training that is coming up for you, at the expiration dates of your certificates and at your current projects and interests. It also looks at the most popular pieces of learning content in the company and checks to see if any of your peers have recommended something to you (actually it also sees if they have recommended it to somebody else, because the algorithm has learned that this is a useful signal too). It eliminates anything that is longer than 8 minutes, anything that you have looked at before (and haven’t marked as something that could be shown again to you) and anything from a content provider that is on your blacklist. This all happens in a fraction of a second, after which it presents you with a shortlist of videos for you to watch. The fact that you chose the second pick instead of the first is of course something that will get fed back into the system to make an even better recommendation next time.
Using micro formats for CVs
The way that a simple structured data format has been used to capture all CVs in the central HR management system in combination with the API that was put on top of it has allowed a wealth of applications for this structured data.
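
The filtering step in the “Train me now!” use case above is simple enough to sketch in a few lines of Python. Everything here is hypothetical: the `Item` fields, the single `score` number (standing in for all the signals the app would combine, like upcoming required training, popularity and peer recommendations) and the example data are assumptions for illustration, not a description of an existing product.

```python
from dataclasses import dataclass


@dataclass
class Item:
    title: str
    minutes: float
    provider: str
    score: float  # hypothetical relevance score combining all the signals


def shortlist(items, minutes_available, seen, show_again, blocked_providers, size=3):
    """Drop what does not fit, then return the best-scoring items first."""
    candidates = [
        item for item in items
        if item.minutes <= minutes_available                      # fits in the time you have
        and (item.title not in seen or item.title in show_again)  # new, or marked as repeatable
        and item.provider not in blocked_providers                # not on your blacklist
    ]
    return sorted(candidates, key=lambda item: item.score, reverse=True)[:size]


# Made-up catalogue for illustration only.
catalogue = [
    Item("Negotiation basics", 6, "Acme Learning", 0.90),
    Item("Privacy law in China", 12, "Acme Learning", 0.95),
    Item("Safety refresher", 5, "Acme Learning", 0.70),
    Item("Sales pitch tips", 4, "Spammy Co", 0.99),
    Item("Giving feedback", 7, "Acme Learning", 0.80),
]

picks = shortlist(
    catalogue,
    minutes_available=8,
    seen={"Safety refresher", "Giving feedback"},
    show_again={"Safety refresher"},
    blocked_providers={"Spammy Co"},
)
print([item.title for item in picks])
```

The interesting analytics work is of course hidden in how the score gets calculated and how your choices are fed back into it; the filtering itself is the easy part.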

There are three more titles that I wanted to do, but did not have the chance to do yet.

Using external information inside the company
Suggested learning groups to self-organize
Linking performance data to learning excellence

Book: Head First Data Analysis

I have always been intrigued by O’Reilly’s Head First series of books. I don’t know any other publisher who is that explicit about how their books try to implement research based good practices like an informal style, repetition and the use of visuals. So when I encountered Data Analysis in the series I decided to give it a go. I wrote the following review on Goodreads:

The “Head First” series has a refreshing ambition: to create books that help people learn. They try to do this by following a set of evidence-based learning principles. Things like repetition, visual information and practice are all incorporated into the book. This good introduction to data analysis in the end only scratches the surface and was a bit too simplistic for my taste. I liked the refreshers around hypothesis testing, solver optimisation in Excel, simple linear regression, cleaning up data and visualisation. The best thing about the book is how it introduced me to the open source multi-platform statistical package “R”.

Learning impact measurement and KnowledgeAdvisors

The day before Learning Technologies, Bersin and KnowledgeAdvisors organized a seminar about measuring the impact of learning. David Mallon, analyst at Bersin, presented their High-Impact Measurement framework.

Bersin High-Impact Measurement Framework

The thing that I thought was interesting was how the maturity of your measurement strategy is basically a function of how much your learning organization has moved towards performance consulting. How can you measure business impact if your planning and gap analysis isn’t close to the business?

Jeffrey Berk from KnowledgeAdvisors then tried to show how their Metrics that Matter product allows measurement and then dashboarding around all the parts of the Bersin framework. They basically do this by asking participants to fill in surveys after they have attended any kind of learning event. Their name for these surveys is “smart sheets” (a much improved iteration of the familiar “happy sheets”). KnowledgeAdvisors has a complete software as a service based infrastructure for sending out these digital surveys and collating the results. Because they have all this data they can benchmark your scores against yourself or against their other customers (in aggregate of course). They have done all the sensible statistics for you, so you don’t have to filter out the bias on self-reporting or think about cultural differences in the way people respond to these surveys. Another thing you can do is pull in real business data (think things like sales volumes). By doing some fancy regression analysis it is then possible to see what part of the improvement can be attributed with some level of confidence to the learning intervention, allowing you to calculate return on investment (ROI) for the learning programs.

All in all I was quite impressed with the toolset that they can provide and I do think they will probably serve a genuine need for many businesses.

The best question of the day came from Charles Jennings who pointed out to David Mallon that his talk had referred to the increasing importance of learning on the job and informal learning, but that the learning measurement framework only addresses measurement strategies for top-down and formal learning. Why was that the case? Unfortunately I cannot remember Mallon’s answer (which probably does say something about the quality or relevance of it!).

Experimenting with Needlebase, R, Google charts, Gephi and ManyEyes

The first tool that I tried out this week was Needlebase. This tool allows you to create a data model by defining the nodes in the model and their relations. Then you can train it on a web page of your choice to teach it how to scrape the information from the page. Once you have done that Needlebase will go out to collect all the information and will display it in a way that allows you to sort and graph the information. Watch this video to get a better idea of how this works:

[youtube=http://www.youtube.com/watch?v=58Gzlq4zSDk]

I decided to see if I could use Needlebase to get some insights into resources on Delicious that are tagged with the “lak11” tag. Once you understand how it works, it only takes about 10 minutes to create the model and start scraping the page.

I wanted to get answers to the following questions:

Which five users have added the most links and what is the distribution of links over users?
Which twenty links were added the most with a “lak11” tag?
Which twenty links with a “lak11” tag are the most popular on Delicious?
Can the tags be put into a tag cloud based on the frequency of their use?
In which week were the Delicious users the most active when it came to bookmarking “lak11” resources?
Imagine that the answers to the questions above would be all somebody were able to see about this Knowledge and Learning Analytics course. Would they get a relatively balanced idea about the key topics, resources and people related to the course? What are some of the key things that they would miss?

Unfortunately after I had done all the machine learning (and had written the above) I learned that Delicious explicitly blocks Needlebase from accessing the site. I therefore had to switch plans.

The Twapperkeeper service keeps a copy of all the tweets with a particular tag (Twitter itself only gives access to the last two weeks of messages through its search interface). I managed to train Needlebase to scrape all the tweets, the username, URL to user picture and userid of the person adding the tweet, who the tweet was a reply to, the unique ID of the tweet, the longitude and latitude, the client that was used and the date of the tweet.

I had to change my questions too:

Which ten users have added the most tweets and what is the distribution of tweets over users?
This was easy to get and graph with Needlebase itself:

Top 11 Lak11 Twitter Users

I personally like treemaps for this kind of data, so I tried to create one in IBM’s ManyEyes. Unfortunately they seem to have some persistent issues with their site:

ManyEyes error message
Which twenty links were added the most with a “lak11” tag? Another way of asking this would be: which twenty links created the most buzz?
This was a bit harder because Needlebase did not get the links for me. I had to download all the text into a text file and use some regular expressions to get a list of all the URLs in the tweets. 796 of the 967 tweets had a URL (that is more than 80%), 453 of these were unique. I could then do some manipulations in a spreadsheet (sorting, adding and some appending) to come up with a list. Most of these URLs are shortened, so I had to check them online to get their titles. This is the result:
- Elluminate | Session Log-in (23 mentions)
- Learning Analytics Syllabus – Google Docs (17 mentions)
- The # LAK11 Daily (15 mentions)
- Elluminate | Session Log-in (14 mentions)
- Where do we find good critiques of learning analytics? | Learning and Knowledge Analytics (10 mentions)
- MOOC newbie voice “a slackers entrance into lak11″» Dave’s Educational Blog (10 mentions)
- Half an Hour: Why the Semantic Web Will Fail (9 mentions)
- Artifacts of sensemaking | Learning and Knowledge Analytics (9 mentions)
- Help Me Understand The Buzz Around Learning Analytics | FunnyMonkey (9 mentions)
- Social Networks in Action – Learning Networks @ UOW (8 mentions)
- Reflections on Open Courses: Curation, Ombuds, and Concierges | Learning and Knowledge Analytics (7 mentions)
- Course: Learning and Knowledge Analytics (7 mentions)
- YouTube – What is Hadoop? Other big data terms like MapReduce? Cloudera’s CEO talks us through big data trends (6 mentions)
- @Ignatia Webs: #LAK11 a free and open #elearning course on #statistics starts today, join!Ignatia Webs (6 mentions)
- Learning & Knowledge Analytics 2011 (6 mentions)
- for the love of learning: Measurable Outcomes (6 mentions)
- Free project in the clouds for teachers around the world http://web20ineducation2010.ning.com/: applications, e-safety, education, education , social media, tools, web 2.0 | Glogster EDU – 21st century multimedia tool for educators, teachers and students (6 mentions)
- Study Group: Learning Analytics | OpenStudy (6 mentions)
- Course: Learning and Knowledge Analytics (6 mentions)
- http://paper.li/tag/LAK11 (5 mentions)

One problem I noticed is that two of the twenty results were the same URL with different shortened URLs (the link to the Moodle course and to the Paper.li paper): URL shorteners make the web a more difficult place in many ways.
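
For anybody who wants to skip the spreadsheet step, the counting of URLs described above can be sketched in Python. This is not the exact procedure I followed (I used a text file, some regular expressions and a spreadsheet); the regular expression, the function names and the example tweets are assumptions made for illustration.

```python
import re
from collections import Counter

# A deliberately simple pattern: anything from http(s):// up to the next whitespace.
URL_PATTERN = re.compile(r"https?://\S+")


def count_urls(tweets):
    """Count how often each URL is mentioned across all tweets."""
    counts = Counter()
    for text in tweets:
        for url in URL_PATTERN.findall(text):
            # Strip punctuation that often trails a URL at the end of a sentence.
            counts[url.rstrip(".,;:!?)\"'")] += 1
    return counts


def share_with_url(tweets):
    """Return the fraction of tweets that contain at least one URL."""
    if not tweets:
        return 0.0
    return sum(1 for text in tweets if URL_PATTERN.search(text)) / len(tweets)


# Made-up tweets for illustration only.
sample = [
    "Great read http://bit.ly/abc #lak11",
    "RT @someone: http://bit.ly/abc.",
    "No link in this one #lak11",
    "Two links: http://bit.ly/abc and http://j.mp/xyz",
]

print(count_urls(sample).most_common(20))
print(share_with_url(sample))
```

Note that this counts shortened URLs as they appear in the tweets, so it has the same problem as my manual list: two different short links to the same page are counted separately unless you resolve them first.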

What other hashtags are used next to Lak11?
Here I used a similar methodology as for the URLs. In the end I had a list of all the tags with their frequencies. I used Wordle and ManyEyes to put them into tag clouds:

Wordle Lak11 Hashtags

ManyEyes Lak11 Hashtags

Also compare them to tag clouds of the complete texts of the tweets (cleaned up to remove usernames, “RT”, “Lak11”, URLs and the # in front of the hash tags):

Wordle Lak11 Tweets Texts

ManyEyes Lak11 Tweets Texts

Which one do you find more insightful? I personally prefer the latter one as it would give somebody who knows nothing about Lak11 a good flavor of the course.
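
The two preparation steps for these tag clouds (counting the hashtags and cleaning up the tweet texts) could be sketched in Python as follows. The regular expressions and function names are my own assumptions for illustration, and leaving the course tag itself out of the hashtag counts is a choice I made here because the question is about the other hashtags.

```python
import re
from collections import Counter

HASHTAG = re.compile(r"#(\w+)")


def hashtag_counts(tweets, course_tag="lak11"):
    """Count all hashtags (case-insensitively), leaving out the course tag itself."""
    counts = Counter()
    for text in tweets:
        for tag in HASHTAG.findall(text):
            tag = tag.lower()
            if tag != course_tag:
                counts[tag] += 1
    return counts


def clean_text(text, course_tag="lak11"):
    """Remove URLs, usernames, "RT", the course tag and the # signs from a tweet."""
    text = re.sub(r"https?://\S+", " ", text)  # URLs go first, they may contain @ or #
    text = re.sub(r"@\w+:?", " ", text)        # usernames, with an optional trailing colon
    text = re.sub(r"\bRT\b", " ", text)        # the retweet marker
    text = re.sub(rf"#?\b{re.escape(course_tag)}\b", " ", text, flags=re.IGNORECASE)
    text = text.replace("#", "")               # keep the tag words, drop the # in front
    return " ".join(text.split())              # collapse the leftover whitespace


print(hashtag_counts(["#lak11 #analytics", "#Analytics #edtech #LAK11"]))
print(clean_text("RT @alice: Great post on #analytics http://bit.ly/abc #LAK11"))
```

The output of `hashtag_counts` is what you would feed into a tool like Wordle for the first pair of clouds; the cleaned texts are the input for the second pair.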
How are the Tweets distributed over time? Is the traffic increasing with time or decreasing?
I decided to just get a simple list of days with the number of tweets per day. As an exercise I wanted to graph it in R. These are the results:

Tweets per day

I couldn’t learn anything interesting from that one.
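
Getting from a list of tweet dates to a number of tweets per day is a small counting job. I did the graphing in R, so the Python sketch below is not what I actually ran: the timestamp format and the example values are assumptions, and the real export from Twapperkeeper may well use a different format.

```python
from collections import Counter
from datetime import datetime


def tweets_per_day(timestamps, fmt="%Y-%m-%d %H:%M:%S"):
    """Return (day, number of tweets) pairs in chronological order."""
    days = Counter(datetime.strptime(ts, fmt).date() for ts in timestamps)
    return sorted(days.items())


# Made-up timestamps for illustration only.
sample_dates = [
    "2011-01-10 09:15:00",
    "2011-01-10 17:40:12",
    "2011-01-12 08:00:00",
]

for day, number in tweets_per_day(sample_dates):
    print(day, number)
```

Days without any tweets simply do not show up in this list, which is something to keep in mind before drawing conclusions from a graph of it.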
Imagine that the answers to the questions above would be all somebody were able to see about this Knowledge and Learning Analytics course. Would they get a relatively balanced idea about the key topics, resources and people related to the course? What are some of the key things that they would miss? If you would automate getting answers to all these questions (no more manual writing of regex!), would that be useful for learners and facilitators?
I have to say that I was pleasantly surprised by how fruitful the little exercise with getting the top 20 links was. I really do believe that these links capture much of the best materials of the first couple of weeks of the course. If you would use the Wordle as the single image to give a flavour of the course and then point to the 20 URLs and get the names of the top Twitterers, then you would not be off too badly.

Another great resource that I re-encountered in these weeks of the course was Hans Rosling’s Gapminder project:

[youtube=http://www.youtube.com/watch?v=BPt8ElTQMIg]

Google has acquired some part of that technology and thus allows a similar kind of visualization with their spreadsheet data. What makes the visualization smart is the way that it shows three variables (x-axis, y-axis and size of the bubble) and how they change over time. I thought hard about how I could use the Twitter data in this way, but couldn’t find anything sensible. I still wanted to play with the visualization. So at the World Bank’s Open Data Initiative I downloaded data about population size, investment in education and unemployment figures for a set of countries per year (they have a nice iPhone app too). When I loaded that data I got the following result:

Click to be able to play the motion graph

The last tool I installed and took a look at was Gephi. I first used SNAPP on the forums of week and exported that data into an XML based format. I then loaded that in Gephi and could play around a bit:

My participation in numbers

I will have to add up my participation for the two (to three) weeks, so in week 3 and week 4 of the course I did 6 Moodle posts, tweeted 3 times about Lak11, wrote 1 blogpost and saved 49 bookmarks to Diigo.

The hours that I have played with all the different tools mentioned above are not mentioned in my self-measurement. However, I did really enjoy playing with these tools and learned a lot of new things.