Lak11 Week 2: Rise of “Big Data” and Data Scientists

These are my reflections and thoughts on the second week of Learning and Knowledge Analytics (Lak11). These notes are first and foremost meant to cement my own learning experience, so for everybody but me they might feel a bit disjointed.

What was week 2 about?

This week was an introduction to the topic of “big data”. As a result of all the exponential laws in computing, the amount of data that gets generated every single day is growing massively. New methods of dealing with the data deluge have cropped up in computer science. Businesses, governments and scientists are learning how to use the data that is available to their advantage. Some people actually think this will fundamentally change our scientific method (like Chris Anderson in Wired).

Big data: Hadoop

Hadoop is one of these things that I heard a lot about without ever really understanding what it was. This Scoble interview with the CEO of Cloudera made things a lot clearer for me.

[youtube=http://www.youtube.com/watch?v=S9xnYBVqLws]

Here is the short version: Hadoop is a set of open source technologies (it is part of the Apache project) that allows anyone to do large scale distributed computing. The main parts of Hadoop are a distributed filesystem and a software framework for processing large data sets on clusters.
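To make the "software framework for processing large data sets" a bit more concrete, here is a minimal single-machine sketch of the map, shuffle and reduce pattern that Hadoop's processing framework (MapReduce) is built around. This is plain Python, not Hadoop's own API, and the function names are made up for illustration; on a real cluster each phase would be spread over many machines.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Group all emitted values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(documents):
    # On a cluster, each document (or block of a file) would be mapped on a
    # different node; here everything simply runs in one loop.
    pairs = []
    for document in documents:
        pairs.extend(map_phase(document))
    return reduce_phase(shuffle_phase(pairs))
```

Calling word_count(["big data", "Big deal"]) counts "big" twice and the other two words once each. The point of the pattern is that the map and reduce steps are independent per document and per key, which is what makes them easy to distribute.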

The technology is commoditised, imagination is what is needed now

The Hadoop story confirmed for me that this type of computing is already largely commoditised. The interesting problems in big data analytics are probably not technical anymore. What is needed isn’t more computing power; we need more imagination.

The MIT Sloan Management Review article titled Big Data, Analytics and the Path from Insights to Value says as much:

The adoption barriers that organizations face most are managerial and cultural rather than related to data and technology. The leading obstacle to wide-spread analytics adoption is lack of understanding of how to use analytics to improve the business, according to almost four of 10 respondents.

This means that we should start thinking much harder about what things we want to know that we couldn’t get before in a data-starved world. This means we have to start with the questions. From the same article:

Instead, organizations should start in what might seem like the middle of the process, implementing analytics by first defining the insights and questions needed to meet the big business objective and then identifying those pieces of data needed for answers.

I will therefore commit myself to try and formulate some questions that I would like to have answered. I think that Bert De Coutere’s use cases could be an interesting way of approaching this.

This BusinessWeek excerpt from Stephen Baker’s The Numerati gives some insight into where this direction will take us in the next couple of years. It profiles a mathematician at IBM, Haren, who is busy working on algorithms that help IBM match expertise to demand in real time, creating teams of people that would maximise profits. In the example, one of the deep experts takes a ten-minute call while on the ski slopes. By doing that he:

[..] assumes his place in what Haren calls a virtual assembly line. “This is the equivalent of the industrial revolution for white-collar workers,”

Something to look forward to?

Data scientists, what skills are necessary?

This new way of working requires a new skill set. There was some discussion on this topic in the Moodle forums. I liked Drew Conway’s simple perspective: basically, a data scientist needs to be at the intersection of Math & Statistics Knowledge, Substantive Expertise and Hacking Skills. I think that captures it quite well.

Data Science Venn Diagram (by Drew Conway)

How many people do you know who could occupy that space? The How do I become a data scientist? question on Quora also has some very extensive answers.

Connecting connectivism with learning analytics

This week the third edition of the Connectivism and Connective Knowledge course has started too. George Siemens kicked off by posting a Connectivism Glossary.

It struck me that many of the terms that he used there are things that are easily quantifiable with Learning Analytics. Concepts like Amplification, Resonance, Synchronization, Information Diffusion and Influence are all things that could be turned into metrics for assessing the “knowledge health” of an organisation. Would it be an idea to get clearer and more common definitions of these metrics for use in an educational context?

Worries/concerns from the perspective of what technology wants

Probably the most lively discussion in the Moodle forums was around critiques of learning analytics. My main concern for analytics is the kind of feedback loop it introduces once you become public with the analytics. I expressed this in a reference to Goodhart’s law which states that:

Any observed statistical regularity will tend to collapse once pressure is placed upon it for control purposes

George Siemens did a very good job in writing down the main concerns here. I will quote them in full for my future easy reference.

1. It reduces complexity down to numbers, thereby changing what we’re trying to understand
2. It sets the stage for the measurement becoming the target (standardized testing is a great example)
3. The uniqueness of being human (qualia, art, emotions) will be ignored as the focus turns to numbers. As Gombrich states in “The Story of Art”: “The trouble about beauty is that tastes and standards of what is beautiful vary so much”. Even here, we can’t get away from this notion of weighting/valuing/defining/setting standards.
4. We’ll misjudge the balance between what computers do best…and what people do best (I’ve been harping for several years about this distinction as well as for understanding sensemaking through social and technological means).
5. Analytics can be gamed. And they will be.
6. Analytics favour concreteness over accepting ambiguity. Some questions don’t have answers yet.
7. The number/quantitative bias is not capable of anticipating all events (black swans) or even accurately mapping to reality (Long Term Capital Management is a good example of “when quants fail”: http://en.wikipedia.org/wiki/Long-Term_Capital_Management )
8. Analytics serve administrators in organizations well and will influence the type of work that is done by faculty/employees (see this rather disturbing article of the KPI influence in universities in UK: http://www.nybooks.com/articles/archives/2011/jan/13/grim-threat-british-universities/?page=1 )
9. Analytics risk commoditizing learners and faculty – see the discussion on Texas A & M’s use of analytics to quantify faculty economic contributions to the institution: http://www.nybooks.com/articles/archives/2011/jan/13/grim-threat-british-universities/?page=2
10. Ethics and privacy are significant issues. How can we address the value of analytics for individuals and organizations…and the inevitability that some uses of analytics will be borderline unethical?

This type of criticism could be enough for anybody to give up already and turn their back on this field of science. I personally believe that this would be a grave mistake. You would be moving against the strong and steady direction of technology’s tendencies.

SNAPP: Social network analysis

The assignment of the week was to take a look at Social Networks Adapting Pedagogical Practice (better known as SNAPP) and use it on the Moodle forums of the course. Since I had already played with it before I only looked at Dave Cormier‘s video of his experience with the tool:

[youtube=http://www.youtube.com/watch?v=ZHNM8FWrpLk]

SNAPP’s website gives a good overview of some of the things that a tool like this can be used for. Think about finding disconnected or at-risk students, seeing who the key information brokers in the class are, taking “before and after” snapshots of a particular intervention, etc.
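As a rough illustration of what such an analysis boils down to, here is a minimal sketch in plain Python. It is not SNAPP's actual method or code: the function names and the (replier, author) input format are assumptions, and the number of distinct contacts is only a crude stand-in for "information broker" (a proper analysis would more likely use a measure such as betweenness centrality).

```python
def build_reply_network(replies, participants):
    """Build an undirected graph linking each replier to the author replied to.

    `replies` is a list of (replier, author) pairs taken from a forum;
    `participants` lists everybody enrolled, including people who never posted.
    """
    neighbours = {person: set() for person in participants}
    for replier, author in replies:
        if replier == author:
            continue  # replying to yourself creates no connection
        neighbours.setdefault(replier, set()).add(author)
        neighbours.setdefault(author, set()).add(replier)
    return neighbours


def disconnected(neighbours):
    """Participants without a single connection (possibly at-risk students)."""
    return sorted(person for person, links in neighbours.items() if not links)


def most_connected(neighbours, top=3):
    """Rank participants by their number of distinct contacts."""
    ranked = sorted(neighbours.items(), key=lambda item: (-len(item[1]), item[0]))
    return [(person, len(links)) for person, links in ranked[:top]]
```

Running the same functions on the forum data before and after an intervention would give the "before and after" comparison mentioned above.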

Before I was able to use it inside my organisation I needed to make sure that the tool does not send any of the data it scrapes back home to the creators of the software (why wouldn’t it, it is a research project after all). I had an exchange with Lori Lockyer, professor at Wollongong, who assured me that:

SNAPP locally compiles the data in your Moodle discussion forum but it does not send data from the server (where the discussion forum is hosted) to the local machine nor does it send data from the local machine to the server.

Making social networks inside applications (and ultimately inside organisations) more visible to many more people using standard interfaces is a nice future to look forward to. Which LMS is the first to have these types of graphs next to their forum posts? Which LMS will export graphs in some standard format for further processing with tools like Gephi?
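A sketch of what such an export could look like: the snippet below turns forum replies into a weighted edge list in CSV form. The column names Source, Target and Weight are an assumption based on what Gephi's spreadsheet import is generally understood to accept, so check them against the Gephi version you use; the function name and the input format are hypothetical too.

```python
import csv
import io
from collections import Counter


def edges_to_csv(replies):
    """Return CSV text with one weighted row per (replier, author) pair."""
    weights = Counter(replies)  # how often each person replied to each author
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight"])  # assumed Gephi column names
    for (source, target), weight in sorted(weights.items()):
        writer.writerow([source, target, weight])
    return buffer.getvalue()
```

Writing the returned text to a .csv file would give something that can be loaded as an edge table for further processing.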

Gephi is one of the tools by the way, that I really should start to experiment with sooner rather than later.

The intelligent spammability of open online courses: where are the vendors?

One thing that I have been thinking about in relation to these Open Online Courses is how easy it would be for vendors of particular related software products to come and crash the party. The open nature of these courses lends itself to spam I would say.

Doing this in an obnoxious way will ultimately not help you with this critical crowd, but being part of the conversation (Cluetrain anybody?) could be hugely beneficial from a commercial point of view. As a marketeer where else would you find as many people deeply interested in Learning Analytics as in this course? Will these people not be the influencers in this space in the near future?

So where are the vendors? Do you think they are lurking, or am I overstating the opportunity that lies here for them?

My participation in numbers

Every week I give a numerical update about my course participation (I do this in the spirit of the quantified self, as a motivator and because it seems fitting for the topic). This week I bookmarked 37 items on Diigo, wrote 3 Lak11 related tweets, wrote 5 Moodle forum posts and 1 blog post.

The Future State of Capability Building in Organizations: Inspirations

CC-licenced photo by Flickr user kevindooley

I have been involved in organizing a workshop on capability building in organizations hosted on my employer‘s premises (to be held on October 20th). We have tried to get together an interesting group of professionals who will think about the future state of capability building and how to get there. All participants have done a little bit of pre-work by using a single page to answer the following question:

What/who inspires you in your vision/ideas for the future state of capability building in organizations?

Unfortunately I cannot publish the one-pagers (I haven’t asked their permission yet), but I have disaggregated all their input into a list of Delicious links, a YouTube playlist and a GoodReads list (for which your votes are welcome). My input was as follows:

Humanistic design
We don’t understand ourselves well enough. If we did, the world would not be populated with bad design (and everything might look like Disney World). The principles that we use for designing our learning interventions are not derived from a deep understanding of the human mind and its behavioural tendencies; instead they are often based on simplistic and unscientific methodologies. How can we change this? First, everybody should read Christopher Alexander’s A Pattern Language. Next, we can look at Hans Monderman (accessible through the book Traffic) to understand the influence of our surroundings on our behaviour. Then we have to try and understand ourselves better by reading Medina’s Brain Rules (or check out the excellent site) and books on evolutionary psychology (maybe start with Pinker’s How the Mind Works). Finally we must never underestimate what we are capable of. Mitra’s Hole in the Wall experiment is a great reminder of this fact.

Learning theory
The mental model that 99% of the people in this world have for how people learn is still informed by an implied behaviourist learning theory. I like contrasting this with George Siemens’ connectivism and Papert’s constructionism (I love this definition). These theories are actually put into practice (the proof of the pudding is in the eating): Siemens and Stephen Downes (prime sense-maker and a must-read in the educational technology world) have been running multiple massive online distributed courses with fascinating results, whereas Papert’s thinking has inspired the work on Sugarlabs (a spinoff of the One Laptop per Child project).

Open and transparent
Through my work for Moodle I have come to deeply appreciate the free software philosophy. Richard Stallman‘s four freedoms are still relevant in this world of tethered appliances. Closely aligned to this thinking is the hacker mentality currently defended by organizations like the Free Software Foundation, the EFF, Xs4all and Bits of Freedom. Some of the open source work is truly inspirational. My favourite example is the Linux based operating system Ubuntu, which was started by Mark Shuttleworth and built on top of the giant Debian project. “Open” thinking is now spilling over into other domains (e.g. open content and open access). One of the core values in this thinking is transparency. I actually see huge potential for this concept as a business strategy.

Working smarter
Jay Cross knows how to adapt his personal business models on the basis of what technology can deliver. I love his concept of the unbook and think the way that the Internet Time Alliance is set up should enable him to have a sustainable portfolio lifestyle (see The Age of Unreason by the visionary Charles Handy). The people in the Internet Time Alliance keep amplifying each other and keep on tightening their thinking on Informal Learning, now mainly through their work on The Working Smarter Fieldbook.

Games for learning
We are starting to use games to change our lives. “Game mechanics” are showing up in Silicon Valley startups and will enter the mainstream soon too. World Without Oil made me understand that playing a game can truly be a transformational experience and Metal Gear Solid showed me that you can be more engaged with a game than with any other medium. If you are interested to know more I would start by reading Jesse Schell’s wonderful The Art of Game Design, I would keep following Nintendo to be amazed by their creative take on the world and I would follow the work that Jane McGonigal is doing.

The web as a driver of change
Yes, I am a believer. I see that the web is fundamentally changing the way that people work and live together. Clay Shirky’s Here Comes Everybody is the best introduction to this new world that I have found so far. Benkler says that “technology creates feasibility spaces for social practice”. Projects like Wikipedia and Kiva would not be feasible without the current technology. Wired magazine is a great way to keep up with these developments and Kevin Kelly (incidentally one of Wired’s cofounders) is my go-to technology philosopher: Out of Control was an amazingly prescient book and I can’t wait for What Technology Wants to appear in my mailbox.

I would of course be interested in the things that I (we?) have missed. Your thoughts?

New Paradigms for Course Delivery

As I write this I am participating in two exciting courses. Each course is an example of how new paradigms for course delivery are coming to the fore in this online world. I will probably write more about both of them in the near future, but will kick off today with just a simple explanation of both courses.

Rapid eLearning Development
LearningAge Solutions has developed an online course about Rapid eLearning Development. I am a participant in the pilot group: I don’t have a course fee to pay, but have committed myself to giving weekly feedback so that the course can be fine-tuned.

The “Ministry of Instructional Design” (LearningAge Solutions)

Part 3D computer game, part social network, part collaborative learning, the ReD course will teach you how to build effective elearning and informal media using leading elearning author tools.

Designed by Rob Hubbard of LearningAge Solutions with input from some of the smartest people in the elearning industry including Clive Shepherd, Jane Hart and Patrick Dunn. This is a course unlike any other, designed to show how great elearning can be and built using tools that you too can master.

The way that this course is created/structured is smart and inspiring (regardless of the content which is good too). The course is made from a loosely coupled set of (mostly) free online web applications.

The core of the course is a private Ning network which has links to all the other parts of the course. This is the place where participants do reflective blogging and where people hand in their assignments and comment on other people’s assignments.

Mindmeister is used for mindmaps that contain the learning objectives for each module, ClassMarker contains a couple of knowledge checks/assessments, Dimdim delivers the web conferencing functionality and there is a 3D game made with the gaming technology from Thinking Worlds.

To me this type of course design shows that it is not necessary to assume that one single tool should deliver the full learner experience. It is perfectly viable to use a collection of tools and use each for its strengths. Once I have finished the course I will post a bit more about my experiences.

Connectivism and Connective Knowledge

This is the second year that George Siemens and Stephen Downes (actually my two favourite learning gurus) organise the “rather large open online course” Connectivism and Connective Knowledge. It is their attempt to destabilise the concept of a course.

The course is open to anyone. You attend freely if you do not need any university course credits, or you pay if you do. The course is decentralised (or maybe “loosely federated” is a better word): the two facilitators set out reading materials and organise a couple of webcasts every week, but the meat of the course is to be found in the discussions that participants have (online in Moodle forums) and the reflections that participants post on their blogs.

A single tag, CCK09, is used by all participants for their posts. This pulls all the course activity together and makes it easy to find course related postings (e.g. on Twitter or in the blogosphere). By connecting to people with similar interests, it is possible to go on a tangent and explore the things that you want to work on in relation to connectivism and connective knowledge.

A daily newsletter is sent out. This is an edited version of the aggregated posts and discussions and includes commentary by Stephen Downes. Just reading the newsletter is already incredibly valuable.

I tried to actively participate in this course last year, but was not able to keep up with it. It requires a lot of discipline to study this way: there is no passive consumption of information. Instead it requires a lot of effort to select what you want to read and post your reflections. I hope I will be able to do better this year (although things are already not looking good right now for that to be the case)!

What on Earth is RSS Cloud?

Arjen Vrielink and I write a monthly series titled: Parallax. We both agree on a title for the post and on some other arbitrary restrictions to induce our creative process. For this post we agreed to write about a new technology using Linux Format‘s “What on Earth is …?” style (see example on Android). We did not agree on a particular technology and we would get bonus points for a nice pixellated image to accompany the post. You can read Arjen’s post with the same title here.

RSS Cloud? I am getting a bit tired of this cloud computing trend.
Yes, I also think that cloud computing is slightly over hyped. However RSS Cloud is not about cloud computing. It is about bringing real-time updates to the RSS protocol.

I have only just grasped what RSS is. Only the technorati seem to use it, normal computer users have no idea.
Indeed: most people have no idea what RSS is or how they can use it. They still visit all their favourite news sites one after the other to check whether something new has been posted. However even people that don’t understand it often use it. If you download podcasts through iTunes you are using RSS technology. Furthermore RSS is the technological glue for many of the popular mashup sites. You don’t need to understand a technology for it to be useful to you.

Fair enough, so how would you explain RSS Cloud to a lay person?
Sites that have content that changes often (think blogs or news sites) publish an RSS feed on their server. Whenever a new item is posted it will be added to the feed, usually dropping the oldest item from the list at the same time. If you are interested in those news items you can use a news reader (also called an aggregator) and tell this news reader to check whether new items have been added to the feed. If there is an update, the news reader can retrieve it. A news reader typically does this every fifteen minutes or so. This means the news can be 15 minutes old when you get it. RSS Cloud makes it possible for news readers to subscribe to the updates of a feed. Whenever something new is added to the feed, the RSS Cloud server notifies all subscribers so that they can pick up the content immediately: in real-time.
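The difference between the two models can be sketched in a few lines of Python. This is a toy in-memory model, not the RSS Cloud protocol itself: the class names are invented, and the list of callbacks on the feed merely stands in for the RSS Cloud server that would notify readers over the network.

```python
class Feed:
    """A feed that keeps only its newest few items, like a typical RSS feed."""

    def __init__(self, max_items=10):
        self.items = []
        self.max_items = max_items
        self.subscribers = []  # stand-in for the RSS Cloud server's subscriber list

    def subscribe(self, callback):
        self.subscribers.append(callback)

    def publish(self, item):
        self.items.append(item)
        self.items = self.items[-self.max_items:]  # the oldest item drops off
        for notify in self.subscribers:
            notify(self)  # push: subscribers hear about the update right away


class Reader:
    """A news reader that remembers which items it has already retrieved."""

    def __init__(self):
        self.seen = []

    def fetch(self, feed):
        """Retrieve unseen items; used for a scheduled poll and for a push."""
        new = [item for item in feed.items if item not in self.seen]
        self.seen.extend(new)
        return new


feed = Feed()
polling_reader, pushed_reader = Reader(), Reader()
feed.subscribe(pushed_reader.fetch)
feed.publish("story 1")   # pushed_reader has the story immediately
polling_reader.fetch(feed)  # polling_reader only gets it at its next poll
```

Note that the notification in this sketch only says "something changed"; the reader still retrieves the feed itself, which matches the description above of subscribers being notified "so that they can pick up the content".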

Another buzz word! What is the benefit of real-time? Can’t people just wait a couple of minutes before they get their news?
People listen to the radio so that they can hear the sports results in real-time. Weren’t you upset when all your friends knew about Michael Jackson’s death earlier than you, because they heard it on Twitter? The success of Twitter search and trending topics shows that people want to know about stuff as it happens and not fifteen minutes later.

Now that you mention it: Twitter indeed works in real-time. Why do we need something else, what’s wrong with Twitter?
Twitter actually also uses a “polling” model for its content. Each single Twitter client will have to access the Twitter API to see whether something new has been posted by the people you are following. This is a huge waste of computer resources. All these clients asking for new information even if there is none. It is a model that does not scale well. A “push” model actually works much better in this respect.
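The scaling argument can be made concrete with some back-of-the-envelope arithmetic. The numbers used below (1,000 clients, a 15-minute polling interval, 5 updates a day) are purely illustrative assumptions, not measurements of Twitter or of any real service.

```python
def polling_requests_per_day(clients, poll_interval_minutes):
    """Every client asks for news on a fixed schedule, update or no update."""
    polls_per_client = (24 * 60) // poll_interval_minutes
    return clients * polls_per_client


def push_notifications_per_day(subscribers, updates_per_day):
    """With push, traffic is only generated when there actually is an update."""
    return subscribers * updates_per_day


# Illustrative numbers only: 1,000 clients polling every 15 minutes
# versus 1,000 subscribers being notified of 5 updates a day.
polling = polling_requests_per_day(1000, 15)   # 96 polls per client per day
pushing = push_notifications_per_day(1000, 5)
```

With these assumed numbers polling produces 96,000 requests a day against 5,000 notifications for push. Even if every notification is followed by a request to retrieve the new content, push stays well below the polling figure as long as updates are less frequent than polls.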

Oh, so it is a bit like the difference between getting your email once every couple of minutes and getting it immediately on your Blackberry?
Yes, that is a nice analogy. The Blackberry uses push email. You get the email as soon as it hits the server, because it is pushed to your phone. Traditional email clients, like Outlook, go to the server once every couple of minutes to see whether something new is there.

So what large company is trying to push this idea?
This time it is not a big company trying to establish a standard or protocol. The RSS Cloud protocol is designed by Dave Winer who also drafted the original RSS specification.

Dave Winer, isn’t that the guy that loves to rub people the wrong way?
He is a controversial character and is certainly very vocal and opinionated. At the same time, he is a true pioneer and one of those people that embody the values of the Internet. His vision for Cloud RSS is not about blogging. Instead, he wants to provide a decentralised architecture for microblog messages. To him the fact that Twitter centralises all the microblogging activity is a real vulnerability. His goal is to create a network that can work alongside Twitter without being in the control of a single company.

Talking about companies. I suddenly remember hearing about a similar technology. One of these cute names with many vowels?
You probably mean PubSubHubbub. This is a Google sponsored protocol that has already been implemented in Google Reader.

Great: another standards war. VHS versus Betamax, RSS versus Atom, Britney versus Whitney. Will we never learn?
This shouldn’t become a problem. RSS and Atom for example live happily next to each other now. It is easy to implement both. PubSubHubbub has a slightly different goal in comparison to Cloud RSS. It focuses mainly on blogging and associates itself with FeedBurner. The two technologies should be able to live next to each other, at least that is what Dave says.

Well, let’s hope he and you are right. By the way, isn’t this Cloud RSS just another sneaky way to measure subscribers, generate some statistics and store information about where they are from and what they are doing?
It is true that an RSS reader will have to register itself with the RSS cloud for the protocol to work. However the RSS cloud forgets about the RSS reader if the registration isn’t renewed every 24 hours. You also have to remember that many people will use readers that do not support RSS Cloud. There are much better ways to get statistics.
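The "forgetting" behaviour described here is easy to picture as a registry with an expiry time. The sketch below is a hypothetical in-memory model, not code from any RSS Cloud implementation: the class name, the method names and the example URLs are all assumptions; only the 24-hour window comes from the text above.

```python
REGISTRATION_TTL_SECONDS = 24 * 60 * 60  # the 24-hour renewal window


class SubscriberRegistry:
    """Keeps track of readers that asked to be notified of feed updates."""

    def __init__(self, ttl=REGISTRATION_TTL_SECONDS):
        self.ttl = ttl
        self._registered_at = {}  # reader URL -> time of last (re)registration

    def register(self, reader_url, now):
        """Register or renew a reader; `now` is a timestamp in seconds."""
        self._registered_at[reader_url] = now

    def active(self, now):
        """Forget readers whose registration lapsed and return the rest."""
        self._registered_at = {
            url: registered
            for url, registered in self._registered_at.items()
            if now - registered < self.ttl
        }
        return sorted(self._registered_at)
```

Because lapsed entries are simply dropped, the registry holds no history of who subscribed in the past, which is why it makes for a poor statistics tool.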

Aren’t you a learning technology person? What does this have to do with learning?
I am very interested in Cloud RSS because I am a learning technologist! Like all new Internet based technologies it will only be a matter of time before some smart developer finds a way of using this in some unexpected fashion. Remember: technology creates feasibility spaces for social practice! Just think of what kind of course delivery models RSS has made possible: the Connectivism and Connective Knowledge course could not run without it for example.

You are a Moodle evangelist. Does Moodle support RSS Cloud yet?
I haven’t checked, but I doubt it. It is very new and the Moodle developers are focusing on getting Moodle 2.0 to a beta release. However, I am sure that in the future, parts of Moodle will move towards real-time. Imagine how Cloud RSS could be used to create activity streams or notify people of comments on their work. It could effectively bridge the gap between asynchronous activities like discussion forums and assignments and synchronous activities like web conferencing.

Ok, you have managed to pique my interest. Where can I go if I want to start using it?
There are two ways of using it. First, you can make your own feeds RSS Cloud enabled. If you have a blog at WordPress.com this is automatically the case. You can opt-in if you host your own WordPress blog. The other way of using it would be to have an RSS reader that supports the protocol. Currently only River2 supports it and Lazyfeed has announced that it will support it too. Only web based readers can support it, as the RSS Cloud server needs to be able to ping the reader with the update.

Are there any sites that can tell me a bit more?
The current home of the protocol is http://www.rsscloud.org. Here you will find news about the protocol and an implementation guide. The Wikipedia entry could be better. Why don’t you help fix it?

Online Educa Berlin 2008: Language

Language is still our prime tool for learning. I find language a fascinating subject and noticed a couple of things about language during the Online Educa.

First, Jay Cross. He was a panelist during the Battle of the Bloggers session. One topic they discussed was the financial crisis and how it could affect our profession. Jay said that if you are currently a Director of Training it would probably be smart to change your job title to something like Director of Sales Readiness (“we can’t let the director of sales readiness go…”). I think he is right. Language changes perception and a change in what you call something can significantly alter people’s behaviour. This is also the reason why I don’t like to use the Dutch word “allochtoon”: I think it has an unnecessary connotation of exclusiveness and us versus them.

Jay was very insightful about the other topics too, so I decided to go to the front desk and buy his book Informal Learning: Rediscovering the Natural Pathways That Inspire Innovation and Performance. I like how he has consciously put “performance” in the title of his book. That way he instantly disarms any suggestion that informal learning is just a pet topic for educational scientists. Instead, it directly addresses the issue that is central in the corporate world: “executives don’t want learning; they want execution. They want the job done. They want performance.”

The ability to adapt your language to the language of the client is one of the skills that any good consultant should have. Ton Zijlstra had an interesting take on this. We met at an Edublog dinner and one of the things we talked about was how he uses del.icio.us to find people who bookmark the same sites as he does, but who do this using different tags. If they use different tags for the same concepts it means they are in a different community or network. That is interesting, because they could be the starting point for a whole set of new connections.
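The idea of finding people who bookmark the same sites under different tags can be sketched as follows. This works on made-up in-memory data and does not use any del.icio.us API; the function name and the data layout (a dictionary mapping each URL to a set of tags) are assumptions chosen for illustration.

```python
def same_sites_different_tags(my_bookmarks, others_bookmarks):
    """Find users who saved the same URLs as me without sharing any of my tags.

    `my_bookmarks` maps URL -> set of tags; `others_bookmarks` maps
    user name -> a dictionary of the same shape.
    """
    matches = {}
    for user, bookmarks in others_bookmarks.items():
        shared = [
            url
            for url, tags in bookmarks.items()
            # same site, but no overlap at all between their tags and mine
            if url in my_bookmarks and not (tags & my_bookmarks[url])
        ]
        if shared:
            matches[user] = sorted(shared)
    return matches
```

Users that show up in the result describe the same resources in a different vocabulary, which is the signal for a different community or network.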