Archive

Archive for October, 2009

Machine Learning – Day 5

October 28th, 2009

So that’s the end of the taught course in Machine Learning, finishing up learning about Markov Chains and Hidden Markov Models.

Yep, those are just links to the Wikipedia articles, and it’s quite possible that if you clicked on them and you’re anything like me, the crazy-looking maths stuff at the other end made the sensible part of your brain run off and sit in a corner, clutching its knees and rocking gently back and forth.

Probably muttering to itself.

To be honest, I can’t really explain what this stuff is about just yet – I’ve had a lot crammed into my head over the past few weeks, and I think I need a really good night’s sleep before I can comprehend the deeper aspects of this last bit. Suffice to say for now that it seems like some really interesting and powerful ideas are in play, and when I’ve got my head round it I’ll blog up my thoughts.

I’ve now got one more homework assignment on today’s material to complete by next Wednesday, and the project we’ve been assigned to do is then due on Friday 6th November – a nice surprise, as a typo on the schedule had us believe it was due on the previous Tuesday.

I’m sorry the taught part of the course is done, to be honest. Although I’m not sure I could have taken any more at the pace it was being taught, I’ve thoroughly enjoyed the material.

In fact, I’d say I feel a little inspired.

And, as James Brown might say – it feels good.

Paul Brabban Computer Science, MSc, Machine Learning

Machine Learning – Day 4

October 22nd, 2009

Day 4 covered methods of automatically identifying clusters in data – and some of the issues that arise using those techniques.

Doing this automatic identification is called unsupervised learning, because it doesn’t depend on having a set of labelled data examples to hand. The learning is done purely based on the statistical and probabilistic properties of the data itself.

I got to say, I’m struggling with the probablistic side of things – my intuition isn’t helping me much, so I’ve been doing the books thing to try and really get my head round it. So little time…

We also covered some techniques involving reducing the dimensionality of data – say a dataset has a thousand properties, and the computational overhead of processing increases with the number of properties. You’ll need some way of reducing the number of properties, whilst retaining the maximal information they encoded. We were looking at selecting features a couple of weeks ago, but today we looked at PCA – Principle Component Analysis, a technique to ‘project’ information into a smaller number of dimensions, or properties.

I quite like this paper on PCA, if you’re looking for somewhere to get an idea what it’s about.

And that, if you were reading this blog a few weeks ago, is where the eigenvalues and eigenvectors come in.

We also have a project to complete in the next couple of weeks, so time is very much of the essence right now. I suspect that as with the Semi-Structured Data and the Web course last year, the deeper concepts behind some of this material will only become clear to me when I’ve completed the set material and start to work on the revision of what we’ve covered.

Back in the day, revision was time off between lessons and exams – these days, not so much!

Paul Brabban MSc, Machine Learning

about:robots

October 18th, 2009

Welcome to my shortest blog post to date.

If you’re a Firefox user and you’ve not yet typed about:robots into your address bar, you’re missing out on an Easter Egg.

That concludes today’s public information announcement.

Cheers!

Paul Brabban Uncategorized