Setting up my Project Website
One of the assessed deliverables for my MSc project is a project website, so I’ve been having a bit of a setup session this weekend.
The objectives set for the website are a little… what’s the word… vague? See what you think:
A multipage website summarizing the work so far.
- Objectives
- Deliverables
- Plan
- Literature
That’s it as far as I can tell. Exactly how will the delivered work be assessed? Your guess is probably about as good as mine. Having looked at the discussion forum for the module (the full-timers did this in the first half of the year; as I’m not a full-time student, I’ve been told I set my own deadlines when it comes to the project stuff), it seems that the marking scheme was quite severe, with many complaints about low marks and little evident explanation. I’ll make some enquiries before I start work on the content proper.
Back in April, I asked how the website deliverable should be ‘handed in’ and was told that a zip with some files in it would be fine.
Screw that.
I mean, seriously – the world has moved on. To be even vaguely interesting, I’m thinking about reusing relevant content from this blog, and some of the tooling I’m using, like GanttProject, saves XML data that’s crying out for some transformation and JavaScript magic. I have my own domain name and there’s an opportunity here to learn some stuff about infrastructure (and I am doing this MSc. to learn stuff in the first place), so I’ve been setting up a server. Again, checking back on the forums, some of the other students went the same route and there’s no evidence of it harming their chances. I think hosting the project website as a subdomain of crossedstreams.com makes sense: I already own the domain name, and subdomains are just extra DNS records, which are dead easy to set up with my provider, getNetPortal.
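As a rough idea of the kind of transformation that XML invites, here is a minimal Python sketch that flattens task elements into JSON that a page's JavaScript could fetch and render. The sample document is invented, and its element and attribute names (`task`, `id`, `name`, `start`, `duration`) are assumed to resemble GanttProject's `.gan` format; `tasks_to_records` is a hypothetical helper, so check the names against a real project file before relying on it.

```python
import json
import xml.etree.ElementTree as ET

# Invented sample, ASSUMED to resemble a GanttProject .gan file: <task> elements
# with id, name, start (ISO date) and duration (days) attributes; sub-tasks nest.
SAMPLE_GAN = """<project name="MSc project">
  <tasks>
    <task id="0" name="Set up hosting" start="2011-07-02" duration="2"/>
    <task id="1" name="Write project plan" start="2011-07-04" duration="5">
      <task id="2" name="Draft Gantt chart" start="2011-07-04" duration="1"/>
    </task>
  </tasks>
</project>"""


def tasks_to_records(xml_text):
    """Flatten every <task> element, including nested sub-tasks, into a list of dicts."""
    root = ET.fromstring(xml_text)
    records = []
    # iter() walks the whole tree in document order, so nested tasks are included.
    for task in root.iter("task"):
        records.append({
            "id": int(task.get("id")),
            "name": task.get("name"),
            "start": task.get("start"),
            "durationDays": int(task.get("duration")),
        })
    return records


if __name__ == "__main__":
    # JSON that client-side JavaScript could load and draw as a timeline.
    print(json.dumps(tasks_to_records(SAMPLE_GAN), indent=2))
```

Producing plain JSON keeps the server side trivial; all the presentation work can then live in the browser.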
I shan’t be hosting my site on getNetPortal though. As I spend most of my professional life working on the Java EE platform, Java is the obvious choice. Why not use a different language for the experience? Whilst I’ve got the time to learn a bit about hosting a public-facing website, I’m not sure I’ll have the time to learn a new way of creating websites that I’ll be happy with… not to mention that there’s a toolset and delivery pipeline that varies from platform to platform. Playing about with Erlang or some such will have to wait for another day.
GetNetPortal do host Java web applications, but it’s a shared Tomcat environment with a bunch of limitations, as well as, apparently, a risk to other people’s app availability if I deploy more than three times in a day. So where else can I go? Other specialised hosting companies are out there, but they’re not exactly cheap…
So I’ve provisioned myself a server on Amazon’s Elastic Compute Cloud (Amazon EC2). Amazon provide a bunch of images themselves, and one of them happens to be a Linux-based 64-bit Tomcat 7 server. Time between me finding the image I wanted and having a working server available? About five minutes. No matter how you cut it, that’s pretty awesome. To be honest, the biggest challenge was choosing an image: there’s a huge number to choose from, and I tried a couple of other images that weren’t as well set up before settling on the Amazon-provided one. The best thing? EC2 is pay-as-you-go, at dirt-cheap rates for low utilisation.
For those of you who haven’t seen EC2, here are a couple of screenshots that might help explain what it’s all about. First up, let’s take a look at the application server I provisioned.
Checking my bill tonight, I can see an itemised account of exactly what I’ve been billed for. Being able to see this level of detail should let me stay in control of what I’m spending.
The rest of my time has been spent having a look around my new server and setting up:
- Tomcat, to host a placeholder app in the root context.
- iptables, to redirect traffic from the privileged ports 80 and 443 to the ports Tomcat is listening on (8080 and 8443). This avoids the need to install a dedicated webserver or run Tomcat with root privileges.
- Some self-signed SSL certificates. I’ll need those so that I can bring up apps that require logon; without SSL, those usernames and passwords would be floating around the internetz in clear, negating the point of their existence.
- Finally, a script for the whole setup process, in case I need to set this stuff up again.
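One common way to do the port redirection is a `REDIRECT` rule in the `nat` table's `PREROUTING` chain. Below is a minimal Python sketch of the scripting idea that only builds the commands and prints them for review, rather than running them. The 80 to 8080 and 443 to 8443 mapping comes from the setup described above; `PORT_MAP` and `redirect_rules` are hypothetical names.

```python
import shlex

# Public privileged port -> port Tomcat listens on (mapping from the setup above).
PORT_MAP = {80: 8080, 443: 8443}


def redirect_rules(port_map):
    """Build one iptables NAT REDIRECT command per port pair, as argument lists."""
    rules = []
    for public, tomcat in sorted(port_map.items()):
        rules.append([
            "iptables", "-t", "nat", "-A", "PREROUTING",
            "-p", "tcp", "--dport", str(public),
            "-j", "REDIRECT", "--to-ports", str(tomcat),
        ])
    return rules


if __name__ == "__main__":
    # Print rather than execute, so the commands can be checked before running as root.
    for rule in redirect_rules(PORT_MAP):
        print(shlex.join(rule))
```

`PREROUTING` only sees packets arriving from the network, so requests made from the server itself would need an equivalent rule in the `nat` table's `OUTPUT` chain. The rules also live only in the running kernel, so they need saving in whatever way the distribution provides or they disappear on reboot.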
Now, I can tick off the project tasks around setting up hosting nice and early. Quite a productive weekend!
In: Development, MSc, Project · Tagged with: mscproject
Planning my Project
It’s been a bit quiet on crossedstreams.com for the past month or so. Between lots of great stuff going on at work keeping me very busy, some Stag Do-related shenanigans and working on my project, there hasn’t been much time for blogging.
To finish my MSc, I need to complete a project and produce a dissertation. In addition, there is a prerequisite module that sets up the project, requiring the submission of a project statement, a project plan, a project website and a project background report. It’s these aspects I have been working on.
My choice to prepare my own project, involving what I do for a day job, adds complexity: there are extra hoops to jump through, and they happen to take a fair bit of time and effort. With any luck, those hurdles are nearly cleared now and the actual work can kick off properly.
In: MSc, Project · Tagged with: mscproject
Ubuntu, Fedora or Mint?
About a month ago after I finished my last module, I upgraded to the latest Ubuntu release, 11.04 or ‘Natty Narwhal’. My first impressions over the course of a week or two were sufficient to have me go looking elsewhere.
There were some big problems.
Ubuntu 11.04
The new Unity interface, whilst it’s very pretty, is totally unfamiliar and feels rather like a toy. The menus I used to start applications from are gone; the taskbar I used to see what was running and place shortcuts on is gone. Now to start a program there’s a glossy, full screen… thing… it’s a bit like a menu… but takes up the whole screen with big Fisher-Price icons. To see what’s running at a glance… I can’t. And the idea that a window’s title bar, with its buttons and menus, isn’t attached to the window but appears at the top of the screen… seriously? I hear that this idea is nicked from Apple, but it really doesn’t work for me.
I guess the idea is that you type the name of the application instead of finding it in the menus. Nicked from Windows 7, I think. If I want to find and launch applications by typing their names, I use the command line – I’m not sure I get how search instead of menus is a step forward.
Then there was the speed, or rather, the total lack thereof. Using my computer went from effortless to wading through treacle. In snowshoes. I notice performance tips and tweaks guides for 11.04 starting to appear out there, so it’s not just me. The poor performance was the dealbreaker.
Fedora 15
I downloaded Fedora 15, having previously been a user of that distro. I know that 15 ships with Gnome 3, but I didn’t realise it would be so similar to Unity, with all the same bizarre UI quirks. On the bright side, it was a lot snappier… but all in, still not really usable.
Mint
So yesterday, I pulled Linux Mint 11 off the shelf and I’m happy to say that it is a joy to use. Menus, task bars, windows that work properly, fast, easy to set up. Back to business as usual. If you’re not loving the Gnome 3/Unity thing, I can recommend Mint (so far, based on 24h usage… mileage may vary!)
Serious or Casual?
With my immediate problems addressed, the direction that Gnome and Unity are taking for Linux is interesting. Are we seeing Linux desktop environments fragment into serious and casual use cases? I can see how the new UI might be familiar and easy for someone who is used to their tablet or their smartphone. Maybe it’s also a good angle for relatively small-screen devices like netbooks and tablets; certainly the apparent ‘every pixel is precious’ mindset doesn’t make much sense on a big widescreen monitor.
I expect that broadening the appeal of an operating system is a good thing, and perhaps Ubuntu and Fedora are setting their stalls out as ‘for the casual user’. If that’s so, then thank goodness for distros like Mint that give folks who use their computers to do work the power of old(er) school Linux without the pain.
Essays on the State of the Art and Future of Text Mining
The coursework for this Text Mining module has been quite challenging. Each week we had a task to complete, along the lines of evaluating the training of a part-of-speech tagger (a piece of software that tries to tag words with the part of speech they serve) or creating a named entity recogniser (a piece of software that tries to work out that some sequences of words have meaning above their component parts; for example, “New York” means something different to “new” and “York”) using various methods. As I’ve worked through the tasks, though, the goals have become clear: we were building up components that could work in sequence to process text. Neat.
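To show how such components can chain together, here is a toy Python pipeline: tokenise, tag each word from a lookup table, then recognise multi-word names with a longest-match gazetteer. The lexicon, gazetteer and function names are invented for illustration (the tags are Penn Treebank-style); the trained taggers and recognisers built in the coursework would replace both lookups.

```python
import re

# Toy resources, invented for illustration. A real tagger is trained on an
# annotated corpus; a real recogniser uses much more than a gazetteer lookup.
LEXICON = {"i": "PRP", "flew": "VBD", "to": "TO", "new": "JJ",
           "york": "NNP", "on": "IN", "friday": "NNP"}
GAZETTEER = {("new", "york"): "LOCATION"}


def tokenize(text):
    """Split text into word tokens, dropping punctuation."""
    return re.findall(r"[A-Za-z]+", text)


def tag(tokens):
    """Lookup tagger: each token gets its lexicon tag, defaulting to NN if unknown."""
    return [(tok, LEXICON.get(tok.lower(), "NN")) for tok in tokens]


def recognise(tagged, max_len=3):
    """Gazetteer recogniser: at each position, take the longest known name."""
    words = [w.lower() for w, _ in tagged]
    entities, i = [], 0
    while i < len(words):
        for n in range(min(max_len, len(words) - i), 0, -1):
            key = tuple(words[i:i + n])
            if key in GAZETTEER:
                span = " ".join(w for w, _ in tagged[i:i + n])
                entities.append((span, GAZETTEER[key]))
                i += n
                break
        else:
            i += 1  # no entity starts at this token; move on
    return entities


def pipeline(text):
    """Each stage consumes the previous stage's output."""
    tagged = tag(tokenize(text))
    return tagged, recognise(tagged)


if __name__ == "__main__":
    tagged, entities = pipeline("I flew to New York on Friday.")
    print(tagged)
    print(entities)
```

The tagger on its own labels “New” as an adjective (`JJ`); it is the later recogniser stage that treats “New York” as a single unit, which is the “meaning above the component parts” idea.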
One unusual aspect of the coursework is that it’s all handed in together at the end, rather than week by week. If I’m honest, it’d probably have been a little easier if I’d done the coursework in step with the lecture days; I actually fell a little behind because of various commitments.
Then there was the essay. A 3,000 word essay on the state of the art of text mining and my views for the future of the field.
I’ve not written an essay for at least 15 years now, and getting started was a real challenge. Text mining and the Semantic Web, maybe? Sentiment analysis is the future? I was pulling my hair out, trying to find an angle that I could argue cleanly through, citing academic research and the like. I’ve been screwing up outlines on bits of paper for about a week now!
That said, when I headed into Manchester yesterday and sat in my lectures, I had something of an epiphany. I guess the problem was that I feel the field has huge untapped potential, and I struggle to argue through a point of view I care about when I can’t see the current approaches panning out. I’m going to take a bit of a risk, and write an essay that (constructively) criticises some aspects of text mining today, proposing and arguing through a slightly different approach.
We’ll see how it goes – the last few bits of paper have so far avoided a one-way ticket to the bin. Hopefully I can produce a well-argued, reasonably interesting essay that I’ll get some marks for!
Text Mining – Day 4
Between prep for my MSc. project, getting married, being snowed under at work, starting my next MSc. module and being full of cold, there hasn’t been much time for blogging…
So today was day 4 of the Text Mining module. As a friend put it, “Text Mining? What – like using grep?”
Text Mining is defined as finding previously unknown information in unstructured data. Unknown – as in never explicitly written down.
So by ‘text’, we mean unstructured or partially structured data, like Word documents or this blog page. There’s some structure here (headings, subheadings, lists and the like), but it’s not ‘structured’ in the sense that database tables are, with rows, columns and a type system.
Tools like grep match words (more generally, relatively simple patterns of characters described by regular expressions), so whilst they’re fairly easy to use (as long as you don’t try to push them too far), they are limited in the complexity of what they can do.
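To make that limit concrete, here is a grep-like whole-word search in Python, roughly what `grep -wi fish` does, over three invented sentences. It only sees characters: it matches the noun in one document and the verb in another, and skips “fishing” altogether because that is a different string. `DOCS`, `PATTERN` and `grep_like` are hypothetical names.

```python
import re

# Invented example documents.
DOCS = {
    "aquarium": "A fish swam past the glass.",   # fish as a noun
    "hobby": "On Sundays I fish at the canal.",  # fish as a verb
    "trip": "We went fishing last weekend.",     # a different word form entirely
}

# Whole-word, case-insensitive match: a character pattern with no notion of grammar.
PATTERN = re.compile(r"\bfish\b", re.IGNORECASE)


def grep_like(docs, pattern):
    """Return the sorted names of documents containing at least one match."""
    return sorted(name for name, text in docs.items() if pattern.search(text))


if __name__ == "__main__":
    print(grep_like(DOCS, PATTERN))
```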
For example, you can’t easily use grammatical ideas, like identifying documents that are about fish (a fish), but not fishing (I fish). You can’t search for documents related to a concept, and recognising generic names or technical terms is out. You can’t build structures like indices to help with searches, which means that over reasonably large collections of documents, grep is too slow to be very useful.
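The index point is where much of the speed difference comes from: grep rereads every file for every query, whereas an inverted index is built once and maps each word to the documents containing it, so a query only touches the relevant entries. Below is a minimal in-memory sketch with made-up documents and hypothetical `build_index`/`search` helpers; real search engines add stemming, compression and ranking on top.

```python
import re
from collections import defaultdict


def build_index(docs):
    """Map each lower-cased word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in re.findall(r"[a-z]+", text.lower()):
            index[word].add(doc_id)
    return index


def search(index, *words):
    """Return sorted ids of documents containing every query word."""
    # .get() avoids inserting empty entries into the defaultdict for unknown words.
    postings = [index.get(w.lower(), set()) for w in words]
    return sorted(set.intersection(*postings)) if postings else []


# Made-up documents for illustration.
DOCS = {
    1: "Salmon are fish that swim upstream.",
    2: "We went fishing for salmon.",
    3: "The canal was quiet on Sunday.",
}

if __name__ == "__main__":
    index = build_index(DOCS)
    print(search(index, "salmon"))
    print(search(index, "salmon", "fish"))
```

Because it matches exact word forms, a query for “fish” finds document 1 but not the fishing trip in document 2. That is the same surface-form limit grep has, and it is where techniques like stemming and tagging come in.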
I’m still getting my head around how it hangs together, but text mining seems like a set of gloriously messy, pragmatic and seemingly pretty successful ways to let computers listen in on the languages that humans have evolved.
In: Computer Science, MSc


