An argument against out of hours system maintenance

Late night again…

“You have no problem with out of hours work, right?”

It’s that question during the interview that causes a knot in your stomach. You hope, even pray, that what they mean is the occasional bit of emergency response work or a crunch period at a critical point in a project, both of which you have no issue with. But after you accept the job, you find out that the work is routine system maintenance.

I’m going to put forward the argument that in this, dare I say it, golden age of automation, continuous deployment, commodity hardware, virtualisation, microservices and other cloud-related fluff, there are very few situations in which this work needs to be done out of hours. If anything, the out-of-hours model incurs additional risk.

(To any prospective employers out there, don’t take this as my refusal to do such work; this piece is simply about thinking about the issue differently.)
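
To make the argument concrete, below is a minimal sketch (Python, illustration only) of the kind of rolling, health-checked maintenance that automation makes safe to run during business hours, when people are actually around if something goes wrong. Every helper here is a hypothetical placeholder for whatever real tooling you use:

    import sys
    import time

    NODES = ["web01", "web02", "web03"]  # hypothetical server pool

    def drain(node):
        """Placeholder: remove the node from the load balancer pool."""
        print(f"draining {node}")

    def patch(node):
        """Placeholder: apply updates via your config management tool."""
        print(f"patching {node}")

    def healthy(node):
        """Placeholder: hit the node's health endpoint and check the result."""
        print(f"health-checking {node}")
        return True

    def enable(node):
        """Placeholder: return the node to the load balancer pool."""
        print(f"re-enabling {node}")

    for node in NODES:
        drain(node)
        patch(node)
        time.sleep(1)  # let services settle before checking
        if not healthy(node):
            # Fail fast: stop, leave the bad node drained, and let a human
            # investigate while the rest of the pool carries the load.
            print(f"{node} failed its health check, aborting run", file=sys.stderr)
            sys.exit(1)
        enable(node)

    print("maintenance complete, no user-visible outage")

The specifics don’t matter; the point is that a process like this has no user-visible outage, so the main justification for doing the work at 2am disappears.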


Maximising outcomes by exploiting knowledge (and how things can still go wrong)

Sunrise on approach to Sydney

November 2015 was when I went on my third trip to the US. These trips have presented an interesting experience in how one can learn things and then leverage that knowledge. On my first trip, I was a complete novice. I’d never planned a trip like this myself within Australia, never mind the complications of going overseas. By exercising a lot of caution and care, I was able to minimise the chances of things going wrong.

Following that trip, I began a period of rapid education about travel and flights and all that stuff. I quickly found out that the seats I had booked on the first trip were pretty bad (thanks SeatGuru). The second trip involved going on a slightly higher fare class to mitigate some of the unpleasantness of sitting in economy for 14 hours, as well as to have an experience I might not get again. All of the planning the first and second time involved a lot of manual work in Google Calendar, inputting placeholder appointments for flights, hotel stays and other engagements, trying to keep a handle on the multiple time zones involved.

The third trip was the new apex in planning. I managed to get all my air fares at very good prices, structured for maximum benefit to my chosen frequent flyer account. I also decided to put in for a points upgrade, given that doing so was one of the most effective ways to use the points, and I now had reasonably good status with the airline, so my chances were good. This time I also decided to use TripIt, which made populating the calendar much easier and essentially eliminated the need for paper copies of things (I still had them anyway).

But even with that level of planning, some things still went wrong. The hotel I stayed at in LA was still under renovation; this time it was the small convenience store I would visit for light snacks and coffee that was affected. Ongoing sleep problems also came into play, as well as a few other things, some of which were out of my control. Those will go into the lessons-learned pile for trip #4.

US Trip 2015 Travel Report

This was my third trip to the US. What began as a crazy idea to see a BlizzCon convention before they went down in quality has turned into an annual trip, balancing enjoyment with squeezing “maximum efficiency” out of everything in sight.

This is pretty long, so get ready…


vCloud Air Test Experience

vCloud Air is VMware’s public cloud offering, similar to Amazon’s AWS or Microsoft’s Azure. The key distinction between vCloud Air and these other offerings is that vCloud Air is built on VMware’s own products, such as vSphere.

The VMware User Group (VMUG) recently added free credits for vCloud Air OnDemand as part of its EVALExperience program. As the name suggests, vCloud Air OnDemand is a pay-as-you-go service. I looked at the offering as a server engineer with a reasonable background in VMware, considering aspects such as the ease of basic tasks, general administration, technical considerations for the business (good and bad) and how it compares to other offerings.
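
As a rough illustration of what pay-as-you-go means for budgeting, here is a back-of-the-envelope sketch; the rates below are made-up placeholders, not vCloud Air’s actual pricing:

    # Hypothetical rates, for illustration only.
    CPU_RATE = 0.04     # $/vCPU-hour
    RAM_RATE = 0.01     # $/GB-hour
    DISK_RATE = 0.0002  # $/GB-hour

    def monthly_cost(vcpus, ram_gb, disk_gb, hours=730.0):
        """Estimate a month's spend for a single always-on VM."""
        hourly = vcpus * CPU_RATE + ram_gb * RAM_RATE + disk_gb * DISK_RATE
        return hourly * hours

    # A modest 2 vCPU / 4GB / 50GB server left running all month:
    print(f"${monthly_cost(2, 4, 50):.2f} per month")

The appeal for evaluation work is obvious: a VM that only exists for a few hours of testing only costs a few hours of billing.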


CeBIT Australia 2015 Experience

This year was the first time I had attended CeBIT. The primary motivator was the Cloud Conference component, which was of interest to me both personally and professionally; events of this size are also rarely held in Perth. The speakers for the Cloud Conference covered private enterprise and government, giving a broad view of how cloud was making IT work better.

The first speaker was Chris C Kemp, former CTO at NASA and co-founder of OpenStack. He spoke about the concept of anti-scarcity, where a thing can be made more valuable by making it more freely available and accessible, because more parties become involved and invested in it. This concept applies strongly to open source software, and to OpenStack in particular, which began as an internal project at NASA. Because it was opened up to all, OpenStack now has support and investment from large IT vendors such as HP, IBM and Cisco.

David Boyle from NAB opened his presentation by stating he wouldn’t use the “c word” (cloud). He managed to stick to his promise, and talked about traditional “horse and cart” IT, the model we’ve used in the past of physical infrastructure, lengthy release cycles and waterfall development. This was contrasted with “Ferrari” IT, which uses virtualisation, frequent releases, continuous deployment, automation and DevOps. A key concept he outlined was “fail fast”: having a deployment framework that can be run rapidly, so success or failure is determined quickly and subsequent deployments can be attempted once problems are fixed.
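
For what it’s worth, the “fail fast” loop can be sketched in a few lines. This is not NAB’s actual framework; deploy, verify and rollback are hypothetical stand-ins for real tooling:

    import subprocess
    import sys

    def run(cmd):
        """Run a command and report success or failure instead of raising."""
        return subprocess.run(cmd).returncode == 0

    def deploy(version):
        return run(["echo", f"deploying {version}"])      # placeholder

    def verify(version):
        return run(["echo", f"smoke-testing {version}"])  # placeholder

    def rollback(version):
        run(["echo", f"rolling back {version}"])          # placeholder

    version = "1.2.3"
    if not (deploy(version) and verify(version)):
        rollback(version)
        sys.exit(f"{version} failed fast; fix the problem and redeploy")
    print(f"{version} deployed and verified")

Because the whole cycle is cheap to run, a failed release costs minutes rather than an entire change window.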


Hands on with the Lenovo ThinkPad X230t

One of the fortunate things about working in the SCCM space is that you sometimes get first shot at a new piece of hardware when it comes in. In this case, it was the Lenovo ThinkPad X230t, a convertible laptop.

The tablet has become a disruptive technology since the iPad burst onto the scene, and one outcome of this is the convertible laptop: a device that can be used as a tablet but still managed as a native Windows device, with the benefit of whatever Windows software you have available.

The X230t is the first ThinkPad I’ve managed to use in a serious way, and having heard about their legendary quality when they were made by IBM, it seems that has carried over to Lenovo. The construction feels good, not tacky or cheap. The one point of the physical design I wasn’t comfortable with was the swivel used to convert between laptop and tablet modes. The swivel can only twist one way and seems to be begging to be twisted the wrong way and broken by a user. The other sore point with the physical design was the location of the 3G card slot: it sits under the battery, requiring you to take the battery out to access it.

In operation, the machines I used were quite fast thanks to their SSDs. They come with a stylus for the touch screen, and there’s a nice little slot to stow the stylus when not in use. The screen has a matte coating of some sort, possibly to prevent scratches and damage from touch use. Using the 3G slot was pleasant, as the device appears as just another wireless-style connection, meaning you don’t have to deploy or install the proprietary application your 3G provider has you use.

The last point to note is that Lenovo has come to the party with SCCM support, adding driver packs similar to what HP and Dell provide. This made adding the X230t to the existing SCCM setup quite easy.

Office 2010 products MIA in SCCM 2007 reporting

One of the curious things about SCCM 2007 is the number of hot fixes needed for what are (in my experience) relatively common problems. One of these came up when I wanted to report on how many Office 2010 installations there were versus other versions, and reconcile those against the installed base of Windows 7 machines the RAC had. The idea was to see how many machines were out of MOE compliance and start remediating them.
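
For context, a report like that boils down to a query against the site database’s software inventory. Here is a sketch of one way it might be run, using pyodbc with placeholder server and database names (v_GS_ADD_REMOVE_PROGRAMS is the standard inventory view, though the exact columns you need may vary):

    import pyodbc

    # Hypothetical server and database names.
    conn = pyodbc.connect(
        "DRIVER={SQL Server};SERVER=SCCMDB01;DATABASE=SMS_ABC;"
        "Trusted_Connection=yes;"
    )

    sql = """
    SELECT DisplayName0, COUNT(DISTINCT ResourceID) AS Installs
    FROM v_GS_ADD_REMOVE_PROGRAMS
    WHERE DisplayName0 LIKE 'Microsoft Office%'
    GROUP BY DisplayName0
    ORDER BY Installs DESC
    """

    # Print an install count per product name.
    for name, installs in conn.cursor().execute(sql):
        print(f"{installs:6d}  {name}")

Until the hot fix is applied, the Office 2010 rows are simply absent from the inventory, which is presumably why queries like this come back empty.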

I got a bit of a shock when I started running the reports: no numbers, not even entries, would appear for Office 2010 products, including Visio and Project. Some of the more recent server products also weren’t listed (although this was a secondary concern at the time).

I soon found out there is not just one hot fix for this; at least three have been issued, going back to at least November 2010. Following a change request, I got the latest version of the hot fix installed and the reporting started working correctly.

Voting keypads and bluescreens

One of the more interesting problems I was given at RAC was the issue of Turning Point voting keypads causing computers to blue screen.

For those who don’t know, these devices let an audience vote using little handheld keypads. The signal is sent wirelessly to a USB receiver plugged into a computer. Following the roll-out of RAC’s new Managed Operating Environment (MOE), plugging in one of the receivers would cause the computer to blue screen. I was tasked with fixing it.

The initial route I followed was to examine the blue screen dump files using the Windows Debugging tools. Doing so pointed to the drivers for the receiver as the cause, and after some googling there did seem to be supporting evidence that this style of device didn’t play well with Windows 7.

The change in troubleshooting approach came from the discovery that if the device was plugged into a freshly unboxed HP machine that hadn’t been re-imaged yet (that is, it was running the manufacturer’s image and drivers), it would work. There was a date difference between the drivers, with the ones in the MOE being slightly older.

Going on the theory that one of the older drivers in the MOE was somehow causing the blue screens, I downloaded the latest drivers from HP and applied them one at a time, testing the receiver after each. Eventually the blue screens stopped, and the culprit turned out to be the fingerprint scanner’s driver.

I reproduced the fix on a few other machines with success, so there must have been some incompatibility between the MOE’s fingerprint scanner driver and the keypad receiver.

Getting serious about a home lab – Part 2 (The hardware arrives)

With my new lab, I ended up deciding on a two-server approach to address the issues mentioned in part one: one server will purely be the “brains”, performing the virtualisation functions, while the other will provide storage. In line with that, I ended up getting the following parts:

Virtualisation Server

  • 1 x Intel i7 3930K
  • 1 x Intel DX79SI motherboard
  • 8 x Kingston 8GB RAM (64GB total)
  • 1 x Corsair 120GB SSD
  • 2 x Intel gigabit NICs
  • 1 x Fractal Design Define XL case

Storage Server

  • 1 x Intel i7 3820
  • 1 x Gigabyte GA-X79-UD3 motherboard
  • 1 x Kingston 8GB RAM
  • 1 x Corsair 180GB SSD
  • 9 x Hitachi Ultrastar A7K3000 (2TB) HDD
  • 2 x Intel gigabit NICs
  • 1 x Fractal Design Define XL case

I also bought a Netgear gigabit switch and put an Adaptec RAID controller into the storage server. The two additional NICs in each server are for iSCSI traffic on their own IP range.
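
For illustration, the addressing might be laid out something like the sketch below. The subnets and addresses here are hypothetical; the point is that storage traffic stays on its own subnets, separate from the management LAN:

    import ipaddress

    # Hypothetical addressing plan: management LAN plus two iSCSI paths.
    PLAN = {
        "management": {"subnet": "192.168.1.0/24",
                       "virt_host": "192.168.1.10", "storage": "192.168.1.11"},
        "iscsi_a":    {"subnet": "10.0.10.0/24",
                       "virt_host": "10.0.10.1", "storage": "10.0.10.2"},
        "iscsi_b":    {"subnet": "10.0.11.0/24",
                       "virt_host": "10.0.11.1", "storage": "10.0.11.2"},
    }

    # Sanity check: every host address sits inside its declared subnet.
    for name, net in PLAN.items():
        subnet = ipaddress.ip_network(net["subnet"])
        for role in ("virt_host", "storage"):
            assert ipaddress.ip_address(net[role]) in subnet, f"{name}/{role}"
        print(f"{name:11s} {net['subnet']:16s} ok")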

Getting serious about a home lab – Part 1 (The high concept)

Back in November I posted about the implementation of what was essentially version 2 of my home lab on the VMware ESX platform, moving from an i3 with 8GB of RAM to an i7 with 16GB and proper storage on a RAID controller. The problem with that environment was that it only really served the needs of my “production” virtual machines and didn’t allow me to expand further, as about 14GB of the 16GB total was allocated out (even with the VMs being under-provisioned in terms of RAM).
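
The arithmetic behind the problem is simple enough to sketch. Only the 16GB, 14GB and 64GB figures come from the actual setup; the 4GB lab VM is a hypothetical example:

    HOST_RAM_GB = 16   # v2 host
    ALLOCATED_GB = 14  # current "production" VMs, already under-provisioned

    headroom = HOST_RAM_GB - ALLOCATED_GB
    print(f"v2 headroom: {headroom}GB")
    print(f"room for a 4GB lab VM? {headroom >= 4}")

    # The planned v3 virtualisation host changes the picture entirely:
    V3_RAM_GB = 64
    print(f"v3 headroom after migrating production: {V3_RAM_GB - ALLOCATED_GB}GB")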

Running in parallel with this is my desire to get some traction on my certifications, and having a good virtualisation platform on which I can quickly set up lab environments is an important part of that. Thus begins the project to implement version 3 of my home lab environment. The broad hardware bits of the “high concept” are:

  • Capable of supporting my current virtual environment with better performance (essentially meaning I can allocate more appropriate amounts of RAM to my current VMs)
  • Capable of scaling up to support multiple simple lab environments or a single complex lab environment
  • Address the storage issues with v2 of the home lab

In terms of software and process, the outcome I’m hoping for is a fairly high level of automation, leveraging the feature set available in my current software pool. Similarly, in the future I could use this model for a testing environment at work, which again means leveraging technologies that are likely already licensed (i.e. standard vSphere, SCCM, etc.) as opposed to the best possible solutions, which may not be (some of the automation-heavy VMware products come to mind).

Going forward, I hope to document the process of building the home lab, including the research and testing behind the end result.