Before graduating high school, I’d been a paperboy, a
bagboy, a dishwasher, and a facilities engineer for a ski resort (garbageboy), and then I moved up to
rentals at that same ski resort. One of the primary reasons I picked the college
I did was that it had a full-time job that I could take
advantage of right away to earn some money and, more importantly, get experience in their
computer department.
That first “professional” experience was helping
connect the dorms to a brand-new Ethernet network, the first network
connectivity the dorms had ever had. Before that, all they had were VAX OpenVMS terminals. Speaking of the
VAX, the bulk of my time was spent managing the VAX and backups for it as well
as two brand-new DEC Alphas. At the time, making sure those backups were legit
was my most important job. The backups went to a tape drive that was already an
antique when I started watching over it. There were dailies, weeklies,
and monthly fulls. All of this was 100% manual, and I did it from a line printer,
not a monitor.
I could get called at any time of any day, and I had to
restore any files accidentally deleted or corrupted from the old Winchester
hard drives attached to the VAX. Imagine an irate professor working
late into the night on a weekend to finish research and losing a file for
whatever reason. I was the person who needed to respond and fix it immediately,
without complaint. On one occasion the files were
corrupted, and I spent an entire weekend trying and failing to recover the
culmination of a professor’s semester of chemistry research. I almost lost my job,
almost lost my income to pay for tuition, almost lost any credibility in the
CompSci department, pretty much almost lost everything I had worked for up
until that weekend. This was one of the lowest points of my life and one of the
turning points of my career, even though the work was basically “undifferentiated heavy
lifting.” I still made minimum wage, had a timecard, and
was at the bottom of the ladder career-wise. I bet that professor still
remembers me, if only to wonder whether I've been run over by a truck.
Luckily, I didn’t lose my job and got a second chance. I
picked up an additional job working at a thermography company writing printer
drivers for an AS/400. Again, this was basically the lowest rung of the ladder, taken to get
experience, but this company’s core business was printing very elaborate
wedding invitations, graduation announcements, etc. Once more, the job
held the lowest pay and the highest responsibility, because if the AS/400
couldn’t print, the entire business was at a standstill. Since those early
days, I have known, quite viscerally, that I could never get comfortable where I was.
Fast-forward to my career at Nutanix today. I talk to
customers on a daily basis about running HPC, Big Data, and container workloads
on baremetal. Don’t get me wrong, baremetal is worthy competition as there is
nothing so self-service as your own dedicated, brand-new hardware. Earlier in my
career, I worked for Argonne National Laboratory, and there is no one in the world
with a longer, more respectable track record for managing baremetal at scale
than the Department of Energy labs. With a batch scheduler or even a
multi-framework scheduling distributed system like Mesos, the baremetal becomes
a distributed pool of compute. With HDFS, Elasticsearch, or Cassandra, for
example, baremetal becomes a distributed pool of persistent storage.
So why not just use baremetal for these workloads? Well,
Hadoop, for example, is great at distributed resiliency, but it does not
manage the hardware for you. Sure, it can ride out a failed drive, failed nodes, or a failed
top-of-rack switch, but does Hadoop repair or replace the failed hardware?
Brand-new baremetal is great, but how long is that expected to last? What is
the amortization and depreciation schedule? Like a new car driven off the
lot, new hardware starts losing value immediately. Silicon and server vendors are still in an
ever-escalating competition, so by the time that fancy brand-new hardware
is installed, it's already depreciating and may be rendered obsolete
relatively quickly. The advent of “software-defined” has not slowed down that
death march.
There have been many tools created to alleviate these
concerns and make it easier to handle hardware management. Cobbler, Razor, and
now RackHD, for example, are stabs in the right direction. Web-scale companies
that maintain public clouds, or just a ton of infrastructure and services, like
Facebook and Twitter, have built the necessary tooling to scale their own
hardware management efforts, but how is this composable or consumable outside of
their respective platforms? Not to mention that simplifying hardware
compatibility is one thing, while trying to accommodate any hardware is another: the
rows and columns of the interoperability matrix represent an exponentially growing
opportunity for issues. Where Nutanix really shines is the infrastructure, the
tooling, and the team behind making this the platform for simplifying hardware
management for myriad applications at scale.
These baremetal clusters running Hadoop or Mesos are truly
responsible for the lifeblood of the business, from its data to its
second-by-second operation. If you’re running on baremetal, to borrow from my
early experiences changing tapes and tweaking printer drivers, you are still
stuck spending time on the most menial part of the infrastructure. It should be
no surprise that more value comes from the systems and data built on the
hardware than from the hardware itself, so why not spend more time on those? As in
the H. G. Wells novel, you are dependent on the Morlocks, those tape-changing,
baremetal-replacing denizens of the datacenter, to keep up their ceaseless, yet
thankless duties. Where I see customers able to take advantage of Nutanix is
in shifting that time to more fruitful pursuits: expanding their intelligence
and their careers. Besides full-time Morlocks, plenty of people can get trapped
into doing this part-time, beholden to esoteric troubleshooting of the nuances
of hardware.
“There is no intelligence where there is no
need of change.” - H.G. Wells, The Time Machine
I can easily imagine that if I had not tried to advance my career beyond
changing tapes, I would not be where I am today. If I had been
content swapping tapes and performing on-demand restores 24x7, I would have
been miserable until I was obsolete. If I had been content configuring ‘bin’
files and configuring Symmetrix nights and weekends, I would have been
miserable until I was obsolete. The same goes
for just building VMs and workflows for managing VMs and hypervisors.
“An animal perfectly in harmony with its
environment is a perfect mechanism. Nature never appeals to intelligence until
habit and instinct are useless. There is no intelligence where there is no
change and no need of change. Only those animals partake of intelligence that have to meet
a huge variety of needs and dangers.” - H.G. Wells, The Time Machine
Of course, this is nothing new compared with what AWS or other public
clouds accomplish for their customers. How much hardware management do I have
to do for my AWS usage? Absolutely zero. It has always been zero, and I expect
it to always be zero. One of AWS’s secrets to success, in my opinion, is that
it emulates the feeling of getting brand-new hardware all the time. If I want a
new instance, it’s just like brand-new and only an API call away, cost permitting,
of course.
Why turn very smart, very ambitious people into Morlocks by
making your admins spend their critical career time on provisioning,
managing, and troubleshooting hardware? Instead, help them focus on the
next generation of applications, analytics, or programming frameworks that
make them grow. Help them be heroes to their partners or teams or maybe most
importantly to themselves.
“We should strive to welcome change and
challenges, because they are what help us grow. Without them we grow weak like
the Eloi in comfort and security. We need to constantly be challenging
ourselves in order to strengthen our character and increase our intelligence.”
- H.G. Wells, The Time Machine