Web Admin Blog

Real Web Admins. Real World Experience.

A Case For Images

After speaking with Luke Kanies at OpsCamp, and reading his good and oft-quoted article “Golden Image or Foil Ball?“, I was thinking pretty hard about the use of images in our new automated infrastructure.  He’s pretty against them.  After careful consideration, however, I think judicious use of images is the right thing to do.

My top level thoughts on why to use images.

  1. Speed – Starting a prebuilt image is faster than reinstalling everything on an empty one.  In the world of dynamic scaling, there’s a meaningful difference between a “couple minute spinup” and a “fifteen minute spinup.”
  2. Reliability – The more work you are doing at runtime, the more there is to go wrong.  I bet I’m not the only person who has run the same compile and install on three allegedly identical Linux boxen and had it go wrong somehow on one of ’em.  And the more stuff you’re pulling to build your image, the more failure points you have.
  3. Flexibility – Dynamically building from stem cell kinda makes sense if you're using 100% free open source and have everything automated.  What if, however, you have something that you need to install that just hasn't been scripted – or is very hard to script?  Like an install of some half-baked Windows software that doesn't have a command line installer and you don't have a tool that can do it?  In that case, you really need to do the manual install in non-realtime as part of an image build.  And of course many suppliers are providing software as images themselves nowadays.
  4. Traceability – What happens if you need to replicate a past environment?  Having the image is going to be a 100% effective solution to that, even likely to be sufficient for legal reasons.  “I keep a bunch of old software repo versions so I can mostly build a machine like it” – somewhat less so.

In the end, it’s a question of using intermediate deliverables.  Do you recompile all the code and every third party package every time you build a server?  No, you often use binaries – it’s faster and more reliable.  Binaries are the app guys’ equivalent of “images.”

To address Luke’s three concerns from his article specifically:

  1. Image sprawl – if you use images, you eventually have a large library of images you have to manage.  This is very true – but you have to manage a lot of artifacts all up and down the chain anyway.  Given the "manual install" and "vendor supplied image" scenarios noted above, if you can't manage images as part of your CM system then it's just not a complete CM system.
  2. Updating your images – Here, I think Luke makes some not entirely valid assumptions.  He notes that once you’re done building your images, you’re still going to have to make changes in the operational environment (“bootstrapping”).  True.  But he thinks you’re not going to use the same tool to do it.  I’m not sure why not – our approach is to use automated tooling to build the images – you don’t *want* to do it manually for sure – and Puppet/Chef/etc. works just fine to do that.  So if you have to update something at the OS level, you do that and let your CM system blow everything on top – and then burn the image.  Image creation and automated CM aren’t mutually exclusive – the only reason people don’t use automation to build their images is the same reason they don’t always use automation on their live servers, which is “it takes work.”  But to me, since you DO have to have some amount of dynamic CM for the runtime bootstrap as well, it’s a good conservation of work to use the same package for both. (Besides bootstrapping, there’s other stuff like moving content that shouldn’t go on images.)
  3. Image state vs running state – This one puzzles me.  With images, you do need to do restarts to pull in image-based changes.  But with virtually all software and app changes you have to as well – maybe not a "reboot," but a "service restart," which is virtually as disruptive.  Whether you "reboot your database server" or "stop and start your database server, which still takes a couple minutes", you are planning for downtime or have redundancy in place.  And in general you need to orchestrate the changes (rolling restarts, etc.) in a manner that "oh, pull that change whenever you want to, Mr. Application Server" doesn't really work for.

In closing, I think images are useful.  You shouldn’t treat them as a replacement for automated CM – they should be interim deliverables usually generated by, and always managed by, your automated CM.  If you just use images in an uncoordinated way, you do end up with a foil ball.  With sufficient automation, however, they’re more like Russian nesting dolls, and have advantages over starting from scratch with every box.

An XSS Vulnerability in Almost Every PHP Form I've Ever Written

I've spent a lot of time over the past few months writing an enterprise application in PHP.  Despite what some people may say, I believe that PHP is as secure or insecure as the developer who is writing the code.  Anyway, I'm at the point in my development lifecycle where I decided it was time to run an application vulnerability scanner against the application.  What I found was interesting and I think it's worth sharing with you all.

Let me preface this by saying that I'm the guy who gives the training to our developers on the OWASP Top 10, writing secure code, etc.  I'd like to think that I have a pretty good handle on programming best practices, input validation, and HTML encoding.  I built all kinds of validation into this application and thought that the vulnerability scan would come up empty.  For the most part I was right, but there was one vulnerability, one flaw in particular, that found its way into every form in my application.  In fact, I realized that I've made this exact same mistake in almost every PHP form that I've ever written.  Talk about a humbling experience.

So here’s what happened.  I created a simple page with a form where the results of that form are submitted back to the page itself for processing.  Let’s assume it looks something like this:

<html>
 <body>
  <?php
  if (isset($_REQUEST['submitted']) && $_REQUEST['submitted'] == '1') {
    echo "Form submitted!";
  }
  ?>
  <form action="<?php echo $_SERVER['PHP_SELF']; ?>">
   <input type="hidden" name="submitted" value="1" />
   <input type="submit" value="Submit!" />
  </form>
 </body>
</html>

It looks fairly straightforward, right?  The problem has to do with that $_SERVER['PHP_SELF'] variable.  The intent here is that PHP will echo the path and name of the current page so that the form submits back to the same page.  The problem is that $_SERVER['PHP_SELF'] can actually be manipulated by the user, because it includes any extra path info tacked onto the URL.  Let's say that as the user I change the URL from https://www.webadminblog.com/example.php to https://www.webadminblog.com/example.php/"><script>alert('xss');</script>.  This will close the form's action attribute and inject a JavaScript alert into the page.  This is the very definition of cross-site scripting.  I can't believe that, as long as I've been writing in PHP and as long as I've been studying application security, I've never realized this.  Fortunately, there are a couple of different ways to fix it.  First, you could use PHP's htmlentities() or htmlspecialchars() functions to sanitize the user input, like this:

htmlentities($_SERVER['PHP_SELF']);

htmlspecialchars($_SERVER['PHP_SELF']);

This fix still allows the user to manipulate the URL, and thus what is displayed on the page, but the encoded quotes and angle brackets render the injected JavaScript harmless.  The second way to fix this is to use the script name variable instead, like this:

$_SERVER['SCRIPT_NAME'];

This fix just echoes the path and filename of the current script, which the user can't tamper with.  Yes, there are other ways to fix this.  Yes, my code example above for the XSS exploit doesn't do anything other than display a JavaScript alert.  I just wanted to draw attention to this issue because if it's found its way into my code, then perhaps it's found its way into yours as well.  Happy coding!
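
Putting it all together, here's a minimal sketch of the corrected form.  It's the same example as above with a one-line change to the action attribute; htmlspecialchars() with ENT_QUOTES is just one reasonable choice among the fixes mentioned:

<html>
 <body>
  <?php
  if (isset($_REQUEST['submitted']) && $_REQUEST['submitted'] == '1') {
    echo "Form submitted!";
  }
  ?>
  <!-- The action value is now HTML-encoded, so injected markup comes out inert -->
  <form action="<?php echo htmlspecialchars($_SERVER['PHP_SELF'], ENT_QUOTES); ?>">
   <input type="hidden" name="submitted" value="1" />
   <input type="submit" value="Submit!" />
  </form>
 </body>
</html>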

Agile Operations

It’s funny.  When we recently started working on an upgrade of our Intranet social media platform, and we were trying to figure out how to meld the infrastructure-change-heavy operation with the need for devs, designers, and testers to be able to start working on the system before “three months from now,” we broached the idea of “maybe we should do that in iterations!”  First, get the new wiki up and working.  Then, worry about tuning, switching the back end database, etc.  Very basic, but it got me thinking about the problem in terms of “hey, Infrastructure still operates in terms of waterfall, don’t we.”

Then when Peco and I moved over to NI R&D and started working on cloud-based systems, we quickly realized the need for our infrastructure to be completely programmable – that is, not manually tweaked and controlled, but run in a completely automated fashion.  Also, since we were two systems guys embedded in a large development org that's using agile, we were heavily pressured to work in iterations along with them.  This was initially a shock – my default project plan has, in traditional fashion, months' worth of evaluating, installing, and configuring various technology components before anything's up and running.  But as we began to execute in that way, I started to see that no, really, agile is possible for infrastructure work – at least "mostly."  Technologies like cloud computing help, and there's still a little more up-front work required than with programming, but you can get most of the way to an agile methodology (and mindset!).

Then at OpsCamp last month, we discovered that there’s been this whole Agile Operations/Automated Infrastructure/devops movement thing already in progress we hadn’t heard about.  I don’t keep in touch with The Blogosphere ™ enough I guess.  Anyway, turns out a bunch of other folks have suddenly come to the exact same conclusion and there’s exciting work going on re: how to make operations agile, automate infrastructure, and meld development and ops work.

So if you also haven't been up on this, here's a roundup of some good core thoughts on these topics for your reading pleasure!

Enterprise Systems vs. Agility

I was recently reading a good Cameron Purdy post where he talks about his eight theses regarding why startups or students can pull stuff off that large enterprise IT shops can’t.

My summary/trenchant restatement of his points:

  1. Changing existing systems is harder than making a custom-built new one (version 2 is harder)
  2. IT veterans overcomplicate new systems
  3. The complexity of a system increases exponentially the work needed to change it (versions 3 and 4 are way way harder)
  4. Students/startups do fail a lot, you just don’t see those
  5. Risk management steps add friction
  6. Organizational overhead (paperwork/meetings) adds friction
  7. Only overconservative goons work in enterprise IT anyway
  8. The larger the org, the more conflict

Though I suspect #1 and #3 are the same, #2 and #5 are the same, and #6 and #8 are the same, really.

I’ve been thinking about this lately with my change from our enterprise IT Web site to a new greenfield cloud-hosted SaaS product in our R&D organization.  It’s definitely a huge breath of fresh air to be able to move fast.  My observations:

Complexity

The problem of systems complexity (theses #1 and #3) is a very real one.  I used to describe our Web site as having reached “system gridlock.”  There were hundreds of apps running dozens to a server with poorly documented dependencies on all kinds of stuff.  You would go in and find something that looked “wrong” – an Apache config, script, load balancer rule, whatever – but if you touched it some house of cards somewhere would come tumbling down.  Since every app developer was allowed to design their own app in its own tightly coupled way, we had to implement draconian change control and release processes in an attempt to stem the tide of people lining up to crash the Web site.

We have a new system design philosophy for our new gig which I refer to as “sharing is the devil.”  All components are separated and loosely coupled.  Using cloud computing for hardware and open source for software makes it easy and affordable to have a box that does “only one thing.”  In traditional compute environments there’s pressure to “use up all that CPU before you add more”, which results in a penny wise, pound foolish strategy of consolidation.  More and more apps and functions get crunched closer together and when you go back to pull them out you discover that all kinds of new connections and dependencies have formed unbidden.

Complication

Overcomplicating systems (#2 and #5) can be somewhat overcome by using agile principles.  We’ve been delving heavily into doing not just our apps but also our infrastructure according to an agile methodology.  It surfaces your requirements – frankly, systems people often get away with implementing whatever they want, without having a spec let alone one open to review.  Also, it makes you prioritize.  “Whatever you can get done in this two week iteration, that’s what you’ll have done, and it should be working.”  It forces focus on what is required to get things to work and delays more complex niceties till later as there’s time.

Conservatism

Both small and large organizations can suffer from #6 and #8.  That's mostly a mindset issue.  I like to tell the story about how we were working on a high-level joint IT/business vision for our Web site.  We identified a number of "pillars" of the strategy we were developing – performance, availability, TCO, etc.  I had identified agility as one, but one of the application directors just wasn't buying into it.  "Agility, that's weird, how do we measure that, we should just forget about it."  I finally had to take the whole list to the business head of the Web and say "of these, which would you say is the single most important one?"  "Agility, of course," he said, as I knew he would.  I made it a point to train my staff that "getting it done" was the most important thing, more important than risk mitigation or crossing all the t's and dotting all the i's.  That can be difficult if the larger organization doesn't reward risk-taking and achievement over conservatism, but you can work on it.

OpsCamp Debrief

I went to OpsCamp this last weekend here in Austin, a get-together for Web operations folks specifically focusing on the cloud, and it was a great time!  Here's my after-action report.

The event invite said it was in the Spider House, a cool local coffee bar/normal bar.  I hadn't been there before, but other people who had been there said "That's insane!  They'll never fit that many people!  There's outside seating but it's freezing out!"  That gave me some degree of trepidation, but I still racked out in time to get downtown by 8 AM on a Saturday (sigh!).  Happily, it turned out that the event was really in the adjacent music/whatnot venue also owned by Spider House, the United States Art Authority, which they kindly allowed us to use for free!  There were a lot of people there; we weren't overfilling the place, but it was definitely at capacity, with close to 100 people there.

I had just heard of OpsCamp through word of mouth, and figured it was just going to be a gathering of local Austin Web ops types.  Which would be entertaining enough, certainly.  But as I looked around the room I started recognizing a lot of guys from Velocity and other major shows; CEOs and other high-ranking guys from various Web ops related tool companies.  Sponsors included John Willis and Adam Jacob (creator of Chef) from Opscode, Luke Kanies from Reductive Labs (creator of Puppet), Damon Edwards and Alex Honor from DTO Solutions (formerly ControlTier), Mark Hinkle and Matt Ray from Zenoss, Dave Nielsen (CloudCamp), Michael Coté (Redmonk), Bitnami, Spiceworks, and Rackspace Cloud.  Other than that, there were a lot of random Austinites and some guys from big local outfits (Dell, IBM).

You can read all the tweets about the event if you swing that way.

OpsCamp kinda grew out of an earlier thing, BarCampESM, also in Austin two years ago.  I never heard about that, wish I had.

How It Went

I had never been to an “unconference” before.  Basically there’s no set agenda, it’s self-emergent.  It worked pretty well.  I’ll describe the process a bit for other noobs.

First, there was a round of lightning talks.  Brett from Rackspace noted that “size matters,” Bill from Zenoss said “monitoring is important,” and Luke from Reductive claimed that “in 2-4 years ‘cloud’ won’t be a big deal, it’ll just be how people are doing things – unless you’re a jackass.”

Then it was time for sessions.  People got up, wrote a proposed session name on a piece of paper, and then went in front of the group and pitched it; a hand count of "how many people find this interesting" was taken for each.

Candidates included:

  • service level to resolution
  • physical access to your cloud assets
  • autodiscovery of systems
  • decompose monitoring into tool chain
  • tool chain for automatic provisioning
  • monitoring from the cloud
  • monitoring in the cloud – widely dispersed components
  • agent based monitoring evolution
  • devops is the debil – change to the role of sysadmins
  • And more

We decided that so many of these touched on two major topics that we should do group discussions on them before going to sessions.  They were:

  • monitoring in the cloud
  • config mgmt in the cloud

This seemed like a good idea; these are indeed the two major areas of concern when trying to move to the cloud.

Sadly, the whole-group discussions, especially the monitoring one, were unfruitful.  For a long-ass time people threw out brilliant quips about "Why would you bother monitoring a server anyway" and other such high-theory wonkery.  I got zero value out of these, which was sad because the topics were crucially interesting – just too unfocused; you had people coming at the problem 100 different ways in sound bites.  The only note I bothered to write down was that "monitoring porn" (too many metrics) makes it hard to do correlation.  We had that problem here, and invested in a (horrors) non open-source tool, Opnet Panorama, that has an advanced analytics and correlation engine that can make some sense of tens of thousands of metrics for exactly that reason.

Sessions

There were three sessions.  I didn’t take many notes in the first one because, being a Web ops guy, I was having to work a release simultaneously with attending OpsCamp 😛

[Read the rest of this entry…]

Come To OpsCamp!

Next weekend, Jan 30, 2010, there's a Web Ops get-together here in Austin called OpsCamp!  It'll be a Web ops "unconference" with a cloud focus.  Right up our alley!  We hope to see you there.

Book Review: Smart & Gets Things Done, by Joel Spolsky

Joel Spolsky is a bit of an Internet celebrity, the founder of Fog Creek Software and the writer of joelonsoftware.com, an influential programming Web site.

The book is about technical recruiting and retention, and even though it's a small-format book of under 200 pages, it covers a lot of different topics.  His focus is on hiring programmers, but I think a lot of the same principles apply to hiring for systems admin/Web systems positions.  Hiring has been one of the hardest parts of being a Web systems manager, so I got a lot out of the book and tried putting it into practice.  Results detailed below!

The Book

The first chapter talks about the relative effectiveness of programmers.  We often hire programmers and pay the good ones 10% more than the bad ones.  But he has actual data, drawn from a Yale professor who repeatedly teaches the same CS class and assigns the same projects, showing something that those of us who have been in the field for a long time already know: the gap in achievement between the best programmers and the worst ones is a factor of ten.  That's right.  In a highly controlled environment, the best programmers completed projects 3-4 times faster than the average and 10x faster than the slowest ones.  (And this same relationship holds when adjusting for quality of results.)  I've been in IT for 15 years and I can guarantee this is true.  You can give the same programming task to a bunch of different programmers and get results ranging from "Here, I did it last night" to "Oh, that'll take three months."  He goes on to note other ways in which ten mediocre programmers cannot achieve the same "high notes" as one good programmer.  All of this reinforces how important the programmer, as human capital, is to an organization.

Next, he delves into how you find good developers.  Unfortunately, the easy answers don't work.  Posting on monster.com or craigslist gets lots of hits but few keeps.  Employee referrals don't always get the best people either.  So how do you find them?  He has three suggestions.

  1. Go to the mountain
  2. Internships
  3. Build your own community

"Go to the mountain" means to figure out where the smart people you want to hire are, and go hang out there.  Conferences.  Organizations.  Web sites.  General job sites are zoos; you need venues that are more specifically on target.  Want a security guy?  Post on OWASP or ISSA forums, not monster.

We do pretty well with internships, even enhancing that with company sponsored student sourcing/class projects and a large campus recruiting program.  He has some good sub-points however – like make your offers early.  If you liked them as an intern, offer them a full-time job at that point for when they graduate, don’t wait.  Waiting puts you into more of a competitive situation.  And interns should be paid, given great work to do, and courted for the perm job during the internship.

Building a community – he acknowledges that's hard.  Our company has external communities, but not really for IT.  For a lot of positions we should be on our own forums like fricking scavengers, trying to hire people that post there.

[Read the rest of this entry…]

Stupid Unix Trick – Command Mashups

I've been a *nix Administrator in some form or fashion for about 10 years now.  I remember, back when I was first learning commands and how the OS works, every once in a while I'd come across something stupidly simple yet extremely useful to put in my bag of tricks.  Yesterday I was reminded of one of those things, and I figured I'd share it here so that you can throw it in your bag of tricks as well if it's not already in there.

To start out, let me illustrate the problem.  You are writing a shell script or running a series of commands on the CLI.  Let’s just say it’s something simple like creating a new directory, changing to that directory, and then creating a file.  When I first started out, that command would look something like this:

mkdir newdirectory; cd newdirectory; touch newfile

The problem with this is that each command is executed on its own, regardless of whether or not the previous command was successful.  So if, for example, my mkdir and cd failed (permissions, maybe?), I would be creating that newfile in whatever directory I started out in.  At best, I just created a new file in the wrong directory.  At worst, if I had been running a command that actually writes to a file (a shell redirect, say, instead of touch) and a file with the same name already existed there, I just clobbered it.  Not good!

The way to fix this is to add a dependency so that each command will not execute unless the command before it returned successfully.  You do this by putting an "&&" between the commands instead of the semicolon.  So now the command string above should look like this:

mkdir newdirectory && cd newdirectory && touch newfile

Now you have guaranteed that the new file will not be created with the touch command unless both the mkdir and cd commands before it are successful.  Stupid simple, right?  Enjoy!

Techniques in Attacking and Defending XML/Web Services

This presentation was by Jason Macy and Mamoon Yunus of Crosscheck Networks – Forum Systems.  It wins the award (the one I just made up) for being the most vendor-oriented presentation at the conference.  Not that it wasn’t an interesting presentation, but their solution to defend against most of the attacks was “Use an XML Gateway” (guess what Forum Systems sells?) and the attacks were all presented using the CrossCheck SOAPSonar tool.  I realize that being a vendor they probably have more knowledge than most in the field, but being an Open Source conference, you’d think they would have demonstrated using a free/open tool (SOAPUI?) and talked more about non-hardware solutions to fix the issues.  My notes from the session are below:

Agenda

  1. Introduction to XML/Web Services Threats
  2. Techniques for Defending XML Threats
  3. XML Attack Examples and Classification
  4. Review sample attacks

Introduction to XML Threats

  • Explicit Attacks
    • Forced Disruption
    • Information Theft
    • Vendor Discovery
  • Implicit Vulnerability
    • Perimeter Breach (embedded virus, malware)
    • Infrastructure Malfunction (parser and data processing failures)

New Attack Vectors

  • Protocol Firewalls are blind to XML
  • Malware and virus delivered via SOAP attachments
  • WSDL exposes schema and message structure
  • Injection attacks exposed via XML parameters
  • Data replay attacks

Security Testing – Base Requirements

  • Security Framework
    • Sign, Encrypt, Decrypt, SSL
  • Identity Framework
    • Basic auth, SSL auth, WS-Security token auth
  • Parameter Injection (see the sketch after this list)
    • Database or file driven
    • Permutations for security, identity, and SOAP/XML
  • Concurrent Client Simultaneous Loading
    • Denial of Service Testing
  • SOAP with Attachments
    • Malware and Virus testing
  • Dynamic XSD Mutation
    • Derive SOAP vulnerability profile from WSDL schema
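
Since all the demos ran through SOAPSonar, here's a rough, minimal sketch of what a scripted parameter-injection check might look like using nothing but PHP's built-in SoapClient.  The WSDL URL, the 'LookupAccount' operation, and the 'accountId' parameter are made-up placeholders, and this only touches the "Parameter Injection" bullet above; it's meant to show the shape of the exercise, not replace a real testing tool.

<?php
// A quick-and-dirty parameter-injection probe against a SOAP service.
// The WSDL URL, operation name, and parameter name below are hypothetical
// placeholders - substitute the details of the service you are testing.

$wsdl = 'https://example.com/service?wsdl';

$payloads = array(
    "' OR '1'='1",                     // SQL injection probe
    "<script>alert('xss')</script>",   // markup/script injection probe
    str_repeat('A', 100000),           // oversized parameter value
);

$client = new SoapClient($wsdl, array(
    'exceptions' => true,
    'cache_wsdl' => WSDL_CACHE_NONE,
));

foreach ($payloads as $payload) {
    try {
        // 'LookupAccount' and 'accountId' are made-up names for illustration.
        $response = $client->__soapCall('LookupAccount', array(array('accountId' => $payload)));
        echo "Payload accepted - inspect the response for echoed input or leaked data:\n";
        var_dump($response);
    } catch (SoapFault $fault) {
        // A clean, generic SOAP fault is usually fine; a stack trace or database
        // error in the fault string suggests the parameter is not handled safely.
        echo "Fault: " . $fault->faultstring . "\n";
    }
}

Obviously this covers none of the other requirements in the list (no schema mutation, no identity permutations, no concurrent load), but it's the kind of free, scriptable check that would have been nice to see alongside the commercial tooling.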

[Read the rest of this entry…]

The OWASP Security Spending Benchmarks Project

This presentation was by Boaz Gelbord, the Executive Director of Information Security for Wireless Generation and the Project Leader for the OWASP Security Spending Benchmarks Project.  My notes are below:

It does cost more to produce a secure product than an insecure product.

Most people will still shop somewhere, go to a hospital, or enroll in a university after they have had a data breach.

Why do we spend on security?  How much should we be spending?

  • Security imposes extra costs on organizations
  • The "security tax" is relatively well known for network and IT security – 5 to 10% (years of Gartner, Forrester, and other studies)
  • No comparable data for development or web apps
  • Regulations and contracts usually require "reasonable measures".  What does that mean?

OWASP Security Spending Benchmarks Project

  • 20 partner organizations, many contributors
  • Open process and participation
  • Raw data available to community

Reasons For Investing in Security

  • Contractual and Regulatory Compliance
  • Incident Prevention, Risk Mitigation
  • Cost of Entry
  • Competitive Advantage

Technical and Procedural Principles

  • Managed and Documented Systems
  • Business-need access
  • Minimization of sensitive data use
  • Security in Design and Development
  • Auditing and Monitoring
  • Defense in Depth

Specific Activities and Projects

  • Security Policy and Training
  • DLP-Type Systems
  • Internal Configuration Management
  • Credential Management
  • Security in Development
  • Locking down internal permissions
  • Secure Data Exchange
  • Network Security
  • Application Security Programs

[Read the rest of this entry…]