Web Admin Blog

Real Web Admins. Real World Experience.

Customizing Apache Error Codes By URL

I've had a couple of discussions lately about customized Apache error pages that prompted me to do a little bit of research on it.  What I came up with is somewhat interesting, so I thought I'd share it with everyone.  First, it is not technically possible to tell Apache to serve up a different error page for image content than for HTML or PHP content, since the only directive Apache accepts for this is of the form "ErrorDocument error-code document".  That said, if you allow .htaccess overrides on a particular directory, then you can specify your ErrorDocument directive in there as well, overriding the default error handling specified in the httpd.conf file.  An example:

In my httpd.conf file I have all 404s going to errorpage.cgi with the following line:

ErrorDocument 404 /cgi-bin/errorpage.cgi

I'm a good little code monkey and put all of my images in an /images directory under the DocumentRoot.  By default, if I hit a non-existent image in that directory, I get the error page defined in httpd.conf.  If that image is referenced in an HTML page that I load, I now download the HTML page plus the full errorpage.cgi output for the bad image reference, introducing a whole extra page's worth of overhead.

But since I was a good code monkey and put all of my images in a /images directory, the fix for this is really simple.  I create a .htaccess file inside of my /images directory and add the following line to it:

ErrorDocument 404 "404 - Image does not exist       <-- Note: the missing closing quote is intentional
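
One caveat worth noting: for Apache to honor an ErrorDocument in a .htaccess file, the directory has to allow that class of override.  As a minimal sketch (the path is just an example; use your own DocumentRoot), the corresponding httpd.conf piece would look something like this:

<Directory "/var/www/html/images">
    AllowOverride FileInfo
</Directory>

AllowOverride FileInfo is what permits ErrorDocument (among other FileInfo-class directives) inside .htaccess; with AllowOverride None the .htaccess file isn't consulted at all.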

Now, if I hit http://www.mysite.com/badpage.html I get the errorpage.cgi page, but if I hit http://www.mysite.com/images/badimage.jpg I get a short and sweet message saying “404 – Image does not exist”.  I haven’t tested this yet to see how it works when you are using something like mod_oc4j to send certain URLs to an application server, but it’s possible that this could work there too if Apache checks for existing static URLs before passing requests to the app server.  Further testing could be useful there.

So there you have it.  I can’t tell Apache to serve up different error pages based on the URL or file type, but if I’m diligent about putting different files under different directories, I can effectively do the same thing using .htaccess files.  Woot!

How Secure is Your Bank Account?

Recently I was elected the new Treasurer of the Capitol of Texas Chapter of the Information Systems Security Association.  No, that’s not my way to seek your approval, but thanks for the kudos.  The reason why I bring this up is that one of the first things I needed to do as the new Treasurer was change the bank account information over from the old 2008 board members to the new 2009 ones.  I called in advance and scheduled a meeting with a banking representative and asked what I needed to bring with me.  The answer was documentation showing the board change, a current account signer, and a new account signer (me).  So far so good.

So two of the old board members and I show up at the bank to do the deed.  We sit down in the guy's office with the door wide open while he proceeds to ask me for personal information such as my Social Security number and mother's maiden name in front of those guys and anyone within earshot.  I probably should have said something right there, but instead I lowered my voice and gave him the requested information.  That was strike #1 for a bank whose name I will not mention.

I tell him that I've brought two of the current signers with me and motion toward the guys sitting next to me.  They tell the bank representative their names and the representative acknowledges.  He starts handing me paperwork to sign, effectively removing the old names from the account and putting the account solely in my name.  At this point he has asked for my driver's license, my SSN, and my mother's maiden name, but has yet to verify that the guys sitting next to me are who they say they are.  No request for any form of identification from either of them.  Strike #2.

I ask him to assist me with setting up the online account access; he makes a quick call to find out what needs to be done and hands me another form, which I sign.  At this point he tells us we're all set.  One of the old board members asks "So at this point all of my information has been completely removed from the bank account?" and the bank representative says "yes".  We thank him and leave, then discuss amongst ourselves outside what just transpired.  What would have prevented us from walking into that bank with a fake document showing a board member change, having two of my buddies pretend that they were the old board members, getting the account changed into my name, and walking off with the money?  They required no signature or identification from the old board members.  In fact, I did pretty much all of the talking, and I'm pretty sure they never even said their names (or that they were the old board members); I did.  You guessed it, strike #3!

So what have we learned from this little exercise?  First, no matter how secure your systems are, you need to make sure your processes take security into account equally.  Second, Capitol of Texas ISSA really needs to find a new bank.  Do you have any idea how secure your bank account is?

A DoS We Can Believe In

We knew that the historic inauguration of Barack Obama would generate a lot more Internet traffic than usual, both in general and specifically here at NI.  Being prudent Web Admin types, we checked around to make sure there wouldn't be any untoward effects on our Web site.  Like many corporate sites, we use the same pipe for inbound Internet client usage and outbound Web traffic, so employees streaming video to watch the event could pose a problem.  We got all thumbs up after consulting with our networking team, and decided not to even send any messaging asking people to avoid streaming.  But we monitored the situation carefully as the day unfolded.  Here's what we saw, just for your edification!

Our max inbound Internet throughput was 285 Mbps, about double our usual peak.  We saw a ni.com Web site performance degradation of about 25% for less than two hours according to our Keynote stats.  ni.com ASPs were affected proportionately, which indicates the slowdown was Internet-wide and not unique to our specific Internet connection here in Austin.  The slowdown was less pronounced internationally, but still visible.  So in summary – not a global holocaust, but a noticeable bump.

Cacti graphs showing our Internet connection traffic:

[Hourly and daily Cacti traffic graphs]

Keynote graph of several of our Web assets, showing global response time in seconds:

Looking at the traffic specifically, there were two main standouts.  We had TCP 1935, which is Flash RTMP, peaking around 85 Mbps, and UDP 8247, which is a special CNN port (they use a plugin called "Octoshape" with their Flash streaming), peaking at 50 Mbps.  We have an overall presence of about 2500 people here at our Austin HQ on an average day, but we can't tell exactly how many were streaming.  (Our NetQoS setup shows us there were 13,600 "flows," but every time a stream stops and starts that creates a new one – and the streams were hiccupping like crazy.  We'd have to do a bunch of Excel work to figure out max concurrent, and have better things to do.)

In terms of the streaming provider breakdown – since everyone uses Akamai now, the vast majority showed as “Akamai”.  We could probably dig more to find out, but we don’t really care all that much.  And, many of the sources were overwhelmed, which helped some.

We just wanted to share the data, in case anyone finds it helpful or interesting.

Beware The Wolf In Supplier’s Clothing

As you all know, the economic climate of 2009 is a cold, cold winter indeed.  And like wolves starved by the cold and hardship of the season, our suppliers have turned feral.

When everyone's sales slip due to the down economy, companies (and individual sales reps) are desperate to make their numbers.  How are they doing it?  By trying to jack up maintenance costs, in some cases by more than 100%!  It's way more than isolated incidents; all our upcoming maintenance renewals are meeting with hugely inflated quotes.  And these aren't fly-by-night companies either; I don't want to name names, but let's just say I am confident everyone out there has heard of all of them.

So protect yourself.  In your dealings with your supplier reps, start making it clear way ahead of time that your economic situation sucks too and you certainly expect a price freeze to be in place.  Don't put up with it either – they know they're going to make plenty of money off all the goons they send quotes to who will just rubber-stamp them and send them on so they can return to ESPNZone (I'm looking at you, State of Texas).  If you put up enough resistance they'll go looking for easier pickings, just like those mean ol' wolves do.  We had one outfit that wanted to jack up our maintenance cost by $125k a year, but luckily our IT director is a firm lady who has no problem browbeating a sales rep until he cries.  In the end, we let them have a 5% increase because we ended up feeling sorry for them.

And have a backup plan.  If they really do have you over a barrel, then you’re low on leverage – you can try offering reference calls, presenting at conferences, and other handy non-cash incentives to them.  But when it comes down to it, you need to be able to walk away from them.  And to do this you need to plan ahead.  There are very few things that there’s only one of.  Have multiple suppliers lined up, and have a plan to change hardware or software if you have to.  Also look into open source, or third party support – even if it’s “not as good,” these days you have to decide how much good is worth how much money.

Now don't get me wrong, we like to partner with our suppliers and treat them in a friendly way.  Win-win and all that.  But good fences make good neighbors, and there's nothing friendly about showing up and saying "Hey, your operations will grind to a halt without our product, so stick 'em up and give me double this year!"

Be advised, that gleam in Bob the Sales Rep’s eyes will be a little hungrier than usual these days, and he’s gotta eat one of God’s little forest creatures to live.  Just make sure it’s not you.

Google Chrome Hates You (Error 320)

The 1.0 release of Google Chrome has everyone abuzz.  Here at NI, loads of people are adopting it.  Shortly after it went gold, we started to hear from users that they were having problems with our internal collaboration solution, based on the Atlassian Confluence wiki product.  They'd hit a page and get a terse error; if they clicked on "More Details" they got the slightly more helpful, or at least Googleable, string "Error 320 (net::ERR_INVALID_RESPONSE): Unknown error."

At first, it seemed like if people reloaded or cleared cache the problem went away.  It turned out this wasn’t true – we have two load balanced servers in a cluster serving this site.  One server worked in Chrome and the other didn’t; reloading or otherwise breaking persistence just got you the working server for a time.  But both servers worked perfectly in IE and Firefox (every version we have lying around).

So we started researching.  Both servers were as identical as we could make them.  Was it a Confluence bug?  No, we have phpBB on both servers and it showed the same behavior – so it looked like an Apache level problem.

Sure enough, I looked in the logs.  The error didn't generate an Apache error – the request was still logged as a 200 OK response – but when I compared the log strings, the box that Chrome was erroring on showed that the cookie wasn't being passed up; that field was blank (it was populated with the cookie value on the other box, and on both boxes when hit from IE/Firefox).  Both boxes had an identically compiled Apache 2.0.61.  I diffed all the config files; except for box name and IP, no difference.  The problem persisted for more than a week.
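
If you want to reproduce that log comparison yourself, a minimal sketch of logging the Cookie request header in Apache looks like this (the format name and log file path here are just examples, not our actual config):

LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Cookie}i\"" cookietrack
CustomLog logs/access_cookie_log cookietrack

The %{Cookie}i token logs whatever Cookie header the client sent; a blank value there on one box but not the other is exactly the symptom described above.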

We did a graceful Apache restart for kicks – no effect.  Desperate, we did a full Apache stop/start – and the problem disappeared!  Not sure for how long.  If it recurs, I'll take a packet trace and see if Chrome is just not sending the cookie, or sending it partially, or sending it fine and Apache is jacking it up…  But it's strange that there would be an Apache-end problem that only Chrome would experience.

I see a number of posts out there in the wide world about this issue; people have seen this Chrome behavior in YouTube, Lycos, etc.  Mostly they think that reloading/clearing cache fixes it but I suspect that those services also have large load balanced clusters, and by luck of the draw they’re just getting a “good” one.

Any other server admins out there having Chrome issues, and can confirm this?  I’d be real interested in knowing what Web servers/versions it’s affecting.  And a packet trace of a “bad” hit would probably show the root cause.  I suspect for some reason Chrome is partially sending the cookie or whatnot, choking the hit.

Using Proxies to Secure Applications and More

I've been really surprised that for as long as I've been active with OWASP, I've never seen a proxy presentation.  After all, proxies are hugely beneficial for web application penetration testing and they're really not that difficult to use.  Take TamperData, for example.  It's just a Firefox plugin, but it does header, cookie, GET, and POST manipulation just as well as WebScarab.  Or Google's ratproxy, which works in the background while you browse around QA'ing your web site and gives you a nice actionable report when you're done.  I decided it was time to educate my peers on the awesomeness of proxies.

This past Tuesday I presented to a crowd of about 35 people at the Austin OWASP Meeting.  The title of my presentation was "Using Proxies to Secure Applications and More".  Since so many people came up to me afterward telling me what a great presentation it was and how they learned something they could take back to the office, I decided (with a little insistence from Ernest) that it was worth putting up on SlideShare and posting to the Web Admin Blog.

The presentation starts off with a brief description of what a proxy is.  Then I talked about the different types of proxies, and the bulk of the presentation was me giving examples and demonstrating the various proxies.  I included anonymizing proxies, reverse proxies, and intercepting proxies.  While my slides can't substitute for the actual demo, I did try to note in them which tool I used for each demo.  If you have any specific questions, please let me know.  All that said, here's the presentation.
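
To give a flavor of the reverse proxy portion, here's a minimal Apache sketch (the hostnames and paths are made up for illustration, not taken from the talk) that fronts an internal app server, which is one of the simplest ways to put a control point in front of an application:

LoadModule proxy_module modules/mod_proxy.so
LoadModule proxy_http_module modules/mod_proxy_http.so

ProxyRequests Off
ProxyPass        /app http://internal-app.example.com:8080/app
ProxyPassReverse /app http://internal-app.example.com:8080/app

ProxyRequests Off keeps the server from acting as an open forward proxy, so only the mapped /app URL space is exposed to the outside.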

Vignette Village 2008

Vignette, the Austin-based Web content management company,  has an annual show called Vignette Village.  A whole crew went from our company; Mark and I represented the Web Admins.

I got a lot out of Village, though I wasn't expecting to.  There was excitement in the air and a clear commitment to continued development of their core Vignette Content Management (VCM V7) product and other products, which had been lacking for the last couple of years.  To be honest, I had begun to think it was just a matter of time until the Plone/Drupal/Joomla crowd outstripped VCM, but they seem to be making the changes required to keep the product as the true enterprise choice.  We already moved off Vignette Dialog, which was a very good email marketing package, because of the lack of support and new development.  I don't know the details, but basically Vignette went all meathead and turned away from their core products to chase medical/legal document management money a couple of years ago; combined with financial problems and layoffs, the products started to suck.  They seem to have turned that around, though, and everyone I spoke to inside Vignette is excited about their new leadership, especially Bertrand de Coatpont, the new VCM product manager.

The new Vignette Recommendations (OEMed from Baynote Systems) looks really good, and will expose some new data to us that I think can be used in a lot of different and innovative ways.  From previous descriptions I had thought "Yeah, whatever, BazaarVoice but from Vignette, which doesn't necessarily inspire confidence in me" (frankly, we Web Admins have learned to be suspicious about additional offerings from Oracle, Vignette, HP, etc., as they will try to sell you crap on the strength of their brand name and alleged integration).  But the reality is an extremely elegant way of collecting and immediately reusing usage info, and especially with the social search aspect, I feel like they have an actual vision they're working towards.  So two thumbs up there!

Also two thumbs up on the Transfer Tool, which allows you to easily clone VCM installs to other servers – it'll allow for frequent and efficient refreshes.  We needed that capability badly enough that we Web Admins had devised a complicated two-day process to clone an environment; this should be much better.

VCM 7.6 is planned to be complete this year, and it has a lot of compelling features – you can migrate Content Type Definitions (change a CTD and the content changes inside the VCM to fit), lots of performance, availability, and console GUI fixes…  Then “Ace,” which everyone knows is VCM V8 but they don’t want to own up to that yet, has a total GUI overhaul.  Most of the issues we have with VCM are content contributor usability, so that’s great.

All in all, two days well spent.  It definitely exceeded my expectations (and I’ve been to Village in years previous).

Our Search Implementation In The News!

InformationWeek did a big story on enterprise search, and used NI as their lead example!  Note all the system info in the article that I fed them.  And we're getting a lot of fun out of Graff's quote about how it's easy to sign off on more resources for us; we're including that in every purchase req now. 🙂

One of the reasons that our FAST enterprise search program has been so successful here is that the programmers and the Web Admins have worked pretty much 50/50 on the platform.  Also, FAST is a great product and has great support (we’re waiting with bated breath to see if Microsoft screws it up; we’ve been with FAST since way way before they got bought), and we have some very visionary search business folks who saw its potential early on.

Nowadays, search is more than what it was traditionally considered to be.  We have a normal "search box", of course.  But we also run our faceted navigation off search (e.g. our Data Acquisition product line page) and pull things like related links and other resources from it (see the resources tab on this page).  Search, in many ways, can be used the way people have used databases in the past.  With some metadata added, a search index is kind of like a big database, highly denormalized for speed and focused on text search.  In fact, I think there's a master's thesis in there somewhere as to when search makes sense vs. when a database makes sense.  Databases make sense with lots of numerical information, but on the Web that's frankly a fringe use case!  On the Web it's all about text, from names and addresses to links to articles to product info…  When we did things like query related links out of a database table, and I mean an Oracle database table on a big-ass Solaris box, it was painfully slow.  Pulling from search, it's 15 milliseconds.

As a result, our internal search use is even more killer.  We pull Intranet pages, documents from Notes repositories, data from our Oracle ERP system, files off file shares, etc. all into one place and let people delve through it.  They’ve even implemented “screens” on top of some of the data (mainly because Oracle ERP is painful to use).  Our entire sales force is gaga over it.

Anyway, so yay to modern search technology, yay to FAST, and yay us!

No No, You Really DO Want To Use Live Search

It’s been in the news that Microsoft is pushing “rewards programs” for people to use Live Search and the Live Toolbar.  But did you know they’re trying to get your local IT department to do it for you?

Yep, the program is called the "Search@Work Rewards Program".  If your IT department puts IE on your company PCs with Live Search as the default search, the Live Toolbar installed, and some kind of tracker plugin called the "Search Rewards Client," then they get Microsoft service credits!  Yay.  I can only assume my ISP is next.

Here's the exact service description from Microsoft.  Note that they're tracking Yahoo and Google ad impressions too!  The rest of it is "fair enough," at least by usual IT industry standards, but that part is kinda shady, I think.


Amazon Web Services S3, EC2 and other AWS services

First Speaker: VP of Amazon Web Services – Adam Selipsky

Motivation for building AWS – scaling Amazon.com through the '90s was really rough.  Ten years of growth caused a lot of headaches.

What if you could outsource IT Infrastructure?  What would this look like?
Needs:
Storage
Compute abilities
Database
Transactions
Middleware

Core Services:
Reliability
Scalability – Lots of companies have spiky business periods
Performance – CoLo facilities and other silos in the past have shown that developers do not want slowness and won't accept it
Simplicity – No learning curve or as little as possible
Cost Effective – Prices are public and pay as you go.  No hidden fees.  Capital expenses cut way down for startups

Initial Suite of services: S3, EC2, SimpleDB, FPS, DevPay, SQS, Mechanical Turk

Cloud computing is a buzzword for letting your infrastructure be managed by someone else.  Time to market is huge since you don't have to buy boxes, CoLo hosting, bandwidth, and more.

Second Speaker: Jinesh Varia, Evangelist of AWS
Promised a look at their roadmap for the next 2 years.
Amazon has 3 business units: Amazon.com, Amazon Services for Sellers, and Amazon Web Services.
They have already spent $2 billion on infrastructure for AWS.

Analogy – electricity: where it's generated doesn't really add any value.  A certain amount of it is undifferentiated services: server hosting, bandwidth, hardware, contracts, moving facilities, …  The idea-to-product delay is huge.

Example of Animoto.com

They own no hardware.  None.  Serverless startup.

They went from 40 servers to 5000 in 3 days.  Facebook app.  Signed up 25,000 users every hour.

Use Cases
Media Sharing and Distribution
Batch and Parallel Processing
Backup and Archive and Recovery
Search Engines
Social Networking Apps
Financial Applications and Simulations

What do you need?
S3, EC2, SimpleDB, FPS, DevPay, SQS, Mechanical Turk

S3
S3 is running 50,000 transactions per second right now.
99.9% Uptime

EC2
Unlimited Compute power
Scale capacity up or down.  Linux and OpenSolaris (ugh, Solaris) are supported
Elastic Block Store is finally here!  Yay!

SimpleDB
Not relational, no SQL.  But highly available and highly accessible.  Indexes data…

SQS
Acts as glue to tie all the services together.  A transient buffer?  Not sure how I feel about that.

DevPay and FPS
Developers get to use Amazon's billing infrastructure.  Sounds lame and sort of pyramid-schemey.

Mechanical Turk
Allows you to get people on demand.  Perfect for high-volume micro tasks.  Human Intelligence Tasks.  Outsource dummy work, I guess…  Not sure.

Sample Architecture
Podango

He wrote a Cloud Architecture PDF

Future Roadmap

Focus on security features and certifications
Continued focus on operational excellence
US and international expansion
Localization of technical resources
Amazon EC2 GA and SLA – out of beta and SLA delivered << This is really good for us!  Now if only Gmail would get out of beta after 5 years!
Windows Server Support
Additional services

The Amazon Start-Up Challenge is open.  $100K prize.

aws.amazon.com/blog

Jinesh Varia, jvaria@amazon.com

Customer Testimonials
Splunk used AWS to host a development camp and start an instance.  Email instructions and SSH keys.  Free, open source.  DevCamp.
Fabulatr (on Google Code) starts up an instance, gets it ready, and sends an email with the SSH key to the user.
Another use case – Sales Engineering: POCs, joint work with Support, a place to play, Splunk live demos.
There are some videos on the Splunk blog.
Put Splunk in your cloud.

Resources
download.splunk.com

blogs.splunk.com/thewilde  -> Inside the Cloud Video

code.google.com/p/fabulatr

RightScale – you can't use ElasticFox from an iPhone, but you can use RightScale.

OtherInbox

Launched on Monday.  Helps users manage their inbox – emails from OnStar, receipts from Apple.  OtherInbox allows me to give out different addresses, e.g. facebook@james.otherinbox.com.
Seems like a cool app.
Use Google Docs to grab information ad hoc.
They use DBs on EBS in a master/slave relationship for SQL; formerly they ran on EC2 without EBS, and now EBS is awesome.
Built on Ruby on Rails (MVC) and SproutCore (a JavaScript framework).

austincloudcomputing.com

MyBaby Our Baby
Share, organize, and save all of the videos and pictures of your kids.
Invite friends and family to your site; they get emails about your kids when you add content.
Other people can add photos of your children – pictures from other parents (at the park, the babysitter, …).
Uses S3 only

Architecture for LB

Two Front End Load Balancing Proxy Servers that hit the right app servers.
Need to read up on Scalr (Pound); HAProxy was also recommended.  He also mentioned that Scalr is cool, but AWS is coming out with a load balancer and tooling for us to use.  He said to give it some time, but they would have something for us!
http://aws.typepad.com/aws/2008/04/scalr-.html
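
The talk pointed at Pound and HAProxy for this; just to make the front-end proxy idea concrete in Apache terms (purely illustrative: the backend hostnames are made up and this isn't what the presenters run), a minimal mod_proxy_balancer sketch looks like:

LoadModule proxy_module modules/mod_proxy.so
LoadModule proxy_http_module modules/mod_proxy_http.so
LoadModule proxy_balancer_module modules/mod_proxy_balancer.so

<Proxy balancer://appcluster>
    BalancerMember http://app1.internal:8080
    BalancerMember http://app2.internal:8080
</Proxy>

ProxyPass        / balancer://appcluster/
ProxyPassReverse / balancer://appcluster/

Each BalancerMember is one of the app servers behind the front-end proxies, and requests get spread across them (newer Apache versions also want an lbmethod module loaded).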

GoDaddy vs AWS.  GoDaddy sucks…  but under all circumstances, “you need a geek” to get this running.

You need a Linux system administrator under all circumstances, and a lot of people seemed miffed by this.  I don't see what the big deal is: under the AWS scenario, you don't need all the infrastructure (hardware) you needed before, and you need a lot fewer people than in the traditional model.  You always still need someone who knows how to work the systems, but now you need fewer of them, and you really need people who are Linux admins but also web admins who know traditional web services and applications.  There will never be a magic button that just spins up servers ready to go for your unique app.  Amazon makes it easier, but you still need a geek…  They make the world work…

Amazon has a long track record of success, and there is a lot of trust from OtherInbox.