Web Admin Blog

Real Web Admins. Real World Experience.

Everything You Need To Know About Cloud Security in 30 Minutes or Less

The last presentation of the day was by Rich Mogull on “Everything you need to know about cloud security in 30 minutes or less”.  His take on how we got here: every presentation and diagram had pictures of clouds, so somebody decided to sell that.  Makes security practitioners sad.

Why the cloud is a problem for security

  • Poor understanding of cloud taxonomies and definitions
  • A generic term, frequently misused to refer to anything on the Internet
  • Lack of visibility into cloud deployments
  • Organic consumption

He couldn’t have given this talk six months ago because nobody knew about the topic and it wasn’t being discussed.

Security Implications

  • Variable control
  • Variable visibility
  • Variable simplicity/complexity
  • Variable resources

Control, visibility, and resources go down as simplicity and ease of management go up.

Is the cloud more or less secure than we are now?  It depends.  Some things are more secure and some things are less secure because of all of the variability.

SaaS

  • Most constrained
  • Most security managed by your provider
  • Least flexible

PaaS

  • Less constrained
  • Security varies tremendously based on provider and application
  • Shared security responsibility

IaaS

  • Most flexible
  • Most security managed by your developers

Specific Issues

  • Spillage and data security
  • Reliability/availability
  • Capability to apply traditional security controls in a dynamic environment
  • Lack of visibility into cloud usage
  • Changing development patterns/cycles

How do you use your static and dynamic analysis testing tools in the cloud?

Where do you roll your cloud when it fails?

Your Top 2 Cloud Security Defenses

  • SLA
  • Contracts

Understand Your SLAs

  • Are there security-specific SLAs?
  • Can you audit against those SLAs?
  • Are there contractual penalties for non-compliance?
  • Do your SLAs meet your risk tolerance requirements?

Suggested SLAs

  • Availability
  • Security audits – including third party
  • Data security/encryption
  • Personnel security
  • Security controls (depend on the service)
  • User account management
  • Infrastructure changes

Understand Your Cloud

  • What security controls are in your cloud?
  • How can you manage and integrate with the controls?
  • What security documentation is available?
  • What contingency plans are available?

Cloud Security Controls to Look For

  • Data encryption/security (key management)
  • Perimeter defenses
  • Auditing/logging
  • Authentication
  • Segregation
  • Compliance

Cloud Security Macro Layers

  • Network
  • Service
  • User
  • Transaction
  • Data

Don’t Trust

  • SAS70 Audits
  • Documentation without verification
  • Non-contractual SLAs

What to Do

  • Educate yourself
  • Engage with developers
  • Develop cloud security requirements

Cloud Computing Panel Discussion

Next up at the Cloud Computing and Virtualization Security half-day seminar was a Cloud Computing Panel moderated by Rich Mogull (Analyst/CEO at Securosis) with Josh Zachary (Rackspace), Jim Rymarczyk (IBM), and Phil Agcaoili (Dell) participating in the panel.  My notes from the panel discussion are below:

Phil: There is little difference between the outsourcing of the past and today’s Cloud Computing.  All of that stuff is sitting outside of your environment, and we’ve been evolving toward that for a long time.

Rich: My impression is that there are benefits to outsourced hosting, but there are clearly areas where it makes sense and areas where it doesn’t.  This is fundamentally different from shared computing resources, with very different applications.  Complexity goes up very quickly for security controls.  Where do you see the most value today?  Where do people need to be most cautious?

Jim: Internal virtualization is almost necessary, but it impacts almost every IT process.  Technology is still evolving and is far from advanced state.  Be pragmatic and find particular applications with a good ROI.

Josh: Understand what you are putting into a cloud environment.  Have a good understanding of what a provider can offer you in terms of protecting sensitive data.  Otherwise you’re putting yourself in a very bad situation.  There’s a lot of promise.  Great for social networking and web development.  Not appropriate for enterprises with large amounts of IP and sensitive data.

Jim: We’ll get there in 4-5 years.

Phil: Let supply chain experts do it for you and then interact with them.  Access their environment from anywhere.  Use a secure URL with a federated identity.  Your business will come back to you and say “We need to do this” and IT will be unable to assist them.  Use it as an opportunity to mobilize compliance and InfoSec and get involved.  It’s going to come to us and we’re just going to have to deal with it.  There’s a long line of people with a “right to audit”.  Don’t assume that someone is doing the right thing in this space; you have to ask.

Audience: What is the most likely channel for standards?

Phil: The Cloud Security Alliance is a step in the right direction.  They want to come up with PCI DSS-like checklists.  The CSA is working alongside IEEE and NIST.  The goal is to feed the standards process, not to become a standards body.

Rich: The market is anti-standards.  If we get standardized, then all of the providers are competing only on cost.

Jim: I think it’ll happen.  We will see ISO groups for standards on cloud quality.

Audience: Moving data between multiple clouds.  How do you determine who gets paid?

Jim: There are proposals for doing that, based on all of the resource parameters.

Phil: Should see standards based on federated identity.  Who is doing what and where.  That’s where I’ve seen the most movement.  There is no ISO for SaaS.  Remapping how 27001 and 27002 apply to us as a software provider.

Audience: There are two things that drive standards: the market, or a monopoly (Betamax).

Rich: We will have monopolistic ones and then 3rd parties that say they use those standards.

Audience: How can you really have an objective body create standards without being completely embedded in the technology?

Jim: You create a reference standard and the market drives that.

Phil: Gravity pulls us to things that work – he uses SAML as an example.  It’s the way the internet has always worked.  The strongest will survive and the right standards will manifest themselves.

Rich: What are some of things that you’re dealing with internally (as consumers and providers) and the top suggestions for people stuck in this situation?

Jim: People who don’t have all of the requirements use public clouds.  If what you want is available (salesforce.com), it may be irresistible.

Josh: The solution needs to be appropriate to the need.  Consult with your attorney to make sure your contract is in line with what you’re leveraging the provider for.  It’s really about what you agree to with that provider and their responsibilities.

Phil: The hurricane is coming.  You can’t scream into the wind, you gotta learn to run for cover.  Find the safe spot.

Audience: What industries do you see using this?  I don’t see it with healthcare.

Phil: Mostly providers for us.  Outsourcing service desks.  Government.  Large state/local governments.

Josh: Small and medium retail businesses.  Get products out there at a significantly reduced cost.

Jim: Lots of financial institutions looking for ways to cut costs.  Healthcare industry as well (Mayo Clinic).  Broad interest across the whole market, but especially anywhere they’re under extreme cost measures.

Rich: I run a small business that couldn’t pay for a full virtual hosting provider, so we picked an elastic provider.  We’re doing shared hosting right now, but are capable of growing to a virtual private server.  We have redundancy and are able to go full colocation if we need it.  The provider is able to support growth, but we start with the same instance to get there.

Audience: How does 3rd party transparency factor into financial uses?

Jim: Almost exclusively private clouds.  There are use cases playing out right now that will become repeatable patterns.

Phil: When the volume isn’t there, offload to someone like Rackspace and they’ll help you to grow.

Audience: Are there guidelines to contracts to make sure information doesn’t just get outsourced to yet another party?

Phil: Steal contracts from your largest partners/vendors.  Use them as templates.

Audience: What recourse do you have to verify, via an audit, that security is not an issue?

Rich: Contracts.

Phil: Third-party assessment (i.e. the right to audit).  It’s in our interest to verify they are secure.  It’s a trend, and we now have a long list of people looking to audit us as a provider.  Hoping for an ISO standard to emerge that is truly for the cloud.

Audience: Is cloud computing just outsourcing?

Rich: It’s more than that.  For example, companies have internal clouds that aren’t outsourced at all.

Josh: Most of the time it’s leveraging resources more efficiently at hopefully a reduced cost.

Audience: How do I know you’re telling me the truth about the resources I’m using?  What if I’m a bad guy who wants to exploit a competitor using the cloud?

Josh: We’ve seen guys create botnets using stolen credit cards.  What you’re billed for is in your contract.

Jim: We’ve had this solved for decades on mainframes.  Precious resources propagated amongst users.  There’s no technical reason we’re not doing it today.

Rich: It depends what type of cloud you’re using.  Some will tell you.

Josh: If you’re worried about someone abusing you, why are you there in the first place?

Phil: For our service desk we meter this by how many calls, by location.  Monitor servers that were accessed/patched/etc.  Different service providers will have different levels.

Audience: Seeing some core issues at the heart of this.  For businesses, an assessment of core competencies.  Can you build a better data center with the cloud?  Second issue involves risk assessment.  Can you do a technical audit?  Can you pay for it legally?  How much market presence does the vendor have?  Who has responsibility for what?  Notion of transparency of control.  Seems like it distills down to those core basics.

Jim: I agree.

Rich: Well said.

Phil: Yes, yes, yes.

Audience: How do you write a contract for failed nation states, volatility, etc?  Do we say you can’t put our stuff in these countries?

Phil: This is the elephant in the room.  How can you ensure that my data is being protected the way I’d protect it myself?  It’s amazing what other people do when they get a hold of that stuff.  This is the underlying problem that we have to solve.  “We’re moving from a single-family home to a multi-tenant condo.  How do we build that now?”

Rich: You need to be comfortable with what you’re putting out there.

Audience: To what extent is the military or federal government using cloud computing?

Jim: They’re interested in finding ways, but they don’t talk about how they’re using it.

Audience – Vern: They’re doing cloud computing using an internal private cloud already.  They bill back to the appropriate agency based on use.

Phil: Government is very wary of what’s going on.

Virtualization Security Best Practices from a Customer’s and Vendor’s Perspective

The next session during the ISSA half-day seminar on Virtualization and Cloud Computing Security was on security best practices from a customer and vendor perspective.  It featured Brian Engle, CIO of Temple Inland, and Rob Randell, CISSP and Senior Security Specialist at VMware, Inc.  My notes from the presentation are below:

Temple Inland Implementation – Stage 1

Overcome Hurdles

  • Management skeptical of Windows virtualization

Don’t Fear the Virtual World

  • First year:
    • Built out development only environment
    • Trained staff
    • Developed support processes
    • Showed hard dollar savings

Temple Inland – Stage 2

  • Build QA environment
  • Improve processes
  • Develop rapid provisioning
  • Demonstrate advanced functions
    • VMotion
    • P2V Conversions

Temple Inland – Stage 3

First production environment

Temple-Inland Implementation

  • Prior to VMware: typical remote facility
    • Physical domain controller
    • Physical application/file server
    • Physical tape drive
  • New architecture
    • Single VMware server
    • No tape drive
  • Desktops
    • Virtualize desktops through VMware
    • No application issues like Citrix MetaFrame
    • Quick deployment and repair

How Virtualization Affects Datacenter Security

  • Abstraction and Consolidation
    • +Capital and Operational Cost Savings
    • -New infrastructure layer to be secured
    • -Greater impact of attack or misconfiguration
  • Collapse of Switches and servers into one device
    • +Flexibility
    • +Cost-savings
    • -Lack of virtual network visibility
    • -No separation-by-default of administration

Temple-Inland split the teams so that there was a virtual network administration team within the server administration team.

How Virtualization Affects Datacenter Security

  • Faster deployment of servers
    • + IT responsiveness
    • -Lack of adequate planning
    • -Incomplete knowledge of current state of infrastructure
  • VM Mobility
    • +Improved Service Levels
    • -Identity divorced from physical location
  • VM Encapsulation
    • +Ease of business continuity
    • +Consistency of deployment
    • +Hardware Independence
    • -Outdated offline systems

Build anti-virus, client firewalls, etc into the offline images so that servers are up-to-date right when they are installed.

If something happens to a system, you can’t just pull the plug anymore.  You have to have policies and processes in place.

With virtualization you can have a true “gold image” instead of having different images for all of the different types of hardware.

Security Advantages of Virtualization

  • Allows automation of many manual error prone processes
  • Cleaner and easier disaster recovery/business continuity
  • Better forensics capabilities
  • Faster recovery after an attack
  • Patching is safer and more effective
  • Better control over desktop resources
  • More cost effective security devices
  • App virtualization allows de-privileging of end users
  • Better lifecycle controls
  • Future: Security through VM Introspection

Gartner: “Like their physical counterparts, most security vulnerabilities will be introduced through misconfiguration”

What Not to Worry About

  • Hypervisor Attacks
    • ALL theoretical, highly complex attacks
    • Widely recognized by security community as being only of academic interest
  • Irrelevant Architectures
    • Apply only to hosted architecture (i.e. Workstation), not bare-metal (i.e. ESX)
    • Hosted architecture generally suitable only when you can trust the guest VM
  • Contrived Scenarios
    • Involve exploits where best practices around hardening, lockdown, design, etc. for virtualization are not followed, or
    • Poor general IT infrastructure security is assumed

Are there any Hypervisor Attack Vectors?

There are no known hypervisor attack vectors to date that have led to “VM Escape”.

  • Architecture Vulnerability
    • Designed specifically with isolation in mind
  • Software Vulnerability – Possible like with any code written by humans
    • Mitigating Circumstances:
      • Small Code Footprint of Hypervisor (~21MB) is easier to audit
      • If a software vulnerability is found, exploit difficulty will be very high
        • Purpose-built for virtualization only
        • Non-interactive environment
        • Less code for hackers to leverage
    • Ultimately depends on VMware’s security response and patching

Concern: Virtualizing the DMZ/Mixing Trust Zones

Three Primary Configurations

  • Physical separation of trust zones
  • Virtual separation of trust zones with physical security devices
  • Fully collapsing all servers and security devices into a VI3 infrastructure

This also applies to PCI requirements.

Physical Separation of Trust Zones

Advantages

  • Simpler, less complex configuration
  • Less change to physical environment
  • Little change to separation of duties
  • Less change in staff knowledge requirements
  • Smaller chance of misconfiguration

Disadvantages

  • Lower consolidation and utilization of resources
  • Higher cost

Virtual Separation of Trust Zones with Physical Security Devices

Advantages

  • Better utilization of resources
  • Take full advantage of virtualization benefits
  • Lower cost

Disadvantages (can be mitigated)

  • More complexity
  • Greater chance of misconfiguration

Getting more toward “the cloud” where web zone, app zone, and DB zone are all virtualized on the same system, but still using physical firewalls.

Fully Collapsed Trust Zones Including Security Devices

Advantages

  • Full utilization of resources, replacing physical security devices with virtual
  • Lowest-cost option
  • Management of entire DMZ and network from a single management workstation

Disadvantages

  • Greatest complexity, which in turn creates highest chance of misconfiguration
  • Requirement for explicit configuration to define separation of duties to help mitigate risk of misconfiguration; also requires regular audits of configurations
  • Potential loss of certain functionality, such as VMotion (being mitigated by vendors and VMsafe)

How do we secure our Virtual Infrastructure?

Use the principles of Information Security

  • Hardening and lockdown
  • Defense in depth
  • Authorization, authentication, and accounting
  • Separation of duties and least privileges
  • Administrative controls

Protect your management interfaces (vCenter)!  They are the keys to the kingdom.

Fundamental Design Principles

  • Isolate all management networks
  • Disable all unneeded services
  • Tightly regulate all administrative access

Summary

  • Define requirements and ensure vendor/product can deliver
    • Consider culture, capability, maturity, architecture and security needs
  • Implement under controlled conditions using a defined methodology
    • Use the opportunity to improve control deficiencies in existing physical server areas if possible
    • Implement processes for review and validation of controls to prevent the introduction of weaknesses
  • Round corners where your control environment allows
    • Sustain sound practices that maintain required controls
    • Leverage the technology to achieve efficiency and improve scale

About the Cloud Security Alliance

The next presentation at the ISSA half-day seminar was on the “Cloud Security Alliance” and Security Guidance for Critical Areas of Focus in Cloud Computing by Jeff Reich.  Here are my notes from this presentation:

Agenda

  • About the Cloud Security Alliance
  • Getting Involved
  • Guidance 1.0
  • Call to Action

About the Cloud Security Alliance

  • Not-for-profit organization
  • Inclusive membership, supporting broad spectrum of subject matter expertise: cloud experts, security, legal, compliance, virtualization, etc
  • We believe in Cloud Computing, we want to make it better

Getting Involved

  • Individual membership (free)
    • Subject matter experts for research
    • Interested in learning about the topic
    • Administrative & organizational help
  • Corporate Sponsorship
    • Help fund outreach, events
  • Affiliated Organizations (free)
    • Joint projects in the community interest
  • Contact information on website

Download version 1.0 of the Security Guidance at http://www.cloudsecurityalliance.org/guidance

Overview of Guidance

  • 15 domains
  • #1 is Architecture & Framework
  • Covers Governing in the Cloud (2-7) and Operating in the Cloud (8-15) as well

Assumptions & Objectives

  • Trying to bridge gap between cloud adopters and security practitioners
  • Broad “security program” view of the problem

Architecture Framework

  • Not “One Cloud”: Nuanced definition critical to understanding risks & mitigation
  • 5 principal characteristics (abstraction, sharing, SOA, elasticity, consumption/allocation)
  • 3 delivery models
    • Infrastructure as a Service
    • Platform as a Service
    • Software as a Service
  • 4 deployment models: Public, Private, Managed, Hybrid

Governance & ERM

  • A portion of cloud cost savings must be invested into provider security
  • Third party transparency of cloud provider
  • Financial viability of cloud provider
  • Alignment of key performance indicators
  • PII is best suited to a private/hybrid cloud, absent significant due diligence on the public cloud provider
  • Increased frequency of 3rd party risk assessments

Important thing to consider is the financial viability of your provider.  You never want to have your data held hostage in a court battle.

Legal

  • Contracts must have flexible structure for dynamic cloud relationships
  • Plan for both an expected and unexpected termination of the relationship and an orderly return of your assets
  • Find conflicts between the laws the cloud provider must comply with and those governing the cloud customer

Compliance & Audit

  • Classify data and systems to understand compliance requirements
  • Understand data locations, copies

Information Lifecycle Management

  • Understand the logical segregation of information and protective controls implemented in storage, transfers, and backups

Summary

  • Cloud Computing is real and transformational
  • Cloud Computing can and will be secured
  • Broad governance approach needed
  • Tactical fixes needed
  • Combination of updating existing best practices and creating completely new best practices
  • Common sense is not optional

Call to Action

  • Join us, help make our work better
  • www.cloudsecurityalliance.org
  • info@cloudsecurityalliance.org
  • Twitter: @cloudsa, #csaguide

Introduction to Cloud Computing and Virtualization Security

Today the Austin ISSA and ISACA chapters held a half-day seminar on Cloud Computing and Virtualization Security.  The introduction on cloud computing was given by Vern Williams.  My notes on this topic are below:

5 Key Cloud Characteristics

  • On-demand self-service
  • Ubiquitous network access
  • Location independent resource pooling
  • Rapid elasticity
  • Pay per use

3 Cloud Delivery Models

  • Software as a Service (SaaS): Provider’s applications delivered over a network
  • Platform as a Service (PaaS): Deploy customer-created apps to a cloud
  • Infrastructure as a Service (IaaS): Rent processing, storage, etc

4 Cloud Deployment Models

  • Private cloud: Enterprise owned or leased
  • Community cloud: Shared infrastructure for a specific community
  • Public cloud: Sold to the public, Mega-scale infrastructure
  • Hybrid cloud: Composition of two or more clouds
  • Two types: internal and external
  • http://csrc.nist.gov/groups/SNS/cloud-computing/index.html

Common Cloud Characteristics

  • Massive scale
  • Virtualization
  • Free software
  • Autonomic computing
  • Multi-tenancy
  • Geographically distributed systems
  • Advanced security technologies
  • Service oriented software

Pros

  • Lower central processing unit (CPU) density
  • Flexible use of resources
  • Rapid deployment of new servers
  • Simplified recovery
  • Virtual network connections

Cons

  • Complexity
  • Potential impact of a single component failure
  • Hypervisor security issues
  • Keeping virtual machine (VM) images current
  • Virtual network connections

Virtualization Security Concerns

  • Protecting the virtual fabric
  • Patching off-line VM images
  • Configuration Management
  • Firewall configurations
  • Complicating Audit and Forensics

Velocity 2009 – Introduction to Managed Infrastructure with Puppet

Introduction to Managed Infrastructure with Puppet
by Luke Kanies, Reductive Labs

You can get the work files from git://github.com/reductivelabs/velocity_puppet_workshop_2009.git, and the presentation’s available here.

I saw Luke’s Puppet talk last year at Velocity 2008, but am now more ready to start adopting some config management back home.  Our UNIX admins use cfengine, and puppet is supposed to be a better, newer cfengine.  Now there’s also an (allegedly) better, newer one called chef that I read about lately.  So this should be interesting in helping to orient me to the space.  At lunch, we sat with Luke and found out that Reductive just got their second round of funding and were quite happy, though they got nervous and prickly when there was too much discussion of whether they were all buying Teslas now.  Congrats Reductive!

Now, to work along, you git the bundle and use it with puppet.  Luke assumes we all have laptops, all have git installed on our laptops, and know how to sync his bundle of goodness down.  And have puppet or can quickly install it.  Bah.  I reckon I’ll just follow along.

You can get puppet support via IRC, or the puppet-users google group.

First we exercise “ralsh”, the resource abstraction layer shell, which can interact with resources like packages, hosts, and users.  Check em, add em, modify em.

You define package abstractions.  Like “ssh means ssh on Debian, openssh on Solaris…”  It requires less redundancy of config than cfengine.

“puppet”  consists of several executables – puppet, ralsh, puppetd, puppetmasterd, and puppetca.

As an aside, cft is a neat config file snapshot thing in red hat.

Anyway, you should use puppet, not ralsh, directly – but the syntax is similar.  Here’s an example invocation:

puppet -e 'file { "/tmp/eh": ensure => present }'

There’s a file backup, or “bucket”, functionality when you change/delete files.

You make a repository and can either distribute it or run it all from a server.

There is reporting.

There’s a gepetto addon that helps you build a central repo.

A repo has (or should have) modules, which are basically functional groupings.  Modules have “code.”  The code can be a class definition; init.pp is the top/special one.  There’s a modulepath setting for puppet.  Load the file, include the class, and it runs all the stuff in the class.

It has “nodes” but he scoffs at them.  Put them in manifests/site.pp.  Nodes can be default or hostname-specific (hostname-specific nodes can inherit from default).  But you should use a different application, not puppet, to do this.

You have to be able to completely and correctly describe a task for puppet to do it.  This is a feature not a bug.

Puppet uses a client-server pull architecture.  You start a puppetmasterd on a server.  Stick with the SSL defaults, because the certificate setup is complicated and will hose you eventually if you stray from it.  Then start a puppetd on a client and it’ll pull changes from the server.

This is disjointed.  Sorry about that.  The session is really just reading the slide equivalent of man pages while flipping back and forth to a command prompt to run basic examples.  I don’t feel like this session gave enough of an intro to puppet; it was just “launch into the man pages and then run individual commands, many of which he tells you to never do.”  I don’t feel like I’m a lot more informed on puppet than when I started, which makes me sad.  I’m not sure what the target audience for this is.  If it’s people totally new to puppet, like me, it starts in the weeds too much.  If it’s for someone who has used puppet, it didn’t seem to have many pro tips or design considerations; it was basic command execution.  Anyway, he ran out of time and flipped through the last ten slides in as many seconds.  I’m out!

Velocity 2009 – Death of a Web Server

The first workshop on Monday morning was called Death of a Web Server: A Crisis in Caching.  The presentation itself is downloadable from that link, so follow along!  I took a lot of notes though because much of this was coding and testing, not pure presentation.  (As with all these session writeups, the presenter or other attendees are welcome to chime in and correct me!)  I will italicize my thoughts to differentiate them from the presenter’s.

It was given by Richard Campbell from Strangeloop Networks, which makes a hardware device that sits in front of and accelerates .NET sites.

Richard started by outing himself as a Microsoft guy.   He asks, “Who’s developing on the Microsoft stack?”  Only one hand goes up out of the hundreds of people in the room.  “Well, this whole demo is in MS, so strap in.”  Grumbling begins to either side of me.  I think that in the end, the talk has takeaway points useful to anyone, not just .NET folks, but it is a little off-putting to many.

“Scaling is about operations and development working hand in hand.”   We’ll hear this same refrain later from other folks, especially Facebook and Flickr.  If only developers weren’t all dirty hippies… 🙂

He has a hardware setup with a batch of cute lil’ AOpen boxes.  He has a four server farm in a rolly suitcase.  He starts up a load test machine, a web server, and a database; all IIS7, Visual Studio 2008.

We start with a MS reference app, a car classifieds site.  When you jack up the data set to about 10k rows – the developer says “it works fine on my machine.”  However, once you deploy it, not so much.

He makes a load test using MS Visual Studio 2008.  Really?  Yep – you can record and play back.  That’s a nice “for free” feature.  And it’s pretty nice, not super basic; it can simulate browsers and connection speeds.  He likes to run two kinds of load tests, and neither should be short:

  • Step load for 3-4 hrs to test to failure
  • Soak test for 24 hrs to hunt for memory leaks

What does IIS have for built-in instrumentation?  Perfmon.  We also get the full perfmon experience, where every time he restarts the test he has to remove and readd some metrics to get them to collect.  What metrics are the most important?

  • Requests/sec (ASP.NET applications) – your main metric of how much you’re serving
  • Requests queued (ASP.NET)  – goes up when out of threads or garbage collecting
  • %processor time – to keep an eye on
  • #bytes in all heaps (.NET CLR memory) – also to keep an eye on

So we see pages served going down to 12/sec at 200 users in the step load, but the web server’s fine – the bottleneck is the db.  But “fix the db” is often not feasible.  We run ANTS to find the slow queries, and narrow it to one stored proc.  But we assume we can’t do anything about it.  So let’s look at caching.

You can cache in your code – he shows us, using _cachelockObject and HttpContext.Current.Cache.Get, a built-in .NET cache class.

Say you have a 5s initial load, but then caching makes subsequent hits fast.  But multiple first hits contend with each other, so you have to add cache locking.  There are subtle ways to do that right vs. wrong.  A common best-practice pattern he shows is check, lock, check.
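The demo code was the built-in .NET cache, but the check-lock-check idea translates to pretty much any platform.  Here’s a minimal sketch of the same pattern in Java (my own illustration – the class name, key type, and loadFromDatabase() stand-in are invented, not from his demo app):

import java.util.concurrent.ConcurrentHashMap;

// Minimal sketch of the "check, lock, check" pattern from the talk, in Java.
// Names here are illustrative placeholders, not anything from the demo app.
public class SearchResultCache {
    private static final ConcurrentHashMap<String, Object> CACHE = new ConcurrentHashMap<>();
    private static final Object LOCK = new Object();

    public static Object get(String key) {
        Object result = CACHE.get(key);              // check (no lock, cheap)
        if (result == null) {
            synchronized (LOCK) {                    // lock
                result = CACHE.get(key);             // check again inside the lock
                if (result == null) {
                    result = loadFromDatabase(key);  // the expensive call runs only once
                    CACHE.put(key, result);
                }
            }
        }
        return result;
    }

    // Stand-in for the slow stored-proc call the demo app was bottlenecked on.
    private static Object loadFromDatabase(String key) {
        return "results for " + key;
    }
}

The double check matters: without the second get() inside the lock, every thread that piled up waiting on the first slow load would redo the expensive query as soon as it got the lock.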

We run the load test again.  “If you do not see benefit of a change you make, TAKE THE CODE BACK OUT,” he notes.  Also, the harder part is the next steps, deciding how long to cache for, when to clear it.  And that’s hard and error-prone; content change based, time based…

Now we are able to get the app up to 700 users, 300 req/sec, and the web server CPU is almost pegged but not quite (prolly out of load test capacity).  Half second page response time.  Nice!  But it turns out that users don’t use this the way the load test does and they still say it’s slow.  What’s wrong?  We built code to the test.  Users are doing various things, not the one single (and easily cacheable) operation our test does.

You can take logs and run them through webtrace to generate sessions/scenarios.  But there’s not quite enough info in the logs to reproduce the hits.  You have to craft the requests more after that.

Now we make a load test with variety of different data (data driven load test w/parameter variation), running the same kinds of searches customers are.  Whoops, suddenly the web server cpu is low and we see steady queued requests.  200 req/sec.  Give it some time – caches build up for 45 mins, heap memory grows till it gets garbage collected.

As a side note, he says “We love Dell 1950s, and one of those should do 50-100 req per sec.”

How much memory “should” an app server consume for .NET?  Well, out of the gate, 4 GB RAM really = 3.3, then Windows and IIS want some…  In the end you’re left with less than 1 GB of usable heap on a 32-bit box.  Once you get to a certain level (about 800 MB), garbage collection panics.  You can set stuff to disposable in a crisis but that still generates problems when your cache suddenly flushes.

  • 64 bit OS w/4 GB yields 1.3 GB usable heap
  • 64 bit OS w/8 GB, app in 32-bit mode yields 4 GB usable heap (best case)

So now what?  Instrumentation; we need more visibility. He adds a Dictionary object to log how many times a given cache object gets used.  Just increment a counter on the key.  You can then log it, make a Web page to dump the dict on demand, etc.  These all affect performance however.
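In Java terms (again just my sketch – the demo used a .NET Dictionary, and these names are invented), that hit counter is only a few lines:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Minimal sketch of the cache-hit counter idea: bump a counter per cache key
// on every read so you can later dump the map and see which entries actually
// get reused. Names are illustrative only.
public class CacheHitCounter {
    private static final ConcurrentHashMap<String, AtomicLong> HITS = new ConcurrentHashMap<>();

    public static void recordHit(String cacheKey) {
        HITS.computeIfAbsent(cacheKey, k -> new AtomicLong()).incrementAndGet();
    }

    // Dump the counts, e.g. from a debug-only page or a periodic log line.
    public static String dump() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, AtomicLong> e : HITS.entrySet()) {
            sb.append(e.getKey()).append(" -> ").append(e.getValue().get()).append('\n');
        }
        return sb.toString();
    }
}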

They had a problem with an app w/intermittent deadlocks, and turned on profiling – then there were no deadlocks because of observer effect.  “Don’t turn it off!”  They altered the order of some things to change timing.

We run the instrumented version, and check stats to ensure that there’s no major change from the instrumentation itself.  Looking at the cache page – the app is caching a lot of content that’s not getting reused ever.  There are enough unique searches that they’re messing with the cache.  Looking into the logs and content items to determine why this is, there’s an advanced search that sets different price ranges, etc.  You can do logic to try to exclude “uncachable” items from the cache.  This removes memory waste but doesn’t make the app any faster.

We try a new cache approach.  .NET caching has various options – duration and priority.  Short duration caching can be a good approach.  You get the majority of the benefit – even 30s of caching for something getting hit several times a second is nice.  So we switch from 90 minute to 30 second cache expiry to get better (more controlled) memory consumption.  This is with a “flat” time window – now, how about a sliding window that resets each time the content is hit?  Well, you get longer caching but then you get the “content changed” invalidation issue.
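The .NET cache handles the expiry for you via its duration setting; if you were hand-rolling the flat-window version (again a Java sketch of my own, with an invented class name), it’s just a timestamp check on read:

// Sketch of a flat (non-sliding) expiry window: an entry older than TTL_MILLIS
// is treated as a miss and gets reloaded. A sliding window would reset loadedAt
// on every read instead, which keeps hot items longer but brings back the
// "content changed" invalidation problem. Illustrative only.
public class ExpiringEntry<T> {
    private static final long TTL_MILLIS = 30_000;  // the 30-second window from the talk

    private final T value;
    private final long loadedAt = System.currentTimeMillis();

    public ExpiringEntry(T value) {
        this.value = value;
    }

    public boolean isExpired() {
        return System.currentTimeMillis() - loadedAt > TTL_MILLIS;
    }

    public T getValue() {
        return value;
    }
}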

He asks a Microsoft code-stunned room about what stacks they do use instead of .NET, if there’s similar stuff there…  Speaking for ourselves, I know our programmers have custom implemented a cache like this in Java, and we also are looking at “front side” proxy caching.

Anyway, we still have our performance problem in the sample app.  Adding another Web server won’t help, as the bottleneck is still the db.  Often our fixes create new other problems (like caching vs memory).  And here we end – a little anticlimactically.

Class questions/comments:
What about multiserver caching?  So far this is read-only, and not synced across servers.  The default .NET cache is not all that smart.  MS is working on a new library called, ironically, “velocity” that looks a lot like memcached and will do cross-server caching.

What about read/write caching?  You can do asynchronous cache swapping for some things, but it’s memory intensive.  Read-write caches are rarer – Oracle/Tangosol Coherence and Terracotta are the big boys there.

Root speed – at some point you also have to address the core query; if it takes 10 seconds, even caching can’t save you.  Prepopulating the cache can help, but you have to remember invalidations, cache clearing events, etc.

Four step APM process:

  1. Diagnosis is most challenging part of performance optimization
  2. Use facts – instrument your application to know exactly what’s up
  3. Theorize probable cause then prove it
  4. Consider a variety of solutions

Peco has a bigger twelve-step more detailed APM process he should post about here sometime.

Another side note, sticky sessions suck…  Try not to use them ever.

What tools do people use?

  • Hand written log replayers
  • Spirent avalanche
  • wcat (MS tool, free)

I note that we use LoadRunner and a custom log replayer.  Sounds like everyone has to make custom log replayers, which is stupid; we’ve been telling every one of our suppliers in all related fields to build one.  One guy records with a proxy then replays with EC2 instances and a tool called “siege” (by Joe Dog).  There’s more discussion on this point – everyone agrees we need someone to make this damn product.

“What about Ajax?”  Well, MS has a “fake” ajax that really does it all server side.  It makes for horrid performance.  Don’t use that.  Real ajax keeps the user entertained but the server does more work overall.

An ending quip repeating an earlier point – you should not be proud of 5 req/sec – 50-100 should be possible with a dynamic application.

And that’s the workshop.  A little microsofty but had some decent takeaways I thought.

The Velocity 2009 Conference Experience

Velocity 2009 is well underway and going great!  Here’s my blow by blow of how it went down.

Peco, my erstwhile Bulgarian comrade, and I came in to San Jose  from Austin on Sunday.  We got situated at the fairly swank hotel, the Fairmont, and wandered out to find food.  There was some festival going on so the area was really hopping.  After a bit of wandering, we had a reasonably tasty dinner at Original Joe’s.  Then we walked around the cool pedestrian part of downtown San Jose and ended up watching “Terminator:  Salvation” at a neighborhood movie theater.

We went down at 8  AM the next morning for registration.  We saw good ol’ Steve Souders, and hooked up with a big crew from BazaarVoice, a local Austin startup that’s doing well.  (P.S. I don’t know who that hot brunette is in the lead image on their home page, but I can clearly tell that she wants me!)

This first day is an optional “workshop” day with a number of in depth 90 minute sessions.  There were two tracks, operations and performance.   Mostly I covered ops and Peco covered performance.  Next time – the first session!

Velocity 2009 – The Web Performance and Operations Conference

You’re in luck!  Peco and I are attending Velocity 2009 and we’ll be taking notes and blogging about the conference.  You can see what to expect by going back and reading my coverage of Velocity 2008!

As Web Admins, we love Velocity.  Usually, we have to bottom-feed at more generalized conferences looking for good relevant content on systems engineering.  This is the only conference that is targeted right at us, and has a dual focus of performance and operations.  The economy’s hitting us hard this year and we could only do one conference – so this is the one we picked.

Look for full coverage on the sessions to come!

Who Needs VPN When You Have PuTTY?

I was talking with my coworkers this afternoon about Time Warner’s plans to jack up rates for high-bandwidth users and it got me thinking about how much of their precious bandwidth I am actually using.  I know that my router at home has a web browser interface where I can get that information, but I have it intentionally only allowing access from the local area network interfaces.  I needed to find another way to view the site from work while making the router think that I was on the right network.  What I ended up doing was using PuTTY to create an SSH tunnel from my work computer to my Linux box on the home network.  I then just pointed my browser at the forwarded port on my work computer and up comes my router’s web interface.  Who needs VPN when you have PuTTY?  Anyway, here are the exact steps that I took to do this (with an OpenSSH one-liner equivalent at the end):

  1. Start PuTTY
  2. Under Connection->SSH->Tunnels specify a source port (the localhost port you want to connect to) and a destination (IP:port) that you want to connect to on your home network.
    • Source port: 8008
    • Destination: 192.168.0.1:80 (or whatever IP your router is at and its web interface port)
  3. Click “Add”
  4. Under “Session” specify the host name for your SSH server that lives on your internal network, but is exposed via port forwarding on your router with port 22.
  5. Click “Open”
  6. When prompted, enter your username and password for your SSH server.
  7. Now just pull up your favorite web browser and navigate to http://localhost:8008.  You should see the page just like you would if you were sitting at home.
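For what it’s worth, if you have a plain OpenSSH client handy instead of PuTTY, steps 1 through 6 boil down to a single command (substitute your own SSH username, your home connection’s public hostname, and your router’s IP):

ssh -L 8008:192.168.0.1:80 yourusername@your.home.hostname

Then browse to http://localhost:8008 just like in step 7.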