OpenStack is an open source cloud computing project. The OpenStack summit this week drew 250 people from 12 countries and 90 different companies. The most visible participants came from Rackspace, NASA, Citrix, and NTT.
For those of you interested in the internals of cloud technologies, below are some of the notes I took during the summit. I hope you find them useful.
250-300 participants, 90 companies from 12 countries. The project is only a few months old, and participation is above expectations.
Why open source? It reduces the adoption barrier; the core competence is support and service. Rackspace has 118k customers, 80% of whom use some sort of cloud service.
XaaS (everything as a service) == modular building blocks, each of which auto-scales and auto-deploys, with monitoring and dynamic auto-configuration.
2011/2 – Rackspace's timeline for converting internal production to OpenStack.
The government wants the economy (commercial companies) to leverage NASA's know-how. It supports NASA in this open source project (so it's not a one-time contribution). Nebula (NASA's OpenStack contribution) is the first Apache2-licensed government program.
NASA would like to get out of the hosting business, but existing solutions are too slow for them. NASA produces exabytes of data each day (coming from high-resolution sensors).
Elasticity is needed (for reusing resources from projects that get canceled, or that run in space longer than expected).
NASA always upgrades to the latest processors and does processor-specific tuning. Need for speed.
Accenture Technology Labs – survey of open source in the enterprise:
1) Everyone has started looking at open source alternatives. Open source has improved its quality, reliability, and security.
2) Mission-critical projects are not on the cloud. Expect that to happen in 2-5 years (as enterprises start refreshing software, something they have delayed for too long).
1+2) Open source cloud is a viable alternative.
3) CIOs want the same software stack in-house and in the public cloud. They don't want to support two software stacks.
4) Objections: compliance (usually security issues), management issues, vendor lock-in.
5) Private cloud today is not "really" a private cloud. Private cloud means extending across the entire enterprise data center, not just wrapping existing infrastructure.
6) Telcos (as providers) want to do cloud, though it's not clear what or how. OpenStack could be attractive to them.
7) Large enterprises want more control over resources.
8) Government.
Project Governance Process
The developers make the code decisions for compute and storage; these are self-governing bodies.
Project Oversight Board
Looks at what should go into OpenStack and considers what should be added. 9 seats: 5 nominated by Rackspace, 4 elected by the public (currently 6 from Rackspace, 2 from NASA, 1 from Citrix). Those who wrote the most code were elected. The 5 Rackspace slots are not just Rackspace people; we appointed users as well, since those writing the most code may not be the only ones who should be on the board.
Project Advisory Board
Comprised of people with a commercial interest in the project, who want to make sure their requirements are put into the project: technology experts and enterprise users.
Internal admin APIs are available only to cloud hosting providers and require an elevated token. Access control will be fine-grained, based on access per method and per method group. Each operator will be able to create their own groups and, if needed, create a variation of the internal APIs.
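To make the access-control idea concrete, here is a minimal sketch of per-method-group checks; all names (the groups, the token fields) are illustrative assumptions, not actual OpenStack code:

```python
# Hypothetical method groups an operator might define; each maps a
# group name to the set of internal admin methods it grants.
ADMIN_METHOD_GROUPS = {
    "host_ops": {"migrate_instance", "evacuate_host"},
    "billing": {"get_usage", "get_stats"},
}

def is_allowed(token, method):
    """Allow a call only if the token is elevated and one of the
    token's groups contains the requested method."""
    if not token.get("elevated"):
        return False
    return any(method in ADMIN_METHOD_GROUPS.get(g, set())
               for g in token.get("groups", ()))
```

An operator-defined variation of the internal APIs would then amount to editing the group-to-method mapping rather than the code.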
Supporting huge files in Swift
When uploading a large file, bandwidth is limited by the speed of a single hard drive.
Solution: split the file into chunks, upload the chunks in parallel, and save each chunk on a different physical disk.
This introduces new problems: billing based on stats requires filtering, and virtual-file APIs are needed.
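The chunked-upload idea above can be sketched as follows; this is an illustration of the approach, not Swift's actual API, and the chunk size and uploader callback are assumptions:

```python
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 5 * 1024 * 1024  # 5 MB per part (arbitrary for this sketch)

def split_chunks(data, chunk_size=CHUNK_SIZE):
    """Split the file contents into fixed-size parts."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]

def upload_parallel(data, upload_part, chunk_size=CHUNK_SIZE):
    """Upload all parts concurrently; upload_part(index, chunk) is the
    caller-supplied uploader, and each part can land on a different
    physical disk so a single drive's speed is no longer the bottleneck."""
    chunks = split_chunks(data, chunk_size)
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(upload_part, i, c) for i, c in enumerate(chunks)]
        return [f.result() for f in futures]
```

The part index is what a virtual-file API would use to reassemble the object, and it is also why billing stats need filtering: each part looks like a separate object to the store.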
Supporting VMWare and Hyper-V
We want OpenStack to expose features that exist in one hypervisor but are not implemented in the others. We need to provide a way for the user to know whether a feature is implemented (introspection).
Question: Can we have a common administration layer for all hypervisors? What about affinity concepts?
Answer: It relates to yesterday's discussion about zones and sub-zones.
Decision: implement hypervisor-specific features first, and only then try to abstract them where possible. Some implementations will use libvirt.
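One way the introspection idea could look, sketched with hypothetical driver classes and feature names (none of these are actual OpenStack identifiers): each driver advertises the optional features it implements, and the caller checks support before using one.

```python
class HypervisorDriver:
    """Base driver: advertises no optional features by default."""
    capabilities = frozenset()

    def supports(self, feature):
        """Introspection hook: does this driver implement the feature?"""
        return feature in self.capabilities

class XenDriver(HypervisorDriver):
    capabilities = frozenset({"live_migration", "pause"})

class HyperVDriver(HypervisorDriver):
    capabilities = frozenset({"pause"})
```

A user (or the API layer) would call `driver.supports("live_migration")` before attempting the operation, matching the decision to keep features implementation-specific first and abstract them later.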
Proposal: each developer can log into the system and run tests on the code of their branch (even before code review). We want to run tests on each revision; otherwise you can never know which check-in caused a regression.
Commercial companies could sign up to provide a testing cluster (Rackspace is willing to host one). Someone proposed branding it by who donates the machines ("runs on an X cluster").
Each test will collect speed (performance) metrics. It's a brutally transparent process, in which different hardware/hypervisor vendors will get different performance results.
There were jokes about virtual worlds (Matrix-movie style): was that a hypervisor running my tests, or a real machine?
Copy-on-write virtual machine start-up
Problem: we want to start VMs much faster using copy-on-write machine images.
LVM copy-on-write happens at the host level, so it is hypervisor-independent. Hosts can be preinstalled (like cache warming).
There may be performance degradation on the first write.
Hypervisors are optimized for the copy-on-write use case; LVM copy-on-write was not built for this kind of scenario.
Hypervisors can use an NFS share for the copy-on-write scenario, but it's hard to make that work in a distributed system.
The slowest part is the dd (filling the hard drive with zeroes). With LVM you may get the old image's data, so you cannot do a sparse dd; you need to clean the disk, which takes forever.
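For context, the LVM approach boils down to creating a writable snapshot of a pristine base image instead of copying (and zeroing) a full disk. The sketch below only assembles the `lvcreate` command line rather than running it; the volume names and snapshot size are illustrative assumptions.

```python
def lvm_snapshot_cmd(base_lv, snap_name, size="10G"):
    """Build the lvcreate invocation for a copy-on-write snapshot of a
    base image volume: only blocks the VM changes consume new space,
    so no full copy or dd-zeroing of the disk is needed up front."""
    return ["lvcreate", "--snapshot", "--name", snap_name,
            "--size", size, base_lv]
```

In a real deployment this would be run via `subprocess` on the host, and the snapshot handed to the hypervisor as the VM's root disk.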
KVM will have image streaming: you start up with a small image and eventually get the full image by streaming it in the background.
LVM is not going to work on Hyper-V.
Actual performance suffers when a file system sits on top of another file system and the block sizes are not aligned. It's a multi-tenancy problem: multiple VMs over the same underlying file system. Newer file systems such as ZFS don't have the alignment issue at all. Next year 4K-sector drives come out, a change from today's 512-byte sectors, which requires OS tweaking.
OpenStack needs to integrate with the authorization, authentication, and billing ecosystem. This proposal is just about auth.
We need to unify the auth system for compute and storage. A pluggable software module implements the auth protocol.
The OpenStack authentication component would be a reverse HTTP proxy (such as a security appliance); see the repoze.who project.
We don't plan to define the authentication mechanism itself, since it's a controversial subject.
(Optionally) the auth component can be separated from the HTTP reverse proxy, since auth components may have hardware-specific acceleration; it also makes testing the service much easier.
Another reverse proxy (a mapper) checks the URI and forwards the request to the appropriate authentication reverse proxy. This allows different authentication schemes for internal/external users, different admin commands, etc.
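A minimal WSGI middleware sketch of the pluggable-auth idea discussed above: the proxy validates a token and either forwards the request (annotated with the caller's identity) or rejects it. The header name, environ key, and validator callback are assumptions for illustration, not the actual protocol.

```python
def auth_middleware(app, validate_token):
    """Wrap a WSGI app so requests must carry a valid token; the
    token-validation strategy is pluggable via validate_token."""
    def middleware(environ, start_response):
        token = environ.get("HTTP_X_AUTH_TOKEN")
        identity = validate_token(token) if token else None
        if identity is None:
            start_response("401 Unauthorized", [("Content-Type", "text/plain")])
            return [b"Authentication required"]
        environ["openstack.identity"] = identity  # pass identity downstream
        return app(environ, start_response)
    return middleware
```

Keeping authentication as a separate proxy layer like this is what makes the backend swappable and the services easy to test without a real auth system.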
Tenant – operators deal only with tenants (resellers).
Account – OpenStack deals only with accounts (end customers).
AccountID – string concatenation of Tenant and Account.
As a customer you get an endpoint that is tied to an AccountID, so every API call knows which AccountID made it (for example, if you want to implement isolation levels), and the entire OpenStack implementation (behind the endpoint) can be aware of this AccountID.
Question: How do the separation of authentication and the accounts work together?
Answer: Authorization is the real problem; writing auth policies that include this AccountID string is difficult.
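The Tenant/Account composition above can be sketched in a couple of lines; the `:` separator is an assumption for illustration (the notes only say "concatenation"):

```python
def make_account_id(tenant, account, sep=":"):
    """Compose the AccountID the endpoint is tied to from the
    reseller-facing Tenant and the end-customer Account."""
    return f"{tenant}{sep}{account}"

def parse_account_id(account_id, sep=":"):
    """Recover (tenant, account) from an AccountID string."""
    tenant, account = account_id.split(sep, 1)
    return tenant, account
```

The difficulty raised in the answer is that auth policies must match against this composed string, e.g. granting an operator rights over every AccountID sharing one tenant prefix.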
Live migration of virtual machines to other physical machines is required by the cloud hosting provider for maintenance: software updates and patches to the hypervisor and BIOS. Another use case is a power user who wants fine-grained control over which VM runs on which physical machine.
Analysis of this requirement shows that two concurrent migrations may conflict. This could be resolved by a "reserve resource" API, which leads to a broader requirement: power users want to reserve a complete physical machine.
Xen, VMware, and KVM each have different limitations on live migration, which need to be taken into account.
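The "reserve resource" idea can be sketched as an atomic slot reservation on the target host, so two concurrent migrations cannot both claim the last slot. The class name, in-memory store, and slot model are all assumptions for this sketch.

```python
import threading

class HostReservations:
    """Track free VM slots per host; migrations reserve a slot on the
    target host before moving, and release it on failure/rollback."""
    def __init__(self, capacity):
        self._lock = threading.Lock()
        self._free = dict(capacity)  # host -> free VM slots

    def reserve(self, host):
        """Atomically claim one slot on host; False if none are left."""
        with self._lock:
            if self._free.get(host, 0) <= 0:
                return False
            self._free[host] -= 1
            return True

    def release(self, host):
        """Return a previously claimed slot to the pool."""
        with self._lock:
            self._free[host] = self._free.get(host, 0) + 1
```

Reserving a whole physical machine, the broader power-user requirement, would just mean claiming all of a host's slots at once.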