|
1
|
- Barton@VelocitySoftware.com
- HTTP://VelocitySoftware.com
- HTTP://LinuxVM.com
|
|
2
|
- Velocity Software
- Performance Management Infrastructure
- Performance Analysis
- Operational Alerts
- Capacity Planning
- Accounting/chargeback
- Importance of technology
- z/VM technology
- Linux (and Sun, NT, AIX, etc.) agent technology
- Showcase demonstration (live)
|
|
3
|
- Founded 1988 to provide VM performance software and services (big party at SHARE, Summer 2008)
- ESAMAP, ESAMON
- ESATCP, ESAWEB
- zTUNE
- Performance workshops, education
- Next performance workshop: June
- Performance seminars scheduled often
- The best marketing is education
|
|
4
|
- IBM Partner in Development since 1989
- Participate in IBM's VM Early Support Programs
- Every VM Early Support Program since 1988 (XA, ESA, z)
- Relationship with IBM’s Linux lab in Boeblingen
- Performance research
- Customer problems
- Redbooks
- Conference participation to present research
- SHARE
- GSE
- CMG
- Local VM/Linux user groups
|
|
6
|
- This is a SHARED resource environment
- z/VM performance is critical
- Any one server can impact all applications
- This is not z/OS
- This is not a mature environment
- Some metrics are not yet available
- This is not a distributed environment
- We do not have cycles to waste
- We DO have capacity planning and chargeback requirements
- Tools are needed that understand the environment
|
|
7
|
- Instrumentation Requirements
- Performance Analysis
- Operational Alerts
- Capacity Planning
- Accounting/chargeback
- Correct data (virtual Linux CPU data is wrong)
- Capture ratios
- Instrumentation is NOT the performance problem
|
|
8
|
- Why performance analysis: service levels
- Diagnose problems in real time
- Manage a shared-resource environment
- Any application may impact other applications
- Infrastructure requirements
- Analyze all z/VM subsystems in detail, in real time
- (DASD, cache, storage, paging, processor, TCPIP)
- Analyze Linux
- (applications, processes, processor, storage, swap)
- A historical view of the same data is important
- Why are things worse today than yesterday?
- Did adding a new workload affect overall throughput?
|
|
9
|
- Why capacity planning: future service levels
- How many more servers can you support with the existing z9?
- What are the capacity requirements for an application?
- Avoid crises in advance
- Consolidation planning: projecting the requirements of the next 1000 servers
- Infrastructure Requirements
- Performance database (long term)
- z/VM AND Linux data
- Resource requirements by Server, Application, User
- z/VM and z/Linux data must be usable by existing planners
- Interface to MICS, MXG, CIMS, TDS
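The “how many more servers” question reduces to headroom divided by expected per-server demand. A minimal sketch, assuming servers are sized by average CPU and the LPAR is planned to a utilization ceiling; the function name and all numbers are hypothetical, and real planning would use peak intervals from the long-term performance database.

```python
import math


def additional_servers(capacity_engines: float,
                       target_utilization: float,
                       current_busy_engines: float,
                       avg_engines_per_new_server: float) -> int:
    """Estimate how many more servers fit under a planning ceiling.

    capacity_engines: processors (e.g. IFLs) available to the z/VM LPAR
    target_utilization: planning ceiling as a fraction, e.g. 0.85
    current_busy_engines: processors' worth of CPU already in use
    avg_engines_per_new_server: projected CPU per new server, in processors
    """
    headroom = capacity_engines * target_utilization - current_busy_engines
    if headroom <= 0:
        return 0
    # Round before flooring to absorb floating-point noise (17.999... -> 18),
    # then floor: a server that only partly fits does not fit.
    return math.floor(round(headroom / avg_engines_per_new_server, 6))


# Hypothetical LPAR: 4 IFLs, planned to 85%, 2.5 IFLs busy today,
# each new server expected to average 5% of one IFL.
print(additional_servers(4, 0.85, 2.5, 0.05))  # prints 18
```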
|
|
10
|
- Why Chargeback?
- Distributed chargeback model is by server
- Shared chargeback model is by resource utilized
- Convincing customers to move applications to “z”
- Encourages efficient/effective resource use
- Infrastructure requirements
- Identify resources by server
- Identify resources by Linux application
- High capture ratio
- Every site does it differently, so flexible data is key
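Charging by resource utilized, rather than by server, means splitting the shared cost in proportion to measured use. A minimal sketch, assuming CPU seconds per server are already collected and the LPAR's cost is a single hypothetical figure; real sites weigh several resources and, as noted above, each does it differently.

```python
def chargeback_by_usage(total_cost: float,
                        cpu_seconds: dict[str, float]) -> dict[str, float]:
    """Split a shared cost across servers in proportion to CPU seconds used."""
    total_cpu = sum(cpu_seconds.values())
    if total_cpu <= 0:
        raise ValueError("no measured usage to charge against")
    return {server: total_cost * secs / total_cpu
            for server, secs in cpu_seconds.items()}


# Hypothetical month: three servers sharing an LPAR that costs 10,000 units.
usage = {"serverA": 1_000.0, "serverB": 3_000.0, "serverC": 6_000.0}
for server, charge in chargeback_by_usage(10_000.0, usage).items():
    print(f"{server}: {charge:,.2f}")
```

Because the whole cost is spread over measured usage, any CPU the instrumentation fails to capture is silently absorbed by whichever servers were measured, which is one reason a high capture ratio matters for chargeback.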
|
|
11
|
- Operational requirements
- Operations will manage hundreds (thousands) of servers
- Requires active performance management
- Alerts for processes in loops, disks 90% full, missing processes
- One test server in a loop impacts all other servers
- Infrastructure requirements
- Fast problem detection
- Interface to SNMP management console (HP, IBM, CA)
- User-tailored alerts
- Web-based alerts
|
|
12
|
- Performance data requirements
- Valid, correct: CPU data is typically wrong, or very wrong
- Linux is getting better with SLES10/RHEL5
- Are z/VM and Linux data integrated?
- Helpful in solving problems?
- Validate benefits of tuning
- Historical data requirements
- Capacity Planning input
- Problem Analysis
- Linux
- z/VM
- Accounting / Charge back
- By server, by application, by process, by Linux userid
- Manage Infrastructure cost
- Does turning off the agent solve the performance problem?
|
|
15
|
- Linux (and networks) add requirements
- Correct data
- Complete data
- Low cost data
- Support requirements:
- z/VM 3.x, 4.x, 5.1, 5.2, 5.3, next…
- SLES 7,8,9,10 (Installations still have 7 and 8)
- RHEL 3,4,5
- Other distributions
- Other platforms
- Must support:
- Performance tuning
- Capacity planning
- Operational alerts
- Chargeback/Accounting
|
|
16
|
- Valid and correct?
- Process data from Linux under z/VM is wrong
- All process accounting is based on timer ticks
- Corrected in SLES 10, RHEL5
- top and ALL other agents “lie” when running under z/VM
- Example: off by a factor of 10
- Well-known issue since 2001
- HTTP://velocitysoftware.com/present/CaseAFS
- Does the data lead to solving performance problems?
- z/VM owns the shared resources
- “Native” tools will not detect many problems
- “Performance was unexplainably bad, so we abandoned the project”
- Skills, experience and education help…
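A toy model of the timer-tick problem, and of prorating as one way to correct it. It assumes, as a simplification, that the guest charges every elapsed tick to the process it thinks is running while z/VM dispatched the virtual CPU for only part of that time; the prorate step scales each process by z/VM's measured CPU divided by the guest's reported total. The function names are hypothetical, and this illustrates the technique rather than any product's algorithm.

```python
def tick_accounting(elapsed_ticks: int,
                    dispatched_fraction: float) -> tuple[int, float]:
    """Toy model: (ticks the guest charges, ticks of real CPU consumed).

    Simplifying assumption: the guest charges every elapsed tick to the
    process it believes is running, but z/VM only dispatched the virtual
    CPU for dispatched_fraction of that time.
    """
    charged = elapsed_ticks
    actual = elapsed_ticks * dispatched_fraction
    return charged, actual


def prorate(process_cpu: dict[str, float],
            hypervisor_total: float) -> dict[str, float]:
    """Scale guest-reported CPU per process so it sums to z/VM's measurement."""
    reported_total = sum(process_cpu.values())
    if reported_total == 0:
        return {name: 0.0 for name in process_cpu}
    factor = hypervisor_total / reported_total
    return {name: cpu * factor for name, cpu in process_cpu.items()}


# Dispatched only 10% of the time: the guest overstates CPU by a factor of 10.
charged, actual = tick_accounting(1000, 0.10)
print(charged / actual)  # prints 10.0

# Hypothetical guest-reported CPU seconds, corrected to z/VM's measured 5.0 s.
print(prorate({"java": 40.0, "httpd": 10.0}, hypervisor_total=5.0))
```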
|
|
17
|
- Operational cost of agents
- Does your agent use 2%? 5%? 95%? of a processor per image?
- Does this matter on distributed servers where agents were created?
- Will local data collection fill up your file system?
- Does turning off performance monitoring solve the performance problem?
- Do you only turn on your agent when you have a problem???
- Customer quote: an agent that costs 1% of a processor will cost me 10 IFLs
- Agents must provide correct data
- Is your data correct? Or wrong by an order of magnitude?
- Prior to SLES10/RHEL5, all “virtual” agents provided wrong data
- Why collect bad data?
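The customer quote is straightforward multiplication of a per-image cost across every image. A sketch, assuming about 1,000 images (the quote does not state the count) and using the per-image snmpd costs cited on the NETSNMP slide; the function name is hypothetical.

```python
def agent_cost_in_processors(cost_per_image: float, images: int) -> float:
    """Processors consumed by a monitoring agent across all images.

    cost_per_image: fraction of one processor the agent uses on each image
    """
    return cost_per_image * images


IMAGES = 1_000  # assumption: fleet size implied by the customer quote
for label, cost in [("1% agent", 0.01),
                    ("0.1% agent", 0.001),
                    ("0.01% agent (idle server)", 0.0001)]:
    print(f"{label}: {agent_cost_in_processors(cost, IMAGES):.1f} processors")
```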
|
|
18
|
- Performance Data infrastructure existed (ESAMON/ESAMAP)
- PDB already existed for performance analysis and Capacity Planning
- Data presentation tools existed
- Data source needed for Linux and network:
- Passive agent (do not measure idle servers)
- Low overhead (want to monitor hundreds or thousands of servers under z/VM)
- Most agents developed for Intel did not care about overhead
- Open source (fast development time)
- Standard interface
- SNMP: standard interface
- TCPIP application provided by the TCPIP vendor
- Used to collect network and host data from NT, SUN, HP
- NETSNMP available for Linux: meets all requirements
- (Distributed with RHEL 3, 4, 5 and SLES 7, 8, 9, 10)
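With SNMP the collector polls and the agent only answers requests, which fits the passive-agent requirement above. NET-SNMP implements the standard HOST-RESOURCES-MIB (RFC 2790), whose hrSWRunPerfCPU column counts each process's CPU in centiseconds. A sketch of turning two polls of that counter into a CPU percentage; the polling itself is omitted, the function name is hypothetical, and the sample values are invented.

```python
def cpu_percent(prev_centisec: int, curr_centisec: int,
                interval_sec: float) -> float:
    """Percent of one processor a process used between two polls.

    Inputs are two samples of HOST-RESOURCES-MIB hrSWRunPerfCPU for the
    same process, which counts CPU consumed in centiseconds (1/100 s).
    """
    if interval_sec <= 0:
        raise ValueError("interval must be positive")
    delta = curr_centisec - prev_centisec
    if delta < 0:
        # Counter went backwards: the row now describes a different process
        # (index reused) or the agent restarted, so skip this interval.
        return 0.0
    return 100.0 * (delta / 100.0) / interval_sec


# Hypothetical: snmpd's counter advanced 6 centiseconds in a 60-second interval.
print(cpu_percent(1200, 1206, 60.0))  # prints 0.1
```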
|
|
19
|
- NETSNMP
- Default from Red Hat or SUSE uses about 1% CPU
- Velocity Software version uses less than 0.1%
- Velocity Software version for an idle server: 0.01%
- Currently installed on more than 10,000 z/Linux servers
- (Actually installed on all of them, but used on more than 10,000)
- RMFPMS (IBM’s direction, 2003)
- Active agent, writes data to a log
- Not recommended because of overhead
- New “Monitor Record” (IBM’s direction, 2005)
- zLinux only, non-standard
- No process data
- CPU data cannot be corrected
- What problem are we trying to solve?
- Proprietary agents
- Written for Intel or other Unix platforms, where CPU cost didn’t matter
- Can be expensive
- Ask for references for “z”
|
|
21
|
- Low-cost agent: cost of snmpd is very low (0.1%-0.4%)
- (Objective: determine which process spikes at 1am Monday morning)
- See “http://velocitysoftware.com/applic.html” for the full listing (24 Linux servers)
- Report: ESALNXA LINUX HOST Application Report
  ----------------------------------------------------
  Node/    Process/       ID <---Processor Percent--->
  Date     Application             <Process><Children>
  Time     name              Total  sys user syst usrt
  -------- ----------- ----- ----- ---- ---- ---- ----
  00:15:57
  LINUX16  *Totals*        0  16.9  2.5 11.6  1.9  1.1
           amqpcsea      674   0.4  0.1  0.3    0    0
           amqzxma0      600   0.8  0.1  0.7  0.0  0.0
           cron          473   2.1  0.2  0.2  1.7  0.0
           dsmc          938   0.1  0.0  0.0  0.0  0.0
           httpd       31993   2.8  0.2  2.5  0.0  0.1
           java        32066   8.0  1.3  6.7    0    0
           kjournal       85   0.1  0.1    0    0    0
           kswapd          6   0.1  0.1    0    0    0
           qpea         4642   0.1  0.0  0.1    0    0
           qpmon        4674   0.8  0.1  0.7  0.0    0
           snmpd         361   0.1  0.1  0.0    0    0  <=====
           sshd          370   1.0  0.0    0  0.1  0.9
  LINUX13  *Totals*        0   2.7  0.8  0.3  0.6  1.0
           cron          421   1.2  0.0  0.0  0.5  0.7
           init            1   0.2  0.0  0.0  0.0  0.1
           master        394   0.3  0.0  0.1  0.0  0.1
           ntpd          453   0.8  0.6  0.2    0    0
  LINUX15  *Totals*        0   1.8  0.3  0.5  1.1  0.0
           amqzxma0      844   0.2  0.0  0.1    0    0
           cron          457   1.1  0.0  0.0  1.1  0.0
           qpmon        4726   0.1  0.0  0.1    0    0
           snmpd         354   0.4  0.2  0.2    0    0  <=====
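In the report above, each Total is, within rounding, the process's own sys and user time plus the syst and usrt time of its children (cron on LINUX16: 0.2 + 0.2 + 1.7 + 0.0 = 2.1), and rows are grouped by process name per node. A sketch of that roll-up; the class and function names are hypothetical, the cron row comes from the report, and the two httpd rows are invented to show several processes sharing one name.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class ProcSample:
    node: str
    name: str
    sys: float   # process's own system CPU percent
    user: float  # process's own user CPU percent
    syst: float  # system CPU percent charged from its children
    usrt: float  # user CPU percent charged from its children


def application_totals(samples: list[ProcSample]) -> dict[tuple[str, str], float]:
    """Total CPU percent per (node, application name), children included."""
    totals: dict[tuple[str, str], float] = defaultdict(float)
    for s in samples:
        totals[(s.node, s.name)] += s.sys + s.user + s.syst + s.usrt
    return dict(totals)


samples = [
    ProcSample("LINUX16", "cron", 0.2, 0.2, 1.7, 0.0),   # row from the report
    ProcSample("LINUX16", "httpd", 0.1, 1.5, 0.0, 0.1),  # hypothetical worker
    ProcSample("LINUX16", "httpd", 0.1, 1.0, 0.0, 0.0),  # hypothetical worker
]
for (node, app), pct in sorted(application_totals(samples).items()):
    print(f"{node} {app}: {pct:.1f}")
```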
|
|
22
|
- High CPU capture ratio
- Report: ESALNXV LINUX Virtual Processor Analysis Report
  -----------------------------------------------------------------
  Node/     VM       <Linux Pct CPU> <-Process Data> Capture Prorate
  Name      ServerID Total Syst User Total Syst User   Ratio  Factor
  --------- -------- ----- ---- ---- ----- ---- ---- ------- -------
  10:03:00
  NEALE1    LNEALE1  100.0 11.4 88.6 100.2 11.5 88.7   1.002   1.000
  -----------------------------------------------------------------
- Report: ESALNXP LINUX HOST Process Statistics Report
  ---------------------------------------------------------
  node/     <-Process Ident-> Nice <------CPU Percents---->
  Name         ID  PPID   GRP Valu  Tot  sys user syst usrt
  --------- ----- ----- ----- ---- ---- ---- ---- ---- ----
  10:03:00
  NEALE1        0     0     0    0  100 0.43 3.35 11.0 85.4
  kswapd0     100     1     1    0 0.12 0.12    0    0    0
  snmpd      1013     1  1012  -10 0.13 0.03 0.10    0    0
  sh         3653  3652 30124    0 52.7    0    0 9.37 43.3
  gmake      9751  9750 30124    0 43.4 0.02 0.02 1.37 42.0
  sh        10129  9751 30124    0 0.02 0.02    0    0    0
  sh        10130 10129 30124    0 0.63 0.03 0.23 0.28 0.08
  cc1       10307 10306 30124    0 3.12 0.18 2.93    0    0
  rpmbuild  30124 16382 30124    0 0.07 0.03 0.03    0    0
  sh        30125 30124 30124    0 0.02    0 0.02    0    0
  gmake     30126 30125 30124    0 0.02    0 0.02    0    0
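The capture ratio in the ESALNXV report is the CPU accounted to processes divided by the CPU Linux reports for the server: for NEALE1, 100.2 / 100.0 = 1.002, so essentially all CPU is attributed to a process. A sketch of that calculation plus a screen for servers whose process data misses CPU; the 0.90 threshold and the function names are hypothetical.

```python
def capture_ratio(process_total_pct: float, server_total_pct: float) -> float:
    """CPU accounted to processes divided by CPU Linux reports for the server."""
    if server_total_pct == 0:
        return 1.0  # idle server: nothing left uncaptured
    return process_total_pct / server_total_pct


LOW_CAPTURE = 0.90  # hypothetical threshold for "process data misses CPU"


def poorly_captured(servers: dict[str, tuple[float, float]]) -> list[str]:
    """Servers whose (process total, server total) gives a low capture ratio."""
    return [name for name, (proc, total) in servers.items()
            if capture_ratio(proc, total) < LOW_CAPTURE]


print(round(capture_ratio(100.2, 100.0), 3))  # NEALE1 values above: prints 1.002
```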
|
|
24
|
- New installations lack z/VM and Linux on z/VM tuning skills
- Velocity Software’s objective is to ensure our customers’ performance problems are resolved quickly.
- zTUNE includes configuration guidance, health checks whenever the installation requests them, and assistance in all areas of Linux on z/VM and z/VM performance
|
|
25
|
- Focus more now on simplifying problem resolution
- Customer reports that application people are complaining about zLinux performance:
- Report: ESATUNE Tuning Recommendation Report
  Monitor initialized: on 2084 serial 9ABED
  ---------------------------------------------------------------
  The following changes are suggestions by Velocity Software
  to enhance performance of this system.
  However, Velocity Software takes no responsibility -
  all tuning is the responsibility of the installations.
  Please call 650-964-8867 if you have any questions about
  these values, or suggestions on report enhancements.

  USR2 User LINUX160 is paging excessively (75.0 per second)
       This user can be protected using SET RESERVED
  SPL5 Spool utilization is 100% full.
       Perform spool file analysis and purge large
       spool files, or force users currently writing
       excessively to spool.
  *****zTUNE Evaluation *************
  XAC1 User total PROCESSOR WAIT excessive at 33 percent.
       Current reporting threshold set to 20.
       This is percent of inqueue time waiting for
       specific (PROCESSOR) resources to become available.
  LPR3 LPAR share is too low, causing USER CPU Wait
       VM LPAR allocated share: 0.94 percent of total
       VM LPAR used 389 percent of allocated share
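The LPR3 finding combines two report values. A sketch, assuming “used 389 percent of allocated share” means consumption divided by entitlement; the XAC1 threshold check is included for comparison, and the helper names are hypothetical.

```python
def lpar_usage_of_total(share_pct_of_total: float,
                        used_pct_of_share: float) -> float:
    """Percent of the whole machine an LPAR consumed, given its share and use."""
    return share_pct_of_total * used_pct_of_share / 100.0


def exceeds(value: float, threshold: float) -> bool:
    """True when a measured value is above its reporting threshold."""
    return value > threshold


# Values from the ESATUNE report above.
used = lpar_usage_of_total(0.94, 389)
print(f"LPAR used {used:.2f}% of the machine against a 0.94% share")  # 3.66%
print("PROCESSOR WAIT over threshold:", exceeds(33, 20))             # True
```

An LPAR running at nearly four times its entitlement can only do so while other LPARs leave capacity idle, which is consistent with the report's USER CPU Wait finding.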
|
|
29
|
- Alerts
- User tailorable
- 3270-based, web-based, and/or SNMP
- Alerts can be set on any variable or calculated variable
- Linux alert examples:
- Disk full
- Missing processes (requires complete data)
- Looping processes (requires correct data)
- z/VM alert examples
- Page/spool space full (avoid abends)
- Looping servers
- DASD service times
- Network alert examples
- Transport errors
- ICMP rates
- Bandwidth thresholds
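User-tailorable alerts amount to rules evaluated against each interval's measured and calculated variables. A minimal sketch; the variable names, thresholds and messages are hypothetical stand-ins for the kinds of alerts listed above, not product definitions.

```python
from dataclasses import dataclass
from typing import Callable

Metrics = dict[str, float]


@dataclass
class AlertRule:
    name: str
    condition: Callable[[Metrics], bool]
    message: str


RULES = [
    # Measured variable against a fixed threshold.
    AlertRule("disk_full", lambda m: m["fs_used_pct"] >= 90.0,
              "file system at least 90% full"),
    # Calculated variable: page space used as a fraction of page space defined.
    AlertRule("page_space", lambda m: m["page_used"] / m["page_total"] >= 0.90,
              "page space nearly full"),
    # Missing process: no sshd process observed this interval.
    AlertRule("missing_sshd", lambda m: m["sshd_count"] < 1,
              "sshd not running"),
]


def evaluate(rules: list[AlertRule], metrics: Metrics) -> list[str]:
    """Messages for every rule whose condition holds for this interval."""
    return [f"{r.name}: {r.message}" for r in rules if r.condition(metrics)]


sample = {"fs_used_pct": 92.0, "page_used": 500.0,
          "page_total": 1000.0, "sshd_count": 1.0}
print(evaluate(RULES, sample))  # ['disk_full: file system at least 90% full']
```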
|
|
30
|
- Linux tries to use all real storage
- Linux minimizes its use of swap
- Swap was historically a slow SCSI device
- In one VDISK experiment, Linux swapped at 40,000 per second
- First case study:
- Process took hours; system paged significantly
- Reduced the size of the Linux virtual machine from 128MB to 24MB
- Defined a 100MB swap disk
- Linux reduces its storage requirement
- Process took minutes
- Virtual disk paged out when not in use
- This works! Paging greatly reduced, Linux performance greatly improved!
- This research is critical to using Collaborative Memory Management (CMM)
|
|
31
|
- Change 128MB Server to 24MB with 100MB Swap
- Reduction of Overall Storage Requirements of 100MB
- Unused VDISK is paged out
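The storage arithmetic behind the case study, as a sketch. It assumes the guest ties up its whole virtual machine plus whatever part of the VDISK swap is resident, and that an unused VDISK is fully paged out; the function name and the 30MB resident-swap case are hypothetical.

```python
def storage_requirement_mb(vm_size_mb: float,
                           vdisk_resident_mb: float = 0.0) -> float:
    """Real storage a Linux guest ties up: its virtual machine plus resident VDISK.

    Simplifying assumption: the whole virtual machine is resident.
    """
    return vm_size_mb + vdisk_resident_mb


before = storage_requirement_mb(128)   # 128MB guest, no VDISK swap
idle = storage_requirement_mb(24)      # 24MB guest, unused VDISK paged out
busy = storage_requirement_mb(24, 30)  # hypothetical: 30MB of swap resident
print(before - idle)   # 104.0, roughly the 100MB reduction on the slide
print(before - busy)   # 74.0
```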
|
|
38
|
- Storage map: capture ratios are always critical for any instrumentation:
- CP fixed storage
- CP non-pageable
- Free storage (only VMDBLKs)
- Frame tables
- Dynamic Paging Area (DPA)
- System Execution Space
- User storage, MDC, address spaces, VDISK
- Available list (above/below 2GB)
- Report: ESASTR1 Main Storage Analysis     Velocity Software, Inc.     ESAMAP 3.6.0 05/15/06     Page 57
  Monitor initialized: 06/06/05 at 08:42:16 on 2064 serial 11542     First record analyzed: 06/06/05 08:42:42
  ------------------------------------------------------------------------------------------------------------------
           Users <-------------------------------------------Pages-------------------------------------------> Capt-
           Loggd  System Fixed  Non- Free Frame <Available> Systm   User NSS/DCSS <-AddSpace> VDISK <MDC> Diag   ure
  Time        On Storage Store Pgble Stor Table  <2gb  >2gb ExSpc Resdnt Resident Systm  User Rsdnt Rsdnt   98 Ratio
  -------- ----- ------- ----- ----- ---- ----- ----- ----- ----- ------ -------- ----- ----- ----- ----- ---- -----
  08:45:42    22 7864304  2907  3816    5 61440  513K 6292K 33150 778370     8408  1090  6235     0  133K  333 0.995
  **************************************************Summary**********************************************************
  Average:    22 7864304  2907  3816    5 61440  513K 6292K 33150 778370     8408  1090  6235     0  133K  333 0.995
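A storage capture ratio is the fraction of real storage the instrumentation can attribute to the categories listed above. A sketch, assuming it is simply attributed pages divided by total system storage pages (ESAMAP's exact formula is not given here, so the ratio example uses hypothetical round numbers rather than the report row), with a page-to-GiB conversion checked against the report's System Storage column. Function and category names are hypothetical.

```python
PAGE_BYTES = 4096  # z/VM real storage is managed in 4KB frames


def pages_to_gib(pages: int) -> float:
    """Convert a page count from the report into GiB."""
    return pages * PAGE_BYTES / 2**30


def storage_capture_ratio(total_pages: int, attributed: dict[str, int]) -> float:
    """Fraction of real storage pages attributed to a named use."""
    return sum(attributed.values()) / total_pages


print(f"{pages_to_gib(7_864_304):.2f} GiB")  # System Storage above: 30.00 GiB

# Hypothetical system with 1,000,000 pages, categories from the list above.
attributed = {
    "cp_fixed": 3_000,
    "cp_nonpageable": 4_000,
    "frame_tables": 8_000,
    "available": 200_000,
    "user_resident": 700_000,
    "mdc": 80_000,
}
print(storage_capture_ratio(1_000_000, attributed))  # 0.995
```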
|
|
39
|
- ESALPS Meets Data Requirements:
- Sufficient for performance, capacity planning, accounting, operations
- Linux and z/VM data, integrated
- Complete and correct data
- ESALPS meets infrastructure requirements
- Supports all releases (SLES 7, 8, 9, 10; RHEL 3, 4, 5; z/VM V3, 4, 5…)
- Standard interfaces
- Low resource requirements
- ESALPS References (many):
- Many installations instrument hundreds of servers today on single LPARs
- zTUNE (Health Check for z/VM, Linux)
- zTUNE “http://velocitysoftware.com/products.html”
- Performance Education:
- Performance education, see: “http://velocitysoftware.com/workshop.html”
|