Archive for the ‘dsee’ Category

OpenSolaris 2009.06 is out

Monday, June 1st, 2009

I downloaded the OpenSolaris 2009.06 release and installed it on top of VirtualBox over lunch. The previous release (2008.11) had a lot of good desktop support; this version adds a lot of enterprise-class features like automated installations, UltraSPARC support, a multi-protocol SCSI target (COMSTAR), crazy-cool network virtualization (Crossbow), and much more. You can check out the full set of new features at: http://www.opensolaris.com/learn/features/whats-new/200906/

While there is always room for improvement, I think that given OpenSolaris’ design, feature set, and maturity, it is now at a point where I’d consider it a viable option for production deployments on x64 systems. I’d still hold off for a little while on SPARC, since I think it may take a bit for all the auto-install and boot-related code to mature there.

Sun Directory Server support tool – Dirtracer

Wednesday, April 29th, 2009

I just watched Lee Trujillo give a presentation and demo of Dirtracer, his cool tool for gathering support data on Sun’s DS. The data captured is very helpful for troubleshooting Sun DS problems in a variety of situations, ranging from hangs to replication problems to performance problems. I’ve used it in the past, but the latest version looks even easier to use and captures more data. If you manage Sun’s Directory Server on Solaris, Linux, or HP-UX, pull down a copy and check it out.

Using IBM Quickr with Sun Directory Server

Thursday, April 2nd, 2009

A customer was testing out Lotus/IBM’s Quickr collaboration software and using Sun’s Directory Server as the user store.  One of the system admins mentioned that queries searching for people were glacially slow.  We investigated by checking out the access log to look for slow queries and saw that Quickr was running un-indexed queries that searched against cn,  givenName, and displayName.  These queries were taking about 30 seconds to run since the directory server had to do the DB equivalent of full-table scans.  We checked the indexes and saw that displayName wasn’t indexed.  After adding an index for the displayName attribute the queries were snappy, taking less than a second.
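A quick way to hunt for queries like these is to filter the access log. The sketch below assumes the log format I'm used to on Sun DS, where the RESULT line of an unindexed search carries notes=U and etime= holds the elapsed time in seconds. The function name slow_unindexed is made up for this example.

```shell
# Hypothetical helper: list unindexed search results slower than a threshold.
# Assumes access-log RESULT lines carry "notes=U" for unindexed searches
# and "etime=<seconds>" for the elapsed time.
slow_unindexed() {
  # $1 = minimum etime in seconds (defaults to 1); log lines come from stdin
  awk -v min="${1:-1}" '
    /RESULT/ && /notes=U/ {
      for (i = 1; i <= NF; i++)
        if ($i ~ /^etime=/) {
          split($i, kv, "=")
          if (kv[2] + 0 >= min) print
        }
    }'
}
```

Run it as slow_unindexed 10 < logs/access, then use the conn= and op= values on each matching line to find the corresponding SRCH line and its filter.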

Troubleshooting file descriptor problems in Sun Directory Server

Wednesday, April 1st, 2009

I have a customer whose test directory server (running Sun DS 5.2p4) was constantly running out of file descriptors. They had bumped the allowed number of file descriptors up to 4096, and that slowed the occurrence of the error, but the root cause had not been diagnosed yet. We first took a look using netstat and saw:


netstat -an | grep ^$THEIR_IP.389 | grep -c ESTAB

4012

So we have confirmed the problem is as stated.  Often this problem is caused by applications that don’t use connection pools properly and open way too many connections.

Next we checked under cn=monitor to see which accounts were connected to the directory server:

/bin/ldapsearch -T -D cn=directory\ manager -h ldap -b cn=monitor -s base 'objectclass=*' connection | awk -F: '{ print $7 }' | sort | uniq -c

2500  uid=application_xyz,ou=apps,dc=example,dc=com

1200  uid=application_foo,ou=apps,dc=example,dc=com

220  uid=application_shizzle,ou=apps,dc=example,dc=com

So it looks like applications xyz and foo are the primary culprits.

We’ll also count the established connections by IP address to tell which machines are creating the most connections:

netstat -an | nawk -v addr="$LDAP_IP.389" '$1 == addr && /ESTAB/ { print $2 }' | cut -d. -f1-4 | sort | uniq -c
2700   10.10.1.168
400    10.10.1.169
300    192.168.1.1
...

We now know that 10.10.1.168 is the machine with the most connections coming from it. We then hopped over to 10.10.1.168 (running an application server) and took a look from its point of view:

netstat -an | grep -c $LDAP_IP.389

2

Whoa! Houston, we have a problem. From the LDAP server’s point of view, it has 2700 connections from the app server. From the app server’s point of view, it has 2 connections to the LDAP server. If we had seen symmetry between the app server’s network connections and the directory server’s network connections, it would have been an application-level problem of allocating too many connections. In this case, since the connection counts are extremely asymmetrical, it looks like there is a firewall, load balancer, or other network device in the path between these two machines that is killing connections from the application server without telling the LDAP server that the connection is dead.

We asked the network team to investigate and in the meantime put in a workaround: an idle timeout on the LDAP server. This lets the directory server kill any connection on which it hasn’t received an operation within some time period (we set it to a generous 12 hours). We immediately saw the number of established connections drop down to a few hundred. Problem solved.
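For reference, here is a rough sketch of what such an idle timeout change can look like. It assumes the server-wide setting is the nsslapd-idletimeout attribute on cn=config, with the value given in seconds. The helper name idle_timeout_ldif is made up for this example, and your version may expose the setting differently.

```shell
# Hypothetical helper: print LDIF that sets the server-wide idle timeout.
# Assumes the cn=config attribute nsslapd-idletimeout (value in seconds).
idle_timeout_ldif() {
  hours=$1
  cat <<EOF
dn: cn=config
changetype: modify
replace: nsslapd-idletimeout
nsslapd-idletimeout: $((hours * 3600))
EOF
}
```

Calling idle_timeout_ldif 12 prints the change for a 12 hour timeout; the output can be reviewed and then fed to ldapmodify while bound as the directory manager.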

Viewing the current status of LDAP servers in Directory Proxy Server 6.3

Friday, March 20th, 2009

The dpconf command for managing DSEE Directory Proxy Servers (DPS) shows you a lot of information about the ldap-data-sources (the back-end directory servers), including whether or not they are administratively enabled or disabled. One status that I couldn’t find was whether a given back-end server was actually considered on-line by the DPS itself. It turns out the current status information is available, but only by digging through the cn=monitor entry on the DPS instance. Bear in mind you will need to authenticate as the proxy’s root DN (default is “cn=proxy manager”) to dig it up. Also, it appears that the logic that implements cn=monitor doesn’t handle all search criteria perfectly, so we will use a little bit of grep magic to reduce the result set to what we want. Here is an example ldapsearch to get the current status of servers:

ldapsearch -D "cn=proxy manager" -j ~/.dmpass -b cn=monitor "serveravailable=*" \
| egrep "^backendServer|^serverAvailable"

backendServer: testdscc01:3998/
serverAvailable: true
backendServer: testds05:389/
serverAvailable: true
backendServer: testds06:389/
serverAvailable: false
backendServer: testds07:389/
serverAvailable: true

In this case it would be a good idea to check testds06 and see if the server is down, or whether it is failing a DPS health check for some other reason.
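With more than a handful of back ends, it can help to boil that output down to just the servers the proxy thinks are down. This is a small sketch that assumes the backendServer/serverAvailable lines arrive in pairs, as in the output above. The function name unavailable_backends is made up for this example.

```shell
# Hypothetical helper: print back-end servers the proxy reports as unavailable.
# Expects "backendServer:" / "serverAvailable:" line pairs on stdin.
unavailable_backends() {
  awk '
    /^backendServer:/ { server = $2 }
    /^serverAvailable:/ && $2 == "false" { print server }'
}
```

Pipe the ldapsearch output into unavailable_backends; an empty result means every data source passed its last health check.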

If you want to dig a little deeper into cn=monitor, you can find a lot of detailed information about the thread that is monitoring a particular data source. Here is an example of one pointing to an LDAP server that is unavailable:

dn: cn=Proactive Monitor for testds06:389/,cn=Monitor Thread,cn=Resource,
 cn=testdps01:/opt/dsee/instances/dps,cn=Instance,cn=DPS6.0,cn=Product,cn=monitor
objectClass: top
objectClass: extensibleObject
cn: Proactive Monitor for testds06:389/
started: true
running: true
startTime: [03/19/2009:12:20:36 -0700]
operationalStatus: OK
statusDescription: The monitor thread is fully operational
threadId: 19
threadStack: java.lang.Thread.sleep(Native Method) /  com.sun.directory.proxy.server.ProactiveMonitorThread.runThread(ProactiveMonitorThread.java:122) /  com.sun.directory.proxy.util.DistributionThread.run(DistributionThread.java:225) /
backendServer: testds06:389/
serverAvailable: false
checkInterval: 30000
additionalCheckType: op connection
totalChecks: 594
availabilityChecksFailed: 2
additionalChecksFailed: 0

Command line completion in bash for DSEE and ZFS

Tuesday, March 17th, 2009

I’m working on an environment for a customer where we are using Directory Server Enterprise Edition (DSEE) and ZFS. On the DSEE side, my co-worker Mitch and I were inspired by Ludovic’s post a while back about setting up command line completion for dsconf and dpconf. One small item Mitch noticed was that in the original examples, a command name that didn’t contain a hyphen (like dsconf import) wouldn’t be completed, but commands like dsconf get-server-prop would be.

Here is what Mitch came up with:

for cmd in dsconf dsadm dpconf dpadm; do
  complete -W "`$cmd --help | \
    perl -lane 'print $F[0] if \
      (/^The accepted values for SUBCMD/ .. \
       /^The accepted values for GLOBAL_OPTS/ \
       and not /^The /)'`" $cmd
done

For ZFS, check out this script on BigAdmin by Mark Musante.
Mitch made a small update to the script so that it builds the list of sub-commands on the fly to account for additions. Mitch’s updated version is available here.

Sun Directory Server users – patch 6.3.1 is out

Monday, March 9th, 2009

There are quite a few fixes in the DSEE 6.3.1 patch that was released in the last few weeks. If you use Sun’s Directory Server or Directory Proxy Server, you should definitely check out the release notes.

Creating an LDAP environment to test a tool

Thursday, March 5th, 2009

Yesterday I spent some time helping a developer who is creating a tool for synchronizing accounts between an RDBMS and an LDAP server, and I thought I would document the process. The tool basically makes a request to the RDBMS for all the accounts sorted by a specific attribute, then makes a similar request to the LDAP server. The customer expected the number of records to max out at about 200,000 entries.

The first thing we did was spin up local copies of MySQL and the LDAP server. I’m not going to document the MySQL part since there are a million pages available on that.

Note that the DSEE 6.3 binaries were already installed on my test machine under /opt/dsee63. I personally prefer the zip-based distribution.

Here are the steps for the LDAP server:

Step 1 – create a new instance and add a suffix for the data

# export PATH=$PATH:/opt/dsee63/ds6/bin

# dsadm create -w /tmp/dspassword /data/ds3

# dsadm start /data/ds3

# dsconf create-suffix dc=example,dc=com

Step 2 – create a sample LDIF with 200k entries

# cd /opt/dsee63/dsrk6/bin/example_files

# cp example.template 200k.template

# vi 200k.template (change the numusers value to 200000 and add employeeNumber as a sequentially valued attribute)

# ../makeldif -t 200k.template -o 200k.ldif

Step 3 – import the sample data

# dsadm stop /data/ds3

# dsadm import -i /data/ds3 /opt/dsee63/dsrk6/bin/example_files/200k.ldif dc=example,dc=com

# dsadm start /data/ds3

Step 4 – create an account with proper settings

We created an account uid=dbsync,ou=admins,dc=example,dc=com that will be used by the application to perform the searches and updates.

Note that we had to adjust 2 attributes on the dbsync account. We added the following operational attributes/values:

nsSizeLimit: -1

nsLookThroughLimit: -1

We also added an ACI to the ou=people,dc=example,dc=com branch giving the dbsync user full permissions.

aci: (targetattr != "aci")(version 3.0; acl "db sync - full permissions"; allow (all)(userdn = "ldap:///uid=dbsync,ou=admins,dc=example,dc=com");)

The tool was now able to pull back all 200,000 entries, but was not able to make a server-side sort request.

To enable server-side sorting we had to create a VLV index.

Step 5 – VLV index creation

We used the following LDIF to create a VLV index sorting on employeenumber:

dn: cn=people_browsing_index,cn=example,cn=ldbm database,cn=plugins,cn=config
objectClass: top
objectClass: vlvSearch
cn: Browsing ou=People
vlvBase: ou=People,dc=example,dc=com
vlvScope: 1
vlvFilter: (objectclass=inetOrgPerson)
aci: (targetattr="*")(version 3.0; acl "VLV for Anonymous";
 allow (read,search,compare) userdn="ldap:///all";)

dn: cn=Sort employeenumber,cn=people_browsing_index,
 cn=example,cn=ldbm database,cn=plugins,cn=config
objectClass: top
objectClass: vlvIndex
cn: Sort employeenumber
vlvSort: employeenumber

We then had to use dsadm to build the index:

# dsadm stop /data/ds3

# dsadm reindex -l -t "Sort employeeNumber" /data/ds3 dc=example,dc=com

# dsadm start  /data/ds3

After these changes the tool was able to query all 200,000 entries and have the server return them as a sorted list.

We also ended up doing 2 small performance tweaks to the server, but these weren’t strictly required:

dsconf set-server-prop db-env-path:/tmp/ds_cache

dsconf set-server-prop db-batched-transaction-count:5

dsadm restart /data/ds3

Sun Directory Server – Replication over WAN

Wednesday, November 19th, 2008

Yesterday we had to modify a huge number of entries in our directory server environment. The updates were all done in one data center, and they went extremely fast. When I later went to check on the replication, I noticed the data was being replicated to the remote data center much more slowly than I expected. Given that the other data center is a pretty decent WAN hop away, I decided to try changing some of the replication agreement parameters. To do this you use:

dsconf set-repl-agmt-prop $suffix $consumer_host:$consumer_port $property:$value

You can see more information on the properties and suggested values at the Replication Over a WAN page of the DSEE Admin Guide.

In our case, I did some quick experimenting and found the values suggested for WANs seemed to work pretty well and gave us about a 3x-4x boost in performance versus the defaults. The changes take effect immediately; there was no need to restart the servers or replication agreements.

To measure how fast replication was going I would go to the remote server and run something like

grep 2008:10:23 logs/access | grep -c MOD

where 10:23 was the previous minute, to count how many MOD operations had come through in one minute.
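If you want the rate for every minute rather than one at a time, the same count can be done in a single pass. This is a sketch that assumes each access-log line starts with a timestamp of the form [dd/Mon/yyyy:hh:mm:ss zone]. The function name mods_per_minute is made up for this example.

```shell
# Hypothetical helper: count MOD operations per minute.
# Assumes access-log lines start with "[dd/Mon/yyyy:hh:mm:ss zone]".
mods_per_minute() {
  awk '/ MOD / {
         minute = substr($1, 2, 17)   # dd/Mon/yyyy:hh:mm
         count[minute]++
       }
       END { for (m in count) print m, count[m] }' | sort
}
```

Run it as mods_per_minute < logs/access. The sort is a plain text sort, so the output is in time order only within a single day.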

At least it wasn’t 92

Monday, November 17th, 2008

 

Monday morning through Sunday night I worked 91.5 hours, which was my heaviest work week ever. The team I am on was in a big sprint to get Sun’s DSEE software rolled out on a huge scale across multiple data centers, and it all came together. The software installation and configuration itself was easy to manage across dozens of hosts thanks to the fantastic CLI. The x4600 servers performed very well. Our biggest challenge was coordinating a group of people in multiple locations with differing levels of familiarity with the machines and software stack. There were a few cases where tired fingers made a typo and wiped out some data, but zfs rollback (and smart use of snapshots) kept the recovery time under a minute once the problem was detected.

The funniest moment of the crazy weekend was when my wife saw me working in my office at 8am Sunday morning and asked what time I went to bed. I answered “about 6:30”. The look of horror on her face as she realized that meant less than 1.5 hours of sleep was awesome.

I hope I don’t have craziness like the last week often, but I did feel a great sense of accomplishment when we were done.

Tip of the day:

If you get a coredump on Solaris, run

echo '$C' | mdb $name_of_corefile

to get the stacktrace that actually caused the core.


Copyright © 2010 williamhathaway.com. All Rights Reserved.