Archive for the ‘technology’ Category

Creating a slow operation log for OpenDS

Monday, November 2nd, 2009

Anyone who has spent much time looking at MySQL performance will be familiar with the ‘slow query log’, a log that records any query that took longer than a configured amount of time. For kicks, I tried implementing a similar hook for OpenDS. My current version is in pretty rough shape (not very efficient or configurable), but it seems to work. I started from a copy of the TextAccessLogPublisher.java file and created a new one called TextSlowAccessLogPublisher.java. My logic is basically:

  • create a hash table
  • empty out all the log XYZIntermediateMessage and connect/disconnect methods
  • when a request comes in, store the text to log in the hash table (keyed off connectionID and opNumber) instead of outputting it (changed the logSearchRequest, logModifyRequest, … methods)
  • when a request is finished processing, we check the elapsed time (etime)
    • if the elapsed time is greater than or equal to our threshold
      • print the request info we stashed in the hash table and delete it
      • print the response info
    • if the elapsed time is less than our threshold
      • delete the request info from the hash table, don’t print anything
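The logic above can be sketched in a few lines. This is a language-neutral illustration in Python, not the actual OpenDS Java code: the class name, method names, and the millisecond threshold are all assumptions made for the example.

```python
class SlowOperationLog:
    """Buffer request text per operation and emit it only if the operation was slow.

    Hypothetical sketch: none of these names come from the OpenDS API.
    """

    def __init__(self, threshold_ms, sink=print):
        self.threshold_ms = threshold_ms
        self.sink = sink
        # (connection_id, op_number) -> request text waiting for its response
        self.pending = {}

    def log_request(self, connection_id, op_number, request_text):
        # Stash the text instead of writing it out immediately.
        self.pending[(connection_id, op_number)] = request_text

    def log_response(self, connection_id, op_number, response_text, etime_ms):
        # Always remove the stashed request so the table does not grow forever.
        request_text = self.pending.pop((connection_id, op_number), None)
        if etime_ms >= self.threshold_ms:
            if request_text is not None:
                self.sink(request_text)
            self.sink(response_text)


lines = []
log = SlowOperationLog(threshold_ms=100, sink=lines.append)

# A slow search: both the request and the response are written.
log.log_request(7, 1, "SEARCH REQ conn=7 op=1 base=dc=example,dc=com")
log.log_response(7, 1, "SEARCH RES conn=7 op=1 etime=250", etime_ms=250)

# A fast search: nothing is written, and the stashed request is discarded.
log.log_request(7, 2, "SEARCH REQ conn=7 op=2 base=dc=example,dc=com")
log.log_response(7, 2, "SEARCH RES conn=7 op=2 etime=3", etime_ms=3)
```

Removing the entry on every response, slow or fast, is what keeps the hash table bounded by the number of in-flight operations.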

There are a few more things I want to do:

  • Make the ‘slow operation threshold time’ dynamically changeable (looks like I will need to mess with configuration objects since I want to add an additional parameter not in the standard access log type)
  • Add extra information to the output format such as authorization DN (and potentially client connection info if not too hard to retrieve)
  • Instead of doing all the text formatting for every request, just put the Operation object into the hash table. Since the majority of operations won’t ever get printed, we shouldn’t burn CPU formatting them; an operation would only be formatted to text if it ends up being slow and printed.
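The last idea, deferring the formatting, might look roughly like the following. Again this is a Python sketch rather than OpenDS code: the Operation dataclass is a hypothetical stand-in for the server's operation object, and the formatter is an invented example.

```python
from dataclasses import dataclass


@dataclass
class Operation:
    """Hypothetical stand-in for the server's operation object."""
    connection_id: int
    op_number: int
    kind: str
    target: str


class LazySlowOperationLog:
    """Store the operation itself and format it only when it turns out to be slow."""

    def __init__(self, threshold_ms, formatter, sink=print):
        self.threshold_ms = threshold_ms
        self.formatter = formatter
        self.sink = sink
        self.pending = {}  # (connection_id, op_number) -> Operation

    def log_request(self, op):
        # No string building here, just a dictionary insert.
        self.pending[(op.connection_id, op.op_number)] = op

    def log_response(self, op, etime_ms):
        stored = self.pending.pop((op.connection_id, op.op_number), None)
        if stored is None or etime_ms < self.threshold_ms:
            return
        # Only slow operations pay the formatting cost.
        self.sink(f"{self.formatter(stored)} etime={etime_ms}")


format_calls = 0


def format_operation(op):
    """Example formatter that counts how often it is invoked."""
    global format_calls
    format_calls += 1
    return f"{op.kind} conn={op.connection_id} op={op.op_number} target={op.target}"


output = []
lazy_log = LazySlowOperationLog(100, format_operation, sink=output.append)

fast = Operation(1, 1, "SEARCH", "dc=example,dc=com")
slow = Operation(1, 2, "MODIFY", "uid=jdoe,dc=example,dc=com")

for op, etime in ((fast, 2), (slow, 480)):
    lazy_log.log_request(op)
    lazy_log.log_response(op, etime)
```

In this run the formatter is called once, for the slow MODIFY only, which is the CPU saving the bullet describes.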

Files

Central PA Open Source Conference

Saturday, October 17th, 2009

I had a good time attending the CPOSC event  today, which was held at Harrisburg University.  Got to see lots of old friends and acquaintances and enjoyed the speakers.  At only $35 (which included a t-shirt and food), it was an awesome bargain.  Thanks to John and Eric for doing such a great job organizing the event and all the presenters for sharing their knowledge.

Testing – please ignore

Sunday, September 6th, 2009

Just wanting google to start indexing my wiki.

Sun Messaging Server login hang

Sunday, June 28th, 2009

30 second summary for those that don’t want to read the troubleshooting details

When using replicated LDAP servers in a Sun Communications Suite deployment, it is important that every connection from a given Convergence (webmail component) instance goes to the same LDAP server; otherwise address book creation can partially fail, causing some user logins to webmail to hang. To fix this, use one of the following techniques:

1) Configure Convergence’s application level failover to point to individual LDAP servers (be sure to switch the host order on alternating Convergence instances to spread the load)

/opt/sun/comms/iwc/sbin/iwcadmin -u admin -W pwdfile -o ugldap.host -v ldap1:$port,ldap2:$port

(you will also need to restart the web container for this to take effect)

2) Use Directory Proxy Server to route writes to a preferred master

3) If pointing at a HW load balancer virtual IP, use a distribution algorithm that has backend server persistence based on originating IP.  Note that with a few machines this might not actually balance out well, so verify you aren’t overloading one LDAP instance.
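For technique 1, the alternating host order can be worked out mechanically. The sketch below is a small Python helper (the function name is made up, and port 389 is an assumption standing in for $port) that rotates the server list per Convergence instance and produces the value to pass to -v for ugldap.host.

```python
def ugldap_host_value(ldap_hosts, instance_index, port):
    """Build the failover host list for one Convergence instance.

    The list is rotated by the instance index so that each instance
    prefers a different LDAP server and the load spreads out.
    """
    shift = instance_index % len(ldap_hosts)
    rotated = ldap_hosts[shift:] + ldap_hosts[:shift]
    return ",".join(f"{host}:{port}" for host in rotated)


ldap_hosts = ["ldap1", "ldap2"]
# 389 is an assumed port; substitute whatever $port is at your site.
values = [ugldap_host_value(ldap_hosts, i, 389) for i in range(2)]
for value in values:
    print(value)
```

With two instances and two servers, the first instance prefers ldap1 and the second prefers ldap2, and each falls back to the other server.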

Background

A customer of mine is deploying Sun’s Communications Suite (aka Messaging, Calendar, and IM servers) and was testing their custom provisioning tool. A few accounts had been created that worked fine, but one of the accounts would just hang when trying to log in to webmail. The screen would show the application initialization progress bar stuck at 84% and indicate it was dealing with the address book.

I verified that the account would hang and then took a look at the account’s main LDAP entry, which looked fine.  I then checked the account’s LDAP address book data.  The bad account had 3 LDAP entries in the address book branch, but a good account should have 4. Taking a look at the iwc.log from Convergence, I could see an error:

ADDRESS_BOOK: ERROR from com.sun.comms.client.ab.wabp.WABPEngineServlet  Thread httpSSLWorkerThread-80-4 at 2009-06-27 14:33:05,586 – pstore object couldn’t be created for user :baduser

At this site we have a pair of replicated LDAP servers behind a pair of load balancers that are used by the messaging components. Sun’s Directory Server has a loose replication model that usually works fine, but you can run into a rare race condition when applications add inter-related entries to different masters in rapid-fire succession. Convergence was initially pointing at the load balancer virtual IP to reach the LDAP servers.

When I checked the logs on the LDAP servers, I could see that Convergence had tried to create address book entries when the user first logged in, but had done so over several different LDAP connections, which the load balancer sent to different LDAP servers. It created a parent entry on ldap1, then in a connection to ldap2 tried to create a dependent entry, which failed. Convergence then created another version of the parent entry on ldap2 (which worked, but caused a replication conflict). Later attempts to log in ended up adding some dependent entries, but the address book was still in an unusable state.

When things are working correctly, you will see an LDAP operation pattern that looks like:

[27/Jun/2009:16:21:39 -0400] conn=566625 op=5 msgId=948 - ADD dn="piPStoreOwner=$user,o=$domain,o=PiServerDb"
[27/Jun/2009:16:21:39 -0400] conn=566636 op=1 msgId=950 - ADD dn="piEntryID=random1,piPStoreOwner=$user,o=$domain,o=PiServerDb"
[27/Jun/2009:16:21:39 -0400] conn=566636 op=2 msgId=951 - ADD dn="piEntryID=random2,piPStoreOwner=$user,o=$domain,o=PiServerDb"
[27/Jun/2009:16:21:39 -0400] conn=566636 op=3 msgId=952 - ADD dn="piEntryID=random3,piPStoreOwner=$user,o=$domain,o=PiServerDb"
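To spot accounts whose address book creation was split across servers, one option is to scan each server's access log for these ADD operations and group them by owner. The Python sketch below assumes log lines in the format shown above (with plain ASCII quotes), that only client-originated operations are fed in, and uses made-up sample data; the server names and user names are placeholders.

```python
import re
from collections import defaultdict

# Matches ADDs of the address book parent entry or of entries beneath it.
ADD_RE = re.compile(r'ADD dn="(?:[^"]*,)?piPStoreOwner=([^,"]+),')


def servers_per_owner(logs):
    """Map each address book owner to the set of servers that saw its ADDs.

    logs: dict of server name -> iterable of access log lines.
    """
    seen = defaultdict(set)
    for server, lines in logs.items():
        for line in lines:
            match = ADD_RE.search(line)
            if match:
                seen[match.group(1)].add(server)
    return seen


def split_owners(logs):
    """Return owners whose address book ADDs landed on more than one server."""
    return sorted(owner for owner, servers in servers_per_owner(logs).items()
                  if len(servers) > 1)


# Made-up sample data in the access log format shown above.
sample_logs = {
    "ldap1": [
        '[27/Jun/2009:16:21:39 -0400] conn=566625 op=5 msgId=948 - ADD dn="piPStoreOwner=gooduser,o=example.com,o=PiServerDb"',
        '[27/Jun/2009:16:21:39 -0400] conn=566636 op=1 msgId=950 - ADD dn="piEntryID=random1,piPStoreOwner=gooduser,o=example.com,o=PiServerDb"',
        '[27/Jun/2009:14:33:05 -0400] conn=566001 op=2 msgId=12 - ADD dn="piPStoreOwner=baduser,o=example.com,o=PiServerDb"',
    ],
    "ldap2": [
        '[27/Jun/2009:14:33:05 -0400] conn=771200 op=1 msgId=13 - ADD dn="piEntryID=random1,piPStoreOwner=baduser,o=example.com,o=PiServerDb"',
    ],
}

suspects = split_owners(sample_logs)
print(suspects)
```

Any owner this reports is a candidate for the partially created address book described above and is worth checking by hand.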

The fix

In order to fix the account and the problem in general, we ended up deleting the skeleton address book entries for the user in question and used iwcadmin to change Convergence to point to individual LDAP servers in a failover mode. Since we had two Convergence instances and two LDAP instances, it was easy to flip the preferred order so that the LDAP load would be well balanced.

Things that could be improved

1) Convergence could give a better error experience to the user instead of just hanging. Perhaps timing out after 30 seconds with a message “There is a problem with initializing your address book, please ask your administrator to investigate”.

2) Convergence could use a single LDAP connection when performing address book creation for any given user

3) Sun’s Directory Server could have an assured replication model (this is available in the OpenDS 2.0 release candidates)

OpenSolaris automated installs

Sunday, June 21st, 2009

I took a test drive of the OpenSolaris automated installer (AI) utility today. This is the replacement for the venerable Solaris jumpstart technology and is the only way to install OpenSolaris in a hands-off approach. Based on my 2 hours or so of perusing the documentation and working with it, I think it is still a work in progress (e.g. I didn’t see any way of having jumpstart-like custom finish scripts).

The first thing I did was read through the automated installer docs.  There isn’t a lot there yet, so it is a quick read, but it will help you get the basics.  Another good place to look for information is the OpenSolaris forum for installer technology (aka project Caiman).

There appear to be several components involved, at least for x86 based clients.  I haven’t yet tried SPARC so am not sure how it differs.

1) DHCP server – to hand out an address and the PXE boot parameters to a client

2) TFTP server – to serve the PXE boot image

3) Install server – an Apache instance that hands back the XML configuration files and the mini root.  In my case it was running on port 5555.

4) Package repository – for fetching the actual packages to install. By default it is pkg.opensolaris.org/release, but you could change it to a different repository (including a mirror hosted locally if you had one).

Note that no NFS service is needed, which should make firewall admins very happy.

The lab

I built a lab environment consisting of two virtual machines inside VMWare on my desktop.

To keep things simple, the first VM was called “server”, and the second “client”.  The purpose of my lab environment was to configure the AI environment on the server machine and complete a hands-off install on the client machine.

The VMs were configured as follows:

Server

  • RAM – 800M
  • Disk – 16GB
  • NIC1 (e1000g0) – bridged to public network
  • NIC2 (e1000g1) – host-internal network

I also went into the VMWare networking tool and disabled VMWare’s built-in DHCP server on the host-internal network to ensure that only my server would be handing out DHCP responses.

Client

  • RAM 800M
  • NIC1 (e1000g0) – host-internal network
  • Disk – 8GB. Note: when I first tried an 8GB disk, AI complained that it couldn’t find any suitable disks because it wanted at least a 12.5GB disk. You can work around this by explicitly specifying which disk you want to install on, in which case the default minimum size limit won’t be triggered.

OpenSolaris Auto Installer Lab

Setting up the server

1) Installed OpenSolaris 2009.06

2) Installed the automated install software

I saw from the docs that I needed the installadm utility, which wasn’t on my system.  I wasn’t sure which package this was from, so I ran:

pkg search installadm

This told me I wanted the SUNWinstalladm-tools package, which I installed using:

pfexec pkg install SUNWinstalladm-tools

3) Download the automated install ISO image

The installer needs an architecture (x86 or SPARC) specific ISO image for each type of client that will be supported.  Since I was going to install on x86, I downloaded the appropriate image from genunix: http://genunix.org/distributions/indiana/osol-0906-ai-x86.iso

4) Create an install environment under /auto-install named ai-x64 on the 192.168.72.0 network starting at .10 and using 5 addresses

pfexec installadm create-service -n ai-x64 -i 192.168.72.10 -c 5 -s /export/home/wdh/Downloads/osol-0906-ai-x86.iso /auto-install
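As a quick sanity check on the -i and -c values, the addresses that would be handed out can be listed with Python's ipaddress module. This sketch assumes the range is simply the given number of consecutive addresses beginning at the starting address; the helper name is made up.

```python
import ipaddress


def dhcp_range(start, count):
    """List `count` consecutive IPv4 addresses beginning at `start`."""
    first = ipaddress.ip_address(start)
    return [str(first + offset) for offset in range(count)]


addresses = dhcp_range("192.168.72.10", 5)
print(addresses)
```

Under that assumption the service covers 192.168.72.10 through 192.168.72.14, so make sure nothing else on the host-internal network uses those addresses.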

5) Configure dhcpd to run on the appropriate interface

The installadm command configured dhcpd, but it was running on the e1000g0 interface by default.  For my environment, I needed to switch that to e1000g1 so it would see the requests from the client VM.

pfexec dhcpconfig -P INTERFACES=e1000g1
svcadm disable dhcp-server
svcadm enable dhcp-server

6) Install squid

We will need squid (or some other proxy) since we aren’t running a local repository server and the client machine will need to be able to fetch packages from pkg.opensolaris.org.  We will tell the client machine to use the proxy on the server.

pkg search squid

This told me the package I was looking for is SUNWsquid.

pfexec pkg install SUNWsquid

svcadm enable http:squid

I was pleasantly surprised how easy that was.  If you are on a non-NATed network, you will likely need to edit the squid configuration file to allow access to your clients.

7) Customize the default AI manifest (I’ll call mine ai_proxy.xml)

cd /auto-install/auto_install

make a copy of the default manifest and name it something more specific

pfexec cp default.xml ai_proxy.xml

added <ai_target_device><target_device_name>c8t0d0</target_device_name> </ai_target_device>

so I could use a disk that was smaller than the auto-installer default

added <ai_http_proxy url="http://192.168.72.2:3128"/> so it would use the proxy and be able to reach the internet

changed the ai_auto_reboot setting to true, and changed the default user and password from jack to my normal values.

ran installadm to let the AI service know it should use the custom version of the file

pfexec /usr/sbin/installadm add -m ai_proxy.xml -n ai-x64

8) Register the target system as a client

Started the client virtual machine and retrieved the MAC address (00:0c:29:b6:43:bf)

On the server use installadm to register the client

pfexec installadm create-client -e 00:0c:29:b6:43:bf -t /auto-install -n ai-x64

9) Started the client system in network boot mode

The install succeeded, but it took about 1.5 hours.  I suspect if I had a local repository and was installing on a non-emulated hard disk it would have gone substantially faster.

Overall thoughts

I was happy that it was relatively straightforward to get working, but I think it will be a while before the system has as much flexibility for customizing installs as Jumpstart. Based on all the traffic I see on the forum, it seems like the AI project has a lot of momentum behind it, so I am looking forward to giving it another spin in a few months. I’d also like to try this with a local mirror of the pkg repository and see how quickly the installer will run.

Update on June 24th

I saw this morning that a functional spec for the AI client has been submitted and the project team is asking for comments.  Please read the thread/document and give any feedback you might have.

What is your real support level?

Sunday, June 21st, 2009

A lot of times I hear “don’t worry, we have support for product X” when talking to customers, and I find a lot of people consider the ability to get support for a product to be a yes/no checkbox rather than what I feel is more of a continuum.

To start off with, I’d like to lay out my primary definition of product support as “the ability to resolve problems encountered with the product”. I know some people may add “the ability to get new versions”, but I’m putting that to the side for now, since that isn’t a very nuanced problem (essentially solved with $ alone for a support agreement if commercial-only software). I also know some people just use support as “someone to blame so I can slip out of all responsibility despite my job”, but I’ll be ignoring those folks.

For hardware products where you might need to get a replacement for failing disks, motherboards, etc., having support can certainly be a yes/no type answer (although another option might be just buying some extra components that you know are likely to need replacing).

For software, besides the standard vendor support contracts available, I think of the following aspects of support:

  • internal employees/contractors – their knowledge of the overall problem space as well as the specific products used.
  • software maturity – the important thing here isn’t just how long it has been around, but how widely used is the functionality you want to implement.
  • platform choice – this doesn’t matter for some generic Java services, but if the product is a compiled program that interacts with the operating system a lot, make sure there are enough other customers using the same operating system as you. I’ve seen cases where a product was extremely widely used, but if you were one of the handful running the product on an unpopular platform (e.g. AIX), your chance of hitting an undiscovered problem and your time to problem resolution were on average much higher than if you were using one of the more popular platforms. If you can’t find much evidence of people running a product on your platform of choice, consider choosing a different product or, if the specific product is key to your organization, consider running it on the vendor’s platform of choice.
  • community – are there active mailing lists/forums/blogs/irc channels for people using this product?
  • documentation (both official and unofficial) – how comprehensive in breadth and depth?
  • boutique consultants – smaller organizations like Percona (MySQL) or people like Richard Elling (ZFS/Storage) may be able to provide expert assistance in a targeted and quick manner, often going much deeper than typical vendor support and much faster to engage than  larger professional service organizations

I think your real support level is made up of all the factors above, and they should be considered when evaluating a product’s overall suitability for your organization; don’t think of support as just a checkbox.

Garmin Connect finally coming along

Wednesday, June 10th, 2009

It has been nice watching connect.garmin.com finally start taking shape. Their team has been struggling for the last year and a half with missed deadline after missed deadline, lame functionality and weak excuses on their blog but it looks like they have recently been getting their act together. I noticed you can now upload from all the Garmin fitness products and they have working RSS feeds. There are still a few rough edges, but I am glad to see them making significant progress.

My Garmin RSS feed

Getting the memcached service to work in OpenSolaris 2009.06

Monday, June 1st, 2009

On my default OpenSolaris 2009.06 image I installed memcached with:

pkg install SUNWmemcached

and then tried to get it running with:

svcadm enable memcached

It kept on dying and respawning, and I saw the log file (/var/svc/log/application-database-memcached\:default.log) growing with lines like the following:

[ Jun  1 14:42:34 Enabled. ]
[ Jun  1 14:42:34 Executing start method ("/lib/svc/method/memcached start"). ]
can't run as root without the -u switch
[ Jun  1 14:42:35 Method "start" exited with status 0. ]

So basically memcached was complaining that it was starting as root and not being told to switch to another user.  To fix this I had to tell it to switch to the ‘nobody’ user when starting.  Here are the steps I used:

# Tell memcached to run as the user 'nobody' and set the max memory to 1024M.
# At a minimum you need -u nobody (or some other account that exists on the system).

svccfg -s memcached setprop memcached/options= '("-u" "nobody" "-m" "1024")'

svcadm refresh memcached

svcadm disable memcached

svcadm enable memcached

The memcached man page in OpenSolaris sort of mentions towards the end that you need to do this, but I think it is poor form for the server to be unable to run with the default SMF configuration. I will try to get an RFE filed to at least have the ‘-u nobody’ option set by default.

OpenSolaris 2009.06 is out

Monday, June 1st, 2009

I downloaded the OpenSolaris 2009.06 release and installed it on top of VirtualBox over lunch. The previous release (2008.11) had a lot of good desktop support; this version adds a lot of enterprise-class features like automated installations, UltraSPARC support, multi-protocol SCSI target (COMSTAR), crazy-cool network virtualization (Crossbow) and much more. You can check out the full set of new features at: http://www.opensolaris.com/learn/features/whats-new/200906/

While there is always room for improvement, I think given OpenSolaris’ design, feature set,  and maturity it is now in a place where I’d consider it a viable option for production deployments on x64 systems.  I’d still hold off for a little while on SPARC since I think it may take a bit for all the auto-install and boot-related code to gain maturity there.

Central PA New Technology Meetup

Tuesday, May 5th, 2009
Zerion Software demoing their iPhone framework

I attended a session of the  Central PA New Technology meetup group last night.  They hold monthly meetings in a location that shifts among Hershey, Lancaster, and Harrisburg depending on what facilities are available.   About 40 people were  registered for the event although I think closer to 20 actually showed up.  We got to see a demo of Zerion Software’s iPhone form builder framework and Jake Ray from Penn State’s Smeal School of Business gave a talk about the current venture capital environment.    After Jake’s talk there was some discussion about the Startup Weekend idea.  The official meeting then closed around 8:15 and about a dozen of us moved to a nearby bar and continued talking.  There was a nice mix of people who attended:  IT workers, technical hobbyists,  technical marketing, an intellectual property lawyer and some small business owners.  I enjoyed the experience and am definitely planning on attending additional sessions.


Copyright © 2012 williamhathaway.com. All Rights Reserved.