2009 in review
So much for updating this blog more regularly. I guess this blog simply doesn't rank high on my list of priorities. Work has been, for lack of a better word, intense! Fortunately it has been so in a good way. There has been no shortage of challenges, most of them technical and generally interesting. And the really good news is the work continues.
From a personal projects perspective, I've had to pick and choose which to run with. Failure to do so would have left me with a hollow list of ideas that never amounted to anything beyond entry level academic exercises. I played around with virtualization early in the year using OpenVZ. I like the results, but have found it was not a good bet for my future use, as my work has started to turn to VMware and Solaris Zones in a big way. Shortly after my post in February, I was able to complete an upgrade of the Django based websites. I am still quite pleased with Django, though I've not been keeping up to date on that scene very well.
I've spent most of my free time focused on Scala Bazaars (sbaz). I've been improving the sbaz code base in earnest only since June or so. I am a bit ashamed it took me so long. This is in part due to my needing to better familiarize myself with the existing code. The bigger reason was lack of vision, and therefore unsure direction. I spent more time contemplating what to do instead of doing. Sbaz has lost ground as a tool used by the public due to a variety of reasons. First and foremost has been the waning development of both code base and public repositories. I am attacking the former head on, and the latter soon after. A more basic question is what use cases we should target. Most prospective users today are developers who have other tools that are more deeply integrated with IDEs, build tools, version management, etc. Should sbaz adapt to compete in the development space? Sbaz was modeled after the Linux package management tools, and I'm not convinced such a design can satisfy both the flexibility and consistency needs of developers unless we can establish some large scale release process similar to Linux distributions. My next post, if it happens within the next month or so, will likely be ramblings around some of these thoughts, and how I would like to see a sub-community start to form around sbaz.
Ultimately, I decided on a path of incremental improvements to existing functionality. This way I could contribute as little or as much as my existing family responsibilities and increasing work commitments would allow. My primary goals thus far have included:
- Update to Scala 2.8 and eliminate all compile warnings, including Java deprecations
- Introduce no backward incompatible changes (beyond minimum Java5 per 2.8 upgrade)
- Improve dependency audits to prevent a broken managed directory from ever appearing
- If an audit fails, make it clear to the user why
- Keep the user informed (specifically when downloading)
- Expand the body of automated testing
- Improve Windows support
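The two audit goals in this list can be illustrated with a small sketch. This is not sbaz's actual implementation: the package names, the `audit` and `remove` functions and the dictionary layout are all hypothetical, chosen only to show a check that refuses a change which would leave a dependency unsatisfied and tells the user why.

```python
def audit(installed):
    """Return a list of human-readable problems; an empty list means consistent.

    `installed` maps a package name to the set of package names it depends on.
    (Hypothetical layout for illustration, not sbaz's real data model.)
    """
    problems = []
    for name in sorted(installed):
        for dep in sorted(installed[name]):
            if dep not in installed:
                problems.append(f"{name} requires {dep}, which is not installed")
    return problems


def remove(installed, name):
    """Remove `name` only if the resulting package set still passes the audit."""
    proposed = {pkg: deps for pkg, deps in installed.items() if pkg != name}
    problems = audit(proposed)
    if problems:
        # Refuse the change and explain why; the original set is left untouched.
        raise ValueError("cannot remove " + name + ": " + "; ".join(problems))
    return proposed


# Hypothetical package set, for illustration only.
installed = {
    "scala-library": set(),
    "sbaz": {"scala-library"},
    "scala-devel-docs": set(),
}
```

Auditing the proposed state before touching anything is what keeps a broken managed directory from appearing: a failed audit costs nothing but an error message.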
These are still works in progress, but good progress has been made. I have also added a few features like asynchronous downloads and support for pack200; I'm amazed I haven't seen the latter used more. I suppose the Scala requirement of Java 5 or later makes such support easier..? At the time of this writing, there is still more to be done; however, the code base isn't far from being ready for the next major distribution release of Scala. I'd be working on it now if I weren't away from my primary development machine.
So many projects...
... and so little time. I really should make a point of updating this blog more frequently, as I've been doing a lot of new and interesting (at least to me) technical stuff in the *nix space. Up until the end of last year, my day job has primarily focused on application capability. What I mean is I spent most of my time working on application specific configuration and code changes to add features or fix bugs to better support the business's needs. Most of these applications are large, complex enterprise tools that fall into the "one size fits all because you can customize through configuration" category. Granted, the line between code and configuration is unclear at times. On occasion, I would get down to the application framework or operating system level. It was always a treat.
Late last year, a key person for the company website's site operations team moved on, and I was pulled in to fill a fraction of the void he left behind. I'm only working there at half capacity because I still need to support my previous areas, but it has been a refreshing deviation from the work I've been doing for the past 7 or so years. And when I get interested and excited about something, I begin to explore.
Virtualization
One area I've spent some time exploring is system virtualization software, and this is for several reasons. I only have one machine at home to do all my exploratory work. This poses a problem when playing around with load balancing configurations or testing network software. Virtual servers allow me to simulate many machines on a single box, so while it isn't perfect for emulating a production environment, it makes it possible with a non-existent budget. Trying out different flavors of applications can also have a negative (sometimes damaging) effect on a box. Installing, configuring, testing and then uninstalling can lead to garbage being left behind. If the install is invasive to the basic services of a box and you don't fully understand all aspects of these changes, it can interfere with the health of the system in the long term. Using a virtual environment for this kind of playing around means you can simply discard the environment if you decide not to go with that solution. One added benefit of installing critical services into a virtual environment is you can easily back up and transport that environment onto another machine if the host box goes belly-up. Maybe I should get another machine...
Website - the whole package
I've had a few websites outside of work that I've developed (with someone else's code as a starting point) and maintain, this site being one of them. I've never put a proper backup/restore process into place for full, rapid restores. In the event of a failure, I would be able to restore most of it, but it would take a long time and some stuff may fall through the cracks.
Django 1.0 came out a while ago. I've successfully upgraded this site, but have some more upgrade work to do yet. This will likely take about 3 days to complete, if I could work on it full time. Unless I can use some dedicated time over a long weekend, it will probably grow into 3 weeks.
This site has only held my blog thus far. I think I want to expand it to serve as my project site as well (of which I currently have none), as I've got bandwidth and storage space to spare. Source code will be hosted via Mercurial, possibly through hgweb. I haven't decided if I want to use Trac. This partly depends on its RAM consumption because I need to keep several apps running on this hosting service, and memory is my limiting factor. I've already done some promising prototyping of a move from mod_python to mod_wsgi, so I appear to be making myself some wiggle room.
My Scala + db4o + Swing personal project
I've blogged about this project previously. I've succeeded in putting together a lot of the domain specific code thus far, but I'm now toiling with the application framework (API for managing the main window, opening/closing views, starting and stopping services). I haven't decided on the user experience yet: single tabbed window, multi window, etc. The options are plentiful, but my goal is to keep it lightweight with the potential for using it on a smartphone. I've started going down the path of OSGi (another learning experience). That may or may not continue. This is one project I would like to host out of this web site.
sbaz - Scala's package manager
I am presently the named maintainer for sbaz. Sbaz will start to become a bigger part of this blog. Presently, I am still struggling with defining a path forward for sbaz. I'm considering hosting some sbaz universes out of this site (e.g. my personal project above, a managed developer release universe, etc.). My biggest obstacle with this is my limited memory resources, so deploying the existing Java servlet won't be possible. Perhaps using PHP, Python or flat files to host a universe should be my first enhancement.
General system administration notes
I need a reliable location to document many of my system administration learnings, and this website seems to be a logical location. To date, I've peppered README files throughout my machine's filesystem, but the point behind the README files is to repeat the steps I need to perform on a new (e.g. replacement) machine. I've also failed to document some good stuff because I hope to, at some point, have a central repository. So, I delay documenting until I forget to do it altogether, a pattern I've recently recognized and am changing... leading to the aforementioned README files. Some of the things I intend to document include:
- MySQL, rsync, etc. commands (scripts) needed to back up and restore websites
- Configuration for monitoring mdadm, SMART, capacity and other hard drive details
- Virtualization via OpenVZ, KVM and VMware
- Remote monitoring tool (mon or Nagios) for virtual machines, website, HDHomeRun and services
- Local mail server configuration (including security concerns)
- Apache configuration details (including security concerns)
- DNS configuration for third party hosting, local network, dynamic IP, etc.
- I may even consider documenting SAP transaction codes and technology details (BADI, BAPI, BDOC, BSP, complex ABAP selects, OO vs. Procedural ABAP details, etc.) as these are starting to fade a bit.
- Just about anything I've had to research when solving a problem, and don't address often enough to know off hand.
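As a taste of the mdadm monitoring item in this list, here is a minimal sketch that scans text in the format of `/proc/mdstat` for degraded arrays. The sample text and the function name `degraded_arrays` are made up for illustration; a real setup would more likely lean on `mdadm --monitor` itself.

```python
import re


def degraded_arrays(mdstat_text):
    """Return names of md arrays whose member status (e.g. [U_]) shows a missing disk."""
    degraded = []
    current = None
    for line in mdstat_text.splitlines():
        header = re.match(r"^(md\d+)\s*:", line)
        if header:
            current = header.group(1)  # remember which array the next lines describe
            continue
        status = re.search(r"\[([U_]+)\]", line)  # U = member up, _ = member missing
        if current and status and "_" in status.group(1):
            degraded.append(current)
    return degraded


# Made-up sample in the /proc/mdstat layout: md0 healthy, md1 missing one mirror half.
SAMPLE = """\
Personalities : [raid1]
md0 : active raid1 sdb1[1] sda1[0]
      488383936 blocks [2/2] [UU]

md1 : active raid1 sdc1[0]
      488383936 blocks [2/1] [U_]

unused devices: <none>
"""

print(degraded_arrays(SAMPLE))
```

On a live machine the same function could be fed the contents of `/proc/mdstat` from a cron job that mails the result when the list is not empty.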
A new personal project
After coming to the disappointing conclusion that I cannot, at this time, pursue building my NAS device from scratch, I've turned toward another long contemplated and frequently revisited project. For quite some time I've dabbled with various Java based functionality, such as db4o, Scala and RCP frameworks, and have been toying with some ideas to implement a useful application from scratch.
I know it is pretty sad that a professional software developer of 7 years is psyched about creating an application from scratch, but most of my development work involves configuring and/or extending existing applications purchased from third party vendors, identifying and fixing bugs in code others have written, addressing performance issues, and interfacing multiple enterprise applications. I have done very little UI development, and most of that has been for web applications.
Standing at the start of my desktop application exploration, I realize my first goal is to implement something simple and useful with an expandable design. The tool needs to be useful to keep my interest, otherwise the application will be yet another academic exercise. The simple but useful feature provides the bait that leads me along, but by having an extensible design, adding additional features becomes incremental growth instead of titanic rewrites and redesigns. So, I have the following thoughts:
- The application should run on the JRE and use the Scala programming language for at least part of the functionality. I am learning to love functional programming, but haven't applied it to any practical application yet.
- If using a database, I would like to use db4o instead of a relational database. A lot of the stuff that slows down development with a database goes away when the Java class IS the database schema. No dual implementation (Java class and relational schema) or mappings are required. Of course, I may be better off persisting my data as flat files.
- I should standardize on a single IDE (Eclipse or NetBeans), as this is primarily a single developer effort, and trying to develop in two different IDEs would require a lot of overhead. Unfortunately, this adds limitations to the technology I can use for development. For example, Scala only has a plug-in supporting Eclipse, I am more familiar with Eclipse's features, I like the idea of learning more about OSGi, and I like the responsiveness of SWT vs. Swing. On the flip side, only NetBeans has Matisse (for free anyway), the profiler seems nicer, RCP appears to be easier, I'm more familiar with the Swing API, I like the idea of a single distribution for all machines (SWT requires different DLLs), and I LOVE the Mercurial plug-in.
- I don't expect this code will be opened for general availability, but I should take legal considerations (GPL, LGPL, EPL, BSD, etc.) into account when choosing libraries. Otherwise, the conflicting licenses could become prohibitively difficult to work around.
- I would like to leverage a multi-document interface (MDI), at least optionally, with tabs that can be reordered, maximized to full screen, closed with middle button click, and possibly minimized to border buttons.
- Ideally, data could be replicated between machines with ease, as I would like to use the application on my home and work machines. This, of course, requires multi-platform functionality, as my work laptop runs Windows XP while my home machine runs Ubuntu Linux. (Gosh how I wish I could run Ubuntu at work too...)
Clearly, I have more to think about before choosing an IDE. As for the application's features, I've considered the following:
- A secure password store
- A wiki style notebook similar to Tomboy Notes for Linux. I found that taking notes in ASCII with a markup "language" like markdown worked quite well. The file's name contains the timestamp of when the file was created, and all context of what the note was about is contained within. When the file is stored in the file system, the operating system's indexing tool can be leveraged for searches, and the application can manage categorization in a flexible way. I'd like to include IDE style tab completion for internal links, syntax highlighting, daily todo list feature, hyperlinks outside the application (e.g. web pages, lotus notes documents, network share locations, etc.) and more.
- A batch image resize tool for high quality images. I've actually already implemented this using the ImageJ library available in the public domain, but the GUI is a bit hackish and not robust.
- A contact management application that leverages low level relationships. For example, a family of five may all have the same contact information while the kids are young, but as each kid gets his/her own email address, mailing address (e.g. goes to college), cell phone, etc., each individual's contact information can be updated accordingly. Or, if the entire family moves to a new location, a single change to their mailing address node would update all family members at once. This would be a perfect application for db4o.
- An invoicing and general business tracking application for my wife to use for her photography business. This is a nontrivial piece of functionality with the potential to grow significantly. It could be used to drive marketing, schedule photo shoots, track income and expenses and so on.
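The shared-address idea in the contact management item can be sketched in a few lines. This uses plain Python objects rather than db4o, and the `Address` and `Person` classes are hypothetical names for illustration: family members point at one address node until someone needs their own.

```python
from dataclasses import dataclass


@dataclass
class Address:
    street: str
    city: str


@dataclass
class Person:
    name: str
    mailing: Address  # a reference to a possibly shared node, not a copy


home = Address("1 Main St", "Springfield")
family = [Person(name, home) for name in ("Mom", "Dad", "Kid")]

# The whole family moves: one change to the shared node updates everyone.
home.street = "2 Oak Ave"

# The kid goes to college: give that one person a separate node.
family[2].mailing = Address("Dorm 5", "College Town")
```

An object database appeals here because it can store this graph of references as it is, where a relational schema would need a separate address table and join keys to express the same sharing.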
I really need to have an IDE that makes the GUI layout and interaction with the business logic clear and easy to do in order to implement all that I would like. I suspect this seems daunting mostly because it is new to me. I'm still looking for some best practices on how to lay out the MVC design of the GUI, though I suspect I'll come to a solid conclusion only after I've tried it.
NAS put on hold
As it turns out, life really is what happens when you are making other plans. Since the last time I posted, I've had a few things come along requiring big money... bummer. I guess I'll have to wait a little longer before I build this machine. I suppose this isn't all bad, as chances are the cost of these components will go down over time, assuming the economy doesn't throw a wrench into that theory.
I did get close to a finalized configuration of components with a slight change to the machine's application. Originally, I wanted to make the Network Attached Storage device double as the firewall and gateway for my home network to the internet. For security purposes, combining the firewall and storage isn't recommended, as a security failure in the firewall makes for easy access to all your data. Besides that, I already have a wireless Netgear router that can also serve as a gateway for a local wired network.
I currently have a 31" LCD television, a HDHomeRun (network enabled digital TV receiver), and no way to connect the two. A simple and lightweight media PC would nicely bridge this gap, and many of my NAS requirements would also apply here. With a media PC, I would finally have the convenience of TiVo style TV watching, and I could create backup copies of the kids' videos that are at high risk of death by scratching. Of course, the sheer volume of data is a concern when dealing with multimedia, particularly video. To get the most out of my hard drive space and back-up media (CDs and DVDs), the machine would need to support efficient MPEG4 decoding at the very minimum. I have a Core2 duo workstation that could offload the conversion processing from MPEG2 to MPEG4 (or better), so I'm not too concerned with the media PC's processor power as long as it supports hardware decoding of MPEG4. Given these thoughts, I've put together the following list of components:
- JetWay J7F5M1G2E-VHE-LF CX700M VIA CX700M Mini ITX Motherboard/CPU Combo - This motherboard has integrated video capable of MPEG-2, MPEG-4, and WMV9 hardware decoding, high definition sound, gigabit Ethernet, HDMI output (not sure if this is video only or includes sound), and a 1GHz C7 processor that consumes a minuscule 9 watts of electricity. The board only supports 2 SATA 3.0 interfaces, so I would need to use the only PCI expansion slot to add support for disks 3 and 4. This board is fanless and therefore silent.
- picoPSU-120 Power Kit - A fanless and very space efficient power supply capable of providing 120 watts of power at 12 volts. This is probably overkill for this machine, so I may consider scaling back to a 90 watt power kit, but I would have to run the numbers a bit more.
- 1GB 240-Pin DDR2 SDRAM 533 (PC2 4200) Desktop Memory - I would order the cheapest respectable brand name memory on Newegg at the time I order. At present, this is the Kingston ValueRAM. The maximum memory supported by the motherboard is 1GB in a single memory module.
- Scythe S-FLEX SFF21D 120mm Case Fan - A single, very quiet 120mm fan that could completely replace the air in my custom case roughly one time per second, which should be sufficient for the passively cooled components within.
- Pioneer DVR-115DBK DVD Burner - This is an inexpensive and popular IDE drive. The motherboard supports up to 2 IDE devices, and I want to ensure the SATA connections are used for hard drives only.
I already have the hard drives and PCI to SATA 1.5 expansion card. These will need to be moved from their existing machines into the new one once it is built.
- Western Digital Caviar GP WD5000AACS 500GB 5400 to 7200 RPM 16MB Cache SATA 3.0Gb/s Hard Drive - One of Western Digital's new "green" line of hard drives. I have two of these configured in a RAID1 array.
- Rosewill RC-201 PCI SATA x2 Silicon Image, RAID 0/1/JBOD, Normal and Low Profile Host Controller Card - This is currently not used, as I have decommissioned the older machine that only had IDE ports on-board. It is a fairly inexpensive card that works nicely with Linux. I have no intention to use the card's RAID feature, as Linux has powerful software based RAID that I have fallen in love with :) I bought it for its support for 2 SATA 1.5 drives.
I haven't yet decided how I want to boot the machine. Ideally, the hard drives would exist solely for storage. No applications would be installed on them at all. There is one more IDE device supported (in addition to the optical drive), so I could install another hard drive. However, I would like to avoid using another hard drive for space and power reduction. I was thinking something like a Compact Flash card installed as a non-removable IDE device (requires adapter card) or a USB flash drive wired directly to the motherboard and rigged inside the case. I suspect the Compact Flash card would be more performant, but it would also cost a little more.
The parts listed here aren't super expensive. Unfortunately, these aren't the only costs involved. To build the machine I really want, I need to build my own case from scratch. I not only need to buy the materials (not sure what I want to use yet), but I also have to purchase some tools, like a Dremel. I have some rough ideas on how I would like to set up the inside of the case. Maybe I'll put together some pictures in the future, but until then, here is a brief description:
- Four 3.5" hard drive bays will be located in the bottom front of the case. Instead of configuring them horizontally, I want to take advantage of the natural airflow advantage of a vertical configuration. Proper venting is needed (e.g. modder's mesh) to allow airflow to pass directly over the hard drives. Maybe the bottom of the case could be vented for maximum intake.
- Just above the hard drives is where the optical drive will be installed in the traditional horizontal position. This makes all the drives fit into a single brick-like unit in a dense but fairly well vented layout.
- The motherboard will then be configured (looking at the front of the case) to the right of the drives. To do this, the drives, and therefore the optical drive's tray, will be off-center to the left. Depending on the PCI expansion slot's location, the case may need to be longer (move the motherboard further toward the back), taller (place card below drives), or a riser could be used to move the PCI card out of the way. Regardless, the motherboard needs to be flush with the back of the case to give access to the on-board ports, so this will require some designing finesse. Again, special care will be needed to ensure the passive cooling on the motherboard has sufficient air flow. This may require additional venting on the front of the case in front of the motherboard.
- The 120mm fan will be located on the back of the case near the top. This is where a traditional ATX case would place the power supply, so freeing up this spot is one benefit of using a very low profile power supply inside the case. Of course, this does require the power converter to be outside the machine, similar to a laptop. I did want to include a laptop battery inside the machine for backup power, but this was prohibitively expensive.
With this layout, I'd like to get an overall size of 20cm wide x 20cm tall x 25cm deep, or roughly 8in wide x 8in tall x 10in deep.
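As a rough check on the claim that the fan could replace the air in the case about once per second, the case volume and air changes per second can be worked out from these dimensions. The airflow figure below is an assumed placeholder, not a measured or quoted specification for the fan; substitute the rated value for the model actually used.

```python
LITERS_PER_CUBIC_FOOT = 28.3168


def case_volume_liters(width_cm, height_cm, depth_cm):
    """Interior volume of an empty box; 1000 cubic centimeters per liter."""
    return width_cm * height_cm * depth_cm / 1000.0


def air_changes_per_second(volume_liters, airflow_cfm):
    """How many times per second a fan moving airflow_cfm could swap the volume."""
    volume_cubic_feet = volume_liters / LITERS_PER_CUBIC_FOOT
    return (airflow_cfm / 60.0) / volume_cubic_feet


volume = case_volume_liters(20, 20, 25)   # 10.0 liters
ASSUMED_AIRFLOW_CFM = 30.0                # placeholder, not the fan's rated figure
changes = air_changes_per_second(volume, ASSUMED_AIRFLOW_CFM)
print(f"{volume:.1f} L case, about {changes:.1f} air changes per second")
```

The drives and boards take up part of that volume and the vents restrict flow, so this is only an order-of-magnitude estimate.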
I hope to sometime return to this project, as I suspect this would be a good little machine that is used a lot. Until then, maybe someone else out there could use some of my thoughts to build something similar. If you do or already have, please share your experience. I would love to hear about your successes and growing pains.
NAS - hard drive and transfer speeds
Okay, so I've got the general idea of what I want, now it is time to hash out some of the specs. Where to start?
The working parts of the device should be modern or at least based on modern standards. Since I will be going to all the trouble of building this device from scratch, I don't want to be put into a position where a significant redesign is needed in the event of a hardware failure of some kind.
- The primary hard drive interface should be SATA. The old PATA (a.k.a. IDE) hard drive standard isn't gone yet, but it is certainly fading away into the sunset, not to mention its technical inferiority. I won't rule out PATA completely for one or maybe two drives if the motherboard provides it. I'll need to be more careful when integrating PATA drives into the RAID configuration to prevent their technical limitations from killing the performance of the entire device.
- The motherboard would ideally use the ATX power connector standard. This dramatically increases the number of power supply options, as ATX is the most common standard available.
- The motherboard needs to support multiple SATA hard drives. The preferred number is 6 on-board connections, but if this is not possible in a small, power efficient machine, it should at least be upgradeable via an expansion card or two.
Generally speaking, a NAS device doesn't require much processing power. The largest bottleneck in performance will most likely be the network itself. Most home networks will use wired fast Ethernet (100 Megabits/sec), wired gigabit Ethernet (1000 Megabits/sec), wireless g (54 Megabits/sec), or wireless n (248 Megabits/second). Note, however, that these measurements use the bit as the base unit of measure, not the byte (8 bits) that most of us are comfortable with.
Fortunately, Wikipedia has an excellent collection of device bandwidths available at http://en.wikipedia.org/wiki/List_of_device_bandwidths that shows data transfer rates in both bits and bytes. It also appears to be a fairly complete list of device interfaces that may be used in this system.
One important point to keep in mind is the perceived performance of a NAS device will be affected by the slowest point in the data transfer chain. This includes the network adapter on the machine using the storage device and everything between the two machines. I've configured my home network to support gigabit transfer rates, so my NAS device should take full advantage of this. This means I should be able to get a theoretical top transfer rate of 1000 Mb (megabits) per second, which is equal to 125 MB (megabytes) per second.
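The bit-to-byte conversion and the "slowest link wins" rule are simple enough to sketch. The chain below is a hypothetical example of my network path, and the function names are mine; the link speeds are the nominal figures quoted above.

```python
def mbit_to_mbyte(mbit_per_s):
    """Convert megabits per second to megabytes per second (8 bits per byte)."""
    return mbit_per_s / 8


def bottleneck(chain):
    """Return (name, MB/s) of the slowest link in a dict of name -> Mbit/s."""
    name = min(chain, key=chain.get)
    return name, mbit_to_mbyte(chain[name])


# Hypothetical all-gigabit path between a client and the NAS.
wired = {"client NIC": 1000, "switch": 1000, "NAS NIC": 1000}

# Same path, but with the client on wireless g instead.
wireless = {"client wireless g": 54, "switch": 1000, "NAS NIC": 1000}

print(bottleneck(wired))
print(bottleneck(wireless))
```

These are theoretical ceilings; real transfers will come in lower once protocol overhead is counted.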
Here is a breakdown of possible hard drive transfer rates for this device. SCSI is a bit out of my price range and overkill for a home solution.
- Ultra DMA ATA 66 - 528 Mbit/s = 66 MB/s
- Ultra DMA ATA 100 - 800 Mbit/s = 100 MB/s
- Ultra DMA ATA 133 - 1064 Mbit/s = 133 MB/s
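- SATA 150 hard drive - 1500 Mbit/s line rate = 150 MB/s of data (after 8b/10b encoding overhead)
- SATA 300 hard drive - 3000 Mbit/s line rate = 300 MB/s of data (after 8b/10b encoding overhead)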
The last three interfaces are all faster than the maximum transfer rate possible over a gigabit network and should be suitable for such an application. As stated before, other technical merits (hot pluggable, no restrictions on writing to multiple devices at once, newer standard with stronger future) make SATA the ideal option. In the current market, you would be hard pressed to find large SATA hard drives that don't conform to the 300 standard, and the price differences between 150 or 300 aren't significant. You are more likely to find disk controllers on the motherboard or expansion cards (e.g. PCI cards) that use the slower SATA 150 interface. Fortunately, virtually all SATA 300 hard drives are backward compatible with the SATA 150 controllers, so compatibility is virtually a non-issue.
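One detail worth checking when comparing these numbers: SATA uses 8b/10b encoding on the wire, so only 8 of every 10 transmitted bits carry data, which is why the interfaces are named 150 and 300 after their payload rates in MB/s. Here is a small sketch of that arithmetic, and of which interfaces clear the 125 MB/s gigabit ceiling (the function and variable names are mine):

```python
GIGABIT_MB_PER_S = 1000 / 8  # 125.0 MB/s theoretical ceiling of gigabit Ethernet


def sata_payload_mb_per_s(line_rate_mbit):
    """Payload MB/s for a SATA link: 8 data bits per 10 line bits, 8 bits per byte."""
    return line_rate_mbit * 8 / 10 / 8


interfaces_mb_per_s = {
    "Ultra DMA ATA 66": 66,
    "Ultra DMA ATA 100": 100,
    "Ultra DMA ATA 133": 133,
    "SATA 150": sata_payload_mb_per_s(1500),
    "SATA 300": sata_payload_mb_per_s(3000),
}

# Keep only the interfaces that are faster than the network can deliver.
faster_than_gigabit = [
    name for name, rate in interfaces_mb_per_s.items() if rate > GIGABIT_MB_PER_S
]
print(faster_than_gigabit)
```

Either way the conclusion holds: the last three interfaces in the list outrun a gigabit network.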
Speaking of expansion cards, it is quite likely one will be needed in my device. This adds a new interface to take into account when assessing bandwidth bottlenecks. According to the Wikipedia page, a 32 bit PCI expansion slot running at 66 MHz has a theoretical maximum transfer rate of 2133 Mbit/s, or 266.7 MB/s; the 33 MHz version found on most desktop motherboards tops out at half that, 1067 Mbit/s or 133.3 MB/s. Either way, the bus is shared by every device on it, so one should not expect the best performance from a single SATA 300 or multiple SATA 150 drives connected to such an expansion card. Still, for the needs of a NAS device, a PCI expansion card supporting two SATA 150 drives should work nicely for expanding a RAID configuration. Borrowing a trick from PATA setups, pairing a drive on a motherboard controller with a drive on the PCI controller in each RAID1 mirror might optimize reads and writes when dealing with large files.
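The PCI figures follow from the bus width and clock, and the same arithmetic gives a feel for what each drive gets when several share the slot. This is a back-of-the-envelope sketch: the even split between active drives is my simplifying assumption, and it ignores protocol overhead and any other devices on the bus.

```python
def pci_bandwidth_mb_per_s(bus_width_bits, clock_mhz):
    """Theoretical peak of a parallel PCI bus: one bus-width transfer per clock."""
    return bus_width_bits * clock_mhz / 8


def per_drive_share(bus_mb_per_s, active_drives):
    """Even split of the shared bus among drives transferring at the same time."""
    return bus_mb_per_s / active_drives


pci_66 = pci_bandwidth_mb_per_s(32, 66.67)  # about 266.7 MB/s
pci_33 = pci_bandwidth_mb_per_s(32, 33.33)  # about 133.3 MB/s

for label, bus in (("66 MHz", pci_66), ("33 MHz", pci_33)):
    share = per_drive_share(bus, 2)
    print(f"32 bit {label} PCI: {bus:.1f} MB/s total, {share:.1f} MB/s per drive with two active")
```

Two drives on a 66 MHz slot would each still get slightly more than the 125 MB/s a gigabit network can carry, while on a 33 MHz slot the bus becomes the bottleneck once both drives are busy.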