24 Dec 2014

Commerce, the project

Most of my professional career has been spent working in the area of commerce information systems, primarily in the “order to cash” and E-commerce space. In this time, I’ve used and heavily extended a few popular proprietary applications to conform to unyielding and particular business requirements. One could argue that purchasing a packaged product just to customize it to the nth degree is missing the point of packaged solutions. Certainly there are complexities to consider in this argument. Regardless, software exists for a reason, and that reason isn’t to stand in the way of how a business believes it should function. While I have been able to bend these popular software packages to the will of the business, doing so can be fraught with challenges, clever (i.e. unintuitive and often brittle) design, and an unusually high cost both in initial implementation and in long-term technical debt… the cost that keeps on costing. Along the way I’ve gotten somewhat opinionated about what I deem to be “good software”.

Early this year I found myself looking for a job because my whole team was outsourced. Dealing with the loss of a truly exceptional team, the kind you dream of working with, plus frustrations with feeling like we were forced to leave the business high and dry, I needed to find a distraction and creative outlet to feel good about. Something to do in my free time to push my design skills and grow my development skills in Scala. I wanted to develop something nontrivial and real, not just an academic project like I’ve done so many times before. Since my emotional state was partially rooted in the technical limitations of a domain I was intimately familiar with, I decided to try my hand at creating “good software” in support of E-commerce.

Of course E-commerce is a big topic, and from the start I realized there is simply too much ground for my personal project to cover in a meaningful way. There is a lot of existing software that is good at what it does, even the aforementioned packaged solutions. Re-implementing mediocre reproductions of what they deliver wouldn’t add any value, so I instead wanted to focus my attention on areas where I personally experienced frustration. Plus, there is the additional challenge that my project isn’t driven by a specific business’s needs. While I’ve been exposed to requirements from various businesses, writing opinionated business software in a generic and meaningful way requires deep and thoughtful abstractions around business process. I cannot claim to have this insight, but I can claim to have a reasonable understanding of basic business requirements and a deep understanding of technical approaches to addressing them. Since my forte is pricing, that seemed to be a natural place to focus my attention.

At first I thought implementing a pricing solution in a vacuum wouldn’t be possible. How can you price an order without knowing what a product is, let alone the context of the order? Don’t you need concrete units of measure? But then again, who is to say a different kind of pricing isn’t needed for a “sales opportunity”, lead or quote instead of an order? And what kind of units would every business use? Each? Linear feet? Kilograms? Floating vs. fixed license? In the end, the answer was not to build a system but to build a library. This way logic can be built using identity (a key can be simple or complex as long as it implements hashCode and equals), and areas of the solution that require detailed knowledge of the data model can be abstracted away, pushing ownership of that detailed knowledge to the code using the library. I wanted to make as few assumptions about the larger system as possible. Simply not having the larger system available to reference keeps this requirement obvious.
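
To make this concrete, here is a rough sketch (in Scala, with purely illustrative names rather than the project’s actual API) of what pricing against opaque, caller-supplied keys could look like:

// Illustrative only: the library works in terms of caller-supplied keys and
// callbacks, so it never needs to know what a "product" or "order" really is.
// Any key type works as long as it has sensible equals/hashCode semantics.
final case class ItemKey(sku: String, unitOfMeasure: String) // hypothetical key

trait PriceSource[K] {
  def listPrice(key: K): BigDecimal // the host system owns this mapping
}

trait PricingRule[K] {
  def price(key: K, quantity: BigDecimal, prices: PriceSource[K]): BigDecimal
}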

And so my commerce project was born. Yes, the name is lame. I’m no good at coming up with that kind of thing. My own name is kind of uncommon, so maybe it could be referred to as “matlik commerce” or “matlik pricing” in the same way people say “Joda-Time”? That would certainly appeal to my inner narcissist. Honestly, I don’t care what it is called if it is put to good use.

I do believe this library has a solid foundation for becoming a real, even “enterprise grade” (hopefully that isn’t considered a negative term), pricing solution. My goals include:

  1. Eliminate common hurdles to writing correct code, such as delicate mutable state that opens the system up to unintended and difficult-to-debug side effects, or inconsistent enforcement of rounding rules and unsafe types (i.e. never use double for money… ever).
  2. Embrace composition of simple and reusable pricing rules to create more complex rules, possibly even through configuration (see the sketch after this list).
  3. Minimize implementation requirements imposed by the library, and where such requirements exist, provide clear interfaces and hooks for mapping to/from types as needed.
  4. Enable full visibility into why and how a given price is determined. This information can be equally beneficial for developers, business (e.g. marketing), and customers depending on how the application uses this information.
  5. Avoid design decisions that could inhibit adoption into existing JVM or microservice based solutions.
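
As a rough illustration of goals 1 and 2, here is a small sketch with hypothetical names, using BigDecimal with explicit rounding to stand in for a proper money type:

import scala.math.BigDecimal.RoundingMode

// Hypothetical sketch: simple rules are pure functions over BigDecimal, so more
// complex rules fall out of ordinary function composition.
type Rule = BigDecimal => BigDecimal

def percentOff(pct: BigDecimal): Rule =
  price => (price * (BigDecimal(100) - pct) / 100).setScale(2, RoundingMode.HALF_UP)

def flatOff(amount: BigDecimal): Rule =
  price => (price - amount).max(BigDecimal("0.00"))

// A "10% off, then 5.00 off" promotion built from the two simple rules above.
val promo: Rule = percentOff(10) andThen flatOff(BigDecimal("5.00"))

println(promo(BigDecimal("100.00"))) // prints 85.00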

This project may grow beyond pricing at some point, such as cart management, hierarchy management, etc. I’ll likely have to build a minimal feature set for some of this to set up a pricing demo, but for now I hope to keep my attention on the pricing features. For more details, please feel free to browse the project. I need to put more effort into documentation and example code. I know that making a project accessible is key to any kind of adoption, and I’d like to see this live beyond my personal repo. As such, I’ve given it the Apache 2 license and welcome feedback.

22 Dec 2014

Creating a Favicon

I’ve finally gotten around to creating a favicon for my site. I’ve always found this to be a finicky process, mostly because I’m the farthest thing from a graphic artist around. Since this is something I have tried to do a few times, never quite getting it right till now, and I’ll likely not try to do this again for years, I figured recording the process here would be good for future reference.

First, create the base image to be used for the icon. I found that doing this in a vector graphics program like the freely available Inkscape made life quite a bit easier. The trick is to create a square page and build your image filling that square. It does not matter how big the page is while you build it because you can specify the image size at export time, and Inkscape will take care of cleanly anti-aliasing lines at that point. Just don’t get too detailed because the smallest icon size is only 16px by 16px, which doesn’t give a lot of resolution to work with.

I created this SVG image as the base for all subsequent files: Original SVG Image

Yes, it is small, but that is only because I sized the page at 16x16 px, forgetting that the page size doesn’t really matter when the file is only used as a base image.

After creating this base image, I then used it to export PNG files of various sizes: 16x16, 24x24, 32x32, 64x64, 110x110 and 114x114. The 16, 24, 32 and 64 are only used for the favicon.ico, but the other two larger formats are used to create facebook.png and apple-touch-icon-precomposed.png respectively. Exporting these files using Inkscape involves choosing Export Bitmap... and filling out the appropriate fields in this dialog:

Export Sized Image from Inkscape

And the resulting images:

  • favicon - 16x16 px
  • favicon - 24x24 px
  • favicon - 32x32 px
  • favicon - 64x64 px
  • facebook.png - 110x110 px
  • apple-touch-icon-precomposed.png - 114x114 px

The facebook and apple icons can be used as-is, but an additional step is needed to turn the separate favicon images into a single *.ico image. This is where Gimp comes into play. With Gimp, open the largest 64x64px image. Then choose the menu option to Open As Layers... to open the other three images of different resolutions. This will pull all of your favicon images into the same Gimp layered image.

Layers

Finally, you can export the layers to a single Icon file containing all resolutions. Choose the Export As... menu option and in the resulting dialog, enter the file name favicon.ico and choose Microsoft Windows icon (*.ico)

Export Image

On clicking the Export button, you are presented with a dialog for setting options on all available formats.

Export Options

I did try using a smaller palette size to reduce the resulting file size, but I needed to keep the 8-bit alpha channel. Otherwise, the circle is rendered with a jagged border. If it weren’t for that, I could drastically reduce the palette size, making the icon roughly a quarter of its current size.


29 May 2013

SSH Tunneling

SSH is quite the Swiss Army Knife of networking. One powerful feature whose syntax I can never remember is securely tunneling from one machine to another, possibly through a firewall. To do this:

ssh -f matlik@home -L 3000:www.google.com:80 -N

In this example, a proxy connection is established between my local machine’s port 3000 and my home machine (an alias in ~/.ssh/config) on the normal SSH port. Then my home machine relays any network traffic on to google.com on port 80.

Here is another example: making an HTTP server that listens only on localhost (not 0.0.0.0), and therefore does not handle requests from the outside world, accessible from another machine in the “outside world”. Note that “localhost” in the command below is from the perspective of the remote server.

ssh -f matlikj@192.168.1.4 -L 8000:localhost:8000 -N
  • The -f flag requests ssh to go to background just before command execution.
  • The -L flag binds a local port to a remote port.
  • The -N flag tells ssh not to execute any commands on the remote server, which is useful when forwarding traffic from one machine to another is the goal, as in these examples.

Maintaining such a connection for an extended period of time, particularly if the connection goes through periods of idleness, may be problematic. It is common practice for connections to be dropped when not actively used. If you find that you need to keep the connection alive, you can do a few things:

  1. Update your ~/.ssh/config file to contain the settings ServerAliveInterval 180 and ServerAliveCountMax xxxx, where ServerAliveInterval defines how many seconds of inactivity pass before a keep-alive message is sent over the open connection, and ServerAliveCountMax defines how many of those messages may go unanswered before the connection is closed. See the example snippet after this list.
  2. Use another program like autossh
  3. Use a shell script to establish the SSH connection in an infinite loop (hackish and messy).
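
For reference, the keep-alive settings from option 1 would look something like this in ~/.ssh/config (the host alias and values are only examples):

Host home
    ServerAliveInterval 180
    ServerAliveCountMax 3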

28 Mar 2013

Choosing a CSS Preprocessor

I’ve been needing to push my browser-side development learning a bit more recently, and have started to dig into the modern CSS and JavaScript tools that are now available. In the case of CSS, there is a common trend of using CSS preprocessors, which let developers maintain more concise source files that are then generated into full CSS.

There are multiple CSS preprocessors available today, but the two most popular at the moment appear to be Sass and Less. I am no CSS expert, so I can’t definitively state which is the “best” solution. Many blogs and conversations I’ve had recently point to Sass as the far more feature-rich solution, with powerful plugins that make your life much easier.

If all other things were equal, I’d most likely opt for Sass; however, the biggest drawback for me (at the moment) is how much overhead the tool brings to your project. The personal project I’m looking to use this tool in is based on the Java ecosystem: the build (SBT) and runtime environments require the JVM, it is primarily Linux based (though I don’t want to limit platform options), and it is striving to remain relatively small, fast, easy and self contained.

Sass and some of the more notable plugins (compass and lucy) are implemented in Ruby. Making my project self contained would involve embedding JRuby and all the appropriate gem sources within my project’s build, as I don’t want to depend on an external C Ruby installation. This is a decent-sized footprint, and it also adds a performance overhead to my dev cycle that I’d prefer not to have.

Less is a little better in this regard. The preprocessor is implemented in JavaScript, so embedding it in the JVM build system requires a JavaScript interpreter like Rhino. It may not be blazingly fast (I don’t know, as I’ve not used Rhino much), but I’ve heard anecdotal evidence that it is fast enough, possibly due to the simpler design. The overhead of getting this set up is also a bit less painful within the build tool (SBT) because others have already implemented plugins that can be referenced in the build definition and downloaded.

24 Mar 2013

ptree on Linux

While this isn’t a perfect replica, it does work relatively well:

bash> ps axf

This command generates a process tree for everything running on the system, not just a subtree for a specified process, so you are best off piping the output to less and searching within it. It can be a bit more clunky as a result, but the end result is much the same.

19 Dec 2012

Scalate Template Inheritance

I’m looking into Scalate as the templating engine for my web site rewrite, for several reasons:

  • It natively supports Scala
  • I like the Jade syntax for DRY (Don’t Repeat Yourself) HTML page creation
  • Pages are generally compiled into bytecode for high performance execution
  • It has the ability to perform CSS selector driven DOM manipulation similar to what a browser does, only on the server side

While playing around with this last night, I did encounter one detail I’m not too happy about: template inheritance. It would appear that inheritance in Scalate does not work as advertised in the Jade syntax spec. The block and extends keywords look to be treated like regular tag generators rather than keywords that serve a special compositional purpose. I may be missing something, but the more I think about it, the more it seems reasonable that this is the case.

Scalate is designed as a templating engine that supports many different syntaxes. I imagine wiring all these different syntaxes together within the render pipeline of the template engine would yield a half-baked and quirky solution. Instead, composition of pages can be accomplished via a more dependency-injection-like approach. Rather than composing inheritance trees in the templating layer, each template provides places where Scala variables are output into the page. A Scala variable can be, for example, simple text for output or another templated component. Composition is accomplished from the outside in using Scala code.
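
Here is a minimal sketch of that outside-in composition, assuming Scalate’s TemplateEngine.layout(uri, attributes) API and hypothetical template files:

import org.fusesource.scalate.TemplateEngine

// Render an inner component first, then hand its output to the outer template
// as an ordinary attribute. The template file names are hypothetical, and the
// outer template would need to emit the "body" attribute unescaped.
val engine  = new TemplateEngine
val sidebar = engine.layout("sidebar.jade", Map("links" -> List("Home", "Blog")))
val page    = engine.layout("layout.jade", Map("title" -> "Scalate notes", "body" -> sidebar))
println(page)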

The trick to making this a successful and pleasurable solution is figuring out how to make this compositional approach concise and simple. The last thing I want is some involved declarative structure for every page.

UPDATE: I have confirmed Scalate’s Jade template system does not support block and extends, though not for the reason I originally thought. It just hasn’t been implemented yet. See [https://groups.google.com/forum/?fromgroups=#!topic/scalate/A-NOeZazjIQ]

07 Jan 2012

Thoughts On A New Web Site Design Part 2

The goal of the new design is to support several reusable components with the intent of keeping them light weight and modular. Some desirable features include:

1. A JavaScript based navigation widget

This could also include a common header, footer, left-hand navigation, etc. The idea is to use the browser to apply common UI components to the page. This appeals to me as a software designer because it enables one solution to be used consistently across all application components within a site having nontrivial architecture, regardless of the technology or templating mechanism used. As long as the UI is rendered in a browser and each application provides the same hooks (possibly just an insert at the start or end of the page), this will cut down on re-implementing the same functionality, which leads to inconsistencies and annoying maintenance issues.

At first, the design would focus on a simple and static navigational structure, possibly leaving a second tier navigation for each participating application to manage on its own (probably won’t have that complicated of a site myself). The more static the JavaScript, the better the user experience from a performance perspective.

Eventually, the navigation will also be concerned with customized navigation elements. Custom navigation assumes a user has been authorized or has selectively chosen to view a subset of the overall site. This user experience feature becomes more important if you have private areas of your site, or if the site becomes so large that the navigation cannot comfortably fit within a single view. I personally had a revolutionary thought a while back (tunnel vision, I suppose) when I realized a Java Servlet (or any other dynamic web code) is just as capable of rendering JavaScript dynamically as it is of rendering HTML.

Note that custom navigation requires some concept of identity for the current user. Identity will not be within the scope of this widget, but the widget will need to consume that identity information, which leads me to the next component.

2. Identity Service

Identity management is a nontrivial issue to tackle when dealing with distributed applications. There have been many attempts at solving the problems in this domain, and even the simplest solutions are not as simple as one would hope, particularly when implementing both the identity authority and consumers for a single site. Since my goal is to aggregate many technologies under one web experience, I will need to have them relay identity seamlessly for Single Sign On, as well as share basic information between them… if identity is needed. One technical detail in my favor is I intend to run much of this under a single domain, which makes cookies a viable option for relaying some critical details (e.g. a cross-application session cookie containing a token that can be used to look up identity and session status info from a centralized RESTful web service). Of course, since there are already standards surrounding identity distribution, it would also be nice to leverage some existing libraries for supporting this. Since I’ve already had experience with OpenID, perhaps a light-weight OpenID Connect server would be a good path forward, though a bigger investment in time and brain power that may not be feasible in the near future.
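
As a sketch of the cookie-plus-lookup idea (all names here are hypothetical, not an existing library):

// Hypothetical: resolve a cross-application session cookie to an identity by
// asking a central RESTful identity service. The HTTP call itself is elided.
case class Identity(userId: String, displayName: String)

trait IdentityClient {
  def resolve(sessionToken: String): Option[Identity] // e.g. GET /sessions/{token}
}

def currentUser(cookies: Map[String, String], idp: IdentityClient): Option[Identity] =
  cookies.get("SITE_SESSION").flatMap(idp.resolve)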

This feature is of particular interest to me because I’ve been working in this space professionally for the past few years. I have some ideas on how to simplify this given this specific use case, and have a desire to explore the use of Akka’s actors to support it. Ideally, the server would be extremely easy to start up and configure, and would be designed out of the box to support high availability if not global distribution; however, I’ve not been able to get a global distribution under my belt yet. The biggest threat to this happening is that I will not be implementing and/or using it in my day job. This may also turn out to be a blessing, as it will force the implementation to be simple due to time constraints and a “just get it done” mentality. Using a terse language like Scala may help as well, but running in the JVM does somewhat ding my goals for minimal resource usage. Right now, I am taking the Hudson/Jenkins deployment model as inspiration, and I’m hoping that if I can implement this once, integration with other systems should be low hanging fruit.

3. A JavaScript based commenting widget

The design principle is pretty much the same as the navigational widget, but this one is intended for distributing a centrally managed comments and collaboration system. The thought for this widget came to mind because I intend to serve most of my content in a static format going forward rather than running an application server to serve up the same stuff over and over. This has benefits from a resource usage perspective, as well as an operational support perspective. The fewer moving parts, the more reliable the system and the less critical the issues.

The Django server I’ve been running in “production” (with extremely low traffic) has not given me much grief at all, and it already has a pretty good set of administrative functions for comments, so maybe I’ll build this feature on top of what I already have running.

One potential wrench in the works I’ll have to work around is the browser’s cross-site JavaScript (same-origin) restrictions. Keeping all applications running as application contexts under a common domain will make this a non-issue, but I don’t want to design myself into a corner that prevents sites with differing domains from using the commenting tool. This may require a kind of server-side proxy that lives on the other domain but relays data via server-to-server communication… food for thought.

The concept of identity also plays a part in posting comments, though it isn’t absolutely critical. The initial implementation may function anonymously but allow the user to provide a name via a form field. Then, as identity gets rolled out, a signed-in user would automatically have their name populated and fewer limitations would be applied to what the user can post.

05 Jan 2012

Thoughts On A New Web Site Design

I’ve recently learned that I have a bunch of options available to me to play with my web site… and so much more. My interest in keeping my blog up is more about playing with web technology than anything else. I learned quite a lot in building it, which turned out to be highly beneficial when taking on new responsibilities at my work. A few years later, I am now ready to grow a bit more. Given that my budget is modest, my goal will be to pursue some of my nontrivial interests (time permitting, as usual) in as robust and minimalistic a way as I can manage.

  1. My shared hosting provider Webfaction has generously upped the resources provided to their base shared hosting offering such that there is now a LOT of memory made available to run applications. Now I can entertain running things like a JVM if I so desire. I will not complain for getting more for my money!

  2. I have had to change my ISP from DSL to a cable modem due to insufficient bandwidth. I didn’t get remotely near the advertised bandwidth from the DSL provider because my residence is too far away from the switch. So, for a little more money, I now have a 13x faster network connection and, as an added bonus, am able to connect to my machine from the outside world. Maybe I’ll be able to set up some distributed computing between my home machine and my shared hosting provider. Now, if only I can find a reason to implement such a thing…

  3. I’ve learned of Scalate, which is capable of generating web sites as static HTML. That is, you build your predominantly read-only site as you would a dynamic one, and instead of deploying a runtime, you “compile” your sources into static content to be delivered cheaply. I’ve been looking for quite some time for a good solution for managing my personal notes. Tomboy isn’t as portable as I’d like, a wiki requires a network connection, I need to support both Windows and Linux, and I would like good version management of my changes for historical purposes. Writing my notes in Markdown format and checking them into a distributed VCS (Mercurial or Git) would give me distribution, versioning, and maybe even deployment.

  4. I now have some more experience with integrating web assets served by separate technologies into a unified site through some browser magic. This should allow different solutions to run at the same time instead of pulling/porting everything under a single technology as I have done in the past.

What is wrong with Django?

In short, there is absolutely NOTHING wrong with Django. In fact, I still find it to be an extremely flexible, powerful and not too difficult to grok framework. I still like it quite a bit, though I’ve not been able to keep up with the latest developments. I find that the way my site is currently set up, I have to intentionally spend time to add anything to the site, which I never do. I would prefer turning the site into a tool I can use on a daily basis for tracking things I find interesting. Then, maybe I’ll actually keep it up to date, and make use of the site I’m paying good money to play with. I’ve not fully committed myself to moving my blog off of Django just yet. Doing so would mean abandoning the comments feature, though admittedly I don’t get many comments beyond those of an undesirable nature. As a result, I’ve taken to disabling comments to cut down on moderator notifications and save database space.

Conclusion

So, my plan going forward is to pursue a more heterogeneous collection of components that are selected and/or designed to work together. I trend toward low maintenance solutions that either satisfy a need or simply tickle my fancy. After all, this is my playground for exploration and skill development.

02 Jan 2012

Samba Mount Command

There are obviously many parameters that may be required to support various security settings; however, the following satisfies my need for a small home network where I am trying to mount a shared directory on my wife’s machine onto my local Ubuntu box.

bash> mount -t cifs -o username=guest,iocharset=utf8 $SMB_SHARE $MNT_POINT
  • The cifs filesystem type has replaced the smbfs predecessor
  • username=guest can only be used if the shared directory does not require authentication
  • Specifying iocharset=utf8 enables unicode characters in file names to render properly. This, of course, assumes both the Windows box and Linux box support the UTF8 encoding.
  • The SMB_SHARE shell variable would be of a format similar to //192.168.1.5/Share
  • The MNT_POINT shell variable corresponds to the mount destination

02 Jan 2012

Bash Edit Modes

Bash supports both emacs and vi edit modes, allowing users familiar with those editors to use the commands they are most familiar with. The emacs mode is active by default, but this can be changed using the following command:

bash> set -o vi

This can subsequently be changed back to emacs using the obvious alteration:

bash> set -o emacs

01 Jan 2010

Cultivating Scala Bazaars

As mentioned in my previous post, some progress has been made on the sbaz code base. However, recent progress has mostly been due to my individual effort. In short, this is not ideal. One of the short term goals, if sbaz is to be more than a forgotten piece of bloat, is to develop a sub-community around the tool and, more importantly, related services.

The way I see it, the command line tool is the foundation upon which a thriving, application-sharing, productivity-enhancing community can be built. Before I started working on it, the foundation had a sound start. I’ve only been filling some cracks that have weakened the tool or made it unsightly in areas. I am sure there is more to do, but I hope the big ones have been taken care of.


30 Dec 2009

2009 in review

So much for updating this blog more regularly. I guess this blog simply doesn’t rank high on my list of priorities. Work has been, for lack of a better word, intense! Fortunately it has been so in a good way. There has been no shortage of challenges, most of which were technical and generally interesting. And the really good news is the work continues.

From a personal projects perspective, I’ve had to pick and choose which to run with. Failure to do so would have left me with a hollow list of ideas that never amounted to anything beyond entry level academic exercises. I played around with virtualization early in the year using OpenVZ. I like the results, but have found it was not a good bet for my future use, as my work has started to turn to VMWare and Solaris Zones in a big way. Shortly after my post in February, I was able to complete an upgrade of the Django based websites. I am still quite pleased with Django, though I’ve not been keeping up to date on that scene very well.

I’ve spent most of my free time focused on Scala Bazaars (sbaz). I’ve been improving the sbaz code base in earnest only since June or so. I am a bit ashamed it took me so long. This is in part due to my needing to better familiarize myself with the existing code. The bigger reason was lack of vision, and therefore unsure direction. I spent more time contemplating what to do instead of doing. Sbaz has lost ground as a tool used by the public for a variety of reasons. First and foremost has been the waning development of both the code base and the public repositories. I am attacking the former head on, and the latter soon after. A more basic question is what use cases we should target. Most prospective users today are developers who have other tools that are more deeply integrated with IDEs, build tools, version management, etc. Should sbaz adapt to compete in the development space? Sbaz was modeled after the Linux package management tools, and I’m not convinced such a design can satisfy both the flexibility and consistency needs of developers unless we can establish some large scale release process similar to Linux distributions. My next post, if it happens within the next month or so, will likely be ramblings around some of these thoughts, and how I would like to see a sub-community start to form around sbaz.

Ultimately, I decided on a path of incremental improvements to existing functionality. This way I could contribute as little or as much as my existing family responsibilities and increasing work commitments would allow. My primary goals thus far have included:

  1. Update to scala 2.8 and eliminate all compile warnings, including Java deprecations
  2. Introduce no backward incompatible changes (beyond minimum Java5 per 2.8 upgrade)
  3. Improve dependency audits to prevent a broken managed directory from ever appearing
  4. If an audit fails, make it clear to the user why
  5. Keep the user informed (specifically when downloading)
  6. Expand the body of automated testing
  7. Improve Windows support

These are still works in progress, but good headway has been made. I have also added a few features like asynchronous downloads and support for pack200, the latter of which I’m amazed I haven’t seen used more. I suppose Scala’s requirement of Java 5 or later makes such support easier? At the time of this writing, there is still more to be done; however, the code base isn’t far from being ready for the next major distribution release of Scala. I’d be working on it now if I weren’t away from my primary development machine.

22 Feb 2009

So many projects...

… and so little time. I really should make a point of updating this blog more frequently, as I’ve been doing a lot of new and interesting (at least to me) technical stuff in the *nix space. Up until the end of last year, my day job has primarily focused on application capability. What I mean is I spent most of my time working on application specific configuration and code changes to add features or fix bugs to better support the business’s needs. Most of these applications are large, complex enterprise tools that fall into the “one size fits all because you can customize through configuration” category. Granted, the line between code and configuration is unclear at times. On occasion, I would get down to the application framework or operating system level. It was always a treat.

Late last year, a key person for the company website’s site operations team moved on, and I was pulled in to fill a fraction of the void he left behind. I’m only working there at half capacity because I still need to support my previous areas, but it has been a refreshing deviation from the work I’ve been doing for the past 7 or so years. And when I get interested and excited about something, I begin to explore.

Virtualization

One area I’ve spent some time exploring is system virtualization software, and this is for several reasons. I only have one machine at home to do all my exploratory work. This poses a problem when playing around with load balancing configurations or testing network software. Virtual servers allow me to simulate many machines on a single box, so while it isn’t perfect for emulating a production environment, it makes it possible with a non-existent budget. Trying out different flavors of applications can also have a negative effect (sometimes damaging) on a box. Installing, configuring, testing and then uninstalling can lead to garbage being left behind. If the install is invasive to the basic services of a box and you don’t fully understand all aspects of these changes, it can interfere with the health of the system in the long term. Using a virtual environment for this kind of playing around means you can simply discard the environment if you decide not to go with that solution. One added benefit of installing critical services into a virtual environment is you can easily back-up and transport that environment into another machine if the host box goes belly-up. Maybe I should get another machine…

Website - the whole package

I’ve had a few websites outside of work that I’ve developed (with someone else’s code as a starting point) and maintain, this site being one of them. I’ve never put a proper backup/restore process in place for full, rapid restores. In the event of a failure, I would be able to restore most of it, but it would take a long time and some things might fall through the cracks.

Django 1.0 came out a while ago. I’ve successfully upgraded this site, but have some more upgrade work to do yet. This will likely take about 3 days to complete, if I could work on it full time. Unless I can use some dedicated time over a long weekend, it will probably grow into 3 weeks.

This site has only held my blog thus far. I think I want to expand it to serve as my project site as well (of which I currently have none), as I’ve got bandwidth and storage space to spare. Source code will be hosted via Mercurial, possibly through hgweb. I haven’t decided if I want to use Trac. This partly depends on its RAM consumption, because I need to keep several apps running on this hosting service, and memory is my limiting factor. I’ve already done some promising prototyping of a move from mod_python to mod_wsgi, so I appear to be making myself some wiggle room.

My Scala + db4o + Swing personal project

I’ve blogged about this project previously. I’ve succeeded in putting together a lot of the domain specific code thus far, but I’m now wrestling with the application framework (the API for managing the main window, opening/closing views, and starting and stopping services). I haven’t decided on the user experience yet: single tabbed window, multi window, etc. The options are plentiful, but my goal is to keep it light weight with the potential for using it on a smart phone. I’ve started going down the path of OSGi (another learning experience). That may or may not continue. This is one project I would like to host out of this web site.

sbaz - Scala’s package manager

I am presently the named maintainer for sbaz. Sbaz will start to become a bigger part of this blog. Presently, I am still struggling with defining a path forward for sbaz. I’m considering hosting some sbaz universes out of this site (e.g. my personal project above, a managed developer release universe, etc.). My biggest obstacle with this is my limited memory resources, so deploying the existing Java servlet won’t be possible. Perhaps using PHP, Python or flat files to host a universe should be my first enhancement.

General system administration notes

I need a reliable location to document many of my system administration learnings, and this website seems to be a logical location. To date, I’ve peppered README files throughout my machine’s filesystem, the point of which is to record the steps I need to repeat on a new (e.g. replacement) machine. I’ve also failed to document some good stuff because I hope to, at some point, have a central repository. So, I delay documenting until I forget to do it altogether, a pattern I’ve recently recognized and am changing… leading to the aforementioned README files. Some of the things I intend to document include:

  • MySQL, rsync, etc. commands (scripts) needed to back-up and restore websites
  • Configuration for monitoring mdadm, SMART, capacity and other hard drive details
  • Virtualization via OpenVZ, KVM and VMware
  • Remote monitoring tool (mon or nagios) for virtual machines, website, hdhomerun and services
  • Local mail server configuration (including security concerns)
  • Apache configuration details (including security concerns)
  • DNS configuration for third party hosting, local network, dynamic IP, etc.
  • I may even consider documenting SAP transaction codes and technology details (BADI, BAPI, BDOC, BSP, complex ABAP selects, OO vs. Procedural ABAP details, etc.) as these are starting to fade a bit.
  • Just about anything I’ve had to research when solving a problem, and don’t address often enough to know off hand.

06 Jun 2008

A new personal project

After coming to the disappointing conclusion that I cannot, at this time, pursue building my NAS device from scratch, I’ve turned toward another long contemplated and frequently revisited project. For quite some time I’ve dabbled with various Java based functionality, such as db4o, Scala and RCP frameworks, and have been toying with some ideas to implement a useful application from scratch.

I know it is pretty sad that a professional software developer of 7 years is psyched about creating an application from scratch, but most of my development work involves configuring and/or extending existing applications purchased from third party vendors, identifying and fixing bugs in code others have written, addressing performance issues, and interfacing multiple enterprise applications. I have done very little UI development, and most of that has been for web applications.

Standing at the start of my desktop application exploration, I realize my first goal is to implement something simple and useful with an expandable design. The tool needs to be useful to keep my interest, otherwise the application will be yet another academic exercise. The simple but useful feature provides the bait that leads me along, but by having an extensible design, adding additional features becomes incremental growth instead of titanic rewrites and redesigns. So, I have the following thoughts:

  1. The application should run on the JRE and use the Scala programming language for at least part of the functionality. I am learning to love functional programming, but haven’t applied it to any practical application yet.
  2. If using a database, I would like to use db4o instead of a relational database. A lot of the stuff that slows down development with a database goes away when the Java class IS the database schema. No dual implementation (Java class and relational schema) or mappings are required. Of course, I may be better off persisting my data as flat files. (See the sketch after this list.)
  3. I should standardize on a single IDE (Eclipse or NetBeans), as this is primarily a single developer effort, and trying to develop in two different IDEs would require a lot of overhead. Unfortunately, this adds limitations to the technology I can use for development. For example, Scala only has a plug-in supporting Eclipse, I am more familiar with Eclipse’s features, I like the idea of learning more about OSGI, and I like the responsiveness of SWT vs. Swing. On the flip side, only NetBeans has Matisse (for free anyway), the profiler seems nicer, RCP appears to be easier, I’m more familiar with the Swing API, I like the idea of a single distribution for all machines (SWT requires different DLLs), and I LOVE the mercurial plug-in.
  4. I don’t expect this code will be opened for general availability, but I should take legal considerations (GPL, LGPL, EPL, BSD, etc.) into account when choosing libraries. Otherwise, the conflicting licenses could become prohibitively difficult to work around.
  5. I would like to leverage a multi-document interface (MDI), at least optionally, with tabs that can be reordered, maximized to full screen, closed with middle button click, and possibly minimized to border buttons.
  6. Ideally, data could be replicated between machines with ease, as I would like to use the application on my home and work machines. This, of course, requires multi-platform functionality, as my work laptop runs Windows XP while my home machine runs Ubuntu Linux. (Gosh how I wish I could run Ubuntu at work too…)
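
To illustrate the “class is the schema” point from item 2, here is a rough db4o sketch; the exact method names vary by db4o version (older releases use set and get where newer ones use store and queryByExample):

import com.db4o.Db4o

// The persisted class doubles as the schema; no mapping layer is required.
case class Contact(name: String, email: String)

val db = Db4o.openFile("contacts.db4o")
try {
  db.store(Contact("Ada", "ada@example.com"))
  // Query by example: match on whichever fields of the template are populated.
  val results = db.queryByExample(Contact("Ada", null))
  while (results.hasNext) println(results.next())
} finally {
  db.close()
}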

Clearly, I have more to think about before choosing an IDE. As for the application’s features, I’ve considered the following:

  1. A secure password store
  2. A wiki style notebook similar to Tomboy Notes for Linux. I found that taking notes in ASCII with a markup “language” like markdown worked quite well. The file’s name contains the timestamp of when the file was created, and all context of what the note was about is contained within. When the file is stored in the file system, the operating system’s indexing tool can be leveraged for searches, and the application can manage categorization in a flexible way. I’d like to include IDE style tab completion for internal links, syntax highlighting, daily todo list feature, hyperlinks outside the application (e.g. web pages, lotus notes documents, network share locations, etc.) and more.
  3. A batch image resize tool for high quality images. I’ve actually already implemented this using the ImageJ library available in the public domain, but the GUI is a bit hackish and not robust.
  4. A contact management application that leverages low level relationships. For example, a family of five may all have the same contact information while the kids are young, but as each kid gets his/her own email address, mailing address (e.g. goes to college), cell phone, etc., each individual’s contact information can be updated accordingly. Or, if the entire family moves to a new location, a single change to their mailing address node would update all family members at once. This would be a perfect application for db4o.
  5. An invoicing and general business tracking application for my wife to use for her photography business. This is a nontrivial piece of functionality with the potential to grow significantly. It could be used to drive marketing, schedule photo shoots, track income and expenses and so on.

I really need an IDE that makes the GUI layout and its interaction with the business logic clear and easy in order to implement all that I would like. I suspect this seems daunting mostly because it is new to me. I’m still looking for some best practices on how to lay out the MVC design of the GUI, though I suspect I’ll come to a solid conclusion only after I’ve tried it.

31 May 2008

NAS put on hold

As it turns out, life really is what happens when you are making other plans. Since the last time I posted, I’ve had a few things come along requiring big money… bummer. I guess I’ll have to wait a little longer before I build this machine. I suppose this isn’t all bad, as chances are the cost of these components go down over time, assuming the economy doesn’t throw a wrench into that theory.

I did get close to a finalized configuration of components, with a slight change to the machine’s purpose. Originally, I wanted to make the Network Attached Storage device double as the firewall and gateway for my home network to the internet. For security purposes, combining the firewall and storage isn’t recommended, as a security failure in the firewall makes for easy access to all your data. Besides that, I already have a wireless Netgear router that can also serve as a gateway for a local wired network.

I currently have a 31” LCD television, an HDHomeRun (network enabled digital TV receiver), and no way to connect the two. A simple and lightweight media PC would nicely bridge this gap, and many of my NAS requirements would also apply here. With a media PC, I would finally have the convenience of TiVo style TV watching, and I could create backup copies of the kids’ videos that are at high risk of death by scratching. Of course, the sheer volume of data is a concern when dealing with multimedia, particularly video. To get the most out of my hard drive space and back-up media (CDs and DVDs), the machine would need to support efficient MPEG4 decoding at the very minimum. I have a Core2 Duo workstation that could offload the conversion processing from MPEG2 to MPEG4 (or better), so I’m not too concerned with the media PC’s processor power as long as it supports hardware decoding of MPEG4. Given these thoughts, I’ve put together the following list of components:

  • JetWay J7F5M1G2E-VHE-LF CX700M VIA CX700M Mini ITX Motherboard/CPU Combo - This motherboard has integrated video capable of MPEG-2, MPEG-4, and WMV9 hardware decoding, high definition sound, gigabit Ethernet, HDMI output (not sure if this is video only or includes sound), and a 1GHz C7 processor that consumes a miniscule 9 watts of electricity. The board only supports 2 SATA 3.0 interfaces, so I would need to use the only PCI expansion slot to add support for disks 3 and 4. This board is fanless and therefore silent.
  • picoPSU-120 Power Kit - A fanless and very space efficient power supply capable of providing 120 watts of power at 12 volts. This is probably overkill for this machine, so I may consider scaling back to a 90 watt power kit, but I would have to run the numbers a bit more.
  • 1GB 240-Pin DDR2 SDRAM 533 (PC2 4200) Desktop Memory - I would order the cheapest respectable brand name memory on Newegg at the time I order. At present, this is the Kingston ValueRAM. The maximum memory supported by the motherboard is 1GB in a single memory card.
  • Scythe S-FLEX SFF21D 120mm Case Fan - A single, very quiet 120mm fan that could completely replace the air in my custom case roughly one time per second, which should be sufficient for the passively cooled components within.
  • Pioneer DVR-115DBK DVD Burner - This is an inexpensive and popular IDE drive. The motherboard supports up to 2 IDE devices, and I want to ensure the SATA connections are used for hard drives only.

I already have the hard drives and PCI to SATA 1.5 expansion card. These will need to be moved from their existing machines into the new one once it is built.

I haven’t yet decided how I want to boot the machine. Ideally, the hard drives would exist solely for storage. No applications would be installed on them at all. There is one more IDE device supported (in addition to the optical drive), so I could install another hard drive. However, I would like to avoid using another hard drive for space and power reduction. I was thinking something like a Compact Flash card installed as a non-removable IDE device (requires an adapter card) or a USB flash drive wired directly to the motherboard and rigged inside the case. I suspect the Compact Flash card would be more performant, but it would also cost a little more.

The parts listed here aren’t super expensive. Unfortunately, these aren’t the only costs involved. To build the machine I really want, I need to build my own case from scratch. I not only need to buy the materials (not sure what I want to use yet), but I also have to purchase some tools, like a Dremel. I have some rough ideas on how I would like to set up the inside of the case. Maybe I’ll put together some pictures in the future, but until then, here is a brief description:

  • Four 3.5” hard drive bays will be located in the bottom front of the case. Instead of configuring them horizontally, I want to take advantage of the natural airflow advantage of a vertical configuration. Proper venting is needed (e.g. modder’s mesh) to allow airflow to pass directly over the hard drives. Maybe the bottom of the case could be vented for maximum intake.
  • Just above the hard drives is where the optical drive will be installed in the traditional horizontal position. This makes all the drives fit into a single brick-like unit in a dense but fairly well vented layout.
  • The motherboard will then be configured (looking at the front of the case) to the right of the drives. To do this, the drives, and therefore the optical drive’s tray, will be off-center to the left. Depending on the PCI expansion slot’s location, the case may need to be longer (move the motherboard further toward the back), taller (place card below drives), or a riser could be used to move the PCI card out of the way. Regardless, the motherboard needs to be flush with the back of the case to give access to the on-board ports, so this will require some designing finesse. Again, special care will be needed to ensure the passive cooling on the motherboard has sufficient air flow. This may require additional venting on the front of the case in front of the motherboard.
  • The 120mm fan will be located on the back of the case near the top. This is where a traditional ATX case would place the power supply, and freeing up that space is one benefit of using a very low profile power supply inside the case. Of course, this does require the power converter to be outside the machine, similar to a laptop. I did want to include a laptop battery inside the machine for backup power, but this was prohibitively expensive.

With this layout, I’d like to get an overall size of 20cm wide x 20cm tall x 25cm deep, or roughly 8in wide x 8in tall x 10in deep.

I hope to sometime return to this project, as I suspect this would be a good little machine that is used a lot. Until then, maybe someone else out there could use some of my thoughts to build something similar. If you do or already have, please share your experience. I would love to hear about your successes and growing pains.

05 Apr 2008

NAS - hard drive and transfer speeds

Okay, so I’ve got the general idea of what I want, now it is time to hash out some of the specs. Where to start?

The working parts of the device should be modern or at least based on modern standards. Since I will be going to all the trouble of building this device from scratch, I don’t want to be put into a position where a significant redesign is needed in the event of a hardware failure of some kind.

  • The primary hard drive interface should be SATA. The old PATA (a.k.a. IDE) hard drive standard isn’t gone yet, but it is certainly fading away into the sunset, not to mention its technical inferiority. I won’t rule out PATA completely for one or maybe two drives if the motherboard provides it. I’ll need to be more careful when integrating PATA drives into the RAID configuration to prevent their technical limitations from killing the performance of the entire device.
  • The motherboard would ideally use the ATX power connector standard. This dramatically increases the number of power supply options, as ATX is the most common standard available.
  • The motherboard needs to support multiple SATA hard drives. The preferred number is 6 on-board connections, but if this is not possible in a small, power efficient machine, it should at least be upgradeable via an expansion card or two.

Generally speaking, a NAS device doesn’t require much processing power. The largest bottleneck in performance will most likely be the network itself. Most home networks will use wired fast Ethernet (100 Megabits/sec), wired gigabit Ethernet (1000 Megabits/sec), wireless g (54 Megabits/sec), or wireless n (248 Megabits/second). Note, however, that these measurements use the bit as the base unit of measure, not the byte (8 bits) that most of us are comfortable with.

Fortunately, Wikipedia has an excellent collection of device bandwidths available at http://en.wikipedia.org/wiki/List_of_device_bandwidths that shows data transfer rates in both bits and bytes. It also appears to be a fairly complete list of device interfaces that may be used in this system.

One important point to keep in mind is that the perceived performance of a NAS device will be affected by the slowest point in the data transfer chain. This includes the network adapter on the machine using the storage device and everything between the two machines. I’ve configured my home network to support gigabit transfer rates, so my NAS device should take full advantage of this. This means I should be able to get a theoretical top transfer rate of 1000 Mb (megabits) per second, which is equal to 125 MB (megabytes) per second.

Here is a breakdown of possible hard drive transfer rates for this device. SCSI is a bit out of my price range and overkill for a home solution.

  1. Ultra DMA ATA 66 - 528 Mbit/s = 66 MB/s
  2. Ultra DMA ATA 100 - 800 Mbit/s = 100 MB/s
  3. Ultra DMA ATA 133 - 1064 Mbit/s = 133 MB/s
  4. SATA 150 hard drive - 1500 Mbit/s = 187.5 MB/s
  5. SATA 300 hard drive - 3000 Mbit/s = 375 MB/s

The last three interfaces are all faster than the maximum transfer rate possible over a gigabit network and should be suitable for such an application. As stated before, other technical merits (hot pluggable, no restrictions on writing to multiple devices at once, newer standard with stronger future) make SATA the ideal option. In the current market, you would be hard pressed to find large SATA hard drives that don’t conform to the 300 standard, and the price differences between 150 or 300 aren’t significant. You are more likely to find disk controllers on the motherboard or expansion cards (e.g. PCI cards) that use the slower SATA 150 interface. Fortunately, virtually all SATA 300 hard drives are backward compatible with the SATA 150 controllers, so compatibility is virtually a non-issue.

Speaking of expansion cards, it is quite likely one will be needed in my device. This adds a new interface to take into account when assessing bandwidth bottlenecks. According to the Wikipedia page, the 32 bit PCI expansion slot running at 66 MHz (the most common kind of expansion slot today) has a theoretical maximum transfer rate of 2133 Mbit/s, or 266.7 MB/s. This means that one should not expect the best performance from a single SATA 300 or multiple SATA 150 drives connected to such an expansion card. Still, for the needs of a NAS device, a PCI expansion card supporting two SATA 150 drives should work nicely for expanding a RAID configuration. Possibly using a PATA style configuration (pairing a motherboard controller with a PCI controller for RAID1 mirroring arrays) would optimize reads and writes when dealing with large files.

29 Mar 2008

Custom NAS Device

I’ve been toying with the idea of making a Network Attached Storage (NAS) device for home use from scratch. For all intents and purposes, I already have. I’ve got an old machine that I built several years ago assigned to the task, as well as serving as the network router, firewall and DNS cache. It has also recently come to my attention (via a newly acquired Kill-A-Watt power meter) that this machine sucks in roughly 120 Watts of electricity by itself. At roughly 10 cents a kilowatt-hour, this comes to over $100 a year just to leave the machine on. Here I thought I was doing well by consolidating several devices (USB hard drives, router/firewall) into a single machine, but the power consumption of this one machine significantly outweighs the total of the smaller devices.

This got me thinking about what features I would want in my ideal NAS device. At a very high level, I want a device that stores and protects large volumes of data and does not impose itself on my daily life.

Reading back over the above high level requirement, I’m surprised how simple it sounds, yet how broad and non-trivial it really is. To break this down a bit more, I will dissect this sentence into more specific desires.

A Device that Stores and Protects Large Volumes of Data

  • My and my family’s need for storage space is forever increasing. If this machine is going to satisfy my needs long term, its storage space will also need to grow.
  • Data redundancy across two or more hard drives is an important step toward safeguarding against data loss. As such, RAID should be a key feature.
  • “Dirty” or unreliable power can be damaging to any computer. A secondary power source like a UPS or an integrated battery similar to a laptop’s would help to mitigate this risk.
  • A case with sturdy construction, low center of mass and low profile would help to protect the device from minor bumps in high traffic areas.

Certainly, there are many other concerns when protecting data, but I am talking about a NAS device, not a disaster recovery plan.

Does Not Impose Itself on My Daily Life

  • Minimal and easy maintenance. When something starts to go wrong, the system should notify me. Adding or swapping drives shouldn’t require a tool chest and manual.
  • File transfers to and from the device should be fast, even for large multimedia files.
  • Sharing files with Windows and Unix operating systems should be seamless.
  • Quiet and aesthetically pleasing. The device shouldn’t draw any special attention when walking into a room due to bulky, ugly and loud construction. This is my primary reason for placing my current machine in the basement.
  • Initial construction and running costs should be economical and environmentally green.

Devices similar to what I describe are already available on the market. For example, the Drobo “data robot” from Data Robotics, Inc. in combination with their DroboShare will do most of this for around $700 plus the cost of the hard drives. Still, I’m going to look into building one myself, even if only on paper. It would be fun to build, and I may even be able to one-up the Drobo by making my device more general purpose. For example, maybe I can make it a thin Linux machine that uses my 32 inch LCD TV as a monitor, or a MythTV box using my HDHomeRun digital TV tuners.

28 Nov 2007

Scala and db4o Native Queries

Recently, I’ve been playing with Scala, a multi-paradigm programming language that runs within the Java virtual machine. While searching for on-line resources describing practical applications of the language, how-tos and best practices, I ran across a blog entry by N. Chime showcasing the object database db4o. I immediately felt a rush of excitement. I have been keeping an eye on db4o for years but have not had the opportunity to use it beyond a few short-lived pet projects or explorations of new-to-me technology (e.g. Wicket). The technology has always interested me, and I’ve been hoping to use it in a real application.

Using Chime’s blog as a road map, I started creating a contact management app with Scala and db4o. Yes, this is yet another exploratory project, but I hope to evolve it into something real. Anyhow, everything was coming together fine until I tried to implement db4o’s native queries. No matter what I did in the filter method of the Predicate class, I was unable to get the expected results. As it turns out, there is a bit of a Catch-22 when using this feature.

Here you can see an example of a native query taken directly from Chime’s blog entry. You can trust that the Pilot class and listResult(…) function work as expected. I will provide fully functioning code later, but first I want to show the obvious, intuitive implementation and how it fails.

def retrieveComplexNQ(db : ObjectContainer) = {
  val result = db.query(new Predicate() {
    def `match`(point : Any) : boolean = {
      val p = point.asInstanceOf[Pilot];
      return (p.getPoints > 99) &&
             (p.getPoints < 199) &&
             (p.getName.equals("Rubens Barrichello"))
    }
  });
  listResult(result);
}

Note that I have changed the match method’s input parameter from All, as shown in Chime’s entry, to Any, the Scala equivalent of java.lang.Object. Without doing this, the Scala compiler complains because the Predicate class defines the abstract match method with a generically typed parameter. Scala does not support Java generics, so the more restrictive typing is lost in translation. That is, Java has generics and Scala has generics, but they don’t work together yet. As a result, the Scala compiler will complain if you try to implement the match method with any type other than Any. I suppose stricter type checking was introduced as the Scala language evolved, which is why the All type (which includes null) no longer works.

My attempts at using this implementation failed. I started with db4o-6.3-java5.jar and found that my native queries failed at run time with a java.lang.IllegalArgumentException due to an invalid predicate. With a little digging on-line, I found this error is generated when the Predicate does not have a properly defined filter method: the boolean-returning method named “match” that accepts a single parameter. The above Predicate implementation does contain a “match” method that returns a boolean and accepts a single parameter, so I considered that Scala’s missing support for Java generics could be throwing a wrench into things. I switched the db4o implementation to db4o-6.3-java1.2.jar, which is intended for Java 1.2 through 1.4, to remove generics from the equation. This eliminated the exception; however, the queries failed to return correct results. No matter what I did to the “match” method, even hard coding boolean return values, the database behaved as if the method always returned the same value. Implementing the same queries in Java yielded the expected results, so this issue was clearly a problem with the Scala code.

And the hero of the day is… open source! I finally got around to looking at the db4o source code to find the root cause of my issue, and it didn’t take long to find the problem within the Predicate.getFilterMethod() method. The code explicitly ignores any match method that accepts a single java.lang.Object parameter. While the db4o documentation clearly states that the match method must take one parameter, it fails to mention that this parameter cannot have the general java.lang.Object type.

So the Scala compiler requires that the match method accepts a parameter typed as Any (effectively an alias for java.lang.Object) while db4o explicitly ignores such a parameter. The workaround is to provide both.

def retrieveComplexNQ(db : ObjectContainer) = {
  val result = db.query(new Predicate() {
    def `match`(point : Any) : boolean = 
        throw new Exception("This should never be called!")
    def `match`(p : Pilot) : boolean = {
      return (p.getPoints > 99) &&
             (p.getPoints < 199) &&
             (p.getName.equals("Rubens Barrichello"))
    }
  });
  listResult(result);
}

The above implementation will generate the expected results because:

  1. The Scala compiler is satisfied by the stub method accepting the Any type
  2. The db4o reflection logic can find a match method with a more constrained parameter type

Certainly this code can better conform to the DRY principle by pulling the stub method into a custom abstract class. For your viewing and testing pleasure, here is the complete code for my working native queries test of the db4o object database with Scala. While I have improved DRY for the stub match method, I’m sure more could be done for converting the query returns into native Scala Iterators (Read: It’s late and I want to get this post up before going to bed).

package db4osc.chapter1;

import com.db4o._;
import com.db4o.query._;
import scala.collection.jcl.MutableIterator

object NQExample extends Application with Util {

  val db = Db4o.openFile("chapter1.db");

  storePilots(db);
  retrieveComplexSODA(db);
  retrieveComplexNQ(db);
  retrieveArbitraryCodeNQ(db);
  clearDatabase(db);

  db.close();

  def storePilots(db : ObjectContainer) = {
    db.set(new Pilot("Michael Schumacher",100));
    db.set(new Pilot("Rubens Barrichello",99));
  }

  def retrieveComplexSODA(db : ObjectContainer) = {
    val query : Query = db.query();
    query.constrain(classOf[Pilot]);

    val pointQuery : Query = query.descend("points");
    query.descend("name").constrain("Rubens Barrichello")
      .or(pointQuery.constrain(99).greater()
        .and(pointQuery.constrain(199).smaller()));
    val result = query.execute();
    listResult(SIterator(result));
  }

  def retrieveComplexNQ(db : ObjectContainer) = {
    val result = db.query(new Filter() {
      def `match`(p : Pilot) : boolean = {
        return (p.points > 99) &&
               (p.points < 199) &&
               (p.name.equals("Rubens Barrichello"))
      }
    });
    listResult(SIterator(result));
  }

  def retrieveArbitraryCodeNQ(db : ObjectContainer) = {
    val points = Array(1,100);
    val result = db.query(new Filter() {
      def `match`(p : Pilot) : boolean = {
        // arbitrary code: filter against the locally defined points array
        return (p.points == points(0)) || (p.points == points(1))
      }
    });
    listResult(SIterator(result));
  }

  def clearDatabase(db : ObjectContainer) = {
    val result : ObjectSet = db.get(classOf[Pilot]);
    while(result.hasNext()) {
      db.delete(result.next());
    }
  }
}

case class SIterator[A](underlying : ObjectSet) extends CountedIterator[A] {
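  // Adapts db4o's ObjectSet to a Scala CountedIterator so query results can be handled like native Scala iterators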
  def hasNext = underlying.hasNext
  def next = underlying.next.asInstanceOf[A]
  def remove = underlying.remove
  def count = underlying.size
}


abstract class Filter extends Predicate() {
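  // The Any-typed stub satisfies the Scala compiler; db4o ignores it and uses the more specific match defined in each subclass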
  def `match`(dummy:Any) : boolean = throw new Exception("Not supported") 
}


trait Util {
  def listResult(result : Iterator[Any]) : Unit = {
    println(result.counted.count);
    result.foreach(x => println(x))
  }
}


class Pilot(val name : String, var points : Int) {
  def points_+(apoints : Int) : Int = {
    points = points + apoints;
    return points;
  }
  override def toString() : String = name+"/"+points;
}

Executing this should result in the following output. The integer on the first line of each result set corresponds to the number of hits from that query.

2
Michael Schumacher/100
Rubens Barrichello/99
0
1
Michael Schumacher/100

I suspect the evolution of the Scala programming language led to the original code that Chime implemented becoming invalid code. Hopefully, when Scala gains support for Java generics, the need for this workaround will go away, and a more intuitive solution will work as expected. This was successfully implemented with:

  • Eclipse 3.3.1.1
  • Scala plugin for Eclipse with compiler 2.6.9RC312812,
  • JRE 1.6.0.03
  • db4o-6.3-java5.jar

EDIT November 29, 2007 – I got to thinking that the above solution makes perfect sense for the Java 5 version of the db4o library, but makes no sense for the version that does not support generics. So, I revisited my implementation with db4o-6.3-java1.2.jar, and found that this will behave correctly when implemented in the obvious way. I was running into trouble because my match method was still implemented with the Any type. When I changed the code to use a more constrained type, it just worked. For example:

def retrieveComplexNQ(db : ObjectContainer) = {
  val result = db.query(new Predicate() {
    def `match`(p : Pilot) : boolean = {
      return (p.points > 99) &&
             (p.points < 199) &&
             (p.name.equals("Rubens Barrichello"))
    }
  });
  listResult(SIterator(result));
}

In this case, my trouble was due to the undocumented (at least in the API) requirement that java.lang.Object is an invalid parameter type for the match method. Still, I’m surprised I did not get an exception. If I get some time and motivation, maybe I’ll dig into it more.

24 Oct 2007

Nontrivial Hard Drive Partitioning for Linux

When looking to build a PC to replace my old 1.3 GHz AMD Athlon XP, I created a list of requirements (really just desires) and a price limit of $600. A few of those requirements applied to storage, including:

  1. The machine must support at least RAID1
  2. There must be plenty of storage space for multimedia with room for expansion.

I spent endless hours on the web researching hardware, looking for that perfect balance between features and cost. I finally settled on a motherboard with four SATA 3.0 Gb/s connectors (supporting four higher-performance drives), two PATA connectors (up to four older drives, ideal for reusing my existing DVD and CD drives), and built-in RAID support.

Only after I started trying to assemble my machine did I learn that one of my requirements was misguided. Linux does not need special hardware for RAID support. If I had known this when choosing a motherboard, I could have saved tens of dollars, or at least used that money to buy a better part elsewhere in the machine. Interestingly, software-based RAID is a better solution for my needs, as it is not restricted to the hard drive level. Instead, RAID arrays can be constructed from hard drive partitions, allowing for more flexible configurations even when there are fewer hard drives to work with. I bet RAID arrays could even be built with media other than hard drives, such as flash drives.

After discovering software based RAID and the mdadm command, I configured my two 250GB hard drives (surprisingly affordable from http://newegg.com) to have identically sized (to the block) partitions as follows:

/dev/hda1 ext3 140MB - /boot
/dev/hda5 swap   3GB - with option pri=1
/dev/hda6 jfs   40GB - / (root directory)
/dev/hda7 jfs  200GB - /home in RAID1 array /dev/md0

/dev/hdb1 ext3 140MB - Not Used
/dev/hdb5 swap   3GB - with option pri=1 
/dev/hdb6 jfs   40GB - Not Used
/dev/hdb7 jfs  200GB - /home in RAID1 array /dev/md0

I was keeping /dev/hdb6 open in case I wanted to play with a different Linux distribution or a whole other operating system (e.g. http://JNode.org).

I am not sure what happened to the 2, 3 and 4 partitions when I created these, as this was a fresh install. I do know that one of them (I think it was the 4 partition) became the extended partition that contains the later-numbered logical partitions; on a PC partition table, numbers 1 through 4 are reserved for primary partitions, and logical partitions always start at 5. Maybe the Gparted partition editor that comes with Ubuntu is smart enough to reserve the lower numbers so the maximum number of primary partitions remains available outside the extended partition.

I made sure that /etc/fstab had the priority option set to 1 for both swap partitions, giving the swap space a performance boost from the combined I/O bandwidth of the two hard drives. Since the swap load is distributed equally between the partitions, this is similar to a RAID0 configuration. Keep in mind the performance gain exists only because the swap space is split between multiple hard drives that can perform input and output operations in parallel. If multiple swap spaces were configured on the same hard drive, or on multiple hard drives attached to the same old-style IDE (PATA) connector, this would not have a positive effect.
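
For reference, here is a sketch of what the relevant /etc/fstab entries might look like with the priority option set (device names taken from the layout above; the exact option syntax may vary by distribution):

/dev/hda5  none  swap  sw,pri=1  0  0
/dev/hdb5  none  swap  sw,pri=1  0  0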

Then, I used the mdadm command line tool to build a RAID1 array (/dev/md0) for the home directory. Since RAID1 mirrors the same data between hard drives, this adds a degree of assurance that I will not lose my important data in the event of a hard drive crash.
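
I will save the full walkthrough for a later post, but as a rough sketch (not the exact commands I ran), creating and formatting the mirror looks something like this:

# create the RAID1 mirror from the two 200GB partitions, then format and mount it as /home
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/hda7 /dev/hdb7
mkfs.jfs /dev/md0
mount /dev/md0 /home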

After a few months, I decided to order a digital TV tuner (the HDHomeRun) for my PC, hoping to use it like a TiVo on steroids. After recording one TV show, I realized my seemingly large 200GB of storage space really wasn’t that large. HDTV records at roughly 2 MB per second, or about 7.2 GB per hour, so I could record at most 28 hours of video before exhausting the space in my home directory. Considering this machine is also my personal computer for doing software development, image editing and just about everything else I do outside of work, I was afraid that the system could start to feel a bit crowded. I bought a half terabyte of hard drive space to get away from that constraint, and yet I was faced with it again.

Most of the data I will be storing on my hard drive will not be sensitive in nature. Software can be downloaded or reinstalled, and TV shows will have reruns. Yes, obtaining all that information again would be time consuming and painful, but losing custom code I’ve spent countless hours on, documentation, family photos and the like would be unrecoverable and a shame. Certainly, redundancy beyond a single machine is important, and for the most valuable data I do use removable media, but I haven’t gotten to the point of a disaster recovery plan for my personal files yet. Keeping this in mind, I reconfigured my hard drive partitioning to reduce my home directory to a 40GB RAID1 array, allowing me to combine most of my hard drive space into a single RAID0 partition for large file storage. This is what I am presently working with:

/dev/hda1 ext3 140MB - /boot
/dev/hda5 swap   3GB - with option pri=1
/dev/hda6 jfs   40GB - / (root directory)
/dev/hda7 jfs   40GB - /home in RAID1 array /dev/md0
/dev/hda8 jfs  160GB - /storage in RAID0 array /dev/md1

/dev/hdb1 ext3 140MB - Not Used
/dev/hdb5 swap   3GB - with option pri=1 
/dev/hdb6 jfs   40GB - /home in RAID1 array /dev/md0
/dev/hdb7 jfs  200GB - /storage in RAID0 array /dev/md1

The RAID1 array works best when the partitions match in size exactly, since one partition mirrors the other; any excess space in the larger partition is simply never used. The RAID0 array combines the two partitions into one larger volume. With RAID0, the partitions would ideally be the same size, allowing the workload to be evenly distributed across both drives (better read performance), but this is not a firm requirement. I would be surprised if I tax my system to the point where I notice a performance difference. After all, this machine is not a high performance network server. I have unfortunately lost the flexibility of the previously unused 40GB partition, as it is now paired with the shrunken /dev/hda7 partition for the home directory. What this does give me, however, is the redundant storage I want for my sensitive data while allowing a large 360GB volume for less valuable data. Using the df command now gives me something that looks like the following.

matlikj@hydra:~$ df
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/sda6             40923460  14805828  26117632  37% /
varrun                 1037780       112   1037668   1% /var/run
varlock                1037780         0   1037780   0% /var/lock
udev                   1037780       116   1037664   1% /dev
devshm                 1037780         0   1037780   0% /dev/shm
lrm                    1037780     34696   1003084   4% /lib/modules/2.6.22-14-generic/volatile
/dev/sda1               194442     40590    143813  23% /boot
/dev/sdb1               194442      5664    178739   4% /media/sdb1
/dev/md0              40923460  15309264  25614196  38% /home
/dev/md1             358920168     47940 358872228   1% /storage

Getting to this point was not at all easy, at least not with the automatic assembly feature of mdadm. After several hours of creating, destroying and reformatting the RAID arrays, I eventually gave up on automatic assembly. I’m sure I’m missing something about how the auto-assemble feature works, but I finally decided to take the easy way out and maintain the mdadm.conf file. It just means I have one more file that needs to be carefully maintained when a software update comes through. Since this post has gotten quite long, I will write the details on how the partitions were created later.
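
As a preview of where that post is headed, mdadm can generate the array definitions for the config file itself; something along these lines (the config path varies by distribution, so treat this as a sketch) captures the currently running arrays:

# append definitions for the running arrays to mdadm's config file
mdadm --detail --scan >> /etc/mdadm/mdadm.conf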

07 Oct 2007

Laptops for the Children

While browsing the news summaries a while back on Slashdot, which I do often, I ran across an entry for the “One Laptop Per Child” project. As with any large effort, particularly those falling into the humanitarian category, there are always conflicting vocal opinions on how the effort is conducted. Regardless of the political aspects, I find the idea intriguing and applicable to more than just developing countries.

The Possibilities

Here in the United States, one of the richest countries in the world, there are still poor communities. Some communities are so poor that their schools can’t afford textbooks or even paper and ink to print their own educational materials. The idea of providing laptops instead of “the basics” to the poorer communities may fly in the face of common sense, but could this common sense be wrong? At the root of this problem is money. Consumables (paper, ink for printing and copying, writing utensils, etc.), textbooks and workbooks all cost money.

The very nature of consumables requires a relatively constant cash flow just to continue the activities that require them. Paper is purchased, used, and disposed of. Eventually the paper runs out and needs to be replenished. If a school were to run off of laptops, it would be possible to eliminate much of these traditional costs. Of course, not all consumable expenses can or even should be eliminated. Penmanship is still of critical importance, and eliminating this kind of training would be irresponsible. However, tests, quizzes and other multiple choice homework assignments could be distributed, collected and graded digitally without putting penmanship at risk.

In my elementary and middle school years, I recall having around 2 or 3 textbooks that I kept in my desk along with roughly the same number of workbooks. While I haven’t participated in the selection of such materials, I would think it reasonable to assume textbooks run around $75 or more each and workbooks around $20 each. The catalogs on the Addison Wesley and Benjamin Cummings web sites are far pricier than these estimates, though I suspect volume sales may reduce the price some. If these assumptions are roughly accurate, the cost of the books alone (three textbooks and three workbooks at those prices is roughly $285) could be enough to purchase the OLPC laptop or its competitors.

Of course, the value of the laptop isn’t in the machine itself; it is just the medium through which the same (or similar) content is made available to the student. In the most extreme cases, all of this content could be created and maintained by the teachers. This is particularly feasible for math and language; however, it would be more difficult for science and social studies, as these subjects continue to evolve. Here, public libraries as well as content available on the internet could fill the need. One example is Wikipedia, the online encyclopedia that I find to be a useful resource on a regular basis. There are also online communities of educators intended to facilitate the sharing of classroom material. Some content could still be purchased from traditional publishers in electronic form. I don’t know if such a business model exists at present, but if there is demand, supply will certainly follow. This has the advantage of obtaining the same content without the cost of physical printing, plus the content can more readily be updated by the provider. Errata can easily be applied directly to the body of the material, and the history and science curriculum will no longer suffer from outdated print that schools cannot afford to replace.

Aside from reference materials, the laptop opens up a new world of possibilities. Multimedia (sound and video) can be used to supplement written content in many ways. What would be a static image in a textbook could be replaced with an animation or movie clip. Audio dictation combined with reading exercises (think karaoke) could help students learning to read or learning a new language. Instead of newsletters, audio and video recordings of the teacher can be sent home to parents who themselves are illiterate. Even educational games can be used to make learning fun and interesting while away from the classroom. Just imagine a teacher telling his or her students, “Your homework for tomorrow is to play this video game for 30 minutes,” particularly if the game is fun!

The laptop is a computer, and computers were created to compute (i.e. process data). All interactive applications of the laptop (quizzes, homework, games) have the ability to collect data from the user. This too can be a useful feature for the teacher. The most obvious advantage is that quizzes and homework can practically grade themselves. The less time a teacher spends grading each answer from each student in the classroom, the more time is available to work on lesson plans, create new content, and have a life away from school (happy employees are more productive employees). This would also give students more immediate feedback, correcting misunderstandings before they are committed to long-term memory. On a “grander” scale, the potential exists for quantitative analysis of student performance for both the teacher and the software to use. Games and homework could automatically adapt to target each student’s weaknesses, and performance reports could be generated for the teacher to identify trends indicating areas needing work, including regression.

One additional benefit that I am particularly sensitive to is for children with poor eyesight. In elementary school (2nd or 3rd grade, I can’t remember), my optometrist informed my mother I couldn’t see the big E on the eye chart. Fortunately, my parents could afford the rigid contact lenses he recommended, and the degradation of my vision slowed significantly, as he had hoped. During the time my eyes were progressively getting worse, I was unable to live up to my potential at school. As an educational tool for me, the chalkboard was nearly worthless. Many children never receive corrective lenses of any kind because their families cannot afford them. A laptop could potentially prevent such vision problems from interfering with a student’s education. A virtual chalkboard, or even a live feed of an actual chalkboard, could be used to broadcast the teacher’s lecture material to each student’s laptop. Nearsighted students can look at the screen while farsighted students can watch the teacher up front. Of course, this doesn’t address farsighted students’ ability to use the laptop itself.

Concerns

The question of durability comes to mind. A textbook can generally be usable for years before it reaches its end of life. Economic feasibility depends on the laptops being usable as long as, if not longer than, a printed textbook with minimal maintenance costs. This has been a particular design goal for the OLPC laptop, and as such, it has been engineered to survive some fairly extreme conditions. With no moving parts, a water resistant housing, and environmental testing for both high temperature and high altitude, I wouldn’t be surprised if it is more resilient than the scientific calculator I’ve used for the past 10-plus years, and that my 4-year-old son now thinks is his own personal computer. Take a look at the OLPC features for a summary of the laptop’s construction.

Classrooms, book bags and buses tend not to be “secure” places for objects of value, and many consider laptops highly valuable. One might think that the laptop would be a sweet target for a crime of opportunity, and it might be. However, the kind of laptop proposed here is not a general purpose laptop. Nor is it a powerful, flashy or particularly expensive laptop. The intent is to make the machine simple and specialized for use in an educational environment. Once the machines become familiar to the general public, I suspect that interest in them from a profiteering point of view will wane. Perhaps a “marketing effort”, for lack of a better term, could be launched before deployment, calling the machines something other than laptops… “educational devices”, say. If usage becomes prevalent, they may even be viewed in the same light as textbooks. This certainly won’t stop theft from occurring; that is a way of life. However, it may eventually keep these machines from disappearing at an unacceptable rate.

I’ve read about concerns with laptops in the classroom due to the distraction factor. Computer games, instant messaging, and browsing the web for YouTube videos, MySpace and other trendy social networking sites can become a problem if they are accessible during class time. Again, a machine specialized for an educational environment might help, such as a simple browser that does not support plugins (Flash, Java, video players, etc.). More importantly, a student’s activities should be readily viewable by the teacher, either over the student’s shoulder or from the teacher’s desk. In a traditional classroom, a teacher can easily identify a student discarding the classroom activity in favor of, say, a comic book. Digitally, this would not be so obvious unless the teacher could poll each student’s machine for the application and content being actively used.

Along with the booming popularity of technology comes a new list of popular physical ailments. The increase in text messaging via cell phones led to an increase in thumb injuries now known as BlackBerry thumb. Laptops tend to place the user in an unnatural posture, leading to their own ailments. Both teachers and students would need to be trained in how best to use the laptop to avoid such injuries. Are these new injuries enough to write off the laptop as a good idea? I would argue not. The act of writing more often than not places people in an unnatural posture. Both my wife and I suffer hand and wrist pain any time we put pen to paper for an extended period, partly due to the “death grip” we place on the pen despite conscious effort not to. I also find that reading for extended periods, particularly when the reading material is left flat on a table, strains my neck and upper back. In short, the potential for injury is a fact of life and is nothing unique to laptop usage. Recognizing injuries and how to prevent them is important in all scenarios.

Another significant concern is the ability of teachers to accept and adapt to using a laptop in the classroom. Even if all obstacles are overcome (financial, technical, content, distribution, maintenance, etc.) and the theoretical benefits are achieved (reduced operating cost, accurate and current content, greater ability to cater to individual student’s needs, increased interest of students both in and out of the classroom, and so on), any effort to use laptops in the classroom will fall flat on its face without the teachers committing themselves to the approach.

Conclusion

As you can see, my belief is that the use of this kind of device in schools has great potential, regardless of the community’s wealth. Inexpensive hardware will limit the capability of such a machine; however, it will lower the barrier of entry for lower income communities and may prove more beneficial than general purpose laptops precisely because of the restrictions. I am considering purchasing one or two of these OLPC machines for my own kids during the OLPC Give 1 Get 1 campaign for use at home.

03 Sep 2007

Hello World

I figured I would introduce myself in traditional programmer fashion via the infamous “Hello World”. For a very brief description of myself, I am a Business Systems Analyst by day, and a husband and father by night. In my free time, I like to watch TV series (preferably on DVD), read, and lurk on open source blogs and websites. I also thoroughly enjoy software development; hence the existence of this blog. I intend to use this blog to document learnings, links to quality reading material, and my occasional ramblings on life in general. This post is going to be short and sweet, as I’m still trying to figure out some technical issues with my newly installed blog software.