Category Archives: Thinking IT

Evolution of IT

4GL Patterns #1 – Select or Add

A common pattern found in database applications is the “Select or Add” pattern.

The pattern allows a user to either

  • select an existing item (usually a foreign key reference), or
  • add a new item to the foreign key table, refresh the combo box, and set the selection to the newly added item

Example
The following is a screenshot from Django’s auto-generated administration screens.
The user is presented with a drop-down box, along with a plus icon for adding new items to the drop-down list.

Select or Add

An alternate approach is to append an “Add…” entry to the bottom of the pick list itself. This approach can be harder to internationalize.

Other notes

When a new item-creation form is shown to the user, it may be convenient to make the form modal, so that the parent form can refresh its pick list once the form closes. However, it may be better to present a non-modal interface so that the user is not blocked from performing other work. This can be done by providing a callback that is invoked whenever an item is created.
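Here is a minimal sketch of that callback approach in Python; the class and method names are illustrative only, not from any particular framework.

class AddItemForm:
    """Non-modal form; notifies the parent when a new item is saved."""
    def __init__(self, on_created):
        self.on_created = on_created       # callback supplied by the parent form

    def save(self, item):
        # ... persist the item to the foreign key table here ...
        self.on_created(item)              # let the parent refresh its pick list

class ParentForm:
    def __init__(self):
        self.pick_list = ["Existing item"]
        self.selected = None

    def open_add_form(self):
        # Non-modal: the parent stays responsive while the form is open.
        return AddItemForm(on_created=self.item_created)

    def item_created(self, item):
        self.pick_list.append(item)        # refresh the combo box
        self.selected = item               # select the newly added item

parent = ParentForm()
form = parent.open_add_form()
form.save("New item")                      # parent now shows "New item" selected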

Related Ideas

  • Stable foreign keys
  • Caching

Mitigating the Bus-Factor

How much of your software code base is at risk if one of your developers leaves the project, gets sick, or joins another company?

In large team projects, although the entire project may not be at risk, how do we identify which modules or components are? Usually, a team has some gut feel about whether a component is more or less reliant on a single developer. Sometimes, management may even actively seek to mitigate this risk by getting another developer involved with the coding.

This leads to a question: how does management measure whether the risk has actually been reduced, i.e. whether the second developer has started to take some ownership of the code base?

In a commercial enterprise, this [ most project members make a relatively small number of commits, but an individual or small core group of individuals make a much larger set of commits ] may be more of a concern (due to the time and cost implications of sudden replacement) than in open source projects, where by definition, the openness of the codebase and contribution process offers a buffer against adverse consequences of key-man risk.

Source – Rory Winston, Research Kitchen Weblog

Rory goes on to propose the use of the Gini coefficient to measure how reliant a project is on a single contributor.
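A quick sketch of the idea in Python, using made-up commit counts; gini() below is the standard rank-weighted formula, not Rory’s actual code.

def gini(counts):
    # Gini coefficient over per-author commit counts:
    # 0 = commits spread evenly, near 1 = one contributor dominates.
    xs = sorted(counts)
    n, total = len(xs), sum(xs)
    if n == 0 or total == 0:
        return 0.0
    weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return (2 * weighted) / (n * total) - (n + 1) / n

print(gini([25, 22, 27, 26]))   # ~0.04: ownership is shared
print(gini([90, 3, 4, 3]))      # ~0.66: classic key-man risk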

What I find interesting is that we can apply this metric to individual files, components, or modules, and use it to guide where developers should be deployed in the next project cycle. It also allows risk management to be tracked as a deliverable KPI, rather than planning purely around efficiency and velocity of delivery. Having a developer work on an unfamiliar codebase is slow and risky, while a developer upgrading code he wrote himself can be three to ten times faster than someone new. Of course, managers are tempted to go down the efficiency route.

It is said that you can’t improve what you can’t measure. Rory’s suggestion is one little step towards objectively measuring code-risk.

Pat Helland – Irresistible Forces Meet the Movable Objects

Transcript of the presentation, interleaved with my own comments.

Computing models have to evolve with new pressures:

  • many tiny devices – low powered, cheaper, but not faster
  • many little flakey data centers (put on a truck and drive away) … we are not seeing this. The reason is clear: the technology proposed in this talk is not easy to build.

Forces in Processors

Moore’s Law continues – the number of transistors doubles every two years, but the voltage isn’t dropping as fast. This is why high-end chips have huge power consumption.

Why isn’t the CPU frequency getting a lot faster? Pat spent a week reading up. The hardware guys call this “the power wall”. “Static power” now consumes half of the power budget; the reason is that as transistors get smaller, they leak. The technology just about to emerge is 45 nm. When Pat graduated, they were working on 6 micron. Soon we are moving to 32 nm technology. As frequency goes up, dynamic power goes up; as dynamic power goes up, the chip gets hotter; as the chip gets hotter, the leakage goes up! The hardware guys are fighting this year after year. Hence, frequencies aren’t going to get much faster – maybe 10% – but the chips will get a lot hotter.
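[Chui: the textbook rule of thumb is that dynamic power scales roughly as C·V²·f – switched capacitance times voltage squared times frequency – which is why pushing f up without being able to pull V down runs straight into the power wall.]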

Memory wall – access time to DRAM remains essentially flat. We’re getting more bandwidth to DRAM but not lower latencies. Chips may be faster, but the time to read memory isn’t. Speculative (out-of-order) execution brings data into cache before it is needed, but it is about five times as complicated; an in-order CPU is cheaper in terms of transistors and power use.

Hardware will be moving to 500 processors on a single chip. But software guys don’t really know what to do. We are not ready.

Many-Core – On-chip shared memory. So we don’t have to go to DRAM (which is slower).

Forces in the Data Center

Computers are relatively cheap. Power is 40% of the cost of running a data center; the building shell represents 15%. Most buildings have a lot of empty room because they are limited by power. Reducing power means more computing capacity, less cooling, less battery backup and less diesel generator backup.

e.g. Sun is selling a premanufactured shipping container fitted out with computers – far more space, heat and power efficient. Just hook in chilled water and high-voltage power. Expect to lose processors, and let them fail in place. Just ship the container out after a few years.

Don’t use backup power. Just use many data centers.

This is only going to work when data is distributed between many data centers. OK, Google already does this, but then it’s a free service and makes no promises about data consistency: search results will vary from data center to data center.

Low End Devices

Cheap computer at 1 Watt. Increasingly stateful – holding partitioned data.

Pressures on Storage

The amount of time it takes to read and write an entire disk is getting longer. Jim Gray explained to Pat that capacity increases with areal density, but read/write time only improves with linear density. A 10 TB disk will take 5-15 hours to read sequentially, but 15-150 days to read randomly.
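[Chui: a back-of-the-envelope check in Python, with assumed (not quoted) transfer and seek figures – both results land inside Pat’s ranges:

disk = 10e12                       # 10 TB disk
seq_rate = 200e6                   # assume ~200 MB/s sustained sequential transfer
page, seek = 8192, 0.005           # assume 8 KB random reads at ~5 ms each

print(disk / seq_rate / 3600)      # ~14 hours to read sequentially
print(disk / page * seek / 86400)  # ~71 days to read randomly
]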

“Disk is Tape” - we have to treat disks as cold storage. No offsite media.

“Flash is Disk” – flash is getting cheaper because of phones and cameras, and will replace disk. The IOs per GB of flash remain about 200, compared to 4 on SCSI disks. By 2012, flash will cost the same as SCSI. Furthermore, flash runs cool.

All this is true, but I can’t see how it would change the way we write software.

Bandwidth and Latency

We are getting more bandwidth but latency is limited by the speed of light.

Data centers are linked to each other by a special backbone, which is very fast. Total bandwidth triples every twelve months – faster than Moore’s Law!

As bandwidth gets cheaper, it’s growing at a rate faster than Moore’s Law. Users are charged on peak usage, so there is spare (effectively free) capacity available between the peaks. [Chui: what can we do with that – background transfer of backup data, synchronization?]

Wireless is increasingly available, but there are still dead spots – we still need offline behavior.

Increasing LAN, Increasing WAN, Increasing Wireless … Increasing Expectations!

Forces in the Cloud

Runs a video showing how useful it is to separate application state from the machine.

Need to have:

  • per-user-per-app state (the browser does this)
  • safety
  • sandboxing
  • controlled sharing across applications, safely [Chui: see BitFrost on OLPC]

Kinds of Parallelism

Pipelined parallelism – [Chui: like unix pipes, or python generators]
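[Chui: a tiny generator pipeline to illustrate – each stage consumes the previous stage’s output as it is produced, like processes in a unix pipe:

def numbers(n):
    for i in range(n):
        yield i

def squares(xs):
    for x in xs:
        yield x * x

def evens(xs):
    for x in xs:
        if x % 2 == 0:
            yield x

# stages chained like `numbers | squares | evens` in a shell
print(list(evens(squares(numbers(10)))))   # [0, 4, 16, 36, 64]
]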

Partitioned parallelism.

To compute faster, we have to bring the data closer: waiting for DRAM is slow, waiting for the network is slow, waiting while offline is slow. We have to make copies of data close by. [Chui: Is this the answer to Joel's architecture astronauts?]

Get used to the idea of not knowing what the truth is. Gone is the assurance of data consistency, since the data may not have synchronized yet.

Back in 1995, Pat thought the internet was going to fail because it wasn’t transactionally consistent. It turns out people were prepared to accept the lack of consistency.

We need to

  • admit we’re confused – computers only have partial knowledge. They are separated from the real world, and from other replicas – so make guesses, but there’s no certainty in computing.
  • accept that without data consistency, it’s possible to be decisive at the cost of making mistakes. I think it’s time to start annotating database tables or columns with this kind of rule relaxation for better partitionability – a hypothetical sketch follows this list.
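[Chui: purely hypothetical sketch of what such annotations might look like – this scheme is invented for illustration, nothing like it exists yet:

# mark which columns can tolerate stale reads (invented annotation scheme)
SCHEMA = {
    "account": {
        "balance":      {"consistency": "strict"},    # must be transactional
        "display_name": {"consistency": "eventual"},  # a stale copy is fine
        "last_login":   {"consistency": "eventual"},
    },
}

def replicable(table):
    # columns that could be served from a lagging replica or partition
    return [col for col, meta in SCHEMA[table].items()
            if meta["consistency"] == "eventual"]

print(replicable("account"))   # ['display_name', 'last_login']
]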

Summary

  • Smaller computers
  • Smaller data centers
  • Smaller data sets – small independent pieces of data

Transactions will be different. You will be doing transactions on a few uniquely identified objects, and the server will move those objects onto the same machine.

Subjective consistency – be decisive before you can have all the up-to-date information. Ambassadors had to do this hundreds of years ago! Eventual consistency applies when updated data arrives: make another decision then.

C.A.P. rule – Consistency, Availability, Partition tolerance … pick two out of three. Pat argued that people want to relax their business rules in order to get lower power, more tolerance of disconnection, etc.

Open Source Education Consulting at Middle School in Bronx

James Governor has talked about open source industry analysis. In that spirit, I attended a talk organised by the ACS Toowoomba chapter, given by an open source consultant – no, he doesn’t implement Linux servers. Pat Wagner does his consulting openly, and blogs publicly about how his consulting engagement is going at middle school 339 in New York.

Pat talked about how the collaboration features of Google docs and spreadsheets were used to great effect in helping teachers work together on unit plans and meetings. Students kept blogs for their daily reflections, and parents and other students were able to comment openly, providing feedback. The principal had a blog detailing the events of the day.

Pat consults from Australia, and makes regular use of Skype and Google chat to keep in touch with the teachers. In addition, he is able to watch changes made to spreadsheets and word-processing documents and comment on them as they are written.

An interesting application of Google docs is its version diff. For example, after an English lesson on the use of strong verbs, a teacher can use a diff to see whether students have applied the lesson and made changes to their original essays.

Not every student is assigned a computer. Instead, there are work areas of six computers each, and students rotate into the work areas during class to update their blogs and do computer-based exercises.

This is a success story that bears repeating, and it shows how much more can be done with our schools. Over in Australia, there’s been a lot of discussion about how to lift the standard of education in Aboriginal communities. There’s no reason why, if it can be done in New York, it couldn’t be done in Aurukun or Alice Springs.


A misadventure with firewalls

There has always been a problem with PCs on my home network accessing files on my laptop. I could access shared folders on the others, but never the other way round.

My laptop is connected to a wireless access point, while the rest of the PCs are on a wired LAN. I had even turned off the Windows firewall, and still couldn’t ping my laptop from the other PCs. I double-checked McAfee’s firewall rules, where I had ticked “trust all computers on the LAN”.

For the better part of the last two years, I thought that the wireless access point was faulty. But no amount of Googling turned up anything.

Exasperated, I finally decided to give Network Magic a try. In less than half an hour, it had not only pointed out that McAfee’s firewall was the culprit, but also detailed exactly how to navigate the treacherous menus to fix the problem. It was non-trivial:

Configure >
  Internet & Network >
    Advanced ... >
      Trusted and Banned IPs ... >
        Apply a tick against the checkboxes for the IP addresses which are trusted.

This level of menu-maze madness is beyond insane, especially given that the pretty network diagram showed all the PCs but gave no warning that they had been firewalled. In contrast, Network Magic got it just right.

The moral? Don’t ever think you’re so smart that one of those Mickey Mouse-looking programs can’t help you. It just might be the right medicine.

Software Innovation

Bob Warfield writes on software innovation:

Startups need to be 180 degrees out of phase with what big companies are doing. When big companies are innovating, startups should be commoditizing. When big companies are commoditizing, small companies should be innovating.

Andy Kessler would probably disagree.

In his essay “How We Got Here”, Kessler wrote about how the commoditization of a technology in clothes-making (following the expiry of a patent) led to the next logical innovation point. [1]

In 1785, Edmund Cartwright sought to fix this problem [of hand-run looms] by applying mechanical power to hand looms. But first he had to wait for Arkwright’s patent on cotton spinning to expire. He knew that cotton mills would then be built, which would turn out an abundance of thread and yarn. Cartwright thought about starting his own cotton mill but figured, smartly, that everyone and his brother would start one of those. Instead, he wanted to leverage the abundance of yarn, not help create it. So he began working on a Power Loom. He was only two hundred years ahead of his time in innovative business thinking.

[1] More on it here

More VMWare Server Review

I wrote up a VMWare Server review for the beta release earlier. Now that the VMWare Server 2.0 Beta is available, here’s a rundown of the differences between VMWare Server 2.0 and VMWare Server 1.x.

  • Web-based management – this beats having to Remote Desktop into the host operating system to manage virtual machines.
  • Virtual guest OS can access 8 GB RAM – assuming that your host is a 64-bit OS (otherwise you are stuck with only 4 GB of addressable memory).

Here’s a screenshot of the web-based management interface in VMWare Server 2.0. It prompts you to install an ActiveX control in order to view the server console. There used to be an annotation feature in the 1.x server, which let you write comments against a virtual machine; this useful feature seems to have been dropped in 2.0.

VMWare Server 2.0 Beta

The performance of the viewer is not as good as Remote Desktop, although it’s comparable to VNC. There is a full-screen option, but it needs more work before the final release – it messes with the resolution of my second monitor. The folks over at 4sysops don’t think too highly of the new VMWare web-based user interface, and I’m afraid I concur. It simply takes a little too much screen real estate, especially with the frameset in the web pages plus the space used by the browser chrome. I hope they fix that before the 2.0 release proper.

P2V assistant is now free

The physical-machine-to-virtual-machine converter is now free. It’s known as the VM Converter, and is available as a free download. I haven’t had the need to use it yet, but I’ll look at using it when I migrate off the laptop I’m writing this post on.

Review of the Scripting Interfaces on VMWare Server

The virtual server now supports VIX API 1.2. The documentation for the VIX API is here. I’ll post more after having a play with it. The API itself is quite useful, as it permits automation of processes running in the guest. The following function names give an indication of what you can do with the API:
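VixHost_Connect, VixVM_Open, VixVM_PowerOn, VixVM_LoginInGuest, VixVM_RunProgramInGuest, VixVM_CopyFileFromHostToGuest, VixVM_CreateSnapshot (a representative sample recalled from the VIX 1.2 reference – check the documentation for the full list).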
