Archive for the ‘Thinking IT’ Category

Pat Helland - Irresistible Forces Meet the Moveable Objects

Saturday, May 3rd, 2008

Transcript of presentation plus my comments in blue.

Computing models have to evolve with new pressures:

  • many tiny devices - low powered, cheaper, but not faster
  • many little flakey data centers (put on a truck and drive away) … we are not seeing this. The reason is clear: the technology proposed in this talk is not easy to build.

Forces in Processors

Moore’s Law continues - the number of transistors doubles every two years, but the voltage isn’t dropping as fast. This is because the high-end chips have huge power consumption.

Why isn’t the CPU frequency getting a lot faster? Pat spent a week reading up. The hardware guys call it “the power wall”. “Static power” is consuming half of the power - reason: as transistors get smaller, they leak. Current technology that’s just about to emerge is 45 nm. When Pat graduated, they were working on 6 micron. Soon we are moving to 32 nm technology. As frequency goes up, the dynamic power goes up; as dynamic power goes up, the chip gets hotter; as the chip gets hotter, the leakage goes up! The hardware guys are fighting year after year against this. Hence, frequencies aren’t going to get much faster. Maybe 10%. But the chips will get a lot hotter.

Memory wall - access time to DRAM remains essentially flat. We’re getting more bandwidth to DRAM but not lower latencies. Chips may be faster but time to read memory isn’t becoming faster. Speculative execution in CPU is bringing execution into cache before it’s executed (out of order). This is five times as complicated. In-order CPU is cheaper in terms of transistor and power use.

Hardware will be moving to 500 processors on a single chip. But software guys don’t really know what to do. We are not ready.

Many-Core - On-chip shared memory. So we don’t have to go to DRAM (which is slower).

Forces in the Data Center

Computers are relatively cheap. Power is 40% of cost of running data center. Building shell represents 15%. Most buildings have a lot of empty room because they are limited by power. Reducing power means more computing capacity, less cooling, less battery backup and less diesel power generator back up.

e.g. Sun is selling a premanufactured shipping container fitted out with computers - far more space, heat and power efficient. Just hook in chilled water and high voltage power. Expect to lose processors, let them fail in place. Just ship out the container after a few years.

Don’t use back up power. Just use many data centers.

This is only going to work when data is distributed between many data centers. OK, Google already does this, but then it’s a free service, and makes no promises about data consistency. Search results will vary from data center to data center.

Low End Devices

Cheap computer at 1 Watt. Increasingly stateful - holding partitioned data.

Pressures on Storage

The amount of time it takes to read and write on disks is getting longer. Jim Gray explained to Pat that capacity increases with areal density, but read/write time with linear density. 10 TB disk will take 5-15 hours to read sequentially but 15-150 days to read randomly.
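The arithmetic behind those figures can be reproduced with a rough sketch. The transfer rates, IO size and IOs-per-second values below are illustrative assumptions, not numbers from the talk; they are only meant to show why sequential and random reads end up orders of magnitude apart.

```python
# Rough estimate of how long it takes to read a whole disk.
# All device figures are illustrative assumptions, not numbers from the talk.

DISK_BYTES = 10 * 10**12  # 10 TB


def sequential_hours(bytes_per_second):
    """Hours to stream the whole disk at a sustained transfer rate."""
    return DISK_BYTES / bytes_per_second / 3600


def random_days(io_size_bytes, ios_per_second):
    """Days to read the whole disk one random IO at a time."""
    total_ios = DISK_BYTES / io_size_bytes
    return total_ios / ios_per_second / 86400


# Assumed sustained sequential rates: 500 MB/s and 200 MB/s.
print(round(sequential_hours(500e6), 1), "to", round(sequential_hours(200e6), 1), "hours")

# Assumed random reads of 8 KB each, at 1000 and 100 IOs per second.
print(round(random_days(8192, 1000), 1), "to", round(random_days(8192, 100), 1), "days")
```

With these assumptions the sketch lands at roughly 6 to 14 hours sequentially and roughly 14 to 141 days randomly, which is the same order of magnitude as the figures quoted above.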

“Disk is Tape” - we have to treat disks as cold storage. No offsite media.

“Flash is Disk” - Flash is getting cheaper because of phones, cameras. Flash will replace disk. The IO per GB of flash remains about 200 compared to 4 on SCSI disks. By 2012, flash will cost the same as SCSI. Furthermore, flash runs cool.

All this is true, but I can’t see how this would change the way we would write software.

Bandwidth and Latency

We are getting more bandwidth but latency is limited by the speed of light.

Data Center to Data Center have a special backbone, which is very fast. Total bandwidth triples every twelve months. Faster than Moore’s Law!

As bandwidth gets cheaper, it’s growing at a rate faster than Moore’s Law. The users are getting charged on the peak usage. There’s spare capacity (free) available between the peaks. [Chui: what can we do with that? Background transfer of backup information, synchronization?]

Wireless is increasingly available. But there are still dead spots. We still need to see offline behavior.

Increasing LAN, Increasing WAN, Increasing Wireless … Increasing Expectations!

Forces in the Cloud

Runs video showing how useful separating application state from the machine is.

Need to have:

  • per-user-per-app state, the browser does this.
  • safety
  • sandboxing
  • controlled sharing across applications, safely [Chui: see BitFrost on OLPC]

Kinds of Parallelism

Pipelined parallelism - [Chui: like unix pipes, or python generators]

Partitioned parallelism.
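A minimal sketch of both shapes, following the Python generator analogy above. The function names and sample data are made up for illustration. Note that generators only interleave the stages on one processor, and the partitions here are built but not processed concurrently, so this shows the structure of the two approaches rather than real parallel execution.

```python
# Pipelined: each stage is a generator that consumes the previous one,
# much like `cat | grep | wc` in a Unix shell.
def read_lines(lines):
    for line in lines:
        yield line.strip()


def only_errors(lines):
    for line in lines:
        if "ERROR" in line:
            yield line


def count(items):
    return sum(1 for _ in items)


# Partitioned: split the data by key so that each bucket could be
# handed to a different worker and processed independently.
def partition(items, n, key=hash):
    buckets = [[] for _ in range(n)]
    for item in items:
        buckets[key(item) % n].append(item)
    return buckets


log = ["ok\n", "ERROR disk\n", "ok\n", "ERROR net\n"]
print(count(only_errors(read_lines(log))))        # 2
print(partition(range(10), 3, key=lambda x: x))   # [[0, 3, 6, 9], [1, 4, 7], [2, 5, 8]]
```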

To compute faster, we have to bring the data closer, i.e. waiting for DRAM is slow, waiting for network is slow, waiting while offline is slow. We have to make copies of data close by. [Chui: Is this the answer to Joel's architecture astronauts?]

Get used to the idea of not knowing what is truth. Gone is assurance of data consistency, since data hasn’t synchronized yet.

Pat had thought back in 1995 that the internet was going to fail because it wasn’t transactionally consistent. But it turns out people were prepared to accept the lack of consistency.

We need to

  • admit we’re confused - computers only have partial knowledge. They are separated from the real world, and separated from other replicas - so make guesses, but there’s no certainty in computing.
  • If there’s no data consistency, then it’s possible to be decisive at the cost of making mistakes. I think it’s time to start annotating database tables or columns with this kind of rule relaxation for better partitionability.

Summary

  • Smaller computers
  • Smaller data centers
  • Smaller data sets - small independent pieces of data

Transactions will be different. You will be doing transactions on a few unique objects, and server will move the current object all to the same machine.

Subjective consistency - be decisive before you can have all up to date information. Ambassadors had to do this hundreds of years ago! Eventual consistency when getting updated data. Make another decision this time.

C.A.P. rule - Consistency, Availability, Partition tolerance … pick two out of three. Pat argued that people want to relax the business rules in order to have lower power, more disconnectivity, etc.

Rich Metadata Mediated UI development

Saturday, May 3rd, 2008

What would you automate into your boilerplate code after having 10 years of writing database applications? Here are some links to promising projects/essays:

What are your thoughts here? Are there any other notable projects in a similar vein?

Open Source Education Consulting at Middle School in Bronx

Thursday, April 24th, 2008

James Governor has talked about open source industry analysis. I attended a talk, organised by the ACS Toowoomba chapter, from an open source consultant. No, he doesn’t implement Linux servers. Pat Wagner does his consulting openly, and blogs publicly about how his consulting engagement is going over at the 339 middle school in New York.

Pat talked about how the collaboration features of Google docs and spreadsheets were used to great effect in helping teachers work together on unit plans and meetings. Students had blogs to keep their daily reflections, and parents and other students were able to comment openly, providing feedback. The principal had a blog detailing events of the day.

Pat consults from Australia, and made regular use of Skype and Google chat to keep in touch with the teachers. In addition, he is able to watch changes made to spreadsheets and word documents and comment on them as they are written.

An interesting application of Google docs is the use of version diff. For example, after an English lesson on the use of strong verbs, a teacher can use diff to see if students had applied the lessons and made changes to their original essays.

Not every student has been assigned a computer. Instead, they have work areas of 6 computers, and students rotate into the work areas during class to update their blogs and do computer based exercises.

This is a success story that bears repeating and shows how much more can be done with our schools. Over in Australia, there’s been a lot of discussion about how we can lift the standard of education in Aboriginal communities. If it can be done over in New York, there’s no reason it couldn’t be done in Aurukun or Alice Springs.

 

A misadventure with firewalls

Sunday, February 24th, 2008

There’s always been a problem for PCs in my home network to access files on my laptop. I could access shared folders on the others, but never the other way round.

My laptop is connected to a wireless access point, while the rest of the PCs are on a wired LAN. I had even turned off Windows firewall, and still couldn’t ping my laptop from other PCs. I double checked McAfee’s firewall rules, where I had ticked “trust all computers on the LAN”.

For the better part of the last two years, I thought that the wireless access point was faulty. But no amount of Googling turned up anything.

Exasperated, I finally decided to give Network Magic a try. In less than half an hour, it had not only pointed out that McAfee’s firewall was the culprit, but detailed exactly how to navigate the treacherous menus to fix the problem. It was non-trivial.

Configure >
  Internet & Network >
   Advanced ... >
     Trusted and Banned IPs ... >
        Apply a tick against checkboxes for the IP addresses which are trusted.

This level of menu maze madness is beyond insane, especially given that the pretty network diagram showed all the PCs, but did not warn that these PCs had been firewalled. In contrast, Network Magic did it just right.

The moral? Don’t ever think you are so smart that one of those Mickey Mouse-looking pieces of software can’t help you. It just might be the right medicine.

Software Innovation

Sunday, February 3rd, 2008

Bob Warfield writes on software innovation:

Startups need to be 180 degrees out of phase with what big companies are doing. When big companies are innovating, startups should be commoditizing. When big companies are commoditizing, small companies should be innovating

Andy Kessler would probably disagree.

In his essay “How We Got There”, he wrote about how the commoditization of a technology in clothes-making (following the expiry of a patent) led to the next logical innovation point.[1]

In 1785, Edmund Cartwright sought to fix this problem [of hand-run looms] by applying mechanical power to hand looms. But first he had to wait for Arkwright’s patent on cotton spinning to expire. He knew that cotton mills would then be built, which would turn out an abundance of thread and yarn. Cartwright thought about starting his own cotton mill but figured, smartly, that everyone and his brother would start one of those. Instead, he wanted to leverage the abundance of yarn, not help create it. So he began working on a Power Loom. He was only two hundred years ahead of his time in innovative business thinking.

[1] More on it here

More VMWare Server Review

Monday, December 10th, 2007

I had written up a VMWare Server review for the beta release earlier. Given that VMWare Server 2.0 Beta is already available, here’s a rundown on the differences between VMWare Server 2.0 and VMWare Server 1.x.

  • Web based management - this beats having to Remote Desktop to the host operating system to manage virtual machines.
  • Virtual guest OS can access 8 GB RAM - assuming that your host is a 64-bit OS (otherwise you are stuck with only 4 GB of addressable memory).

Here’s a screenshot of the web-based management interface on VMWare Server 2.0. It prompts you to install an ActiveX control in order to view the server console. There used to be an annotation feature in 1.x server, so that you could write comments against a virtual machine. This useful feature seems to have been dropped in 2.0.

VMWare Server 2.0 Beta

The performance of the viewer is not as good as Remote Desktop, although it’s comparable to VNC. There is a full screen option, but it needs more work before final release. It messes around with the resolution of my second monitor. The folks over at 4sysops don’t think too highly of the new VMWare web-based user interface, and I’m afraid I concur. It simply takes up a little too much screen real estate, especially with the frameset in the web pages, plus the space used by the browser chrome. I hope they fix that before the 2.0 release proper.

P2V assistant is now free

The physical machine to virtual machine converter is now free. It’s known as the VM Converter, and is available for download free. I haven’t had the need to use this yet, but I’ll look to using it when I migrate off the laptop I’m writing this post on.

Review of the Scripting Interfaces on VMWare Server

The virtual server now supports VIX API 1.2. The documentation for the VIX API is here. I’ll post more after having a play around with it. The API itself is quite useful, as it permits automation of processes running in the guest. The following function names give an indication of what you can do with the API:


We Didn’t Start This Bubble

Wednesday, December 5th, 2007

Aptly put.

Escaping XML and CDATA

Saturday, October 27th, 2007

I don’t know what got into the head of the designers of XML, that CDATA escape sections have their own escaping syntax. Lshift has a low down on the details: Escaping XML and CDATA. Good to know that xml.dom.minidom gets this right.
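The awkward case is text that itself contains `]]>`, which would otherwise end the CDATA section early. A commonly used workaround, sketched below under my own naming (`cdata_section` is a hypothetical helper, not part of any library), is to split the terminator across two adjacent CDATA sections. The round trip through `xml.etree.ElementTree` is only there as a sanity check.

```python
import xml.etree.ElementTree as ET


def cdata_section(text):
    """Wrap text in CDATA, splitting any "]]>" across two sections."""
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


tricky = "if (a[b[0]]>c) { ... }"
xml = "<script>" + cdata_section(tricky) + "</script>"

# Round trip: the parser should hand back the original text unchanged.
print(ET.fromstring(xml).text == tricky)
```

The parser sees two CDATA sections back to back and concatenates their contents, so the reader of the document never notices the split.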

Fighting Comment Spam

Saturday, September 15th, 2007

It’s so easy to get throwaway email addresses and throwaway domains that combatting forum spam can be very difficult.

Here are some options:

1) use mod_access to ban open proxies

Similar to the approach used by denyhosts, this would blackhole proxies that are often used by bots.

2) publish ip addresses and signup dates to a central server

If there is too much signup activity around these ip addresses, a forum can retrospectively ban them.

3) work with a central service which answers questions like this:

how many different ip addresses are used by the poster xyz@hotmail.com

how many forums is the xyz@hotmail.com signed up to

Privacy can be maintained by having forum owners upload hashed email addresses, or using bloom filters.

4) Emails from low-trust domains (i.e. susceptible to bots) will result in the user being put through more hoops to register as a member and to post

5) Easy bulk auditing of hyperlinks posted in forums

6) Transparent spam policy

7) Use overture.com data to assess the relative value of words being posted. Words with high CPC appear first during bulk audits.

8) Possible spam hyperlinks are turned into plain text

9) Flag possible spammy domains, e.g. multiple email accounts from 1chuan.org for example

10) Flag possible spammy email names, e.g. three consonants in a row

11) Flag possible spammy email to user name correspondence, usually user handle and email are related.
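Two of the options above are easy to sketch: hashing email addresses before uploading them to a central service (option 3), and the "three consonants in a row" heuristic (option 10). The function names are hypothetical and the rules are deliberately crude. In particular, an unsalted hash can be reversed by hashing candidate addresses, so a real service would probably want a keyed hash with a secret shared between forum owners.

```python
import hashlib
import re


def hash_email(address):
    """Return a SHA-256 hex digest of the normalised address."""
    normalised = address.strip().lower()
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


# Three or more consonants in a row in the part before the "@".
# "y" is treated as a vowel here, which is an arbitrary choice.
CONSONANT_RUN = re.compile(r"[bcdfghjklmnpqrstvwxz]{3,}", re.IGNORECASE)


def looks_spammy(address):
    local_part = address.split("@", 1)[0]
    return bool(CONSONANT_RUN.search(local_part))


# The same address always hashes to the same value, whatever its case.
print(hash_email(" XYZ@Hotmail.com ") == hash_email("xyz@hotmail.com"))
print(looks_spammy("xkcdbot@example.com"), looks_spammy("alice@example.com"))
```

The consonant rule will also flag plenty of legitimate names, so it is better suited to ordering a bulk audit than to banning anyone outright.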

From the defunct spamproofwiki project on SourceForge:

Spam protection includes no posting of URLs by public members, e-mail sign up, RSS feed of IP block list, easy ISP reporting of attacks, Bayesian IP blocking, random submit fields and Google rel attributes.

Update: SpamHuntress is losing ground. Where’s akismet for wikis?

JScript. Dynamic languages’ forgotten cousin.

Sunday, July 29th, 2007

Update 27 July 2007: Looks like there’s resurgent interest in serious Javascript use on the server-side.

In response to Hans Nowak’s question of what new language to use, and since Python had to be excluded, may I suggest JScript.NET?

(I’ve moved most of the technical aspects of the discussion over to the Python, Zope and dotNET blog. )

The question I’d like to pose here is: why doesn’t a sloppy (in the good sense) language like JScript gain more traction among the development community?

I’m keen to hear what your thoughts are.

More Spreadsheet Innovation

Saturday, July 14th, 2007

Readers are probably aware how fond I am of spreadsheets as a tool for rapidly prototyping applications. Here’s another implementation of spreadsheets, this one based around IronPython, that also addresses issues like shared updates.

Ubuntu 7.04 Feisty Fawn on VMware

Thursday, July 12th, 2007

Just make sure your VMWare virtual hard disk is at least 2.5 GB.

Ubuntu will not install properly from LiveCD (which is the default download) when the virtual hard disk is 2.0 GB. Grub

What Rich Client Applications Can Learn from the Web

Wednesday, June 20th, 2007

Zef writes in Ajax Reality Check that

Does anybody realize where we came from and that these “web 2.0 technologies” aren’t great at all, but just the best we could do — in the browser?

However, I assert that we do have something to learn from the browser[1], and it’s not ajax.

Here is my list:

1. Bookmarkable applications. I can send you the URL of a particular screen, or a particular record.

2. This opens up applications to easily implement Most Recently Used Feature, or Most Recently Accessed Data

3. It makes it easy to add annotation-like features to an app (through mashups).[2]

4. You can create mashups easily, e.g. treat an application as a component for free.[3]

5. Back button and Breadcrumbs (MSMoney-style UI), although the caching of application screens on browser make this somewhat broken, but desktop apps will be able to address this.

6. Ease of deployment - no need for Administrator rights to install applications, modify registry, centralized profiles. Notably, a lot of apps today are still written in the style that network availability and bandwidth are poor, rather than the other way round - ubiquitous network availability. (Sidenote: things get interesting in the mobile space, because bandwidth is expensive, while storage gets cheaper, you might see a resurgence of the older style of programming again)
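To make the first two items concrete, here is a minimal sketch of how a rich client might resolve a bookmark to a screen and record, and keep a most-recently-used list as a side effect. The `myapp://` scheme, the `open_bookmark` function and the URL layout are all invented for illustration.

```python
from collections import deque
from urllib.parse import parse_qs, urlsplit

recent = deque(maxlen=5)  # most recently used bookmarks, newest first


def open_bookmark(url):
    """Turn e.g. myapp://customer/42?tab=orders into (screen, record, options)."""
    parts = urlsplit(url)
    screen = parts.netloc
    record = parts.path.lstrip("/")
    options = {key: values[0] for key, values in parse_qs(parts.query).items()}

    # Every navigation goes through a URL, so the MRU list comes almost for free.
    if url in recent:
        recent.remove(url)
    recent.appendleft(url)
    return screen, record, options


print(open_bookmark("myapp://customer/42?tab=orders"))
print(list(recent))
```

Because the URL is the only input, the same string can be emailed to a colleague, stored in a history list, or used by another application to embed the screen.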

Footnotes

[1] Check out AJAX is not a mobile paradigm

[2] I had mentioned here:

In Scribbling in the Margins, Jon Udell’s thesis is that extensibility was the mark of enduring design. Jon likened the extension of DNS records for use in SPF to how people scribble in the margins of a document.

[3] There are versioning and security issues with mashups though. Spoofing, CSRF. Any guidance on doing this properly with a critical app?

A humble proposal for a few more HTML elements for HTML5

Saturday, June 16th, 2007

This is for the benefit of Google and Yahoo. (tongue in cheek)

  • <sponsored> Anything in between these tags is not to be trusted: advertisements, banner ads, text link ads, Adsense
  • <searchresults> Google had asked for search result pages not to be indexed
  • <unmoderated> Use this for sections where untrusted public can comment. Alternative to nofollow

And this is for the semantic web designers (tongue in cheek)

  • <tablelayout><trlayout><tdlayout> The return of table layout, semantic web style

But seriously now, where’s <input type="grid">?

Considerations When Designing Your Own Programming/Scripting Language

Monday, June 11th, 2007

Found this - How to Design a Declarative Language - via Bill Clementson’s Blog Post (DSL Design Considerations). It’s so useful that I’m going to lift it and archive a copy over here as well.


Contributors: Andrew Cooke, Steve Dekorte (*), Matthias Holzl (*), Jerry Jackson (*), Jonathan Rees (*), Anton van Straaten (*)

Here are some questions a prospective language designer should ask himself when starting to design a programming language.

- What need are you trying to fill? Don’t fall into the trap of “a scripting language”, because they always turn into general-purpose languages.

- What’s the metaphor? Even though you might not be trying to build a “pure” language, it’s worth having a model for the core language, such as “imperative, block-structured” (C), “object oriented” (Smalltalk), “generic object orientation” (Lisp), “functional” (ML), “lazy” (Haskell), “logic” (Prolog), “production system” (OPS5), etc. These different core models influence the “natural” styles of program development in different languages even if the set of available facilities is similar. They also help define which late-arriving features will “fit” and which will be warts. [Jerry Jackson]

- How many programming paradigms does your language support? How tightly are they integrated? Which other paradigms can you integrate with the built-in facilities? How natural is the syntax of user-defined extensions? Many problems are much better suited to some non-standard programming model than to the usual object-oriented/functional approaches. For example, constraint languages allow a very concise description (and efficient solution) of many optimization problems. Dylan supports functional and object-oriented programming in a tightly integrated manner, but it offers no support for non-deterministic programming, constraint-solving, etc., and not much support to add them to the language. If you have first-class continuations in the language you can add one additional programming model that requires non-standard control flow, but, in general, different extensions based on call/cc don’t work together. [Matthias Holzl]

- Is high performance an issue? This says something about whether you want to implement an interpreted, a VM-based, or a natively compiled language.

- Is high programmer productivity an issue? How important is this with respect to performance? This decision can affect how you store values and do function calling.

- How portable across platforms do you want the language to be? This relates to whether you want to compile to a VM or to machine code, and to how well you support native libraries. It will also affect library design for such things as graphics and GUI tools. [Anton van Straaten]

- Do you want easily distributed executable code, i.e., do you want to allow code to be easily transmitted across networks and run elsewhere, as Java does? Do you want to provide built-in support for remote execution, like RPC/CORBA/RMI? If you are writing for a VM, this can simplify some of these issues considerably. [Anton van Straaten]

- What about debuggability? If you plan to compile it, you need to think about how to store debugging information.

- How do you want to bootstrap it? This, too, says something about what kind of back-end you might build. Perhaps you build a tiny VM in C, then compile to C. This way, you avoid fun but time-consuming work on code generation for modern super-scalar hardware, register allocation, etc.

- Do you want to be able to catch type errors early or late? That says something about your type system (whether you require that all types be statically declared at compile-time, or allow them to be dynamic, or have a hybrid scheme like Dylan does). In addition to the obvious effect on performance, this decision will affect your memory model in that completely static systems do not require tags or boxing.

- Will variables be associated with explicit type declarations?

  • If yes, will these type declarations be required or optional?
  • If optional, will the language use inferencing to supply unspecified types, or simply use an all-purpose type (like Object or ‘any’)? [Anton van Straaten]

- Will the language have any run-time type discrimination/checking at all, or will types be completely statically determined? Some languages considered statically typed still do some run-time checking, such as Java. [Anton van Straaten]

- Will any type checking happen at compile-time? Some languages with explicit type declarations don’t always check types at compile-time, such as old Visual Basic. [Anton van Straaten]

- If you allow type declarations, you will want to think about whether you want parameterized types. If you go whole hog with, say, F-bounded polymorphism, you can get performance and type safety and ease of use, but it’s hard to get this exactly right.

- What about namespaces? Do you want to have a simple scheme as in Java, where classes, namespaces, and files are roughly equivalent? Lisp-style packages? Dylan-style modules and libraries? Within a single first-class namespace, how many second-class namespaces are there? Java has 7 or 8: class names, function names, local variable names, slot names, etc. Common Lisp has at least 3 (function, variable, and class names). Dylan and Scheme have one, which greatly simplifies things at a small loss of generality which can usually be worked around with name conventions.

- What about encapsulation? Do you want to do information-hiding on a per-class basis as in C++ and Java, or on a “module” basis as in Dylan?

- Is your language a functional language (that is, without side-effects)? If so, is it an almost-functional language or a true pure functional language? Or is there a functional core with some sort of machinery for isolating side-effects, like monads do in Haskell?

- What kind of evaluation semantics does the language have? Eager as in most languages, or lazy as in Haskell?
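As a rough illustration of the difference, here is the same sequence written eagerly and lazily in Python. This is only an approximation: Python is an eager language and generators give laziness for sequences only, whereas in Haskell every expression is evaluated on demand.

```python
from itertools import count, islice


def squares_eager(n):
    # Eager: all n values are computed before the caller sees any of them.
    return [i * i for i in range(n)]


def squares_lazy():
    # Lazy: an unbounded stream; each value is computed only when requested.
    for i in count():
        yield i * i


print(squares_eager(5))                  # [0, 1, 4, 9, 16]
print(list(islice(squares_lazy(), 5)))   # [0, 1, 4, 9, 16]
```

The lazy version can describe an infinite sequence because nothing is computed until the consumer asks for it; the eager version has to be told up front how much work to do.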

- Is your language purely lexical or do you offer dynamic variables (or, more generally, access to the dynamic environment) as well? Dynamic binding allows you to introduce local state for the duration of a computation without side effects and without adding additional parameters. [Matthias Holzl]
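Python is lexically scoped, but the standard `contextvars` module can be used to sketch what a dynamic variable feels like. The `binding` helper and the `indent` variable below are my own names, and this is an emulation of the idea rather than how Lisp implements special variables.

```python
import contextvars
from contextlib import contextmanager

indent = contextvars.ContextVar("indent", default=0)


@contextmanager
def binding(var, value):
    """Bind var to value for the duration of the with-block, then restore it."""
    token = var.set(value)
    try:
        yield
    finally:
        var.reset(token)


def log(message):
    # No indent parameter: the value comes from the dynamic environment.
    return " " * indent.get() + message


def process():
    # process() knows nothing about indentation, yet log() still sees it.
    return log("processing")


print(repr(process()))
with binding(indent, 4):
    print(repr(process()))
print(repr(process()))
```

Inside the `with` block every function called, however deeply, sees the new value; once the block exits the old value is back, without any intermediate function having to pass it along.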

- Are there different semantics for “pointer-ish” and “non-pointer-ish” values, like in C? Or is everything a first-class object reference, like in Lisp? Having multiple ways of referencing values can make the user model much more complicated. On the other hand, making everything be object references can require boxing and/or tagging schemes that make your compiler and FFI more complex.

- How do you want to pass arguments to functions? By name as in Algol? By value or by reference as in C? By object reference like Lisp does? Is there more than one convention in the language?

- Do you want first-class functions? What about lexical closures? First-class continuations? The answer to those questions will tell you things about heap- and stack-allocation, and will also tell you how important it might be to do a continuation-based compiler. It also tells you how hard your compiler has to work to avoid consing environments unnecessarily. Lots of sophisticated language designers go with simple closures and avoid full continuations, because full-scale environment capture is hard to do well.

- Does your language have an unwind-protect like facility? When you design a new language it is tempting to include call/cc because it allows you to define many common (and uncommon) control structures. On the other hand you want to have a facility that allows you to reliably relinquish resources after you are done. If you simply try to combine call/cc and unwind-protect, you immediately get the “impenetrable shield vs. unstoppable force” problem in your language. Possible solutions include: no call/cc, weakened unwind-protect, different semantics for call/cc. [Matthias Holzl]
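For readers who have not met unwind-protect, Python's `try`/`finally` plays the same role. The sketch below uses a made-up `Resource` class. Python has no first-class continuations, so it sits in the "no call/cc" camp and the conflict described above does not arise.

```python
class Resource:
    """Stand-in for a file, lock or connection that must be released."""

    def __init__(self):
        self.open = True

    def close(self):
        self.open = False


def use(resource, fail):
    try:
        if fail:
            raise RuntimeError("something went wrong")
        return "done"
    finally:
        # Runs on a normal return and on a non-local exit via an exception.
        resource.close()


ok = Resource()
print(use(ok, fail=False), ok.open)

broken = Resource()
try:
    use(broken, fail=True)
except RuntimeError:
    print("failed", broken.open)
```

In both cases the resource ends up closed, which is exactly the guarantee that becomes hard to state once a continuation can re-enter the protected block after the cleanup has run.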

- How do you handle conditions/errors? Return codes or signalling? Do you have an unwinding-only model like C++/Java or do you allow restarts like Dylan/CL? If you do the latter do you separate conditions and restarts like Common Lisp or do you unify them like Dylan? These questions are important, because every programming language has to deal with error conditions, and in many cases the unwinding model is used simply because the language designer is not aware of any other possibilities. [Matthias Holzl]

- Do you want the language to be “object-oriented” at all, given a broad definition of OO that includes the spectrum from single inheritance single receiver languages as in Java to multiple inheritance multiple receiver languages as in CLOS? Do you want to provide genericity through some sort of template scheme?

Object Orientedness is a Fuzzy Term

Here is how Jonathan Rees has characterized the very fuzzy term “OO”.

1. Encapsulation — the ability to hide the implementation of a type

2. Protection — the inability of the client of a type to detect its implementation, guaranteeing that any changes to an implementation that preserve the behavior of the interface will not break any clients. This also gives some measure of “security”, because things like passwords can’t leak out.

3. Ad hoc polymorphism — functions and data structures with parameters that can take on values of many different types.

4. Parametric polymorphism — functions and data structures that parameterize over arbitrary values (such as “a list of anything”). ML and Lisp both have this. Java doesn’t quite because of its non-Object primitive types.

5. Everything is an object — all values are objects. True in Dylan, but not in Java because of its primitive types.

6. “All you can do is send a message” (AYCDISAM) = Actors model — there is no direct manipulation of objects, only communication with (or invocation of) them. The presence of fields in Java violates this.

7. Specification inheritance = subtyping — there are distinct types known to the language with the property that a value of one type is as good as a value of another for the purposes of type correctness. An example is Java interface inheritance.

8. Implementation inheritance/reuse — having written one pile of code, a similar pile (such as a superset) can be generated in a controlled manner, that is the code doesn’t have to be copied and edited. A limited and peculiar kind of abstraction. (E.g. Java class inheritance.)

9. Sum-of-product-of-function pattern — objects are, in effect, restricted to be functions that take as first argument a distinguished method key argument that is drawn from a finite set of simple names.

Some people say Lisp is OO, meaning {3,4,5,7}. Some people say Java is OO, meaning {1,2,3,7,8,9}. E is supposed to be more OO than Java because it has {1,2,3,4,5,7,9} and almost has 6; 8 (subclassing) is seen as antagonistic to E’s goals and not necessary for OO. The conventional Simula 67-like pattern of class and instance will get you {1,3,7,9}, which many people take as a definition for OO. [Jonathan Rees]
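Item 9 is the least familiar of the nine, so here is a small sketch of it: an "object" that is nothing but a function whose first argument is a method key. The counter and its method names are invented for illustration. The same closure also shows item 1, since the state cannot be reached except through the dispatch function.

```python
def make_counter(start=0):
    state = {"count": start}  # hidden: only dispatch() can reach it

    def dispatch(method, *args):
        # The first argument is the distinguished method key,
        # drawn from a finite set of simple names.
        if method == "increment":
            state["count"] += args[0] if args else 1
            return state["count"]
        if method == "value":
            return state["count"]
        raise AttributeError(method)

    return dispatch


counter = make_counter()
counter("increment")
counter("increment", 5)
print(counter("value"))   # 6
```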

- If the language is object-oriented, do you want it to be class-based or prototype-based? [Steve Dekorte]

- If you’ve got an object system, do you want it to have first-class objects that exist in the run-time? Should the object system extend to include all the way to the primitive types, or do you want to special-case those like Java does? Do you want a Smalltalk/Java-style single receiver object orientation, or a CLOS-style multi-method generic function dispatch? If the former, do you need some sort of static overloading like C++ has? If the latter and performance is important, do you need some sort of Dylan-style “sealing” so that you can do some compile-time optimizations? Do you want single inheritance, single inheritance with interfaces, multiple inheritance, or a hybrid single inheritance with mixins? If you’ve got a more static type system, you’ll need to deal with casts. Do you additionally want auto-conversion?
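To show what multi-method dispatch means in practice, here is a naive sketch of a generic function that picks its implementation from the classes of both arguments. The `defmethod` decorator, the registry and the example classes are all hypothetical, and the lookup order is much simpler than the real CLOS rules.

```python
_methods = {}


def defmethod(*types):
    """Register an implementation for a particular tuple of argument classes."""
    def register(func):
        _methods[types] = func
        return func
    return register


def collide(a, b):
    """Generic function: dispatch on the classes of both arguments."""
    for type_a in type(a).__mro__:
        for type_b in type(b).__mro__:
            method = _methods.get((type_a, type_b))
            if method:
                return method(a, b)
    raise TypeError("no applicable method")


class Ship:
    pass


class Asteroid:
    pass


@defmethod(Ship, Asteroid)
def _(a, b):
    return "ship hits asteroid"


@defmethod(Asteroid, Asteroid)
def _(a, b):
    return "asteroids bounce"


@defmethod(object, object)
def _(a, b):
    return "nothing happens"


print(collide(Ship(), Asteroid()))
print(collide(Asteroid(), Ship()))
```

In a single-receiver language the second argument's class would have to be tested by hand inside each method. The cost of the generic approach is the run-time table lookup on every call, which is the kind of overhead that sealing is meant to let a compiler optimise away.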

- If you’ve got an object system, how much of a meta-object system do you want to expose? Do you want it to be purely reflective, or more than that? In Dylan, we separated ‘make’ from ‘initialize’, which was a good idea, but do you also want to separate out ‘allocate’, so that you have control over where an object is created, e.g., in a “persistent memory” pool that might be back-ended by a database?

- Do you need hairy CLOS-style method combination, or is a simpler style like we did in Dylan enough? Do you care about what Gregor Kiczales calls “aspects”, which might change your decision?

- A more general question that relates to the object system, the meta-object system, and a different dimension of the bootstrapping question is: do you want to implement a language which provides a bunch of predefined and fixed constructs (such as an object system) or do you want to provide a layered language that implements such constructs in terms of lower-level features in the language? The former is probably easier, but the latter can allow very flexible customization, which tends to be traded off against standardization. Note that even a language with a powerful built-in meta-object system won’t necessarily allow you to replace that object system with something else, for example, unless the language supports that sort of thing. [Anton van Straaten]

- How do you want to do memory management, manual or automatic (GC)?

- Do you want to support threading? Do you want to roll your own threads or use OS threads? Do you want to support massive concurrency like Erlang does? The answers to those questions will tell you about aspects of the run-time, memory allocation/GC, and performance. Oh yeah — it also tells you if you can actually take advantage of the multiple processors sitting in most of the machines we all have. Do you want Java-style synchronization where it is built in to objects, or should that be handled orthogonally?

- If you have threads and continuations, how do they relate to each other?

- How well do you want to be able to integrate with native libraries? This decision affects your memory model, how you plan to represent run-time type info, how function call/return works, how signalling works, etc. By “memory model”, I also mean to include what sorts of objects are boxed or tagged. (Opinion: the Harlequin/FunO Dylan compiler got it wrong; I think we should have boxed everything, and then concentrated our efforts on box/unbox optimizations. This would have *hugely* simplified FFI issues.) Good integration with native code probably means that you will end up using a conservative collector, and that will affect the semantics of “finalization” (if you have it).

- Do you want to be able to return multiple values? How about &rest arguments? These affect function call/return, tail-call elimination, and stack vs. heap allocation optimizations.

- What’s your order of evaluation in expressions? This affects what sort of optimizations can be safely done.

- What compilation model do you want? Lots of include files like C[++]? Lots of “packages” like Java? Whole-worlds like Lisp? Separate libraries like Dylan? This affects a lot of things, not least of which is the ability to deliver small applications. It also informs the design of your core run-time.

- Is the core run-time tiny like Scheme’s? Small like Dylan’s? Huge like Common Lisp’s? If you like the Common Lisp model, it’s worth looking at EuLisp to see how to re-package it in a more layered way.

- Even in a small run-time, you need to get the basic types right. Are your numeric types “closed” (that is, do they include reals — rationals and irrationals — and complex numbers)? Are your string and character types rich enough to model Unicode?

- Think hard about collections. How do the following relate to each other: sets, tables, vectors, arrays, lists, sequences, ranges? In Dylan, we decided too late that having the tail of a list be a “cons” was maybe not such a great idea; what about that? How do your collections interact with your threading model?

- Think hard about iteration, especially over collections. If all collections obey a uniform iteration protocol, it means that you can do things like ‘for e in c …’. Note that if iterators are done in a first-class way, this has performance implications that your compiler needs to worry about.

- Do you want some sort of security model built into the language? What sort of model do you want to use? A simple “checker” like the Java VM uses, or a more sophisticated capability-based model?

- What syntax do you want? Parentheses unaccountably give lots of people hives, but S-expressions make a lot of things much simpler. Infix syntax is quite nice when it’s done well, but you’ve got to get the “kernel” of that exactly right if you want your infix macro system ever to be usable. If you decide on S-expressions, should they be represented as lists and conses, or do you want a first-class object for that?

- Do you want to allow syntactic extensions (macros)? Lisp-style macros? Dylan-style pattern-matching non-procedural hygienic macros? Scheme-style ‘syntax-case’ pattern-matching procedural hygienic macros? This says a lot about the syntax of your language, and it also says a lot about the model you choose for compile-time evaluation environments.
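
The point in the list above about a uniform iteration protocol can be made concrete with a small sketch. It uses Python, whose `__iter__` protocol is one existing instance of the idea. The `Countdown` class and the `total` function are hypothetical names invented for this illustration; they are not taken from Dylan or from the original discussion.

```python
class Countdown:
    """Hypothetical collection: not a list, but it obeys the iteration protocol."""

    def __init__(self, start):
        self.start = start

    def __iter__(self):
        # A generator method hands back a fresh first-class iterator on each call.
        n = self.start
        while n > 0:
            yield n
            n -= 1


def total(collection):
    # Written once against the protocol; works for lists, ranges, Countdown, ...
    result = 0
    for e in collection:
        result += e
    return result


print(total([1, 2, 3]))
print(total(range(4)))
print(total(Countdown(3)))
```

Because the iterator here is an ordinary run-time object, a naive implementation pays for an allocation and a call per element, which is the performance concern the bullet raises for the compiler.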

Update 26 Aug 2007:
From LtU Let’s Make A Programming Language, Frank Atanassow’s comment:

Go study Scheme and Prolog and ML and Haskell and Charity and Lucid Synchrone and OBJ and Erlang and Smalltalk. Look at Epigram or Coq or HOL or LEGO or Nuprl. Aside from Java, these are the important ones. If you are familiar with all of these, then you are in a decent position. If you have only ever programmed in C/C++/Java and Lisp and scripting languages, you have been sitting in a corner your whole life. Perl, Python, Ruby, PHP, Tcl and Lisp are all the same language. (Scheme itself is only interesting for hygienic macros and continuations.)

In the Attention Economy, you steal Attention by Borrowing Copyright

Monday, June 11th, 2007

John Andrews pushes back against Lessig over a book exec’s stealing of a few of Google’s computers at a book expo:

It’s no longer about stealing the book and leaving someone the lesser, Mr. Lessig. That ceased to be important when Google started advertising in search results. It’s now about monetization of the process of publishing original works. Perhaps the Macmillan executive should have simply copied the powerpoint slides from those Googlers, and displayed them at his booth to draw a crowd interested in hearing what Google was doing. Once in the Macmillan booth, of course Macmillan could deliver the Macmillan anti-Google message.

John is right here. Take free-to-air TV as an example: the TV stations rent copyrighted material and exchange it for the attention of the audience (by playing ads). Unlike bloggers, who exist symbiotically with search engines, book publishers are big enough to hold their own. They could always roll their own search engines, and Google could pay them for book search results. It would be no different from Google paying for satellite pics to go on Google Maps.

“Flow”

Friday, June 8th, 2007

Getting in the flow is a concept that is familiar to both programmers and musicians. Flow is a state of immersion where productivity peaks and a stream of genius and creativity literally “flows” into the fingers.

In Designing Musical Instruments for Flow, Spencer Critchley describes the path to “flow”:

  1. Immediate, meaningful, physical feedback
  2. Natural, ergonomic control
  3. Aesthetic pleasure
  4. Capture a wide range of input data
  5. Technology should serve simplicity
  6. Progressive complexity
  7. Activity oriented design

Empirically, the points above explain why languages like Python and Ruby appeal to programmers. Not having to declare static types generates a great deal of “flow”: technology serving simplicity [1]. Library documentation that includes useful working examples aids this as well, as does an accessible library API: activity oriented design [2], [3]. Duck typing captures a wide range of input data. REPL mode gives immediate feedback [4].
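
As a rough illustration of the duck typing point, here is a minimal Python sketch. The `shout` function and the `Greeter` class are made-up names for this example only. The function declares no types and accepts anything that happens to have a `read()` method returning text.

```python
import io


def shout(source):
    # No declared type: any object with a read() method returning text will do.
    return source.read().upper()


class Greeter:
    """Hypothetical class unrelated to files; it merely has a read() method."""

    def read(self):
        return "hello from a plain object"


print(shout(io.StringIO("hello from a stream")))
print(shout(Greeter()))
```

Typing the same lines into a REPL gives the immediate feedback mentioned above: each call can be tried the moment it is written.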

One might think progressive complexity has less to do with flow, and more to do with accessibility and usability. This is not true. Progressive complexity means that “newbies” get into flow quickly too, and provides the positive feedback which encourages them to persevere and refine their skills.

What about non-programming software? Working with spreadsheets puts me in the “flow”, but working with Word doesn’t. Perhaps this is because working with spreadsheets is repetitive in nature: set up a formula, and then the rest of the operations are copy and paste. Perhaps it’s because the key bindings for spreadsheets make light work of repetitive tasks. Perhaps it’s all of the above.

Perhaps, with Word, writing is such a creative process, that I find - especially as a non-native speaker - that it is easier to get distracted by the variety of ways in which I can reformat a paragraph. In fact, I find it more productive to work in WordPress than in Word because of this.

Last - but definitely not least - let’s not forget the contribution of search engines as the ultimate flow-generator. Most typical programming problems have been solved before by someone else. Having ready access to this knowledge makes it easy for developers to work around those 10% problems that normally take 90% of the time.

footnotes

[1] Patrick Logan is right about static type checking not being very useful, but he didn’t emphasize the point that people tend to statically type even when it is optional. In fact, it takes discipline not to statically type when I program in VB: I have to consciously leave variables as variants while code is being prototyped.

[2] Look at the .Net API for saving an Image. How many people will be able to save an image to a file without looking at the sample code? Compare the DotNet sample code with the one in Python. When I want to eat cake, please do not offer me flour, eggs, and sugar.

[3] Joel Spolsky offers a simpler view on where flow comes from: just being able to get started.

[4] What does this say about a web-based environment for developing programs? Environments such as Zope, and JotSpot? Applying the rules of flow tells us that, everything else being equal, people will prefer a medium that offers faster feedback. I know that large chunks of my Zope development are prototyped through the web, and that has been good enough. We wait eagerly to see the outcome of the JotSpot experiment for another data-point.

Bibliography

Mihaly Csikszentmihalyi, The Evolving Self.

Mihaly Csikszentmihalyi, What Is Enlightenment magazine interview

Will Linux Ever Get Out Of Its Driver Mess?

Tuesday, June 5th, 2007

Linux has a scalability problem. The current vision of Linux device driver Utopia is based around “stick your source code in our tree, and as we change our kernel interface, someone in the open source world will fix up your drivers.” This approach simply won’t scale as the number of drivers in the tree increases.

The Linux Kernel Driver Interface FAQ has a mini rant on how unimportant stable kernel interfaces are, yet assures us that the syscall interface is very stable and will not break.

I believe the proof is in the pudding:

  • The number of applications that run on Linux is a function of the stability of the userspace interface.
  • The comparative lack of drivers running in Linux is a function of the instability of the kernel driver interface.

In fact, if Linux developers were to make even one tiny change in the syscall interface, they’d face such a huge outcry that RH, Novell and Suse would fork the codebase.

Even should the Linux kernel utopia come to be, with thousands of device drivers in the tree, who’s going to update every driver when the kernel interface changes? Are these masses of non-existent developers going to show up and update drivers in lock-step? Face it, not defining a stable driver interface is not good practice, even for open source developers.

It’s time the kernel developers recognized that it’s only a kernel that they are providing, and that they live in an ecosystem. Hiding behind the open source argument to bash hardware vendors does them no justice in the long run. For Linux to thrive, it needs to abandon its “Aryan-opensource-is-pure” bullshit and embrace the melting pot of an ecosystem where different participants have different survival strategies. For instance, the enemy of hardware vendors isn’t the open source developers. It’s other hardware vendors.

The Linux Driver FAQ argued that it’s impossible to provide stable binary interfaces. Ian Murdock points out

that it takes a better engineer to move a platform forward while at the same time making sure things don’t break

After all, if Microsoft weenies could proffer some kind of stable driver interface, aren’t the clever Linux hackers smart enough to come up with an answer? Oh, lookie here, it is possible to come up with a stable driver interface for Linux.

Linux is a religion. There is only one License, and Its Name is GPL.

Doug Hass points out that the lack of a stable Linux driver interface is ideologically driven: “Translation: if you choose to use a closed source driver, the terrorists win, you hate poor people, and are headed to eternal damnation.”

I’d contend that Linux’s development model is so broken that it’s precisely why IBM chose to back it, since they realize they can offer a migration path onto AIX when chagrined customers find their servers keep crashing due to driver issues. Hello kernel developers, you are being pwned, and you still refuse to acknowledge the reality of it.

Realize this: drivers take time to mature and become stable in themselves. Pushing implementation-level changes into the driver does nothing to improve its stability.

It’s time that Linux kernel developers abandon dogma and start experimenting and adapting. Linux doesn’t have to be a religion. It can open itself to scientific inquiry - e.g. run a stable driver interface for a subset of hardware - and its developers can judge the results for themselves.

References:
Will Linux Ever Make it to the Desktop?
I’m OK, You’re Not OK
Linux DDK
API problem much worse than expected
Ian Murdock - It takes a Better Engineer

Counter arguments:
Good Drivers Live in the Tree

Styles of programming

Friday, June 1st, 2007

(Sorry for blogging in too technical terms, I’ll clean this up when I have time later)

Sean McGrath suggests that the audience for computer programs should be people, not computers.

I was most struck by the reference comparing novel writing to diary writing. I would like to add a third style, screenplay writing. OO programming and in particular event-based-programming is most reminiscent of screenplay writing, where it is not possible to see what the linear story line is. With all the state being held in the class, it can get very difficult to decipher what is going on.

Imagine reading a book in three chapters that go like this:

Chapter 1 (Romeo)
    when seeJulietMotionless: kills himself
    when seeJulietOnTheBalcony: climbs in
    ....

Chapter 2 (Juliet)
    when forcedToMarrySomeOneElse: ...

Chapter 3 (main)
    while (nextEvent)
    {
        juliet.processEvent(nextEvent)
        romeo.processEvent(nextEvent)
    }

Activity diagrams can help somewhat, but they usually live on the documentation side. How do programmers express the intention behind the original program when OO enforces separation of concerns, but flow of control is very hard to follow and express?

Things are coming full circle of course.

  • RhinoScript aims to make web programming simpler by making the program logic run linearly, in spite of the callbacks.
  • Enumerators / co-routines are another available technique.
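
To make the co-routine idea concrete, here is a minimal sketch using Python generators, one existing form of co-routine. It retells Romeo's chapter as a single top-to-bottom story while events still arrive one at a time from an outer loop. The function names `romeo` and `run` and the event strings are assumptions made up for this illustration.

```python
def romeo():
    """Romeo's story told in order; each yield waits for the next event."""
    log = []
    event = yield log
    while event != "seeJulietOnTheBalcony":
        event = yield log      # ignore events that do not advance the plot
    log.append("climbs in")
    event = yield log
    while event != "seeJulietMotionless":
        event = yield log
    log.append("kills himself")
    while True:                # story over: keep absorbing any later events
        yield log


def run(events):
    story = romeo()
    log = next(story)          # advance to the first yield
    for event in events:
        log = story.send(event)
    return log


print(run(["forcedToMarrySomeOneElse",
           "seeJulietOnTheBalcony",
           "seeJulietMotionless"]))
```

The state that an event handler would keep in object fields lives here in the position of the generator, so the order of the plot can be read straight off the code.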

Possible future evolution:

  • RhinoScript for message queues - again the idea is to linearize the storyline
  • RhinoScript for windows forms - this is a hard one. Every time I read a convoluted VB program, the convolution usually has to do with event-based programming. The language of the future will provide native support for messaging between threads. For example, I would like to be able to write this without worrying about tying up the windows message loop:
    
      cancelButton.enable(new CancelledEvent())
      try {
        while (! object.doSomething1())
        {
            object.doSomething2()
            :
            :
        }
      } catch (CancelledEvent e)
      {
         ... clean up ...
      }
    
  • A story style of programming should give less concern to exceptional conditions. I haven’t used SmallTalk but my understanding is that it drops messages it doesn’t understand. Perhaps AOP can help here, by crosscutting exception handling into the code, or perhaps more intelligent editors can mask out exception handling so that the program flow can be better understood?
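
The cancel-button wish above can be approximated today with cooperative cancellation. The Python sketch below is only that, an approximation: the worker has to poll a token between steps, whereas the pseudocode imagines the language delivering the event by itself. `CancelToken`, `long_job` and the `on_step` callback are hypothetical names; `on_step` stands in for whatever another thread, such as a button handler, would do.

```python
import threading


class CancelledEvent(Exception):
    """Raised inside the worker once cancellation has been requested."""


class CancelToken:
    def __init__(self):
        self._flag = threading.Event()

    def cancel(self):
        # Safe to call from another thread, e.g. a button click handler.
        self._flag.set()

    def check(self):
        # Called by the worker between steps of the linear storyline.
        if self._flag.is_set():
            raise CancelledEvent()


def long_job(token, steps, on_step=None):
    done = 0
    try:
        for _ in range(steps):
            token.check()
            done += 1          # stands in for object.doSomething2()
            if on_step:
                on_step(done)
        return ("finished", done)
    except CancelledEvent:
        # ... clean up ...
        return ("cancelled", done)


token = CancelToken()
# Simulate the user pressing Cancel right after the second step.
print(long_job(token, 5, on_step=lambda n: token.cancel() if n == 2 else None))
```

The main loop stays linear and the exceptional path is confined to a single `except` clause, which is the shape the pseudocode asks for.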

I don’t believe declarative programming is going to solve this little problem. The problem is declarative programming.

Clustered Hosting Solutions

Wednesday, March 14th, 2007

It seems like 2007 is going to be the year when web hosters move en masse to virtualizing their shared servers. In the past, virtual hosting was simply the practice of hosting lots of websites on a single box, and a web hoster might run tens of boxes.

With clustered hosting, every website is served by every box, using a fast SAN. This means that for shared-nothing applications like PHP, your website is able to take on bursts of traffic without everybody else on your box suffering from a “bad neighbour” effect. Ten years ago, one only had to worry about slashdotting. However, with social media on the rise, websites are so much more susceptible to traffic surges.

Offerings:

Please leave a comment if you are aware of other hosters. Thanks.

Platforms: AppLogic, Amazon EC2
Source: Netcraft - Price Competition in Grid Hosting