partiallydisassembled.net

Clean tax

2007-02-12 13:40:32

I propose a carbon tax. Any politician, be they Liberal, Labour, Republican or Democrat, who mentions "clean coal" without a clarifying adjective such as "poisonous", "polluting", "environmentally unsound", "unproven" or "non-existent" must pay a tax. The amount they pay must be in proportion to the number of voters they mislead, and to the amount of "real" carbon combusted by said voters who gain false hope that such a technology will save their children. It shouldn't take a left-wing liberal pinko to see that pumping a poisonous gas into the ground or ocean is a somewhat risky environmental move, considering how well we went with merely dispersing it into the atmosphere. Thankfully Howard has clarified his position: he will not support any emissions scheme "that burdens our industries whilst allowing others that are less efficient and greater polluters to get competitive advantage." Thank goodness!

Schema-less database

2007-01-28 19:17:39

This is a half-thought-out idea. If it's been done before, please tell me so I can see how it ends ;-)

The problem with relational (and object) databases is that the schema is relatively fixed. The reasoning is that business processes change far more frequently than storage needs. I've always thought that this is backwards, and that altering table schemas is the biggest, hardest, most common refactor to apply to an application.

Forget everything you know about relational (and object) storage. The world consists of *items*, which have *properties* and optional *values*. Nothing groundbreaking here. A property's name is in fact another item (allowing for metadata on properties), as is the optional value. A relational mapping might look like:

    TABLE items
        item_id (PK)
        data (string)

    TABLE mappings
        mapping_id (PK)
        item (FK -> items.item_id)
        property (FK -> items.item_id)
        value (FK -> items.item_id or NULL)

A book might be an item with properties:

* 'book' (no value, just tells us that it's a book)
* 'title' (data of value is string, or properties of value could break into subtitle, etc)
* 'author' (data of value might be unused; value has properties like 'name', 'address')

Some operations you can perform on a populated database:

    # set of all items
    I() =
        SELECT DISTINCT I.item_id
        FROM items I, mappings m
        WHERE I.item_id = m.item

    # set of all items with an 'author' property
    I(author) =
        SELECT DISTINCT I.item_id
        FROM items I, mappings m, items P
        WHERE I.item_id = m.item AND P.item_id = m.property
          AND P.data = 'author'

    # set of all items where 'author' = 'Alice'
    I(author='Alice') =
        SELECT DISTINCT I.item_id
        FROM items I, mappings m, items P, items V
        WHERE I.item_id = m.item AND P.item_id = m.property
          AND V.item_id = m.value AND P.data = 'author'
          AND V.data = 'Alice'

    # set of all items where author's first name is 'Alice'
    I(author.firstname='Alice') =
        SELECT DISTINCT I1.item_id
        FROM items I1, mappings m1, items P1, items V1,
             mappings m2, items P2, items V2
        WHERE I1.item_id = m1.item AND P1.item_id = m1.property
          AND V1.item_id = m1.value AND P1.data = 'author'
          AND V1.item_id = m2.item AND P2.item_id = m2.property
          AND V2.item_id = m2.value AND P2.data = 'firstname'
          AND V2.data = 'Alice'

    # set of values of 'title' property on all items
    I().title

    # titles of all books by Alice
    I(author='Alice').title

    # books I have purchased
    I(purchaser='Alex')

    # other books by authors I have purchased
    I(author=I(purchaser='Alex').author)

    # other purchasers of books by authors I have purchased
    I(author=I(purchaser='Alex').author).purchaser

    # books that were purchased by people who've bought a book by an author
    # of a book I've purchased (recommendations)
    I(purchaser=
      I(author=
        I(purchaser='Alex').author
      ).purchaser)

Properties can also be treated as user-added tags...

    # set of all properties
    P() =
        SELECT DISTINCT P.item_id
        FROM items P, mappings m
        WHERE P.item_id = m.property

    # set of all properties defined on all items (excludes orphaned
    # properties)
    P(I())

    # set of all properties defined on items that have a 'howto' property
    # (other tags of howtos)
    P(I(howto))

    # set of items that have at least one of the properties defined on the set
    # of items that have a 'howto' property
    # (items related to howtos by their tags -- e.g. catches items tagged
    # "howto", "guide", "faq", "manual")
    I(P(I(howto)))

Metadata on properties:

    # Set of all properties of type 'int'
    I(P(type='int'))

    # Properties that are actually tags
    P(tag)

    # More thought on notation needed here..
    # Meta-properties can form property hierarchies.

Some consequences:

* There is no schema defined in the database. Applications define their own schema and can redefine it at any time.
* Applications can share data within a database, so long as there are no name conflicts. "Well-behaved" applications could add an "application" property to items and a meta-property to properties to effectively isolate the items and properties they are responsible for.
* Properties and meta-properties make generic database explorers/editors very feasible. The related-tag queries shown above can be used to suggest properties to add to items.
* "Everything is a set" may be limiting, but I suspect not.
* Whether to store information in the "data" attribute or in separate properties is a confusing point, similar to the XML element/attribute problem. I can't see any solution; the convention would be that if an item has properties, 'data' is a summary (a sort of __repr__).
* The generated SQL queries get quite large very quickly. They can be simplified if meta-properties are not supported (3-table model). Or, perhaps a relational backend should not be used.

Some possible extensions:

* History / version tracking is easily applied to the Items table. Notation could become quite a bit more complex, though.
* Explicit typing can also be applied to the 'data' field of the Items table. A convention of appending the type name to the property name would simplify notation: I(user.birthday_date=today)
* Explicit user / access controls. These could also be applied using properties, but limiting access to the ACL properties is problematic.
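The two-table layout and the first few I(...) queries can be tried directly. The sketch below is a hypothetical illustration, not an existing implementation: it uses Python's built-in sqlite3 module with an in-memory database, and the helper names (add_item, property_item, set_property, items_with, items_where, items_where_path) and sample data are invented. It assumes a property is found by its name string among items already used as properties. Each helper returns a Python set of item_ids, mirroring the "everything is a set" reading of the notation.

```python
import sqlite3

SCHEMA = """
CREATE TABLE items (
    item_id INTEGER PRIMARY KEY,
    data    TEXT
);
CREATE TABLE mappings (
    mapping_id INTEGER PRIMARY KEY,
    item       INTEGER NOT NULL REFERENCES items(item_id),
    property   INTEGER NOT NULL REFERENCES items(item_id),
    value      INTEGER REFERENCES items(item_id)  -- NULL for bare tags such as 'book'
);
-- Index the id columns and the data column.
CREATE INDEX items_data        ON items(data);
CREATE INDEX mappings_item     ON mappings(item);
CREATE INDEX mappings_property ON mappings(property);
CREATE INDEX mappings_value    ON mappings(value);
"""


def add_item(db, data=None):
    """Insert an item and return its item_id."""
    return db.execute("INSERT INTO items (data) VALUES (?)", (data,)).lastrowid


def property_item(db, name):
    """Return the item naming property `name`, creating it on first use (assumption)."""
    row = db.execute(
        "SELECT DISTINCT p.item_id FROM items p"
        " JOIN mappings m ON m.property = p.item_id WHERE p.data = ?",
        (name,),
    ).fetchone()
    return row[0] if row else add_item(db, name)


def set_property(db, item, name, value=None):
    """Attach property `name`, with an optional value item, to `item`."""
    db.execute(
        "INSERT INTO mappings (item, property, value) VALUES (?, ?, ?)",
        (item, property_item(db, name), value),
    )


def items_with(db, name):
    """I(name): items that have property `name`."""
    rows = db.execute(
        """SELECT DISTINCT m.item
           FROM mappings m JOIN items p ON p.item_id = m.property
           WHERE p.data = ?""",
        (name,),
    )
    return {r[0] for r in rows}


def items_where(db, name, data):
    """I(name='data'): items whose `name` value has exactly this data."""
    rows = db.execute(
        """SELECT DISTINCT m.item
           FROM mappings m
           JOIN items p ON p.item_id = m.property
           JOIN items v ON v.item_id = m.value
           WHERE p.data = ? AND v.data = ?""",
        (name, data),
    )
    return {r[0] for r in rows}


def items_where_path(db, name, sub, data):
    """I(name.sub='data'): follow `name` to its value item, then match `sub` there."""
    rows = db.execute(
        """SELECT DISTINCT m1.item
           FROM mappings m1
           JOIN items p1 ON p1.item_id = m1.property
           JOIN mappings m2 ON m2.item = m1.value
           JOIN items p2 ON p2.item_id = m2.property
           JOIN items v2 ON v2.item_id = m2.value
           WHERE p1.data = ? AND p2.data = ? AND v2.data = ?""",
        (name, sub, data),
    )
    return {r[0] for r in rows}


# Hypothetical sample data: two books, each with an author item.
db = sqlite3.connect(":memory:")
db.executescript(SCHEMA)
alice = add_item(db, "Alice Smith")
set_property(db, alice, "firstname", add_item(db, "Alice"))
bob = add_item(db, "Bob Jones")
set_property(db, bob, "firstname", add_item(db, "Bob"))
book1 = add_item(db, "Book One")
set_property(db, book1, "book")
set_property(db, book1, "author", alice)
book2 = add_item(db, "Book Two")
set_property(db, book2, "book")
set_property(db, book2, "author", bob)
```

Here the author items carry full names as their data, so `items_where(db, "author", "Alice Smith")` and `items_where_path(db, "author", "firstname", "Alice")` both find the first book, while a bare `"Alice"` only matches through the path query. The DISTINCT keeps results set-like when an item has a property more than once.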

Richard writes:

Interesting idea. I'd have thought a big consequence would be speed due to the lack of indexes. I(author) isn't valid Python (unless author is actually a declared variable, but that's not what you mean in your example). You could use I(author=all) (where "all" is the builtin). The Roundup "anydbm" hyperdb backend is actually kinda similar to this, except that it does have a schema. It doesn't really need one (the schema can change on a whim with no impact on the data stored in anydbm), except that the schema enforces property types and also allows many-to-many relationships (though there's no reason your proposal couldn't handle M2M). Since it also lets you define link and key properties, you get indexes.

Andy Todd writes:

See RDF ;-)

The problems with this approach are:

- How do you summarise information in your database (counts, sums and other calculations)?
- How do you manage change? If you want to change a database schema it is (as you rightly point out) a hard job and has to be applied to all of the instances (rows) of a class (table) or none. With this hybrid approach, when you want to change something you can have instances of nominally the same class with different attributes, which can confuse.
- Speed. But Richard's got that covered.

Alex writes:

Cool, thanks for the feedback. At this stage I'm just curious if I'm touching on prior art (scouting for PhD ideas).

Had thought about RDF, but I think this is sufficiently different to warrant its own investigation. Specifically, I'm _not_ encoding any information about the type of relationship, or trying to infer anything about relationships, within the DB, which is pretty much the point of RDF (AFAICT).

Re: notation errors: was not intended to look Pythonic, it just happened that way ;-)

Re: speed: firstly, I'm not yet saying a relational backend is ideal, and there may be ways to index and store data in this format more efficiently. Even with relational, the "data" field can be indexed (either for short strings, i.e. property names only, or full-text search), as can all the id fields, so I can't see that each join or select is going to be any slower than in a relational db; there are just more joins. Since everything's a set, aggregates are easy (both in notation and to compute):

    number_of_books = count(I(book))

Ok, it's obviously not fully thought through yet. I suspect I can get around the change issue completely though. No schema, not even implicit. The application defines and uses the properties and items it wants to. If an item doesn't conform (checked on access), then it's not returned in the query, so (to the application) it doesn't exist. Perhaps namespaces should be more prominent, but I'd really like to avoid the idea of a "class", "table", etc. If an item has all the required properties, why shouldn't it be treated as that class? (If you disagree, you're not a Python programmer ;-) )
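The "everything is a set" argument, including count(I(book)) and the nested purchase queries from the post, can be sketched in memory with no relational backend at all. This is a hypothetical toy rather than the proposed system: Item, ItemSet, Store and the add/I methods are invented names, property names are plain strings instead of items, and the builtin `all` marks "has this property", following Richard's I(author=all) suggestion.

```python
class Item:
    """Hypothetical item: optional data plus a mapping of property -> values."""

    def __init__(self, data=None):
        self.data = data
        self.props = {}  # property name -> set of value Items (empty set = bare tag)

    def add(self, name, value=None):
        values = self.props.setdefault(name, set())
        if value is not None:
            values.add(value)
        return self  # allow chaining


class ItemSet(frozenset):
    """A set of items; `s.name` is the set of all values of property `name`."""

    def __getattr__(self, name):
        if name.startswith("__"):  # leave Python's special-method lookups alone
            raise AttributeError(name)
        return ItemSet(v for item in self for v in item.props.get(name, ()))


class Store:
    def __init__(self):
        self.items = []

    def new(self, data=None):
        item = Item(data)
        self.items.append(item)
        return item

    def I(self, **conditions):
        """Items having every named property. A condition may be `all`
        (presence only), a string (matched against a value's data) or an
        ItemSet (matched by membership)."""

        def matches(item, name, wanted):
            if name not in item.props:
                return False  # a non-conforming item is simply not returned
            if wanted is all:
                return True
            values = item.props[name]
            if isinstance(wanted, ItemSet):
                return not values.isdisjoint(wanted)
            return any(v.data == wanted for v in values)

        return ItemSet(
            item
            for item in self.items
            if item.props and all(matches(item, n, w) for n, w in conditions.items())
        )


store = Store()
alice, bob = store.new("Alice"), store.new("Bob")
alex, carol = store.new("Alex"), store.new("Carol")


def book(title, author, purchaser):
    """Hypothetical helper: a 'book' item with title, author and purchaser."""
    return (store.new().add("book").add("title", store.new(title))
            .add("author", author).add("purchaser", purchaser))


b1 = book("Title 1", alice, alex)
b2 = book("Title 2", alice, carol)
b3 = book("Title 3", bob, carol)

number_of_books = len(store.I(book=all))  # count(I(book))
titles_by_alice = {t.data for t in store.I(author="Alice").title}
recommendations = store.I(
    purchaser=store.I(author=store.I(purchaser="Alex").author).purchaser
)
```

Because every query returns a set, count becomes `len`, and an item missing a required property is skipped at query time rather than rejected by a schema.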

Peter writes:

sounds a lot like XML to me

Aaron Sorkin Character Development 101

2007-01-25 21:03:40

Include one young, attractive, skinny, female secondary character with the appetite of a sumo wrestler in training and capitalise on the comedy of this in every scene they are in. Ensure plentiful supply of donuts, waffles and salad additives.

writes:

Have you got Studio 60?

Rachel writes:

that was me.

New layout structure and algorithms

2007-01-14 21:34:02

The last four refactors of the layout package didn't satisfy me; I had to rewrite the whole thing one more time to achieve true happiness. The reason for this rewrite (not just a refactor: most of the classes and algorithms are all-new) was to allow for changes to the document tree at runtime, in particular changes to element style, such as support for the :hover pseudo-class.

The new layout package drops Boxes: the Frame tree is constructed directly from the Content (document) tree, and the Style tree is constructed at the same time. Frame continuations are reconstructed in response to resize events, but are otherwise unaffected (the previous implementation reconstructed the Frame tree from the Box tree after a resize).

The content tree is created incrementally from a source file, allowing for the possibility of incremental load/display (useful for loading web pages off a network, but not much else).

Changes to the content tree, including changes to an element's style such as the addition of a pseudo-class, are bounced off the Document onto all DocumentListeners. The only current DocumentListener is the DocumentView (sound familiar?), which then ensures the frame tree is kept up to date. Only affected frames are reflowed (descendants and ancestors, no siblings or cousins), but there is not yet any optimisation for avoiding reflow when the size is unaltered. This won't be impossible to add though. **Update:** It does the optimisation now.

Thanks to the style tree, memory usage should be way down, as style is shared between frames where possible. Computed style is also cached: on the style node (hence shared) when possible, or on the frame if not.

All these changes happen to make for faster construction, faster reflow and faster drawing than before:

* 7-12x faster document construction
* 80-150x faster resize reflow (really!)
* 100x faster drawing (0.03ms for the xhtml.py example now, still without viewport culling)

I also implemented a far better HTML content constructor that infers missing begin and end tags according to a sort of DTD. I'll try to document this in the next week or so, as I think there are some pretty interesting algorithms involved.
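The partial-reflow rule described in this post (reflow the changed frame and its descendants, then walk up through its ancestors, stopping as soon as a size is unaltered) can be sketched roughly as follows. This is not the layout package's code: apart from the Document, DocumentListener and DocumentView roles named in the post, every class, method and the toy style rule (":hover doubles an element's padding", heights only, no widths or line breaking) is an assumption made for illustration.

```python
class Element:
    """Hypothetical content-tree node with a minimal style."""

    def __init__(self, name, children=()):
        self.name = name
        self.padding = 1
        self.pseudo_classes = set()
        self.children = list(children)


class Document:
    """Holds the content tree and forwards changes to its listeners."""

    def __init__(self, root):
        self.root = root
        self.listeners = []

    def add_pseudo_class(self, element, pseudo):
        element.pseudo_classes.add(pseudo)
        for listener in self.listeners:  # bounce the change to every listener
            listener.on_style_changed(element)


class Frame:
    """Hypothetical frame: its height is its own padding plus its children's."""

    def __init__(self, element, parent=None):
        self.element = element
        self.parent = parent
        self.children = [Frame(child, self) for child in element.children]
        self.height = 0

    def own_height(self):
        # Toy style rule (an assumption): :hover doubles the padding.
        pad = self.element.padding
        return pad * 2 if "hover" in self.element.pseudo_classes else pad

    def resize_from_children(self):
        # Recompute this frame's height from cached child heights (no child reflow).
        self.height = self.own_height() + sum(c.height for c in self.children)

    def reflow(self, log):
        """Fully reflow this frame and all of its descendants."""
        log.append(self.element.name)
        for child in self.children:
            child.reflow(log)
        self.resize_from_children()


class DocumentView:
    """Document listener that keeps the frame tree up to date."""

    def __init__(self, document):
        self.root_frame = Frame(document.root)
        self.frames = {}  # element -> frame
        self._index(self.root_frame)
        self.log = []  # names of frames touched by the last update
        self.root_frame.reflow(self.log)
        document.listeners.append(self)

    def _index(self, frame):
        self.frames[frame.element] = frame
        for child in frame.children:
            self._index(child)

    def on_style_changed(self, element):
        self.log = []
        frame = self.frames[element]
        old_height = frame.height
        frame.reflow(self.log)  # the changed frame and its descendants
        changed = frame.height != old_height
        ancestor = frame.parent
        # Walk up only while sizes keep changing; siblings are never reflowed.
        while changed and ancestor is not None:
            old_height = ancestor.height
            self.log.append(ancestor.element.name)
            ancestor.resize_from_children()
            changed = ancestor.height != old_height
            ancestor = ancestor.parent


span = Element("span")
p1, p2 = Element("p1", [span]), Element("p2")
html = Element("html", [Element("body", [p1, p2])])
doc = Document(html)
view = DocumentView(doc)
doc.add_pseudo_class(span, "hover")
print(view.log)  # ['span', 'p1', 'body', 'html']
```

The sibling `p2` is never touched, and repeating the same change reflows only `span`, because its height comes out unaltered and the walk up the ancestors stops immediately.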

Relearning trombone, day 1

2007-01-12 20:53:17

After six or seven years in a cupboard:

* both trombones are in good condition, slides needed no work, surprisingly
* biggest difficulty is in finding center of pitch; have lost ear for each harmonic
* and also in attacking cleanly
* range at the beginning of the day was low Bb to Bb above it
* at end of day (30 mins of practice, in 2 blocks) from pedal F to double-high F (but I can't pitch them out of thin air yet)
* left wrist/thumb is very sore

Alto sounds better than tenor (probably the smaller mouthpiece), but I'll probably leave it alone for a few weeks until I have things under control with the tenor.