Export Firefox sqlite database to URL files

I wrote a new PHP class that exports my Firefox SQLite favorites database to a set of URL files.

It works great. The handy thing is that I can now version this set of URL files (using Subversion, Git and/or ClearCase) and also export from Chrome (using sync.sqlite3) and Internet Explorer into the same directory. I can also manually drag and drop things around and, if the version control system does its work, everything should stay in sync (e.g. ClearCase keeps track of moved files AND directories since each element gets a unique id, just as in Git, where a file gets a unique hash).
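A minimal sketch of the reading side of such an export, in Python rather than the PHP class itself (which is not shown here). It assumes the usual places.sqlite layout as I understand it: a moz_bookmarks table (type 1 = bookmark, type 2 = folder, fk pointing at moz_places.id) and a moz_places table holding the URL. The function name read_bookmarks is made up for this illustration.

```python
import sqlite3

# Bookmark type codes assumed for Firefox's moz_bookmarks table.
TYPE_BOOKMARK = 1
TYPE_FOLDER = 2


def read_bookmarks(conn):
    """Return (folder_path, title, url) tuples for every bookmark.

    folder_path is a list of folder titles from the root down.
    """
    # Map every folder id to (parent id, title) so paths can be rebuilt.
    folders = {
        row[0]: (row[1], row[2] or "")
        for row in conn.execute(
            "SELECT id, parent, title FROM moz_bookmarks WHERE type = ?",
            (TYPE_FOLDER,),
        )
    }

    def path_of(folder_id):
        parts = []
        seen = set()  # guards against a corrupted parent chain looping forever
        while folder_id in folders and folder_id not in seen:
            seen.add(folder_id)
            parent, title = folders[folder_id]
            if title:  # the root folder has no title and is skipped
                parts.append(title)
            folder_id = parent
        return list(reversed(parts))

    rows = conn.execute(
        "SELECT b.parent, b.title, p.url "
        "FROM moz_bookmarks b JOIN moz_places p ON p.id = b.fk "
        "WHERE b.type = ? ORDER BY b.parent, b.position",
        (TYPE_BOOKMARK,),
    )
    # Fall back to the URL when a bookmark has no title.
    return [(path_of(parent), title or url, url) for parent, title, url in rows]
```

Calling read_bookmarks(sqlite3.connect("places.sqlite")) on a copy of the profile database (Firefox locks the live file while it is running) gives the folder path, title and URL needed to lay out the directories.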

So you see the list of folders:

image

and inside each folder the list of bookmark URL files:

image

and inside each of the URL files:

image

For the favicons I decided not to put them in the main structure but in a separate favicon folder (yeah, I really like pharaohs)

image

I can now just double-click such an icon and it opens in my browser, so it works pretty much as if I opened my bookmark folder, with the difference that I can use Total Commander to manage the files and that my browser does not carry around tens of megabytes of URLs in its database :) I can now manage my URLs offline, independent of any browser, and well… carry them around.
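For reference, a small Python sketch of writing one such URL file. It assumes the common Windows Internet Shortcut layout (an [InternetShortcut] section with a URL= key); the files produced by my own export may contain other keys, and the helper names safe_name and write_url_file are made up for this illustration.

```python
import os
import re


def safe_name(title):
    """Turn a bookmark title into a file name that Windows accepts."""
    # Replace the characters Windows forbids in file names.
    cleaned = re.sub(r'[\\/:*?"<>|]', "_", title).strip(" .")
    return cleaned or "untitled"


def write_url_file(folder, title, url):
    """Write one bookmark as '<title>.url' inside folder and return its path."""
    os.makedirs(folder, exist_ok=True)
    path = os.path.join(folder, safe_name(title) + ".url")
    # newline="\r\n" makes every "\n" written below come out as CRLF.
    with open(path, "w", encoding="utf-8", newline="\r\n") as handle:
        handle.write("[InternetShortcut]\n")
        handle.write("URL=" + url + "\n")
    return path
```

Because the file is plain text, it diffs nicely in a version control system, and double-clicking it in Explorer or Total Commander opens the URL in the default browser.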

I think I will make one main dir "home" and one main dir "work" for starters; a directory per contact is the next step.

I have some TODO points on my list though which I should add:

a. The dividers could easily be created as well (using ---, ----, -----, ------), but the problem is what happens when I re-run the export. In that case I would have to check whether the divider exists and, if it does, do nothing. Not nice. I could also check for the Firefox database id, but that would make it rely too much on Firefox. I have to think about this. I think I will delete all dividers IF the directory is encountered and then recreate them, with the logic that if a source has the same directory name it probably knows about those dividers too. (Too bad there is no divider support in native Windows Explorer.)

b. I need to add export functionality as well, to a variety of export targets, e.g. a Firefox HTML page that I can load into a clean Firefox.

c. The physical storage has some dependencies, e.g. when I delete a bookmark its favicon should also be removed.

d. I would like more representations of the data, right there physically on the disk, e.g. com.google, so that in the directory "com" I find all .com domains I bookmarked.

e. I also want a representation per tag, so I can compare whether the way I stored them is in line with the way everyone else stores them. This probably means I will have to query e.g. Delicious, StumbleUpon, Alexa, DMOZ and zillions of other tag bookmark systems for the most used tags (AND related tags) and check if the way "they" propose it is in line with how I would want it.
That point brings up a million thoughts: why not just download all URLs out there, already nicely stored… and as soon as I tag one, it just becomes highlighted among the millions of URLs… A gazillion things I could do here, but I don’t have the time :)
I was also thinking: why only tags, why not check out other taxonomies as well… something to do in the future.

f. I also want to add some additional tooling like ‘check for duplicates’ and/or ‘top tags’ (to see what I find interesting).

g. I also need to add more import sources, e.g. a SQL dump from my weblog, to automatically get all URLs that I ever put in there.

h. When all of that is finished I want to wrap it in a nicer web shell with multiple representations and tools.

i. There is a workflow to follow of course: after exporting, preferably all URLs are cleared from the bookmarks, and in the next run only the "\todo" folder in the bookmarks should be exported, or something like that.

j. What I would really like is a sidebar in Firefox that is actually a view on my directory of URL files… but what I can do anyway is drag the folders containing the URL files onto the Firefox bookmark bar so that I can still easily "click".

k. Directories also contain descriptions. I will probably add a descript.ion file in the folder itself.
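Two of the points above, the per-domain representation and the duplicate check, are easy to sketch in Python. This is only an illustration under my own assumptions: the function names are invented, and the duplicate rule (ignore host case, a trailing slash, the port and the fragment) is a deliberately crude choice, not a decided design.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def domain_path(url):
    """Return host labels in reversed order, e.g. ['com', 'google', 'www']."""
    host = urlsplit(url).hostname or ""
    return [label for label in reversed(host.split(".")) if label]


def group_by_top_level(urls):
    """Group URLs under their top-level label, like the 'com' directory idea."""
    groups = defaultdict(list)
    for url in urls:
        path = domain_path(url)
        if path:
            groups[path[0]].append(url)
    return dict(groups)


def normalize(url):
    """Crude comparison key: lower-case scheme and host, no trailing slash."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    path = parts.path.rstrip("/")
    query = "?" + parts.query if parts.query else ""
    # Port and fragment are dropped on purpose in this sketch.
    return parts.scheme.lower() + "://" + host + path + query


def find_duplicates(urls):
    """Return {normalized url: [original urls]} for keys seen more than once."""
    seen = defaultdict(list)
    for url in urls:
        seen[normalize(url)].append(url)
    return {key: found for key, found in seen.items() if len(found) > 1}
```

The reversed labels from domain_path can be joined into a directory path (com/google/www), which gives the extra on-disk view without touching the main folder structure.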

P.S. No, I don’t follow the complete official URL file standard.

(new) WordPress Custom Posts and Taxonomies and custom fields and URL design: I’m lost!

A lot has already been written about URL/URI design and about the enhanced support for Custom Post Types and Custom Taxonomies in the new WordPress 3.0, so I leave it to the reader to "Google-Upon" it to find out more.

The new WordPress additions bring WordPress a step closer to a real CMS. A step closer, since these new options will enable many new WordPress applications, but they also make it apparent that at its core it is really a system for building weblogs.

In CMS systems, very roughly speaking, we have something called nodes: a node is a very generic "thing", often tied to a unique URL. So /animal/goat is the node about a goat.

To be honest, I’m currently totally LOST with regard to the architecture behind WordPress with these new custom post and taxonomy enhancements.

I’ll just take a little excerpt of what is present now:

image

So… the core thing I don’t understand:

- why is a distinction made between all these types of content when they could just as well have become the same thing?

In the example above you can decide to make a custom taxonomy for movies: e.g. if you write 100 posts about startrek1 and 100 posts about startrek2, you could make a taxonomy "movie" with the terms "startrek1" and "startrek2" to label your posts with. When you then type /movie/startrek1 you get all posts about startrek1. Sadly, when you type /movie you do not get all posts labelled with a term from that taxonomy.

In the example above you can also decide to make a custom post type "movie". You can then write a custom post of type movie called "startrek 1", and the benefit is that with /movie/startrek1 you have a unique URL to your startrek1 postingS(!) and with /movie you get all your movie posts. Nice for URL hacking.

In the example above you can also decide to make a custom field "movie" to tie to e.g. your blog postings. So you write a post about startrek1 and then fill your custom field "movie" with "startrek1". In this case you do not get any URL support…

Is there any difference here in what you try to achieve from the point of view of the social graph/URLs? Nope.

Reading through the forums you can now also see e.g. questions about comments. When you tie in BuddyPress you could even say that /movie/startrek1 could involve the forum or group you wanted on that page.

In short: both in the database storage and in the URL representation of "objects" and/or "taxonomies" and/or "whatever", I don’t understand which way all of this is heading.

Because in my view all of this is much simpler when I look at it from the viewpoint of a URL (and URL hacking). An endpoint of a URL is simply a node, whatever is on that node. It "represents" something that could just as well be the "term".

I think that (custom) fields, (custom) posts and (custom) taxonomies are all the same thing. If I were a Facebook person I would say that /movie/startrek1 is an endpoint for everything around that movie. What I would want is, say, that /contact/edward.de.leau is the node for "me" and /contact/edward.de.leau/addresses then shows all my addresses, while /address/mystreet_63_amsterdam shows my address and /address/mystreet_63_amsterdam/contacts shows all persons living at that address.
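To make the "everything is a node" idea a bit more concrete, here is a tiny Python sketch. It has nothing to do with how WordPress stores things; the class NodeStore and its methods are invented for this illustration, and it keeps everything in memory.

```python
from collections import defaultdict


class NodeStore:
    """Every URL endpoint is a node; relations are named links between nodes."""

    def __init__(self):
        self.nodes = {}                 # "/type/slug" -> dict of fields
        self.links = defaultdict(list)  # ("/type/slug", relation) -> node keys

    def add(self, node_type, slug, **fields):
        key = "/" + node_type + "/" + slug
        self.nodes[key] = fields
        return key

    def relate(self, source, relation, target, inverse):
        """Link two nodes in both directions under the given relation names."""
        self.links[(source, relation)].append(target)
        self.links[(target, inverse)].append(source)

    def resolve(self, url):
        """Return the fields for /type/slug, or related keys for /type/slug/relation."""
        parts = url.strip("/").split("/")
        key = "/" + "/".join(parts[:2])
        if key not in self.nodes:
            return None
        if len(parts) == 2:
            return self.nodes[key]
        if len(parts) == 3:
            return list(self.links.get((key, parts[2]), []))
        return None


store = NodeStore()
me = store.add("contact", "edward.de.leau")
home = store.add("address", "mystreet_63_amsterdam", city="Amsterdam")
store.relate(me, "addresses", home, inverse="contacts")

print(store.resolve("/contact/edward.de.leau/addresses"))
print(store.resolve("/address/mystreet_63_amsterdam/contacts"))
```

The point of the sketch is that a contact, an address, a movie or a term are all stored the same way, and the third URL segment is just the name of a relation to follow.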

I would expect not only the URL handling to do this (so everything is clickable) but also the database structure underneath, e.g.:

image 

And if I went even madder I could even go as far as to say:

image

Taxonomies versus Custom Post Types example 1

Suppose you have any relational database (just pick one) and you want to represent the records as custom post types of type <table_name>. No problem. Just loop through the list of tables and add them as custom post types (register_post_type('<table_name>', $args);). The fields of the records then become custom fields, which you attach to this custom post type via e.g. a nice meta box on the right-hand side.

So /wp_term_relationships/record_1 would display the contents of that first record; another one would be /wp_posts/record_23.
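The "loop through the list of tables" step can be sketched outside WordPress as well. The Python below uses SQLite as a stand-in for "whatever relational database" (WordPress itself would be PHP talking to its own database, with register_post_type doing the registration); the function names and the record_<id> URL scheme are assumptions made only for this illustration.

```python
import sqlite3


def describe_post_types(conn):
    """Map every table name to its column names (the would-be custom fields)."""
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )
    ]
    post_types = {}
    for table in tables:
        # PRAGMA table_info returns one row per column; index 1 is the column name.
        columns = conn.execute('PRAGMA table_info("%s")' % table.replace('"', '""'))
        post_types[table] = [column[1] for column in columns]
    return post_types


def record_urls(conn, table, key_column):
    """Build '/<table>/record_<id>' style URLs for every row of one table."""
    quoted_table = '"%s"' % table.replace('"', '""')
    quoted_key = '"%s"' % key_column.replace('"', '""')
    rows = conn.execute(
        "SELECT %s FROM %s ORDER BY %s" % (quoted_key, quoted_table, quoted_key)
    )
    return ["/%s/record_%s" % (table, row[0]) for row in rows]
```

Each key of the dictionary from describe_post_types would become one post type and each column one custom field; what this loop cannot express is the relation between tables.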

The problem now is how you are going to represent the relations between the tables (hence relational database). What you want to indicate is e.g. that 1 record in one table has 3 related records in another table, e.g. wp_term_relationships -> wp_posts. A nice thing is that /wp_term_relationships shows all records of that table.

Laying the relation is more difficult, since you cannot lay relations out of the box. You have two choices here, since there are two kinds of outgoing relations from a custom post type: either via custom fields or via a custom taxonomy.

1. Via custom fields: you would have to code a custom field that queries all custom posts of a certain type and lets you pick the ones that are related. In the GUI you would then have to turn them into links to the correct custom post object of the related item.

2. Via a custom taxonomy: you could create a custom taxonomy per record, e.g. wp_term_relationship_record_1, and then "tag" the related records with this custom taxonomy.

In terms of work it is the same: you have to go to the specific record and then click either the custom field relations or the custom taxonomy entries.

The advantage of the taxonomy direction is that it becomes a hyperlink and brings you to a record 1 page where it shows all related records. Unfortunately this is not the custom post record 1 page, so you have to code something for every entry to show the custom post on top (or something similar).

The advantage of the custom fields is that the items will be somewhat easier to select via your own selection system. The disadvantage is that they will not be clickable in the GUI. So you will have to write that link to the custom post object yourself AND you will have to add a loop yourself that shows all related records.

So in this example:

/wp_posts/ : shows all records out of the box IF using custom fields; shows nothing if using a taxonomy
/wp_posts/record23 : shows the record23 posting, but the related records you will have to add yourself (when using custom fields), or vice versa: shows all related records but not the custom post object of record 23 itself (when using a custom taxonomy).

I don’t know yet which is the easiest way.

One in-between solution could be to also "tag" the initial record with the custom taxonomy of the relation and then define that one to be "sticky", so that it appears on top and, when coding against it, you know which one is the "from" and which ones are the "to".

Extensive newspaper archive online: warning, addictive!

I am crazy about the online archives of newspapers, magazines and the like. Today I read about http://kranten.kb.nl : 1 million pages online and … enough food for me to wander around in for the coming months. Of course I first went looking for some well-known news events, but I also simply typed in "de leau", and quite a lot still comes back in the search results, including results like the ones below.

Advertisements in particular paint a good picture of the times.

image

image

But apart from that, just surfing around and letting the real history wash over you, without it being summarized in a history book, is a true delight: click, random, from the newspaper of 1808. Especially the detailed reporting on what happened from day to day around the time of Napoleon is an eye-opener; it brings it all so much closer. You are confronted much more with the facts of what happened from day to day, and with how much more complex the world was than when simply summarized in one paragraph.

image

It works nicely when you are reading something on Wikipedia, let’s say the page about Willem I:

image

You then read that on 25 August 1830 an opera performance in Brussels eventually led to an independent Belgium. You then search under advanced search:

image

and can then calmly follow the events from day to day in all kinds of newspapers, as if the event were happening today and you were right in the middle of it. The first paragraph probably refers to the Telegraaf of that time *grin*

image

image

A fantastic reference work, and probably a site I am going to comb through weekly to find out how things REALLY were. Because that is where the added value of this archive lies. You very often come across interpretations and summaries of events from a particular point of view, but sources like these bring you fairly close to "the complete story", which much more often is not a simple one-liner but a complex whole of events and opinions that move in a certain direction and sometimes form certain patterns that you encounter much more often. (Fans of Hari Seldon know which ultimate dream I am talking about.)

And that suddenly makes me wonder from how many Wikipedia pages 1 specific newspaper page could be linked (per word, per historical fact, per person, etc.), let alone 1,000,000.

So… it would be nice if the KB let us collectively annotate these pages with tags or other taxonomies. Just a simple tag box underneath; that in turn opens up possibilities for auto-linking based on themes and integration with "everything is linked with everything".

What we could do after that is try, based on that linked information, to auto-detect patterns so that we might approach history in a more mathematical way.

The idea behind it is that we do not really understand yet how 1 human works, but… by means of psychology, the cognitive sciences and the like we can reach a certain percentage of understanding.

Group dynamics and the like describe patterns in how groups of people exhibit certain behavior, and of course that scales further along other sciences, including the influence of culture and of the times (if those cannot already be generalized and, as it were, filtered out of the equation).

If we can match historical patterns with features and patterns found in the behavioral sciences, then I think we should be able to get quite far in a newly formed science, and we do not have to wait until 12,000 GE for a Mr. Seldon to work it out for us.

I once asked a "history prof" whether it would ever be possible to approach history "mathematically" by means of pattern based behavior recognition … hey…

image

Hey… I’m the first person in the world to mention pattern based behavior recognition (PBBR)!

Cool… I have a new hobby: the science of pattern based behavior recognition. Which university offers me a lifetime job?

Basically it is cognitive science (which already is a multi-disciplinary science) with the addition of history, so that you don’t need to create models or do tests and experiments, but can do pure feature detection on historical records and verify the outcome against behavior that has already happened. So if a group of people in a certain condition always behaves in a certain way, we could verify that with historical records. The goal is to automate these detections and no longer have historians perform "parallel behavior search and write article about it".

It also allows influencing groups of people on a massive scale by just fine-tuning some parameters, and it probably overlaps at a higher level with the science behind politics and marketing, although it takes a more holistic approach.

(While typing this I read "Is marketing science?", a really interesting read.)

New Version of Windows Live Writer but not the WLW Wave 4!

There are about 10 applications out there which I use A LOT. One of them is Windows Live Writer, the blogging tool from Microsoft. There haven’t been updates for some time though.

Today I got a message that a new version is available:

image

Hurrah! However… when I click for more information I end up at a blog which seems not to have been updated since 2009. It also seems to be very hard to find any release notes or "what’s new" information.

What actually happened to WLW? Did Microsoft abandon it?

When I download the installer it tells me an error occurred and that I need to download the full installable package. It is 135 MB, so let’s see if it has a different version number. Currently I am running:

image

The installable is dated 24-05-2010, but that is probably because it takes the date of download; let’s check what’s in it.

The first thing I notice is that there IS a new copyright statement dated March 2010. I know I should read all of it but it is simply too long to go through. I bet there are quite a lot of websites which have the manpower to go through each and every change and make a nice story about it but uh… I don’t.

I just read quickly through it and well… I agree. I understand there is no warranty and I should not spam. I understand that they want to auto update it.

image

There is a lot to install, but I don’t want Messenger (I never message), I don’t want Mail (unless it syncs contacts with my PocketPC and Hotmail account; I need to dive into this), I don’t want Photo Gallery since I use my own photo app, I don’t want the Toolbar since I already have one million toolbars cluttering my browser, I don’t want Family Safety since I don’t know what it is, and I don’t want the Microsoft Office Live Add-in since I mainly use Symphony.

It is nice, however, to see the complete set of Live offerings, and it is nice to see the components that are installed along with it, since it is a checklist of things I have to dive into when I have time. It now needs 25,7 MB for installation, which is OK.

Oh… I now need to close WLW to continue with the installation… :) so let’s close this and then continue my post…

… continuing … unfortunately… the copyright says 2009 … :(

image

And as you can see… I DO actually have a newer build…but still version 14 and not 15 …. and I now have "en" instead of "nl"… oops.

image

Let’s check if this project is still alive…

http://help.live.com/help.aspx?mkt=en-us&project=WL_Writerv3&querytype=keyword&query=qaf
The help does not give a link to release notes.

http://windowslivewriter.spaces.live.com/
The help points to the developer team link, but that just links to the blog on which no new updates have appeared since July 2009.

On the forums: http://social.microsoft.com/Forums/en-US/writerbeta/thread/5a771be8-e28e-4c55-b9f5-1d9e6bde95d7 I read something about "Windows Live Wave 4" … hmm… never heard of it. Let’s do some search on this.

Hey… now I find http://www.live-writer.net/ , a Windows Live Writer weblog that also has some news on Windows Live Wave 4.

Some differences between the upcoming version and this version: http://www.lehsys.com/2010/02/windows-live-writer-2010-just-minor-updates/

and some screenshots: http://www.live-writer.net/2010/02/21/new-wlw-wave4-screenshots/

Ah … well… let’s wait some more :)