Importing Delicious Bookmarks Into Drupal, or the Cloud is Disappearing

"Many":http://www.readwriteweb.com/archives/deliciouss_data_policy_is_like_setting_a_museum_on.php have "written about":http://www.salon.com/technology/silicon_valley/index.html?story=/tech/dan_gillmor/2010/12/17/yahoo_shuttering_bookmarks_service the impending demise of "delicious":http://delicious.com, one of the first so-called "Web 2.0" companies, and one of the few cloud-based services that I actually use. delicious was useful because of its simplicity, with a pared-down interface, clean bookmarklets or browser plugins, and the ability to easily tag new bookmarks. I used to use it extensively to find new sites, but stopped doing that a few years ago. These days it was basically a way to store bookmarks in a place that was accessible anywhere I had an internet connection. But as we have expected for a while, and are witnessing at the moment, the cloud is disappearing, its solidity nothing but a mirage. We are in for trouble in the future, methinks, because so many institutions, "including universities":http://www.hastac.org/blogs/nknouf/wikileaks-broadcast-internet-and-importance-new-media-assemblages, have moved their online activities from local hosting to this mythical "cloud". The point of this post is not to go into the deeper issues here, but to explain, in some detail, how you can setup a delicious-like interface on your own server. I'm going to be using "Drupal":http://drupal.org, one of the most well-known free software content management systems (CMS). Doing this requires a certain amount of technical ability, and I won't be able to go into that here. But if you're interested, read on. h2. Creating the content type While there are Drupal modules for creating "weblinks":http://drupal.org/project/weblinks, in my brief testing I found them to be both too cumbersome, and too limited for what I wanted to do. In any event I wanted to see how easy it might be to do this using the tools Drupal provides. 
It turns out that we can easily create a Bookmarks type using CCK. The two additional fields I created are a "URL" and a "Post Date" (so that we can ensure we have the correct dates for the bookmarks when we import our data below). The first is of type "text" and the second is of type "date". The exported CCK code can be found below.

*Update*: By using the "Unique Field":http://drupal.org/project/unique_field module, we can require that the URL be unique, redirecting you to the existing bookmark if desired. This helps prevent posting identical bookmarks, and might remind you to look through your bookmarks if you keep saving things that you've already saved... not that that just happened to me.

h2. Taxonomies

I decided to create a separate vocabulary for my bookmark tags, and edited that vocabulary so that it applies to the Bookmarks content type. Make a note of the vocabulary ID, as we will need it later.

h2. Services

To easily import our bookmarks, it's best to use Drupal's "Services":http://drupal.org/project/services module. Using Services you can add nodes, get taxonomy items, and so on using XML-RPC or JSON. Setting up and configuring Services is beyond the scope of this post, but the documentation is actually fairly good.

h2. Delicious import

Next, we need to import our bookmarks. To begin, we need to _export_ our bookmarks from Delicious. The command to do this, on Mac OS X or Linux, is:
curl https://<username>:<password>@api.del.icio.us/v1/posts/all > bookmarks.xml
where <username> is your Delicious username and <password> is your Delicious password. Your bookmarks will then be downloaded and saved in the @bookmarks.xml@ file.

We're now ready to parse this XML file and post its contents to our new Bookmarks content type. I wrote a quick-and-dirty Python script called @parseDelicious.py@ to do this; you can find it at the bottom of the page. It requires the @lxml@ library for @etree@. The script is probably highly inefficient, especially in the parts that check whether or not existing tags are in your Drupal taxonomy. Its options are explained by calling it with the @-h@ switch. To run:
python parseDelicious.py -s <server> -u <username> -p <password> -v <vid> -f <bookmarks xml>
Here <server> is the base URL of the Drupal server, including the @http://@ prefix but *without* @/services/xmlrpc@ (the script appends that path itself), <username> is the username of a user on the Drupal server, <password> is that user's password, <vid> is the numeric vocabulary ID of the desired vocabulary for your Delicious tags, and <bookmarks xml> is the path to the XML file just downloaded. Depending on how many bookmarks you have, this might take a _long_ time.

h2. Creating a View of your Bookmarks

Next, let's create a simple view that will give us our bookmarks in a format that is close to that of Delicious. Here we use the Views module, as well as Semantic Views to allow us to style the resulting data. I've designed the view to make it easy to style in CSS. Basically the logic is that we pull the node title, the CCK post date, the CCK URL, the node body, and the taxonomy terms. We then link the node title to the CCK URL, create useful classes on each element, set up a pager, and set our CSS classes. The exported view is at the bottom of the page. It is also possible to create a tag cloud of all our bookmark tags using the Tagadelic module.

h2. Creating a bookmarklet for posting

Finally, we can create a simple bookmarklet for posting to your site. To do this we need the "Prepopulate":http://drupal.org/project/prepopulate module. The bookmarklet code below is modified slightly from the example given in the documentation for the module.

notextile.
javascript:u=document.location.href;t=document.title;s=window.getSelection();
void(window.open(%22http://example.com/node/add/bookmarks?edit[title]=%22+encodeURIComponent(t)+
'&edit[body_filter][body]='+encodeURIComponent(s)+'&edit[field_url][0][value]='+encodeURIComponent(u),'_blank',
'width=1024,height=500,status=yes,resizable=yes,scrollbars=yes'));
Simply change "example.com" to the name of your host. The parameter names are based on the content type we created above. You can see the bookmarks that I've imported from Delicious, with minimal CSS formatting, "here":http://zeitkunst.org/bookmarks.

h2. Where's the social?

Of course this method means that we only host the bookmarks on our own site, thus losing all of the social capabilities of delicious. The challenge for the future, given the disappearance of the cloud, is how to store data locally while still being able to access it in a distributed way. There is no reason that something like Delicious could not be implemented using a "distributed hash table":https://secure.wikimedia.org/wikipedia/en/wiki/Distributed_hash_table, thus preventing any one company from being a single point of failure. It seems like it would be possible, then, to write a program that would store bookmarks as hashes in this table, and then build on top of that the necessary metadata such as tags, users, groups, etc. This would parallel the development of "thimbl":http://www.thimbl.net/, and could also use public-key encryption to provide security. But this is a project for more than a single day; if you are interested, please let me know.

h2. Code

h3. Bookmarks Content Type

notextile.
$content['type']  = array (
  'name' => 'Bookmarks',
  'type' => 'bookmarks',
  'description' => 'Trying to be a delicious replacement...',
  'title_label' => 'Name',
  'body_label' => 'Notes',
  'min_word_count' => '0',
  'help' => '',
  'node_options' =>
  array (
    'status' => true,
    'promote' => true,
    'sticky' => false,
    'revision' => false,
  ),
  'upload' => '1',
  'pingback' => '0',
  'community_tags_display' => '1',
  'old_type' => 'bookmarks',
  'orig_type' => '',
  'module' => 'node',
  'custom' => '1',
  'modified' => '1',
  'locked' => '0',
  'content_profile_use' => 0,
  'comment' => '0',
  'comment_default_mode' => '4',
  'comment_default_order' => '1',
  'comment_default_per_page' => '50',
  'comment_controls' => '3',
  'comment_anonymous' => '0',
  'comment_subject_field' => '1',
  'comment_preview' => '1',
  'comment_form_location' => '0',
  'ant' => '0',
  'ant_pattern' => '',
  'ant_php' => 0,
);
$content['fields']  = array (
  0 =>
  array (
    'label' => 'Post date',
    'field_name' => 'field_post_date',
    'type' => 'date',
    'widget_type' => 'date_popup',
    'change' => 'Change basic information',
    'weight' => '-4',
    'default_value' => 'now',
    'default_value2' => 'same',
    'default_value_code' => '',
    'default_value_code2' => '',
    'input_format' => 'j F Y - g:ia',
    'input_format_custom' => '',
    'year_range' => '-3:+3',
    'increment' => '1',
    'advanced' =>
    array (
      'label_position' => 'above',
      'text_parts' =>
      array (
        'year' => 0,
        'month' => 0,
        'day' => 0,
        'hour' => 0,
        'minute' => 0,
        'second' => 0,
      ),
    ),
    'label_position' => 'above',
    'text_parts' =>
    array (
    ),
    'description' => '',
    'group' => false,
    'required' => 1,
    'multiple' => '0',
    'repeat' => 0,
    'todate' => '',
    'granularity' =>
    array (
      'year' => 'year',
      'month' => 'month',
      'day' => 'day',
      'hour' => 'hour',
      'minute' => 'minute',
      'second' => 'second',
    ),
    'default_format' => 'medium',
    'tz_handling' => 'site',
    'timezone_db' => 'UTC',
    'op' => 'Save field settings',
    'module' => 'date',
    'widget_module' => 'date',
    'columns' =>
    array (
      'value' =>
      array (
        'type' => 'varchar',
        'length' => 20,
        'not null' => false,
        'sortable' => true,
        'views' => true,
      ),
    ),
    'display_settings' =>
    array (
      'label' =>
      array (
        'format' => 'above',
        'exclude' => 0,
      ),
      'teaser' =>
      array (
        'format' => 'default',
        'exclude' => 0,
      ),
      'full' =>
      array (
        'format' => 'default',
        'exclude' => 0,
      ),
      4 =>
      array (
        'format' => 'default',
        'exclude' => 0,
      ),
      2 =>
      array (
        'format' => 'default',
        'exclude' => 0,
      ),
      3 =>
      array (
        'format' => 'default',
        'exclude' => 0,
      ),
      'token' =>
      array (
        'format' => 'default',
        'exclude' => 0,
      ),
    ),
  ),
  1 =>
  array (
    'label' => 'URL',
    'field_name' => 'field_url',
    'type' => 'text',
    'widget_type' => 'text_textfield',
    'change' => 'Change basic information',
    'weight' => '-3',
    'rows' => 5,
    'size' => '60',
    'description' => '',
    'default_value' =>
    array (
      0 =>
      array (
        'value' => 'http://',
        '_error_element' => 'default_value_widget][field_url][0][value',
      ),
    ),
    'default_value_php' => '',
    'default_value_widget' =>
    array (
      'field_url' =>
      array (
        0 =>
        array (
          'value' => 'http://',
          '_error_element' => 'default_value_widget][field_url][0][value',
        ),
      ),
    ),
    'group' => false,
    'required' => 1,
    'multiple' => '0',
    'text_processing' => '0',
    'max_length' => '',
    'allowed_values' => '',
    'allowed_values_php' => '',
    'op' => 'Save field settings',
    'module' => 'text',
    'widget_module' => 'text',
    'columns' =>
    array (
      'value' =>
      array (
        'type' => 'text',
        'size' => 'big',
        'not null' => false,
        'sortable' => true,
        'views' => true,
      ),
    ),
    'display_settings' =>
    array (
      'label' =>
      array (
        'format' => 'above',
        'exclude' => 0,
      ),
      5 =>
      array (
        'format' => 'default',
        'exclude' => 0,
      ),
      'teaser' =>
      array (
        'format' => 'default',
        'exclude' => 0,
      ),
      'full' =>
      array (
        'format' => 'default',
        'exclude' => 0,
      ),
      4 =>
      array (
        'format' => 'default',
        'exclude' => 0,
      ),
      2 =>
      array (
        'format' => 'default',
        'exclude' => 0,
      ),
      3 =>
      array (
        'format' => 'default',
        'exclude' => 0,
      ),
      'token' =>
      array (
        'format' => 'default',
        'exclude' => 0,
      ),
    ),
  ),
);
$content['extra']  = array (
  'title' => '-5',
  'body_field' => '-1',
  'revision_information' => '1',
  'author' => '0',
  'options' => '2',
  'comment_settings' => '7',
  'menu' => '6',
  'taxonomy' => '-2',
  'path' => '5',
  'attachments' => '4',
  'path_redirect' => '3',
);
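
As a rough illustration of how these two fields get filled during the import, here is a minimal Python 3 sketch that maps one Delicious @<post>@ element onto a node dictionary with the same keys the import script at the bottom of the page uses. It relies on the standard library's @xml.etree.ElementTree@ instead of @lxml@; the sample XML and the helper name @post_to_node@ are invented for the example, and it is not the script itself.

```python
import time
import xml.etree.ElementTree as ET

# Made-up sample in the shape of a Delicious posts/all export
SAMPLE = """<posts user="someone">
  <post href="http://drupal.org/" description="Drupal" extended="A CMS"
        tag="cms drupal" time="2010-12-17T09:30:00Z"/>
</posts>"""


def post_to_node(post):
    """Map one <post> element onto the Bookmarks node fields (hypothetical helper)."""
    # Delicious timestamps end in "Z" (UTC); match it as a literal character
    posted = time.strptime(post.get("time"), "%Y-%m-%dT%H:%M:%SZ")
    return {
        "type": "bookmarks",
        "status": 1,
        "title": post.get("description"),
        "body": post.get("extended"),
        "field_url": [{"value": post.get("href")}],
        "field_post_date": [{
            "value": {
                "date": time.strftime("%d %B %Y", posted),
                "time": time.strftime("%I:%M%p", posted),
            },
            "timezone": "UTC",
            "date_type": "date",
        }],
        # split() without arguments ignores repeated spaces and empty tag lists
        "tags": (post.get("tag") or "").split(),
    }


root = ET.fromstring(SAMPLE)
node = post_to_node(root.find("post"))
print(node["title"], node["field_url"][0]["value"], node["tags"])
```

The tag names still have to be turned into taxonomy term IDs before the node is saved, which is what the term-checking loop in @parseDelicious.py@ does.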
h3. Bookmarks View

notextile.
$view = new view;
$view->name = 'bookmarks';
$view->description = 'Bookmarks added to the site';
$view->tag = '';
$view->view_php = '';
$view->base_table = 'node';
$view->is_cacheable = FALSE;
$view->api_version = 2;
$view->disabled = FALSE; /* Edit this to true to make a default view disabled initially */
$handler = $view->new_display('default', 'Defaults', 'default');
$handler->override_option('fields', array(
  'field_post_date_value' => array(
    'label' => '',
    'alter' => array(
      'alter_text' => 0,
      'text' => '<div class="bookmarkPostedDate">[field_post_date_value]</div>',
      'make_link' => 0,
      'path' => '',
      'link_class' => '',
      'alt' => '',
      'prefix' => '',
      'suffix' => '',
      'target' => '',
      'help' => '',
      'trim' => 0,
      'max_length' => '',
      'word_boundary' => 1,
      'ellipsis' => 1,
      'html' => 0,
      'strip_tags' => 0,
    ),
    'empty' => '',
    'hide_empty' => 0,
    'empty_zero' => 0,
    'link_to_node' => 0,
    'label_type' => 'none',
    'format' => 'default',
    'multiple' => array(
      'multiple_number' => '',
      'multiple_from' => '',
      'multiple_to' => '',
      'group' => '',
    ),
    'repeat' => array(
      'show_repeat_rule' => '',
    ),
    'fromto' => array(
      'fromto' => 'both',
    ),
    'exclude' => 0,
    'id' => 'field_post_date_value',
    'table' => 'node_data_field_post_date',
    'field' => 'field_post_date_value',
    'relationship' => 'none',
  ),
  'title' => array(
    'label' => 'Title',
    'alter' => array(
      'alter_text' => 0,
      'text' => '',
      'make_link' => 0,
      'path' => '',
      'link_class' => '',
      'alt' => '',
      'prefix' => '',
      'suffix' => '',
      'target' => '',
      'help' => '',
      'trim' => 0,
      'max_length' => '',
      'word_boundary' => 1,
      'ellipsis' => 1,
      'html' => 0,
      'strip_tags' => 0,
    ),
    'empty' => '',
    'hide_empty' => 0,
    'empty_zero' => 0,
    'link_to_node' => 0,
    'exclude' => 1,
    'id' => 'title',
    'table' => 'node',
    'field' => 'title',
    'relationship' => 'none',
  ),
  'field_url_value' => array(
    'label' => '',
    'alter' => array(
      'alter_text' => 1,
      'text' => '[title]',
      'make_link' => 1,
      'path' => '[field_url_value] ',
      'link_class' => '',
      'alt' => '',
      'prefix' => '',
      'suffix' => '',
      'target' => '',
      'help' => '',
      'trim' => 0,
      'max_length' => '',
      'word_boundary' => 1,
      'ellipsis' => 1,
      'html' => 0,
      'strip_tags' => 0,
    ),
    'empty' => '',
    'hide_empty' => 0,
    'empty_zero' => 0,
    'link_to_node' => 0,
    'label_type' => 'none',
    'format' => 'default',
    'multiple' => array(
      'group' => TRUE,
      'multiple_number' => '',
      'multiple_from' => '',
      'multiple_reversed' => FALSE,
    ),
    'exclude' => 0,
    'id' => 'field_url_value',
    'table' => 'node_data_field_url',
    'field' => 'field_url_value',
    'relationship' => 'none',
  ),
  'body' => array(
    'label' => '',
    'alter' => array(
      'alter_text' => 0,
      'text' => '<div id="bookmarkDescription">[body]</div>',
      'make_link' => 0,
      'path' => '',
      'link_class' => '',
      'alt' => '',
      'prefix' => '',
      'suffix' => '',
      'target' => '',
      'help' => '',
      'trim' => 0,
      'max_length' => '',
      'word_boundary' => 1,
      'ellipsis' => 1,
      'html' => 0,
      'strip_tags' => 0,
    ),
    'empty' => '',
    'hide_empty' => 1,
    'empty_zero' => 0,
    'exclude' => 0,
    'id' => 'body',
    'table' => 'node_revisions',
    'field' => 'body',
    'relationship' => 'none',
  ),
  'tid' => array(
    'label' => 'in',
    'alter' => array(
      'alter_text' => 0,
      'text' => '',
      'make_link' => 0,
      'path' => '',
      'link_class' => '',
      'alt' => '',
      'prefix' => '',
      'suffix' => '',
      'target' => '',
      'help' => '',
      'trim' => 0,
      'max_length' => '',
      'word_boundary' => 1,
      'ellipsis' => 1,
      'html' => 0,
      'strip_tags' => 0,
    ),
    'empty' => '',
    'hide_empty' => 1,
    'empty_zero' => 0,
    'type' => 'separator',
    'separator' => ', ',
    'link_to_taxonomy' => 1,
    'limit' => 1,
    'vids' => array(
      '4' => 4,
      '2' => 0,
      '1' => 0,
      '3' => 0,
    ),
    'exclude' => 0,
    'id' => 'tid',
    'table' => 'term_node',
    'field' => 'tid',
    'relationship' => 'none',
  ),
));
$handler->override_option('sorts', array(
  'field_post_date_value' => array(
    'order' => 'DESC',
    'delta' => -1,
    'id' => 'field_post_date_value',
    'table' => 'node_data_field_post_date',
    'field' => 'field_post_date_value',
    'relationship' => 'none',
  ),
));
$handler->override_option('filters', array(
  'type' => array(
    'operator' => 'in',
    'value' => array(
      'bookmarks' => 'bookmarks',
    ),
    'group' => '0',
    'exposed' => FALSE,
    'expose' => array(
      'operator' => FALSE,
      'label' => '',
    ),
    'id' => 'type',
    'table' => 'node',
    'field' => 'type',
    'relationship' => 'none',
  ),
));
$handler->override_option('access', array(
  'type' => 'none',
));
$handler->override_option('cache', array(
  'type' => 'time',
  'results_lifespan' => 3600,
  'output_lifespan' => 3600,
));
$handler->override_option('title', 'Bookmarks');
$handler->override_option('use_pager', '1');
$handler->override_option('use_more', 0);
$handler->override_option('use_more_always', 0);
$handler->override_option('style_plugin', 'semanticviews_default');
$handler->override_option('style_options', array(
  'grouping' => '',
  'group' => array(
    'element_type' => 'h3',
    'class' => 'title',
  ),
  'list' => array(
    'element_type' => '',
    'class' => '',
  ),
  'row' => array(
    'element_type' => 'div',
    'class' => 'bookmarkItem',
    'last_every_nth' => '0',
    'first_class' => 'first',
    'last_class' => 'last',
    'striping_classes' => 'odd even',
  ),
));
$handler->override_option('row_plugin', 'semanticviews_fields');
$handler->override_option('row_options', array(
  'semantic_html' => array(
    'field_post_date_value' => array(
      'element_type' => 'div',
      'class' => 'bookmarksPostDate',
    ),
    'field_url_value' => array(
      'element_type' => 'div',
      'class' => 'bookmarksURL',
    ),
    'body' => array(
      'element_type' => 'div',
      'class' => 'bookmarksDescription',
    ),
    'tid' => array(
      'element_type' => 'div',
      'class' => 'bookmarksTags',
      'label_element_type' => 'label',
      'label_class' => '',
    ),
  ),
  'skip_blank' => 0,
));
$handler = $view->new_display('page', 'Bookmarks page', 'page_1');
$handler->override_option('path', 'bookmarks');
$handler->override_option('menu', array(
  'type' => 'none',
  'title' => '',
  'description' => '',
  'weight' => 0,
  'name' => 'navigation',
));
$handler->override_option('tab_options', array(
  'type' => 'none',
  'title' => '',
  'description' => '',
  'weight' => 0,
  'name' => 'navigation',
));
h3. parseDelicious.py

notextile.
 #!/usr/bin/env python
 
 import optparse, sys, time, xmlrpclib
 
 from lxml import etree
 
 def setupConfig(url, username, password):
     return {
         'url': url,
         'username': username,
         'password': password
     }
 
 def parseDeliciousBookmarks(filename):
     bookmarksTree = etree.parse(filename)
     postTree = bookmarksTree.xpath("//post")
 
     count = 1
     total = len(postTree)
 
     for post in postTree:
         href = post.get("href")
         tags = post.get("tag")
         description = post.get("description")
         extended = post.get("extended")
         postedDatetime = post.get("time")
 
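         # split() with no argument drops empty strings, so an untagged
         # bookmark yields no terms instead of a single empty one
         tagsList = (tags or "").split()
   
         # Fill out the node values that we can
         # create our taxonomy terms
         terms = list(tagsList)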
 
         postedDatetime = postedDatetime[0:len(postedDatetime) - 1]
         tuple_time = time.strptime(postedDatetime, "%Y-%m-%dT%H:%M:%S")
 
         postedDate = time.strftime("%d %B %Y", tuple_time)
         postedTime = time.strftime("%I:%M%p", tuple_time)
 
         node = {
             'type': 'bookmarks',
             'status': 1,
             'title': description,
             'body': extended,
             'field_url': [
                 {'value': href}            
             ],
             'field_post_date': [
                 {
                     'value': {
                         'date': postedDate,
                         'time': postedTime,
                     },
                     'timezone': 'UTC',
                     'date_type': 'date'
                 }
             ],
             'tags': terms
         }
 
         print "%d of %d: %s (%s) - %s - %s (Posted on %s at %s)" % (count, total, description, href, extended, tags, postedDate, postedTime)
         count += 1
 
         yield node
 
 def updateTaxonomyMapping(vocabulary, server, sessid):
     taxonomyTree = server.taxonomy.getTree(sessid, vocabulary)
     # Create a simple mapping from term to termID
     taxonomyMapping = {}
 
     for tag in taxonomyTree:
         taxonomyMapping[tag["name"]] = tag["tid"]
     
     return taxonomyMapping
 
 if __name__ == "__main__":
 
     parser = optparse.OptionParser()
 
     parser.add_option("-s", "--server", action = "store", type="string", dest = "server", help = "Server to connect to, without 'services/xmlrpc'")
     parser.add_option("-u", "--username", action = "store", type="string", dest = "username", help = "Username, with necessary services rights")
     parser.add_option("-p", "--password", action = "store", type="string", dest = "password", help = "Password")
     parser.add_option("-v", "--vocabulary", action = "store", type="int", dest = "vocabulary", help = "Vocabulary id to store tags in.  Defaults to 1", default = 1)
     parser.add_option("-f", "--file", action = "store", type="string", dest = "file", help = "Path to XML file containing bookmarks exported from delicious; defaults to 'bookmarks.xml'", default = "bookmarks.xml")
 
     (options, args) = parser.parse_args()
 
     if options.server is None:
         parser.error("Server is required")
     elif options.username is None:
         parser.error("Username is required")
     elif options.password is None:
         parser.error("Password is required")
 
     config = {
         'url': options.server + "/services/xmlrpc",
         'username': options.username,
         'password': options.password
     }
 
 
     server = xmlrpclib.ServerProxy(config['url'], allow_none = True)
     connection = server.system.connect()
     session = server.user.login(connection["sessid"], config['username'], config['password'])
     sessid = session['sessid']
     user = session['user']
 
     vocabulary = options.vocabulary
     taxonomyMapping = updateTaxonomyMapping(vocabulary, server, sessid)
     
     for bookmark in parseDeliciousBookmarks(options.file):
 
         bookmark['uid'] = user['uid']
         bookmark['name'] = user['name']
         bookmark['created'] = str(int(time.time()))
 
         # Check our taxonomy terms
         terms = bookmark["tags"]
 
         addedTerm = False
         for term in terms:
             if term not in taxonomyMapping:
                 newTerm = {"vid": vocabulary, "name": term}
                 result = server.taxonomy.saveTerm(sessid, newTerm)
                 addedTerm = True
         
         # Re-fetch the mapping only when a new term was actually saved
         if addedTerm:
             taxonomyMapping = updateTaxonomyMapping(vocabulary, server, sessid)
 
         termIDs = [taxonomyMapping[term] for term in terms]
 
         del bookmark["tags"]
         
         bookmark["taxonomy"] = termIDs
 
         try:
             n = server.node.save(sessid, bookmark)
             print "Created nodeID ", n
         except xmlrpclib.Fault, err:
             print "A fault occurred"
             print "Fault code: %d" % err.faultCode
             print "Fault string: %s" % err.faultString
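
Since a large export can take a long time to import, it may be worth checking beforehand how many posts and distinct tags it contains. The following is a minimal Python 3 sketch, using only the standard library; the function name @summarize_export@ and the sample data are assumptions made for illustration, and in practice you would pass in the contents of @bookmarks.xml@.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Invented sample data standing in for the contents of bookmarks.xml
SAMPLE = """<posts user="someone">
  <post href="http://drupal.org/" description="Drupal" tag="cms drupal"/>
  <post href="http://www.thimbl.net/" description="thimbl" tag="distributed"/>
  <post href="http://example.com/" description="Untagged" tag=""/>
  <post href="http://example.org/" description="Another CMS" tag="cms"/>
</posts>"""


def summarize_export(xml_text):
    """Return (number of posts, Counter mapping tag -> number of uses)."""
    root = ET.fromstring(xml_text)
    posts = root.findall(".//post")
    tag_counts = Counter()
    for post in posts:
        # An empty or missing tag attribute contributes no tags
        tag_counts.update((post.get("tag") or "").split())
    return len(posts), tag_counts


total, tag_counts = summarize_export(SAMPLE)
print("%d posts, %d distinct tags" % (total, len(tag_counts)))
for tag, uses in tag_counts.most_common(3):
    print("  %s: %d" % (tag, uses))
```

Each distinct tag that is not already in the vocabulary costs the import script one extra @saveTerm@ call, so the second number gives a feel for how much taxonomy traffic to expect.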
 
