What is the Akashic Record project?
The Akashic Record FOSS project aims to collect and maintain data and media so that it is more accessible, usable, and manageable. Even with today’s abundant computing power and vast sources of information, there is still a great need for better ease of use. I have wanted to work on this for nearly a decade, mostly as a way to organize my own large collection of personal documents and media. If I am looking for information on a specific event or phrase that I may have mentioned somewhere in a half-decade’s worth of chat logs, e-mails, or documents, I don’t want to search each data type individually. Even keyword searches are inefficient, since I could be misremembering how the topic came up or which words I used to refer to it.
You can come chat about the project in the #akashicrecord channel on IRC at irc.freenode.net, or check out the project at Gitorious.
I’ve purchased a domain for the project, akashic-record.org (it currently does not point anywhere; I will work out exactly what I want to do there in the coming weeks).
I am also working on a logo for the project; more on that later.
Currently in Development
There are a number of pieces of the Akashic Record that are currently in active development. I’ve listed the most active items below.
Akashic Archivist
While thinking about how I intended to archive large amounts of content from the parts of the public web I am interested in, I realized that rate limits and bandwidth caps might actually become an issue. To alleviate this I decided to split the archiving function into its own independent module, which I’ve dubbed the Akashic Archivist. This will allow anyone to run an Archivist node (either independently or as a worker node for a primary Akashic Record instance).
In particular, I think it could be a good use for a Raspberry Pi. Add a portable 1 TB hard drive with an external power adapter and a small USB wall-wart, and you have a low-power archiving node ready to send out to a friend or family member. The node operator would be able to set a maximum monthly bandwidth as well as a maximum transfer rate per second, and a central node could request that specific places be archived by that node. Operators could then make use of the sneakernet and its wonderfully available bandwidth to transfer the data back to the main node. While mailing out a hard drive concerns me somewhat, there are other options for the storage media (like two 128 GB USB drives) that are a bit more rugged.
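The two caps described above (a monthly bandwidth budget plus a per-second transfer rate) could be tracked with something like the sketch below. All the names and numbers here are my own assumptions; the real Archivist configuration format doesn’t exist yet.

```python
class BandwidthBudget:
    """Track an archivist node's monthly byte cap and per-second
    transfer rate. Hypothetical sketch -- not the real module."""

    def __init__(self, monthly_cap_bytes, max_bytes_per_sec):
        self.monthly_cap = monthly_cap_bytes
        self.rate = max_bytes_per_sec
        self.used_this_month = 0

    def request(self, nbytes):
        """Record nbytes against the monthly cap and return how
        long (in seconds) to spread the transfer over so the
        per-second rate is respected."""
        if self.used_this_month + nbytes > self.monthly_cap:
            raise RuntimeError("monthly bandwidth cap reached")
        self.used_this_month += nbytes
        return nbytes / self.rate

# Example: a node capped at 100 GB/month and 256 KB/s.
budget = BandwidthBudget(monthly_cap_bytes=100 * 1024**3,
                         max_bytes_per_sec=256 * 1024)
delay = budget.request(1024 * 1024)  # a 1 MB chunk -> 4.0 seconds
```

A central node asking an archivist to fetch something would then pace each chunk by the returned delay, and refuse new work once the monthly cap is hit.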
Currently the only archiving method will be “Raw Mode” (get everything you see, save it all verbatim), which is going to be very costly insofar as space is concerned. I will likely compress all archived data and media to save on space, but then I may need an alternative hardware solution to avoid a processing bottleneck. Later, when other formats are developed (see: data architecture), the Archivist will be configurable to use only a select few formats, although that might also require a lot of processing power (if tagging, context, or OCR is included).
This item is in the “Akashic Recorder” repository under the project (no code committed publicly yet as of this posting).
Daughter of Mnemosyne
This is my main focus right now since it will be immediately useful to me. It is a libpurple plug-in that creates a chat bot reachable over any supported protocol (I plan to use XMPP and IRC myself). The bot will give anyone access to the public items in my data collection through a series of simple commands (formatted for chat or linked to a temporary document). In addition, it will have an authenticated side that grants specific users, at certain access levels, various other pieces of data, including system status and maintenance functions for my server. The private, personalized functionality is planned to work only during one-on-one communication, potentially with a dependency on Off-the-Record Messaging. I’m still up in the air on this.
The name comes from the Greek personification of memory, Mnemosyne. It’s apt considering the name of the project, which refers to a shared memory for the entire human race.
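The command-plus-access-level scheme described above could be dispatched with something like the following. The command names, the three levels, and the handlers are all hypothetical; the real plug-in would hook into libpurple rather than be standalone Python.

```python
# Hypothetical access levels for the bot. Anything not in the
# source post (names, commands, replies) is my own assumption.
PUBLIC, TRUSTED, ADMIN = 0, 1, 2

# Each command maps to (minimum level required, handler).
COMMANDS = {
    "search":  (PUBLIC,  lambda arg: f"results for {arg!r}"),
    "status":  (TRUSTED, lambda arg: "server load: ok"),
    "restart": (ADMIN,   lambda arg: "restarting service..."),
}


def handle(user_level: int, line: str) -> str:
    """Parse one chat line ('command args...') and dispatch it,
    enforcing the caller's access level."""
    name, _, arg = line.partition(" ")
    if name not in COMMANDS:
        return "unknown command"
    required, fn = COMMANDS[name]
    if user_level < required:
        return "permission denied"
    return fn(arg)
```

The same table could drive both the public side (everyone gets `PUBLIC`) and the authenticated side, where a successful login simply raises the caller’s level.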
Data Architecture
So far I have not settled on any particular data architecture for the main system, but I do know that I want it to be generic and abstracted away from the pieces of software that request information or media from my collection. It needs to accommodate most of the prevalent digital formats on the Internet without adding too much overhead, as the large number of links each item could have could otherwise cripple it.
If anyone has recommendations for good data architecture references / books, or specific FOSS technologies that would be good to look at, I’d love to hear from you.
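One generic, format-agnostic shape I’ve been turning over is “everything is an item, every relationship is a typed link.” The sketch below is only an illustration of that idea; none of these field names are settled.

```python
from dataclasses import dataclass, field

# Hypothetical generic data model: every archived thing is an
# Item with opaque content plus metadata, and relationships
# (mentions, replies, attachments) are typed Links between
# items, kept out of the items themselves so link volume
# doesn't bloat each record.

@dataclass
class Item:
    item_id: str
    media_type: str    # e.g. "message/rfc822", "image/png"
    content_ref: str   # pointer into the archive store
    tags: set = field(default_factory=set)


@dataclass
class Link:
    src: str           # item_id of the source
    dst: str           # item_id of the target
    kind: str          # e.g. "mentions", "attachment-of"
```

Keeping links in their own table (or graph store) is what would let a chat log, an e-mail, and a photo all participate in the same "mentions" query without each format needing special-case code.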
Storage
I’ve been exploring many different storage solutions, from rack-mounted SAN setups and NAS arrays to commodity PC clusters. The most important aspect of my choice is making sure it is both cheap enough to build a base system now and able to scale to my future storage needs. This has been more challenging than I expected once I considered all the storage I could potentially need depending on how I save, version, and tag pieces of information and media.
Setting up a large amount of storage would be simple enough, but I need to make sure I have a redundant system, especially as my data collection grows.
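Whatever hardware I land on, redundancy is only useful if I can tell when a replica has gone bad. A toy integrity check along these lines (hash every copy, flag the odd one out) is the kind of thing I’d run periodically; it’s a sketch, not a substitute for RAID or a proper filesystem with checksumming.

```python
import hashlib


def verify_replicas(paths):
    """Compare SHA-256 digests of the same file stored on several
    volumes; return the paths whose digest disagrees with the
    majority. A toy bit-rot check, assuming an odd replica count."""
    digests = {}
    for p in paths:
        with open(p, "rb") as f:
            digests[p] = hashlib.sha256(f.read()).hexdigest()
    counts = {}
    for d in digests.values():
        counts[d] = counts.get(d, 0) + 1
    majority = max(counts, key=counts.get)
    return [p for p, d in digests.items() if d != majority]
```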