Thursday, December 13, 2007

eNemy At The Gates

If you want to see an entertainingly polarized debate among a lot of historians - or academics in general, but it seems historians in particular lately - you need do no more than invoke online sources. Particularly in the context of assignments, those two words tend to result in a couple of common reactions:
  1. "It's an online source. So what? They're just as good as conventional ones."
  2. "Online sources are intrinsically bad should never ever be used."
Of course there are others, but most of the time I've discussed the idea I've heard variants of those. I'll wax provocative for a moment and suggest that exhibiting either opinion likely shows a certain, perhaps conscious, lack of thought on the issue. Both views are highly problematic and not questioned often enough, which is where I'm going to try to come in tonight. I'm mainly going to focus on the latter claim with tonight's article - it is by far the more commonly heard one. (Before doing so, I will cheerfully say that I consider automatic trust of Web sources to be at least as silly as automatic trust of AM talk radio, or perhaps the Weekly World News.)

There are usually a few pretty predictable arguments presented when people argue for the automatic rejection or disdain of online sources. I'm going to address the most common ones I've run into, in order from most to least absurd. (I'm not going into arguments which go so far as to dismiss government or university sources for being online; that, I hope, is too self-evidently ridiculous to warrant refutation.)

Objection the First: "It's too difficult to track down references. You can't cite Web pages as specifically as you can books or other materials: there are no page numbers!" The crux of this argument is that online sources are not print sources - duh - and are therefore too difficult or unreliable to bother citing, because of inconsistent layout and the fact that it may not be immediately obvious where one can find all the information needed for a proper, full citation. A number of simple solutions exist here. If all the information isn't there, then that's fine; it's not your fault if, say, the specific author or organization behind a Web site isn't made explicit, provided most of the information (and the location itself) is there. As far as citing specific parts of a site goes, of course Web sites aren't going to have page numbers. They aren't books. I don't see a problem here. On the other hand, most sites out there - and all more static media like PDFs - are organized into small enough chunks that you can usually narrow a cite down to a moderately specific page. (They're also often equipped with anchors, which are great in properly-designed sites; see the example below.) If one can't because of a large block of text, browsers come equipped with search functions for a reason, at least if one's simply concerned with confirming that the information's there.
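(As a quick, hypothetical illustration - the URL and anchor name here are invented for the example - a cite can point directly at a named anchor rather than at the top of a page:

    http://example.com/essays/confederation.html#primary-sources

which drops a reader at the relevant section about as precisely as a page number would, provided the site's author defined the anchor in the first place.)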

There is a real problem with some aspects of online sources. "Deep Web" materials - material which is usually procedurally generated, and only accessible in its specific form through cookies, searches or other forms of interaction - are considerably more difficult to get a hold of. As if that weren't bad enough, they're growing: the Deep Web is hundreds of times larger than the "surface" one right now. There will have to be mechanisms to deal with this in time; handling them on a case-by-case basis is a bare minimum for now. Still, I'm not convinced of the desirability of rejecting an entire medium because of some slight inconvenience.


Objection the Second: "Just anyone can put up a Web page!" Oh ho! Yes, this is true - and so what? If you believe it's difficult to have a particularly absurd piece of work show up in book form - or, in the right circumstances, appear as a published article in an academic journal - then I have a bridge I'd like you to consider buying. This argument doesn't impress me at all, mainly because its underlying assumptions - that only the Web lets "just anyone" publish, and that "real" repositories can effectively keep people's crackpottery out - are both flatly untrue.

Another implication of this claim bothers me considerably more. It is the claim, sometimes explicit but usually not, that the identity of a person making an argument has some bearing on the quality of the argument itself - or, indeed, is more important than said argument. This is a contemptible idea, built around a set of logical fallacies that all but the most sophistic freshmen are usually aware of. If we are talking about a world of debate and scholarship - and even amateurs can engage in either! - then these arguments should rise or fall on their own merits. An historical stance should be effective regardless of its creator, provided it stands up to scrutiny - but using its creator's identity as the sole point of that scrutiny is not an appropriate way to handle such things. The identity of a person can influence an argument to a point - after all, consistently good (or bad) arguments can imply more of either in the future - but in the end the effectiveness of a stance should be determined by, well, its effectiveness, and not its creator.

With that in mind, I also think it's fantastic that it's easier for people to put information up for all the world (or at least a specific subset of it) to see. The amount of lousy history - and economics, and science, and art, and recipes - will go way up as a result, but there's room for the good stuff as well. We shouldn't ignore the latter because of the presence of the former, any more than we should shun good archaeologists because von Daniken ostensibly published in the field. We're dealing with a medium here which allows people to do end runs around the gatekeepers of various fields. So what if things get somewhat nuts and over-varied as a result? Personally, I want to embrace the chaos.

Objection the Third: "Online material isn't peer-reviewed and therefore shouldn't be used." While this is often used synonymously with #2, above, it is a distinct complaint, and the only one of these three which I don't see as entirely without merit. Where the first two complaints are ones of mere style or elitism, this is an issue of quality control. While the lack of (obvious) peer review - detailed criticism and corroboration by a handful of experts in a specific field - is indeed a problem, it is one which provides some good opportunities for readers both lay and professional to hone some abilities.

A huge component of the discipline of history, on the research side of things, is the critical examination of sources. Note that this is not the same as merely rejecting them! We are taught to look with a careful, hopefully not too jaundiced, eye at any source or argument with which we are presented, watching for both weaknesses and strengths. The things to which historians have applied this have diversified dramatically in the last several generations, moving out of libraries and national archives and accepting - sometimes grudgingly, sometimes not - everything from oral traditions to modern science to (as in public history) popular opinions and beliefs about the issues of the day or the past. It's a good skill, and probably a decent chunk of why people with history degrees tend to wind up just about everywhere despite the expected "learn history to teach history" cliche (which, of course, I plan to pursue, but hey!). Online sources shouldn't get a free pass from this - but they shouldn't get the automatic fail so many seem to desire either.

To one degree or another, we are all equipped with what Carl Sagan referred to in The Demon-Haunted World - find and read this - as a baloney detection kit: a basic awareness of what may or may not be problematic, reliable, true or false about anything we run into in day-to-day affairs. There are semi-formal versions of it for different things, but to one level or another even the most credulous of us have thought processes along these lines. It's a kit which needs to be tuned and applied to historical sources online - just like all other sources - and in a far more mature way than the rather kneejerk pseudoskepticism which is common these days.

(I compiled a sample BDK for evaluating online resources a couple of years ago as part of my TAing duties at SMU; once I'm back home for the holidays I intend to try to dig that up and I'll follow up with this post by sticking it here.)

The reflexive dismissal of sources of information based entirely on their medium is not just an unfortunate practice. It involves a certain abdication of thought, of the responsibility to at least attempt to see some possibility in any source out there, even if it doesn't share the basic shape and style of academic standards. Besides, as I mentioned earlier, there are opportunities in this as well. The nature of online sources isn't simply the "problem" that someone else didn't do our work for us, pre-screening them for our consumption ahead of time. Their nature underscores the fact that we need to be taking a more active role in this anyway. For the basic materials out there, it's far easier to vet for basic sanity than many might think - I did manage to show a room full of non-majors how to do it for historical sources in an hour, anyway - and giving everyone a little more practice in this sort of thing can't exactly hurt. In other words, we need to approach online sources with a genuine skepticism.

But guess what? This whole thing's just a smokescreen for a larger issue anyway. We're willing, indeed eager, to hold varying degrees of skepticism towards online sources, but why are we singling them out? Why the complacency as regards citations of interviews, of magazine articles, of books? If you're going to go swinging the questioning mallet, you should at least do so evenly, don't you think?


And on that note, I head off to be shoehorned into a thin metal tube and hurled hundreds of kilometers. I shall post at you next from Halifax!

Tuesday, December 4, 2007

Silently Posturing

As a side exercise for my digital history class, we were asked to read a paper by Alan Cooper called "Your Program's Posture." Cooper categorizes programs as sovereign, transient, daemonic, or parasitic, the specific classification depending on how a program interacts with the user, and the assignment asked us to consider where the programs we use in the course of our work lie in that grouping. I already had a good idea of where the software I use would fall, but I also felt I should read the article before going with my gut instinct of classifying everything as daemonic.

Cooper's categories are described in terms of "postures," essentially their dominant "style" or gross characteristics which determine how users approach, use, and react to them. The first of these four postures is the "sovereign" posture: sovereign programs are paramount to the user, filling most or all of the screen's real estate and functioning as the core of a given project.

The second is "transient," and is the opposite of sovereign software both visually and in terms of interfaces. Intended for specific purposes, meant to be up only temporarily (or, if up for a long time, not constantly interacted with), transient programs can get away with being more exuberant and less intuitive than sovereign applications.

I abandoned my gut reaction of describing half the annoying stuff I use as daemonic when I realized that the third posture refers to daemons in the computing sense of the word rather than the more traditional gaggle of evil critters with cool names. (Computing jargon tends to come from the oddest places.) Daemonic postures are subtle ones: such programs run constantly in the background without necessarily being visible to the user at any given time. Daemonic programs tend to have no interface (for all practical purposes) or very minimal ones, as the user tends not to do much with them, if anything. They're usually invisible, like printer drivers or the two dozen or so processes a typical computer has running at any time.

The final group of programs is the "parasitic" one, so called because such programs tend to park on top of another program to fulfill a given function. Cooper describes them as a mixture of sovereign and transient in that they tend to be around all the time, but running in the background, supplementary to a sovereign program. Clocks, resource meters, and so on, generally qualify.

In the interest of this not being entirely a CS post, I should probably answer the initial request on the syllabus as to how this can affect my historical research process. I'm not sure, fully, but I'm also answering this entirely on the fly and am more concerned with how it should affect my process. At present, I'm not using many programs specifically for research purposes. Firefox and OpenOffice (which I use in lieu of Microsoft Office, more so since that hideous new interface in Office '07 began to give me soul cancer), the main programs I tend to have up at any given time and in which I obviously do a lot of my work, are definitely sovereign programs, taking up most of my screen's real estate. The closest thing I have to a work-related application that's transient is Winamp, which is usually parked in the semi-background cheerfully producing the background noise I need to function properly. I don't make much use of parasitic programs, mainly due to a lack of knowledge of the options, and of course my daemonic ones are usually invisible.

The chunks of this I make use of are mostly a case of "if it ain't broke, don't fix it." I've got my browser, through which I access a lot of my research tools (including Zotero, the most obvious parasitic application I have, and the aggregator functions of Bloglines, the, uh, other most obvious parasitic application I have); I've got my word processor, through which I process my words; I've got Photoshop for 2D graphics work and hogging system resources; I've got Blender for 3D graphics stuff (much though I am annoyed by its coder-designed interface); I've got FreeMind, which is great for planning stuff out. I've no shortage of big, screen-eating sovereign applications, in other words, most of which do their often highly varied jobs quite well.

Some of these can wander from one form to another, of course. I spent an hour earlier this evening working with Blender's animation function to produce a short CG video. When I started the program rendering the six hundred frames of that video, I wasn't going to be doing anything else with it for a while, and was thus able to simply shunt it out of the way. That left me with a small window showing the rendering process in one corner of my screen, allowing me to work on some other stuff, albeit slightly more slowly as the computer chundered away. Cast down from the throne, the sovereign program became transitorily transient.

What I'm wondering about now, though, are applications which fill the other two postures: stuff that you can set up and just let fly to assist with research or other purposes. A simple and obvious example of this sort of thing would be an application which can trawl RSS feeds for its user. Some careful work setting the application up in the first place - search, like research, is something which can take significant skill to get useful results from - and you could kick back (or deal with more immediate or physical research and other issues) and allow your application to sift thousands of documents for things you're interested in. Things like this are not without their flaws - unless you're a wizard with searches or otherwise incredibly fortunate, you're as likely as not to miss quite a bit of material when trawling fifty or five hundred or five thousand feeds. Then again, that's going to happen anyway no matter what you're researching in this day and age, and systems like this would greatly facilitate at least surveying vast bases of information that would otherwise take scores of undergraduate research assistants to get through.

The information is out there; there just need to be some better tools (or better-known tools) to dig through it. Properly done, something like this would need minimal interaction once it gets going; you set it up, tell it to trawl your feeds (or Amazon's new books sections, or H-Net's vast mailing lists, or more specialized databases for one thing or another, etc.), and only need to check back in daily or weekly or whenever your search application beeps or blinks or sets off a road flare, leaving you to spend more of your attention on whatever else may need doing. Going through the results would still involve some old-fashioned manual sifting, as likely as not, but if executed properly you would be far more likely to come up with some interesting results than you would by sifting through a tithe of the information in twice the time.
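To make that a little less hand-wavy, here's a minimal sketch of the idea in Python using the feedparser library; the feed URLs and keywords are invented placeholders, and a real tool would add scheduling, deduplication, and smarter matching than simple substring checks:

    import feedparser  # third-party RSS/Atom parsing library, installed separately

    # Hypothetical inputs - swap in your own feeds and search terms.
    FEEDS = [
        "http://example.com/history-journal.rss",
        "http://example.com/h-net-digest.rss",
    ]
    KEYWORDS = ["public history", "digitization"]

    def matching_entries(feed_urls, keywords):
        """Yield (feed title, entry title, link) for entries mentioning any keyword."""
        for url in feed_urls:
            feed = feedparser.parse(url)
            for entry in feed.entries:
                # Search the title and summary text for any of the keywords.
                text = (entry.get("title", "") + " " + entry.get("summary", "")).lower()
                if any(kw.lower() in text for kw in keywords):
                    yield (feed.feed.get("title", url),
                           entry.get("title", "(untitled)"),
                           entry.get("link", ""))

    for feed_title, title, link in matching_entries(FEEDS, KEYWORDS):
        print("%s: %s\n  %s" % (feed_title, title, link))

Run daily from a scheduler (cron, say) and pointed at a few hundred feeds, even something this crude starts to approximate the "check back when it beeps" workflow described above.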

Something like this could help get data from more out-of-left-field areas, as well; setting up a search aggregator as an historian and siccing it, with the terms of whatever you're interested in, on another field like economics or anthropology or law or botany or physics might be a bit of a crapshoot, but could well also yield some surprising views on your current topic from altogether different perspectives, or bring in new tools or methods that the guys across campus thought of first (and vice versa). That sort of collision is what resulted in classes like this (or, at a broader level, public history in general), of course. I want to see more of that - much more.

It could be interesting to see what kind of mashups would result if people in history and various other fields began taking a more active stance on that sort of thing. Being able to look over other disciplines' shoulders is one of those things that simply can't hurt - especially if we have the tools to do so more easily than we could in the past.

I meant to segue into daemonic applications by talking some about distributed computing research, as much to see if I could find ways to drag history into that particularly awesome and subtle area of knowledge, but as usual my muse has gotten away from me and forced a tome onto your screen. So I do believe I shall keep that for some other time...