2.7.4 Newspapers and the Semantic Web

By mchoate
Last modified: 2006-09-04 13:23:33

Tim Berners-Lee, the inventor of the original World Wide Web, now envisions a different kind of Web, one he calls the “Semantic Web.” In an article that first appeared in Scientific American in May 2001, he and his co-authors write, "The Semantic Web is an extension of the current web in which information is given well-defined meaning, better enabling computers and people to work in cooperation." (Tim Berners-Lee, James Hendler, Ora Lassila, The Semantic Web, Scientific American, May 2001).

The Resource Description Framework (RDF) is the language of the Semantic Web, and it offers a better way to describe the content of documents that are available online. In many ways, RDF is like a library card catalog system for the Internet – it’s an agreed-upon format for describing what particular documents or images are about, so that people can find them more easily. It goes beyond a card catalog system, however, because it also gives computers a mechanism for making inferences from this information, helping people learn things that are not explicitly stated in a document but can be inferred from it.
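To make the card catalog and inference ideas concrete, here is a minimal sketch in plain Python. It is not real RDF syntax or a real RDF library: facts are stored as simple (subject, predicate, object) triples, which is the basic shape of an RDF statement. The article identifier, the topic names and the "broader" relationship are all hypothetical, chosen only for illustration.

```python
# Each fact is a (subject, predicate, object) triple, the basic shape of an
# RDF statement. All identifiers here are hypothetical examples.
TRIPLES = {
    ("article:1042", "dc:subject", "topic:CityCouncil"),
    ("article:1042", "dc:creator", "Jane Reporter"),
    ("topic:CityCouncil", "broader", "topic:LocalGovernment"),
    ("topic:LocalGovernment", "broader", "topic:Politics"),
}


def broader_topics(topic, triples):
    """Return every topic reachable from `topic` by following 'broader' links."""
    found = set()
    frontier = [topic]
    while frontier:
        current = frontier.pop()
        for s, p, o in triples:
            if s == current and p == "broader" and o not in found:
                found.add(o)
                frontier.append(o)
    return found


def infer_subjects(triples):
    """Add implied subjects: an article about a topic is also about its broader topics."""
    inferred = set(triples)
    for s, p, o in triples:
        if p == "dc:subject":
            for topic in broader_topics(o, triples):
                inferred.add((s, "dc:subject", topic))
    return inferred


def articles_about(topic, triples):
    """List the subjects of every triple that says 'is about `topic`'."""
    return sorted(s for s, p, o in triples if p == "dc:subject" and o == topic)


if __name__ == "__main__":
    # Nothing in the stated facts says the article is about politics...
    print(articles_about("topic:Politics", TRIPLES))
    # ...but the inference step makes that implied fact searchable.
    print(articles_about("topic:Politics", infer_subjects(TRIPLES)))
```

The catalog part is the set of stated triples. The inference part is the small rule that derives new "is about" facts from the topic hierarchy, so a search for politics finds an article that was only ever tagged with the city council.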

PRISM (Publishing Requirements for Industry Standard Metadata) is the group responsible for establishing standards for the publishing industry that will help make the Semantic Web a reality. The practical application of this technology for publishers is that it “assists in the automation of publishing production processes and content exchange” (see prismstandard.org). PRISM describes the opportunities for publishers thus:

"There was a time, not long ago, when publisher’s content moved along a straight and narrow path that reached a dead end at the printing press. Now that's all changing with the emergence of new technologies such as the Web, where content can be located, bought and sold; asset management tools, where digital files can be archived for easy use and reuse; and screen-based devices, such as laptops, PDAs, cell phones and kiosks, where content can be viewed.

"Instead of content taking a one-way trip to the printed page it can now be sent in multiple directions at once, not only to various consumer devices, but to other providers and aggregators as well, who will repackage, reposition and repurpose it."

The PRISM standard incorporates many newspaper industry standards, including NewsML and News Industry Text Format (NITF), both of which are XML formats designed to make distributing news and information more efficient. Practical benefits for newspapers include:

  1. Generate additional revenue through cross-media publishing and content re-use. Applications include on-demand printing, variable-data printing, Internet publishing, fee-based archives, etc.

  2. Streamline online publication workflow – smart use of metadata can significantly reduce the amount of work required to publish online.

  3. Improve the quality of your internal archives for news gathering purposes. Utilize inference engines as part of newsroom computer-assisted reporting strategies.

  4. Create next-generation classified advertising products that build on this improved search capability.

To recap, RDF has emerged as the standard format for describing documents in the publishing industry. It specifies how to apply “metadata” – additional information about a document that makes it more useful – to news articles, photos and other assets. As the standard matures, newspapers will need to accomplish the following:

  1. Convert existing archives into this new format.

  2. Integrate the collection and documentation of metadata into the newsroom workflow.

  3. Upgrade existing systems to support the format.
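
As a rough illustration of the first step, the sketch below converts one archive record into an RDF/XML description using Python's standard library. It uses only Dublin Core elements (title, creator, date, subject), a general-purpose metadata vocabulary; a real PRISM record would carry additional PRISM-specific elements that are not shown here. The record layout, field names and article URL are assumptions made up for this example, not part of any newspaper system.

```python
import xml.etree.ElementTree as ET

# Standard namespace URIs for RDF and the Dublin Core element set.
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("dc", DC_NS)

# Hypothetical archive record; the field names and URL are illustrative only.
ARCHIVE_RECORD = {
    "url": "http://news.example.com/archive/2006/09/council-vote",
    "title": "Council Approves Downtown Plan",
    "creator": "Jane Reporter",
    "date": "2006-09-01",
    "subjects": ["City Council", "Downtown Development"],
}


def record_to_rdf(record):
    """Return an RDF/XML string describing one archive record."""
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    # rdf:about names the resource that the metadata describes.
    description = ET.SubElement(
        root, f"{{{RDF_NS}}}Description", {f"{{{RDF_NS}}}about": record["url"]}
    )
    for field in ("title", "creator", "date"):
        ET.SubElement(description, f"{{{DC_NS}}}{field}").text = record[field]
    # A record may have several subjects; each gets its own dc:subject element.
    for subject in record["subjects"]:
        ET.SubElement(description, f"{{{DC_NS}}}subject").text = subject
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(record_to_rdf(ARCHIVE_RECORD))
```

Converting a whole archive would mean running a function like this over every stored article, which is why the second step matters: the conversion is only as good as the bylines, dates and subject terms captured in the newsroom to begin with.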