THE REGNET & REGBASE PROJECT

  1. Overview
    Research in the Regnet & Regbase project has so far focused on developing tools and formalisms that make regulations more accessible to interest groups and the public. The approach has been to process regulations into a standardized, machine-readable format that can be easily augmented with new information. This foundation can be exploited for a number of tasks, from searching to reasoning.

  2. Taxonomy Building
    The project is also focused on constructing taxonomic conceptual hierarchies for environmental regulations using software from Semio Corporation. These concept hierarchies allow grouping of regulations by content, rather than by the structure of the organizations that authored the regulations. Browsing the taxonomy structure enables one to find regulations pertaining to a particular topic, regardless of the regulations' origin.

  3. XML-Standardized Format
    A standardized XML format has been developed for regulations, and documents in this format can be validated against a DTD (Document Type Definition). The DTD was designed to be applicable to all regulations, with a focus on federal, state, and local regulations.

    Regulation parsers are being developed to transform regulations in a variety of plain-text formats into our standardized XML format. Parsers have already been built for federal and Illinois state environmental regulations, as well as for federal and several European accessibility standards. The next step is to complete a universal parser for environmental regulations.
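
    As a rough illustration of the plain-text-to-XML step, the following Python sketch converts a small, invented regulation fragment into XML. The input layout, the element names (regulation, section, provision), and the function parse_regulation are assumptions made for this example; they are not the project's actual DTD or parser.

```python
import re
import xml.etree.ElementTree as ET

# Assumed input layout: a heading line "<number> <title>" followed by
# lettered paragraphs "(a) text". Real regulation formats vary widely.
SECTION_RE = re.compile(r"^(\d+\.\d+)\s+(.+)$")
PARAGRAPH_RE = re.compile(r"^\(([a-z])\)\s+(.+)$")


def parse_regulation(text: str) -> ET.Element:
    """Build a small XML tree from plain regulation text (hypothetical schema)."""
    root = ET.Element("regulation")
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        heading = SECTION_RE.match(line)
        if heading:
            section = ET.SubElement(
                root, "section", id=heading.group(1), title=heading.group(2)
            )
            continue
        paragraph = PARAGRAPH_RE.match(line)
        if paragraph and section is not None:
            provision = ET.SubElement(
                section,
                "provision",
                id=f"{section.get('id')}({paragraph.group(1)})",
            )
            provision.text = paragraph.group(2)
    return root


# Invented sample text, used only to exercise the sketch.
SAMPLE = """\
100.1 Scope.
(a) This part applies to all storage facilities.
(b) Facilities built before the effective date are exempt.
"""

tree = parse_regulation(SAMPLE)
xml_text = ET.tostring(tree, encoding="unicode")
print(xml_text)
```

    Each source format would need its own patterns for headings and paragraphs, which is one reason a single universal parser is harder to build than the format-specific ones.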


  4. Adding XML Meta-Data and Enabling Regulation Reasoning
    An automated system has been built to add concepts and taxonomy categories to regulation provisions. This additional meta-data is inserted in the XML representation of the regulations, and can be used for tasks such as searching or determining context.
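
    A minimal Python sketch of this kind of annotation is shown below. It assumes a hypothetical concept list and hypothetical element names (provision, concept), and it matches concepts by simple substring search; the project's actual concept extraction and XML schema are not described here and are likely more sophisticated.

```python
import xml.etree.ElementTree as ET

# Hypothetical concept list, for illustration only.
CONCEPTS = ["drinking water", "hazardous waste", "storage tank"]


def tag_concepts(root: ET.Element, concepts: list[str]) -> ET.Element:
    """Attach a <concept> child to every provision whose text mentions it."""
    # Materialize the iterator first, since the tree is modified in the loop.
    for provision in list(root.iter("provision")):
        text = (provision.text or "").lower()
        for concept in concepts:
            if concept in text:
                ET.SubElement(provision, "concept", name=concept)
    return root


# Invented sample document.
doc = ET.fromstring(
    '<regulation><section id="100.1">'
    '<provision id="100.1(a)">Each storage tank holding hazardous waste '
    "must be labeled.</provision>"
    '<provision id="100.1(b)">Records must be kept for three years.</provision>'
    "</section></regulation>"
)
tag_concepts(doc, CONCEPTS)
print(ET.tostring(doc, encoding="unicode"))
```

    Because the concepts are stored as child elements, a search tool can retrieve provisions by concept without re-reading the provision text.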

    Additionally, a system has been built to enable reasoning with environmental regulations. The system scans regulations for logic sentences (pre-encoded into the XML) and allows the user to pose logic questions. A demo for this system is available under the Search 40 CFR section.

    Finally, a parser is being developed to facilitate automated linking of references appearing within a regulation. This meta-data will be added to the XML documents and can be used for searching and retrieving complete information.
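
    The idea of reference linking can be sketched in Python as follows. The sketch recognizes only citations of the form "40 CFR 261.4" and records each one as a hypothetical reference element with a target attribute; the citation pattern, the element names, and the target format are assumptions for this example, not the project's reference parser.

```python
import re
import xml.etree.ElementTree as ET

# Matches citations such as "40 CFR 261.4": title, part, and section number.
CFR_RE = re.compile(r"\b(\d+)\s+CFR\s+(\d+)\.(\d+)\b")


def link_references(root: ET.Element) -> ET.Element:
    """Add a <reference> child for every CFR citation found in a provision."""
    # Materialize the iterator first, since the tree is modified in the loop.
    for provision in list(root.iter("provision")):
        for title, part, section in CFR_RE.findall(provision.text or ""):
            ET.SubElement(
                provision, "reference", target=f"{title}CFR{part}.{section}"
            )
    return root


# Invented sample sentence containing two citations.
doc = ET.fromstring(
    '<regulation><provision id="p1">See 40 CFR 261.4 and 40 CFR 262.11 '
    "for related requirements.</provision></regulation>"
)
link_references(doc)
targets = [ref.get("target") for ref in doc.iter("reference")]
print(targets)
```

    Real regulations also contain relative references (for example, "paragraph (b) of this section"), which a pattern this simple does not handle.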


  5. Feature Extraction and Similarity Analysis
    To enable a similarity analysis between different sources of accessibility regulations, features are extracted and incorporated into the XML legal corpus.  Common features include concepts and definitions, as well as domain knowledge such as measurements and effective dates.  Features are tagged as additional meta-data.
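
    As one small example of a domain-knowledge feature, the Python sketch below pulls measurements out of provision text with a regular expression. The unit list and the function name extract_measurements are assumptions chosen for illustration, and the sample sentence is invented.

```python
import re

# Assumed unit vocabulary; a real extractor would cover many more units.
MEASUREMENT_RE = re.compile(
    r"\b(\d+(?:\.\d+)?)\s*(inches|inch|feet|foot|mm|cm)\b"
)


def extract_measurements(text: str) -> list[tuple[float, str]]:
    """Return (value, unit) pairs for each measurement found in the text."""
    return [(float(value), unit) for value, unit in MEASUREMENT_RE.findall(text)]


# Invented sample sentence.
sentence = "The ramp shall be at least 36 inches wide and rise no more than 30 inches."
measurements = extract_measurements(sentence)
print(measurements)
```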

    A similarity analysis is performed using a combination of content comparison and structural analysis.  Results are documented in several papers under the Publications section.  In addition, we applied our analysis core to the domain of e-rulemaking; a demo can be found under the Presentations section.
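
    One simple way to combine content and structure, offered only as an illustrative sketch and not as the method used in the published papers, is to blend the cosine similarity of two provisions' own features with the cosine similarity of their parent sections' features. The weighting parameter alpha and the function names below are assumptions.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two feature-count vectors."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def combined_similarity(feat_a, feat_b, parent_a, parent_b, alpha=0.7):
    """Blend content similarity with a structural (parent-section) score.

    alpha is an assumed weight: 1.0 uses content only, 0.0 structure only.
    """
    content = cosine(Counter(feat_a), Counter(feat_b))
    structure = cosine(Counter(parent_a), Counter(parent_b))
    return alpha * content + (1 - alpha) * structure


# Invented feature lists for two provisions and their parent sections.
score = combined_similarity(
    ["ramp", "slope"], ["ramp", "slope"],
    ["accessible route"], ["entrance"],
)
print(round(score, 3))
```

    In this example the two provisions share all of their own features but their parent sections share none, so the score is driven entirely by the content term.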

    Currently, we are working on extending the technology to analyze and compare drinking water standards.