WIDE Technical-Report in 2012

Cybersecurity Information Discovery Mechanism
wide-tr-TakeshiTakahashi-SecInfoDiscovery-00.txt

WIDE Project: http://www.wide.ad.jp/

If you have any comments on WIDE documents, please contact
board@wide.ad.jp.

Title:     Cybersecurity Information Discovery Mechanism
Author(s): Takeshi Takahashi and Youki Kadobayashi
Date:      2012-12-15

CYBEX Working Group                                         T. Takahashi
                                                                    NICT
                                                          Y. Kadobayashi
                                                                   NAIST

             Cybersecurity Information Discovery Mechanism
            wide-tr-TakeshiTakahashi-SecInfoDiscovery-00.txt

Table of Contents

   1.  Introduction
   2.  Proposed Mechanism
     2.1.  Roles
     2.2.  Information Structure
     2.3.  Protocol
   3.  Prototype
     3.1.  Implementation
     3.2.  Demonstration
   4.  Discussion and Future Works
   5.  Copyright Notice
   6.  Normative References
   Authors' Addresses

Takahashi & Kadobayashi                                         [Page 2]

wide-tr          Cybersecurity Information Discovery            Oct 2012

1.  Introduction

   To cope with the increasing number of cyber threats, organizations
   need to share cybersecurity information across the borders of
   organizations, countries, and even languages.  Various organizations
   have built repositories that store and provide XML-based
   cybersecurity information on the Internet.  Among them are NVD,
   OSVDB, and JVN, and more cybersecurity information from
   organizations in various countries will become available on the
   Internet.  However, users are not aware of all of these
   repositories.
   To advance information sharing, users need to be aware of these
   repositories, and the parties that need cybersecurity information
   must be able to identify and locate it across the repositories and
   then obtain it over networks.  This paper proposes a discovery
   mechanism that identifies and locates the sources and types of
   cybersecurity information and exchanges the information over
   networks.  The mechanism uses the ontology of cybersecurity
   information [Takahashi_SINCONF2010] to incorporate assorted formats
   of such information so that it remains extensible.  It generates
   RDF-based metadata from XML-based cybersecurity information through
   the use of XSLT.  This paper also introduces an implementation of
   the proposed mechanism and discusses its extensibility and
   usability.

2.  Proposed Mechanism

2.1.  Roles

   The proposed mechanism introduces four distinct roles.

   Discovery Client retrieves cybersecurity information by
   communicating with one or more arbitrary Discovery Servers.

   Discovery Server helps Discovery Clients find a suitable Information
   Source by communicating with multiple Registries, aggregating
   information from them, and delivering the result to the Discovery
   Client.

   Registry manages an internal registry that contains the metadata of
   Information Sources by communicating with them.

   Information Source provides cybersecurity information described in
   XML by communicating with Registries.

2.2.  Information Structure

   A Registry uses an RDF-based internal repository to maintain the
   metadata list of cybersecurity information residing in Information
   Sources.  The metadata is generated by accessing Information Sources
   and extracting the needed information from them, as described in
   Section 2.3.
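
   As a rough illustration of metadata generation, the sketch below
   flattens the elements of an XML document into subject, predicate,
   object triples keyed by the Information Source's URI.  The report
   performs this step with XSLT; this sketch uses Python's standard
   xml.etree.ElementTree module instead, and the function name
   xml_to_triples, the predicate namespace, and the sample XML are all
   assumptions made for the example, not part of the proposed
   mechanism.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace for predicates; not defined by the report.
PREDICATE_BASE = "http://example.org/secinfo#"


def xml_to_triples(source_uri, xml_text):
    """Flatten every element that has text into a (subject, predicate, object) triple.

    The subject is always the Information Source's URI, so each triple can be
    traced back to the source it describes.
    """
    root = ET.fromstring(xml_text)
    triples = []
    for element in root.iter():
        text = (element.text or "").strip()
        if not text:
            continue  # skip pure container elements
        # Drop any "{namespace}" prefix that ElementTree puts on the tag.
        local_name = element.tag.rsplit("}", 1)[-1]
        triples.append((source_uri, PREDICATE_BASE + local_name, text))
    return triples


# Made-up sample document, used only to exercise the function.
sample = """<entry>
  <id>EXAMPLE-0001</id>
  <summary>Buffer overflow in an example service</summary>
</entry>"""

for triple in xml_to_triples("http://source.example.org/entries/1", sample):
    print(triple)
```

   Using the source URI as the subject of every triple keeps each
   piece of metadata tied to the Information Source it came from.
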
   Note that the level of detail of the metadata depends on the
   implementation, but a URI that uniquely identifies an Information
   Source is required.

   The repository uses the information structure described in
   Table \ref{Tabl:MajorCybersecurityInformationStandards}, which
   separates the information category from the content description
   format.  The ontology of cybersecurity operational information
   proposed in [Takahashi_SINCONF2010] is used for the category,
   whereas various industry specifications are used for the content
   description format, so that the structure remains extensible and
   compatible with future specifications.

2.3.  Protocol

   Information Publishing is a procedure for an Information Source to
   publish its XML-based cybersecurity information.  An Information
   Source sends a Registration message, which contains the
   information's URI, category, and allowed access methods (e.g.,
   http), to a Registry.  The Registry then accesses the URI by using
   one of the methods, receives the information, and converts it into
   RDF-based metadata by running XSLT.  It then generates and sends a
   Notification message to its Discovery Servers, which may also send
   the message to Discovery Clients so that they can receive any
   security information updates as soon as possible.

   Server Registration and Cancellation are procedures for a Discovery
   Client to start and stop using a Discovery Server.  A Discovery
   Client sends a Join message to the Discovery Server it wants to use.
   The Discovery Server then sends a Result message with the category
   and supported format information.  Although this paper proposes a
   single category following the ontology proposed in
   [Takahashi_SINCONF2010], the procedure allows different categories
   to be used by embedding different category information in the Result
   message, which keeps the proposed mechanism extensible.  When the
   Discovery Client wishes to stop using the server, it may send a
   Leave message to the server.
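
   The publishing and server registration procedures can be sketched
   as a small in-memory model.  This is only an illustration under
   stated assumptions: the class names, field names, and the dictionary
   used as the Result message are hypothetical, no network transport is
   modelled, and the step in which the Registry fetches the XML and
   converts it to RDF metadata is left out.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class RegistrationMessage:
    """Sent by an Information Source to a Registry (field names are assumed)."""
    uri: str
    category: str
    access_methods: tuple


@dataclass
class DiscoveryServer:
    category_scheme: str       # category information returned on join
    supported_formats: tuple   # content description formats the server supports
    clients: set = field(default_factory=set)
    notifications: list = field(default_factory=list)

    def join(self, client_id):
        """Handle a Join message; the returned dict stands in for the Result message."""
        self.clients.add(client_id)
        return {"category": self.category_scheme, "formats": self.supported_formats}

    def leave(self, client_id):
        """Handle a Leave message."""
        self.clients.discard(client_id)

    def notify(self, uri):
        """Receive a Notification from a Registry and pass it on to joined clients."""
        for client_id in sorted(self.clients):
            self.notifications.append((client_id, uri))


@dataclass
class Registry:
    servers: list
    entries: dict = field(default_factory=dict)

    def register(self, message):
        """Handle a Registration message, then notify every Discovery Server.

        Fetching the XML and converting it to RDF metadata is omitted here;
        only the bookkeeping and the notification step are shown.
        """
        self.entries[message.uri] = message
        for server in self.servers:
            server.notify(message.uri)


# Example run with made-up identifiers.
server = DiscoveryServer("ontology-categories", ("format-a",))
registry = Registry(servers=[server])
print(server.join("client-1"))
registry.register(
    RegistrationMessage("http://source.example.org/feed.xml", "vulnerability", ("http",))
)
print(server.notifications)
server.leave("client-1")
```

   Returning the category information from join mirrors the text
   above: a different category scheme could be supplied simply by
   constructing the server with different category information.
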
   Information Retrieval is a procedure for a Discovery Client to
   search for and obtain cybersecurity information.  A Discovery Client
   sends a Query message to a Discovery Server, which forwards the
   message to all of the Registries it communicates with.  Each
   Registry then searches its internal repository and creates and sends
   a Result message.  The Discovery Server receives the messages from
   all of the Registries, aggregates them into one, ranks and reorders
   the candidate Information Sources, and then embeds the information
   into a new Result message, which is sent back to the Discovery
   Client.  The Discovery Client chooses one Information Source among
   the candidate Information Sources listed in the message.  It then
   accesses the Information Source's URI using an allowed access
   method and obtains the XML information stored in the Information
   Source.

3.  Prototype

3.1.  Implementation

   A prototype of the proposed mechanism is implemented in Java on
   CentOS Linux.  It uses a certificate provided by Jetty to certify
   the Information Source.  Its Registry simply converts all the tags
   of the Information Sources' XML information into RDF-based metadata
   by using XSLT, though a more meticulous metadata extraction
   mechanism could be implemented if needed.  Sesame, an implementation
   of a SPARQL engine, is also used.  The proposed mechanism allows
   Information Sources to support arbitrary transport protocols for
   access, but this implementation supports only HTTP, HTTPS, and
   WebSocket.

   During the retrieval procedure, the Registry needs to rank candidate
   Information Sources.  Though the ranking algorithm is outside the
   scope of this paper, the implementation adopts the following simple
   algorithm.  The algorithm counts the number of keywords that appear
   in a tag and then divides that count by the total number of words in
   the tag.
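
   One reading of this scoring step can be sketched as follows.  The
   Candidate type, its field names, whitespace tokenization, and
   case-insensitive matching are assumptions made for the example.  The
   ordering used (higher ratio first, older registration date first
   when ratios are equal) follows the rule the prototype applies to
   ranked entries.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Candidate:
    """One candidate Information Source (field names are illustrative)."""
    uri: str
    tag_text: str          # text content of the tag being searched
    registered_on: date


def keyword_ratio(tag_text, keywords):
    """Number of words in the tag that match a keyword, over all words in the tag."""
    words = tag_text.lower().split()
    if not words:
        return 0.0
    wanted = {k.lower() for k in keywords}
    hits = sum(1 for w in words if w in wanted)
    return hits / len(words)


def rank(candidates, keywords):
    """Order by ratio (highest first), then by registration date (oldest first)."""
    return sorted(
        candidates,
        key=lambda c: (-keyword_ratio(c.tag_text, keywords), c.registered_on),
    )


# Made-up candidates, used only to exercise the ranking.
candidates = [
    Candidate("http://a.example.org/1", "buffer overflow in parser", date(2012, 5, 1)),
    Candidate("http://b.example.org/2", "overflow", date(2012, 6, 1)),
    Candidate("http://c.example.org/3", "overflow", date(2012, 4, 1)),
]
for c in rank(candidates, ["overflow"]):
    print(c.uri, keyword_ratio(c.tag_text, ["overflow"]))
```

   In this example the second and third candidates both score 1.0, so
   the one registered earlier is listed first, and the first candidate
   (ratio 0.25) comes last.
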
   The algorithm then assigns a higher rank to the entry with the
   higher resulting value.  If two entries have the same value, the one
   with the older registration date is ranked higher.

3.2.  Demonstration

   We set up a demonstration in which an Information Source publishes
   its cybersecurity information.  The demonstration is conducted over
   a network consisting of 1 Discovery Client, 3 Discovery Servers, 15
   Registries, and 30 Information Sources, all of which run on
   different virtual machines.

   The demonstration has a network view, which shows the network
   topology and the communication status within it.

   The demonstration also has a search view for Discovery Clients.  It
   provides category-based search, keyword search, and security
   information updates.  The keyword search is in the bottom part of
   the view.  Users can enter an arbitrary keyword in the bottom text
   box and run a search by clicking the "Search" button.  Users can run
   more sophisticated searches by clicking the "Advance Search" button
   in the view and moving to the advanced search view, where they may
   specify the target tags of the retrieval.  When specifying the tags,
   users may look up the available tags by clicking the "Select
   category" button.  The Discovery Client can provide the category
   information because it went through the server registration
   procedure described in Section 2.3, where it received the
   information from its Discovery Server.  Users simply select the tag
   and then specify the keyword in the advanced search.

4.  Discussion and Future Works

   The proposed mechanism incorporates various formats defined by
   assorted industry specifications, which are expected to evolve
   further.  Its metadata structure is designed to remain extensible.
   If a current information format becomes obsolete, a new
   specification can be introduced as a means to describe information
   of the types defined by the ontology.
   In this way, the changes are kept minimal.  Moreover, the ontology
   itself could be extended, although it is designed so that such
   extensions will not be needed in the near future.

   In addition to extensibility, the mechanism needs to be scalable to
   accommodate a large volume of cybersecurity information.  Evaluating
   scalability is left as future work.

   The proposed mechanism enables users to search cybersecurity
   information across assorted repositories, including NVD and JVN.
   The current implementation, however, runs on an intranet isolated
   from the Internet because it does not consider the security of the
   system.  For instance, it may suffer from impersonation or
   man-in-the-middle attacks, which may cause severe security
   incidents.  Although security issues are outside the scope of this
   paper, our future work will address them and integrate assorted
   security techniques to reinforce the mechanism's security level.

   Further information with reader-friendly diagrams is available in
   [Takahashi_ACSAC2012_Poster].

5.  Copyright Notice

   Copyright (C) Takeshi Takahashi (2012).  All Rights Reserved.

6.  Normative References

   [Takahashi_SINCONF2010]
              Takahashi, T., Kadobayashi, Y., and H. Fujiwara,
              "Ontological Approach toward Cybersecurity in Cloud
              Computing", International Conference on Security of
              Information and Networks SIN, September 2010.

   [Takahashi_ACSAC2012_Poster]
              Takahashi, T., Kadobayashi, Y., and Y. Takano, "Linking
              Cybersecurity Knowledge: Cybersecurity Information
              Discovery Mechanism", ACSAC Poster Session,
              November 2012.

Takahashi & Kadobayashi      Expires April 18, 2013
Authors' Addresses

   Takeshi Takahashi
   National Institute of Information and Communications Technology
   4-2-1 Nukui-Kitamachi Koganei
   184-8795 Tokyo
   Japan

   Phone: +80 423 27 5862
   Email: takeshi_takahashi@nict.go.jp


   Youki Kadobayashi
   Nara Institute of Science and Technology
   8916-5 Takayama, Ikoma
   630-0192 Nara
   Japan

   Email: youki-k@is.aist-nara.ac.jp