How to Import Complex Documents into Innoslate

One of the first things you want to do when using a requirements or PLM tool is to import complex documents into the tool, so that you can begin analysis. Most documents have pictures, tables and other elements of information that you want to be able to access. Often these complex documents come as an Adobe Portable Document Format (PDF) file; other times they come as Comma Separated Values, MS Word and other formats. In this blog, we will deal with PDFs.

When bringing in a new document, we recommend starting with a new Innoslate project. Also, most documents are a result of non-uniform word processing, which means that import software has to deal with many different possible numbering schemes and formats. This fact makes getting a document into a toll very difficult. Fortunately, Innoslate’s Import Analyzer provides the means to overcome most of this problem, but you may want to do a little bit of work on the file first.

PDFs come in two forms: 1) scanned documents; and 2) selectable documents. The first one requires the use of Optical Character Recognition (OCR) software. We recommend Google’s as it seems to be one of the best, but many other tools for OCR are available. This process converts it into a selectable document.

Once you have the document in an editable PDF format and you have a copy of the latest version of Adobe Acrobat Pro, you can try to save the document as an MS Word file. Adobe tends to do a very good job converting documents this way. You may still want to go through and clean up the document, such as removing the table of contents and other unnecessary information.

After you have the document in the format you want, select the Import Analyzer from the Menu and follow the process of using the “Word (.docx)” tab, selecting the class for import (Next), and dragging the file into the window for import (Step 1). The upload and analysis process may take a minute or so, depending on the size and complexity of the document (Step 2). The analysis includes the creation of parent-child relationships (decomposed by/decomposes) as identified by the numbering scheme.

Once the upload and analysis are complete, just select “Next” and you can see a preview of the information as it has been captured (Step 3). If satisfied, then select save the entities into the database.

Step 1:

Step 2:

Step 3:

The end result of this process is the document being seen in the Requirements View. The analyzer includes any pictures and tables, if they were properly developed that way in the original document (see below).

 

 

If you already have another project to which you want to add this one to, you can export and re-import the Innoslate XML file, or (better) use the branching/forking capability (go to Database View and use the “Branch” button). When creating a new branch, instead of selecting a “New Project” use the “Target” drop down menu to select the project with which you want the document to merge (see right).

 

The process above is a best-case situation for complex documents. Sometimes, this approach to importing fails due to problems in the MS Word document itself.

The second way to import a PDF file is to use the “Plain Text (.pdf, .txt, etc.)” tab in the analyzer (shown below).

Here you need to give the Artifact (the entity that will store the uploaded document) a name, which you can edit later. Again, select the class type for import (we default to Requirements, since that usually what you are importing). Finally, we need to select the type of list contained within the document, again for the purpose of creating the parent-child relationships.

 

 

After clicking the “Next>” button, you can paste the copied text from the file into the space provided. After clicking the Next button on that screen, the analysis proceeds and then you can preview the results as before.

 

Finally, the worst-case scenario is a PDF document that cannot be easily imported using any of the Import Analyzer tools. Although this is rare, it does occur. Recently I was asked to import a portion of the US Code. For anyone who has seen it, it’s double column and contains a lot of unusual characters. So, I determined that the fastest way to bring it into the tool was by cutting and pasting objects into the Requirements View of an Innoslate Project. I used this as an opportunity to conduct analysis on the document as I went. Since I was not under a tight deadline (I had days to perform the task, not hours) and we ultimately wanted to perform requirements analysis anyway, this let me take blocks of text and treat them as Statements, instead of Requirements, when they really only provided context or breakup paragraphs that contained multiple requirements into individual entities to they could be separately traced. That effort took a person day and one half, while the other ones above had taken really only minutes, but no analysis was done.

 

All-in-all Innoslate provides the means to bring any and all information from the outside into the tool. You can then use that information to complete the rest of the lifecycle within the same tool environment (with no plug-ins required).