VectorSection Proposal

Synopsis

Funding this proposal will add support for the binary .dwg CAD file format to the VectorSection open source vector graphics conversion system. This project will create the first open source .dwg to .dxf translator and will enable open source innovation and collaboration in fields that depend on access to vector graphics data and interoperability.

The VectorSection Method

VectorSection is a system of vector graphics conversion programs designed for interoperability with an emphasis on reusability. It aims to be a universal translation system which will provide graphics users and developers with powerful and unified methods for accessing data without the barriers and complications of legacy file formats.

Graphics programs in different fields work with wildly differing basic objects (primitives) which have incompatible but translatable mathematical definitions. The way these primitives are defined and encoded in a file format complicates communication even though meaningful data could otherwise be exchanged. For example, a 2D drafting program and a desktop publishing application both work with curves and lines on a 2D plane, but a drafting program will support NURBS curves and dimension lines while the publishing program tends to have more sophisticated gradients and page layout data. In 3D graphics, support for wireframes, meshes, surfaces, and solid geometry is only the beginning of the diversity.

VectorSection tackles these diverse formats by identifying commonalities and defining an optimal superset of entities which are able to losslessly express the source data. Rather than attempting to store 2D desktop publishing data alongside 3D mesh graphics, there are several internal representations (hubs) being defined, each of which is tailored to represent several common formats within a given graphics paradigm. To date, the "Open Vector" hubs OVP (which expresses "Printing" graphics such as .svg, .xar, .ps, and .pdf) and OVD (which expresses "Drafting" graphics such as .dxf, .dwg, and .dgn) have made notable headway. Between these internal data representations, simple bridges provide the approximations required to transform entities into another paradigm hub.

The connectors are designed to be piped together using a plaintext stream in YAML (an object serialization format). This stream can be loaded directly by many programming languages and can also be dumped into a directory tree form for persistent and atomic per-entity access via the filesystem. These two simple communication mechanisms enable a variety of useful applications beyond file translation and may serve as the building blocks for more specialized data stores.
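The directory tree form can be illustrated with a minimal sketch, which is not VectorSection's actual layout. It assumes each entity is a flat mapping, invents the file naming (`000000.yml`, ...) and the function names, and leans on the fact that JSON scalars and lists are also valid YAML values so that no YAML library is needed. Each file is written to a temporary name and then renamed, so a reader never sees a half-written entity.

```python
import json
import os
import tempfile


def dump_entity(root, entity_id, entity):
    """Write one entity to <root>/<entity_id>.yml with an atomic rename."""
    os.makedirs(root, exist_ok=True)
    # One "key: value" line per attribute; json.dumps yields YAML-compatible values.
    text = "".join(f"{key}: {json.dumps(value)}\n" for key, value in entity.items())
    # Write to a temporary file in the same directory, then rename it into place.
    fd, tmp_path = tempfile.mkstemp(dir=root, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        handle.write(text)
    final_path = os.path.join(root, f"{entity_id:06d}.yml")
    os.replace(tmp_path, final_path)  # atomic when both paths share a filesystem
    return final_path


def dump_stream(root, entities):
    """Dump every entity of a stream, numbering the files in stream order."""
    return [dump_entity(root, index, entity) for index, entity in enumerate(entities)]
```

For example, `dump_stream("drawing.d", [{"type": "line", "start": [0, 0], "end": [1.5, 2]}])` would create `drawing.d/000000.yml`, which other tools can then read or replace one entity at a time.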

The .dwg Format and Open Source

There is currently no open source translator for .dwg data, yet many open source programs read .dxf data. Simply providing a cross-platform, open source .dwg to .dxf translator will therefore give many users easier access to their existing data. Linux users in particular have been largely without options.

Further, development of open source software has been hindered by lack of support for this commonly exchanged binary format, both in terms of desktop software and in custom programming on Linux using e.g. Perl, Python, and Ruby. VectorSection's internal object store (OVD) will provide a simple way for programmers to access .dwg data for custom automation and niche applications.

The .dwg format is an undocumented binary format which would be completely opaque without deciphering the correct byte offsets and bit alignments in which the data is encoded. The Open Design Alliance has published many of their findings about these details. Art Haas (of PythonCAD) has attempted to implement an open source parser based on this publication. I have had some involvement with this Python-based parser, and will be referencing both it and the ODA's publication during my work.

By contrast, the .dxf format is a documented, plaintext format. However, .dxf files are not completely equivalent to .dwg files, and AutoCAD(TM) saves .dwg files by default. This means that saving a .dxf file requires an extra, manual step for most users, and the resulting .dxf copy risks going stale relative to the definitive data. Consequently, .dwg formatted files have become the de facto means of storing and transmitting design data.

Benefits to the Linux and Open Source Community

The options for Linux users in dealing with .dwg files are currently limited exclusively to proprietary applications, many of which require Windows emulation or virtualization. Those vendors who do publish Linux applications have typically only provided x86 binaries. An open source option in this realm removes the limitations around CPU and operating system, making it easier for users to choose a Linux desktop.

The .dwg format is also a barrier to the development of open source CAD programs and special-purpose automation tools. Adoption of an open source program is typically contingent on support for a user's existing files, but investing a huge effort into format conversion is overwhelming for a new project. This creates a deadlock in innovation, where communities fail to form because an open source application in this space fails to achieve the "minimal practical usefulness" required to capture enough mindshare to sustain itself. By providing an open source tool to access this commonly used format, this project will foster innovation and increased adoption of open source software.

While the .dwg binary format in particular represents a large hurdle of complexity, effort, and "tedious" programming, vector graphics format conversion in general is difficult and unrewarding enough that it fails to appear on most lists of "fun things to do this weekend." The VectorSection project aims to lower these technical and motivational barriers by providing a large amount of functionality for a minimal investment of effort. With support for the .dwg format, VectorSection will build momentum toward becoming a universal solution to access and share vector graphics data.

Plan and Phases

This proposal will focus on adding .dwg read-in and .dxf write-out support to the OVD hub, as well as suitably refining the internal data representation toward expressing 2D mechanical drafting data and 3D wireframe/mesh data. The end result will be two newly created "connectors" (dwg2ovd and ovd2dxf) as well as improvements to the existing dxf2ovd connector.

These improvements will build momentum for the VectorSection project by providing end-user tools which solve real-world problems. The addition of these connectors also lays the groundwork for multiplicative growth in interoperability: each additional connector can exchange data with every format that is already supported.

The .dwg and .dxf formats are essentially alternate representations of the same data. They are even parallel in that an entity's attributes are encoded in the same order in .dwg as in .dxf. Thus, the existing and ongoing work on dxf2ovd will inform the binary parser design and assist with translating and normalizing entities into the OVD data structure.

In each successive version, the .dwg format has undergone changes at the byte level which will require the parser to contain special cases based on the .dwg version. The following table lists the identifier code (found in the file header) and the versions of AutoCAD(TM) which are believed to have emitted each identifier.

 Code      AutoCAD(TM) versions
 AC1018    2004, 2005, 2006
 AC1015    2000, 2000i, 2002
 AC1014    14
 AC1012    13
 AC1009    12, 11
 AC1006    10
 AC1004    9
 AC1002    2.5

For this project, the primary focus will be on the AC1015 format. Support for AC1014, AC1018, and additional versions will be added subject to availability of time, sample data, and reliability of format documentation.
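As a rough illustration of version detection, the sketch below reads the identifier and reports whether it is one of the versions this proposal targets. It assumes the identifier is stored as the six ASCII bytes at the very start of the file, as described in the Open Design Alliance's published findings. The names `TARGET_VERSIONS`, `read_version_id`, and `describe` are invented for this example and are not part of VectorSection.

```python
import io

# Only the identifiers this proposal targets; release lists follow the table above.
TARGET_VERSIONS = {
    "AC1014": ("14",),
    "AC1015": ("2000", "2000i", "2002"),
    "AC1018": ("2004", "2005", "2006"),
}
PRIMARY_VERSION = "AC1015"


def read_version_id(stream):
    """Return the identifier from a binary stream positioned at the file start."""
    magic = stream.read(6)  # assumption: the identifier occupies the first six bytes
    if len(magic) < 6:
        raise ValueError("file is too short to contain a version identifier")
    try:
        return magic.decode("ascii")
    except UnicodeDecodeError:
        raise ValueError("version identifier is not ASCII") from None


def describe(version_id):
    """Summarize how this sketch would treat a given identifier."""
    releases = TARGET_VERSIONS.get(version_id)
    if releases is None:
        return f"{version_id}: not targeted"
    role = "primary" if version_id == PRIMARY_VERSION else "secondary"
    return f"{version_id}: {role} target (AutoCAD {', '.join(releases)})"
```

A real file would be opened in binary mode, for example `with open(path, "rb") as stream: print(describe(read_version_id(stream)))`.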

Phase 1: The .dwg Parser Rough-in

The first phase of this project will involve reading the .dwg file headers and stepping through the entities. Experiments indicate that it will be possible to step through the format without knowing the meaning of every byte, but also that some files may include unrecognized byte sequences which cannot be skipped. This suggests that the parser needs to be designed to accommodate failure in order to be useful at an early stage. The parser will also be designed to support multiple format versions with minimal extra code.
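The idea of a parser that accommodates failure can be sketched as follows. The record layout here is purely hypothetical (a one-byte type code, a two-byte little-endian length, then the payload) and is not the .dwg encoding. It only shows the control flow: unknown or malformed records whose length is known are skipped and logged, while a record that cannot be stepped over ends the run with the partial results kept.

```python
import struct
from dataclasses import dataclass, field


@dataclass
class ParseReport:
    entities: list = field(default_factory=list)
    skipped: list = field(default_factory=list)  # (offset, type_code) pairs
    error: str = ""  # set when stepping had to stop early


def parse_line(payload):
    x1, y1, x2, y2 = struct.unpack("<4d", payload)
    return {"type": "line", "start": (x1, y1), "end": (x2, y2)}


def parse_point(payload):
    x, y = struct.unpack("<2d", payload)
    return {"type": "point", "at": (x, y)}


# Hypothetical type codes; a real parser would key a table like this per version.
HANDLERS = {1: parse_line, 2: parse_point}


def step_through(data):
    """Walk the hypothetical record stream, keeping whatever can be recovered."""
    report = ParseReport()
    offset = 0
    while offset < len(data):
        if offset + 3 > len(data):
            report.error = f"truncated record header at offset {offset}"
            break
        type_code, length = struct.unpack_from("<BH", data, offset)
        body_start = offset + 3
        body_end = body_start + length
        if body_end > len(data):
            report.error = f"record at offset {offset} runs past end of data"
            break
        handler = HANDLERS.get(type_code)
        try:
            if handler is None:
                raise KeyError(type_code)
            report.entities.append(handler(data[body_start:body_end]))
        except (KeyError, struct.error):
            # Unknown type or malformed payload: log it and move on.
            report.skipped.append((offset, type_code))
        offset = body_end
    return report
```

Returning a report instead of raising on the first problem is what allows early versions of a tool to extract useful data from files that are only partly understood.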

The entities supported in this phase will be limited to basic 2D objects (line, lwpolyline, point, circle, arc, ellipse, text) and their essential properties. This will provide early usefulness in terms of data extraction while allowing the effort to remain focused on the binary parser.

Some drawing-level data (units, etc.) will also be extractable at this phase.

Tasks:

  1. Parser design rough-in
  2. Verify available .dwg documentation and prior art
  3. Identify and investigate unknown byte sequences
  4. Organize and analyze sample files
  5. Support extraction of basic entities
  6. Support additional .dwg versions

Deliverables:

  • dwg2ovd connector with basic entities supported for AC1015

Phase 2: OVD Entities and ovd2dxf

Once the .dwg parser is functional, the extracted data needs to be interpreted into a meaningful form. This phase of work creates a .dxf output connector and refines the Open Vector Drafting (OVD) hub specification to support additional entities, properties, and concepts. Given the simplicity of the .dxf format, some enhancements to the existing dxf2ovd connector will be used to inform the OVD specification. Those enhancements will then be "ported" to the binary .dwg parser during the final phase.
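To show why .dxf output is comparatively simple, here is a minimal sketch that writes a few basic entities as an ENTITIES section. The group codes (0 for entity type, 8 for layer, 10/20 and 11/21 for coordinates, 40 for radius) come from the published .dxf documentation. The entity dictionaries and function names are placeholders for this example and do not reflect the OVD schema. The output omits the HEADER section, so it assumes a reader that tolerates an entities-only file.

```python
def entity_pairs(entity):
    """Translate one placeholder entity dict into (group code, value) pairs."""
    layer = entity.get("layer", "0")
    kind = entity["type"]
    if kind == "line":
        (x1, y1), (x2, y2) = entity["start"], entity["end"]
        return [(0, "LINE"), (8, layer), (10, x1), (20, y1), (11, x2), (21, y2)]
    if kind == "circle":
        cx, cy = entity["center"]
        return [(0, "CIRCLE"), (8, layer), (10, cx), (20, cy), (40, entity["radius"])]
    if kind == "point":
        x, y = entity["at"]
        return [(0, "POINT"), (8, layer), (10, x), (20, y)]
    raise ValueError(f"unsupported entity type: {kind}")


def to_dxf(entities):
    """Return the text of a .dxf file containing only an ENTITIES section."""
    pairs = [(0, "SECTION"), (2, "ENTITIES")]
    for entity in entities:
        pairs.extend(entity_pairs(entity))
    pairs.extend([(0, "ENDSEC"), (0, "EOF")])
    # Each pair occupies two lines: the group code, then its value.
    return "".join(f"{code}\n{value}\n" for code, value in pairs)
```

Because every value is a tagged pair of lines, supporting a new property is mostly a matter of adding its group code, which is why refinements are cheaper to try in the .dxf connectors first.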

Tasks:

  1. OVD drawing-level data definitions
  2. OVD entity refinement and specification
  3. Output of .dxf headers and drawing-level data
  4. Output of .dxf entities

Deliverables:

  • dxf2ovd connector refinements
  • ovd2dxf connector with basic entities supported
  • OVD schema and specification

Phase 3: Refinement and Additional Entities

The work done in the final phase will be largely dependent on the findings from the previous phases, and informed by community feedback. The efforts here will focus on improving the round-trip translation support, supporting more entities in the .dwg/.dxf parsers, and refining the OVD specification accordingly. Support for block/xref (drawing inclusion) entities, page layouts, and viewports will be added, time permitting. Finally, the installer and documentation will be improved based on community feedback.

Tasks:

  1. Binary parser bugfixes
  2. Additional entities in dwg2ovd and ovd2dxf
  3. Initial support for referenced/included drawings
  4. Initial support for page layouts
  5. Build and installation improvements
  6. Documentation

Deliverables:

  • dwg2ovd supporting additional entities and properties
  • ovd2dxf supporting additional entities and properties

Timeline

The work will involve roughly 12 weeks of effort, which could be completed within 3-5 months from commencement.

License

All code will be published under Version 2 of the GNU General Public License (GPLv2).

About the Developer

Eric Wilhelm is an accomplished software developer with experience in engineering, manufacturing, design automation, and graphics processing. His software assisted A. Zahner Co. in the automated design and fabrication of the cladding on the de Young Museum in San Francisco, the Hunter Museum of American Art in Chattanooga, and the Ohr-O'Keefe Museum of Art in Biloxi. He has been extensively involved in the Linux and open source community, and is the author of a large number of CPAN modules (many of which address 2D and 3D graphics programming). Eric is president of the Portland Perl Mongers user group, and organized the Perl Foundation's involvement in Google's Summer of Code student program in 2008.

References:

  • http://scratchcomputing.com
  • http://search.cpan.org/~ewilhelm
  • http://scratchcomputing.com/images/deYoung-pic.jpg
  • http://en.wikipedia.org/wiki/Image:Hunter_Museum_of_American_Art.jpg
  • http://www.georgeohr.org/portal/NEWMUSEUM/tabid/146/Default.aspx
  • http://www.metropolismag.com/cda/story.php?artid=1565
