Microsoft Foundation Classes                            Microsoft Corporation
Technical Notes

#2 : Persistent Data Format

This note describes the MFC routines that support persistent
C++ objects and the format of those objects in a persistent store.

=============================================================================

The Problem
===========

The MFC implementation for persistent data relies on a
compact binary format for saving data to disk. This format
is distinct from the format used for diagnostic output
of class objects for two reasons: (1) diagnostic output is
human readable, and (2) maximum space efficiency is desired
when saving to a persistent store (usually a disk). For these
reasons, MFC does not provide a polymorphic interface for
storing objects, as is common in "pure" object-oriented
languages such as Smalltalk-80.

MFC solves this problem by using the class CArchive. A
CArchive object provides a context for persistence that
lasts from the time the archive is created until the CArchive::Close
member function is called, either explicitly by the programmer or
implicitly by the destructor when the scope containing the CArchive
is exited.

This note describes the implementation of the CArchive protected
members WriteObject and ReadObject. These member functions
are never called directly by end users; they are used to
implement persistent objects. End users should instead use the
class-specific, type-safe insertion and extraction operators,
which are enabled by including the DECLARE_SERIAL and IMPLEMENT_SERIAL
macros in a CObject-derived class. Similarly, the end user rarely
calls the virtual member function CObject::Serialize directly, unless the
object being stored is embedded in another class object, in which case
the exact type of the object is known.


NOTE: This note describes code located in the MFC
source file ARCHIVE.CPP.

=============================================================================
Saving objects to the store (CArchive::WriteObject)
===================================================

The protected member function void CArchive::WriteObject(const CObject*)
is responsible for writing out enough data so that the object
can be correctly reconstructed. This data consists of two parts:
the type of the object and the state of the object. This member
function is also responsible for maintaining the identity of the
object being written out, so that only a single copy is
saved, regardless of the number of pointers to that object
(including circular pointers).

The saving (inserting) and restoring (extracting) of objects
rely on several manifest constants. These values are stored
in binary and provide important information to the archive
(the "w" prefix indicates a 16-bit quantity).

    wNullTag        // used for NULL object pointers (0)
    wNewClassTag    // indicates class description that follows is new
                    // to this archive context (-1)
    wOldClassTag    // indicates class of the object being read
                    // has been seen in this context (0x8000)
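
As an illustration of how these three values partition the 16-bit
tag space, the following is a minimal sketch, not the ARCHIVE.CPP
code. The constant names and values come from the list above; the
TagKind enumeration and the helpers ClassifyTag and ClassIndex are
hypothetical names introduced only for this example.

```cpp
#include <cassert>
#include <cstdint>

// Tag values as listed in the note (-1 is 0xFFFF as a 16-bit quantity).
const std::uint16_t wNullTag     = 0x0000;
const std::uint16_t wNewClassTag = 0xFFFF;
const std::uint16_t wOldClassTag = 0x8000;

// Hypothetical classification of a 16-bit word read from an archive.
enum class TagKind { Null, NewClass, OldClass, ObjectRef };

TagKind ClassifyTag(std::uint16_t w)
{
    if (w == wNullTag)     return TagKind::Null;
    if (w == wNewClassTag) return TagKind::NewClass;  // test before the high bit
    if (w & wOldClassTag)  return TagKind::OldClass;  // high bit set: class index
    return TagKind::ObjectRef;                        // plain PID of a stored object
}

// Hypothetical helper: strip the class bit to recover the class index.
std::uint16_t ClassIndex(std::uint16_t w)
{
    assert((w & wOldClassTag) != 0 && w != wNewClassTag);
    return static_cast<std::uint16_t>(w & 0x7FFF);
}
```

Because wNewClassTag also has its high bit set, the sketch tests for
it before testing for wOldClassTag.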

When storing objects, the archive maintains a CMapPtrToWord
(the m_pStoreMap), which is a mapping from a stored object to a
16-bit persistent identifier (PID). A PID is assigned to every
unique object and every unique class name that is saved in
the context of the archive. These PIDs are handed out sequentially
starting at 1. It is important to note that these PIDs have
no significance outside the scope of the archive and, in
particular, are not to be confused with a "record number" or
other identity concepts.

When a request is made to save an object to an archive
(usually through the global insertion operator), a check is made
for a NULL CObject pointer; if the pointer is NULL, the wNullTag is
inserted into the archive stream.

If we have a real object pointer that is capable of being
serialized (the class is a DECLARE_SERIAL class), we then check
the m_pStoreMap to see if the object has been saved already, and if
that is the case we insert the 16-bit PID associated with that
object.

If the object has not been saved before, there are two possibilities
to take into account: either both the object and the exact type
(i.e. class) of the object are new to this archive context, or the
object is of an exact type already seen. To determine whether the
type has been seen, we query the m_pStoreMap for a CRuntimeClass
object (formally, CRuntimeClass is a structure, to avoid problems
associated with meta-classes) that matches the CRuntimeClass
object associated with the object we are saving.

If we have seen this class before, WriteObject inserts a 16-bit tag
that is the bit-wise OR of wOldClassTag and the index (PID) of the
class. Note that this operation imposes a hard limit of 32766
indices per archive context. This number represents the maximum
number of unique objects and classes that can be saved in a single
archive, but a single disk file can have an unlimited number
of archive contexts.

If the CRuntimeClass is new to this archive context, WriteObject
assigns a new PID to that class and inserts it into the archive,
preceded by the wNewClassTag value. The descriptor for this class is
then inserted into the archive using the CRuntimeClass member
function Store. CRuntimeClass::Store inserts the schema number of
the class (see below) and the ASCII text name of the class. Note
that the use of the ASCII text name does not guarantee uniqueness of
the archive across applications; it is therefore advisable to tag
your data files to prevent corruption (imagine distinct applications
that both define the class CWordStack, for example).

Following the insertion of the class information, the archive places
the object into the m_pStoreMap and then calls the Serialize member
function to insert class-specific data into the archive. Placing
the object into the m_pStoreMap before calling Serialize prevents
multiple copies of the object from being saved to the store.
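
The decision sequence described above can be sketched in standalone
C++. This is a simplified illustration under stated assumptions, not
the ARCHIVE.CPP implementation: ClassInfoSketch, NodeSketch and
StoreSketch are hypothetical stand-ins for CRuntimeClass, a
CObject-derived class and CArchive; the class descriptor is reduced
to the schema word followed by one word per name character; and the
object's Serialize is reduced to writing one value and one pointer.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Stand-in types for this sketch only; they are not the MFC declarations.
struct ClassInfoSketch { std::string name; std::uint16_t schema; };

struct NodeSketch {                 // plays the role of a serializable object
    const ClassInfoSketch* cls;
    std::uint16_t value;
    const NodeSketch* next;         // may be NULL, shared, or circular
};

struct StoreSketch {
    std::vector<std::uint16_t> words;           // 16-bit words written so far
    std::map<const void*, std::uint16_t> pids;  // plays the role of m_pStoreMap
    std::uint16_t nextPid = 1;                  // PIDs start at 1; 0 means NULL

    std::uint16_t NewPid()
    {
        assert(nextPid <= 0x7FFE);              // the 32766-index limit
        return nextPid++;
    }

    void WriteObject(const NodeSketch* p)
    {
        if (p == nullptr) { words.push_back(0x0000); return; }  // wNullTag

        auto obj = pids.find(p);
        if (obj != pids.end()) {                // already saved: PID only
            words.push_back(obj->second);
            return;
        }

        auto cls = pids.find(p->cls);
        if (cls != pids.end()) {
            // Class seen before: wOldClassTag OR'ed with the class PID.
            words.push_back(static_cast<std::uint16_t>(0x8000 | cls->second));
        } else {
            words.push_back(0xFFFF);            // wNewClassTag
            words.push_back(p->cls->schema);    // simplified class descriptor
            for (char c : p->cls->name)
                words.push_back(static_cast<unsigned char>(c));
            pids[p->cls] = NewPid();
        }

        pids[p] = NewPid();         // register BEFORE writing the members
        words.push_back(p->value);  // stands in for the object's Serialize
        WriteObject(p->next);       // a cycle ends at the PID lookup above
    }
};
```

With two nodes of the same class that point at each other, the
sketch writes the class descriptor once, tags the second node with
wOldClassTag OR'ed with the class PID, and closes the cycle with the
PID of the first node rather than a second copy of it.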

When returning to the initial caller (usually the root of the
network of objects), it is important to Close the archive.
If other CFile operations are going to be done, the CArchive
member function Flush MUST be called. Failure to do so will
result in a corrupt archive.
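
Why Flush matters can be seen in a small standalone model. This is a
sketch only, assuming that the archive buffers its output in memory
before handing it to the file; FileSketch and BufferedArchiveSketch
are hypothetical names and do not reproduce the CFile or CArchive
interfaces.

```cpp
#include <cassert>
#include <string>

struct FileSketch { std::string bytes; };   // stand-in for the underlying file

class BufferedArchiveSketch {
public:
    explicit BufferedArchiveSketch(FileSketch& f) : file_(f) {}
    ~BufferedArchiveSketch() { Flush(); }   // leaving scope writes what remains

    void Write(const std::string& s) { buffer_ += s; }   // buffered, not yet on file

    void Flush()                            // push buffered data to the file
    {
        file_.bytes += buffer_;
        buffer_.clear();
    }

private:
    FileSketch& file_;
    std::string buffer_;
};
```

In this model, data written through the archive does not reach the
file until Flush runs, so anything written directly to the file
before that point would land ahead of the buffered archive data.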


=============================================================================
Loading objects from the store (CArchive::ReadObject)
=====================================================

Loading (extracting) objects uses the protected
CArchive::ReadObject function, and is the converse of WriteObject.
As with WriteObject, ReadObject is not called directly by user code;
user code should call the type-safe extraction operator (enabled by
DECLARE_SERIAL/IMPLEMENT_SERIAL), which then calls ReadObject.
This extraction operator will ensure the type integrity of the
extraction operation.

Since the WriteObject implementation discussed above assigned
increasing PIDs, starting with 1 (0 is predefined as the NULL object),
the ReadObject implementation can use an array to maintain
the state of the archive context. When a PID is read from
the store, if the PID is greater than the current upper
bound of the m_pLoadArray, then ReadObject knows that a
"new" object (or class description) follows.
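
The array bookkeeping can be sketched as follows. This shows only
the idea of indexing by sequential PID; LoadArraySketch and its
member names are hypothetical, and the sketch makes no claim about
how m_pLoadArray is actually declared.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Sketch of the load-side bookkeeping only; the names are hypothetical.
class LoadArraySketch {
public:
    LoadArraySketch() { slots_.push_back(nullptr); }   // slot 0 is the NULL object

    // Highest PID handed out so far in this archive context.
    std::uint16_t UpperBound() const
    {
        return static_cast<std::uint16_t>(slots_.size() - 1);
    }

    // True when the PID refers to something already loaded.
    bool IsKnown(std::uint16_t pid) const { return pid <= UpperBound(); }

    // Records a newly loaded object or class; returns the PID it received.
    std::uint16_t Add(const void* p)
    {
        slots_.push_back(p);
        return UpperBound();
    }

    const void* Lookup(std::uint16_t pid) const
    {
        assert(IsKnown(pid));
        return slots_[pid];
    }

private:
    std::vector<const void*> slots_;
};
```

Because entries are appended in the same order in which the writer
handed out PIDs, a PID read back from the store indexes the array
directly, and a PID for which IsKnown returns false corresponds to
the "new" case described above.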


=============================================================================
Schema numbers
==============

The schema number, which is assigned to the class when the class'
IMPLEMENT_SERIAL is encountered, is the "version" of the
class implementation. The schema refers to the implementation
of the class, not to the number of times a given object has been
made persistent. Properly, the latter is usually referred to as the
object version. If you intend to maintain several different
implementations of the same class over time, incrementing the schema
each time you revise the class's Serialize member function
will enable you to write code that can load objects stored using older
iterations of the implementation.

The CArchive::ReadObject member function will throw a CArchiveException
when it encounters a schema number in the persistent store that differs
from the schema number of the class description in memory. If your
implementation of Serialize for a class with multiple schemas catches this
exception, you will be able to continue the extraction operation taking
into account the differences in the implementation of the Serialize
member function.
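
The general shape of "detect a schema mismatch, then fall back to an
older layout" can be sketched in standalone C++. This is an
illustration only: SchemaMismatch is a hypothetical stand-in for
CArchiveException, Point3 and its two layouts are invented for the
example, and the data is modelled as a plain vector whose first
element is the stored schema number.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for CArchiveException in this sketch.
struct SchemaMismatch : std::runtime_error {
    std::uint16_t stored;
    explicit SchemaMismatch(std::uint16_t s)
        : std::runtime_error("schema mismatch"), stored(s) {}
};

const std::uint16_t kCurrentSchema = 2;   // assumed current version of the class

struct Point3 { int x = 0, y = 0, z = 0; };

// Rejects any stored schema other than the one compiled into the program.
void CheckSchema(std::uint16_t stored)
{
    if (stored != kCurrentSchema) throw SchemaMismatch(stored);
}

// Loads a Point3 from {schema, fields...}; schema 1 is assumed to lack z.
Point3 LoadPoint(const std::vector<int>& data)
{
    Point3 p;
    const std::uint16_t schema = static_cast<std::uint16_t>(data.at(0));
    try {
        CheckSchema(schema);
        p.x = data.at(1); p.y = data.at(2); p.z = data.at(3);
    } catch (const SchemaMismatch& e) {
        if (e.stored != 1) throw;             // unknown layout: give up
        p.x = data.at(1); p.y = data.at(2);   // old layout: z keeps its default
    }
    return p;
}
```

Data stored with the older schema loads with z left at zero, while a
schema the code has never heard of still propagates as an error.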


=============================================================================
CRuntimeClass
=============

The persistence mechanism uses the CRuntimeClass data
structure to uniquely identify classes. MFC associates one
structure of this type with each dynamic and/or serializable class in
the application. These structures are initialized at application
startup time using a special static object of type CClassInit. You
need not concern yourself with the implementation of this information,
as it is likely to change between revisions of MFC.
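
The idea of one descriptor per class can be sketched as follows.
This is a conceptual model only, since the note itself warns that
the real layout is an implementation detail: RuntimeClassSketch,
ObjectSketch and ShapeSketch are hypothetical names, and the fields
shown (name, schema, single base pointer) are assumptions chosen to
match what the note says the structure is used for.

```cpp
#include <cassert>
#include <cstring>

// Illustrative stand-in; not the MFC CRuntimeClass layout.
struct RuntimeClassSketch {
    const char* name;
    unsigned schema;
    const RuntimeClassSketch* base;   // one base pointer: single inheritance only
};

struct ObjectSketch {
    static const RuntimeClassSketch classInfo;
    virtual ~ObjectSketch() {}
    virtual const RuntimeClassSketch* GetRuntimeClass() const { return &classInfo; }

    // Walks the single chain of base descriptors looking for the target.
    bool IsKindOf(const RuntimeClassSketch* target) const
    {
        for (const RuntimeClassSketch* c = GetRuntimeClass(); c != nullptr; c = c->base)
            if (c == target) return true;
        return false;
    }
};
const RuntimeClassSketch ObjectSketch::classInfo = { "ObjectSketch", 0, nullptr };

struct ShapeSketch : ObjectSketch {
    static const RuntimeClassSketch classInfo;
    const RuntimeClassSketch* GetRuntimeClass() const override { return &classInfo; }
};
const RuntimeClassSketch ShapeSketch::classInfo =
    { "ShapeSketch", 1, &ObjectSketch::classInfo };
```

A descriptor with a single base pointer can only describe one chain
of ancestors, which is one way to picture why a scheme of this kind
does not extend naturally to multiple base classes.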

The current implementation of CRuntimeClass does not support
multiple inheritance (MI). This does not mean you cannot use MI
in your MFC application, but it does imply that you will have
certain responsibilities when working with objects that have more than
one base class. The CObject::IsKindOf member function
will not correctly determine the type of an object if it
has multiple base classes. Therefore, you cannot use CObject
as a virtual base class, and all calls to CObject member functions
such as Serialize and operator new will need to have scope qualifiers
so that C++ can disambiguate the function call. If you do find
the need to use MI within MFC, then you should be sure to make the
class containing the CObject base class the leftmost class in the
list of base classes.
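
The need for scope qualifiers can be shown in plain C++ without MFC.
The sketch below assumes one way the ambiguity can arise, namely two
base classes that each derive (non-virtually) from the same root;
BaseObjectSketch plays the role of CObject, and all class names are
hypothetical.

```cpp
#include <cassert>
#include <string>

struct BaseObjectSketch {                      // plays the role of CObject
    virtual ~BaseObjectSketch() {}
    virtual std::string Serialize() const { return "object"; }
};

struct DocumentSketch : BaseObjectSketch {
    std::string Serialize() const override { return "document"; }
};

struct WindowSketch : BaseObjectSketch {
    std::string Serialize() const override { return "window"; }
};

// Two non-virtual paths to BaseObjectSketch, so its members are ambiguous here.
struct ReportSketch : DocumentSketch, WindowSketch {
    std::string SaveBoth() const
    {
        // An unqualified Serialize() would not compile; qualify each call.
        return DocumentSketch::Serialize() + "+" + WindowSketch::Serialize();
    }
};
```

A caller outside the class needs the same qualification, for example
r.DocumentSketch::Serialize() for a ReportSketch object r.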

For advice on the uses and abuses of MI, a good reference is
"Advanced C++ Programming Styles and Idioms" by James O. Coplien
(Addison-Wesley, 1992).