Archiving the united Germany: II

     

    Unification and E-records: The example of East Germany’s "Kaderdatenspeicher"

     


     

     

     

    by Dr Michael Wettengel

    Archivist,

    German Federal Archives (Bundesarchiv)

     

     (see end notes)

    Abstract

    After German unification, many former East German government agencies and institutions were closed down.  Archivists had to secure not only their paper records, but also a considerable number of machine-readable data holdings.  Very often, however, the documentation of these electronic records proved to be incomplete or even totally missing.  In those cases, like the "Kaderdatenspeicher", the database of files on party functionaries, different approaches were taken to identify and verify data file structures and to reconstruct missing documentation. 

     




     

    1.  Introduction

    The process that led to German unification was rapid and spectacular.  As nobody could foresee the dynamics of change and the sudden collapse of the former German Democratic Republic (GDR, East Germany) that caused the unification of the two German states, procedures to handle the various problems of this period of transition had to be improvised. 

     

    German archivists were confronted with a situation without precedent as well.  After 45 years of separation and different institutional traditions, the former East German Central State Archives were merged with the West German Federal Archives in October 1990. 

     

    At the same time as this reorganization took place, archivists had to face considerable challenges.  When, suddenly, former East German government agencies and institutions were closed down, not only their paper records, but also a considerable number of machine-readable data holdings had to be secured or rescued from possible destruction.  Whereas paper records were treated with professional routine, concepts and procedures for the acquisition, appraisal, description, and management of machine-readable records were lacking.

     

    The new situation helped to bring about a change in German archivists’ attitudes towards electronic records.  Whereas previously, little attention was paid to machine-readable material, the need to take care of large quantities of East German data files revealed the necessity of a stronger commitment in that field. 

     

    The Federal Archives decided to establish a section for machine-readable archives, which became responsible for electronic records from former East German central agencies and institutions as well as from federal government offices.  Furthermore, this section was charged with advising these federal offices on information management issues.  The section was set up in August 1991 but not provided with staff and basic technical equipment until summer 1993.  By then, much precious time had already been lost. 

     

    The experience of securing East German data files showed that the creating organizations were not the best custodians of machine-readable archives.  Many data files were no longer legible, and in most cases the data documentation was incomplete or missing altogether.  Federal offices cared for these electronic records only in so far as they could use them for their own purposes.  However, these experiences also showed that in a world where state and society are in constant transition, it makes sense to have archivists engaged in electronic records management and taking records of permanent value into their custody.

     

    2.  Conditions of acquisition


    In the former GDR, machine-readable data holdings had been processed by centralized mainframe systems in big data processing centres that belonged to the State and received their commissions from government agencies and party institutions.  In most cases, they were even institutionally affiliated with one or another of these agencies.  Data processing centres throughout the East German territory performed tasks and carried out orders from central government agencies. 

     

    Office automation systems had been unknown in East Germany, and the first applications for PCs with relatively small hard disks were not introduced in East German government offices until the second half of the 1980s, shortly before the collapse of the East German state.  Generally speaking, the GDR had yet to begin the introduction of decentralized desktop personal computers and local server networks. 

     

    With the coming of formal unification in October 1990, East German state agencies and institutions that were not taken over by federal offices or one of the newly established federal states ("Länder") were either privatised or dissolved.  The same happened to many data processing centres throughout the territory of the former GDR.  Therefore, archivists who tried to take over electronic records were confronted with varying situations, depending on what happened to the respective data centres after unification. 

     

    Archivists had the easiest time working with data processing centres that were still in operation and were now run by a federal government agency or a Land.  In such instances, sufficiently documented data holdings could be acquired, and it was easy to obtain information from operators and programmers. 

     

    Very often, however, data processing centres were in operation for only a short time before they were closed.  In these cases, a process of decay in operation and organization was already underway while the various centres were still in existence.  Specialists from these centres tried to find new jobs elsewhere and took with them both knowledge and the relevant manuals and data documentation, which they regarded as their personal property.  Typically, only the data carriers were left to the archivists.

     

    The situation was better in those cases where the data processing centre was closed down immediately and the doors were locked.  Archivists had to enter sealed rooms, where they were confronted by huge piles of paper records, printouts, manuals, card indices, floppy disks, tapes, hard disk plates, and punch cards.  But as data processing centres in the former GDR were required to create and maintain sufficient documentation on every project in at least three different copies, chances were good to find enough context information along with the data files. 

     

    The situation was much worse in those data processing centres that had been privatised after unification.  These newly established private companies considered data holdings, which had been processed for government agencies before 1990, to be part of their business capital.  They did not refrain from selling former East German government data files.  Even where a company acknowledged that these data files were now federal property, it nevertheless charged a tremendous fee for the alleged preservation of the data. 


     

    As can be seen from these different examples, much depended on whether there was a federal or state agency that took care of East German data files.  In the case of the statistical data holdings of the former GDR, these records have been secured by the Federal Office for Statistics (Statistisches Bundesamt). 

     

    The former East German Central State Administration for Statistics (Staatliche Zentralverwaltung für Statistik), which created these records, became a branch of the Federal Office for Statistics, whereas the former Data Processing Centre for Statistics (Datenverarbeitungszentrum Statistik) continued operation until the end of 1992 under the Common Office for Statistics of the New Länder (Gemeinsames Statistisches Amt der neuen Bundesländer).  By the end of 1991, the Federal Archives and the Federal Office for Statistics had agreed on formal co-operation in order to secure East German statistical data files. 

     

    Even if conditions for acquisition were good, as in the example of the statistical records, this did not mean that archivists could easily take over the files.  Thus, for instance, legal obstacles had to be overcome.  The Commissioner charged with the oversight and implementation of German privacy legislation (Bundesbeauftragter für den Datenschutz) demanded that all personal identifiers in East German data files should be deleted.  In addition, the Federal Office for Statistics claimed that statistical secrecy prevented the transfer of statistical data files with single items of data to the Federal Archives. 

     

    Despite these various problems, the Federal Archives have been successful in acquiring East German data holdings without alterations of the data in most cases.  Machine-readable records in the fields of statistics, economy, agriculture, education, penal registration, and labour have been taken over.  The Federal Archives Law, which was amended in 1990, provided the legal claim to take over East German records.  With the help of staff of the former East German Central Archives in the newly established GDR-division of the Federal Archives, appraisals and acquisitions of these records began.  The GDR archivists provided much necessary information for the description of the data files, information that proved to be very important if the original documentation was missing.

     

    3.  Media, record structures and codes

    Data processing systems in the territory of the former GDR proved to be not entirely different from those in the Western world.  East German computer centres possessed mainframe systems for the processing of large data compilations, as had been common in Western countries about twenty years earlier.  East German data holdings usually had hierarchical file structures that were not very complicated.  The hardware and software used by East German data processing centres were copies of and variations on Western models, naturally with different names.  For instance, the so-called ESER mainframe systems in East Germany were copies of IBM mainframes.  These facts, of course, greatly facilitated the work of archivists. 

     

    As storage media, primarily 9-track tapes were used.  Many of them had a density of only 800 bpi.  Owing to production problems, these tapes bearing the East German trade marks ORWO or PYRAL proved to be in very poor condition.  Glue and abrasion residue had to be removed from the tapes before they could be read.  Sometimes, layers of the tape separated after the first reading because of insufficient binder.  In order to secure the data, the tapes had to be copied as soon as possible.  Although blocks or even whole tapes could often no longer be read physically, there generally existed at least one backup copy.  Therefore, data losses could be compensated for in many cases.  Magnetic hard disk plates had also been used as a storage medium.  As a result of their uneven surface, those plates sometimes damaged the reading heads.  Programs and job files were usually stored on tapes, on punch cards, and on 5.25-inch or 8-inch floppy disks.  The physical state of the data files depended on when the information was stored on the tapes and on the storage conditions in the stack area.  If these conditions were inappropriate, up to 40% of the tapes could no longer be physically read after five years. 

     

    The labelling of the tapes followed the IBM scheme, with hardly any variation.  As in Western IBM mainframe applications, EBCDIC was used as the character code.  The Russian code DKOI (in the former GDR also called “ESER Code”), whose name translates as Binary Code for Information Interchange, could also be found in East German data files.  DKOI is very similar to EBCDIC; it is basically an extension of EBCDIC with a few variations and some extra symbols:

     

    Hexadecimal    4A      4F      5A      5B      6A      A1      C0      D0      E0
    DKOI           [       !       ]       O       |       ¾       {       }       \
    EBCDIC         ‘       |       !       $       (none)  (none)  (none)  (none)  (none)
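
    The differences listed in the table can be applied as a small override table on top of an ordinary EBCDIC decoder.  The following is only a minimal sketch under stated assumptions: Python's cp037 codec stands in for “EBCDIC” (the variant actually used in the files is not documented here), only a subset of the positions from the table is overridden, the function name decode_dkoi is hypothetical, and the additional DKOI symbols outside the table are not handled.

```python
# Assumption: the cp037 codec stands in for "EBCDIC"; the real files may use another variant.
# DKOI overrides copied from a subset of the table above (hex position -> DKOI symbol).
DKOI_OVERRIDES = {
    0x4A: "[",
    0x4F: "!",
    0x5A: "]",
    0x6A: "|",
    0xC0: "{",
    0xD0: "}",
    0xE0: "\\",
}


def decode_dkoi(raw: bytes) -> str:
    """Decode bytes as EBCDIC (cp037), replacing the positions where DKOI differs."""
    chars = []
    for byte in raw:
        if byte in DKOI_OVERRIDES:
            chars.append(DKOI_OVERRIDES[byte])
        else:
            chars.append(bytes([byte]).decode("cp037"))
    return "".join(chars)


# 0xC1 is 'A' and 0xF1 is '1' in EBCDIC; 0x4A and 0x5A take their DKOI meaning.
sample = bytes([0xC1, 0x4A, 0xF1, 0x5A])
print(decode_dkoi(sample))
```

    Keeping the overrides in a separate table makes it easy to decode the same tape once as plain EBCDIC and once as DKOI and to compare the results.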

     

     

    Binary-coded numerical values, often alternating with fields in EBCDIC representation, were also typical features of East German data files.  The frequent use of data compression techniques posed a particular problem for archivists.  The record length was generally variable, a characteristic common to many Western IBM mainframe applications as well.  However, the data fields in East German records were usually not separated by delimiters. 
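
    How the variable-length records were delimited is not described here.  As a hedged illustration only, the sketch below assumes the convention known from IBM mainframe variable-length record formats, in which every record is preceded by a four-byte descriptor whose first two bytes give the record length (descriptor included) as a big-endian binary number.  The function name split_variable_records is hypothetical, and whether a given East German file follows this convention would have to be checked against the data.

```python
import struct


def split_variable_records(block: bytes) -> list:
    """Split a buffer into records, assuming IBM-style record descriptor words.

    Assumption: each record starts with a 4-byte descriptor whose first two
    bytes hold the big-endian record length (descriptor included).
    """
    records = []
    pos = 0
    while pos + 4 <= len(block):
        (length,) = struct.unpack_from(">H", block, pos)
        if length < 4 or pos + length > len(block):
            raise ValueError(f"implausible record length {length} at offset {pos}")
        records.append(block[pos + 4 : pos + length])  # payload without descriptor
        pos += length
    return records


# Two made-up records: a 3-byte payload and a 1-byte payload.
block = b"\x00\x07\x00\x00\xC1\xC2\xC3" + b"\x00\x05\x00\x00\xF1"
print(split_variable_records(block))
```

    Because the fields inside each record carry no delimiters, splitting the file into records is only the first step; the fields themselves can be located only with the help of a record layout.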

     

    East German holdings had been collected and processed for very specific purposes in the fields of statistics, social and economic policy planning, personnel management, distribution of goods, labour employment, and workforce distribution.  Large data collections of statistical files, goods and production files, and personnel files had been processed with the help of Assembler or PL/1 programs, which are highly dependent on the mainframe environment of the data processing centres.  Due to their sequential, hierarchical file structures, these machine-readable records were archived as “flat files”, that is to say, as mere sequential bit strings. 

     

    4.  Reconstructing documentation: The Kaderdatenspeicher

    In order to understand the content of East German data files, it was of great importance to obtain complete documentation.  Archivists were not only looking for program and data file documentation in a limited sense, but also for the relevant context information on the “history” and the various purposes of the data file.  As a minimum requirement, the Federal Archives expected to receive the data file structure, the number of data sets, the data values, complete codebooks, compression algorithms, and a list to identify the content of each tape.  In spite of this general rule, it was decided in rare instances to take over data files because of their informational value, although not even this basic information could be obtained. 

     


    One of these data files, the so-called database of party functionaries or “Kaderdatenspeicher”, may serve as an example.  The Kaderdatenspeicher contains personal data on 331,980 staff members (in 1989) of all former East German government agencies, excluding those of the Ministry of State Security (Ministerium für Staatssicherheit), the Ministry for National Defence (Ministerium für Nationale Verteidigung), and the Ministry of the Interior (Ministerium des Innern).  These files not only provide insight into the political and professional career of officials, but also contain information on their parents. 

     

    There were several copies of the Kaderdatenspeicher, of which the only one that still exists is the one acquired by the Federal Archives.  In at least one case, there is sufficient evidence that a copy of the Kaderdatenspeicher was deliberately deleted shortly before German unification in order to protect cadre members.  The considerable value of this holding provided an incentive for the Federal Archives to invest quite heavily in the reconstruction of its documentation. 

     

    After the tapes of the Kaderdatenspeicher had been copied, the volume labels, the headers, and the first blocks of data of each file were printed out.  The volume labels and headers followed the IBM scheme, so they were easy to interpret.  From these data, information on the content of each tape and an initial idea of the different generations and applications of the Kaderdatenspeicher could be obtained. 
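
    A first pass over such labels can be automated.  The sketch below is an illustration, not the procedure used at the Federal Archives: it assumes 80-byte labels in EBCDIC (decoded here with Python's cp037 codec) and the field positions of IBM standard tape labels, with the volume serial in characters 5 to 10 of a VOL1 label and the data set identifier in characters 5 to 21 of an HDR1 label.  The function name and the sample values KDS001 and KADER.G1988 are invented.

```python
LABEL_LENGTH = 80  # assumption: labels are 80-character records, as in IBM standard labels


def describe_label(raw: bytes) -> dict:
    """Identify an 80-byte EBCDIC tape label and pull out a few fields."""
    if len(raw) != LABEL_LENGTH:
        raise ValueError("expected an 80-byte label record")
    text = raw.decode("cp037")
    info = {"label": text[:4]}
    if info["label"] == "VOL1":
        # Assumed layout: volume serial number in characters 5-10.
        info["volume_serial"] = text[4:10].strip()
    elif info["label"] in ("HDR1", "EOF1", "EOV1"):
        # Assumed layout: data set identifier in characters 5-21.
        info["dataset_id"] = text[4:21].strip()
    return info


# Invented sample labels, padded with blanks to 80 characters.
volume_label = "VOL1KDS001".ljust(LABEL_LENGTH).encode("cp037")
header_label = "HDR1KADER.G1988".ljust(LABEL_LENGTH).encode("cp037")
print(describe_label(volume_label))
print(describe_label(header_label))
```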

     

    However, one typical problem already became apparent at this early stage: In the few lines of the volume label and headers, three different ways were used to express the date:

     

    1. Day, month, year (ddmmyy) in EBCDIC (e.g.  180388 means March 18, 1988);

    2. Number of the year and number of the day in that year (yyddd), both in EBCDIC (e.g.  88168 means the 168th day of 1988 = June 16, 1988);

    3. Day, month, year (dmy), each written as a single character: the day as 1 to 9 and then A to V (A = 10, ... V = 31), the month as 1 to 9 and then A, B, C (A = 10, B = 11, C = 12), and the year as its last digit within the decade, 0 to 9 (e.g.  V18 means the 31st day of the first month of year 8 of the 1980s = January 31, 1988).
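
    The three notations can be decoded mechanically once the characters have been converted from EBCDIC to text.  The following sketch reproduces the three examples given above.  The function names are hypothetical, and the century (1900) and decade (1980) are assumptions that the labels themselves do not record.

```python
from datetime import date, timedelta

DAY_CODES = "123456789ABCDEFGHIJKLMNOPQRSTUV"  # 31 symbols: 1-9, then A-V
MONTH_CODES = "123456789ABC"                   # 12 symbols: 1-9, then A-C


def parse_ddmmyy(text: str, century: int = 1900) -> date:
    """Day, month, year with two digits each, e.g. '180388'."""
    return date(century + int(text[4:6]), int(text[2:4]), int(text[0:2]))


def parse_yyddd(text: str, century: int = 1900) -> date:
    """Two-digit year followed by the day number within that year, e.g. '88168'."""
    year = century + int(text[0:2])
    return date(year, 1, 1) + timedelta(days=int(text[2:5]) - 1)


def parse_dmy(text: str, decade: int = 1980) -> date:
    """One character each for day, month, and year within the decade, e.g. 'V18'."""
    day = DAY_CODES.index(text[0]) + 1
    month = MONTH_CODES.index(text[1]) + 1
    return date(decade + int(text[2]), month, day)


print(parse_ddmmyy("180388"))
print(parse_yyddd("88168"))
print(parse_dmy("V18"))
```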

     

    Of course, there are many more possibilities for expressing dates, especially considering the different ways of “packing” dates and numbers. 

     

    There is, for instance, a very common method of storing in only two bytes any date from the 20th century:

     

    Nine bits for the number of days in the year (0 to 511) and 7 bits for the number of years (0 to 127), starting with 1900.  This way of expressing the date again leaves two options, starting either with the days or years.

     

     

    Example

                Byte 1       Byte 2
    either      yyyy yyyd    dddd dddd
    or          dddd dddd    dyyy yyyy
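
    Unpacking such a two-byte date is a matter of shifting and masking.  The sketch below covers both orderings and assumes that the two bytes are stored most significant byte first and that the day number is counted from 1; neither assumption is stated in the documentation, and the function name unpack_date is hypothetical.

```python
from datetime import date, timedelta


def unpack_date(raw: bytes, year_first: bool = True) -> date:
    """Decode a 16-bit date: 7 bits for years since 1900, 9 bits for the day of the year."""
    if len(raw) != 2:
        raise ValueError("expected exactly two bytes")
    value = int.from_bytes(raw, "big")  # assumption: most significant byte first
    if year_first:
        year_offset, day_of_year = value >> 9, value & 0x1FF
    else:
        day_of_year, year_offset = value >> 7, value & 0x7F
    if day_of_year == 0:
        raise ValueError("day number 0 is not valid under the 1-based assumption")
    return date(1900 + year_offset, 1, 1) + timedelta(days=day_of_year - 1)


# Year 88 and day 168 packed in both orders.
print(unpack_date(bytes([0xB0, 0xA8]), year_first=True))
print(unpack_date(bytes([0x54, 0x58]), year_first=False))
```

    When the ordering is unknown, decoding a field both ways and checking which result yields plausible dates across many records is usually enough to settle the question.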

     

     

     

     

     

    There is also the possibility of expressing a date by counting one unit for every day (or some other interval) that has elapsed since a system-dependent fixed date.  These so-called “timer ticks” are extremely difficult to decipher if the fixed date is not known.  In East German data files, many different possibilities were used to express dates or numbers. 
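
    When the fixed date is unknown, one pragmatic approach is to try a list of candidate starting dates and keep those that produce a date within the period in which the file is known to have been created.  The sketch below illustrates the idea; the candidate dates, the time window, and the function name are invented for the example and do not come from the East German files.

```python
from datetime import date, timedelta


def plausible_epochs(ticks, candidate_epochs, earliest, latest, days_per_tick=1):
    """Return (epoch, decoded date) pairs whose decoded date falls in the window."""
    matches = []
    for epoch in candidate_epochs:
        decoded = epoch + timedelta(days=ticks * days_per_tick)
        if earliest <= decoded <= latest:
            matches.append((epoch, decoded))
    return matches


# Hypothetical counter value tested against two guessed starting dates.
candidates = [date(1900, 1, 1), date(1970, 1, 1)]
for epoch, decoded in plausible_epochs(6600, candidates, date(1980, 1, 1), date(1989, 12, 31)):
    print(epoch, "->", decoded)
```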

     

    The data sets of the Kaderdatenspeicher showed that only the full name, the Personal Identification Number (Personenkennziffer or PKZ), the address, and the agency were written in plain EBCDIC.  The Personal Identification Number was a unique number given to every citizen of the former GDR at birth.  By this number, every East German citizen could be identified.  East Germans carried this number with them in all official records throughout different life situations, be it professional career or imprisonment.  This Personal Identification Number was also the key to a flourishing exchange of personal data between different East German data processing centres, uninhibited by privacy legislation. 

     

     

    Structure of the Personal Identification Number: “ddmmyy s cccc x”

    ddmmyy    Date of birth, with two digits each for day, month, and year.

    s         Sex and century of birth:
              “2” = male born before 1900,
              “3” = female born before 1900,
              “4” = male born after 1900,
              “5” = female born after 1900.

    cccc      Location code: the place of residence for individuals born before 1970, the place of birth for individuals born after 1970.

    x         System control digit.
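
    The structure described above can be taken apart with a few lines of code.  The following is a sketch only: the sample number is invented, “born before 1900” is assumed to mean the 1800s, and the control digit is extracted but not verified because the checking rule is not given here.  The function name parse_pkz is hypothetical.

```python
from datetime import date

# Value of "s" -> (sex, assumed century of birth)
SEX_AND_CENTURY = {
    "2": ("male", 1800),
    "3": ("female", 1800),
    "4": ("male", 1900),
    "5": ("female", 1900),
}


def parse_pkz(pkz: str) -> dict:
    """Split a Personal Identification Number of the form 'ddmmyy s cccc x'."""
    digits = pkz.replace(" ", "")
    if len(digits) != 12 or not digits.isdigit():
        raise ValueError("expected 12 digits in the form ddmmyy s cccc x")
    sex, century = SEX_AND_CENTURY[digits[6]]
    birth = date(century + int(digits[4:6]), int(digits[2:4]), int(digits[0:2]))
    return {
        "birth_date": birth,
        "sex": sex,
        "location_code": digits[7:11],
        "control_digit": digits[11],  # extracted only, not verified
    }


print(parse_pkz("180357 4 1234 5"))  # invented number, for illustration only
```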

     

     

    All the other data fields were coded by numerical values, represented as binary figures.  The record length of the Kaderdatenspeicher is variable.  Binary codes and packing methods had been quite common in East German data files, and the methods used often varied.  Fortunately, no further compression algorithms had been used in the case of the Kaderdatenspeicher.  The Kaderdatenspeicher had been processed with the help of Assembler programs.

     

    It became clear that without a precise description of the data file structure, there was no way to understand the meaning of the data.  Therefore, as much information on the Kaderdatenspeicher as possible was needed.  The orders and commissions to create and process the Kaderdatenspeicher came from the Council of Ministers (Ministerrat der DDR).  The vertical files of this office had been added to the collections of the Federal Archives in Potsdam after unification.  After searching these holdings for references to the Kaderdatenspeicher-project, a series of records that contains descriptions of the Kaderdatenspeicher and reports from the data processing centre with a lot of substantial information could be found. 

     

    These paper records provided information on the content, purpose, history, and development of the Kaderdatenspeicher project, in particular:

     

    • Who planned the Kaderdatenspeicher and who gave the orders,
    • Which agencies co-operated,
    • What were the different aims and purposes of the Kaderdatenspeicher,
    • What information was contained in the Kaderdatenspeicher,
    • How information was collected,
    • Which versions and updates of the Kaderdatenspeicher existed and which computer centres processed and stored them,
    • Who had access to which portions of the information contained in the Kaderdatenspeicher, and
    • How information was used. 

     

    The reports to the Council of Ministers also contained information on the data file structure and codebooks.  The Kaderdatenspeicher consists of annual compilations, so-called “generations” of data files for the year 1980 and for each year from 1985 to 1989 as well as of extracts for various purposes.  Almost all of these data files have at least a slightly different structure.  Nevertheless, the data file structures of all generations of the Kaderdatenspeicher could be found.  Much information could be inferred from so-called “address tables” (Adressentabellen), which represent the record layout of a specific file. 
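
    An address table of this kind translates directly into a machine-readable record layout.  The sketch below shows the principle with an invented layout; the field names, offsets, and lengths are hypothetical and are not taken from the Kaderdatenspeicher.  Text fields are assumed to be EBCDIC (decoded with Python's cp037 codec) and binary fields to be unsigned big-endian integers.

```python
# Hypothetical layout: (field name, offset, length, kind). Not taken from the real files.
LAYOUT = [
    ("name", 0, 10, "text"),
    ("pkz", 10, 12, "text"),
    ("agency_code", 22, 2, "binary"),
]


def apply_layout(record: bytes, layout) -> dict:
    """Cut a record into fields at fixed offsets, since the data carry no delimiters."""
    fields = {}
    for name, offset, length, kind in layout:
        chunk = record[offset : offset + length]
        if len(chunk) < length:
            raise ValueError(f"record too short for field {name!r}")
        if kind == "text":
            fields[name] = chunk.decode("cp037").rstrip()
        else:
            fields[name] = int.from_bytes(chunk, "big")
    return fields


# Invented record: two EBCDIC text fields followed by a two-byte binary code.
record = "MUSTERMANN".encode("cp037") + "180357412345".encode("cp037") + (513).to_bytes(2, "big")
print(apply_layout(record, LAYOUT))
```

    Since almost every generation has a slightly different structure, one such layout would be needed per generation.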

     

    In some instances, the content of data fields could also be inferred from the forms used to collect the data, of which specimens were found in the records.  Of course, comparing the items on the forms with the content of the data fields was only possible if the data items were not expressed in binary figures. 

     

    The data flow between East German data processing centres mentioned above proved to be another source of information in the effort to reconstruct lost documentation.  This exchange of large quantities of coded data could only operate on the basis of shared codebooks.  In fact, the codes used in the big East German person-related data holdings were relatively stable and were often the same.  Diagrams comparing the codes of different data holdings could be found in the records.  What was meant to be a tool to facilitate data exchange is now a guide for archivists to find out which codes of data fields in different data holdings are the same. 
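
    Once two codebooks have been transcribed, the comparison that these diagrams made by hand can be repeated automatically.  The sketch below uses two invented codebooks; the codes and meanings are hypothetical, as is the function name shared_codes.

```python
def shared_codes(codebook_a: dict, codebook_b: dict) -> dict:
    """Return the codes that carry the same meaning in both codebooks."""
    return {
        code: meaning
        for code, meaning in codebook_a.items()
        if codebook_b.get(code) == meaning
    }


# Invented example codebooks for a "highest qualification" field.
kader = {1: "university degree", 2: "technical college", 3: "skilled worker"}
staff = {1: "university degree", 2: "technical college", 3: "master craftsman"}
print(shared_codes(kader, staff))
```

    Codes that agree in a well-documented holding and in a poorly documented one are good candidates for carrying documentation from the former over to the latter.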

     

    The data files of the Kaderdatenspeicher were closely linked with the so-called staff databases (Arbeitskräftedatenspeicher) of ministries and separate government branches, which contained personal data on their staff members.  All the data of the Kaderdatenspeicher were originally collected from these staff databases.  The Federal Archives has been successful in acquiring relatively comprehensive and complete documentation of the staff database of the Ministry of Public Education (Ministerium für Volksbildung).  Therefore, additional information on the record layout and the data file structure of the Kaderdatenspeicher could be derived from the documentation of the staff database of this ministry. 

     

    However, many questions remained open.  Even if the structure of a record and the address, length, and content of a specific field are known, the encoding of that field may still not be understood.  To take the simple example from above, there are many ways to express a date and the one used may not be known.  In these cases, specific software is used to analyse sequential files. 
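
    The simplest tool of this kind is a dump that shows the raw bytes in hexadecimal next to their EBCDIC interpretation, so that plain-text fields and binary fields can be told apart at a glance.  The following sketch is an illustration and not the software used by the Federal Archives; it assumes Python's cp037 codec for the text column, and the function name ebcdic_dump is hypothetical.

```python
def ebcdic_dump(data: bytes, width: int = 16) -> str:
    """Format bytes as offset, hexadecimal values, and EBCDIC text side by side."""
    hex_width = width * 3 - 1  # two hex digits per byte plus separating blanks
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset : offset + width]
        hex_part = " ".join(f"{byte:02X}" for byte in chunk)
        text = chunk.decode("cp037")
        # Show a dot for anything that is not a printable ASCII character.
        printable = "".join(ch if ch.isascii() and ch.isprintable() else "." for ch in text)
        lines.append(f"{offset:06X}  {hex_part.ljust(hex_width)}  {printable}")
    return "\n".join(lines)


# Three EBCDIC letters followed by two binary bytes.
print(ebcdic_dump(b"\xC1\xC2\xC3\x00\x01", width=8))
```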

     

    In order to obtain background information, archivists have also made contacts with former employees of East German data processing centres who had created or worked with the data holdings that were acquired.  In rare and difficult cases, for instance, when compression algorithms were used which could not be deciphered, programmers from former East German data processing centres were even hired as consultants. 

     

     

    5.  Access for researchers

     

    As has been pointed out, different approaches had to be taken in order to identify and verify data file structures and to reconstruct documentation:

     

    • Analysis of labels and data,
    • Searching for documentation in the corresponding vertical files,
    • Studying the original data flow in order to identify shared code books and similar file structures,
    • Obtaining information from former employees, and
    • Last but not least: Using specific software. 

     

    In this way, much of the missing documentation of East German data holdings could be reconstructed.  However, although a number of fairly well-documented data files can already be presented for research purposes, most East German data holdings still remain a problem because of the specific hardware-background in which they were created.  Since the main goal of reconstructing documentation is to facilitate access to the data, additional efforts are necessary.

     

    For long-term preservation, East German data files are stored as flat files.  Apart from this “archival copy”, the Federal Archives are planning to create “research copies” in specific formats that are better suited for research purposes and easier to handle.  These “research copies” are not meant for archival preservation.  The Federal Archives has made an agreement with the Centre for Historical Social Research (Zentrum für Historische Sozialforschung) and the Data Archives for Social Research (Zentralarchiv für empirische Sozialforschung) in order to use the expertise and the technical facilities of these institutions to create research files of East German machine-readable records.  The aim of this co-operation is to promote historical social research on the former GDR. 

     

    Taking over East German data holdings has certainly been an extreme experience from which it is difficult to generalise.  However, some of the attitudes and procedures in East German computer centres are probably universal.  For instance, it seems that people working with computers love to play around with programs and data but are not particularly fond of documenting what they are doing.  A lot of what is important for future archivists and researchers of data holdings will always be in private notebooks or in the memories of system administrators and records creators.  However, preserving these archival holdings means ensuring their accessibility in the future, and reconstructing documentation may be one of the keys to it.

     

     

     


     

     


     

     

    Paper first published in: History and Electronic Artefacts, Edward Higgs (ed.), Oxford University Press, 1998, pp.  265-276, and originally presented to the Annual Meeting of the Society of American Archivists, Washington, D.C., Sept.  2, 1995. 

     


     

    For further reading on German archives and records management, see

    Disposition and Archiving of Authentic Electronic Records in the new Germany's "Information Network Berlin-Bonn",

      by Drs Andreas Engel and Michael Wettengel.

    Development and Traditions of Records Management and Archives in Germany,

      by Dr Nils Brübach.

     


     

    The Author

    Michael Wettengel (born in 1957) is an historian.  Since 1989, he has worked as an archivist at the Federal Archives (Bundesarchiv), Germany.  In 1991, he became the head of the newly established machine-readable archives section of the Federal Archives, and since 1999 he has been responsible for government records and electronic records management.  He also gives professional training courses for archivists on electronic records at the Federal Archives and at the Archives Institute (Archivschule) in Marburg. 


     

    He is a member of the board of Quantum (Association for Quantification and Methods in Historical and Social Research).  He participates in projects, committees, and working groups concerning electronic records management and information technology in public administration, such as the pilot project on Document Management and Electronic Archiving (DOMEA) in the Federal Ministry of the Interior (Bundesministerium des Innern, BMI).  He is a member of the Committee on Electronic and Other Current Records of the International Council on Archives (ICA), of the DLM (Données lisibles par machine - Electronic Records) Monitoring Committee of the European Commission in Brussels, and of Sub-committee 11 “Records Management” of ISO/TC 46.  He chairs the sub-committee NABD/AA 15 on records management of the German Institute for Standardisation (DIN). 

     

     


