Parliament’s botched digitisation may mean millions of precious documents were lost in the fire
Many scanned documents are impossible to read
- Valuable records in Parliament’s library are likely to have been damaged or destroyed in the fire earlier this month.
- A project about five years ago was supposed to create a digital store of Parliament’s archive.
- But quality-control samples suggest that nearly half the pages were not scanned properly, and there are troubling questions about how the project was managed, especially by Parliament itself.
A botched digitisation project has probably condemned irreplaceable documents to extinction following the fire in Parliament.
Over the course of two years, an outside company was paid millions of rands to scan Parliament’s collection of about 7,000 volumes of material, or some seven million individual pages.
Much of this collection was unique to Parliament — no other store of archival material in the country has a copy. Of particular importance are the annexures to the Hansard — the official record of Parliament’s deliberations going back to 1910. The records include unpublished government reports, annual reports, research, and manuscripts.
Parliament staff cannot yet access the buildings where these records are kept. While the library and stores at the National Council of Provinces are understood to be safe, a 4 January report to the Parliament Library warns that the entire collection in the stores at the National Assembly was affected by the fire, and may be lost. These stores contain South Africa’s entire pre-1994 parliamentary records.
The basement in the National Assembly that houses the archives is believed to have suffered grave water damage during the fire. At one point the water was chest-high, according to a report. This would be disastrous for the documents housed there, most of which are decades old.
The 4 January report offers some mitigation: these records have been digitised. But this is not accurate.
Almost half of the scans are semi-legible or even completely unreadable. This failure was identified by Parliament’s library staff after the records were scanned in 2017. The service provider returned to Parliament to rescan the botched records - but abruptly stopped this work long before it was completed.
Although millions of pages needed to be scanned again in order to be legible, the project was listed as finished in official records.
In Parliament’s Annual Report for 2017/18, listed among the key achievements that year was the digitisation of “approximately seven million pages of rare and fragile library material which included books, Hansard, artwork, microfiche, photographs and maps”.
The annual report was presented to the Joint Standing Committee on Financial Management of Parliament (Joint Committee) on 31 October 2018 by Acting Secretary to Parliament, Baby Tyawa.
The legacy report of the fifth term of Parliament repeats this claim, saying that the Enhanced Library Service Project had digitised this huge archive and that the project had been completed in 2017.
But the monthly reports filed in 2017 by the Rare & Historical Information Services (RHIS), the division of the Parliamentary Information Centre that deals specifically with the huge archives at Parliament, show a different picture.
In January 2017, the librarian in charge of RHIS at the time, Ingrid Henrici, reported that 6,092 volumes had been scanned by that stage, or 88% of the total archive. The scanning was being done by a service provider called i-Kno, at a cost of just under R5 million according to Parliament’s spokesperson.
But the quality control that was done on the project suggested that the project was far from complete. Of the 1,746 scanned volumes that had been checked by the RHIS library staff, 44% had to be rejected. The Annexures to the Hansard and manuscripts were a particular problem. Three staff members had been assigned to the project.
At this stage, according to the delivery schedule in the report, the digitisation project was already two months overdue.
In the next report for February 2017, the situation is worse. “Delivery of re-scanned items did not take place as projected,” writes Henrici. Quality checks revealed more deficient scans, with the error rate rising to just under 50%. Of the 2,014 scanned volumes that had been checked, 1,001 had to be rejected and redone.
In March 2017, it was reported that a seventh hard drive used to store the digitised records had crashed, “a very concerning issue”, wrote Henrici. More worrying was that the quality control of the scans was “suspended until delivery of re-scanned volumes takes place”. The report states that the service providers had made no further deliveries of scanned materials, nor had they rescanned any new items.
The April report does not mention the digitisation project, and in June the only mention is a note that quality checks on the scans were “continuing slowly”.
In July, a farce: “An attempt was made by i-Kno to access Parliament to complete the outstanding rescans and pack up the equipment and remove it from Parliament. This was unsuccessful due to their vetting having expired and their problem in getting revetted due to outstanding tax documents.”
In its last available report, from September 2017, RHIS gives up. “Quality checking of previously digitised resources from the project has currently come to an end. QC will be put on hold as there are no further developments with regards to getting the project completed.”
These reports were sent to Albert Ntunja, the Chief Librarian of the South African Parliamentary Library at the time. From here, Ntunja should have reported on the progress of this project to the Divisional Manager of Knowledge and Information Services. In 2016, Neil Nel retired from this role, and his position was taken over by Dr Leon Gabriel.
In turn, Nel and Gabriel should have reported to the Deputy Secretary to Parliament: Core Business, who should have reported to the Secretary to Parliament.
Gengezi Mgidlana was the Secretary to Parliament until June 2017, when he was suspended on suspicion of maladministration and abuse of power. Baby Tyawa was appointed as Acting Secretary to Parliament after Mgidlana’s suspension, and remains in the role.
Someone in this chain of responsibility reported that this vast multi-million rand digitisation project had been satisfactorily completed, and this claim was confirmed in various official reports, and transmitted to the Joint Committee.
How was this failure swept under the rug? Why was it not identified as fruitless and wasteful expenditure in Parliament’s annual reports?
On 25 October 2017, Tyawa, Acting Secretary to Parliament, presented Parliament’s 2016/17 annual report to the Joint Committee, and announced that in that reporting period “irregular expenditure had decreased by 84%, from R15 million in 2016 to R2.4 million in 2017” while fruitless and wasteful expenditure had “increased by 29%, from R830,000 in 2016 to R1 million in 2017”.
The digitisation project was not counted among these expenditure items. Nor was it listed among the failed performance targets that year. In fact, the project was not mentioned at all in this performance report.
When Mgidlana presented Parliament’s 2015/16 annual report to the Joint Committee on 21 October 2016, he announced that of the fruitless and wasteful expenditure during the financial year, R257,000 was from “backlogs of digitisation of library books due to unavailability of Parliament staff”. Mgidlana’s presentation explained that the service provider (i-Kno) could not easily access Parliament or receive barcodes from Parliament staff during the period that staff were on strike. The annual report itself declares that the project under which the digitisation fell was mostly on track.
In the financial performance review of Parliament for 2016/17, the incomplete digitisation is not accounted for. Instead the R257,000 in wasted expenditure from the 2016 strike is rolled over.
The digitisation of the archives was designated as a sub-project in a larger effort (the so-called “Enhance parliamentary information centre services” project). Once the records were successfully digitised, they were to be entered into a database that could be accessed by a proprietary system called uVimba (Parliament’s central document management system). According to a mid-year report presented to the Joint Committee, uVimba was operational by mid-2018 - even if the digitised documents that could be accessed through uVimba were desperately deficient.
GroundUp has found no mention of the digitisation effort or the problems recounted in the RHIS reports in any documentation sent to the Joint Committee by Parliament officials between 2016 and 2018.
In response to our questions, Gerrit van Dyk of i-Kno and Smarter Image said that i-Kno rescanned most of the items that were noted to be of poor quality and that the items were provided on hard drives to Parliament for further review. He said that the scanned images were large due to the quality that Parliament required. Parliament did not have sufficient space on its servers for the images at the time, and so all the images were stored on hard drives supplied by i-Kno.
“The main reasons for the ‘defective’ scans can also be found in the project documentation at Parliament. More than 80% of scans were successful and i-Kno did rescan those defective documents, as far as possible.”
In response to our questions, Parliament confirmed that the project ended in September 2017, as established in the RHIS report. That means that no further scanning or re-scanning was done by i-Kno after that date. Van Dyk of i-Kno likewise confirmed that to his knowledge, Parliament did not complete the checking of the rescans. (Read Van Dyk’s full response.)
But the volumes that we have seen were not rescanned. They are filled with defective pages.
According to Parliament: “Rescanned materials returned to Parliament were not rechecked and a final error rate was not determined. Library statistics confirmed that 95.35% of digitised materials were delivered to Parliament.”
But it is not clear how the 95% delivery figure can be reconciled with the RHIS reports. The last report that provides information on total job completion says that 89% of all scans had been completed, excluding rescans. Quality control in the subsequent months slowed to a trickle and then stopped completely. Any digitisation from then on was done on demand by library staff.
Parliament concedes that the job was not done. “The Close-out report [to i-Kno’s contract] acknowledged that approximately 4.65% of the work was not delivered to Parliament.” As a result, according to Parliament, the 20% retainer, R1.1 million, was withheld. (Read Parliament’s full response.)
So while the contract was closed out, the work was not completed - and yet it was marked as complete in both the Annual Report and the Legacy Report.
What is possibly lost?
Parliamentary staff cannot yet access the library collection stored beneath the National Assembly. It is believed that the following archival materials are likely to have been permanently damaged:
- The Annexures: all of the House of Assembly Reports covering the period from 1910-1980
- Commission Reports, including from Departmental Committees
- Hansard Debates
- Senate Debates
- Select Committee Reports from 1910-1992
- Legal Deposit Books
- Selected works of South African History and Geography - general collection items housed in the journals store were moved to the National Assembly store due to renovations to the National Council of Provinces and library
- Parliamentary Correspondence – Imperial Blue Books
- Statistical registers
- Bills 1910-1990
- Announcements, Tablings and Committee Reports
- House of Lords, Australia, New Zealand and Canada Hansard Debates
- Minutes of proceedings
- Question Papers
Almost all of UCT’s Hansard collection from 1920 was lost in last year’s fire. The National Library has copies and will provide digital scans on demand, but these will lack the annexures and many other documents held only in Parliament.
Parliament’s full response to our questions
QUESTION: Might you be able to confirm the price of the contract given to I-Kno as R16 million?
ANSWER: The total value of the contract was R4 863 475.44 (Excl. VAT). The Project was closed in September 2017. The Close-out report acknowledged that approximately 4,65% of the work was not delivered to Parliament. It was thus agreed to withhold the 20% retainer (R1,108,872.40 Incl. VAT) in the allocated budget of the project, which was thus not paid to I-Kno.
QUESTION: Can you confirm that Mr Ntunja recorded this project as being complete?
ANSWER: Yes, the project was closed in September 2017, however scanners were purchased as part of the overall Library Upgrade Project and internal staff were trained on digitization including quality assurance. This provided an opportunity for the work to be absorbed to line function and be part of the ongoing work of the Library.
QUESTION: Are you able to confirm precisely how much of the digitised archive has been quality checked and is free of error?
ANSWER: During the running of the project, the management of the Library identified the need to implement a second-tier quality assurance process by internal library staff, to ensure the quality of the digitized content received from I-Kno. It was agreed that a 30% random sample of materials scanned would undergo this 2nd level quality check with an acceptable error rate of 10%. Scanned materials that were assessed with an error rate of above the 10% threshold were returned to the service provider for re-scanning. It should be noted that the library staff was very stringent on the 10% error, using absolute numbers up to 2 decimal places (e.g. if an acceptable error was 27,96 and the actual error was 28, this was deemed unacceptable and returned to the service provider). The error rate during the operations of the project (i.e. the ‘working’ error through quality assurance) was therefore exaggerated.
Various quality shortfalls/errors were correctly revealed and highlighted as part of the quality assurance process and solutions devised to address these as part of the project implementation and internal management controls. Re-scanned materials returned to Parliament were not re-checked and a final error rate was not determined. Library statistics confirmed that 95,35% of digitized materials were delivered to Parliament.
QUESTION: Were the problems in the digitised archives reported to either of you?
ANSWER: The operations of the project were managed within the line function of the Library as per Parliament’s Human resources policies.
A Control Librarian was assigned responsibility for quality assurance and reported to the Chief Librarian. Matters that required specific intervention were escalated to the Division Manager:KISD. Internal operational reports and management reports were submitted monthly. These are intended for trouble-shooting, mitigation/corrective action and performance improvement.
QUESTION: Can you confirm that the archival collections are currently inaccessible and have possibly been destroyed by the fire or water?
ANSWER: Currently the area destroyed by fire is inaccessible as it is still a crime scene and investigation is still taking place. Therefore Parliament cannot at this stage confirm if the archival collections have been destroyed by fire or water.
Parliament issued an ill-tempered response to our article. Their main concern appears to be that we didn't include all their answers to all our questions inside the article.
In fact, as far as we can tell, we included all the substantive aspects of Parliament's reply that did not contradict Parliament's own reports or make no sense. We also included a clearly marked link to a PDF file with Parliament's full response (which remains in the text). This is standard media practice.
Nevertheless, we have updated the article by placing all our questions and all Parliament's responses in a gray box at the bottom of the article.
© 2022 GroundUp. This article is licensed under a Creative Commons Attribution-NoDerivatives 4.0 International License.
You may republish this article, so long as you credit the authors and GroundUp, and do not change the text. Please include a link back to the original article.