Design of the SKA Central Signal Processing (CSP) Element
Who are we?
The “Central Signal Processor” (CSP) Consortium comprises 13 signatories from 8 countries, with more than 10 additional participating organisations. The Consortium includes a rich mixture of engineers, scientists and managers from academic institutions, industry and government labs spread over 5 continents (see https://www.skatelescope.org/csp/ for more details). As might be expected, it has been a challenge to proceed efficiently with such a diverse and distributed team.
The lead organisation of the Consortium is the National Research Council of Canada (NRC). NRC has contracted MDA Systems Ltd. (MDA) to assist in leading the Consortium.
What are we designing?
The CSP Element includes design of the hardware and associated firmware/software necessary for the generation of visibilities, pulsar survey candidates and pulsar timing data from the telescope arrays. More background on the CSP can be found in the previous eNews submissions: http://newsletter.skatelescope.org/category/newsletter-archive/
Current Status of Design Activities
Since the last eNews submission in December, the CSP team has completed another round of costing, updated the ICDs, and progressed the requirements to support the System PDR and pave the way to CDR. In addition, the team has supported the Cost Control Project initiatives by providing suggestions for cost savings and evaluating the impacts of various scenarios.
The sub-element design teams have continued to progress with detailed design and prototyping, focussing on the most promising architectures and technologies. There has also been much activity on the system engineering side (requirements, ICDs, modeling, processes, standards, ILS/RAMs) with contributions from New Zealand Alliance, STFC, CSIRO, ASTRON, Oxford, AUT, University of Manchester, NRC, Swinburne, and MDA. There are still challenges in finalising the Level 1, 2, and 3 requirements that are needed for efficient progression to CDR. It is unclear how the Cost Control Project will impact the path to CDR.
Key Sub-element Design Development
Local Monitoring and Control (LMC)
The CSP Local Monitoring and Control (LMC) Sub-element is responsible for coordinating all the CSP processing functions according to commands from the Telescope Manager (TM), returning status rolled up from the various processing sub-elements, and configuring and sequencing the sub-elements. This sub-element is being led by NRC with assistance from NCRA and INAF. The CSP LMC team is actively supporting SKAO-led initiatives to define SKA standards and guidelines for implementation of the monitor and control system and the SKA software engineering process. The CSP LMC team is leading the effort on the definition of states, modes, commands and configuration, and is contributing to the definition of the design patterns for generation and handling of logs and alarms. Significant progress has been made on the definition of interfaces. The INAF team is developing a prototype based on the current version of the SKA Control Systems Guidelines.
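As a rough illustration of what "status rolled up from the various processing sub-elements" could mean in practice, the following minimal sketch reports the most severe state seen across sub-elements. The state names, their ordering and the `roll_up` function are assumptions made for this example only; they are not taken from the SKA Control Systems Guidelines or from the CSP LMC design.

```python
from enum import IntEnum


class Health(IntEnum):
    """Hypothetical health states, ordered from least to most severe."""
    OK = 0
    DEGRADED = 1
    FAILED = 2
    UNKNOWN = 3  # assumption: no information is treated as the worst case


def roll_up(reports):
    """Return one element-level state from per-sub-element reports.

    `reports` maps a sub-element name to its Health value. The rolled-up
    state is simply the most severe state reported. An empty mapping
    yields UNKNOWN, since nothing can be said about the element.
    """
    if not reports:
        return Health.UNKNOWN
    return max(reports.values())


# Example: one degraded sub-element degrades the element-level status.
example = {
    "Low.CBF": Health.OK,
    "PSS": Health.DEGRADED,
    "PST": Health.OK,
}
element_state = roll_up(example)
```

A "worst state wins" rule is only one possible policy; a real design might weight sub-elements differently or distinguish states by observing mode.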
LOW Correlator and Beamformer (Low.CBF)
Low.CBF held two face-to-face workshops during the past quarter: the Convergence Workshop and the Firmware Costing Workshop. The Convergence Workshop was held from 7-14 December 2016 in Penticton, Canada. This workshop focussed mainly on architectural convergence: what does the hardware look like and how does it connect. There was much discussion on the hardware specifications (LRU, cooling, subracks), on what will be prototyped, and on the updates to be made to the Prototyping Plan for TIM#5 submission. With the pre-CDR milestone looming in the next quarter, it was an opportunity for the team to review the status of our detailed design, with the goal of assigning sections to different team members, drafting initial content or updating existing content, and reviewing with the wider team. We paid a visit to our Mid.CBF colleagues at DRAO and participated in a fun tour of their facilities in snowy -13 degree conditions (Figure 1).
A second workshop specifically on Firmware was held at ASTRON, our Dutch collaborators’ site in Dwingeloo. Together the team wrote up the specifications of all 68 firmware modules in the system and estimated the time required to design, test and integrate each of them into the system. It was an interesting exercise to see each other’s strengths and weaknesses but in general the estimate variations were relatively small. The team submitted a new costing for the sub-element in which we reduced costs by about 25%. The week was finished off by participating in a sub-element cost review with CSP leads before the Australian team members had to rush off to catch their plane back home.
Early this year the Perentie Gemini Proof of Concept board also arrived and the testing of the high-speed optics and FPGA power consumption is well underway (Figure 2).
In the coming quarter the Perentie team is focussed on putting together the detailed design document as well as continuing hardware, mechanical, firmware and software prototyping activities required to support the sub-element CDR. Systems engineers keep abreast of ongoing requirements reviews, ICD updates and operational document deliverables. Management attends the weekly PMO telecons and forums to ensure that the Low.CBF team stays on track and meets the deadlines for its contribution to the CSP element CDR.
MID Correlator and Beamformer (Mid.CBF)
The Mid.CBF Sub-element is led by NRC and is based on a Stratix 10 FPGA solution. This is a joint effort with MDA, NZ Alliance, UPM Spain, and Selex ES (now Leonardo). The Mid.CBF team has spent the past quarter focusing on progressing CDR deliverables and prototyping activities on the following fronts:
- Firmware design to progress high risk areas of firmware development
  - Excellent progress made on implementation of X-Part correlator and PSS beamformer.
- Monitor and Control Software running on embedded processors
  - TANGO successfully running on Arria 10 embedded processor
  - Significant progress made on Linux underlying drivers for M&C of FPGA IP blocks and FPGA configuration
  - TANGO devices and GUIs under development to complete M&C software stack.
- Hardware design of TALON-SX LRU (Stratix 10 FPGA board)
  - Schematic and component placement complete
  - Board routing in progress
- Mechanical/thermal modelling and design work
  - Many scenarios evaluated for liquid cooling using HFE7000 or water as facility liquid.
The Cost Control Project being undertaken by SKAO prompted the Mid.CBF team to develop an unsolicited proposal to significantly reduce the cost of Mid.CBF without impacting any key science capabilities of the Mid telescope. The “Frequency Slice Approach” design presents a significant reduction in the complexity of firmware and the amount of hardware required and results in cutting the cost of Mid.CBF nearly in half. This is achieved by reducing some of the commensal observation requirements and some of the sub-array flexibility. This new approach was discussed with key SKAO stakeholders at a meeting in Manchester in early March. Support was given to proceed with an ECP, which is currently in progress.
Work has been progressing rapidly on how to adapt the hardware platform under development for the “Frequency Slice Approach” and the system design is converging with a focus on cost and risk reduction. Areas currently under investigation include:
- Use of DDR4 instead of on-FPGA High Bandwidth Memory in order to use FPGAs that are currently available.
- The reduced hardware required for the “Frequency Slice Approach” may allow the power density to be reduced to the point where air cooling is a feasible solution.
- Packaging of FPGA boards to mitigate EMI/EMC risks.
These investigations will conclude in the next month and the system design will converge so that the Mid.CBF team can move full speed ahead to the upcoming sub-element CDR.
Pulsar Search Engine (PSS)
The Pulsar Search Engine is a large sub-element of the CSP used to search for pulsars and fast transients. It will have almost identical instances for both SKA-mid and low. The design team is led by the University of Manchester, University of Oxford and the Max Planck Institute for Radio Astronomy, supported by input from INAF Italy, NZ Alliance, ATC Edinburgh, and ASTRON.
In the last few months we have been concentrating on preparation of the pre-CDR documentation pack. Alongside that we have been supporting the Cost Control Project, and we have agreed with the SKAO a new baseline for our design which supports the processing of three PSS beams per compute node, instead of the previous two. This has been possible because of the detailed algorithm and prototyping work that we have been doing, which has allowed us to improve the processing efficiency and data throughput. This will also result in reduced cost and power consumption.
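To make the effect of moving from two to three beams per compute node concrete, the short sketch below counts the nodes needed for a given number of beams. The beam count of 600 is a purely illustrative assumption and is not an SKA figure; the function name is also hypothetical.

```python
def nodes_required(total_beams, beams_per_node):
    """Number of compute nodes needed, rounding up to whole nodes."""
    if total_beams < 0 or beams_per_node <= 0:
        raise ValueError("beam counts must be non-negative and nodes positive")
    # Integer ceiling division avoids floating-point rounding.
    return -(-total_beams // beams_per_node)


ILLUSTRATIVE_BEAMS = 600  # hypothetical value chosen for the example only

old_nodes = nodes_required(ILLUSTRATIVE_BEAMS, 2)  # previous baseline
new_nodes = nodes_required(ILLUSTRATIVE_BEAMS, 3)  # new baseline
saving = 1 - new_nodes / old_nodes                 # fraction of nodes saved
```

Whatever the actual beam count, going from two to three beams per node cuts the node count by about one third, which is where the cost and power savings would come from if per-node cost and power stay roughly the same.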
Work is also continuing apace with the prototype, protoNIP. Tenders were let for both the servers with GPUs and the FPGA boards and suppliers were chosen in early March. The hardware is currently under construction and will be delivered soon. This will be very good timing as it will provide further direct input from a reasonable scale version of the PSS hardware to support our design work and for improved power measurements. It has also provided invaluable information on the tendering, procurement, delivery, and soon installation, aspects of our design.
Work has also continued on the development of our test vector machine. A second server has been purchased which will be used to simulate the distribution of the data in real time, while the other servers and protoNIP-like nodes will be used to capture and process it. The hardware has all been delivered, and once the pre-CDR deadline is out of the way we will be setting up the system and using it for further testing of the algorithms and designs.
Pulsar Timing Engine (PST)
The Pulsar Timing Sub-element (PST) will perform high-fidelity, high-precision timing observations of known pulsars for both Low and Mid telescopes. The primary computational task performed by this instrument is phase-coherent dispersion removal, which requires performing many large Fast Fourier Transform operations in real time. The PST design is based on COTS hardware with GPU accelerators, and an early version of this solution is currently being commissioned at the MeerKAT telescope. Since our last update in December, Adam Deller joined the PST design team and will help to coordinate our efforts at Swinburne. Adam is the originator of the DiFX software correlator that has been widely adopted by the VLBI community. In addition to extensive software development experience, he brings leading expertise in pulsar astrometry to our team. In March we applied to Round 2 of the Australian SKA Pre-construction Grants Program to continue supporting our work toward Critical Design Review.
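For readers unfamiliar with phase-coherent dispersion removal, the following is a minimal NumPy sketch of the general technique: the voltage data are Fourier transformed, multiplied by the inverse of the interstellar "chirp" phase, and transformed back. It is not the PST implementation. The function names, the complex-sampled baseband assumption and the sign convention of the phase are all assumptions made for this example, and a real-time system would process overlapping blocks rather than a single circular FFT as done here.

```python
import numpy as np

# Dispersion constant in s MHz^2 pc^-1 cm^3 (commonly used value).
K_DM = 4.148808e3


def dispersion_chirp(n, f0_mhz, bw_mhz, dm):
    """Unit-modulus frequency response modelling dispersion.

    n       number of complex samples in the block
    f0_mhz  centre frequency of the band in MHz
    bw_mhz  bandwidth in MHz (equal to the complex sample rate)
    dm      dispersion measure in pc cm^-3
    """
    # Frequency offset of each FFT bin from the band centre, in MHz.
    f = np.fft.fftfreq(n, d=1.0 / bw_mhz)
    # Factor 1e6 converts the s*MHz product into a dimensionless phase.
    phase = 2.0 * np.pi * K_DM * 1e6 * dm * f**2 / (f0_mhz**2 * (f0_mhz + f))
    return np.exp(1j * phase)


def apply_chirp(voltages, f0_mhz, bw_mhz, dm, remove=True):
    """Remove (or, for testing, apply) dispersion on one block of data."""
    h = dispersion_chirp(len(voltages), f0_mhz, bw_mhz, dm)
    spectrum = np.fft.fft(voltages)
    # Dedispersion multiplies by the conjugate (inverse) of the chirp.
    spectrum = spectrum * (np.conj(h) if remove else h)
    return np.fft.ifft(spectrum)


# Round trip on synthetic noise: disperse, then coherently dedisperse.
rng = np.random.default_rng(0)
original = rng.standard_normal(1024) + 1j * rng.standard_normal(1024)
smeared = apply_chirp(original, 1400.0, 16.0, 10.0, remove=False)
restored = apply_chirp(smeared, 1400.0, 16.0, 10.0, remove=True)
```

Because the filter only changes phases, it preserves the signal power in every frequency bin, and the cost is dominated by the forward and inverse FFTs, which is why the task maps well onto GPU accelerators.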
Path to CDR
Overall, the CSP Consortium has made good progress since December. The focus is to “freeze” the requirements and ICDs to support efficient progression to sub-element CDRs followed by element level CDR. This is very dependent on the outcome of the Cost Control Project. Nevertheless, the team is focussed and working hard to meet the deadlines.