Design of the SKA Central Signal Processing (CSP) Element
Who are we?
The “Central Signal Processor” (CSP) Consortium comprises 13 signatories from 8 countries, with more than 10 additional participating organisations. The Consortium includes a rich mixture of engineers, scientists and managers from academic institutions, industry and government labs spread over 5 continents (see https://www.skatelescope.org/csp/ for more details). As might be expected, working efficiently with such a diverse and distributed team has been a challenge.
The lead organisation of the Consortium is the National Research Council of Canada (NRC). NRC has contracted MDA Systems Ltd. (MDA) to assist in leading the Consortium. Sean Dougherty of NRC has now taken over as Consortium Leader from David Loop, who has retired. We are very thankful to David Loop for his leadership over the years and wish him well in his retirement.
What are we designing?
The CSP Element includes the design of the hardware and associated firmware/software needed to generate visibilities, search for new pulsar candidates, and produce pulsar timing data from the telescope arrays. More background on the CSP can be found in previous eNews submissions: http://newsletter.skatelescope.org/category/pdf-version-of-enews/
Current Status of Design Activities
Since the last eNews submission in April, the CSP Team has completed another round of costing (in May) and has been updating the ICDs and progressing the requirements to pave the way to CDR. The team has also supported the Cost Control Project initiatives by evaluating the impacts of various scenarios. At present, the CSP Team has been directed to proceed with the “Frequency Slice Approach” for the Mid Correlator and Beamformer and the 3 beams/node architecture for the Pulsar Search Engine. The Engineering Meeting in Rotterdam in June was well attended by the CSP Consortium, and many good discussions were held.
The sub-element design teams have continued to progress with detailed design and prototyping as they approach their CDRs. There has been continued activity on the system engineering side (requirements, ICDs, modelling, processes, standards, ILS/RAMs). There are still challenges in finalising the Level 1, 2 and 3 requirements needed for efficient progression to CDR.
Key Sub-element Design Development
Local Monitoring and Control (LMC)
The CSP Local Monitoring and Control (LMC) Sub-element is responsible for coordinating all the CSP processing functions according to commands from the Telescope Manager (TM), returning status rolled up from the various processing sub-elements, and configuring and sequencing the sub-elements. This sub-element is being led by NRC with assistance from NCRA and INAF. The CSP LMC team is actively supporting SKAO-led initiatives to define SKA standards and guidelines for the implementation of the monitor and control system and for the SKA software engineering process. The CSP LMC team is leading the effort to define states, modes, commands and configuration, and is contributing to the definition of design patterns for generating and handling logs and alarms. Significant progress has been made on the definition of interfaces. The INAF team is developing a prototype based on the current version of the SKA Control Systems Guidelines.
LOW Correlator and Beamformer (Low.CBF)
A large number of Perentie team members attended the SKA Engineering 2017 meeting in Rotterdam, including Grant Hampson, Andre Gunst, Peter Baillie, Agnes Mika, John Bunton, Yuqing Chen, Koos Kegel, Gijs Schoonderbeek and Leon Hiemstra. We met our new CSP Consortium leader from NRC Canada, Sean Dougherty. There were many parallel workshops on infrastructure, power, supporting data networks, signal processing and requirements, as well as the TT-low meeting. The last day was CSP Consortium day: another 8 hours of presentations. Everyone reached the overload point with presentations! During the Perentie presentation, our Dutch colleagues showed the great progress they are making on the hardware. With the pre-CDR milestone looming in the next quarter, it was a good opportunity for the Perentie team to review the status of our detailed design document and other deliverables.
ASTRON have organised the manufacture of the Gemini LRU and backplane PCBs and the accompanying assembly. PCB assembly is complete and testing has just begun. The PCB is fitted with an FPGA engineering sample; however, a production Xilinx Virtex UltraScale+ FPGA has been ordered for assembly on a second board (this is not the final HBM version of the FPGA, but it allows power measurements to be made). The FCI Mid-Board Optics will be demonstrated at 25 Gbps, along with the DDR4 memory module operating at 1866 MT/s. Liquid cooling on the backplane and LRU has been demonstrated (not the complete LRU loop), with no leaks so far.
The Monitoring and Control Environment (MACE) design documentation has been completed to the point where the team is keen to start the implementation phase. The team, consisting of Andrew Brown, Leon Hiemstra, John Matthews, Eric Kooistra and Mia Baquiran, has started the firmware and software prototyping, which will provide good supporting information for the CDR submission as well as critical infrastructure for the Gemini LRU testing. These activities will be accelerated by Leon Hiemstra and Koos Kegel (both from ASTRON) spending two months at CSIRO. In the coming quarter, the team will continue to work on documentation, models, cooling, test specifications and many other items that contribute to the CDR deliverables.
MID Correlator and Beamformer (Mid.CBF)
The Mid.CBF Sub-element is led by NRC and is based on a Stratix 10 FPGA solution. This is a joint effort with MDA, NZ Alliance, and UPM Spain.
The Cost Control Project being undertaken by SKAO prompted the Mid.CBF team to develop an unsolicited proposal to significantly reduce the cost of Mid.CBF without impacting any key science capabilities of the SKA-mid telescope. The “Frequency Slice Approach” design significantly reduces the complexity of the firmware and the amount of hardware required, cutting the cost of Mid.CBF nearly in half. This is achieved by relaxing some of the commensal observation requirements and reducing some of the sub-array flexibility. The band-specific processing is done within the Very Coarse Channelizers, which operate on frequency “slices”; the band-invariant processing is done within the Frequency Slice Processors. This is now the baseline design for Mid.CBF, and an ECP is being processed to include it at the system level.
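To make the band-specific/band-invariant split concrete, the following is a minimal Python sketch, not the Mid.CBF design. It assumes complex baseband input and uses a plain, critically sampled FFT filterbank (no window, no oversampling) as a stand-in for the Very Coarse Channelizer; the function names `very_coarse_channelize`, `frequency_slice_processor` and `correlate` are hypothetical. The structural point is that once the band has been cut into slices, every slice is handed to the same band-invariant step, here a simple two-antenna cross-power average.

```python
import numpy as np

def very_coarse_channelize(samples, n_slices):
    """Hypothetical VCC stand-in: split complex baseband samples into
    n_slices frequency slices with a plain FFT filterbank.
    Returns an array of shape (n_slices, len(samples) // n_slices).
    """
    n_frames = len(samples) // n_slices
    frames = samples[: n_frames * n_slices].reshape(n_frames, n_slices)
    # One FFT per frame yields one output sample for every slice.
    return np.fft.fft(frames, axis=1).T

def frequency_slice_processor(slice_a, slice_b):
    """Hypothetical band-invariant step: average cross-power (one
    visibility) of the same frequency slice from two antennas."""
    return np.mean(slice_a * np.conj(slice_b))

def correlate(samples_a, samples_b, n_slices):
    """Channelize both antennas, then apply the identical FSP step to every slice."""
    a = very_coarse_channelize(samples_a, n_slices)
    b = very_coarse_channelize(samples_b, n_slices)
    return np.array([frequency_slice_processor(a[k], b[k]) for k in range(n_slices)])

# Usage: a tone centred in slice 3 of 8; antenna B sees it with a 0.3 rad phase offset.
n = np.arange(8 * 1024)
tone_a = np.exp(2j * np.pi * 3 * n / 8)
tone_b = tone_a * np.exp(1j * 0.3)
vis = correlate(tone_a, tone_b, n_slices=8)
print(np.argmax(np.abs(vis)), np.angle(vis[3]))  # slice 3, phase about -0.3 rad
```

A real channelizer would use a windowed (typically oversampled) polyphase filterbank to control leakage between slices; the FFT-only version keeps the sketch short.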
The Mid.CBF team has spent the past quarter focusing on progressing CDR deliverables and prototyping activities on the following fronts:
- Firmware design to progress high-risk areas of firmware development
  - Excellent progress made on implementation of the X-Part correlator and PSS beamformer
  - Resource usage and target clock rates are being met
- Monitor and Control software running on embedded processors
  - TANGO successfully running on the Arria 10 embedded processor
  - Further progress made on the underlying Linux drivers for M&C of FPGA IP blocks and FPGA configuration
  - TANGO devices and GUIs under development to complete the M&C software stack
- Hardware design of the TALON-DX Processing Board
  - The TALON-DX Processing Board contains the following key components:
    - Intel Stratix 10 SX210 System-on-Chip FPGA
    - 4 x DDR4 DIMMs @ rates up to 2666 MT/s
    - 5 x FCI LEAP mid-board optical modules (12 x 25G bidirectional SERDES links each)
    - 2 x QSFP28 cages for external interfaces
  - Schematic and component placement complete
  - Board routing in progress
  - Estimated delivery of first prototype in September 2017
- Mechanical/thermal modelling and design work
  - Custom 2U 19” rack-mount server box containing:
    - 2 x TALON-DX processing boards
    - 1 x simple power distribution/isolation module
    - 1+1 redundant COTS ATX power supply
    - 4 x hot-swappable fans (the LRU can operate with only 3)
  - Detailed modelling of air cooling, which is now the baseline design
  - Detailed design of the TALON LRU packaging
The Mid.CBF Team is now moving full speed ahead to the upcoming sub-element CDR.
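As a rough sense of scale, the TALON-DX component list above implies the following raw peak figures per board. This is back-of-envelope arithmetic, not a published board specification: it assumes a standard 64-bit (8-byte) DIMM data bus and ignores ECC bits, line encoding and protocol overheads.

$$
\begin{aligned}
B_{\text{optics}} &= 5 \times 12 \times 25\ \text{Gb/s} = 1500\ \text{Gb/s} = 1.5\ \text{Tb/s per direction} \\
B_{\text{DDR4}} &= 4 \times 2666 \times 10^{6}\ \text{T/s} \times 8\ \text{B/T} \approx 85.3\ \text{GB/s}
\end{aligned}
$$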
Pulsar Search Engine (PSS)
The Pulsar Search Engine is a large CSP sub-element, used to search for pulsars and fast transients, that will have almost identical instances for SKA-mid and SKA-low. The design team is led by the University of Manchester, the University of Oxford and the Max Planck Institute for Radio Astronomy, supported by input from INAF Italy, NZ Alliance, ATC Edinburgh and ASTRON.
In the last few months, we have been addressing requests arising from the Cost Control Project. Most importantly, thanks to the successful development of our most demanding accelerated algorithms (de-dispersion and pulsar acceleration searches), we are now in a position to proceed to CDR with a verified design that will meet the pulsar and fast transient search requirements with two-thirds of the original number of processing nodes. The resulting savings in cost, power, cooling and space are extremely beneficial to the project.
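For readers unfamiliar with de-dispersion, here is a minimal brute-force (incoherent) sketch in Python. It shows only the textbook form of the technique, not the PSS team's accelerated implementation, and the function names are hypothetical. It uses the standard cold-plasma delay, Δt = 4.148808 × 10^3 s × DM × (f^-2 - f_ref^-2) with frequencies in MHz and DM in pc cm^-3, shifts each filterbank channel by its delay relative to the top of the band, and sums the channels. A real search repeats this for many trial DMs, which is where the computational cost comes from.

```python
import numpy as np

# Dispersion constant in MHz^2 s per (pc cm^-3).
K_DM_MHZ = 4.148808e3

def dispersion_delay_s(dm, f_mhz, f_ref_mhz):
    """Cold-plasma dispersion delay (s) at f_mhz relative to f_ref_mhz."""
    return K_DM_MHZ * dm * (f_mhz**-2.0 - f_ref_mhz**-2.0)

def dedisperse(filterbank, freqs_mhz, tsamp_s, dm):
    """Brute-force incoherent de-dispersion for one trial DM.

    filterbank: (n_chan, n_samp) power samples; freqs_mhz: channel centres.
    Each channel is advanced by its delay relative to the highest frequency,
    then all channels are summed into a single time series.
    """
    freqs_mhz = np.asarray(freqs_mhz, dtype=float)
    shifts = np.round(
        dispersion_delay_s(dm, freqs_mhz, freqs_mhz.max()) / tsamp_s
    ).astype(int)
    n_out = filterbank.shape[1] - shifts.max()
    series = np.zeros(n_out)
    for chan, shift in enumerate(shifts):
        series += filterbank[chan, shift:shift + n_out]
    return series

# Usage: inject a dispersed pulse that arrives at sample 200 at the top of the band.
freqs = np.linspace(1500.0, 1300.0, 64)
tsamp, dm = 1e-3, 100.0
fb = np.zeros((64, 1024))
delays = np.round(dispersion_delay_s(dm, freqs, freqs.max()) / tsamp).astype(int)
for chan, shift in enumerate(delays):
    fb[chan, 200 + shift] = 1.0
ts = dedisperse(fb, freqs, tsamp, dm)
print(ts.argmax(), ts.max())  # 200 64.0
```

At the correct DM all 64 channels line up in one sample; at a wrong trial DM the pulse stays smeared and the peak drops.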
In other exciting news, our prototype PSS cluster, protoNIP, is currently being deployed at the Karoo Array Processing Building. In total, 18 dual accelerator servers (GPU+FPGA) have been shipped to South Africa, and we are working in close collaboration with the South African SKA teams to set up protoNIP as a CDR prototyping facility. Watch this space, as we will post a full report soon.
We are putting a lot of emphasis on end-to-end pipelining of our software, to understand correctly how specific processing modules map onto accelerators and to eliminate unnecessary data-movement overheads. It is this process that gives us confidence in supporting a smaller PSS, as the timings of the processing modules are extremely encouraging.
Regarding our test vector machines and continuous integration process, we have now set up a Virtual Private Network that places our testing servers, which are physically located at our different institutes, on the same virtual network. This infrastructure will demonstrate how collaborative code development, continuous integration and careful unit testing allow us to develop high-quality code across an international collaboration.
Pulsar Timing Engine (PST)
The Pulsar Timing Sub-element (PST) will perform high-fidelity, high-precision timing observations of known pulsars for both Low and Mid telescopes. The primary task performed by this instrument is phase-coherent dispersion removal, a computationally intensive algorithm that requires performing many large Fast Fourier Transform operations in real time. The PST design is based on commodity off-the-shelf hardware with GPU accelerators, and an early version of this solution is currently being commissioned at the MeerKAT telescope. In May, the PST design team was awarded funding in Round 2 of the Australian SKA Pre-construction Grants Program. This funding will continue to support our work on the detailed design and supporting documentation to be submitted for Critical Design Review towards the end of 2017.
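Phase-coherent dispersion removal works on the raw voltages: the spectrum of each data block is multiplied by a "chirp" that inverts the interstellar medium's transfer function, then transformed back to the time domain. The sketch below is a minimal single-block Python illustration under stated assumptions, not the PST implementation: complex baseband data sampled at `bw` Hz and centred on sky frequency `f0`, the standard chirp form H(f0 + f) = exp[i 2π D DM f² / (f0² (f0 + f))] with D = 4.148808 × 10^15 Hz² s per pc cm^-3, and a sign convention that depends on the sideband and FFT definition and would need checking against real data. Function names are hypothetical.

```python
import numpy as np

# Dispersion constant D in Hz^2 s per (pc cm^-3), i.e. 4.148808e3 MHz^2 s.
D_HZ = 4.148808e15

def dedispersion_chirp(n, f0, bw, dm):
    """Chirp H(f0 + f) for an n-point FFT of complex baseband data.

    f0: sky frequency of the band centre (Hz); bw: complex sample rate,
    equal to the bandwidth (Hz); dm: dispersion measure (pc cm^-3).
    The sign convention is an assumption and depends on the sideband.
    """
    f = np.fft.fftfreq(n, d=1.0 / bw)  # offsets from f0, in FFT bin order
    phase = 2.0 * np.pi * D_HZ * dm * f**2 / (f0**2 * (f0 + f))
    return np.exp(1j * phase)

def coherent_dedisperse(block, f0, bw, dm):
    """Dedisperse one block: FFT, multiply by the chirp, inverse FFT.

    This is a circular convolution; a streaming pipeline would use
    overlap-save and discard the wrapped samples at each block edge.
    """
    h = dedispersion_chirp(len(block), f0, bw, dm)
    return np.fft.ifft(np.fft.fft(block) * h)

# Usage: simulate a dispersed impulse with the conjugate chirp, then undo it.
n, f0, bw, dm = 4096, 1.4e9, 1.0e6, 50.0
impulse = np.zeros(n, dtype=complex)
impulse[n // 2] = 1.0
dispersed = np.fft.ifft(np.fft.fft(impulse) * np.conj(dedispersion_chirp(n, f0, bw, dm)))
recovered = coherent_dedisperse(dispersed, f0, bw, dm)
print(np.abs(dispersed).max(), np.abs(recovered).argmax())
```

Because the chirp must span the full dispersion smear across the band, real observations at low frequency or high DM need very long FFTs, which is why this step dominates the PST's computational load.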
Path to CDR
Overall, the CSP Consortium has made good progress since April. The focus is to “freeze” the requirements and ICDs to support efficient progression to sub-element CDRs followed by element level CDR. This is very dependent on the outcome of the Cost Control Project and the soon to be released Level 1 Rev 11 requirements. The team is focused and working hard to prepare the required materials.