# Master Thesis Proposals

June 2021





Politecnico di Torino







### Outline

#### • Logic In Memory

- T.LiM.1 : Architectural Explorations with Dexima CAD
- T.LiM.2 : FPGA LiM Implementation
- T.LiM.3 : LiM Toolchain
- T.LiM.4 : Octantis
- T.LiM.5 : Octantis & The LiM Toolchain

#### Quantum Computing

- T.QC.1 : Emulation of Quantum Annealing on GPU
- T.QC.2 : Emulation of Quantum Gate Array On FPGAs
- T.QC.3 : Emulation of Quantum Gate Array On GPUs
- T.QC.4 : Compilation of Quantum Circuits
- T.QC.5 : Quantum High Level Synthesis

#### CAD Tools

- T.CAD.1 Hybrid CMOS Simulation
- T.CAD.2 To PoliNano Simulator
- Field Coupled Technologies
  - TMAG.1 Racetrack memory architecture







#### Logic-In-Memory





#### The memory bottleneck problem









### LiM Architectures



- **Parallel** computing capabilities to enhance performance
- In memory computing features, to minimize power consumption due to data movement and to increase performance
- Modularity to adapt processing capabilities depending on the application
- Flexibility to support for a wide range of data processing related operations





# LiM Design Environment

 To design and test LiM architectures an entire
Software Toolchain has been created



- Several components are part of it:
  - > Manual Design CAD (Dexima CAD)
  - High-Level Synthesizer, from C to a LiM architecture (Octantis)
  - > LiM Simulator (Dexima Simulator)
  - > Technology dependent Libraries of LiM cells



#### **T.LiM.1**: Architectural Explorations with Dexima CAD<sup>SKILLS:</sup>



# **T.LiM.1**: Architectural Explorations with Dexima CAD

What advantages can be obtained with a LiM architecture with respect to a classical CPU-Memory one?

- Realize different LiM structures at the architectural level with Dexima CAD
- Test their functionality with an embedded UVM testbench
- Improve Dexima CAD with new functionalities
- Estimate the performance of a LiM architecture with both DExIMA backend and Synopsys/Cadence
- **Compare** the results with a classical von Neumann architecture with Gem5





#### T.LiM.2 : FPGA LiM Implementation



Politecnico di Torino

STM32L496 microcontrollers



#### T.LiM.2 : FPGA LiM Implementation

Estimate the impact of the LiM paradigm on a real evaluation board.

- Realize different LiM structures at the architectural level. As a starting point, you will use the already existing LiM architectures developed by VLSI Lab.
- Test their functionality and download the code on the FPGA.
- **Estimate** the performance of a LiM architecture by performing measurements on the circuits.





#### T.LiM.3 : LiM Toolchain



### T.LiM.3 : LiM Toolchain

Realize a CMOS-Based library for LiM architectures, using the standard Digital Design flow

- Realize different LiM cells at the architectural level and test their functionality.
- **Design** the LiM cell at transistor-level and **simulate** it with Cadence Virtuoso
- Layout of the LiM cell with parasitic extractions
- Create a Liberty file of the LiM Cells for synthesis with Synopsys
- Performance estimations with Synopsys/Cadence





# T.LiM.4 : Octantis

- → Starting from an input C code, Octantis rearranges it for a direct implementation inside a Hybrid System provided by a CPU and a LiM Unit. In particular, an optimal LiM Unit is fully synthesized by the program.
- With the aim of making the program grow, add synthesis compatibility to new LiM architectures
- Test and prototype the accelerator derived from your work on the VirtLab 1.0 board

#### **SKILLS:**

- C++ programming
- HW design and testing
- FPGA prototyping

Interdisciplinary project (SW and HW)







#### **T.LiM.5** : Octantis & The LiM Toolchain

- → Starting from an input C code, Octantis rearranges it for a direct implementation inside a Hybrid System provided by a CPU and a LiM Unit. In particular, an optimal LiM Unit is fully synthesized by the program.
  - Expanding the actual Exploration Capabilities of the tool and promoting the integration within the whole LiM Toolchain
  - Add new modules for improving the collaboration among the tools
  - The mixed design process will be tested through advanced verification procedures

#### SKILLS:

- C++ programming
- Python programming
- HW design and testing
- UVM verification









#### Quantum Computing





#### Where are we?

Process size in nanometers







#### Analogies with classical computing

Quantum computing and classical computing share the concepts of:

- Algorithm as a sequence of transformations
- State of a register
  - Bit
  - Qbit
- Quantum computing uses a different computational paradigm from classical one: operations can be slower/faster in either model.
- At the end of the computation, some information on the state of the quantum register is obtained by means of a **special measurement** operation.





# Some Algorithms

Two most notable algorithms:

- **Shor's** algorithm for integer factorization in polynomial time used for cryptographic purposes [Shor, 1997]
- **Grover's** algorithm for black-box search in O(√n) time [Grover, 1996].
- **Simulation** of complex systems like molecules, economy...

Even if a classical computer can simulate a universal quantum computer, it may not be efficient: some tasks may be exponentially faster in one model of computation versus the other.





# **Quantum Computing**

Quantum Computing is a discipline of Information Theory related to the definition of computational routines based on a unit of information, named qubit, encoded onto a quantum physical quantity.

Every computation has three elements:

- **Data** = qubits
- **Operations** = quantum gates (unitary transformations)
- **Results** = measurements







#### The qubit



- A classical bit can take two different values (0 or 1).
- It is discrete.
- A qubit can "take" infinitely many different values.
- Qubits live in a Hilbert vector space with a basis of two elements that we denote |0> and |1>.
- A generic qubit is in a superposition

$$|\psi\rangle = \alpha |0\rangle + \beta |1\rangle$$





### How can we implement a qubit?

| Physical system     | Qubit state      |
|---------------------|------------------|
| Photon              | Polarization     |
| Electron or nucleus | Spin             |
| Trapped ion         | Electronic state |
| Molecule under NMR  | Collective spin  |
| Quantum dot         | Charge or spin   |
| Flux qubit          | Charge           |





### Quantum Computational Advantage

- **Superposition** permits to simultaneously evaluate multiple data
- Entanglement can assist in fast converging to the problem's solution.

They both permit to define, for some hard problems, algorithms with computational costs lower than the best corresponding classical ones.







#### Practical Quantum Circuit strategies

- **Oracular**: circuits thought for labeling a solution and amplifying the probability of measuring it (*e.g.* Grover's search)
- Variational: parameterized circuits exploring the whole solutions space, whose quantum gates angles are iteratively updated by a classical optimizer according to a cost function.



### **Quantum Annealing**

- Specific-purpose Quantum Computing for solving Quadratic Unconstrained Binary Optimization (QUBO) problems, whose cost functions are mapped onto spin Hamiltonians.
- A quantum annealer exploits **tunneling** for achieving the lowest energy spin configuration, corresponding to the optimal solution.
- Quantum annealers are employed for solving **routing or logistic problems** (*e.g.* <u>traffic flow</u>), in **materials engineering** (*e.g.* <u>magnetic properties</u>) and in <u>Machine Learning</u>.



# **Programming languages**

Different frameworks and programming languages:

- qasm
- Qiskit (IBM)
- Cirq (Google)
- Forest/pyqil (Rigetti)
- Q# (Microsoft)
- Ocean (D-Wave)





### Limitations of current hardware

- Building a quantum computer is extremely challenging in terms of:
  - Fabrication costs
  - Materials
  - Temperatures (some mK for superconducting qubits, some K for semiconductor qubits)
- Real qubits are always affected by non-ideality phenomena as decoherence, which can affect the reliability of the results of an algorithm on a given quantum computer.



### Emulators of quantum hardware

- Quantum computers fabricated by some big companies (IBM, Google, D-Wave, *etc.*) are already programmable *via-cloud*.
- Emulation of quantum computers on classical hardware (FPGA, GPU, etc.) could be a good approach for having an on-premises device behaving like an ideal quantum computer:
  - No decoherence
  - Full connectivity
  - Maintenance costs lower than real quantum computers



https://www.fujitsu.com/emeia/services/business-services/digital-annea ler/what-is-digital-annealer/



Pilch, J., Długopolski, J. An FPGA-based real quantum computer emulator. *J Comput Electron* **18**, 329–342 (2019). <u>https://doi.org/10.1007/s10825-018-1287-5</u>



### **T.QC.1**: Emulation of Quantum Annealing on GPU

Operating steps:

- Analysis of the state of the art algorithms
- Definition of potential improvements
- Software development:
  - GPU emulation
  - Interfaces with frameworks for QUBO/QUSO formulation
- Scheduling or Machine Learning use cases





Waidyasooriyaet al., <u>A GPU-Based Quantum Annealing Simulator for Fully-Conncted Ising Models Utilizion</u> <u>Spatial and Temporal Parallelism</u>, IEEE Access, 2020



C/C++ Programming

**GPU** programming

#### T.QC.2 : Emulation of Quantum Gate Array On FPGAs • Quantum computing • FPGA Design

#### Feasibility study:

Advantages & Disadvantages of Emulation of Variational Quantum Algorithms on Hybrid Computational Systems, composed by a uC and a FPGA accelerator.

#### **Operating steps:**

- Analysis of the state of the art
- Formulation of an effective VQA model for *Machine Learning*
- Implementation of the algorithm on a FPGA



Peruzzo, A., McClean, J., Shadbolt, P. *et al.* A variational eigenvalue solver on a photonic quantum processor. *Nat Commun* 5, 4213 (2014). <u>https://doi.org/10.1038/ncomms5213</u>





uC programming

#### **T.QC.3 : Emulation of Quantum Gate** Array On GPUs • Quantum computing

#### Feasibility study:

Advantages & Disadvantages of Emulation of Variational Quantum Algorithms on Hybrid Computational Systems, composed by a CPU and a GPU.

#### **Operating steps:**

- Analysis of the state of the art
- Formulation of an effective VQA model for *Image Processing*
- Implementation of the algorithm on a GPU



Heng *et al.*, <u>Exploiting GPU-based Parallelism for Quantum Computer</u> <u>Simulation</u>: A Survey, *IEIE Transactions on Smart Processing & Computing*, Vol.9 No.62020.12468 - 476



**GPU** Programming



### Quantum Circuit Design

- Real quantum computers have:
  - Specific native gates
  - Potential different decoherence times between qubits
  - Limited qubits connectivity
- Since quantum hardware shows a limited number of non-ideal qubits, the quantum circuit to be executed on a given backend must be critically chosen.
- In order to define a quantum circuit suitable for solving a given problem, simulators/emulators of ideal qubits can be employed without having access to real hardware.







#### T.QC.4 : Compilation of Quantum Circuits

A <u>quantum compiler</u> (substantially a **quantum circuit synthesizer**) could enhance the reliability of the circuit to be executed on a real quantum hardware.

- Quantum computing
- C/C++ programming
- Python programming

#### Operating steps:

- Study of compilation strategies in the <u>state of the art</u>, with a particular focus on <u>routing</u>.
- Analysis of a **compilation toolchain** under development in a thesis ending in July.
- Enhancement of the current compilation toolchain.
- Benchmarking and comparison with other compilers



#### T.QC.5 : Quantum High Level Synthesis

A challenging task of contemporary Quantum Information is defining a general methodology for automatic construction of quantum circuits from algorithms described classically.

- Quantum algorithms
- C/C++ programming
- Python programming
- Synthesis Tools

Operating steps:

- Understanding the main Quantum Computing routines (oracular, variational, etc.).
- Analysis of the circuit and compilation libraries under development at VLSI Lab.
- Analysis of High Level Synthesis strategies in Classical Computing.
- Definition of strategies for automatic generation of quantum circuits, according to a problem defined in a high-level language.
- Testing with practical use cases.







# CAD Tools for new emerging technology-based paradigms





# Why?

A change in the technology may imply a deep change in the development of the design tools. Terms of comparison with CMOS technology are needed.

 Physical simulators can perform very accurate simulations but they are extremely expensive in terms of computational costs. Indeed, it is not possible to use them to analyze large and complex circuits

• **High level simulators** (like Modelsim) can be used to model the logic behavior of new technology but with this approach the design of large architectures is not manageable.







#### A multi-platform CAD software for emerging nanotechnologies. Download: https://topolinano.polito.it/





#### ToPoliNano







MagCAD













## T.CAD.1: Hybrid Simulation

- → Starting from a structural HDL description ToPoliNano can perform physical design and simulation of circuits based on emerging technologies
  - Expanding the current Tool Capabilities of the tool enabling the possibility to simulate hybrid CMOS-Emerging technology circuits
  - Add domain translation modules to interface technologies operating on different domains
  - Automate the design flow

#### SKILLS:

- C++ programming
- Existing CAD tool tool usage
- HDL design





## T.CAD.2 : ToPoliNano Simulator

→ Hybrid simulation requires models of the technologies and ad-hoc simulators

- Study models of emerging technologies (e.g. pNML, Skyrmion, spinwaves, etc...)
- Extend the integrated simulators within ToPoliNano to enable the simulation of new technologies (e.g. adding the concept of motion)
- Validate the simulator with physical simulator data



**SKILLS:** 

C++ programming







#### Field Coupled Technologies





#### **Racetrack Memory**







#### **Racetrack Memory**



#### T.MAG.1: Racetrack Memory Architecture Skills: • knowledge on Synopsys and

→ Racetrack is an extremely dense Memory with computing capabilities

- Develop a high level memory model with standard memory interface
- Perform analysis to identify the best memory organization (size, etc...)
- Implement highly parallel algorithm and compare performance with custom memories



Mentor suites

Scripting





The theses presented here are ready now and available in the next few months. However other related or prosecutions will be available later.



