Arithmetic Optimizations for High-level Synthesis

High-level synthesis (HLS) tools, which generate hardware architectures from C code, are a big step forward in terms of productivity, especially for programming FPGAs. However, they restrict the designer to the data types and operators provided by the C language. It has been shown many times that application-specific arithmetic provides large benefits, and we want to bring such optimizations to HLS. In this work, we propose a source-to-source compiler that transforms C source code into functionally equivalent code optimized with non-standard, application-specific operators. The current version only supports summations, which are identified in the source code by a pragma and then replaced with specialized operators. These transformations improve the speed, resource consumption and accuracy of the circuits produced by HLS tools.

A detailed report of the project is available at this link.

Transform an accumulation

In order to apply transformations to a C code containing an accumulation, one has to gather the following data:

  • The name of the accumulation variable (VAR)
  • The maximum value of the accumulation variable (MaxAcc)
  • The desired precision (epsilon)
  • [Optional] The maximum value of each input (MaxInput)

One has to write a pragma above the accumulation loop nest with the appropriate values. For example, the two following pragmas are valid:

#pragma FPAcc VAR=sum MaxAcc=522 epsilon=1e-8
#pragma FPAcc VAR=sum MaxAcc=522 epsilon=1e-8 MaxInput=31

The result of the transformation is C code in which the accumulation is replaced by a highly tuned operator described at the bit level in C. The code can be given to Vivado HLS without further modification.
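To give an intuition for why replacing a floating-point accumulation with a wide fixed-point accumulator can help accuracy, here is a minimal sketch in plain C. It is not the operator the tool generates: it uses a 64-bit integer instead of arbitrary-width types, and the names sum_float, sum_fixed and FRAC_BITS are hypothetical and chosen only for this illustration.

```c
#include <stdint.h>

/* Hypothetical parameter for this sketch: number of fractional bits
 * kept in the fixed-point accumulator. */
#define FRAC_BITS 40

/* Naive accumulation: every addition is rounded to float precision,
 * so rounding errors build up as the sum grows. */
float sum_float(float x, long n) {
    float acc = 0.0f;
    for (long i = 0; i < n; i++)
        acc += x;
    return acc;
}

/* Fixed-point accumulation: each input is converted once to a 64-bit
 * integer scaled by 2^FRAC_BITS. The integer additions are then exact
 * as long as the running sum does not overflow 64 bits. */
double sum_fixed(float x, long n) {
    const double scale = (double)((int64_t)1 << FRAC_BITS);
    int64_t acc = 0;
    for (long i = 0; i < n; i++)
        acc += (int64_t)((double)x * scale); /* drops bits below 2^-FRAC_BITS */
    return (double)acc / scale;
}
```

Adding 0.1f one million times with sum_float drifts noticeably away from 100000, whereas sum_fixed accumulates the converted inputs without any further rounding. Bounding the accumulator (MaxAcc) and the precision (epsilon) is what makes it possible to pick a fixed-point format that is wide enough but no wider.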

Installation guide

  • Follow GeCoS installation guide.
  • Eclipse → Help → Install New software → Work with : Eclipse repository → search “Subversive SVN Team Provider”
  • New SVN Repository location → URL: svn://scm.gforge.inria.fr/svnroot/gecos
  • Check out: trunk/gecos/vivadoHLS-transformations/fr.irisa.cairn.gecos.hls

User guide

To transform your code, place it in the src-c folder. To run the plug-in, right-click the gecos.cs file and select “run as”→“compiler script”. The generated code can be found in the src folder.

Use case example

To transform C source code, place it in the src-c folder. To test the tool, one can create a C source file containing the following code:

int main(float A[130000], float B[130000]){
	float acc = 0;
	#pragma FPacc VAR=acc MaxAcc=130000 epsilon=1e-15 MaxInput=1
	for(int i = 0; i <= 130000 - 1; i = i + 1) {
		acc+=A[i/2];
		acc+=A[i]*B[i];
	}
	return acc;
}

This code performs an accumulation over the acc variable. The user knows that the accumulator will not exceed 130,000 and that every input is bounded by 1. The user also wants a precision of 1e-15.
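The pragma parameters determine how wide the fixed-point accumulator has to be. The sketch below shows one plausible sizing rule: one sign bit, enough integer bits to hold the maximum value, and enough fractional bits for the requested precision. This rule is an assumption made for illustration, not a documented description of the tool, and the function names are hypothetical.

```c
/* Bits needed for the integer part: smallest n such that 2^n > max_abs.
 * The n < 1024 bound only guards against non-finite arguments. */
int integer_bits(double max_abs) {
    int n = 0;
    double p = 1.0;
    while (p <= max_abs && n < 1024) {
        p *= 2.0;
        n++;
    }
    return n;
}

/* Bits needed for the fractional part: smallest f such that
 * 2^-f <= epsilon. Returns -1 if epsilon is not strictly positive. */
int fraction_bits(double epsilon) {
    if (!(epsilon > 0.0))
        return -1;
    int f = 0;
    double w = 1.0;
    while (w > epsilon) {
        w /= 2.0;
        f++;
    }
    return f;
}

/* Total width of a signed fixed-point word under the assumed rule:
 * sign bit + integer bits + fractional bits. */
int fixed_width(double max_abs, double epsilon) {
    return 1 + integer_bits(max_abs) + fraction_bits(epsilon);
}
```

With the values of this use case, the assumed rule gives 68 bits for MaxAcc=130000 and 52 bits for MaxInput=1 at epsilon=1e-15. These totals coincide with the ap_int<68> and ap_int<52> types in the generated code on this page, although the rule and the bit layout the tool actually uses may differ.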

The user then runs the gecos.cs script to obtain the transformed code. The result may be hard to read, but it is the following:

#include <ap_int.h>
 
union float_to_ap_int {
	float f;
	unsigned int ap;
};
typedef union float_to_ap_int bitwise_float_to_ap_int;
int main(float A[130000], float B[130000]);
ap_int<52> FP_to_accumulable_for_mult(ap_uint<58> in);
ap_uint<58> product(float in1, float in2);
ap_int<52> FP_to_accumulable(float a);
 
 
int main(float A[130000], float B[130000]) {
	float acc = 0;
	ap_int<68> long_accumulator_generated;
	ap_uint<1> s;
	ap_uint<8> exp;
	ap_uint<23> mant;
	ap_uint<32> ret;
	ap_uint<31> ret_without_sign;
	unsigned int lzc;
	bitwise_float_to_ap_int bits_to_fp;
 
	long_accumulator_generated = 0;
	#pragma FPacc VAR=acc MaxAcc=130000 epsilon=1e-15 MaxInput=1
	for(int i = 0; i <= 130000 - 1; i = i + 1) {
		#pragma HLS PIPELINE II=1
		{
		long_accumulator_generated = FP_to_accumulable_for_mult(product(A[i], B[i])) + FP_to_accumulable(A[i / 2]) + long_accumulator_generated;
		}
	}
	if (long_accumulator_generated[67] == 1) {
		long_accumulator_generated = -long_accumulator_generated;
		s = 1;
	} else
		s = 0;
	lzc = long_accumulator_generated.countLeadingZeros();
	exp = 143 - lzc;
	mant = long_accumulator_generated >> 44 - lzc;
	ret_without_sign = exp.concat(mant);
	ret = s.concat(ret_without_sign);
	bits_to_fp.ap = ret;
	acc = bits_to_fp.f;
	return acc;
}
 
ap_int<52> FP_to_accumulable_for_mult(ap_uint<58> in) {
	ap_int<9> exp_s;
	ap_uint<48> mantissa;
	ap_uint<1> sign;
	ap_int<52> current_shifted_value;
	unsigned int shift_val;
	ap_uint<52> shift_buffer;
	ap_uint<4> zeros;
 
	mantissa = in.range(47, 0);
	exp_s = in.range(56, 48);
	sign = in[57];
	zeros = 0;
	shift_buffer = mantissa.concat(zeros);
	shift_val = -1 - exp_s;
	current_shifted_value = shift_buffer >> shift_val;
	if (sign == 1)
		return ~current_shifted_value + 1;
	else
		return current_shifted_value;
}
 
ap_uint<58> product(float in1, float in2) {
	bitwise_float_to_ap_int var1;
	bitwise_float_to_ap_int var2;
	ap_uint<32> a;
	ap_uint<32> b;
	ap_uint<8> exp_a;
	ap_uint<8> exp_b;
	ap_uint<24> m_a;
	ap_uint<24> m_b;
	bool zero;
	ap_uint<1> s_a;
	ap_uint<1> s_b;
	ap_uint<1> ret_s;
	ap_int<9> ret_exp_ext;
	ap_uint<48> m_product;
	ap_uint<57> ret_without_sign;
	ap_uint<58> ret;
 
	var1.f = in1;
	var2.f = in2;
	a = var1.ap;
	b = var2.ap;
	exp_a = a.range(30, 23);
	m_a = a.range(22, 0);
	exp_b = b.range(30, 23);
	m_b = b.range(22, 0);
	zero = (exp_a == 0) || (exp_b == 0);
	m_a[23] = 1;
	m_b[23] = 1;
	s_a = a[31];
	s_b = b[31];
	ret_s = s_a ^ s_b;
	if (zero) {
		ret_exp_ext = 0;
		m_product = 0;
	} else {
		m_product = (ap_uint<48>)m_a * (ap_uint<48>)m_b;
		ret_exp_ext = exp_a + exp_b - 254;
	}
	ret_without_sign = ret_exp_ext.concat(m_product);
	ret = ret_s.concat(ret_without_sign);
	return ret;
}
 
ap_int<52> FP_to_accumulable(float a) {
	ap_uint<32> in;
	bitwise_float_to_ap_int var;
	ap_uint<8> exp_u;
	ap_int<8> exp_s;
	ap_uint<24> mantissa;
	ap_uint<1> sign;
	ap_int<52> current_shifted_value;
	unsigned int shift_val;
	ap_uint<52> shift_buffer;
	ap_uint<28> zeros;
 
	var.f = a;
	in = var.ap;
	mantissa = in.range(22, 0);
	mantissa[23] = 1;
	exp_u = in.range(30, 23);
	sign = in[31];
	exp_s = (ap_int<8>)exp_u - 127;
	zeros = 0;
	shift_buffer = mantissa.concat(zeros);
	shift_val = 0 - exp_s;
	current_shifted_value = shift_buffer >> shift_val;
	if (sign == 1)
		return ~current_shifted_value + 1;
	else
		return current_shifted_value;
}

This code embeds the optimized operators described at the bit level. One would therefore rather have this code generated than write it by hand.
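For readers who want to experiment with the idea without the ap_int types, the following portable C sketch mirrors what FP_to_accumulable and the final normalization step do in spirit: a float is unpacked into sign, exponent and mantissa, shifted into a fixed-point word, and later packed back into a float. The function names, the 64-bit container, the flushing of subnormals to zero and the saturation on overflow are all assumptions of this sketch, not properties of the generated code.

```c
#include <stdint.h>
#include <string.h>

/* Convert a float to signed fixed point with frac_bits fractional bits
 * (0 <= frac_bits <= 40 assumed). Bits below 2^-frac_bits are truncated.
 * Zeros and subnormals are flushed to 0 (a simplification). */
int64_t float_to_fixed(float a, int frac_bits) {
    uint32_t bits;
    memcpy(&bits, &a, sizeof bits);          /* well-defined bit cast */
    uint32_t sign = bits >> 31;
    int exp = (int)((bits >> 23) & 0xFF);
    uint64_t mant = bits & 0x7FFFFFu;
    if (exp == 0)
        return 0;
    mant |= (uint64_t)1 << 23;               /* restore the implicit leading 1 */
    /* value = mant * 2^(exp - 127 - 23), then rescale by 2^frac_bits */
    int shift = exp - 127 - 23 + frac_bits;
    uint64_t mag;
    if (shift >= 39)                         /* too large for 64 bits: saturate */
        return sign ? INT64_MIN : INT64_MAX;
    else if (shift >= 0)
        mag = mant << shift;
    else if (shift > -64)
        mag = mant >> -shift;
    else
        mag = 0;
    return sign ? -(int64_t)mag : (int64_t)mag;
}

/* Convert a fixed-point value back to float by locating the leading 1
 * and truncating the mantissa to 23 stored bits (no rounding). */
float fixed_to_float(int64_t v, int frac_bits) {
    if (v == 0)
        return 0.0f;
    uint32_t sign = v < 0;
    uint64_t mag = sign ? (uint64_t)0 - (uint64_t)v : (uint64_t)v;
    int msb = 63;
    while (((mag >> msb) & 1) == 0)          /* position of the leading 1 */
        msb--;
    /* align the leading 1 with bit 23 of the mantissa field */
    uint64_t mant = (msb >= 23) ? mag >> (msb - 23) : mag << (23 - msb);
    int exp = msb - frac_bits + 127;         /* biased exponent */
    uint32_t bits = (sign << 31) | ((uint32_t)exp << 23)
                  | ((uint32_t)mant & 0x7FFFFFu);
    float out;
    memcpy(&out, &bits, sizeof out);
    return out;
}
```

A loop can then accumulate float_to_fixed(x, F) values in an int64_t and call fixed_to_float once at the end, which is the same overall structure as the generated main function above: conversion inside the loop, exact integer addition, and a single normalization after the loop.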

subprojects/arithmetic_optimizations_for_hls.txt · Last modified: 2016/06/09 11:37 by yuguen