One of the most important challenges for autonomous driving is handling unsignalized intersections, where no traffic light controls the right of way and the vehicle must cross the intersection while other cars or pedestrians are approaching. In such cases, the behaviour generation module should produce maneuvers that are safe, efficient and predictable. Traditionally, rule-based methods for these scenarios decide whether to cross the intersection based on the predicted Time To Collision (TTC) with other agents: if the TTC is higher than a threshold, the vehicle crosses; otherwise, it waits.
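The threshold rule described above can be sketched as follows. This is a minimal illustration, assuming TTC is approximated as the time each agent needs to reach the conflict point at constant speed; the `Agent` class, its field names and the 4 s default threshold are hypothetical choices, not part of any existing pipeline.

```python
import math
from dataclasses import dataclass


@dataclass
class Agent:
    # Hypothetical representation of another road user approaching the conflict zone.
    distance_m: float  # distance to the conflict point along its path
    speed_mps: float   # speed towards the conflict point


def time_to_collision(agent: Agent) -> float:
    """Constant-velocity TTC estimate; infinite if the agent is not approaching."""
    if agent.speed_mps <= 0.0:
        return math.inf
    return agent.distance_m / agent.speed_mps


def should_go(agents: list[Agent], ttc_threshold_s: float = 4.0) -> bool:
    """Rule-based decision: cross only if every agent's TTC exceeds the threshold."""
    return all(time_to_collision(a) > ttc_threshold_s for a in agents)


if __name__ == "__main__":
    traffic = [Agent(distance_m=40.0, speed_mps=10.0), Agent(distance_m=80.0, speed_mps=8.0)]
    print(should_go(traffic))                       # smallest TTC is 4.0 s, not above 4.0 s
    print(should_go(traffic, ttc_threshold_s=3.0))  # every TTC exceeds 3.0 s
```

The single hand-tuned threshold is what makes this rule simple and predictable, but also what makes it either overly cautious or unsafe when traffic behaviour varies.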
In this work, we study the effectiveness of Deep Reinforcement Learning (RL) for solving such scenarios. Deep RL algorithms have recently been applied to many decision-making problems in robotics. They learn a policy (here, for example, deciding when to cross the intersection) that maximizes the reward provided by the environment (e.g., a penalty for a collision and a positive reward for a successful crossing). For training and evaluation, we use the simulator shown in the picture above, which can generate different scenarios. For more background on this thesis, see Isele et al. (listed below) as an example of related work.
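To illustrate how an agent can learn when to cross purely from a collision penalty and a crossing reward, the sketch below runs tabular Q-learning on a hypothetical one-car toy intersection, not on the simulator mentioned above. All names and numbers (`MAX_D`, `SAFE_GAP`, `TIME_PENALTY`, the cell-based dynamics, the hyperparameters) are illustrative assumptions; a deep RL agent would replace the Q-table with a neural network over a richer state.

```python
import random

WAIT, GO = 0, 1
MAX_D = 8             # hypothetical: farthest starting distance (in cells) of the other car
SAFE_GAP = 2          # hypothetical: GO collides if the other car is 0..SAFE_GAP cells away
CLEAR = -1            # the other car has already passed the conflict point
TIME_PENALTY = -0.05  # hypothetical per-step cost of waiting


def step(d: int, action: int) -> tuple[int, float, bool]:
    """Toy dynamics: return (next_distance, reward, done)."""
    if action == GO:
        collided = 0 <= d <= SAFE_GAP
        return d, (-1.0 if collided else 1.0), True
    # WAIT: the other car moves one cell closer, or stays CLEAR once it has passed.
    next_d = CLEAR if d <= 0 else d - 1
    return next_d, TIME_PENALTY, False


def train(episodes=20000, alpha=0.1, gamma=0.99, epsilon=0.1, max_steps=30, seed=0):
    """Tabular Q-learning with an epsilon-greedy behaviour policy."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(MAX_D + 2)]  # row index = distance + 1
    for _ in range(episodes):
        d = rng.randint(0, MAX_D)
        for _ in range(max_steps):
            s = d + 1
            if rng.random() < epsilon:
                a = rng.randrange(2)  # explore
            else:
                a = GO if q[s][GO] > q[s][WAIT] else WAIT  # exploit; ties go to WAIT
            next_d, r, done = step(d, a)
            # One-step Q-learning target; terminal transitions do not bootstrap.
            target = r if done else r + gamma * max(q[next_d + 1])
            q[s][a] += alpha * (target - q[s][a])
            d = next_d
            if done:
                break
    return q


def greedy_policy(q):
    """Map each distance (CLEAR..MAX_D) to the action with the higher Q-value."""
    return {d: "GO" if q[d + 1][GO] > q[d + 1][WAIT] else "WAIT"
            for d in range(CLEAR, MAX_D + 1)}


if __name__ == "__main__":
    print(greedy_policy(train()))
```

With these assumed parameters, the learned greedy policy waits while the other car is within `SAFE_GAP` cells and crosses otherwise, so the agent recovers a gap-acceptance behaviour without the threshold being hand-coded.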
First, you should review related work on this topic and discuss the best way to implement the RL algorithm (which reward function, action space and state representation to use). You should also consider how best to design the neural network architecture. Next, you can implement your algorithm in our current pipeline and evaluate it. You may define specific scenarios in the simulator to train and evaluate your algorithm, and finally present experimental results on the efficiency of your approach.
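One design question raised above is the state representation: a neural network usually expects a fixed-size input, while the number of surrounding agents varies. A common option is to keep the k nearest agents and pad empty slots with a placeholder. The sketch below assumes this option; the `AgentObs` fields, `K = 4` and the 100 m padding value are hypothetical, and sorting by TTC instead of distance would be an equally valid choice.

```python
from dataclasses import dataclass


@dataclass
class AgentObs:
    # Hypothetical per-agent observation from the simulator.
    distance_m: float   # distance to the conflict point
    speed_mps: float    # speed towards the conflict point
    is_pedestrian: bool


K = 4                   # hypothetical: number of agent slots in the state
FAR_DISTANCE_M = 100.0  # hypothetical: distance used for empty slots (and as a clip value)


def encode_state(ego_speed_mps: float, agents: list[AgentObs], k: int = K) -> list[float]:
    """Fixed-length state vector: ego speed followed by k slots of 3 features each."""
    nearest = sorted(agents, key=lambda a: a.distance_m)[:k]
    features = [ego_speed_mps]
    for a in nearest:
        features += [min(a.distance_m, FAR_DISTANCE_M), a.speed_mps,
                     1.0 if a.is_pedestrian else 0.0]
    # Pad missing slots with a far-away, stationary, non-pedestrian placeholder.
    for _ in range(k - len(nearest)):
        features += [FAR_DISTANCE_M, 0.0, 0.0]
    return features


if __name__ == "__main__":
    obs = [AgentObs(30.0, 8.0, False), AgentObs(5.0, 1.2, True)]
    print(encode_state(2.0, obs))
```

Because the vector length is always `1 + 3 * k`, it can feed a simple fully connected network; permutation-invariant architectures are an alternative if the ordering heuristic turns out to matter.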
Isele et al., To Go or Not to Go: A Case for Q-Learning at Unsignalized Intersections, ICML 2017