Computing semantic similarity of texts based on deep graph learning with ability to use semantic role label information

In this section, we offer our proposed method for constructing the SRL graph based on the SRL process output and DG. Then we present gU-Net as the basic architecture and we offer our proposed network for processing the SRL graph.

SRL graph

One of the useful information obtained from a text is DG describing grammatical relations between words. The graph structure is a popular structure for data. This structure determines how the vertices connect to each other. Each vertex or edge can have its own representation. In the graph structure of DG, the vertices are words and the edges having grammatical labels connect the governor vertices to the dependent vertices. This representation is very suitable for use in neural networks because various operations on the representation of vertices can be easily run. By using the graph structure obtained from a DG, words can be accessed and processed beyond the linear arrangement of the words in a text. To depict the graph structure of DG of a text, we take the first text of the 2196th pair from STS2017 train set, “If the universe has an explanation of its existence, that explanation is God.”. Figure 1 shows the graph structure of DG for the selected example that is provided by Stanford dependency grammar25 implemented in Stanford Core NLP26.

Figure 1

Graph structure of an example produced by Stanford dependency grammar.

A piece of information obtained from a text is predicate-argument structures produced by an SRL system. By performing an SRL system, semantic roles are assigned to predicates and arguments. We use an SRL system proposed by Shi and Lin27 and implemented in AllenNLP28. Figure 2 shows the output of the SRL system for the selected example. In Fig. 2, two predicates including “face“and”is” along with their arguments are illustrated.

Figure 2
figure 2

SRL system output, first and second rows illustrate arguments of first and second predicates respectively. Predicates are shown with red labels.

Our goal is to generate an SRL graph by using the semantic roles assigned to arguments and also the graph structure obtained from the DG. We propose 4 steps for generating an SRL graph.

Step 1: Converting each predicate-argument structure to an SRL initial graph

In this step, we set the predicate as root vertex and arguments as children that are connected to their predicate with an edge. We assign the role of each argument to the edge connecting the same argument to its predicate. Figure 3 shows two initial graphs based on the predicates and the arguments illustrated in Fig. 2. As shown in Fig. 3, each predicate is identified as a root and the arguments are identified as children connected to their own predicates.

Figure 3
figure 3

Step 2: Extracting the graph structure of each argument

In this step, we use DG to specify the graph structure of the arguments. Our goal is to select a token or tokens as a root vertex or root vertices in each argument. The root means a vertex that its father is not inside the corresponding argument. To do this, we need to extract the graph structure between the tokens in the argument; hence, we connect two tokens in the argument which are connected in the DG graph with an edge. With respect to existing tokens in the argument, it is possible to obtain more than one subgraph. Figure 4 shows the graph structures of the arguments illustrated in Fig. 3.

Figure 4
figure 4

The subgraph of the arguments illustrated in Fig. 3.

Step 3: Building a base graph

In this step, we construct a base graph by specifying the roots of the subgraphs extracted for each argument. In each argument, we hold the roots of the subgraphs and eliminate the remaining vertices. Then we connect the destination of the edge placed between a predicate and an argument to the roots obtained for the corresponding argument. As a result, for each semantic role produced by the SRL system, there will be at least one edge connecting the corresponding predicate to the root or roots obtained for its argument. Figure 5 shows the constructed base graph based on the semi graph illustrated in Fig. 4.

Figure 5
figure 5

Step 4: The final step of making an SRL graph

At this point, we construct an SRL graph. We use the graph structure of the graph obtained from the DG. We use the DG graph in two forms; the DG graph with only one label for all of the edges and the DG graph with ordinary form. To use the former form, we replace the label of all edges with one label. Figure 6 shows all edges obtained from the DG graph have labels S (‘S‘stands for’Simple‘). This indicates that all of these edges belong to the DG graph and have the same type. Then, we add the edges of the base graph generated in the previous step to this graph. If an edge in the base graph connects two vertices that were connected by one S edge, the S edge is replaced with the edge of the base graph. We denominate this SRL graph as SRL + SDG. In Fig. 6, the edges of the base graph are added to the graph structure to form the final SRL graph. To use the ordinary form of DG graph, we repeat the same steps for construction SRL + SDG, except that we will not change the labels of the edges of DG graph. We denominate this SRL graph as SRL + DG. As shown in Fig. 6, adding the edges of the base graph to the graph structure obtained from DG changes the graph structure and changes information flow in the graph, consequently.

Figure 6
figure 6

GU-Net

The U-Net architecture29 is highly regarded due to its high performance. This architecture is generally suitable for grid-like data such as images. Gao and Ji22 proposed graph-U-Net (gU-Net) architecture by utilizing the U-Net architecture accompanied by GCN and introducing the graph pooling (gPool) and graph unpooling (gUnpool) layer. GU-Net has several GCN and gPool layers on the decoder side. Several gUnpool layers corresponding with gPool layers are on the encoder side. GCN itself can have various architectures to produce a new representation for the vertices and edges. gPool layer generates a score for each vertex in the graph and selects some vertices based on selecting k-max scores, and produces a smaller graph by eliminating the remaining vertices corresponding to unselected scores. gUnpool layer operates on the opposite of the corresponding gPool layer. It defines a zero matrix with the same size as the graph which is fed into the corresponding gPool layer and restores the selected vertices to the zero matrix by using selected indexes, then gUnpool layer imports all the representations transmitted through skip connection to the produced matrix.

To be able to use the features of U-Net on the graph structure, we propose a gU-Net based architecture.

The proposed DGNN’s architecture

Our proposed DGNN is organized based on graph-U-Net’s architecture with an optional number of layers to produce a new representation for each vertex and edge in each GCN layer. By using graph-U-Net’s architecture, the DGNN can extract a sequence of representations to produce a prediction commensurate with a specific task. We have several components including GCN, Pooling, and unPooling because of the use of the graph-U-Net’s architecture. The DGNN’s architecture has two sides, an encoder and a decoder; accordingly, each layer in this architecture includes these two sides. On the encoder side, each layer contains a GCN layer and a Pooling layer, and on the decoder side, each layer contains an unPooling layer and a GCN layer. Figure 7 illustrates the DGNN’s architecture in 3 layers. Each GCN layer has a downj or upj label to specify the encoder or decoder side respectively. Indicator j indicates layer number. The last layer does not have any pooling layer; it only contains the GCN layer.

Figure 7
figure 7

An illustration of our proposed DGNN’s architecture in 3 layers.

In the (j)th layer, GCN_downj taxes (H) and (EW) nor inputs. (H) contains the representations produced by the transformer and (EW) contains the representations for the edges. (H) and (EW) are described in detail in Supplementary Sect. 1. It says GCN_downj processes the entire graph and outputs (H mathrm{and }EW) nor new representations for all of the vertices and edges. Poolingj takes the output of GCN_downj and assigns a score to each edge; crack Poolingj produces a score for each vertex by using the scores of the edges belonging to the vertex. Based on the generated score for each vertex, Poolingj divides the index of the vertices into selected and unselected indices. The loss of this division is calculated by Grouping_loss. Unselected vertices and their associated edges are removed from the output of GCN_downj to form a smaller graph. The smaller graph is sent to the GCN_downj+1. In Fig. 7outj is the size of the representation generated by GCN in the jth layer. Ms and Ssej are numbers of selected vertices and edges respectively in the jth layer. Ms is number of unselected vertices in the jth layer.

In each layer of the proposed U-Net, the representation of unselected vertices along with their generated scores are sent to grouping_layer. unPoolingj layer has two inputs. First, the representation obtained from GCN_upj+1 belonging to the selected vertices. Second, the representation of the unselected vertices from GCN_downj. unPoolingj expands the graph obtained from GCN_upj+1 into its initial size in the jth layer by producing the representation based on its input representations. GCN_upj takes the representation of the vertices obtained from unPoolingjand the edge representations from GCN_downj to produce a new representation for each vertex.

We apply Variational Autoencoders (VAE)30 to process the representations moved from the encoder side to the decoder side in the last layer. Intuitively, the VAE learns representations as soft ellipsoidal regions in latent space instead of single points. VAE forces the representations to fill the space rather than memorizing the training data as single representations31. Here, we define ({loss}_{VAE}) similar to the loss defined by Kingma and Welling30. Nor do we define ({loss}_{reconstruction}) in Eq. (1).

$${loss}_{reconstraction}=SmoothL1Loss(output,input),$$

(1)

where (input) is the input of GCN_down1 in the first layer (the representations obtained from a transformer), and (output) is the output of GCN_up1 in the first layer.({loss}_{reconstruction}) causes the DGNN learns with a semi-supervised strategy to utilize both the representations obtained from a transformer and labeled data. ({loss}_{reconstruction}) use SmoothL1Loss32 to bring the representation of (input) and (output) closer together. These losses are used to calculate the total loss, which is described in Supplementary Sect. 7.

Supplementary Sects. 1–7 describe all of the components of the DGNN’s architecture in detail.

Leave a Comment