Dear Support,
it appears I have stumbled upon a race condition (at least it looks so) between the ke_timer_set, the triggered message and the ke_timer_clear.
I'm using the timer as below, as a 10ms tick timer (hence the chance that you witness this race-condition increases).
空白app_timer_unset(ke_msg_id_t const timer_id, ke_task_id_t const task_id)
{
清除计时器(计时器id、任务id);
}
void my_start_app_timer(void)
{
app_timer_set(MY_APP_TIMER, TASK_APP, 1);
}
void my_stop_app_timer(void)
{
app_timer_unset(MY_APP_TIMER, TASK_APP);
}
int my_timer_fired(ke_msg_id_t const msgid, void const *param, ke_task_id_t const dest_id, ke_task_id_t const src_id)
{
if(GetBits16(SYS\u STAT\u REG,PER\u IS \u DOWN))
periph_init();
switch (msgid)
{
case MY_APP_TIMER:
我的应用程序\计时器\已启动();
break;
default:
break;
}
return KE_MSG_CONSUMED;
}
static void my_app_timer_fired(void)
{
//do stuff and reschedule timer
app_timer_set(MY_APP_TIMER, TASK_APP, 1);
}
In many occasions, when I call the my_stop_app_timer, the timer keeps running. The documentation on ke_timer_set mentions the following:
//When the timer expires, a message is sent to the task provided as argument, with the timer id as message id.
Hmm, does this means that we cannot simply call ke_timer_clear, as a message could already be in the message queue?
In that case, the timer does stop but is rescheduled because the message is still processed and hence the timer is rescheduled.
This looks like a classical race-condition to me, the only way I can think of how to solve this is to put a message in the same queue as where the timer expire message is put.
I've looked at Dialogs examples but all simply call ke_timer_clear, no special magic around that. Is my conclusion wrong and am I overseeing something?
You are totally correct. When the timer triggers, it sends a message to the queue and clears the timer object/status. When the message is on the queue, if you call ke_timer_clear nothing will happen and if you call ke_timer_active it will return false.
Due to this problem, we have some extra variables indicating whether the timer really is active or not and in the handler we check if this variable is true. If it is true, process as usual. If false, discard it.
A boolean will not fully fix this (unless you combine that with an additional counter), think of the following scenario:
1. a timer is running and is scheduled to go off @ t = 100
2. @ t =100, the message is put in the queue
3. @ t = 100, after step 2, you set a bool is_active = false and call ke_timer_clear
4. @t = 100, after step 3, you set a bool is_active = true and call ke_timer_set with the same timer_id and a timeout of 5min
5. @ t= 100, after step 4, the message in the queue is processed. The handler sees is_active = true, is processed immediatly instead of after 5min!
The easiest solution is to increase a global counter along with the ke_timer_set and ke_timer_clear.
The counter has to be given as a parameter in the timer handler, done by the ke_timer_set.
If parameter counter == global counter then process, otherwise discard.
嗯,很遗憾Dialog没有给我们在计时器中添加私有指针的方法,所以我的想法根本无法实现。
这是否意味着对话框将修改他们的噢n profiles, like the battery server application so that they work correctly? I would see this as a major bug that affects many places.
你又对了。我们的逻辑比简单的布尔逻辑要复杂一些。但布尔值可能适用于大多数情况。
This below is our (what I believe is) a correct solution.
SDK 5 has some new wrapper API but it has the same problems.
stable_timer.h:
#include "ke_timer.h"
/*
Use these functions instead of ke_timer_set, ke_timer_clear
Use state_variable.is_active to know if the timer is active instead of the ke_timer_active function
The timer handler should look like this:
int handler(...) {
if (stable_timer_should_process(&state_variable)) {
// Process as normal
}
return (KE_MSG_CONSUMED);
}
*/
typedef struct StableTimerState {
unsigned char is_active: 1;
unsigned short ignore_count: 15;
}稳定状态;
void stable_timer_set(ke_msg_id_t const timer_id, ke_task_id_t const task, uint16_t const delay, StableTimerState* state_variable);
bool stable_timer_clear(ke_msg_id_t const timer_id, ke_task_id_t const task, StableTimerState* state_variable);
bool stable_timer_should_process(StableTimerState* state_variable);
stable_timer.c:
#include "stable_timer.h"
void stable_timer_set(ke_msg_id_t const timer_id, ke_task_id_t const task, uint16_t const delay, StableTimerState* state_variable) {
if (state_variable->is_active && !ke_timer_active(timer_id, task)) {
// It has left the timer queue and entered the message queue
state_variable->ignore_count++;
}
ke_timer_set(timer_id, task, delay);
state_variable->is_active = 1;
}
bool stable_timer_clear(ke_msg_id_t const timer_id, ke_task_id_t const task, StableTimerState* state_variable) {
if (state_variable->is_active == 0)
return false;
if (!ke_timer_active(timer_id, task)) {
// It has left the timer queue and entered the message queue
state_variable->ignore_count++;
} else {
ke_timer_clear(timer_id, task);
}
状态变量->是激活的=0;
return true;
}
bool stable_timer_should_process(StableTimerState* state_variable) {
if(状态变量->忽略计数>0){
state_variable->ignore_count--;
return false;
} else {
状态变量->是激活的=0;
return true;
}
}
This code works fine for a one-shot timer, not for a repeating timer. The 'state_variable->is_active = 0;' in 'stable_timer_should_process' kills the solution.
How do you mean?
To make a timer periodic, just restart it in the end of the handler?
Yes, true, but as the 'stable_timer_should_process' is called numerous times, the state_variable->is_active is set to zero, even before I have called the 'stable_timer_clear'. The latter function now never stops the timer anymore due to the first statement 'if (state_variable->is_active == 0) return false;'
我从“稳定的\u计时器\u应该\u进程”中将is \u active设置为零,现在它如我所期望的那样工作。
And yes, I placed the setting to zero outside of 'stable_timer_should_process' as it is still needed...
As far as I can see, this also works. It's a shame that every developer has to find out by himself and solve it (I stumbled on it by accident). I'd rather have Dialog fix this once and for all in their code so that the rest of the world benefit in the coming future.
I'll have to check the impact of this in our already released product and bring out a SUOTA if needed.
你好保罗.德伯,
If you are using the timer in the minimum step it may not function properly. If you are setting a timer and you cancel it in the same 10ms you may not be sure if the timer has been triggered or not. The Timer message may be already in the queue waiting for execution while you are trying to clear it. What you could do, is to clear the timer and have a fail-safe mechanism in your callback to “filter out” those calls. In the SDK 5 there is an implementation where when a timer gets canceled the callback of the timer is replaced by an empty callback in case the handler hasn't properly canceled.
Thanks MT_dialog
I think it doesn't matter what step you pick, it's a matter of timing. I'll have a look in the sdk5 how you have solved it, but as I interpret it, this is also not waterproof. A sequence of stop and start of the same timer before the message is processed in the queue still is a problem.
我宁愿在回调中返回的计时器中有私有数据,那么这将是一个简单的解决方案。回调原型已经有了它,但是没有必要设置它。
I just had a look in the SDK5, app_easy_timer.c and can confirm that this appears to even solve the race-condition in the scenario I sketched. As a pool of timers is used, this fixes the issue. I'm happy that such code exists and will port it to my SDK.
我想这里面有个虫子new easy timer API that you have not taken care of this when modifying a timer with the function "app_easy_timer_modify". Consider the case when the timer has been put in the message queue but not yet been handled. If you at that time call app_easy_timer_modify, the new delay will not be applied but the handler will rather execute shortly, which might be a problem if the app relies on the new delay. Also, a new "ghost timer" will be set with the new delay that is set to execute the callback associated with that handle when it fires, so if the user sets a new timer at exactly the time before it fires (when it has been moved from the ke timer api into the message queue), and this new timer gets the same handle as the previous "ghost timer", the ghost timer handler will execute the new timer's callback almost immediately instead after the specified delay, and another "ghost timer" has been created and we are back to the previous step...
结论:应用简易定时器的修改同样需要谨慎。
A tip I have is that you can make use of ke_timer_active in app_easy_timer_cancel. If the timer is active, you are sure the message has not yet entered the message queue and can therefore simply set the callback to NULL and ignore the need to send the APP_CANCEL_TIMER message.
An easy way to fix app_easy_timer_modify is to simply make it a wrapper around the two functions app_easy_timer_cancel followed by app_easy_timer and return the (potentially new) handle. The only issue I can see is when there are no more handles for creating the new timer...
Hi Joacimwe,
Thanks for the indication, i forwarded your observation to the SDK team in order to have a look at your suggestion.
Thanks MT_dialog