Closed Bug 1122177 Opened 9 years ago Closed 8 years ago

[Bluetooth] pairing fails while device is searching for other devices

Categories

(Firefox OS Graveyard :: Bluetooth, defect)

ARM
Gonk (Firefox OS)
defect
Not set
normal

Tracking

(tracking-b2g:backlog, b2g-v2.1S unaffected, b2g-v2.2 unaffected, b2g-master affected)

RESOLVED WORKSFORME
tracking-b2g backlog
Tracking Status
b2g-v2.1S --- unaffected
b2g-v2.2 --- unaffected
b2g-master --- affected

People

(Reporter: dharris, Assigned: shawnjohnjr)

References

()

Details

(Keywords: regression, Whiteboard: [POVB][2.2-Daily-Testing])

Attachments

(6 files, 2 obsolete files)

Description:
If the user tries to pair to another device while they are still searching for devices in the area, a pairing request will never appear, and they will be stuck attempting to pair until the settings app is forced closed

Prerequisite: Have at least 1 more device to pair to via Bluetooth

Repro Steps:
1) Update a Flame to 20150115010229
2) Open settings App> Bluetooth
3) Enable Bluetooth
4) While the list of devices in the area is populating, attempt to pair with another device


Actual:
The device name will become greyed out, but will see "Pairing with device..." message until the settings app is closed

Expected:
The device pairs properly without issue.

Environmental Variables:
Device: Flame 3.0 (319MB)(Kitkat)(Full Flash)
Build ID: 20150115010229
Gaia: bcc76f93f5659ac1eb8a769167109fd2d7ca4fbd
Gecko: c1f6345f2803
Gonk: a814b2e2dfdda7140cb3a357617dc4fbb1435e76
Version: 38.0a1 (3.0)
Firmware Version: v18D-1
User Agent: Mozilla/5.0 (Mobile; rv:38.0) Gecko/38.0 Firefox/38.0

Repro frequency: 18/20
See attached: Logact, Video - http://youtu.be/hYeZooSPrj0
This issue DOES occur on Flame 2.2

The device name will become greyed out, but will see "Pairing with device..." message until the settings app is closed

Device: Flame 2.2
BuildID: 20150115002505
Gaia: 7c5b27cad370db377b18a742d3f3fdb0070e899f
Gecko: ce27f2692382
Gonk: a814b2e2dfdda7140cb3a357617dc4fbb1435e76
Version: 37.0a2 (2.2)
Firmware: V18D-1
User Agent: Mozilla/5.0 (Mobile; rv:37.0) Gecko/37.0 Firefox/37.0

_______________________________________________________________________________________________


This issue does NOT occur on Flame 2.1

The user is able to successfully pair without any issue.

Device: Flame 2.1
BuildID: 20150115001207
Gaia: 8d4846d7bec777046dc5e3d2b8005adb1370f1f7
Gecko: 8eb9bc3a945a
Gonk: a814b2e2dfdda7140cb3a357617dc4fbb1435e76
Version: 34.0 (2.1)
Firmware: V18D-1
User Agent: Mozilla/5.0 (Mobile; rv:34.0) Gecko/34.0 Firefox/34.0
QA Whiteboard: [QAnalyst-Triage?]
Flags: needinfo?(pbylenga)
Functional regression of a core feature.

Requesting a window.
blocking-b2g: --- → 2.2?
QA Whiteboard: [QAnalyst-Triage?]
Flags: needinfo?(pbylenga)
QA Contact: jmercado
Not certain which of the bugs in the pushlog is the most likely culprit in this case.

Mozilla-inbound Regression Window

Last Working 
Environmental Variables:
Device: Flame 2.2
BuildID: 20150108234132
Gaia: 5f0dd37917c4a6d8fa8724715d4d3797419f9013
Gecko: 8a01e921708c
Version: 37.0a1 (2.2) 
Firmware Version: v18D-1
User Agent: Mozilla/5.0 (Mobile; rv:37.0) Gecko/37.0 Firefox/37.0

First Broken 
Environmental Variables:
Device: Flame 2.2
BuildID: 20150109001243
Gaia: 5f0dd37917c4a6d8fa8724715d4d3797419f9013
Gecko: ed280f6c7b39
Version: 37.0a1 (2.2) 
Firmware Version: v18D-1
User Agent: Mozilla/5.0 (Mobile; rv:37.0) Gecko/37.0 Firefox/37.0

Last Working gaia / First Broken gecko - Issue DOES occur
Gaia: 5f0dd37917c4a6d8fa8724715d4d3797419f9013
Gecko: ed280f6c7b39

First Broken gaia / Last Working gecko - Issue does NOT occur
Gaia: 5f0dd37917c4a6d8fa8724715d4d3797419f9013
Gecko: 8a01e921708c

Gecko Pushlog: http://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=8a01e921708c&tochange=ed280f6c7b39
QA Whiteboard: [QAnalyst-Triage?]
Flags: needinfo?(ktucker)
QA Whiteboard: [QAnalyst-Triage?] → [QAnalyst-Triage+]
Flags: needinfo?(ktucker)
(In reply to Jayme Mercado [:JMercado] from comment #3)
> Gecko Pushlog:
> http://hg.mozilla.org/integration/mozilla-inbound/
> pushloghtml?fromchange=8a01e921708c&tochange=ed280f6c7b39

These gecko pushlog doesn't contain bluetooth commit. I'll further check which commit results in this bug first.
Assignee: nobody → btian
(In reply to Jayme Mercado [:JMercado] from comment #3)
> Last Working 
> Environmental Variables:
> Device: Flame 2.2
> BuildID: 20150108234132
> Gaia: 5f0dd37917c4a6d8fa8724715d4d3797419f9013
> Gecko: 8a01e921708c
> Version: 37.0a1 (2.2) 
> Firmware Version: v18D-1
> User Agent: Mozilla/5.0 (Mobile; rv:37.0) Gecko/37.0 Firefox/37.0

I can still reproduce this bug on PVT build with the same build ID. I'll investigate on latest m-c first.
KTucker,

Can you help find the regression window on b2g-inbound instead of mozilla-inbound in comment 3? Since bluetooth changes all land on b2g-inbound first and then mozilla-central, not on mozilla-inbound.
Flags: needinfo?(ktucker)
QA Contact: jmercado
QA Whiteboard: [QAnalyst-Triage+]
Flags: needinfo?(ktucker)
QA Contact: jmitchell
Here is the B2G-inbound window

B2G-Inbound Regression Window:

Last Working:
Device: Flame 2.2
Build ID: 20150109060235
Gaia: dbc1448641a051bb9bfe410fce9014c9cf81eb25
Gecko: 77de62b9fc73
Version: 37.0a1 (2.2)
Firmware Version: v18D-1
User Agent: Mozilla/5.0 (Mobile; rv:37.0) Gecko/37.0 Firefox/37.0

First Broken:
Device: Flame 2.2
Build ID: 20150109070042
Gecko: 9afbd8f3d863
Gaia: 727eb7ae9223df03047f094e1da5b083fa9a6780
Version: 37.0a1 (2.2)
Firmware Version: v18D-1
User Agent: Mozilla/5.0 (Mobile; rv:37.0) Gecko/37.0 Firefox/37.0

Gaia/Gecko Swap
Last Working Gaia First Broken Gecko: Issue DOES reproduce
Gaia: dbc1448641a051bb9bfe410fce9014c9cf81eb25
Gecko: 9afbd8f3d863
First Broken Gaia Last Working Gecko: Issue does NOT reproduce
Gaia: 727eb7ae9223df03047f094e1da5b083fa9a6780
Gecko: 77de62b9fc73

B2G-Inbound Gecko pushlog:
http://hg.mozilla.org/integration/b2g-inbound/pushloghtml?fromchange=77de62b9fc73&tochange=9afbd8f3d863
QA Whiteboard: [QAnalyst-Triage?]
Flags: needinfo?(ktucker)
Flags: needinfo?(btian)
QA Contact: jmitchell
QA Whiteboard: [QAnalyst-Triage?] → [QAnalyst-Triage+]
Flags: needinfo?(ktucker)
Thanks Joshua.

(In reply to Joshua Mitchell [:Joshua_M] from comment #7)
> B2G-Inbound Gecko pushlog:
> http://hg.mozilla.org/integration/b2g-inbound/
> pushloghtml?fromchange=77de62b9fc73&tochange=9afbd8f3d863

Still no bluetooth related change shows. I'll dig into these commits.
Flags: needinfo?(btian)
Triage: regression. blocking
blocking-b2g: 2.2? → 2.2+
> > B2G-Inbound Gecko pushlog:
> > http://hg.mozilla.org/integration/b2g-inbound/
> > pushloghtml?fromchange=77de62b9fc73&tochange=9afbd8f3d863
> 
> Still no bluetooth related change shows. I'll dig into these commits.

This bug occurs since this commit: http://hg.mozilla.org/mozilla-central/rev/59452425e446
I tested on my two Flames both with the latest gaia & V188 image. Since this commit, gecko bluetooth receives pairing status BT_STATUS_FAIL from BT stack. We'll further investigate why stack reports error. Also ni? bug 1081871 owner Gabriele.

Gabriele, can you brief how this commit impacts on process priority?
Flags: needinfo?(gsvelto)
(In reply to Ben Tian [:btian] from comment #10)
> This bug occurs since this commit:
> http://hg.mozilla.org/mozilla-central/rev/59452425e446
> I tested on my two Flames both with the latest gaia & V188 image. Since this
> commit, gecko bluetooth receives pairing status BT_STATUS_FAIL from BT
> stack. We'll further investigate why stack reports error. Also ni? bug
> 1081871 owner Gabriele.
> 
> Gabriele, can you brief how this commit impacts on process priority?

Prior to my patch we used nice values to adjust the priority of our content processes; after my patch the same is done using Linux control groups which rely on a completely different kernel subsystem. In theory this shouldn't affect Bluetooth at all but as we've seen in bug 1119467 comment 4 weird things are happening after poking around the control groups.

Either my patch is doing something very wrong or we're hitting a kernel bug that wasn't visible before because we weren't exercising this code at all.
Flags: needinfo?(gsvelto)
Resolve as WORKSFORME since bug 1081871 comment 32 has reverted its change. Reopen if we meet this bug again.
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → WORKSFORME
Reopening this bug as it seems to b occurring again on Flame 3.0

The device name will become greyed out, but will see "Pairing with device..." message until the settings app is closed

repro rate: 9/10

Environmental Variables:
Device: Flame 3.0 (319mb)(Kitkat)(Full Flash)
Build ID: 20150217010232
Gaia: ae02fbdeae77b2002cebe33c61aedeee4b9439fd
Gecko: 09f4968d5f42
Gonk: e7c90613521145db090dd24147afd5ceb5703190
Version: 38.0a1 (3.0)
Firmware Version: v18D
User Agent: Mozilla/5.0 (Mobile; rv:38.0) Gecko/38.0 Firefox/38.0
Status: RESOLVED → REOPENED
QA Whiteboard: [QAnalyst-Triage+] → [QAnalyst-Triage?]
Flags: needinfo?(pbylenga)
Resolution: WORKSFORME → ---
QA Whiteboard: [QAnalyst-Triage?] → [QAnalyst-Triage+]
Flags: needinfo?(pbylenga)
I blindly try to revert 1affa6bf62b8 [1], 82b7cf800e10 [2], and 14f85baf6f96 [3], and I cannot reproduce this issue (0/20). Probably this is related to bug 1082290 and bug 1081871 again, but I don't have ideas yet.

[1] https://hg.mozilla.org/mozilla-central/rev/1affa6bf62b8
[2] https://hg.mozilla.org/mozilla-central/rev/82b7cf800e10
[3] https://hg.mozilla.org/mozilla-central/rev/14f85baf6f96

Hi Gabriele and Dave, is there anything we should take care of explicitly after cgroup features have been integrated?
Flags: needinfo?(gsvelto)
Flags: needinfo?(dhylands)
It sounds strange to me that there's an interaction between cgroups and the bluetooth functionality but I'll double-check to be sure. I'll leave the NI for now.
I can't think of anything with cgroups that would directly change things.

Its entirely possible that you might get timing changes, so my guess is that the code has a race condition and cgroups is just causing the race condition to manifest itself.
Flags: needinfo?(dhylands)
QA Contact: bzumwalt
hmm....not able to reproduce this issue if gdb is attached.
|bond_state_changed()| is called with |BT_STATUS_FAIL| in this case. It seems we have two problems here. One is bug 1128386, which is for a better error handling in Gecko. Another one is supposed to reside in bluetoothd stack, because |BT_STATUS_FAIL| should not happen so easily while pairing devices.

When this issue happens, |btif_dm_auth_cmpl_evt()| in |btif_dm.c| is called with |p_auth_cmpl->success| equals 0 and |p_auth_cmpl->fail_reason| equals 0x16 (HCI_ERR_CONN_CAUSE_LOCAL_HOST).


Hi Michael,
Would you mind helping to investigate bluetoothd stack to see what causes the failure in this case?
Flags: needinfo?(mvines)
(In reply to Bruce Sun [:brsun] from comment #18)
> |bond_state_changed()| is called with |BT_STATUS_FAIL| in this case. It
> seems we have two problems here. One is bug 1128386, which is for a better
> error handling in Gecko. Another one is supposed to reside in bluetoothd
> stack, because |BT_STATUS_FAIL| should not happen so easily while pairing
> devices.
> 
> When this issue happens, |btif_dm_auth_cmpl_evt()| in |btif_dm.c| is called
> with |p_auth_cmpl->success| equals 0 and |p_auth_cmpl->fail_reason| equals
> 0x16 (HCI_ERR_CONN_CAUSE_LOCAL_HOST).
> 
> 
> Hi Michael,
> Would you mind helping to investigate bluetoothd stack to see what causes
> the failure in this case?

Just clarify, it's not bluetoothd but bluedroid, this bug is not related to bluetoothd but bluedroid. It seems to me timing changed and reveal stack bug.
I agree with Dave's assessment, especially in the light of comment 17. It does indeed look like a race of some kind.
Flags: needinfo?(gsvelto)
Mozilla-Inbound Regression Window:

Last working Mozilla-Inbound build:
Device: Flame 3.0
Build ID: 20150204002515
Gaia: dfebaaa8aab43470f482d09d71137bab840c3ae9
Gecko: 531f84f564b6
Gonk: e7c90613521145db090dd24147afd5ceb5703190
Version: 38.0a1 (3.0)
Firmware Version: v18D-1
User Agent: Mozilla/5.0 (Mobile; rv:38.0) Gecko/38.0 Firefox/38.0

First broken Mozilla-Inbound build:
Device: Flame 3.0
Build ID: 20150204003325
Gaia: dfebaaa8aab43470f482d09d71137bab840c3ae9
Gecko: 85c0e2ca852d
Gonk: e7c90613521145db090dd24147afd5ceb5703190
Version: 38.0a1 (3.0)
Firmware Version: v18D-1
User Agent: Mozilla/5.0 (Mobile; rv:38.0) Gecko/38.0 Firefox/38.0


Working Gaia with Broken Gecko issue DOES reproduce:
Gaia: dfebaaa8aab43470f482d09d71137bab840c3ae9
Gecko: 85c0e2ca852d
Gonk: e7c90613521145db090dd24147afd5ceb5703190

Working Gecko with Broken Gaia issue does NOT reproduce:
Gaia: dfebaaa8aab43470f482d09d71137bab840c3ae9
Gecko: 531f84f564b6
Gonk: e7c90613521145db090dd24147afd5ceb5703190


Mozilla-Inbound Pushlog:
http://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=531f84f564b6&tochange=85c0e2ca852d


Issue appears to have been caused by changes made in bug 1128489
QA Whiteboard: [QAnalyst-Triage+] → [QAnalyst-Triage?]
Flags: needinfo?(ktucker)
Alexandre, can you take a look at this please? Looks like the work done for bug 1128489 might have caused this to occur.
QA Whiteboard: [QAnalyst-Triage?] → [QAnalyst-Triage+]
Flags: needinfo?(ktucker) → needinfo?(lissyx+mozillians)
(In reply to KTucker [:KTucker] from comment #22)
> Alexandre, can you take a look at this please? Looks like the work done for
> bug 1128489 might have caused this to occur.

Excellent ! My time machine works ! We are able to have bugs filed for patches when I have not even thought of them.

This bug was first reported with 20150115* builds. The patch you are pointing at landed three weeks after that.
Flags: needinfo?(lissyx+mozillians) → needinfo?(ktucker)
Actually, Alexandre it is possible for a bug to be reintroduced without the use of a time machine. This bug was closed as worksforme per Comment 12 because they reverted the offending commits on bug 1081871. The reporter then noticed the issue occurring again on 2/18. 

The regression window is correct here but the wrong bug was called out. If you look at the push log it points to bug 1081871 again. This is because they landed a fix for bug 1081871 in central on 2/04 and uplifted to b2g37 on 2/11. This reintroduced the issue written there.

I think all this confusion stems from this issue being reopened when it shouldn't have been. A new issue should have been written.

Gabriele, can you take a look at this please?
Flags: needinfo?(ktucker) → needinfo?(gsvelto)
(In reply to KTucker [:KTucker] from comment #24)
> Actually, Alexandre it is possible for a bug to be reintroduced without the
> use of a time machine. This bug was closed as worksforme per Comment 12
> because they reverted the offending commits on bug 1081871. The reporter
> then noticed the issue occurring again on 2/18.

Of course it's always possible, I was just doubtful given the bug :). Maybe my joke was uncalled-for, sorry if you felt hurt.
 
> 
> The regression window is correct here but the wrong bug was called out. If
> you look at the push log it points to bug 1081871 again. This is because
> they landed a fix for bug 1081871 in central on 2/04 and uplifted to b2g37
> on 2/11. This reintroduced the issue written there.

Makes sense

> 
> I think all this confusion stems from this issue being reopened when it
> shouldn't have been. A new issue should have been written.

I disagree, since it turns out that it's indeed still the same root cause somehow, so it makes some sense to just reopen.

Sorry again if you got hurt.
Michael,
Is there any chance to check bluedroid stack from your side based on the Comment 18, it's easy to reproduce this bug after introducing cgroup changes.
Update: This issue can be reproduced on nexus-5-l as well.
Assignee: btian → brsun
(In reply to Shawn Huang [:shawnjohnjr] from comment #26)
> Michael,
> Is there any chance to check bluedroid stack from your side based on the
> Comment 18, it's easy to reproduce this bug after introducing cgroup changes.

I tried reproducing on the reference device but didn't see this problem.
I also tried reproducing on Flame but couldn't quite see this problem.  Sometimes, however, the list became un-selectable (even after the search completed).  I didn't see this problem on the reference device.

Not sure how easy this issue is to reproduce though.  I tried about a dozen times on each device.
Flags: needinfo?(mvines)
(In reply to Greg Grisco from comment #28)
> I also tried reproducing on Flame but couldn't quite see this problem. 

Forgot to mention that we don't rebuild kernel in our Flame builds, so we don't have the cgroup changes mentioned above.
(In reply to Greg Grisco from comment #29)
> Forgot to mention that we don't rebuild kernel in our Flame builds, so we
> don't have the cgroup changes mentioned above.

The CPU controller should be enabled by default in your kernel; you might only be missing the memory controller.
Hi Greg,

You could use v188.zip or v18D.zip as the base image[1], and then shallow flash the latest gecko[2] and gaia to reproduce this issue.

[1] https://developer.mozilla.org/en-US/Firefox_OS/Phone_guide/Flame/Updating_your_Flame
[2] any newer version than e0092755a03f on master (bug 1132388 comment 3) or 6b5a8f183c55 on v2.2 (bug 1132388 comment 4).
Flags: needinfo?(ggrisco)
Hi Bruce,

I think Greg's point was more that we can't reproduce this on our v2.2 build, not asking how to reproduce it on Flame (which is not targeted for v2.2 commercialization).  If you can help us reproduce on our build/device then we can certainly help take a look at this bug in that context.
I managed to reproduce this on my Flame and as discussed before I think this is a reason condition. I did three different tests: first with the current master where I could reproduce the bug most of the time I tried; then reverting bug 1081871 with which I couldn't reproduce the bug anymore; finally with this patch applied. What this patch does is leave the functionality of bug 1081871 intact but uses the same priority level for all processes in the system. In this last situation I couldn't reproduce anymore so this is a rather strong hint that this is caused by a race and that bug 1081871 only made this race more likely.

Unfortunately I'm not very familiar with our Bluetooth stack / API so I can't tell what might be the underlying cause. Thomas could you help us out here? It seems that starting to search for devices and then starting to pair with one before the search is over causes this behavior.
Flags: needinfo?(gsvelto) → needinfo?(tzimmermann)
(In reply to Gabriele Svelto [:gsvelto] from comment #33)
> Created attachment 8573082 [details] [diff] [review]
> Use the default control group for all priority levels
> 
> Unfortunately I'm not very familiar with our Bluetooth stack / API so I
> can't tell what might be the underlying cause. Thomas could you help us out
> here? It seems that starting to search for devices and then starting to pair
> with one before the search is over causes this behavior.

I actually modified gaia app which always stops discovery (search for devices) before starting pairing procedure. But the problem still can observed. So it seems to me not related to discovery.
Steal this bug. Time to dig into stack.
Assignee: brsun → shuang
I had some luck with reproducing this issue on non-debug builds.

A successful pairing operation produces debugging output roughly like this:


E/bt-btif ( 2637): check_cod: remote_cod = 0x520100
I/GeckoBluetooth(  209): Notify: [A] Notify: discoverystatechanged
I/GeckoBluetooth(  209): Notify: [A] Notify: discoverystatechanged
I/GeckoBluetooth(  209): Notify: [A] Notify: discoverystatechanged
I/GeckoBluetooth(  209): Notify: [A] Notify: discoverystatechanged
I/GeckoBluetooth(  209): Notify: [A] Notify: PropertyChanged
I/GeckoBluetooth(  209): Notify: [A] Notify: PropertyChanged
I/GeckoBluetooth(  209): Notify: [A] Notify: PropertyChanged
I/GeckoBluetooth(  209): Notify: [A] Notify: PropertyChanged
I/GeckoBluetooth(  209): UnregisterBluetoothSignalHandler: [S] UnregisterBluetoothSignalHandler: /B2G/bluetooth/adapter
W/bt-btm  ( 2637): btm_acl_created hci_handle=6 link_role=1  is_le_link=0
W/bt-btm  ( 2637): BTM: RemBdAddr: c01885dcd99f
W/bt-btm  ( 2637): BTM: remote role: 0x01
W/bt-btm  ( 2637): btm_acl_created hci_handle=6 link_role=1  is_le_link=0
W/bt-btm  ( 2637): BTM: RemBdAddr: c01885dcd99f
W/bt-btm  ( 2637): BTM: remote role: 0x01
W/bt-btif ( 2637): info:x10
D/        ( 2637): remote version info [c0:18:85:dc:d9:9f]: 6, f, 220e
W/bt-l2cap( 2637): L2CA_SetDesireRole() new:x0, disallow_switch:0
I/GeckoBluetooth(  209): RemoteDevicePropertiesNotification: Other non-handled device properties. Type: 5
I/GeckoBluetooth(  209): Notify: [S] Notify: RequestConfirmation
I/GeckoBluetooth(  209): RegisterBluetoothSignalHandler: [S] RegisterBluetoothSignalHandler: /B2G/bluetooth/manager
I/GeckoBluetooth(  209): RegisterBluetoothSignalHandler: [S] RegisterBluetoothSignalHandler: /B2G/bluetooth/adapter
D/btif_config_util( 2637): btif_config_save_file(L188): in file name:/data/misc/bluedroid/bt_config.new
E/bt-btif ( 2637): check_cod: remote_cod = 0x520100
I/GeckoBluetooth(  209): Notify: [S] Notify: Cancel
E/bt-btm  ( 2637): btm_sec_disconnected - Clearing Pending flag
W/bt-l2cap( 2637): L2CA_SetDesireRole() new:x0, disallow_switch:0
I/GeckoBluetooth(  209): UnregisterBluetoothSignalHandler: [S] UnregisterBluetoothSignalHandler: /B2G/bluetooth/adapter
I/GeckoBluetooth(  209): UnregisterBluetoothSignalHandler: [S] UnregisterBluetoothSignalHandler: /B2G/bluetooth/manager
W/bt-btif ( 2637): bta_dm_check_av:0   
I/GeckoBluetooth(  209): Notify: [A] Notify: discoverystatechanged
I/GeckoBluetooth(  209): Notify: [A] Notify: discoverystatechanged
I/GeckoBluetooth(  209): Notify: [A] Notify: discoverystatechanged
I/GeckoBluetooth(  209): Notify: [A] Notify: PropertyChanged
I/GeckoBluetooth(  209): Notify: [A] Notify: PropertyChanged
I/GeckoBluetooth(  209): Notify: [A] Notify: PropertyChanged

In the case of a failed one, it looks like this

E/bt-btif ( 2637): check_cod: remote_cod = 0x520100
I/GeckoBluetooth(  209): UnregisterBluetoothSignalHandler: [S] UnregisterBluetoothSignalHandler: 10:bf:48:f4:c7:22
I/GeckoBluetooth(  209): UnregisterBluetoothSignalHandler: [S] UnregisterBluetoothSignalHandler: 10:bf:48:f4:c7:22
I/GeckoBluetooth(  209): UnregisterBluetoothSignalHandler: [S] UnregisterBluetoothSignalHandler: 10:bf:48:f4:c7:22
I/GeckoBluetooth(  209): UnregisterBluetoothSignalHandler: [S] UnregisterBluetoothSignalHandler: c0:18:85:dc:d9:9f
I/GeckoBluetooth(  209): UnregisterBluetoothSignalHandler: [S] UnregisterBluetoothSignalHandler: c0:18:85:dc:d9:9f
I/GeckoBluetooth(  209): UnregisterBluetoothSignalHandler: [S] UnregisterBluetoothSignalHandler: c0:18:85:dc:d9:9f
I/GeckoBluetooth(  209): UnregisterBluetoothSignalHandler: [S] UnregisterBluetoothSignalHandler: 00:1b:dc:0f:ba:a3
I/GeckoBluetooth(  209): UnregisterBluetoothSignalHandler: [S] UnregisterBluetoothSignalHandler: 00:1b:dc:0f:ba:a3
I/GeckoBluetooth(  209): UnregisterBluetoothSignalHandler: [S] UnregisterBluetoothSignalHandler: 00:1b:dc:0f:ba:a3
I/GeckoBluetooth(  209): Notify: [A] Notify: discoverystatechanged
I/GeckoBluetooth(  209): Notify: [A] Notify: discoverystatechanged
I/GeckoBluetooth(  209): Notify: [A] Notify: discoverystatechanged
I/GeckoBluetooth(  209): Notify: [A] Notify: PropertyChanged
I/GeckoBluetooth(  209): Notify: [A] Notify: PropertyChanged
I/GeckoBluetooth(  209): Notify: [A] Notify: PropertyChanged
E/bt-btif ( 2637): check_cod: remote_cod = 0x520100
D/btif_config_util( 2637): btif_config_save_file(L188): in file name:/data/misc/bluedroid/bt_config.new

It always stops after |check_cod| and |btif_config_save_file|.
Flags: needinfo?(tzimmermann)
To reply Comment 18, in |btif_dm_auth_cmpl_evt|, HCI fail reason code will be mapped to bt status.
I checked p_auth_cmpl->fail_reason = 22, followed by stack/include/hcidefs.h, the HCI error code will be  HCI_ERR_CONN_CAUSE_LOCAL_HOST. This is why we can get STATUS_FAIL from callback.
Here's some debugging of a successful test with bluetoothd.

E/bluetoothd( 2608): in PDU(0x1:0xc)   
E/bluetoothd( 2608): out PDU(0x1:0xc)  

Here we call CANCEL_DISCOVERY from Gecko.

E/bluetoothd( 2608): out PDU(0x1:0x85) 

This is a DISCOVERY_STATE_CHANGED_NTF.

E/bluetoothd( 2608): in PDU(0x1:0xd)   
E/bluetoothd( 2608): out PDU(0x1:0xd)

Here we call CREATE_BOND from Gecko.

E/bt-btif ( 2608): check_cod: remote_cod = 0x520100
E/bluetoothd( 2608): out PDU(0x1:0x88) 

This is a BOND_STATE_CHANGED_NTF.

I/GeckoBluetooth(  209): Notify: [A] Notify: discoverystatechanged
I/GeckoBluetooth(  209): Notify: [A] Notify: discoverystatechanged
I/GeckoBluetooth(  209): Notify: [A] Notify: discoverystatechanged
I/GeckoBluetooth(  209): Notify: [A] Notify: PropertyChanged
I/GeckoBluetooth(  209): Notify: [A] Notify: PropertyChanged
I/GeckoBluetooth(  209): Notify: [A] Notify: PropertyChanged
E/bluetoothd( 2608): out PDU(0x1:0x85) 

Another DISCOVERY_STATE_CHANGED_NTF. (Why ?)

I/GeckoBluetooth(  209): Notify: [A] Notify: discoverystatechanged
I/GeckoBluetooth(  209): Notify: [A] Notify: discoverystatechanged
I/GeckoBluetooth(  209): Notify: [A] Notify: discoverystatechanged
I/GeckoBluetooth(  209): Notify: [A] Notify: PropertyChanged
I/GeckoBluetooth(  209): Notify: [A] Notify: PropertyChanged
I/GeckoBluetooth(  209): Notify: [A] Notify: PropertyChanged
I/GeckoBluetooth(  209): UnregisterBluetoothSignalHandler: [S] UnregisterBluetoothSignalHandler: c0:18:85:dc:d9:9f
I/GeckoBluetooth(  209): UnregisterBluetoothSignalHandler: [S] UnregisterBluetoothSignalHandler: c0:18:85:dc:d9:9f
I/GeckoBluetooth(  209): UnregisterBluetoothSignalHandler: [S] UnregisterBluetoothSignalHandler: c0:18:85:dc:d9:9f
E/HWComposer(  209): Non-uniform vsync interval: 50041510
V/WLAN_PSA(  214): NL MSG, len[048], NL type[0x11] WNI type[0x5050] len[028]
E/bluetoothd( 2608): out PDU(0x1:0x83) 
D/btif_config_util( 2608): btif_config_save_file(L188): in file name:/data/misc/bluedroid/bt_config.new
W/bt-btm  ( 2608): btm_acl_created hci_handle=6 link_role=1  is_le_link=0
W/bt-btm  ( 2608): BTM: RemBdAddr: c01885dcd99f
W/bt-btm  ( 2608): BTM: remote role: 0x01
W/bt-btm  ( 2608): btm_acl_created hci_handle=6 link_role=1  is_le_link=0
W/bt-btm  ( 2608): BTM: RemBdAddr: c01885dcd99f
W/bt-btm  ( 2608): BTM: remote role: 0x01
W/bt-btif ( 2608): info:x10
D/        ( 2608): remote version info [c0:18:85:dc:d9:9f]: 6, f, 220e
E/bluetoothd( 2608): out PDU(0x1:0x89) 
...

The rest is noise from the pairing procedure.

A failed pairing looks like this.

E/bluetoothd( 2608): in PDU(0x1:0xc)   
E/bluetoothd( 2608): out PDU(0x1:0xc)  
E/bluetoothd( 2608): in PDU(0x1:0xd)   
E/bluetoothd( 2608): out PDU(0x1:0xd)  

Again we're calling CANCEL_DISCOVERY and CREATE_BOND.

E/bt-btif ( 2608): check_cod: remote_cod = 0x520100
E/bluetoothd( 2608): out PDU(0x1:0x88) 
E/bluetoothd( 2608): out PDU(0x1:0x85) 

Those are the notifications BOND_STATE_CHANGED_NTF and DISCOVERY_STATE_CHANGED_NTF.

I/GeckoBluetooth(  209): Notify: [A] Notify: discoverystatechanged
I/GeckoBluetooth(  209): Notify: [A] Notify: discoverystatechanged
I/GeckoBluetooth(  209): Notify: [A] Notify: discoverystatechanged
I/GeckoBluetooth(  209): Notify: [A] Notify: PropertyChanged
I/GeckoBluetooth(  209): Notify: [A] Notify: PropertyChanged
I/GeckoBluetooth(  209): Notify: [A] Notify: PropertyChanged
E/bt-btif ( 2608): check_cod: remote_cod = 0x520100
E/bluetoothd( 2608): out PDU(0x1:0x88)

And now we get another BOND_STATE_CHANGED_NTF. I have no idea were this comes from.

I/GeckoBluetooth(  209): UnregisterBluetoothSignalHandler: [S] UnregisterBluetoothSignalHandler: 00:1b:dc:0f:ba:a3
I/GeckoBluetooth(  209): UnregisterBluetoothSignalHandler: [S] UnregisterBluetoothSignalHandler: 00:1b:dc:0f:ba:a3
I/GeckoBluetooth(  209): UnregisterBluetoothSignalHandler: [S] UnregisterBluetoothSignalHandler: 00:1b:dc:0f:ba:a3
I/GeckoBluetooth(  209): UnregisterBluetoothSignalHandler: [S] UnregisterBluetoothSignalHandler: c0:18:85:dc:d9:9f
I/GeckoBluetooth(  209): UnregisterBluetoothSignalHandler: [S] UnregisterBluetoothSignalHandler: c0:18:85:dc:d9:9f
I/GeckoBluetooth(  209): UnregisterBluetoothSignalHandler: [S] UnregisterBluetoothSignalHandler: c0:18:85:dc:d9:9f

At this point, the pairing procedure stops.
And here's some more information.

E/bluetoothd( 1462): in PDU(0x1:0xc)
E/bluetoothd( 1462): out PDU(0x1:0xc)
E/bluetoothd( 1462): in PDU(0x1:0xd)
E/bluetoothd( 1462): out PDU(0x1:0xd)
E/bt-btif ( 1462): check_cod: remote_cod = 0x520100
E/bluetoothd( 1462): bond_state_changed_cb(status=0 remote_bd_addr=c01885dcd99f state=1)

Here, Bluedroid tells us that we're in BOND_STATE_BONDING (state == 1).

E/bluetoothd( 1462): out PDU(0x1:0x88)
E/bluetoothd( 1462): discovery_state_changed_cb(state=0)
E/bluetoothd( 1462): out PDU(0x1:0x85)
E/bt-btif ( 1462): check_cod: remote_cod = 0x520100
E/bluetoothd( 1462): bond_state_changed_cb(status=1 remote_bd_addr=c01885dcd99f state=0)
E/bluetoothd( 1462): out PDU(0x1:0x88)

And here, we're in BOND_STATE_NONE (state == 0). The status of 1 is STATUS_FAIL (i.e., could be anything).
In |BluetoothServiceBluedroid::BondStateChangedNotification|, we ignore STATE_BONDING. [1]

If we add STATUS_FAIL to the nearby switch statement, we should be able to signal a failure to Gaia. That should at least fix the UI issue. Shawn, does that make sense to you?

[1] https://mxr.mozilla.org/mozilla-central/source/dom/bluetooth/bluedroid/BluetoothServiceBluedroid.cpp#1520
Flags: needinfo?(shuang)
(In reply to Thomas Zimmermann [:tzimmermann] [:tdz] from comment #40)
> In |BluetoothServiceBluedroid::BondStateChangedNotification|, we ignore
> STATE_BONDING. [1]
> 
> If we add STATUS_FAIL to the nearby switch statement, we should be able to
> signal a failure to Gaia. That should at least fix the UI issue. Shawn, does
> that make sense to you?
> 
> [1]
> https://mxr.mozilla.org/mozilla-central/source/dom/bluetooth/bluedroid/
> BluetoothServiceBluedroid.cpp#1520

Patch is ready: https://bugzilla.mozilla.org/show_bug.cgi?id=1128386
Flags: needinfo?(shuang)
Michael,

Please have a look at comment 36 and below. It looks like there's a driver issue in Bluedroid where it cannot pair with a device. This seems to happen on flame-kk and nexus-5-l at least. So it's probably an all other devices as well.

In my case, the remote device is my Linux notebook. I had hcidump running while doing the tests and it didn't even show me an attempt to pair. I guess, something early in the process failed. Maybe comment 37 helps.
Flags: needinfo?(mvines)
(In reply to Shawn Huang [:shawnjohnjr] from comment #41)
> (In reply to Thomas Zimmermann [:tzimmermann] [:tdz] from comment #40)
> > In |BluetoothServiceBluedroid::BondStateChangedNotification|, we ignore
> > STATE_BONDING. [1]
> > 
> > If we add STATUS_FAIL to the nearby switch statement, we should be able to
> > signal a failure to Gaia. That should at least fix the UI issue. Shawn, does
> > that make sense to you?
> > 
> > [1]
> > https://mxr.mozilla.org/mozilla-central/source/dom/bluetooth/bluedroid/
> > BluetoothServiceBluedroid.cpp#1520
> 
> Patch is ready: https://bugzilla.mozilla.org/show_bug.cgi?id=1128386

Oh, so this has been seen several times recently. :o
The error is coming from |btm_sec_rmt_name_request_complete()|, then HCI error 0x16 propagate to |bond_state_changed|.

03-02 20:35:02.490 I/bt-btm  (  211): BTM_SecBond: Remote sm4: 0x0  HCI Handle: 0xffff
03-02 20:35:02.490 D/bt-btm  (  211): sec mode: 4 sm4:x0
03-02 20:35:02.490 I/bt-btm  (  211): btm_sec_change_pairing_state  Old: 0
03-02 20:35:02.490 I/bt-btm  (  211): btm_sec_change_pairing_state  New: 1 pairing_flags:0x1
03-02 20:35:02.490 I/bt-btm  (  211): BTM_ReadRemoteDeviceName: bd addr [f8a9d0a5a521]
03-02 20:35:02.490 I/bt-btm  (  211): no device found in inquiry db
03-02 20:35:02.490 D/bt-btm  (  211): btm_acl_paging discing:0, paging:0 BDA: f8a9d0a5a521
03-02 20:35:02.490 D/bt-btm  (  211): connecting_bda: 000000000000
03-02 20:35:02.490 I/bt-btm  (  211): btm_find_or_alloc_dev
03-02 20:35:02.490 D/bt-btm  (  211): State:1 sm4: 0x0 sec_state:0
03-02 20:35:02.500 E/        (  211): bta_dm_bond, entering BTM_SecBond: 1
03-02 20:35:02.500 D/bt-btif (  211): bte_search_devices_evt event=BTA_DM_SEARCH_CANCEL_CMPL_EVT param_len=0
03-02 20:35:02.500 I/bt-btif (  211): btif_dm_search_devices_evt event=BTA_DM_SEARCH_CANCEL_CMPL_EVT
03-02 20:35:02.500 I/bt-btm  (  211): BTM_CancelRemoteDeviceName()
03-02 20:35:02.500 D/bt-btm  (  211): btm_acl_update_busy_level
03-02 20:35:02.500 D/bt-btm  (  211): BTM_BLI_INQ_CANCEL_EVT
03-02 20:35:02.500 D/bt-btm  (  211): btm_acl_update_busy_level
03-02 20:35:02.500 D/bt-btm  (  211): BTM_BLI_INQ_DONE_EVT
03-02 20:35:02.500 I/bt-btif (  211): btif_dm_upstreams_cback  ev: BTA_DM_BUSY_LEVEL_EVT
03-02 20:35:02.500 I/bt-btif (  211): HAL bt_hal_cbacks->discovery_state_changed_cb
03-02 20:35:02.500 I/bt-btm  (  211): BDA f8:a9:d0:a5:a5:21
03-02 20:35:02.500 I/bt-btm  (  211): Inquire BDA f8:a9:d0:a5:a5:21
03-02 20:35:02.500 I/bt-btm  (  211): btm_sec_rmt_name_request_complete
03-02 20:35:02.500 D/bt-btm  (  211): btm_acl_resubmit_page
03-02 20:35:02.500 I/bt-btm  (  211): Security Manager: rmt_name_complete PairState: 1  RemName:   status: 22 State:0  p_dev_rec: 0xb216a824 
03-02 20:35:02.500 I/bt-btif (  211): btif_dm_upstreams_cback  ev: BTA_DM_REM_NAME_EVT
03-02 20:35:02.500 D/bt-btif (  211): BTA_DM_REM_NAME_EVT
03-02 20:35:02.500 D/bt-btif (  211): in, bd addr:f8:a9:d0:a5:a5:21, prop type:10, len:249
03-02 20:35:02.500 I/bt-btm  (  211): btm_sec_rmt_name_request_complete() continue bonding sm4: 0x0000, status:0x16
03-02 20:35:02.500 I/bt-btm  (  211): btm_sec_change_pairing_state  Old: 1
03-02 20:35:02.500 I/bt-btm  (  211): btm_sec_change_pairing_state  New: 0 pairing_flags:0x1
03-02 20:35:02.500 I/bt-btif (  211): btif_dm_upstreams_cback  ev: BTA_DM_AUTH_CMPL_EVT
03-02 20:35:02.500 E/        (  211): ---hci error: 22 ----
Hi Michael,
It looks like the stack is performing create_bond, during pairing procedure, stack is trying to fetch the remote name, at the same time 'Remote_Name_Request_Cancel' issued.

	80	Command	0x0419	Remote_Name_Request						10	14	 00:00:00.000713	2015/3/3  01:35:04.030562	
	81	Event	0x0419	Remote_Name_Request	Command Status	Success				4	7	 00:00:00.000632	2015/3/3  01:35:04.031194	
	82	Command	0x041a	Remote_Name_Request_Cancel						6	10	 00:00:00.000780	2015/3/3  01:35:04.031974	
	83	Event			Remote Name Request Complete	Connection Terminated by Local Host				255	258	 00:00:00.000634	2015/3/3 01:35:04.032608	
	84	Event			Disconnection Complete	Success	0x0002			4	7	 00:00:00.003282	2015/3/3 上午 01:35:04.035890	
	85	Event	0x041a	Remote_Name_Request_Cancel	Command Complete	Success				10	13	 00:00:00.000510	2015/3/3  01:35:04.036400	

https://www.codeaurora.org/cgit/quic/la/platform/external/bluetooth/bluedroid/tree/stack/btm/btm_sec.c?h=aosp-new/lollipop-release#n3198
In |btm_sec_rmt_name_request_complete|, Remote_Name_Request has been canceled directly, so bond_state_changed directly returned.
Flags: needinfo?(ggrisco)
Blocks: 1130946
Except for HCI error code: 0x16, sometimes I can observe HCI error code: 0x02.
I checked the BT SEPC 2.1 Host Controller Interface Functional Specification 7.1.20.

"The Remote Name Request Complete event shall be sent after the Command Complete event for the
Remote Name Request Cancel command. If the cancellation was successful, the Remote Name Request Complete event will be generated with the error code Unknown Connection Identifier (0x02)."
So it seems the Controller returned 0x02 aligned with the specification.

03-02 20:36:09.210 I/bt-btm  (  207): BTM_ReadRemoteDeviceName: bd addr [f8a9d0a5a521]
03-02 20:36:09.210 I/bt-btm  (  207): no device found in inquiry db
03-02 20:36:09.210 D/bt-btm  (  207): btm_acl_paging discing:0, paging:0 BDA: f8a9d0a5a521
03-02 20:36:09.210 D/bt-btm  (  207): connecting_bda: f8a9d0a5a521
03-02 20:36:09.210 I/bt-btm  (  207): btm_find_or_alloc_dev
03-02 20:36:09.210 D/bt-btm  (  207): State:1 sm4: 0x0 sec_state:0
03-02 20:36:09.210 E/        (  207): bta_dm_bond, entering BTM_SecBond: 1
03-02 20:36:09.210 E/        (  207): ------- bta_dm_search_cancel_transac_cmpl bta_dm_search_cancel_notify-----------
03-02 20:36:09.210 D/bt-btif (  207): bte_search_devices_evt event=BTA_DM_SEARCH_CANCEL_CMPL_EVT param_len=0
03-02 20:36:09.210 I/bt-btif (  207): btif_dm_search_devices_evt event=BTA_DM_SEARCH_CANCEL_CMPL_EVT
03-02 20:36:09.210 E/        (  207): -----------bta_dm_search_cancel_notify CancelRemoteDeviceName--------------
03-02 20:36:09.210 I/bt-btm  (  207): BTM_CancelRemoteDeviceName()
03-02 20:36:09.210 D/bt-btm  (  207): btm_acl_update_busy_level
03-02 20:36:09.210 D/bt-btm  (  207): BTM_BLI_INQ_CANCEL_EVT
03-02 20:36:09.210 D/bt-btm  (  207): btm_acl_update_busy_level
03-02 20:36:09.210 D/bt-btm  (  207): BTM_BLI_INQ_DONE_EVT
03-02 20:36:09.210 I/bt-btif (  207): btif_dm_upstreams_cback  ev: BTA_DM_BUSY_LEVEL_EVT
03-02 20:36:09.210 I/bt-btif (  207): HAL bt_hal_cbacks->discovery_state_changed_cb
03-02 20:36:09.230 I/bt-btm  (  207): BDA f8:a9:d0:a5:a5:21
03-02 20:36:09.230 I/bt-btm  (  207): Inquire BDA f8:a9:d0:a5:a5:21
03-02 20:36:09.230 I/bt-btm  (  207): btm_sec_rmt_name_request_complete
03-02 20:36:09.230 D/bt-btm  (  207): btm_acl_resubmit_page
03-02 20:36:09.230 I/bt-btm  (  207): Security Manager: rmt_name_complete PairState: 1  RemName:   status: 2 State:0  p_dev_rec: 0xb216a824 
03-02 20:36:09.230 I/bt-btif (  207): btif_dm_upstreams_cback  ev: BTA_DM_REM_NAME_EVT
03-02 20:36:09.230 D/bt-btif (  207): BTA_DM_REM_NAME_EVT
03-02 20:36:09.230 D/bt-btif (  207): in, bd addr:f8:a9:d0:a5:a5:21, prop type:10, len:249
03-02 20:36:09.230 I/bt-btm  (  207): btm_sec_rmt_name_request_complete() continue bonding sm4: 0x0000, status:0x2
03-02 20:36:09.230 I/bt-btm  (  207): btm_sec_change_pairing_state  Old: 1
03-02 20:36:09.230 I/bt-btm  (  207): btm_sec_change_pairing_state  New: 0 pairing_flags:0x1
03-02 20:36:09.230 I/bt-btif (  207): btif_dm_upstreams_cback  ev: BTA_DM_AUTH_CMPL_EVT
03-02 20:36:09.230 E/        (  207): ---hci error: 2 ------
Attached patch bug1122177.patch (obsolete) — Splinter Review
I'm not sure on Flame why the Controlled replied error code CONNECTION TERMINATED BY LOCAL HOST (0x16) from HCI event RemoteNameRequestComplete, based on specification it shall be 0x02. But this patch covers two cases.
Attachment #8573767 - Flags: review?(mvines)
Attachment #8573767 - Attachment is obsolete: true
Attachment #8573767 - Flags: review?(mvines)
See Also: → 1128386
Flags: needinfo?(ggrisco)
I merged attachment 8573813 [details] [diff] [review] into our v2.2 bluedroid stack [1] and tried pretty hard to reproduce this issue, and at no point was I able to successfully hit the additional code block that the patch adds.  I want to help here but if I can't reproduce the issue in our setup then there's not much that can be done.  The original STR are pretty plain and they do not cause the issue to reproduce here.

[1] https://www.codeaurora.org/cgit/quic/la/platform/external/bluetooth/bluedroid/tree/stack/btm/btm_sec.c?h=LF.BR.1.2.3
Flags: needinfo?(mvines)
Flags: needinfo?(ggrisco)
(In reply to Michael Vines [:m1] [:evilmachines] from comment #52)
> I merged attachment 8573813 [details] [diff] [review] into our v2.2
> bluedroid stack [1] and tried pretty hard to reproduce this issue, and at no
> point was I able to successfully hit the additional code block that the
> patch adds.  I want to help here but if I can't reproduce the issue in our
> setup then there's not much that can be done.  The original STR are pretty
> plain and they do not cause the issue to reproduce here.
> 
> [1]
> https://www.codeaurora.org/cgit/quic/la/platform/external/bluetooth/
> bluedroid/tree/stack/btm/btm_sec.c?h=LF.BR.1.2.3

We can hit this bug both on Neuxs-5-l (which is based on AOSP lollipop branch) and flame-kk (b2g_kk_3.5). And it's timing related bug. We can hit this bug by the following steps with cgroup changes (probably this makes HCI command thread timing change, so make it easier to hit):

1. Go to Settings and turn on bluetooth.
2. During discovery, when the remote device appears, tap Pair
3. If you pair with the remote device in non-discovery state, it will be hard to hit it. Turn off bluetooth and turn on again to reproduce again.

To hit this bug, the these condition must be fulfilled:
a. When executing pair procedure, the stack is going to do GetRemoteName, at the same time cancel discovery, that also leads to cancel GetRemoteDeviceName at the same time.
b. When command CancelGetRemoteDeviceName sent after GetRemoteDviceName request, you can reproduce this bug. If GetRemoteDeviceName reply returned earlier before CancelGetRemoteDevicName command issued, you won't be able to hit this bug. As I mentioned in Comment 45, 47.
The patch in bug 1128386 landed. So now with m-c gecko will handle STATUS_FAIL. So you won't see the status stuck at pairing state, but you will see dialog 'Fail to pair with the remote device'.
(In reply to Shawn Huang [:shawnjohnjr] from comment #54)
> The patch in bug 1128386 landed. So now with m-c gecko will handle
> STATUS_FAIL. So you won't see the status stuck at pairing state, but you
> will see dialog 'Fail to pair with the remote device'.

Thanks, so this doesn't seem particularly serious assuming a low reproduction rate on a given device (maybe due to better timing luck than some other device).  Worse case the user gets a 'Failed to pair' message and they try again.  Does that assessment sound correct?  I'm certainly not trying to punt this issue, but just prioritize it appropriately.
Yeah, I didn't see the problem for a while, now I can reproduce this regularly with the Flame.

1) Have you tried the in-Gecko implementation? I recently had more success with reproducing the bug when the internal implementation was active.

2) It seems that it occurred more often when there are several Bluetooth devices nearby.
(In reply to Thomas Zimmermann [:tzimmermann] [:tdz] from comment #56)
> 2) It seems that it occurred more often when there are several Bluetooth
> devices nearby.

Yeah, that seems to help reproducing this issue. When I tested it with only a couple of devices I had a hard time reproducing it; then I gathered all the BT-enabled gear in the house (a headset, an old Nokia phone, an Android tablet, my laptop and a Keon) and with those I could systematically reproduce the issue using the STR in comment 0.
(In reply to Gabriele Svelto [:gsvelto] from comment #57)
> (In reply to Thomas Zimmermann [:tzimmermann] [:tdz] from comment #56)
> > 2) It seems that it occurred more often when there are several Bluetooth
> > devices nearby.
> 
> Yeah, that seems to help reproducing this issue. When I tested it with only
> a couple of devices I had a hard time reproducing it; then I gathered all
> the BT-enabled gear in the house (a headset, an old Nokia phone, an Android
> tablet, my laptop and a Keon) and with those I could systematically
> reproduce the issue using the STR in comment 0.
I just got one reference device that Comment 28 mentioned, I cannot reproduce the bug. But I want to know how to check cgroup feature got enabled? Any hint?
Flags: needinfo?(gsvelto)
Check under /dev/cpuctl/apps on your device, if the following directories are present then you do have cgroup support enabled and working:

bg_non_interactive
bg_perceivable
critical
Flags: needinfo?(gsvelto)
(In reply to Gabriele Svelto [:gsvelto] from comment #59)
> Check under /dev/cpuctl/apps on your device, if the following directories
> are present then you do have cgroup support enabled and working:
> 
> bg_non_interactive
> bg_perceivable
> critical

Thanks. These folders actually exists. So it looks like cgroup enabled.
I will check the code, it seems like the reference device uses different bluedroid codebase. Maybe the problem got fixed in branch LF.BR.1.2.3.
(In reply to Shawn Huang [:shawnjohnjr] from comment #60)
>
> Thanks. These folders actually exists. So it looks like cgroup enabled.
> I will check the code, it seems like the reference device uses different
> bluedroid codebase. Maybe the problem got fixed in branch LF.BR.1.2.3.

Yes, cgroups should be is enabled and the bluedroid codebase is different (see comment 52) and so is what's under it.  The software on your device is probably about 3-4 weeks old at this point but hopefully good enough.  The patch from attachment 8573813 [details] [diff] [review] still seems to apply though with some minor adjustments (e.g. add a 3rd argument to BTM_ReadRemoteDeviceName), and the issue certainly may still be present but I'm not able to hit that condition here.  Maybe I don't have enough BT devices around me or something.  But until it reproduces more readily for a naive BT user like me on real v2.2 hardware, this really doesn't seem like a big deal.
Adding qawanted to see if we've improved with the patch landing mentioned in Comment 54
Keywords: qawanted
(In reply to Peter Bylenga [:PBylenga] from comment #62)
> Adding qawanted to see if we've improved with the patch landing mentioned in
> Comment 54

The bug described at comment 13 reproduced 0 out of 10 attempts when pairing with a device.

Device: Flame 3.0
BuildID: 20150313010238
Gaia: eabe35cf054d47087b37c1ca7db8143717fbd7f3
Gecko: 42afc7ef5ccb
Gonk: ebad7da532429a6f5efadc00bf6ad8a41288a429
Version: 39.0a1 (3.0) 
Firmware Version: v18D-1
User Agent: Mozilla/5.0 (Mobile; rv:39.0) Gecko/39.0 Firefox/39.0
QA Whiteboard: [QAnalyst-Triage+] → [QAnalyst-Triage?]
Flags: needinfo?(ktucker)
Keywords: qawanted
Brogan, can you try to reproduce this issue since you found the regression window on this?
QA Whiteboard: [QAnalyst-Triage?] → [QAnalyst-Triage-]
Flags: needinfo?(ktucker) → needinfo?(bzumwalt)
While unable to reproduce issue as described, I am getting a similar issue in 3/7 attempts. When user taps to pair while still searching for devices, the error message "Unable to pair devices" appears. If user acknowledges message and taps to connect same device, pairing proceeds as normal.

Attached logcat, can someone clarify whether this issue is related?

Device: Flame 3.0
Build ID: 20150316010202
Gaia: 4868c56c0a3b7a1e51d55b24457e44a7709ea1ae
Gecko: 436686833af0
Gonk: b83fc73de7b64594cd74b33e498bf08332b5d87b
Version: 39.0a1 (3.0)
Firmware Version: v18D-1
User Agent: Mozilla/5.0 (Mobile; rv:39.0) Gecko/39.0 Firefox/39.0
Flags: needinfo?(bzumwalt) → needinfo?(ktucker)
QA Whiteboard: [QAnalyst-Triage-] → [QAnalyst-Triage?]
Hi

(In reply to Brogan Zumwalt [:BroganZ] from comment #65)
> Created attachment 8578062 [details]
> Possible_Related_Logcat
> 
> While unable to reproduce issue as described, I am getting a similar issue
> in 3/7 attempts. When user taps to pair while still searching for devices,
> the error message "Unable to pair devices" appears. If user acknowledges
> message and taps to connect same device, pairing proceeds as normal.
> 
> Attached logcat, can someone clarify whether this issue is related?

Yes, it is. We track this issue in bug 1128386, but we can't do much about it as it appears to be a driver issue. The error message you see is the best 'fix' we can provide for now.
Since the issue mentioned in Comment 65 is being tracked in another issue, let's get one more repro attempt on this issue. Jayme, can you please try to reproduce this issue today?
QA Whiteboard: [QAnalyst-Triage?]
Flags: needinfo?(ktucker) → needinfo?(jmercado)
Keywords: qawanted
I was unable to reproduce this issue after 25 attempts on the Latest Nightly Central Flame build.

Environmental Variables:
Device: Flame 3.0 KK (319MB) (Full Flash)
BuildID: 20150316010202
Gaia: 4868c56c0a3b7a1e51d55b24457e44a7709ea1ae
Gecko: 436686833af0
Gonk: b83fc73de7b64594cd74b33e498bf08332b5d87b
Version: 39.0a1 (3.0) 
Firmware Version: v18D-1
User Agent: Mozilla/5.0 (Mobile; rv:39.0) Gecko/39.0 Firefox/39.0
Flags: needinfo?(jmercado) → needinfo?(ktucker)
Flags: needinfo?(ktucker) → needinfo?(dharris)
Closing this as worksforme since we can no longer reproduce this issue.
QA Whiteboard: [QAnalyst-Triage+]
Keywords: qawanted
Status: REOPENED → RESOLVED
Closed: 9 years ago9 years ago
Resolution: --- → WORKSFORME
Comment on attachment 8573813 [details] [diff] [review]
bug1122177.patch

(I'm not really the right person to r? this.  For bluedroid changes the upstream is AOSP really)
Attachment #8573813 - Flags: review?(mvines)
Flags: needinfo?(dharris)
Pairing still fail sometimes on Flame-KK and Nexus-5-L as comment 65 until attachment 8573813 [details] [diff] [review] is landed to qcom/AOSP bluedroid codebase. Note here as reference for pairing failure bugs in the future.

(In reply to Shawn Huang [:shawnjohnjr] from comment #53)
> We can hit this bug both on Neuxs-5-l (which is based on AOSP lollipop
> branch) and flame-kk (b2g_kk_3.5). And it's timing related bug. We can hit
> this bug by the following steps with cgroup changes (probably this makes HCI
> command thread timing change, so make it easier to hit):
> 
> 1. Go to Settings and turn on bluetooth.
> 2. During discovery, when the remote device appears, tap Pair
> 3. If you pair with the remote device in non-discovery state, it will be
> hard to hit it. Turn off bluetooth and turn on again to reproduce again.
> 
> To hit this bug, the these condition must be fulfilled:
> a. When executing pair procedure, the stack is going to do GetRemoteName, at
> the same time cancel discovery, that also leads to cancel
> GetRemoteDeviceName at the same time.
> b. When command CancelGetRemoteDeviceName sent after GetRemoteDviceName
> request, you can reproduce this bug. If GetRemoteDeviceName reply returned
> earlier before CancelGetRemoteDevicName command issued, you won't be able to
> hit this bug. As I mentioned in Comment 45, 47.
Attached file Patched bluedroid lib
To verify whether a pairing failure results from this bug cause, follow steps below to push patched bluedroid lib into device:

1) Connect USB to device and ensure ADB is enabled in developer option.
2) Backup original bluedroid lib in case you want to restore it
   $ adb pull system/lib/hw/bluetooth.default.so
3) Push attached bluedroid lib and restart device
   $ adb remount
   $ adb push bluetooth.default.so system/lib/hw
   $ adb reboot

===
The following log shows when the problem (comment 45) is hit and patched bluedroid lib retries to recover:
  bt-btm  : btm_sec_rmt_name_request_complete: but canceled, retry with pairing BDA
See Also: → 1144604
Blocks: 1144604
[title changed]

Reopen this bug due to the root cause remains as pairing fails with 18/20 repro frequency.
Bug 1144604 got the same root cause as this one, closed as dup and trace this issue here.

[Copy from bug 1144604 Comment35 ] 

Thanks for the info. Per bug 1122177 comment 70 it seems the patch should be directly contributed to AOSP (Bluedroid), do you know if this process has been initiated?. Thanks!

ni?shuang
Status: RESOLVED → REOPENED
Flags: needinfo?(shuang)
Resolution: WORKSFORME → ---
Summary: [Bluetooth] Attempting to pair while device is searching for other devices will result in endless pairing → [Bluetooth] pairing fails while device is searching for other devices
I will proceed aosp contribution.
Flags: needinfo?(shuang)
I've opened a bug for this problem:

Issue 162457: 	[bluedroid]Potential race condition cause create_bond fail while discoverying devices
https://code.google.com/p/android/issues/detail?id=162457&thanks=162457&ts=1427781250
I really think this bug is no longer a 'v2.2' blocker. I did everything I can do.
Set POVB since this bug results from bluedroid stack (comment 71).
Whiteboard: [2.2-Daily-Testing] → [POVB][2.2-Daily-Testing]
As discussed, this issue shouldn't be a blocker on 2.2. Move this to 3.0 and keep tracking.
blocking-b2g: 2.2+ → 3.0?
No longer blocks: 1130946
Hi Ben,
Do we still want this for 2.5? Thanks!
Flags: needinfo?(btian)
Not sure what do you mean "want". It's definitely good to have if partner can fix the POVB bug for us.
Flags: needinfo?(btian)
[Tracking Requested - why for this release]:

Hi Josh and Ben,

I suggest to move this bug to tracking-b2g:backlog since we won't do it on 2.5

Thanks!
blocking-b2g: 2.5? → ---
Status: REOPENED → RESOLVED
Closed: 9 years ago8 years ago
Resolution: --- → WORKSFORME
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: